[Deep Learning] Handwritten Digit Recognition (MNIST)


꾸준희

2017. 8. 25. 16:54


Reference: 신경망 첫걸음 (Hanbit Media)

The "Hello World" of deep learning is recognizing images of handwritten digits.

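MNIST dataset

The MNIST dataset is a collection of grayscale images of handwritten digits, with 60,000 training images and 10,000 test images. The loader used below holds out 5,000 of the training images as a validation set, which leaves 55,000 for training. The data can be downloaded from http://yann.lecun.com/exdb/mnist.

When working with images, data preprocessing and formatting are important but time-consuming. MNIST comes already preprocessed and formatted, which is why it is considered a good fit for getting started with deep learning.

The original black-and-white images were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The normalization algorithm (which reduces the resolution of the whole image to match the lowest one) applies anti-aliasing, so the resulting images contain gray pixels. Each image was then placed in a 28x28 pixel frame, positioned so that its computed center of mass sits at the middle of the frame.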
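
The centering step can be sketched in plain Python. This is only an illustration, not the procedure actually used to build MNIST: the function name `center_in_frame` is hypothetical, the image is assumed to be a nested list of pixel intensities, and the shift is rounded to whole pixels.

```python
def center_in_frame(image, frame_size=28):
    """Place `image` (a list of rows of intensities) in a square frame so
    that its center of mass lands near the middle of the frame."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image has no ink, center of mass is undefined")
    # Intensity-weighted mean row and column index.
    cy = sum(r * sum(row) for r, row in enumerate(image)) / total
    cx = sum(c * v for row in image for c, v in enumerate(row)) / total
    # Whole-pixel offset that moves the center of mass to the frame center.
    mid = (frame_size - 1) / 2
    dy, dx = round(mid - cy), round(mid - cx)
    frame = [[0] * frame_size for _ in range(frame_size)]
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            rr, cc = r + dy, c + dx
            # Pixels shifted outside the frame are dropped.
            if 0 <= rr < frame_size and 0 <= cc < frame_size:
                frame[rr][cc] = v
    return frame


# A 20x20 glyph whose only ink is a 2x2 blob in the top-left corner.
glyph = [[0] * 20 for _ in range(20)]
for r in (0, 1):
    for c in (0, 1):
        glyph[r][c] = 255

framed = center_in_frame(glyph)
```

In this toy case the blob moves from the corner to rows and columns 13 and 14, the middle of the 28x28 frame.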

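The machine learning method used in this example is supervised learning: each image comes with a label that says which digit it shows.

In this setting, we first load all the digit images together with their labels. During training, the model takes an image as input and outputs a vector whose elements are the scores for each category.

We want the category that matches the label to receive the highest score, but that is unlikely to happen before the model has been trained.

So we compute an error function that measures the difference between the output scores and the expected scores.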
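
As a rough sketch of what such an error function looks like, the snippet below implements softmax and cross-entropy, the same pair the TensorFlow model later in this post uses, in plain Python. The scores are made-up values chosen for illustration, not outputs of a real model.

```python
import math


def softmax(scores):
    """Turn raw category scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Error for one example: -log of the probability given to the true label."""
    return -math.log(probs[label])


# Hypothetical scores for the ten digit classes; the true label is 3.
untrained = softmax([0.0] * 10)  # every class gets the same probability
trained = softmax([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

loss_untrained = cross_entropy(untrained, 3)  # equals log(10)
loss_trained = cross_entropy(trained, 3)      # smaller, class 3 dominates
```

The error is large when the true class gets a low probability and shrinks toward zero as the model puts more of its probability on the correct label.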

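The model then adjusts its weight parameters to reduce this error. A typical deep learning system may have hundreds of millions of weight parameters and hundreds of millions of labeled training examples.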
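
To make "adjusting weights to reduce the error" concrete, here is a minimal gradient descent sketch on a toy model with a single weight. The model `y = w * x`, the data, and the learning rate are all assumptions chosen for illustration; the network in this post has 7,850 parameters rather than one, but each training step follows the same idea.

```python
def loss(w, xs, ys):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w, xs, ys):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with the "true" weight 2.0

w = 0.0
learning_rate = 0.05
history = []
for step in range(50):
    history.append(loss(w, xs, ys))
    # Move the weight a small step against the gradient.
    w -= learning_rate * grad(w, xs, ys)
```

Each step lowers the recorded loss, and `w` ends up very close to 2.0.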

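To download the dataset easily, you can use input_data.py. It has since been integrated into the TensorFlow codebase, so reportedly you can import it directly instead of downloading it separately.

(If the import does not work, copy and paste the code.)

The contents of input_data.py are shown below.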

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Functions for downloading and reading MNIST data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip
import os

import numpy
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'


def maybe_download(filename, work_directory):
  """Download the data from Yann's website, unless it's already here."""
  if not tf.gfile.Exists(work_directory):
    tf.gfile.MakeDirs(work_directory)
  filepath = os.path.join(work_directory, filename)
  if not tf.gfile.Exists(filepath):
    filepath, _ = urllib.request.urlretrieve(SOURCE_URL + filename, filepath)
    with tf.gfile.GFile(filepath) as f:
      size = f.size()
    print('Successfully downloaded', filename, size, 'bytes.')
  return filepath


def _read32(bytestream):
  dt = numpy.dtype(numpy.uint32).newbyteorder('>')
  return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]


def extract_images(filename):
  """Extract the images into a 4D uint8 numpy array [index, y, x, depth]."""
  print('Extracting', filename)
  with tf.gfile.Open(filename, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:
    magic = _read32(bytestream)
    if magic != 2051:
      raise ValueError(
          'Invalid magic number %d in MNIST image file: %s' %
          (magic, filename))
    num_images = _read32(bytestream)
    rows = _read32(bytestream)
    cols = _read32(bytestream)
    buf = bytestream.read(rows * cols * num_images)
    data = numpy.frombuffer(buf, dtype=numpy.uint8)
    data = data.reshape(num_images, rows, cols, 1)
    return data


def dense_to_one_hot(labels_dense, num_classes=10):
  """Convert class labels from scalars to one-hot vectors."""
  num_labels = labels_dense.shape[0]
  index_offset = numpy.arange(num_labels) * num_classes
  labels_one_hot = numpy.zeros((num_labels, num_classes))
  labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
  return labels_one_hot


def extract_labels(filename, one_hot=False):
  """Extract the labels into a 1D uint8 numpy array [index]."""
  print('Extracting', filename)
  with tf.gfile.Open(filename, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:
    magic = _read32(bytestream)
    if magic != 2049:
      raise ValueError(
          'Invalid magic number %d in MNIST label file: %s' %
          (magic, filename))
    num_items = _read32(bytestream)
    buf = bytestream.read(num_items)
    labels = numpy.frombuffer(buf, dtype=numpy.uint8)
    if one_hot:
      return dense_to_one_hot(labels)
    return labels


class DataSet(object):

  def __init__(self, images, labels, fake_data=False, one_hot=False,
               dtype=tf.float32):
    """Construct a DataSet.
    one_hot arg is used only if fake_data is true.  `dtype` can be either
    `uint8` to leave the input as `[0, 255]`, or `float32` to rescale into
    `[0, 1]`.
    """
    dtype = tf.as_dtype(dtype).base_dtype
    if dtype not in (tf.uint8, tf.float32):
      raise TypeError('Invalid image dtype %r, expected uint8 or float32' %
                      dtype)
    if fake_data:
      self._num_examples = 10000
      self.one_hot = one_hot
    else:
      assert images.shape[0] == labels.shape[0], (
          'images.shape: %s labels.shape: %s' % (images.shape,
                                                 labels.shape))
      self._num_examples = images.shape[0]

      # Convert shape from [num examples, rows, columns, depth]
      # to [num examples, rows*columns] (assuming depth == 1)
      assert images.shape[3] == 1
      images = images.reshape(images.shape[0],
                              images.shape[1] * images.shape[2])
      if dtype == tf.float32:
        # Convert from [0, 255] -> [0.0, 1.0].
        images = images.astype(numpy.float32)
        images = numpy.multiply(images, 1.0 / 255.0)
    self._images = images
    self._labels = labels
    self._epochs_completed = 0
    self._index_in_epoch = 0

  @property
  def images(self):
    return self._images

  @property
  def labels(self):
    return self._labels

  @property
  def num_examples(self):
    return self._num_examples

  @property
  def epochs_completed(self):
    return self._epochs_completed

  def next_batch(self, batch_size, fake_data=False):
    """Return the next `batch_size` examples from this data set."""
    if fake_data:
      fake_image = [1] * 784
      if self.one_hot:
        fake_label = [1] + [0] * 9
      else:
        fake_label = 0
      return [fake_image for _ in xrange(batch_size)], [
          fake_label for _ in xrange(batch_size)]
    start = self._index_in_epoch
    self._index_in_epoch += batch_size
    if self._index_in_epoch > self._num_examples:
      # Finished epoch
      self._epochs_completed += 1
      # Shuffle the data
      perm = numpy.arange(self._num_examples)
      numpy.random.shuffle(perm)
      self._images = self._images[perm]
      self._labels = self._labels[perm]
      # Start next epoch
      start = 0
      self._index_in_epoch = batch_size
      assert batch_size <= self._num_examples
    end = self._index_in_epoch
    return self._images[start:end], self._labels[start:end]


def read_data_sets(train_dir, fake_data=False, one_hot=False, dtype=tf.float32):
  class DataSets(object):
    pass
  data_sets = DataSets()

  if fake_data:
    def fake():
      return DataSet([], [], fake_data=True, one_hot=one_hot, dtype=dtype)
    data_sets.train = fake()
    data_sets.validation = fake()
    data_sets.test = fake()
    return data_sets

  TRAIN_IMAGES = 'train-images-idx3-ubyte.gz'
  TRAIN_LABELS = 'train-labels-idx1-ubyte.gz'
  TEST_IMAGES = 't10k-images-idx3-ubyte.gz'
  TEST_LABELS = 't10k-labels-idx1-ubyte.gz'
  VALIDATION_SIZE = 5000

  local_file = maybe_download(TRAIN_IMAGES, train_dir)
  train_images = extract_images(local_file)

  local_file = maybe_download(TRAIN_LABELS, train_dir)
  train_labels = extract_labels(local_file, one_hot=one_hot)

  local_file = maybe_download(TEST_IMAGES, train_dir)
  test_images = extract_images(local_file)

  local_file = maybe_download(TEST_LABELS, train_dir)
  test_labels = extract_labels(local_file, one_hot=one_hot)

  validation_images = train_images[:VALIDATION_SIZE]
  validation_labels = train_labels[:VALIDATION_SIZE]
  train_images = train_images[VALIDATION_SIZE:]
  train_labels = train_labels[VALIDATION_SIZE:]

  data_sets.train = DataSet(train_images, train_labels, dtype=dtype)
  data_sets.validation = DataSet(validation_images, validation_labels,
                                 dtype=dtype)
  data_sets.test = DataSet(test_images, test_labels, dtype=dtype)

  return data_sets
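
The `dense_to_one_hot` helper in the listing above uses a flat-index trick on a NumPy array. The same conversion can be sketched in plain Python to show what the result looks like; the function name `to_one_hot` is hypothetical and is not part of input_data.py.

```python
def to_one_hot(labels, num_classes=10):
    """Convert integer class labels into one-hot rows."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError("label %d is out of range" % label)
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1 at the position of the true class
        rows.append(row)
    return rows


encoded = to_one_hot([3, 0, 9])
```

Each label becomes a 10-element row with a single 1.0, which is the shape the `y_` placeholder expects when `one_hot=True` is passed to `read_data_sets`.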

Normally, running the code below gives you mnist.train, which holds the training data, and mnist.test, which holds the test data.

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

In my case input_data.py could not be imported, so I copied and pasted the input_data code and called the function directly, as follows.

mnist = read_data_sets("MNIST_data/", one_hot=True)

The data is then downloaded as shown below.

Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

The network may be slow, so if an error occurs, retrying a few times is said to be enough.
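
The downloaded .gz files are in the IDX format that `extract_images` parses: a header of big-endian 32-bit integers (magic number 2051 for images, then the image count, rows, and columns) followed by raw pixel bytes. The sketch below reads that layout with only the standard library. To avoid a download it builds a tiny synthetic file in memory, and the function name `parse_idx_images` is hypothetical.

```python
import gzip
import io
import struct


def parse_idx_images(gz_bytes):
    """Parse a gzipped IDX image file held in memory.

    Returns (num_images, rows, cols, pixels), where pixels is a bytes object.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        # The header is four big-endian unsigned 32-bit integers.
        magic, num_images, rows, cols = struct.unpack(">IIII", stream.read(16))
        if magic != 2051:
            raise ValueError("Invalid magic number %d" % magic)
        pixels = stream.read(num_images * rows * cols)
    return num_images, rows, cols, pixels


# Synthetic stand-in for a real file: 2 images of 2x3 pixels.
header = struct.pack(">IIII", 2051, 2, 2, 3)
body = bytes(range(12))
fake_file = gzip.compress(header + body)

n, rows, cols, pixels = parse_idx_images(fake_file)
# Slice the flat pixel buffer into the rows of the first image.
first_image = [list(pixels[r * cols:(r + 1) * cols]) for r in range(rows)]
```

With the real train-images-idx3-ubyte.gz the same header fields would describe the 28x28 MNIST images.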





import tensorflow as tf

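# Convert the array-like object to a tensor with TensorFlow's convert_to_tensor,
# then call get_shape on it.
tf.convert_to_tensor(mnist.train.images).get_shape()
# Output: TensorShape([Dimension(55000), Dimension(784)])

# Weights and biases of the softmax regression model: 784 pixels in, 10 digit classes out
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))

# Input images and the predicted probability of each class
x = tf.placeholder("float", [None, 784])
y = tf.nn.softmax(tf.matmul(x,W) + b)

# One-hot labels, cross-entropy error, and a gradient descent step with learning rate 0.01
y_ = tf.placeholder("float", [None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# Start a session and initialize W and b
sess = tf.Session()
sess.run(tf.global_variables_initializer())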



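# Build the evaluation ops once, outside the loop, so the graph does not grow on every step
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

for i in range(1000):
    # Train on a mini-batch of 100 images
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # Print the accuracy on the test set every 100 steps
    if i % 100 == 0:
        print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))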
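
The accuracy computed above compares the index of the highest predicted score (`tf.argmax(y,1)`) with the index of the 1 in the one-hot label, then averages the matches. A plain Python sketch of the same calculation, using made-up predictions over three classes instead of real model output:

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, one_hot_labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    hits = [argmax(p) == argmax(t) for p, t in zip(predictions, one_hot_labels)]
    return sum(hits) / len(hits)


# Hypothetical predictions: the first two are right, the last one is wrong.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
acc = accuracy(preds, labels)
```

Here two of the three predictions are correct, so `acc` is two thirds.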

Training took a little over 10 minutes.

The results come out as follows.
