Convolutional Neural Network in TensorFlow

Translated from Build a Convolutional Neural Network using Estimators

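TensorFlow's layers module provides a high-level API that makes it easy to construct a neural network. It provides methods for creating dense (fully connected) layers and convolutional layers, adding activation functions, and applying dropout regularization. This tutorial shows how to use layers to build a convolutional neural network that recognizes the handwritten digits in the MNIST dataset.

The MNIST dataset comprises 60,000 training examples and 10,000 test examples of the handwritten digits 0-9, each of which is a 28x28-pixel image.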

Getting Started

First, set up the skeleton of the TensorFlow program. Create a file named cnn_mnist.py and add the following code:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# Imports
import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

# Our application logic will be added here

if __name__ == "__main__":
  tf.app.run()

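The rest of this tutorial walks through adding code to this file to build, train, and evaluate the convolutional neural network. The complete, final code can be downloaded here.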

Intro to Convolutional Neural Networks

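Convolutional neural networks (CNNs) are the dominant architecture for image classification tasks. CNNs apply a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model then uses for classification. A CNN contains three components:

Convolutional layers: apply a specified number of convolution filters to the image. For each subregion of the image, the layer performs a set of mathematical operations to produce a single value in the output feature map. Convolutional layers then typically apply a ReLU activation function to the output to introduce nonlinearity into the model.
Pooling layers: downsample the image data extracted by the convolutional layers, reducing the dimensionality of the feature map in order to decrease processing time. A widely used pooling algorithm is max pooling, which extracts subregions of the feature map (e.g., 2x2-pixel tiles) and keeps only their maximum value.
Dense (fully connected) layers: perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers. In a dense layer, every node is connected to every node in the preceding layer.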
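
As a rough illustration of the max pooling described above, the following pure-Python sketch downsamples a small feature map with a 2x2 window and a stride of 2. The helper name max_pool_2x2 and the sample values are made up for illustration; this is not how tf.layers implements pooling.

```python
def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block of a 2-D list."""
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    # Step by 2 in both directions so that the pooled regions never overlap.
    for row in range(0, height - 1, 2):
        pooled_row = []
        for col in range(0, width - 1, 2):
            block = [
                feature_map[row][col], feature_map[row][col + 1],
                feature_map[row + 1][col], feature_map[row + 1][col + 1],
            ]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 4, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 8]]
```

The 4x4 input becomes a 2x2 output: height and width are each halved, and only the strongest response in each tile survives.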

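Typically, a CNN is composed of a stack of convolutional modules that perform feature extraction. Each module consists of a convolutional layer followed by a pooling layer. The last convolutional module is followed by one or more dense layers that perform classification. The final dense layer contains one node for each target class in the model, with a softmax activation function that generates a value between 0 and 1 for each node (the values of all nodes sum to 1). The softmax values for a given image can be interpreted as relative measures of how likely it is that the image falls into each target class.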
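
To make the softmax step concrete, here is a minimal pure-Python sketch. The function name softmax and the three sample scores are illustrative assumptions; the tutorial itself relies on tf.nn.softmax.

```python
import math


def softmax(scores):
    """Turn raw scores into values in (0, 1) that sum to 1."""
    # Subtracting the maximum first avoids overflow in exp() for large scores.
    largest = max(scores)
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest score receives the largest share, and the shares always add up to 1, which is what allows them to be read as relative likelihoods.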

Note: The course materials for Stanford University's Convolutional Neural Networks for Visual Recognition give a more detailed introduction to CNN architecture.

Building the CNN MNIST Classifier

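Next, build a model to classify the images in the MNIST dataset using the following CNN architecture:

Convolutional Layer #1: Applies 32 5x5 filters (extracting 5x5-pixel subregions), with ReLU activation function
Pooling Layer #1: Performs max pooling with a 2x2 filter and stride of 2 (so that pooled regions do not overlap)
Convolutional Layer #2: Applies 64 5x5 filters, with ReLU activation function
Pooling Layer #2: Again, performs max pooling with a 2x2 filter and stride of 2
Dense Layer #1: 1,024 neurons, with dropout regularization rate of 0.4 (probability of 0.4 that any given element will be dropped during training)
Dense Layer #2 (Logits Layer): 10 neurons, one for each digit target class (0-9)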

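The tf.layers module contains methods to create each of the three layer types above:

conv2d(). Constructs a two-dimensional convolutional layer. Takes the number of filters, filter kernel size, padding, and activation function as arguments.
max_pooling2d(). Constructs a two-dimensional pooling layer using the max-pooling algorithm. Takes the pooling filter size and strides as arguments.
dense(). Constructs a dense layer. Takes the number of neurons and activation function as arguments.

Each of these methods accepts a tensor as input and returns a transformed tensor as output. This makes it easy to connect one layer to another: just take the output of one layer and supply it as input to the next.

Open cnn_mnist.py and add the following cnn_model_fn function, which conforms to the interface expected by TensorFlow's Estimator API. The function takes MNIST feature data, labels, and the model mode (TRAIN, EVAL, PREDICT) as arguments; configures the CNN; and returns predictions, the loss, and a training operation: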

def cnn_model_fn(features, labels, mode):
  """Model function for CNN."""
  # Input Layer
  input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

  # Convolutional Layer #1
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #1
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional Layer #2 and Pooling Layer #2
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Dense Layer
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
  dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

  # Logits Layer
  logits = tf.layers.dense(inputs=dropout, units=10)

  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Calculate Loss (for both TRAIN and EVAL modes)
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  # Configure the Training Op (for TRAIN mode)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # Add evaluation metrics (for EVAL mode)
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

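The following sections dive deeper into the tf.layers code used to create each layer, as well as how to calculate the loss, configure the training op, and generate predictions. If you are already familiar with CNNs, you can skip ahead to the Training and Evaluating the CNN MNIST Classifier section.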

Input Layer

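The methods in the layers module for creating convolutional and pooling layers for two-dimensional image data expect input tensors to have a shape of [batch_size, image_height, image_width, channels] by default. This behavior can be changed using the data_format parameter. The terms are defined as follows:

batch_size. Size of the subset of examples to use when performing gradient descent during training.
image_height. Height of the example images.
image_width. Width of the example images.
channels. Number of color channels in the example images. For color images, the number of channels is 3 (red, green, blue). For monochrome images, there is just 1 channel (black).
data_format. A string, one of channels_last (default) or channels_first. channels_last corresponds to inputs with shape (batch, ..., channels), while channels_first corresponds to inputs with shape (batch, channels, ...).

Here, the MNIST dataset is composed of monochrome 28x28-pixel images, so the desired shape for the input layer is [batch_size, 28, 28, 1]. To convert the input feature map to this shape, perform the following reshape operation: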

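input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

Note that the batch size is specified as -1, which means this dimension is computed dynamically from the number of input values in features["x"], while the sizes of all other dimensions are held constant. This allows batch_size to be treated as a tunable hyperparameter. For example, if examples are fed into the model in batches of 5, features["x"] will contain 3,920 values (one value for each pixel in each image), and input_layer will have a shape of [5, 28, 28, 1]. Similarly, if examples are fed in batches of 100, features["x"] will contain 78,400 values, and input_layer will have a shape of [100, 28, 28, 1].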
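
The arithmetic behind the -1 dimension can be checked with a few lines of plain Python. The helper infer_batch_size is a hypothetical name used only for this sketch; tf.reshape performs the equivalent inference internally.

```python
def infer_batch_size(num_values, height=28, width=28, channels=1):
    """Return the size of the -1 dimension for a [-1, height, width, channels] reshape."""
    values_per_example = height * width * channels  # 784 for MNIST
    if num_values % values_per_example != 0:
        raise ValueError("input size is not a multiple of one example's size")
    return num_values // values_per_example


print(infer_batch_size(3920))   # 5
print(infer_batch_size(78400))  # 100
```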

Convolutional Layer #1

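The first convolutional layer applies 32 5x5 filters to the input layer, with a ReLU activation function. The conv2d() method in the layers module creates this layer as follows: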

conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

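The inputs argument specifies the input tensor, which must have the shape [batch_size, image_height, image_width, channels]. Here, the first convolutional layer is connected to input_layer, which has the shape [batch_size, 28, 28, 1].
Note: conv2d() instead expects a shape of [batch_size, channels, image_height, image_width] when passed the argument data_format=channels_first.

The filters argument specifies the number of filters to apply, and kernel_size specifies the dimensions of the filters as [height, width]. If the filter height and width have the same value, a single integer can be given for kernel_size instead, e.g., kernel_size=5.

The padding argument takes one of two enumerated values (case-insensitive): valid (the default) or same. To make the output tensor have the same height and width as the input tensor, set padding=same, which instructs TensorFlow to add 0 values to the edges of the input tensor so that the output height and width remain 28. (Without padding, a 5x5 convolution over a 28x28 tensor produces a 24x24 tensor, because there are only 24x24 locations at which a 5x5 tile can be extracted from a 28x28 grid.)

The activation argument specifies the activation function to apply to the output of the convolution. Here, ReLU activation is specified with tf.nn.relu.

The output tensor produced by conv2d() has a shape of [batch_size, 28, 28, 32]: the same height and width dimensions as the input, but now with 32 channels holding the output from each of the 32 filters.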
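
The effect of the two padding modes on the output size can be sketched with the usual size formulas. The helper conv_output_size is a hypothetical name for this example, and the formulas are the standard ones for "same" and "valid" padding rather than code taken from TensorFlow.

```python
import math


def conv_output_size(input_size, kernel_size, padding, stride=1):
    """Output height (or width) of a convolution along one dimension."""
    padding = padding.lower()  # the padding value is case-insensitive
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # Only positions where the whole kernel fits inside the input count.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


print(conv_output_size(28, 5, "same"))   # 28
print(conv_output_size(28, 5, "valid"))  # 24
```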

Pooling Layer #1

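Next, connect the first pooling layer to the convolutional layer just created. The max_pooling2d() method in layers constructs a layer that performs max pooling with a 2x2 filter and stride of 2:

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

Again, inputs specifies the input tensor, with a shape of [batch_size, image_height, image_width, channels]. Here, the input tensor is conv1, the output of the first convolutional layer, which has a shape of [batch_size, 28, 28, 32].

Note: As with conv2d(), max_pooling2d() instead expects a shape of [batch_size, channels, image_height, image_width] when passed the argument data_format=channels_first.

The pool_size argument specifies the size of the max pooling filter as [height, width]. If both dimensions have the same value, a single integer can be given instead, e.g., pool_size=2.

The strides argument specifies the size of the stride. Here, a stride of 2 means that the subregions extracted by the filter are separated by 2 pixels in both the height and width dimensions (for a 2x2 filter, this means that none of the extracted regions overlap). To set different stride values for height and width, specify a tuple or list instead, e.g., strides=[3, 6].

The output tensor produced by max_pooling2d() (pool1) has a shape of [batch_size, 14, 14, 32]: the 2x2 filter reduces height and width by 50% each.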

Convolutional Layer #2 and Pooling Layer #2

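As before, a second convolutional layer and pooling layer can be connected to the CNN using conv2d() and max_pooling2d(). The second convolutional layer uses 64 5x5 filters with ReLU activation, and the second pooling layer uses the same configuration as the first (2x2 max pooling with a stride of 2):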

conv2 = tf.layers.conv2d(
    inputs=pool1,
    filters=64,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

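Note that the second convolutional layer takes the output tensor of the first pooling layer (pool1) as input and produces the tensor conv2 as output. conv2 has a shape of [batch_size, 14, 14, 64]: the same height and width as pool1 (due to padding="same"), with 64 channels for the 64 filters applied.
The second pooling layer takes conv2 as input and produces pool2 as output. pool2 has a shape of [batch_size, 7, 7, 64].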
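
The shapes quoted so far can be traced end to end without TensorFlow. In this sketch the helpers conv_same_shape and max_pool_shape are hypothetical names, and the batch dimension is left out because it passes through every layer unchanged.

```python
def conv_same_shape(shape, filters):
    """Shape after a stride-1 convolution with "same" padding."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_shape(shape, pool_size=2, strides=2):
    """Shape after max pooling without padding."""
    height, width, channels = shape
    out_height = (height - pool_size) // strides + 1
    out_width = (width - pool_size) // strides + 1
    return (out_height, out_width, channels)


# Per-example shapes as (height, width, channels).
shapes = {"input_layer": (28, 28, 1)}
shapes["conv1"] = conv_same_shape(shapes["input_layer"], filters=32)
shapes["pool1"] = max_pool_shape(shapes["conv1"])
shapes["conv2"] = conv_same_shape(shapes["pool1"], filters=64)
shapes["pool2"] = max_pool_shape(shapes["conv2"])

for name, shape in shapes.items():
    print(name, ("batch_size",) + shape)
```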

Dense Layer

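Next, add a dense layer (with 1,024 neurons and ReLU activation) to the CNN to perform classification on the features extracted by the convolutional and pooling layers. Before connecting the layer, the feature map (pool2) must be flattened to the shape [batch_size, features], so that the tensor has only two dimensions:

pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])

In the reshape() operation above, the -1 signifies that the batch_size dimension is computed dynamically from the number of examples in the input data. Each example has 7 (pool2 height) * 7 (pool2 width) * 64 (pool2 channels) features, so the features dimension has a value of 7 * 7 * 64 (3,136 in total). The output tensor, pool2_flat, has a shape of [batch_size, 3136].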
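
To see what flattening does to a single example, the sketch below builds a hypothetical 7x7x64 feature map out of nested Python lists and flattens it in row-major order, which is the order a reshape preserves. The names feature_map and flat_index are assumptions made for this illustration.

```python
height, width, channels = 7, 7, 64

# Hypothetical feature map for one example; each entry records its own position.
feature_map = [
    [[(row, col, ch) for ch in range(channels)] for col in range(width)]
    for row in range(height)
]

# Row-major flattening: channels vary fastest, then columns, then rows.
flat = [value for row in feature_map for pixel in row for value in pixel]


def flat_index(row, col, ch):
    """Position of feature_map[row][col][ch] inside the flattened list."""
    return (row * width + col) * channels + ch


print(len(flat))  # 3136
print(flat[flat_index(3, 5, 10)])  # (3, 5, 10)
```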

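Now the dense() method in layers can be used to connect the dense layer:

dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

The inputs argument specifies the input tensor: the flattened feature map, pool2_flat. The units argument specifies the number of neurons in the dense layer. The activation argument specifies the activation function; again, tf.nn.relu adds ReLU activation.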

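To help improve the results of the model, dropout regularization is applied to the dense layer using the dropout method in layers:

dropout = tf.layers.dropout(
    inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

Again, inputs specifies the input tensor, which is the output tensor of the dense layer. The rate argument specifies the dropout rate; here, 0.4 means that each element is dropped with probability 0.4 during training (roughly 40% of the elements). The training argument takes a boolean specifying whether the model is currently being run in training mode; dropout is performed only if training is True. Here, the code checks whether the mode passed to the model function cnn_model_fn is TRAIN.

The output tensor dropout has a shape of [batch_size, 1024].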
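
The behaviour of dropout in the two modes can be sketched in plain Python. This is a simplified stand-in and not the tf.layers implementation: the function name dropout_sketch, the explicit rng argument, and the all-ones activations are assumptions. Kept elements are scaled by 1 / (1 - rate) so that the expected sum stays the same between training and inference.

```python
import random


def dropout_sketch(values, rate, training, rng):
    """Zero each element with probability `rate` while training; pass through otherwise."""
    if not training:
        return list(values)
    scale = 1.0 / (1.0 - rate)  # rescale survivors to keep the expected sum
    return [0.0 if rng.random() < rate else value * scale for value in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000  # hypothetical dense-layer outputs

train_out = dropout_sketch(activations, rate=0.4, training=True, rng=rng)
eval_out = dropout_sketch(activations, rate=0.4, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
print(dropped_fraction)         # close to 0.4
print(eval_out == activations)  # True: nothing is dropped outside training
```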

Logits Layer

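The final layer in the neural network is the logits layer, which returns the raw values for the predictions. Create a dense layer with 10 neurons (one for each target class 0-9), with linear activation (the default):

logits = tf.layers.dense(inputs=dropout, units=10)

The final output tensor of the CNN, logits, has a shape of [batch_size, 10].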

Generate Predictions

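The logits layer of the model returns the predictions as raw values in a [batch_size, 10]-dimensional tensor. Next, convert these raw values into two different formats that the model function can return:

The predicted class for each example: a digit from 0-9.
The probabilities for each possible target class for each example: the probability that the example is a 0, is a 1, and so on.

For a given example, the predicted class is the element in the corresponding row of the logits tensor with the highest raw value. The index of this element can be found with the tf.argmax function:

tf.argmax(input=logits, axis=1)

The input argument specifies the tensor from which to extract maximum values, here logits. The axis argument specifies the axis of the input tensor along which to find the greatest value. Here, the largest value is sought along the dimension with index 1, which corresponds to the predictions (recall that the logits tensor has shape [batch_size, 10]).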
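
As a small illustration of taking the argmax along axis 1, the sketch below picks the index of the largest value in each row of a made-up batch of two logits rows. The helper argmax_per_row and the numbers are hypothetical.

```python
def argmax_per_row(rows):
    """Return, for each row, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Hypothetical logits for a batch of 2 examples and 10 classes.
logits = [
    [0.1, 0.3, -1.2, 2.4, 0.0, 0.5, -0.7, 1.1, 0.2, -0.3],
    [1.9, -0.4, 0.3, 0.2, -2.0, 0.1, 0.6, 0.0, 2.2, 0.4],
]
classes = argmax_per_row(logits)
print(classes)  # [3, 8]
```

Each row yields one class index, so a [batch_size, 10] input produces batch_size predictions.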

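The probabilities can be derived from the logits layer by applying softmax activation with tf.nn.softmax:

tf.nn.softmax(logits, name="softmax_tensor")

Note: The name argument explicitly names this operation softmax_tensor so that it can be referenced later. (Logging for the softmax values is set up in the Set Up a Logging Hook section.)

The predictions are compiled in a dict and returned in an EstimatorSpec object: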

predictions = {
    "classes": tf.argmax(input=logits, axis=1),
    "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
}
if mode == tf.estimator.ModeKeys.PREDICT:
  return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

Calculate Loss

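For both training and evaluation, a loss function is needed that measures how closely the model's predictions match the target classes. For multiclass classification problems like MNIST, cross entropy is typically used as the loss metric. The following code calculates cross entropy when the model runs in either TRAIN or EVAL mode:

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

The labels tensor contains a list of class indices for the examples, e.g. [1, 9, …]. logits contains the linear outputs of the last layer. tf.losses.sparse_softmax_cross_entropy calculates the softmax cross entropy (also known as categorical cross entropy or negative log-likelihood) from these two inputs in an efficient, numerically stable way.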
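
One common way to compute this loss stably is the log-sum-exp trick, sketched below in plain Python. The function name softmax_cross_entropy_sketch is hypothetical, and averaging over the batch is a choice made for this sketch; the internals of the TensorFlow function are not shown in this tutorial.

```python
import math


def softmax_cross_entropy_sketch(labels, logits):
    """Mean negative log-probability of the correct class, one integer label per row."""
    losses = []
    for label, row in zip(labels, logits):
        # log(sum(exp(row))) computed after shifting by the row maximum,
        # so that exp() never sees a large positive argument.
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        losses.append(log_sum_exp - row[label])
    return sum(losses) / len(losses)


# Equal logits for all 10 classes: the model is maximally unsure.
uniform_loss = softmax_cross_entropy_sketch([3], [[0.0] * 10])
print(uniform_loss)

# Very large logits do not overflow.
confident_loss = softmax_cross_entropy_sketch([0], [[1000.0, 0.0]])
print(confident_loss)
```

With equal logits for all 10 classes the loss is ln(10), about 2.30, which is close to the loss logged at step 1 in the sample output in the Run the Model section.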

Configure the Training Op

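In the previous section, the loss for the CNN was defined as the softmax cross-entropy of the logits layer and the known labels. Next, configure the model to optimize this loss value during training, using a learning rate of 0.001 and stochastic gradient descent as the optimization algorithm: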

if mode == tf.estimator.ModeKeys.TRAIN:
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
  train_op = optimizer.minimize(
      loss=loss,
      global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

Note: For a more in-depth look at configuring training ops for Estimator model functions, see Defining the training op for the model.
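
The update that a gradient descent optimizer applies at each step can be sketched on a toy problem. Everything here is an assumption made for illustration: the helper sgd_step, the quadratic toy loss, and the two starting weights have nothing to do with the CNN's actual parameters.

```python
def sgd_step(weights, gradients, learning_rate=0.001):
    """Move each weight a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


# Toy loss L(w) = sum(w_i ** 2), whose gradient with respect to w_i is 2 * w_i.
weights = [1.0, -2.0]
global_step = 0
for _ in range(3):
    gradients = [2.0 * w for w in weights]
    weights = sgd_step(weights, gradients)
    global_step += 1  # the training op also advances the global step

print(weights)
print(global_step)  # 3
```

With a learning rate of 0.001 each step shrinks these toy weights by a factor of 0.998, which gives a feel for why many steps are needed.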

Add evaluation metrics

To add an accuracy metric to the model, define the eval_metric_ops dict in EVAL mode as follows:

eval_metric_ops = {
    "accuracy": tf.metrics.accuracy(
        labels=labels, predictions=predictions["classes"])}
return tf.estimator.EstimatorSpec(
    mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
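
Conceptually, the accuracy metric is the fraction of examples whose predicted class equals the label. A minimal sketch, with a hypothetical helper accuracy and made-up labels:

```python
def accuracy(labels, predicted_classes):
    """Fraction of positions where the prediction matches the label."""
    correct = sum(1 for label, pred in zip(labels, predicted_classes) if label == pred)
    return correct / len(labels)


# Hypothetical labels and predictions for five test images.
labels = [7, 2, 1, 0, 4]
predicted = [7, 2, 1, 0, 9]
score = accuracy(labels, predicted)
print(score)  # 0.8
```

tf.metrics.accuracy computes the same quantity but accumulates it across batches, which is why it is registered in eval_metric_ops rather than computed directly.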

Training and Evaluating the CNN MNIST Classifier

The MNIST CNN model function is now complete; the next step is to train and evaluate the model.

Load Training and Test Data

First, load the training and test data. Add a main() function to cnn_mnist.py with the following code:

def main(unused_argv):
  # Load training and eval data
  mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

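The training feature data (the raw pixel values for 55,000 images of hand-drawn digits) and training labels (the corresponding value from 0-9 for each image) are stored as numpy arrays in train_data and train_labels, respectively. The count is 55,000 rather than 60,000 because this loader holds out 5,000 of the training examples as a validation set. Similarly, the evaluation feature data (10,000 images) and evaluation labels are stored in eval_data and eval_labels, respectively.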

Create the Estimator

Next, create an Estimator (a TensorFlow class for performing high-level model training, evaluation, and inference) for the model. Add the following code to main():

# Create the Estimator
mnist_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

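The model_fn argument specifies the model function to use for training, evaluation, and prediction; here it is the cnn_model_fn created earlier. The model_dir argument specifies the directory where model data (checkpoints) will be saved.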

Set Up a Logging Hook

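Because CNNs can take a while to train, set up some logging to track progress during training. TensorFlow's tf.train.SessionRunHook can be used to create a tf.train.LoggingTensorHook that logs the probability values from the softmax layer of the CNN. Add the following to main():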

# Set up logging for predictions
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(
    tensors=tensors_to_log, every_n_iter=50)

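tensors_to_log stores a dict of the tensors to log. Each key is a label that will be printed in the log output, and the corresponding value is the name of a tensor in the TensorFlow graph. Here, the probabilities can be found in softmax_tensor, the name given to the softmax operation earlier when the probabilities were generated in cnn_model_fn.

Note: If a name is not explicitly assigned to an operation via the name argument, TensorFlow assigns a default name. Two easy ways to discover the names applied to operations are to visualize the graph in TensorBoard or to enable the TensorFlow Debugger (tfdbg).

The LoggingTensorHook is then created by passing tensors_to_log to the tensors argument. Setting every_n_iter=50 specifies that probabilities are logged after every 50 steps of training.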

Train the Model

Now the model is ready to train, which is done by creating train_input_fn and calling train() on mnist_classifier. Add the following to main():

# Train the model
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)
mnist_classifier.train(
    input_fn=train_input_fn,
    steps=20000,
    hooks=[logging_hook])

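In the numpy_input_fn call, the training feature data and labels are passed to x (as a dict) and y, respectively. A batch_size of 100 means that the model trains on minibatches of 100 examples at each step. num_epochs=None means that the model trains until the specified number of steps is reached. shuffle=True shuffles the training data.
In the train call, steps=20000 means the model trains for 20,000 steps in total. The logging_hook is passed to the hooks argument so that it is triggered during training.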
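
To get a feel for how much data 20,000 steps covers, the arithmetic below relates steps, batch size, and passes over the training set. It uses the 55,000 training examples mentioned in the Load Training and Test Data section; the variable names are chosen for this sketch only.

```python
train_examples = 55000  # size of train_data in this tutorial
batch_size = 100
steps = 20000

steps_per_epoch = train_examples // batch_size  # steps for one full pass
examples_seen = steps * batch_size              # examples processed in total
epochs = examples_seen / train_examples         # passes over the training set

print(steps_per_epoch)   # 550
print(examples_seen)     # 2000000
print(round(epochs, 2))  # 36.36
```

Because num_epochs=None lets the input function cycle through the data indefinitely, the steps argument alone decides when training stops.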

Evaluate the Model

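Once training is complete, evaluate the model to determine its accuracy on the MNIST test set. The evaluate method evaluates the metrics specified in the eval_metric_ops argument in the model_fn. Add the following to main():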

# Evaluate the model and print results
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)

To create eval_input_fn, set num_epochs=1 so that the model evaluates the metrics over one epoch of data and returns the result. shuffle=False iterates through the data sequentially.

Run the Model

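The CNN model function, Estimator, and training/evaluation logic are now all coded; next, see the results. Run cnn_mnist.py.

Note: Training CNNs is quite computationally intensive. The estimated completion time of cnn_mnist.py varies depending on the processor, but will likely be upwards of 1 hour on a CPU. To train more quickly, decrease the number of steps passed to train(), but note that this will affect accuracy.

As the model trains, log output like the following appears: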

INFO:tensorflow:loss = 2.36026, step = 1
INFO:tensorflow:probabilities = [[ 0.07722801  0.08618255  0.09256398, ...]]
...
INFO:tensorflow:loss = 2.13119, step = 101
INFO:tensorflow:global_step/sec: 5.44132
...
INFO:tensorflow:Loss for final step: 0.553216.

INFO:tensorflow:Restored model from /tmp/mnist_convnet_model
INFO:tensorflow:Eval steps [0,inf) for training step 20000.
INFO:tensorflow:Input iterator is exhausted.
INFO:tensorflow:Saving evaluation summary for step 20000: accuracy = 0.9733, loss = 0.0902271
{'loss': 0.090227105, 'global_step': 20000, 'accuracy': 0.97329998}

Here, the model achieves an accuracy of 97.3% on the test set.

Additional Resources

To learn more about TensorFlow Estimators and CNNs in TensorFlow, see the following resources:

Creating Estimators in tf.estimator provides an introduction to the TensorFlow Estimator API. It walks through configuring an Estimator, writing a model function, calculating loss, and defining a training op.
Advanced Convolutional Neural Networks walks through how to build a MNIST CNN classification model without estimators using lower-level TensorFlow operations.

Last updated: July 19, 2018

Reference

Build a Convolutional Neural Network using Estimators