
Tutorial for Training LeNet on MNIST with Caffe

LeNet

The design of LeNet contains the essence of CNNs that are still used in larger models such as the ones in ImageNet. In general, it consists of a convolutional layer followed by a pooling layer, another convolution layer followed by a pooling layer, and then two fully connected layers similar to the conventional multilayer perceptrons.

The classic LeNet structure:

input -> conv1(20) -> pool1 -> conv2(50) -> pool2 -> ip1(500, ReLU) -> ip2(10, softmax) -> output

lenet_train_test.prototxt

  • The batch size is set in net.prototxt rather than in solver.prototxt, so the blob dimensions are fully specified there.
  • bottom: a layer's input blob; top: a layer's output blob (see the sketch after this list).
  • Pixels in the 0-255 range are scaled down to 0-1, so scale = 1/256 = 0.00390625.
  • lr_mult: 1 means the weights are updated with 1x the base learning rate;
  • lr_mult: 2 means the biases are updated with 2x the base learning rate (this usually leads to better convergence rates).
  • InnerProduct outputs z by default, not a = sigmoid(z); no activation is applied.
  • ReLU is an in-place operation, so its input and output blob are both ip1; for most other layers, the input and output blobs must not be the same.
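
A quick way to verify these blob names and shapes is to load the net with pycaffe and walk over its blobs and parameters; a minimal sketch (run from $CAFFE_ROOT so the LMDB paths in the prototxt resolve):

```python
import caffe

caffe.set_mode_cpu()

# load the train/test definition in TEST phase (reads examples/mnist/mnist_test_lmdb)
net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TEST)
net.forward()  # push one mini-batch through the net

# top blobs: name -> (batch, channels, height, width)
for name, blob in net.blobs.items():
    print(name, blob.data.shape)

# learnable parameters: [weights, biases] per layer
for name, params in net.params.items():
    print(name, [p.data.shape for p in params])
```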

Input Layer types

Input Layer types for train_val.prototxt

```python
solver.net.forward()  # load mini-batch images from the training data
```

**Data**: read from an LMDB/LevelDB database.
```prototxt
name: "mnist"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}


name: "CaffeNet"
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 227
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}
# mean pixel / channel-wise mean instead of mean image
# transform_param {
# crop_size: 227
# mean_value: 104
# mean_value: 117
# mean_value: 123
# mirror: true
# }
data_param {
source: "examples/imagenet/ilsvrc12_train_lmdb"
batch_size: 256
backend: LMDB
}
}
```
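
The cropping, mirroring, mean subtraction and scaling above are expressed in transform_param and applied by the Data layer itself. When feeding raw images through pycaffe, roughly the same preprocessing can be reproduced with caffe.io.Transformer; a sketch, assuming the binaryproto mean has already been converted to a .npy file (that path is hypothetical):

```python
import numpy as np
import caffe

# per-channel mean computed from the (hypothetical) converted mean file
mu = np.load('data/ilsvrc12/ilsvrc12_mean.npy').mean(1).mean(1)

# target shape matches the (batch, 3, 227, 227) input the layer above produces
transformer = caffe.io.Transformer({'data': (1, 3, 227, 227)})
transformer.set_transpose('data', (2, 0, 1))     # HxWxC -> CxHxW
transformer.set_mean('data', mu)                 # subtract the channel means
transformer.set_raw_scale('data', 255)           # caffe.io loads images in [0, 1]
transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR, as stored in the LMDB

img = caffe.io.load_image('examples/images/cat.jpg')
blob = transformer.preprocess('data', img)       # ready to copy into net.blobs['data']
```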

**ImageData**: read raw images.

```prototxt
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 227
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}
image_data_param {
source: "data/flickr_style/train.txt"
batch_size: 50
new_height: 256
new_width: 256
}
}
```
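
The source file here is a plain text listing with one image per line: a path (relative to image_data_param's root_folder, if set) followed by an integer label. Something like this (paths and labels are illustrative):

```
images/0001.jpg 0
images/0002.jpg 3
images/0003.jpg 14
```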

Input Layer types for deploy.prototxt

**DummyData**: no labels; used only to run forward passes and get the output probabilities.

```prototxt
layer {
name: "data"
type: "DummyData"
top: "data"
dummy_data_param {
shape {
dim: 1
dim: 3
dim: 227
dim: 227
}
}
}
```

**Input**: typically used for networks that are being deployed.

```prototxt
layer {
name: "data"
type: "Input"
top: "data"
input_param {
shape {
dim: 10
dim: 3
dim: 227
dim: 227
}
}
}
```
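
The shape given in input_param is only the default; with pycaffe the input blob can be reshaped (for example to a single image) before running the net. A sketch, assuming a deploy definition that starts with the Input layer above plus trained weights (both file names are placeholders):

```python
import caffe

net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

# switch from the default batch of 10 to a single image
net.blobs['data'].reshape(1, 3, 227, 227)
net.reshape()  # propagate the new shape through the rest of the net

out = net.forward()  # dict: output blob name -> numpy array
```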

LeNet train_val.prototxt

```prototxt
name: "LeNet"
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_train_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "mnist"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
scale: 0.00390625
}
data_param {
source: "examples/mnist/mnist_test_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
```
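
With 28x28 MNIST inputs, the kernel sizes and strides above give the following blob shapes (conv output side = (input - kernel)/stride + 1, and each 2x2/stride-2 pooling halves the sides):

```
data    1x28x28
conv1   20x24x24   # 28 - 5 + 1 = 24
pool1   20x12x12   # 24 / 2 = 12
conv2   50x8x8     # 12 - 5 + 1 = 8
pool2   50x4x4     # 8 / 2 = 4
ip1     500
ip2     10
```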

LeNet solver.prototxt

```prototxt
# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"

# batch_size is defined in net.prototxt: train mini-batch size = 64, test mini-batch size = 100

# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100 # test_iter = num_test_images/test_mini_batch_size = 10000/100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000 # ~10.7 epochs: max_iter x train_batch_size / num_train_images = 10000 x 64 / 60000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
```
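
The same solver definition can be driven from Python instead of the caffe binary; a minimal sketch with caffe.SGDSolver (run from $CAFFE_ROOT so the relative paths in the prototxt resolve):

```python
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()

solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')

solver.step(100)  # 100 iterations of forward + backward + parameter update
# solver.solve()  # or run the full max_iter schedule with testing and snapshots

print(solver.net.blobs['loss'].data)  # training loss of the last mini-batch
```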

learning rate policy (todo…)

This is the same policy as our default LeNet.

```python
s.lr_policy = 'inv'
s.gamma = 0.0001
s.power = 0.75
```
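
Under the 'inv' policy the effective learning rate at iteration t is base_lr * (1 + gamma * t) ^ (-power). A few values for the settings used here, just to make the decay concrete:

```python
base_lr, gamma, power = 0.01, 0.0001, 0.75

def inv_lr(t):
    # learning rate schedule under Caffe's 'inv' policy
    return base_lr * (1 + gamma * t) ** (-power)

print(inv_lr(0))      # 0.01
print(inv_lr(5000))   # ~0.0074
print(inv_lr(10000))  # ~0.0059
```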

EDIT HERE to try the fixed rate (and compare with adaptive solvers).
fixed is the simplest policy: it keeps the learning rate constant.

```python
s.lr_policy = 'fixed'
```

Set lr_policy to define how the learning rate changes during training.

```python
# Here, we 'step' the learning rate by multiplying it by a factor `gamma`
# every `stepsize` iterations.
s.lr_policy = 'step'
s.gamma = 0.1
s.stepsize = 20000
```

solver types (todo…)

Solver types include "SGD", "Adam", and "Nesterov", among others.

```python
s.type = "SGD"
```

Train LeNet

```sh
cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh
```

train_lenet.sh simply invokes the caffe binary with the solver definition:

```sh
#!/usr/bin/env sh
set -e

./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@
```
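
Once training finishes, the snapshotted weights can be scored on the test set with the same binary; a sketch, assuming the snapshot name that the snapshot_prefix and max_iter above would produce:

```sh
# 100 iterations x test batch size 100 covers the 10,000 test images
./build/tools/caffe test \
    --model=examples/mnist/lenet_train_test.prototxt \
    --weights=examples/mnist/lenet_iter_10000.caffemodel \
    --iterations=100
```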

train output

```
I0807 16:15:29.555564  4273 solver.cpp:310] Iteration 10000, loss = 0.00251452
I0807 16:15:29.555619  4273 solver.cpp:330] Iteration 10000, Testing net (#0)
I0807 16:15:29.634243  4281 data_layer.cpp:73] Restarting data prefetching from start.
I0807 16:15:29.635372  4273 solver.cpp:397]     Test net output #0: accuracy = 0.9909
I0807 16:15:29.635409  4273 solver.cpp:397]     Test net output #1: loss = 0.0302912 (* 1 = 0.0302912 loss)
I0807 16:15:29.635416  4273 solver.cpp:315] Optimization Done.
I0807 16:15:29.635439  4273 caffe.cpp:259] Optimization Done.
```

Deploy model

  • for training: train_test.prototxt + solver.prototxt
  • for deployment: deploy.prototxt + model.caffemodel

Deploy: deploy.prototxt contains no weight_filler or bias_filler; the weights are loaded from the .caffemodel file instead. If no weights file is supplied, the weights and biases default to zeros.
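
Putting the two files together, a deployment forward pass in pycaffe looks roughly like this (both file names are placeholders, and the output blob name depends on the deploy definition):

```python
import numpy as np
import caffe

caffe.set_mode_cpu()

net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

# placeholder input; in practice copy in a preprocessed image
# (see the Transformer sketch in the Data layer section above)
net.blobs['data'].data[0] = np.zeros(net.blobs['data'].data.shape[1:], dtype=np.float32)

out = net.forward()                # dict: output blob name -> numpy array
scores = list(out.values())[0][0]  # class scores / probabilities for the first image
print(scores.argmax())             # predicted class
```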

PyCaffe

pycaffe interfaces

The Python interface – pycaffe – is the caffe module and its scripts in caffe/python. import caffe to load models, do forward and backward, handle IO, visualize networks, and even instrument model solving. All model data, derivatives, and parameters are exposed for reading and writing.

  • caffe.Net is the central interface for loading, configuring, and running models.
  • caffe.Classifier and caffe.Detector provide convenience interfaces for common tasks.
  • caffe.SGDSolver exposes the solving interface.
  • caffe.io handles input / output with preprocessing and protocol buffers.
  • caffe.draw visualizes network architectures (see the sketch after this list).
  • Caffe blobs are exposed as numpy ndarrays for ease-of-use and efficiency.
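
For example, caffe.draw can render the LeNet definition to an image; a sketch (requires pydot and graphviz to be installed):

```python
from google.protobuf import text_format

import caffe
import caffe.draw
from caffe.proto import caffe_pb2

# parse the net definition into a NetParameter protobuf
net_param = caffe_pb2.NetParameter()
with open('examples/mnist/lenet_train_test.prototxt') as f:
    text_format.Merge(f.read(), net_param)

# draw left-to-right and write a PNG
caffe.draw.draw_net_to_file(net_param, 'lenet.png', 'LR')
```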

Tutorial IPython notebooks are found in caffe/examples: run ipython notebook caffe/examples to try them. For developer reference, docstrings can be found throughout the code.

Compile pycaffe by make pycaffe. Add the module directory to your $PYTHONPATH by export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH or the like for import caffe.

Reference

History

  • 20180807: created.