The design of LeNet contains the essence of CNNs that is still used in larger models such as those trained on ImageNet. In general, it consists of a convolutional layer followed by a pooling layer, another convolutional layer followed by a pooling layer, and then two fully connected layers similar to a conventional multilayer perceptron.
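As a concrete illustration, here is a sketch of that topology written with pycaffe's NetSpec, following the layer sizes used in Caffe's MNIST example (the fillers and dimensions are that example's choices, not requirements of the architecture):

```python
import caffe
from caffe import layers as L, params as P

def lenet(lmdb, batch_size):
    # conv -> pool -> conv -> pool -> two fully connected layers, as described above.
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1. / 255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                            weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50,
                            weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()
```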
```prototxt
# The train/test net protocol buffer definition.
net: "examples/mnist/lenet_train_test.prototxt"
# batch_size is defined in net.prototxt:
# train mini-batch size = 64, test mini-batch size = 100.
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
# test_iter = num_test_images / test_mini_batch_size = 10000 / 100
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy.
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations.
display: 100
# The maximum number of iterations.
# epochs = max_iter * train_mini_batch_size / num_train_images = 10000 * 64 / 60000 ≈ 10.7
max_iter: 10000
# Snapshot intermediate results.
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# Solver mode: CPU or GPU.
solver_mode: GPU
```
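To make the iteration bookkeeping explicit, here is a quick sanity check of the numbers above in plain Python (assuming the standard MNIST split of 60,000 training and 10,000 test images):

```python
# Solver arithmetic for the MNIST LeNet example.
num_train, num_test = 60000, 10000
train_batch, test_batch = 64, 100

test_iter = num_test // test_batch         # 100 forward passes cover the test set
iters_per_epoch = num_train / train_batch  # ~937.5 training iterations per epoch
epochs = 10000 / iters_per_epoch           # max_iter = 10000 -> ~10.7 epochs

print(test_iter, iters_per_epoch, epochs)  # 100 937.5 10.666...
```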
Edit here to try the fixed learning rate (and compare with adaptive solvers). fixed is the simplest policy: it keeps the learning rate constant for the entire run.
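In the snippets below, s is a solver protobuf message; a minimal sketch of how it might be created, following the pattern in Caffe's pycaffe examples:

```python
from caffe.proto import caffe_pb2

# s collects solver settings and is later serialized to a .prototxt file.
s = caffe_pb2.SolverParameter()
```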
```python
s.lr_policy = 'fixed'
```
Set lr_policy to define how the learning rate changes during training.
```python
# Here, we 'step' the learning rate by multiplying it by a factor `gamma`
# every `stepsize` iterations.
s.lr_policy = 'step'
s.gamma = 0.1
s.stepsize = 20000
```
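For reference, this is how the built-in policies map an iteration number to a learning rate. The sketch below restates the formulas from Caffe's SGDSolver documentation in plain Python; it is not a pycaffe API call:

```python
def learning_rate(it, base_lr, policy, gamma=None, power=None, stepsize=None):
    """Learning rate at iteration `it` for a few built-in Caffe policies."""
    if policy == 'fixed':
        return base_lr
    if policy == 'step':
        return base_lr * gamma ** (it // stepsize)
    if policy == 'inv':
        return base_lr * (1.0 + gamma * it) ** (-power)
    raise ValueError('unhandled policy: %s' % policy)

# The 'inv' schedule from the solver definition above:
print(learning_rate(0, 0.01, 'inv', gamma=1e-4, power=0.75))      # 0.01
print(learning_rate(10000, 0.01, 'inv', gamma=1e-4, power=0.75))  # ~0.0059
```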
Solver types
Solver types include "SGD", "Adam", and "Nesterov"; Caffe also provides "AdaGrad", "AdaDelta", and "RMSProp".
```python
s.type = "SGD"
```
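Once the solver definition is written to disk, pycaffe can load and drive it directly; a minimal sketch, assuming the standard solver path from the example:

```python
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.get_solver('examples/mnist/lenet_solver.prototxt')
solver.step(1)  # one iteration: forward pass, backward pass, parameter update
solver.solve()  # or run the full optimization up to max_iter
```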
Train LeNet
```sh
cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh
```
train_lenet.sh simply invokes the caffe binary with the LeNet solver definition:

```sh
#!/usr/bin/env sh
set -e
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt $@
```
Train output
```
I0807 16:15:29.555564 4273 solver.cpp:310] Iteration 10000, loss = 0.00251452
I0807 16:15:29.555619 4273 solver.cpp:330] Iteration 10000, Testing net (#0)
I0807 16:15:29.634243 4281 data_layer.cpp:73] Restarting data prefetching from start.
I0807 16:15:29.635372 4273 solver.cpp:397] Test net output #0: accuracy = 0.9909
I0807 16:15:29.635409 4273 solver.cpp:397] Test net output #1: loss = 0.0302912 (* 1 = 0.0302912 loss)
I0807 16:15:29.635416 4273 solver.cpp:315] Optimization Done.
I0807 16:15:29.635439 4273 caffe.cpp:259] Optimization Done.
```
Deploy model
For training: train_test.prototxt + solver.prototxt.
For deployment: deploy.prototxt + model.caffemodel.
The deploy definition contains no weight_filler or bias_filler; weights and biases are loaded from the .caffemodel file instead. If no weights file is supplied, they default to all zeros.
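A minimal deployment sketch in pycaffe, assuming the MNIST deploy definition examples/mnist/lenet.prototxt and the final snapshot produced by the solver settings above:

```python
import numpy as np
import caffe

# Load the deploy net together with the trained weights.
net = caffe.Net('examples/mnist/lenet.prototxt',               # deploy definition
                'examples/mnist/lenet_iter_10000.caffemodel',  # trained weights
                caffe.TEST)

# Classify a single 28x28 grayscale image with values scaled to [0, 1].
net.blobs['data'].reshape(1, 1, 28, 28)
image = np.zeros((1, 1, 28, 28), dtype=np.float32)  # placeholder input
net.blobs['data'].data[...] = image
out = net.forward()
print('predicted digit:', out['prob'].argmax())
```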
The Python interface – pycaffe – is the caffe module and its scripts in caffe/python. Import caffe to load models, do forward and backward, handle IO, visualize networks, and even instrument model solving. All model data, derivatives, and parameters are exposed for reading and writing.
caffe.Net is the central interface for loading, configuring, and running models.
caffe.Classifier and caffe.Detector provide convenience interfaces for common tasks.
caffe.SGDSolver exposes the solving interface.
caffe.io handles input / output with preprocessing and protocol buffers.
caffe.draw visualizes network architectures.
Caffe blobs are exposed as numpy ndarrays for ease-of-use and efficiency.
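For example, with a net loaded as in the deployment sketch above, activations and parameters are ordinary numpy arrays:

```python
# Activation blobs, keyed by layer output name.
for name, blob in net.blobs.items():
    print(name, blob.data.shape)

# Learnable parameters: params[0] holds the weights, params[1] the biases.
for name, params in net.params.items():
    print(name, [p.data.shape for p in params])
```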
Tutorial IPython notebooks are found in caffe/examples: run ipython notebook caffe/examples to try them. For developer reference, docstrings can be found throughout the code.
Compile pycaffe by make pycaffe. Add the module directory to your $PYTHONPATH by export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH or the like so that import caffe works.
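Alternatively, the module path can be added from inside Python, which is convenient in notebooks (the path below is a placeholder for your own checkout):

```python
import sys
sys.path.insert(0, '/path/to/caffe/python')  # placeholder: your caffe/python directory

import caffe
```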