
compile baidu anakin on ubuntu 16.04

Guide

version

  • gcc 4.8.5/5.4.0
  • g++ 4.8.5/5.4.0
  • cmake 3.2.2
  • nvidia driver 396.54 + cuda 9.2 + cudnn 7.1.4
  • protobuf 3.4.0

install nvidia-docker2

see nvidia-docker2 guide on ubuntu 16.04

test

sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

build and run

build

git clone https://github.com/PaddlePaddle/Anakin.git anakin
cd anakin/docker
./anakin_docker_build_and_run.sh -p NVIDIA-GPU -o Ubuntu -m Build

An error occurs with cuDNN during the docker build, so skip this step.

run

./anakin_docker_build_and_run.sh  -p NVIDIA-GPU -o Ubuntu -m Run

compile anakin

sudo docker run -it --runtime=nvidia fdcda959f60a bin/bash
root@962077742ae9:/# cd Anakin/
git checkout developing

build

# 1. use script to build
./tools/gpu_build.sh

# 2. or you can build directly.
mkdir build
cd build
cmake ..
make -j8

x86 build

./tools/x86_build.sh

OK. no errors.

gpu build

./tools/gpu_build.sh

Build errors occur: cuDNN is not found.

compile anakin in host

install protobuf

install protobuf 3.4.0, see Part 1: compile protobuf-cpp on ubuntu 16.04

configure env

vim .bashrc

# cuda for anakin
export PATH=/usr/local/cuda/bin:$PATH

# CUDNN for anakin
export CUDNN_ROOT=/usr/local/cuda/
export LD_LIBRARY_PATH=${CUDNN_ROOT}/lib64:$LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=${CUDNN_ROOT}/include:$CPLUS_INCLUDE_PATH

source .bashrc

build anakin

x86 build

git checkout developing  
./tools/x86_build.sh

mv output x86_output

OK. no errors.

If an error occurs, clean up and rebuild:

rm -rf CMakeFiles
rm -rf anakin/framework/model_parser/proto/*.h
rm output

chown -R kezunlin:kezunlin anakin

gpu build

./tools/gpu_build.sh
mv output gpu_output

gpu build with cmake

cd anakin
mkdir build
cd build && cmake-gui ..

anakin overview

anakin

Forward inference with Anakin involves three main steps (a compact sketch follows the list):

  1. Parse the external model into an Anakin model with the Anakin Parser.
  2. Load the Anakin model to generate the original compute graph, then optimize that graph.
  3. Anakin executes the compute graph on the selected hardware platform.
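The sections below describe each step in detail. As a preview, here is a minimal sketch of steps 2 and 3 on an NVIDIA GPU (step 1 is done offline by the Anakin Parser; paths and names are placeholders):

auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
graph->load("path/to/model.anakin.bin");  // 2. load the Anakin model into a compute graph
graph->Optimize();                        //    then optimize the graph
Net<NV, AK_FLOAT, Precision::FP32> net(*graph);
net.prediction();                         // 3. execute the graph on the chosen platform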

Tensor

Tensor takes three template parameters:

 template<typename TargetType, DataType datatype, typename LayOutType = NCHW>
 class Tensor .../* Inherit other class */{
  //some implements
  ...
 };
  • TargetType is the platform type, such as X86 or GPU; Anakin has corresponding internal identifiers for each platform.
  • datatype is the ordinary data type, which also has corresponding identifiers inside Anakin.
  • LayOutType is the data layout type, such as batch x channel x height x width [NxCxHxW]; Anakin uses a struct to identify it. The mapping between Anakin data types and basic data types is as follows:

TargetType

Anakin TargetType | platform
:----: | :----:
NV | NVIDIA GPU
ARM | ARM
AMD | AMD GPU
X86 | X86
NVHX86 | NVIDIA GPU with Pinned Memory

DataType

Anakin DataType | C++ | Description
:----: | :----: | :----:
AK_HALF | short | fp16
AK_FLOAT | float | fp32
AK_DOUBLE | double | fp64
AK_INT8 | char | int8
AK_INT16 | short | int16
AK_INT32 | int | int32
AK_INT64 | long | int64
AK_UINT8 | unsigned char | uint8
AK_UINT16 | unsigned short | uint16
AK_UINT32 | unsigned int | uint32
AK_STRING | std::string | /
AK_BOOL | bool | /
AK_SHAPE | / | Anakin Shape
AK_TENSOR | / | Anakin Tensor

LayOutType

Anakin LayOutType (Tensor LayOut) | Tensor Dimension | Tensor Support | Op Support
:----: | :----: | :----: | :----:
W | 1-D | YES | NO
HW | 2-D | YES | NO
WH | 2-D | YES | NO
NW | 2-D | YES | YES
NHW | 3-D | YES | YES
NCHW (default) | 4-D | YES | YES
NHWC | 4-D | YES | NO
NCHW_C4 | 5-D | YES | YES

In theory, Anakin can declare tensors of one or more dimensions, but Anakin Ops only support the four layouts NW, NHW, NCHW and NCHW_C4. NCHW is the default LayOutType, and NCHW_C4 is dedicated to the int8 data type.
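For illustration, a minimal sketch (not from the original post) of declaring tensors with the template parameters above, assuming the Anakin headers are included:

Tensor<NV, AK_FLOAT, NCHW> d_tensor;  // device tensor on NVIDIA GPU, fp32, default NCHW layout
Tensor<X86, AK_FLOAT, NCHW> h_tensor; // host tensor on X86 with the same data type and layout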

Graph

The Graph class is responsible for loading an Anakin model to build the compute graph, optimizing the graph, and saving the model.

template<typename TargetType, DataType Dtype, Precision Ptype>
class Graph ... /* inherit other class*/{

  //some implements
  ...

};

load

//some declarations
...
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
std::string model_path = "the/path/to/where/your/models/are";
const char *model_path1 = "the/path/to/where/your/models/are";

//Loading Anakin model to generate a compute graph.
auto status = graph->load(model_path);

//Or this way.
auto status = graph->load(model_path1);
//Check whether load operation success.
if(!status){
  std::cout << "error" << std::endl;
  //do something...
}

optimize

//some declarations
...
//Load graph.
...
//According to the ops of loaded graph, optimize compute graph.
graph->Optimize();

save

//some declarations
...
//Load graph.
...
// save a model
//save_model_path: the path to where your model is.
auto status = graph->save(save_model_path);

//Checking
if(!status){
  std::cout << "error" << std::endl;
  //do somethin...
}

Net

Net is the executor of the compute graph; inputs and outputs are obtained through a Net object.

template<typename TargetType, DataType Dtype, Precision PType, OpRunType RunType = OpRunType::ASYNC>
class Net{
  //some implements
  ...

};
  • Precision specifies the precision of the Ops.
  • OpRunType indicates synchronous or asynchronous execution; asynchronous is the default. OpRunType::SYNC means synchronous, with a single stream on the GPU; OpRunType::ASYNC means asynchronous, with multiple streams executed asynchronously on the GPU.

Precision

Precision | Op support
:----: | :----:
Precision::INT4 | NO
Precision::INT8 | NO
Precision::FP16 | NO
Precision::FP32 | YES
Precision::FP64 | NO

Currently Ops only support FP32; the remaining precisions will be supported in the future.

OpRunType

OpRunType | Sync/Async | Description
:----: | :----: | :----:
OpRunType::SYNC | Synchronization | single-stream on GPU
OpRunType::ASYNC | Asynchronization | multi-stream on GPU
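For example, to force synchronous, single-stream execution instead of the default OpRunType::ASYNC, the fourth template argument of Net can be set explicitly; a minimal sketch based on the Net declaration above (graph is a pointer to a loaded Graph):

// synchronous executor: a single stream on the GPU
Net<NV, AK_FLOAT, Precision::FP32, OpRunType::SYNC> sync_executor(*graph);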

create an executor

//some declarations
...
//Create a pointer to a graph.
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
//do something...
...

//create an executor
Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);

get input tensor

//some declarations
...

//create an executor
//TargetType is NV [NVIDIA GPU]
Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);

//Get the first input tensor.
//The following tensors(tensor_in0, tensor_in2 ...) are resident at GPU.
//Note: Member function get_in returns a pointer to the tensor.
Tensor<NV, AK_FLOAT>* tensor_in0 = executor.get_in("input_0");

//If you have multiple input tensors
//You just type this code below.
Tensor<NV, AK_FLOAT>* tensor_in1 = executor.get_in("input_1");
...
auto tensor_inn = executor.get_in("input_n");

fill input tensor

//This tensor is resident at GPU.
auto tensor_d_in = executor.get_in("input_0");

//To feed the tensor above, first fill a tensor resident on the host, then copy that host tensor to the device tensor.

//using Tensor4d = Tensor<Ttype, Dtype>;
Tensor4d<X86, AK_FLOAT> tensor_h_in; //host tensor;
//Tensor<X86, AK_FLOAT> tensor_h_in; 

//Allocate memory for host tensor.
tensor_h_in.re_alloc(tensor_d_in->valid_shape());
//Get a writable pointer to tensor.
float *h_data = tensor_h_in.mutable_data();

//Feed your tensor.
/** example
for(int i = 0; i < tensor_h_in.size(); i++){
  h_data[i] = 1.0f;
}
*/
//Copy host tensor's data to device tensor.
tensor_d_in->copy_from(tensor_h_in);

// And then

get output tensor

//Note: this tensor is resident at GPU.
Tensor<NV, AK_FLOAT>* tensor_out_d = executor.get_out("pred_out");

execute graph

executor.prediction();
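After prediction, the output tensor obtained above still resides on the GPU. A minimal sketch (following the same copy pattern as the input side) of bringing it back to the host for reading:

//Copy the device output tensor to a host tensor before reading it.
Tensor4d<X86, AK_FLOAT> tensor_out_h;
tensor_out_h.re_alloc(tensor_out_d->valid_shape());
tensor_out_h.copy_from(*tensor_out_d);
const float* out_data = tensor_out_h.data();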

code example

std::string model_path = "your_Anakin_models/xxxxx.anakin.bin";
// Create an empty graph object.
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
// Load Anakin model.
auto status = graph->load(model_path);
if (!status) {
    LOG(FATAL) << " [ERROR] " << status.info();
}
// Reshape
graph->Reshape("input_0", {10, 384, 960, 10});
// You must optimize graph for the first time.
graph->Optimize();
// Create an executor.
Net<NV, AK_FLOAT, Precision::FP32> net_executer(*graph);

// Get your input tensors through some specific string such as "input_0", "input_1",
// and so on. Then feed the input tensor.
// If you don't know which input these specific strings ("input_0", "input_1")
// correspond to, you can launch the dash board to find out.
auto d_tensor_in_p = net_executer.get_in("input_0");
Tensor4d<X86, AK_FLOAT> h_tensor_in;
auto valid_shape_in = d_tensor_in_p->valid_shape();
for (int i = 0; i < valid_shape_in.size(); i++) {
    LOG(INFO) << "detect input dims[" << i << "]" << valid_shape_in[i]; // see tensor's dimensions
}
h_tensor_in.re_alloc(valid_shape_in);
float* h_data = h_tensor_in.mutable_data();
for (int i = 0; i < h_tensor_in.size(); i++) {
    h_data[i] = 1.0f;
}
d_tensor_in_p->copy_from(h_tensor_in);

// Do inference.
net_executer.prediction();

// Get result tensor through the name of output node.
// You also need to check the dash board again to find out how many output nodes
// there are and remember their names.

// For example, if you have an output node named obj_pred_out,
// you can get the output tensor like this:
auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); // get_out returns a pointer to output tensor.
auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out");  // get_out returns a pointer to output tensor.
// ......
// do something else ...
// ...
// Save model.
// You might not need to optimize the graph when you load the saved model again.
std::string save_model_path = model_path + std::string(".saved");
status = graph->save(save_model_path);
if (!status) {
    LOG(FATAL) << " [ERROR] " << status.info();
}

anakin converter

cd anakin/tools/external_converter_v2
sudo pip install flask prettytable

vim config.yaml
# ...
python converter.py

config.yaml

OPTIONS:
  Framework: CAFFE
  SavePath: ./output
  ResultName: mylenet
  Config:
    LaunchBoard: ON
    Server:
      ip: 0.0.0.0
      port: 8888
    OptimizedGraph:
      enable: OFF
      path: ./anakin_optimized/lenet.anakin.bin.saved
  LOGGER:
    LogToPath: ./log/
    WithColor: ON

TARGET:
  CAFFE:
    # path to proto files
    ProtoPaths:
      - /home/kezunlin/program/caffe/src/caffe/proto/caffe.proto
    PrototxtPath: /home/kezunlin/program/caffe/examples/mnist/lenet.prototxt
    ModelPath: /home/kezunlin/program/caffe/examples/mnist/lenet_iter_10000.caffemodel

  FLUID:
    # path of fluid inference model
    Debug: NULL # Generally no need to modify.
    ModelPath: /path/to/your/model/ # The upper path of a fluid inference model.
    NetType: # Generally no need to modify.

  LEGO:
    # path to proto files
    ProtoPath:
    PrototxtPath:
    ModelPath:

  TENSORFLOW:
    ProtoPaths: /
    PrototxtPath: /
    ModelPath: /
    OutPuts:

  ONNX:
    ProtoPath:
    PrototxtPath:
    ModelPath:

  • input: caffe.proto + lenet.prototxt + lenet_iter_10000.caffemodel
  • output: output/mylenet.anakin.bin + log/xxx.log

anakin test

model_test.cpp

cat Anakin/test/framework/net/model_test.cpp

cd gpu_output
./unit_test/model_test '/home/kezunlin/program/anakin/demo/model/'

example_nv_cnn_net.cpp

cat Anakin/examples/cuda/example_nv_cnn_net.cpp

my example

my workspace

ls demo/
anakin_lib build cmake CMakeLists.txt image model src


tree demo/src/ demo/model/ demo/cmake demo/image
demo/src/
└── demo.cpp
demo/model/
└── mylenet.anakin.bin
demo/cmake
├── anakin-config.cmake
├── msg_color.cmake
├── statistic.cmake
└── utils.cmake
demo/image
├── big.jpg
└── cat.jpg

0 directories, 8 files

anakin_lib

Use ./tools/gpu_build.sh to generate gpu_build_sm61, then rename it to anakin_lib:

./tools/gpu_build.sh
# ...

mv gpu_build_sm61 anakin_lib

ls anakin_lib/
anakin_config.h libanakin_saber_common.so libanakin.so log unit_test
framework libanakin_saber_common.so.0.1.2 libanakin.so.0.1.2 saber utils

anakin-config.cmake

set(ANAKIN_FOUND TRUE) # auto 
set(ANAKIN_VERSION 0.1.2)
set(ANAKIN_ROOT_DIR "/home/kezunlin/program/anakin/demo/anakin_lib")

set(ANAKIN_ROOT ${ANAKIN_ROOT_DIR})
set(ANAKIN_FRAMEWORK ${ANAKIN_ROOT}/framework)
set(ANAKIN_SABER ${ANAKIN_ROOT}/saber)
set(ANAKIN_UTILS ${ANAKIN_ROOT}/utils)


set(ANAKIN_FRAMEWORK_CORE ${ANAKIN_FRAMEWORK}/core)
set(ANAKIN_FRAMEWORK_GRAPH ${ANAKIN_FRAMEWORK}/graph)
set(ANAKIN_FRAMEWORK_LITE ${ANAKIN_FRAMEWORK}/lite)
set(ANAKIN_FRAMEWORK_MODEL_PARSER ${ANAKIN_FRAMEWORK}/model_parser)
set(ANAKIN_FRAMEWORK_OPERATORS ${ANAKIN_FRAMEWORK}/operators)

set(ANAKIN_SABER_CORE ${ANAKIN_SABER}/core)
set(ANAKIN_SABER_FUNCS ${ANAKIN_SABER}/funcs)
set(ANAKIN_SABER_LITE ${ANAKIN_SABER}/lite)

set(ANAKIN_UTILS_LOGGER ${ANAKIN_UTILS}/logger)
set(ANAKIN_UTILS_UINT_TEST ${ANAKIN_UTILS}/unit_test)

#find_path(ANAKIN_INCLUDE_DIR NAMES anakin_config.h PATHS "${ANAKIN_ROOT_DIR}")
mark_as_advanced(ANAKIN_INCLUDE_DIR) # show entry in cmake-gui

find_library(ANAKIN_SABER_COMMON_LIBRARY NAMES anakin_saber_common PATHS "${ANAKIN_ROOT_DIR}")
mark_as_advanced(ANAKIN_SABER_COMMON_LIBRARY) # show entry in cmake-gui

find_library(ANAKIN_LIBRARY NAMES anakin PATHS "${ANAKIN_ROOT_DIR}")
mark_as_advanced(ANAKIN_LIBRARY) # show entry in cmake-gui

# use xxx_INCLUDE_DIRS and xxx_LIBRARIES in CMakeLists.txt
set(ANAKIN_INCLUDE_DIRS
${ANAKIN_ROOT}
${ANAKIN_FRAMEWORK}
${ANAKIN_SABER}
${ANAKIN_UTILS}

${ANAKIN_FRAMEWORK_CORE}
${ANAKIN_FRAMEWORK_GRAPH}
${ANAKIN_FRAMEWORK_LITE}
${ANAKIN_FRAMEWORK_MODEL_PARSER}
${ANAKIN_FRAMEWORK_OPERATORS}

${ANAKIN_SABER_CORE}
${ANAKIN_SABER_FUNCS}
${ANAKIN_SABER_LITE}

${ANAKIN_UTILS_LOGGER}
${ANAKIN_UTILS_UINT_TEST}
)

set(ANAKIN_LIBRARIES ${ANAKIN_SABER_COMMON_LIBRARY} ${ANAKIN_LIBRARY} )

message( "anakin-config.cmake " ${ANAKIN_ROOT_DIR})

CMakeLists.txt

cmake_minimum_required(VERSION 2.8.8)

project(demo)

include(cmake/msg_color.cmake)
include(cmake/utils.cmake)
include(cmake/statistic.cmake)

#add_definitions( -Dshared_DEBUG) # define macro

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

set(ROOT_CMAKE_DIR ./cmake)
set(CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH} "${ROOT_CMAKE_DIR};${CMAKE_PREFIX_PATH}")
MESSAGE( [cmake] " CMAKE_PREFIX_PATH = ${CMAKE_PREFIX_PATH} for find_package")

# Find includes in corresponding build directories
set(CMAKE_INCLUDE_CURRENT_DIR ON)

find_package(OpenCV REQUIRED COMPONENTS core highgui imgproc features2d calib3d)
include_directories(${OpenCV_INCLUDE_DIRS})

# find anakin-config.cmake file
#include(cmake/anakin-config.cmake)
find_package(ANAKIN REQUIRED)
include_directories(${ANAKIN_INCLUDE_DIRS})

#message( [opencv] ${OpenCV_INCLUDE_DIRS} )
#message( [opencv] ${OpenCV_LIBS} )
#message( [anakin] ${ANAKIN_INCLUDE_DIRS} )
#message( [anakin] ${ANAKIN_LIBRARIES} )

add_executable(${PROJECT_NAME}
src/demo.cpp
)

# dl pthread
# error with -std=c++11 -lpthread -ldl

target_link_libraries(${PROJECT_NAME}
dl
pthread
${OpenCV_LIBS}
${ANAKIN_LIBRARIES}
)

src/demo.cpp

edited from Anakin/examples/cuda/example_nv_cnn_net.cpp

#include <iostream>
using namespace std;

// opencv
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
using namespace cv;

// anakin
#include "utils/logger/logger.h"
#include "framework/graph/graph.h"
#include "framework/core/net/net.h"

/*util to fill tensor*/
#include "saber/core/tensor_op.h"
using namespace anakin;
using namespace anakin::graph;
using namespace anakin::saber;

/*
+------------+-----------------+-------+-----------+
| Input Name |      Shape      | Alias | Data Type |
+------------+-----------------+-------+-----------+
|  input_0   | [64, 1, 28, 28] | NULL  | NULL      |
+------------+-----------------+-------+-----------+
+-------------+
| Output Name |
+-------------+
|  prob_out   |
+-------------+
*/

int fill_tensor(Tensor4d<X86, AK_FLOAT>& h_tensor_in, const cv::Mat& image)
{
    // write data to tensor
    int height = image.rows;
    int width = image.cols;

    LOG(INFO) << "height*width =" << height * width << std::endl;           // 784
    LOG(INFO) << "h_tensor_in.size() =" << h_tensor_in.size() << std::endl; // 784

    float* tensor_ptr = h_tensor_in.mutable_data(); // int, float or double.

    const float* ptr;
    for (int h = 0; h < height; ++h)
    {
        ptr = image.ptr<float>(h); // row ptr
        for (int w = 0; w < width; ++w)
        {
            *tensor_ptr++ = *ptr++;
        }
    }

    return 1;
}

int main(int argc, const char** argv) {

    const char* model_path = "../model/mylenet.anakin.bin";

    Mat image = imread("../image/cat.jpg", 0);
    cv::resize(image, image, Size(28, 28));
    //imshow("image",image);
    //waitKey(0);

    /*init graph object, graph is the skeleton of model*/
    Graph<NV, AK_FLOAT, Precision::FP32> graph;

    /*load model from file to init the graph*/
    auto status = graph.load(model_path);
    if (!status) {
        LOG(FATAL) << " [ERROR] " << status.info();
    }

    /*set net input shape and use this shape to optimize the graph (fusion and init operators); shape is n,c,h,w*/
    graph.Reshape("input_0", {1, 1, 28, 28});
    graph.Optimize();

    /*net_executer is the executor object of the model; use graph to init Net*/
    Net<NV, AK_FLOAT, Precision::FP32> net_executer(graph, true);

    /*use the input name to get the input tensor of the net; since the target is NV, the tensor of net_executer is in GPU memory*/
    auto d_tensor_in_p = net_executer.get_in("input_0");
    auto valid_shape_in = d_tensor_in_p->valid_shape();

    /*create tensor located in host*/
    Tensor4d<X86, AK_FLOAT> h_tensor_in;

    /*alloc for host tensor*/
    h_tensor_in.re_alloc(valid_shape_in);

    /*init host tensor by random*/
    //fill_tensor_host_rand(h_tensor_in, -1.0f, 1.0f);

    image.convertTo(image, CV_32FC1); // faster
    fill_tensor(h_tensor_in, image);

    /*use the host tensor to init the device tensor which is the net input*/
    d_tensor_in_p->copy_from(h_tensor_in);

    /*run infer*/
    net_executer.prediction();

    LOG(INFO) << "infer finish";

    /*get the output of the net, which is a device tensor*/
    auto d_out = net_executer.get_out("prob_out");

    /*create another host tensor, and copy the content of the device tensor to host*/
    Tensor4d<X86, AK_FLOAT> h_tensor_out;
    h_tensor_out.re_alloc(d_out->valid_shape());
    h_tensor_out.copy_from(*d_out);

    /*show output content*/
    for (int i = 0; i < h_tensor_out.valid_size(); i++) {
        LOG(INFO) << "out [" << i << "] = " << h_tensor_out.data()[i];
    }
}

compile demo

mkdir build
cd build
cmake ..
make
./demo

output

ERR| 16:45:56.00581| 110838.067s|         37CBF8C0| operator_attr.h:94]  you have set the argument: is_reverse , so it's igrored by anakin
 ERR| 16:45:56.00581| 110838.067s|         37CBF8C0| operator_attr.h:94]  you have set the argument: is_reverse , so it's igrored by anakin
   0| 16:45:56.00681| 0.098s|         37CBF8C0| parser.cpp:96] graph name: LeNet
   0| 16:45:56.00681| 0.099s|         37CBF8C0| parser.cpp:101] graph in: input_0
   0| 16:45:56.00681| 0.099s|         37CBF8C0| parser.cpp:107] graph out: prob_out
   0| 16:45:56.00742| 0.159s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnormScaleReluPool
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnormScaleRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvReluPool
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnormScale
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : DeconvRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : PermutePower
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnorm
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : EltwiseRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : EltwiseActivation
 WAN| 16:45:56.00743| 0.160s|         37CBF8C0| net.cpp:663] Detect and initial 1 lanes.
   0| 16:45:56.00743| 0.161s|         37CBF8C0| env.h:44] found 1 device(s)
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:45] Device id: 0 , name: GeForce GTX 1060
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:47] Multiprocessors: 10
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:50] frequency:1733MHz
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:52] CUDA Capability : 6.1
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:54] total global memory: 6078MBytes.
 WAN| 16:45:56.00743| 0.161s|         37CBF8C0| net.cpp:667] Current used device id : 0
 WAN| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:16] Parsing Input op parameter.
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [0]: 1
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [1]: 1
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [2]: 28
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [3]: 28
 ERR| 16:45:56.00744| 0.161s|         37CBF8C0| net.cpp:210] node_ptr->get_op_name()  sass not support yet.
 ERR| 16:45:56.00744| 0.161s|         37CBF8C0| net.cpp:210] node_ptr->get_op_name()  sass not support yet.
 WAN| 16:45:57.00269| 0.686s|         37CBF8C0| context.h:40] device index exceeds the number of devices, set to default device(0)!
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:300] Temp mem used:        0 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:301] Original mem used:    0 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:302] Model mem used:       1 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:303] System mem used:      153 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| demo.cpp:40] height*width =784
   0| 16:45:57.00270| 0.687s|         37CBF8C0| demo.cpp:41] h_tensor_in.size() =784
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:105] infer finish
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [0] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [1] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [2] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [3] = 1
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [4] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [5] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [6] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [7] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [8] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [9] = 0

For Windows (skip)

version

  • windows 10
  • vs 2015
  • cmake 3.2.2
  • cuda 8.0 + cudnn 6.0.21 (same as caffe) sm_61
  • protobuf 3.4.0

protobuf

see compile protobuf-cpp on windows 10

compile

#git clone https://github.com/PaddlePaddle/Anakin.git anakin
git clone https://github.com/kezunlin/Anakin.git anakin
cd anakin
mkdir build && cd build && cmake-gui ..

with options

CUDNN_ROOT "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/"
PROTOBUF_ROOT "C:/Program Files/protobuf" 

BUILD_SHARED ON
USE_GPU_PLACE ON
USE_OPENMP OFF
USE_OPENCV ON

generate Anakin.sln and compile with VS 2015 in x64 Release mode.

error fixes

We get 101 errors that are hard to fix, so skip this for now.

Reference

History

  • 20180903: created.