Guide
version
- gcc 4.8.5/5.4.0
- g++ 4.8.5/5.4.0
- cmake 3.2.2
- nvidia driver 396.54 + cuda 9.2 + cudnn 7.1.4
- protobuf 3.4.0
install nvidia-docker2
see nvidia-docker2 guide on ubuntu 16.04
test
sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
build and run
build
git clone https://github.com/PaddlePaddle/Anakin.git anakin
cd anakin/docker
./anakin_docker_build_and_run.sh -p NVIDIA-GPU -o Ubuntu -m Build
An error occurs with cuDNN. Skipped.
run
./anakin_docker_build_and_run.sh -p NVIDIA-GPU -o Ubuntu -m Run
compile anakin
sudo docker run -it --runtime=nvidia fdcda959f60a /bin/bash
root@962077742ae9:/# cd Anakin/
git checkout developing
build
# 1. use script to build
./tools/gpu_build.sh
# 2. or you can build directly.
mkdir build
cd build
cmake ..
make -j8
x86 build
./tools/x86_build.sh
OK. no errors.
gpu build
./tools/gpu_build.sh
Build errors occur: cuDNN is not found.
compile anakin on host
install protobuf
install protobuf 3.4.0, see Part 1: compile protobuf-cpp on ubuntu 16.04
configure env
vim .bashrc
# cuda for anakin
export PATH=/usr/local/cuda/bin:$PATH
# CUDNN for anakin
export CUDNN_ROOT=/usr/local/cuda/
export LD_LIBRARY_PATH=${CUDNN_ROOT}/lib64:$LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=${CUDNN_ROOT}/include:$CPLUS_INCLUDE_PATH
source .bashrc
build anakin
x86 build
git checkout developing
./tools/x86_build.sh
mv output x86_output
OK. no errors.
if error occurs, then
rm -rf CMakeFiles
rm -rf anakin/framework/model_parser/proto/*.h
rm -rf output
chown -R kezunlin:kezunlin anakin
gpu build
./tools/gpu_build.sh
mv output gpu_output
gpu build with cmake
cd anakin
mkdir build
cd build && cmake-gui ..
anakin overview
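Running forward inference with Anakin takes three main steps:
- Convert the external model into an Anakin model with the Anakin Parser.
- Load the Anakin model to generate the original compute graph, then optimize that graph.
- Anakin selects a hardware platform and executes the compute graph on it.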
Tensor
Tensor takes three template parameters:
template<typename TargetType, DataType datatype, typename LayOutType = NCHW>
class Tensor .../* Inherit other class */{
//some implements
...
};
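- TargetType is the platform type, such as X86 or GPU; Anakin has a corresponding identifier for each.
- datatype is the element data type; Anakin also has a corresponding identifier for each.
- LayOutType is the data layout, such as batch x channel x height x width [NxCxHxW]; Anakin identifies each layout with a struct.
The Anakin types correspond to the basic types as follows: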
TargetType
Anakin TargetType | platform |
---|---|
NV | NVIDIA GPU |
ARM | ARM |
AMD | AMD GPU |
X86 | X86 |
NVHX86 | NVIDIA GPU with Pinned Memory |
Anakin DataType | C++ | Description |
---|---|---|
AK_HALF | short | fp16 |
AK_FLOAT | float | fp32 |
AK_DOUBLE | double | fp64 |
AK_INT8 | char | int8 |
AK_INT16 | short | int16 |
AK_INT32 | int | int32 |
AK_INT64 | long | int64 |
AK_UINT8 | unsigned char | uint8 |
AK_UINT16 | unsigned short | uint16 |
AK_UINT32 | unsigned int | uint32 |
AK_STRING | std::string | / |
AK_BOOL | bool | / |
AK_SHAPE | / | Anakin Shape |
AK_TENSOR | / | Anakin Tensor |
LayOutType
Anakin LayOutType ( Tensor LayOut ) | Tensor Dimension | Tensor Support | Op Support |
---|---|---|---|
W | 1-D | YES | NO |
HW | 2-D | YES | NO |
WH | 2-D | YES | NO |
NW | 2-D | YES | YES |
NHW | 3-D | YES | YES |
NCHW ( default ) | 4-D | YES | YES |
NHWC | 4-D | YES | NO |
NCHW_C4 | 5-D | YES | YES |
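In theory, Anakin supports declaring tensors of one or more dimensions, but Anakin's Ops support only four layouts: NW, NHW, NCHW, and NCHW_C4. NCHW is the default LayOutType, and NCHW_C4 is intended specifically for the int8 data type.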
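To make the default NCHW layout concrete, here is a minimal, library-independent sketch of how a 4-D index maps to a linear offset in a dense row-major buffer. Shape4, nchw_offset, and element_count are hypothetical helper names for illustration only, not part of Anakin's API; Anakin's own Tensor and Shape classes may organise this differently.

```cpp
#include <cassert>
#include <cstddef>

// Shape of a 4-D tensor in NCHW order (batch, channel, height, width).
// Hypothetical helper type, not an Anakin class.
struct Shape4 {
    std::size_t n, c, h, w;
};

// Total number of elements in a dense tensor of this shape.
inline std::size_t element_count(const Shape4& s) {
    return s.n * s.c * s.h * s.w;
}

// Linear offset of element (n, c, h, w) in a dense, row-major NCHW buffer:
// width varies fastest, then height, then channel, then batch.
inline std::size_t nchw_offset(const Shape4& s, std::size_t n, std::size_t c,
                               std::size_t h, std::size_t w) {
    assert(n < s.n && c < s.c && h < s.h && w < s.w);
    return ((n * s.c + c) * s.h + h) * s.w + w;
}
```

For an input of shape [64, 1, 28, 28], nchw_offset(s, 1, 0, 0, 0) is 784, so each image occupies 28*28 consecutive values in the buffer.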
Graph
The Graph class loads an Anakin model to generate a compute graph, optimizes the graph, and saves the model.
template<typename TargetType, DataType Dtype, Precision Ptype>
class Graph ... /* inherit other class*/{
//some implements
...
};
load
//some declarations
...
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
std::string model_path = "the/path/to/where/your/models/are";
const char *model_path1 = "the/path/to/where/your/models/are";
//Loading Anakin model to generate a compute graph.
auto status = graph->load(model_path);
//Or this way:
//auto status = graph->load(model_path1);
//Check whether the load operation succeeded.
if(!status){
std::cout << "error" << std::endl;
//do something...
}
optimize
//some declarations
...
//Load graph.
...
//According to the ops of loaded graph, optimize compute graph.
graph->Optimize();
save
//some declarations
...
//Load graph.
...
// save a model
//save_model_path: the path to where your model is.
auto status = graph->save(save_model_path);
//Checking
if(!status){
std::cout << "error" << std::endl;
//do something...
}
Net
Net is the executor of the compute graph; inputs and outputs are obtained through a Net object.
template<typename TargetType, DataType Dtype, Precision PType, OpRunType RunType = OpRunType::ASYNC>
class Net{
//some implements
...
};
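- Precision specifies the precision of the Ops.
- OpRunType selects synchronous or asynchronous execution; asynchronous is the default. OpRunType::SYNC runs synchronously with a single stream on the GPU; OpRunType::ASYNC runs asynchronously with multiple streams on the GPU.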
Precision
Precision | Op support |
---|---|
Precision::INT4 | NO |
Precision::INT8 | NO |
Precision::FP16 | NO |
Precision::FP32 | YES |
Precision::FP64 | NO |
Ops currently support only FP32 precision; support for the remaining Precision values is planned for the future.
OpRunType
OpRunType | Sync/Async | Description |
---|---|---|
OpRunType::SYNC | Synchronous | single-stream on GPU |
OpRunType::ASYNC | Asynchronous | multi-stream on GPU |
create an executor
//some declarations
...
//Create a pointer to a graph.
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
//do something...
...
//create an executor
Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);
get input tensor
//some declarations
...
//create an executor
//TargetType is NV [NVIDIA GPU]
Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);
//Get the first input tensor.
//The following tensors (tensor_in0, tensor_in1, ...) reside on the GPU.
//Note: Member function get_in returns a pointer to a tensor.
Tensor<NV, AK_FLOAT>* tensor_in0 = executor.get_in("input_0");
//If you have multiple input tensors
//You just type this code below.
Tensor<NV, AK_FLOAT>* tensor_in1 = executor.get_in("input_1");
...
auto tensor_inn = executor.get_in("input_n");
fill input tensor
//This tensor resides on the GPU.
auto tensor_d_in = executor.get_in("input_0");
//To feed the tensor above, first fill a tensor that resides on the host, then copy the host tensor to the device tensor.
//using Tensor4d = Tensor<Ttype, Dtype>;
Tensor4d<X86, AK_FLOAT> tensor_h_in; //host tensor;
//Tensor<X86, AK_FLOAT> tensor_h_in;
//Allocate memory for host tensor.
tensor_h_in.re_alloc(tensor_d_in->valid_shape());
//Get a writable pointer to tensor.
float *h_data = tensor_h_in.mutable_data();
//Feed your tensor.
/** example
for(int i = 0; i < tensor_h_in.size(); i++){
h_data[i] = 1.0f;
}
*/
//Copy host tensor's data to device tensor.
tensor_d_in->copy_from(tensor_h_in);
//Then run inference with executor.prediction().
get output tensor
//Note: this tensor resides on the GPU.
Tensor<NV, AK_FLOAT>* tensor_out_d = executor.get_out("pred_out");
execute graph
executor.prediction();
code example
std::string model_path = "your_Anakin_models/xxxxx.anakin.bin";
// Create an empty graph object.
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
// Load Anakin model.
auto status = graph->load(model_path);
if(!status ) {
LOG(FATAL) << " [ERROR] " << status.info();
}
// Reshape
graph->Reshape("input_0", {10, 384, 960, 10});
// You must optimize graph for the first time.
graph->Optimize();
// Create an executor.
Net<NV, AK_FLOAT, Precision::FP32> net_executer(*graph);
//Get your input tensors by name, such as "input_0", "input_1", and so on,
//and then feed them.
//If you don't know which inputs these names ("input_0", "input_1") correspond to, launch the dashboard to find out.
auto d_tensor_in_p = net_executer.get_in("input_0");
Tensor4d<X86, AK_FLOAT> h_tensor_in;
auto valid_shape_in = d_tensor_in_p->valid_shape();
for (int i=0; i<valid_shape_in.size(); i++) {
LOG(INFO) << "detect input dims[" << i << "]" << valid_shape_in[i]; //see the tensor's dimensions
}
h_tensor_in.re_alloc(valid_shape_in);
float* h_data = h_tensor_in.mutable_data();
for (int i=0; i<h_tensor_in.size(); i++) {
h_data[i] = 1.0f;
}
d_tensor_in_p->copy_from(h_tensor_in);
//Do inference.
net_executer.prediction();
//Get the result tensors by output node name.
//Check the dashboard again to find out how many output nodes there are and what they are named.
//For example, suppose there is an output node named obj_pred_out.
//Then you can get its output tensor.
auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); //get_out returns a pointer to output tensor.
auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out"); //get_out returns a pointer to output tensor.
//......
// do something else ...
//...
//Save the model.
//You need not optimize the graph again when you load the saved model.
std::string save_model_path = model_path + std::string(".saved");
status = graph->save(save_model_path); //reuse the status variable declared above
if (!status ) {
LOG(FATAL) << " [ERROR] " << status.info();
}
anakin converter
cd anakin/tools/external_converter_v2
sudo pip install flask prettytable
vim config.yaml
# ...
python converter.py
config.yaml
OPTIONS:
    Framework: CAFFE
    SavePath: ./output
    ResultName: mylenet
    Config:
        LaunchBoard: ON
        Server:
            ip: 0.0.0.0
            port: 8888
        OptimizedGraph:
            enable: OFF
            path: ./anakin_optimized/lenet.anakin.bin.saved
    LOGGER:
        LogToPath: ./log/
        WithColor: ON
TARGET:
    CAFFE:
        # path to proto files
        ProtoPaths:
            - /home/kezunlin/program/caffe/src/caffe/proto/caffe.proto
        PrototxtPath: /home/kezunlin/program/caffe/examples/mnist/lenet.prototxt
        ModelPath: /home/kezunlin/program/caffe/examples/mnist/lenet_iter_10000.caffemodel
    FLUID:
        # path of fluid inference model
        Debug: NULL # Generally no need to modify.
        ModelPath: /path/to/your/model/ # The upper path of a fluid inference model.
        NetType: # Generally no need to modify.
    LEGO:
        # path to proto files
        ProtoPath:
        PrototxtPath:
        ModelPath:
    TENSORFLOW:
        ProtoPaths: /
        PrototxtPath: /
        ModelPath: /
        OutPuts:
    ONNX:
        ProtoPath:
        PrototxtPath:
        ModelPath:
- input: caffe.proto + lenet.prototxt + lenet_iter_10000.caffemodel
- output: output/mylenet.anakin.bin + log/xxx.log
anakin test
model_test.cpp
cat Anakin/test/framework/net/model_test.cpp
cd gpu_output
./unit_test/model_test '/home/kezunlin/program/anakin/demo/model/'
example_nv_cnn_net.cpp
cat Anakin/examples/cuda/example_nv_cnn_net.cpp
my example
my workspace
ls demo/
anakin_lib build cmake CMakeLists.txt image model src
tree demo/src/ demo/model/ demo/cmake demo/image
demo/src/
└── demo.cpp
demo/model/
└── mylenet.anakin.bin
demo/cmake
├── anakin-config.cmake
├── msg_color.cmake
├── statistic.cmake
└── utils.cmake
demo/image
├── big.jpg
└── cat.jpg
0 directories, 8 files
anakin_lib
Use ./tools/gpu_build.sh to generate gpu_build_sm61, then rename it to anakin_lib.
./tools/gpu_build.sh
# ...
mv gpu_build_sm61 anakin_lib
ls anakin_lib/
anakin_config.h libanakin_saber_common.so libanakin.so log unit_test
framework libanakin_saber_common.so.0.1.2 libanakin.so.0.1.2 saber utils
anakin-config.cmake
set(ANAKIN_FOUND TRUE) # auto
set(ANAKIN_VERSION 0.1.2)
set(ANAKIN_ROOT_DIR "/home/kezunlin/program/anakin/demo/anakin_lib")
set(ANAKIN_ROOT ${ANAKIN_ROOT_DIR})
set(ANAKIN_FRAMEWORK ${ANAKIN_ROOT}/framework)
set(ANAKIN_SABER ${ANAKIN_ROOT}/saber)
set(ANAKIN_UTILS ${ANAKIN_ROOT}/utils)
set(ANAKIN_FRAMEWORK_CORE ${ANAKIN_FRAMEWORK}/core)
set(ANAKIN_FRAMEWORK_GRAPH ${ANAKIN_FRAMEWORK}/graph)
set(ANAKIN_FRAMEWORK_LITE ${ANAKIN_FRAMEWORK}/lite)
set(ANAKIN_FRAMEWORK_MODEL_PARSER ${ANAKIN_FRAMEWORK}/model_parser)
set(ANAKIN_FRAMEWORK_OPERATORS ${ANAKIN_FRAMEWORK}/operators)
set(ANAKIN_SABER_CORE ${ANAKIN_SABER}/core)
set(ANAKIN_SABER_FUNCS ${ANAKIN_SABER}/funcs)
set(ANAKIN_SABER_LITE ${ANAKIN_SABER}/lite)
set(ANAKIN_UTILS_LOGGER ${ANAKIN_UTILS}/logger)
set(ANAKIN_UTILS_UINT_TEST ${ANAKIN_UTILS}/unit_test)
#find_path(ANAKIN_INCLUDE_DIR NAMES anakin_config.h PATHS "${ANAKIN_ROOT_DIR}")
mark_as_advanced(ANAKIN_INCLUDE_DIR) # shown in cmake-gui only when Advanced is checked
find_library(ANAKIN_SABER_COMMON_LIBRARY NAMES anakin_saber_common PATHS "${ANAKIN_ROOT_DIR}")
mark_as_advanced(ANAKIN_SABER_COMMON_LIBRARY) # shown in cmake-gui only when Advanced is checked
find_library(ANAKIN_LIBRARY NAMES anakin PATHS "${ANAKIN_ROOT_DIR}")
mark_as_advanced(ANAKIN_LIBRARY) # shown in cmake-gui only when Advanced is checked
# use xxx_INCLUDE_DIRS and xxx_LIBRARIES in CMakeLists.txt
set(ANAKIN_INCLUDE_DIRS
${ANAKIN_ROOT}
${ANAKIN_FRAMEWORK}
${ANAKIN_SABER}
${ANAKIN_UTILS}
${ANAKIN_FRAMEWORK_CORE}
${ANAKIN_FRAMEWORK_GRAPH}
${ANAKIN_FRAMEWORK_LITE}
${ANAKIN_FRAMEWORK_MODEL_PARSER}
${ANAKIN_FRAMEWORK_OPERATORS}
${ANAKIN_SABER_CORE}
${ANAKIN_SABER_FUNCS}
${ANAKIN_SABER_LITE}
${ANAKIN_UTILS_LOGGER}
${ANAKIN_UTILS_UINT_TEST}
)
set(ANAKIN_LIBRARIES ${ANAKIN_SABER_COMMON_LIBRARY} ${ANAKIN_LIBRARY} )
message( "anakin-config.cmake " ${ANAKIN_ROOT_DIR})
CMakeLists.txt
cmake_minimum_required(VERSION 2.8.8)
project(demo)
include(cmake/msg_color.cmake)
include(cmake/utils.cmake)
include(cmake/statistic.cmake)
#add_definitions( -Dshared_DEBUG) # define macro
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
set(ROOT_CMAKE_DIR ./cmake)
set(CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH} "${ROOT_CMAKE_DIR};${CMAKE_PREFIX_PATH}")
MESSAGE( [cmake] " CMAKE_PREFIX_PATH = ${CMAKE_PREFIX_PATH} for find_package")
# Find includes in corresponding build directories
set(CMAKE_INCLUDE_CURRENT_DIR ON)
find_package(OpenCV REQUIRED COMPONENTS core highgui imgproc features2d calib3d)
include_directories(${OpenCV_INCLUDE_DIRS})
# find anakin-config.cmake file
#include(cmake/anakin-config.cmake)
find_package(ANAKIN REQUIRED)
include_directories(${ANAKIN_INCLUDE_DIRS})
#message( [opencv] ${OpenCV_INCLUDE_DIRS} )
#message( [opencv] ${OpenCV_LIBS} )
#message( [anakin] ${ANAKIN_INCLUDE_DIRS} )
#message( [anakin] ${ANAKIN_LIBRARIES} )
add_executable(${PROJECT_NAME}
src/demo.cpp
)
# dl pthread
# error with -std=c++11 -lpthread -ldl
target_link_libraries(${PROJECT_NAME}
dl
pthread
${OpenCV_LIBS}
${ANAKIN_LIBRARIES}
)
src/demo.cpp
edited from Anakin/examples/cuda/example_nv_cnn_net.cpp
#include <iostream>
using namespace std;
// opencv
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
using namespace cv;
// anakin
#include "utils/logger/logger.h"
#include "framework/graph/graph.h"
#include "framework/core/net/net.h"
/*util to fill tensor*/
#include "saber/core/tensor_op.h"
using namespace anakin;
using namespace anakin::graph;
using namespace anakin::saber;
/*
+------------+-----------------+-------+-----------+
| Input Name | Shape | Alias | Data Type |
+------------+-----------------+-------+-----------+
| input_0 | [64, 1, 28, 28] | NULL | NULL |
+------------+-----------------+-------+-----------+
+-------------+
| Output Name |
+-------------+
| prob_out |
+-------------+
*/
int fill_tensor(Tensor4d<X86, AK_FLOAT>& h_tensor_in, const cv::Mat& image)
{
// write data to tensor
int height = image.rows;
int width = image.cols;
LOG(INFO)<<"height*width ="<< height*width <<std::endl; // 784
LOG(INFO)<<"h_tensor_in.size() ="<<h_tensor_in.size()<<std::endl; // 784
float* tensor_ptr = h_tensor_in.mutable_data(); // int, float or double.
const float* ptr;
for (int h = 0; h < height; ++h)
{
ptr = image.ptr<float>(h); // row ptr
for (int w = 0; w < width; ++w)
{
*tensor_ptr++ = *ptr++;
}
}
return 1;
}
int main(int argc, const char** argv) {
const char *model_path = "../model/mylenet.anakin.bin";
Mat image = imread("../image/cat.jpg",0);
cv::resize(image,image,Size(28,28));
//imshow("image",image);
//waitKey(0);
/*init graph object, graph is the skeleton of model*/
Graph<NV, AK_FLOAT, Precision::FP32> graph;
/*load model from file to init the graph*/
auto status = graph.load(model_path);
if (!status) {
LOG(FATAL) << " [ERROR] " << status.info();
}
/*set net input shape and use this shape to optimize the graph(fusion and init operator),shape is n,c,h,w*/
graph.Reshape("input_0", {1, 1, 28, 28});
graph.Optimize();
/*net_executer is the executor object of model. use graph to init Net*/
Net<NV, AK_FLOAT, Precision::FP32> net_executer(graph, true);
/*use input string to get the input tensor of net. for we use NV as target, the tensor of net_executer is on GPU memory*/
auto d_tensor_in_p = net_executer.get_in("input_0");
auto valid_shape_in = d_tensor_in_p->valid_shape();
/*create tensor located in host*/
Tensor4d<X86, AK_FLOAT> h_tensor_in;
/*alloc for host tensor*/
h_tensor_in.re_alloc(valid_shape_in);
/*fill host tensor from the image (random init kept for reference)*/
//fill_tensor_host_rand(h_tensor_in, -1.0f, 1.0f);
image.convertTo(image, CV_32FC1); // convert 8-bit pixels to float so fill_tensor can read float rows
fill_tensor(h_tensor_in,image);
/*use host tensor to init device tensor which is net input*/
d_tensor_in_p->copy_from(h_tensor_in);
/*run infer*/
net_executer.prediction();
LOG(INFO)<<"infer finish";
/*get the output of net, which is a device tensor*/
auto d_out=net_executer.get_out("prob_out");
/*create another host tensor, and copy the content of device tensor to host*/
Tensor4d<X86, AK_FLOAT> h_tensor_out;
h_tensor_out.re_alloc(d_out->valid_shape());
h_tensor_out.copy_from(*d_out);
/*show output content*/
for(int i=0;i<h_tensor_out.valid_size();i++){
LOG(INFO)<<"out ["<<i<<"] = "<<h_tensor_out.data()[i];
}
}
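The fill_tensor helper in the demo copies the image row by row but never checks that the image and the tensor hold the same number of elements. A minimal sketch of the same copy on plain buffers, with that check added, could look as follows. copy_rows_dense and its parameters are hypothetical and use only the standard library; row_stride models the possibility that image rows are padded in memory, which is an assumption about the source buffer and not something the demo states.

```cpp
#include <cstddef>
#include <vector>

// Copy a height x width float image into a dense destination buffer.
// Consecutive source rows start row_stride floats apart (row_stride >= width).
// Returns false, without writing anything, if the sizes do not agree.
// Hypothetical helper for illustration; not part of Anakin or OpenCV.
inline bool copy_rows_dense(const float* src, std::size_t height,
                            std::size_t width, std::size_t row_stride,
                            std::vector<float>& dst) {
    if (src == nullptr || row_stride < width || dst.size() != height * width) {
        return false;
    }
    float* out = dst.data();
    for (std::size_t h = 0; h < height; ++h) {
        const float* row = src + h * row_stride;  // start of row h
        for (std::size_t w = 0; w < width; ++w) {
            *out++ = row[w];
        }
    }
    return true;
}
```

In the demo, the equivalent check would compare image.rows * image.cols against h_tensor_in.size() before copying, and return an error code instead of always returning 1.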
compile demo
mkdir build
cd build
cmake ..
make
./demo
output
ERR| 16:45:56.00581| 110838.067s| 37CBF8C0| operator_attr.h:94] you have set the argument: is_reverse , so it's igrored by anakin
ERR| 16:45:56.00581| 110838.067s| 37CBF8C0| operator_attr.h:94] you have set the argument: is_reverse , so it's igrored by anakin
0| 16:45:56.00681| 0.098s| 37CBF8C0| parser.cpp:96] graph name: LeNet
0| 16:45:56.00681| 0.099s| 37CBF8C0| parser.cpp:101] graph in: input_0
0| 16:45:56.00681| 0.099s| 37CBF8C0| parser.cpp:107] graph out: prob_out
0| 16:45:56.00742| 0.159s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : ConvBatchnormScaleReluPool
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : ConvBatchnormScaleRelu
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : ConvReluPool
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : ConvBatchnormScale
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : DeconvRelu
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : ConvRelu
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : PermutePower
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : ConvBatchnorm
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : EltwiseRelu
0| 16:45:56.00742| 0.160s| 37CBF8C0| graph.cpp:153] processing in-ordered fusion : EltwiseActivation
WAN| 16:45:56.00743| 0.160s| 37CBF8C0| net.cpp:663] Detect and initial 1 lanes.
0| 16:45:56.00743| 0.161s| 37CBF8C0| env.h:44] found 1 device(s)
0| 16:45:56.00743| 0.161s| 37CBF8C0| cuda_device.cpp:45] Device id: 0 , name: GeForce GTX 1060
0| 16:45:56.00743| 0.161s| 37CBF8C0| cuda_device.cpp:47] Multiprocessors: 10
0| 16:45:56.00743| 0.161s| 37CBF8C0| cuda_device.cpp:50] frequency:1733MHz
0| 16:45:56.00743| 0.161s| 37CBF8C0| cuda_device.cpp:52] CUDA Capability : 6.1
0| 16:45:56.00743| 0.161s| 37CBF8C0| cuda_device.cpp:54] total global memory: 6078MBytes.
WAN| 16:45:56.00743| 0.161s| 37CBF8C0| net.cpp:667] Current used device id : 0
WAN| 16:45:56.00744| 0.161s| 37CBF8C0| input.cpp:16] Parsing Input op parameter.
0| 16:45:56.00744| 0.161s| 37CBF8C0| input.cpp:19] |-- shape [0]: 1
0| 16:45:56.00744| 0.161s| 37CBF8C0| input.cpp:19] |-- shape [1]: 1
0| 16:45:56.00744| 0.161s| 37CBF8C0| input.cpp:19] |-- shape [2]: 28
0| 16:45:56.00744| 0.161s| 37CBF8C0| input.cpp:19] |-- shape [3]: 28
ERR| 16:45:56.00744| 0.161s| 37CBF8C0| net.cpp:210] node_ptr->get_op_name() sass not support yet.
ERR| 16:45:56.00744| 0.161s| 37CBF8C0| net.cpp:210] node_ptr->get_op_name() sass not support yet.
WAN| 16:45:57.00269| 0.686s| 37CBF8C0| context.h:40] device index exceeds the number of devices, set to default device(0)!
0| 16:45:57.00270| 0.687s| 37CBF8C0| net.cpp:300] Temp mem used: 0 MB
0| 16:45:57.00270| 0.687s| 37CBF8C0| net.cpp:301] Original mem used: 0 MB
0| 16:45:57.00270| 0.687s| 37CBF8C0| net.cpp:302] Model mem used: 1 MB
0| 16:45:57.00270| 0.687s| 37CBF8C0| net.cpp:303] System mem used: 153 MB
0| 16:45:57.00270| 0.687s| 37CBF8C0| demo.cpp:40] height*width =784
0| 16:45:57.00270| 0.687s| 37CBF8C0| demo.cpp:41] h_tensor_in.size() =784
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:105] infer finish
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [0] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [1] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [2] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [3] = 1
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [4] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [5] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [6] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [7] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [8] = 0
0| 16:45:57.00270| 0.688s| 37CBF8C0| demo.cpp:117] out [9] = 0
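The ten values printed at the end of the log are the entries of prob_out, one per digit class, so the predicted class is the index of the largest value. A small helper to read it off, assuming the scores have already been copied from the host tensor into a std::vector<float> (argmax is a hypothetical name, not an Anakin function):

```cpp
#include <cstddef>
#include <vector>

// Index of the largest score, or -1 if the vector is empty.
// On ties, the first (lowest) index wins.
inline int argmax(const std::vector<float>& scores) {
    if (scores.empty()) {
        return -1;
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return static_cast<int>(best);
}
```

For the output shown in the log, the only non-zero entry is out [3], so the helper would report class 3.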
For Windows (skip)
version
- windows 10
- vs 2015
- cmake 3.2.2
- cuda 8.0 + cudnn 6.0.21 (same as caffe) sm_61
- protobuf 3.4.0
protobuf
see compile protobuf-cpp on windows 10
compile
#git clone https://github.com/PaddlePaddle/Anakin.git anakin
git clone https://github.com/kezunlin/Anakin.git anakin
cd anakin
mkdir build && cd build && cmake-gui ..
with options
CUDNN_ROOT "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/"
PROTOBUF_ROOT "C:/Program Files/protobuf"
BUILD_SHARED ON
USE_GPU_PLACE ON
USE_OPENMP OFF
USE_OPENCV ON
Generate Anakin.sln and compile with VS 2015 in x64 Release mode.
error fixes
We get 101 errors, which are hard to fix. Skipped for now.
History
- 20180903: created.