compile baidu anakin on ubuntu 16.04


Guide

version

  • gcc 4.8.5/5.4.0
  • g++ 4.8.5/5.4.0
  • cmake 3.2.2
  • nvidia driver 396.54 + cuda 9.2 + cudnn 7.1.4
  • protobuf 3.4.0

install nvidia-docker2

see nvidia-docker2 guide on ubuntu 16.04

test

sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

build and run

build

git clone https://github.com/PaddlePaddle/Anakin.git anakin
cd anakin/docker
./anakin_docker_build_and_run.sh -p NVIDIA-GPU -o Ubuntu -m Build

An error occurs with cuDNN during the docker build, so skip this step.

run

./anakin_docker_build_and_run.sh  -p NVIDIA-GPU -o Ubuntu -m Run

compile anakin

sudo docker run -it --runtime=nvidia fdcda959f60a /bin/bash
root@962077742ae9:/# cd Anakin/
git checkout developing

build

# 1. use script to build
./tools/gpu_build.sh

# 2. or you can build directly.
mkdir build  
cd build  
cmake ..  
make -j8

x86 build

./tools/x86_build.sh

OK. no errors.

gpu build

./tools/gpu_build.sh

Build errors occur: cuDNN is not found.

compile anakin in host

install protobuf

install protobuf 3.4.0, see Part 1: compile protobuf-cpp on ubuntu 16.04

configure env

vim .bashrc

# cuda for anakin
export PATH=/usr/local/cuda/bin:$PATH

# CUDNN for anakin
export CUDNN_ROOT=/usr/local/cuda/
export LD_LIBRARY_PATH=${CUDNN_ROOT}/lib64:$LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=${CUDNN_ROOT}/include:$CPLUS_INCLUDE_PATH

source .bashrc

build anakin

x86 build

git checkout developing  
./tools/x86_build.sh

mv output x86_output

OK. no errors.

If errors occur, clean up and retry:

rm -rf CMakeFiles
rm -rf anakin/framework/model_parser/proto/*.h
rm output

chown -R kezunlin:kezunlin anakin

gpu build

./tools/gpu_build.sh
mv output gpu_output

gpu build with cmake

cd anakin
mkdir build
cd build && cmake-gui ..

anakin overview

anakin

Forward computation with Anakin involves three main steps (a condensed sketch follows the list):

  1. Parse an external model into an Anakin model with the Anakin Parser.
  2. Load the Anakin model to generate the raw compute graph, then optimize the graph.
  3. Anakin executes the compute graph on the chosen hardware platform.
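
A condensed sketch of steps 2 and 3 (the full, runnable version appears in the code example section later in this post; the model path here is only a placeholder):

// Condensed sketch: load an Anakin model, optimize the graph, and run one forward pass.
#include "framework/graph/graph.h"
#include "framework/core/net/net.h"
#include "utils/logger/logger.h"
using namespace anakin;
using namespace anakin::graph;

int main() {
    Graph<NV, AK_FLOAT, Precision::FP32> graph;
    auto status = graph.load("path/to/model.anakin.bin"); // placeholder path
    if (!status) { LOG(FATAL) << " [ERROR] " << status.info(); }
    graph.Optimize();                                      // optimize the raw compute graph
    Net<NV, AK_FLOAT, Precision::FP32> net(graph, true);   // executor on the NV (GPU) target
    // ... fill the input tensors here (see "fill input tensor" below) ...
    net.prediction();                                      // execute the compute graph
    return 0;
}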

Tensor

Tensor takes three template parameters:

 template<typename TargetType, DataType datatype, typename LayOutType = NCHW>
 class Tensor .../* Inherit other class */{
  //some implements
  ...
 };
  • TargetType is the platform type, such as X86 or GPU; Anakin has an internal identifier for each platform.
  • datatype is the ordinary data type; Anakin also has an internal identifier for each data type.
  • LayOutType is the data layout type, such as batch x channel x height x width [NxCxHxW]; Anakin uses a struct to identify each layout. The mappings between Anakin types and basic data types are listed below:

TargetType

Anakin TargetType | platform
NV                | NVIDIA GPU
ARM               | ARM
AMD               | AMD GPU
X86               | X86
NVHX86            | NVIDIA GPU with Pinned Memory

DataType

Anakin DataType | C++            | Description
AK_HALF         | short          | fp16
AK_FLOAT        | float          | fp32
AK_DOUBLE       | double         | fp64
AK_INT8         | char           | int8
AK_INT16        | short          | int16
AK_INT32        | int            | int32
AK_INT64        | long           | int64
AK_UINT8        | unsigned char  | uint8
AK_UINT16       | unsigned short | uint16
AK_UINT32       | unsigned int   | uint32
AK_STRING       | std::string    | /
AK_BOOL         | bool           | /
AK_SHAPE        | /              | Anakin Shape
AK_TENSOR       | /              | Anakin Tensor

LayOutType

Anakin LayOutType (Tensor LayOut) | Tensor Dimension | Tensor Support | Op Support
W                                 | 1-D              | YES            | NO
HW                                | 2-D              | YES            | NO
WH                                | 2-D              | YES            | NO
NW                                | 2-D              | YES            | YES
NHW                               | 3-D              | YES            | YES
NCHW (default)                    | 4-D              | YES            | YES
NHWC                              | 4-D              | YES            | NO
NCHW_C4                           | 5-D              | YES            | YES

In theory, Anakin supports declaring tensors of one or more dimensions, but the Ops in Anakin only support the NW, NHW, NCHW, and NCHW_C4 layouts. NCHW is the default LayOutType, and NCHW_C4 is dedicated to the int8 data type.
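
For example, tensor declarations that combine the template parameters above (a sketch; Tensor4d is the 4-D alias used later in this post):

Tensor<NV, AK_FLOAT> d_tensor;         // GPU tensor, fp32, default NCHW layout
Tensor<X86, AK_FLOAT, NCHW> h_tensor;  // host tensor with the layout spelled out
Tensor4d<X86, AK_FLOAT> h_tensor4d;    // 4-D alias: Tensor4d = Tensor<Ttype, Dtype>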

Graph

The Graph class is responsible for loading an Anakin model to generate the compute graph, optimizing the graph, saving the model, and so on.

template<typename TargetType, DataType Dtype, Precision Ptype>
class Graph ... /* inherit other class*/{

  //some implements
  ...

};

load

//some declarations
...
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
std::string model_path = "the/path/to/where/your/models/are";
const char *model_path1 = "the/path/to/where/your/models/are";

//Loading Anakin model to generate a compute graph.
auto status = graph->load(model_path);

//Or this way.
auto status = graph->load(model_path1);
//Check whether the load operation succeeded.
if(!status){
  std::cout << "error" << std::endl;
  //do something...
}

optimize

//some declarations
...
//Load graph.
...
//According to the ops of loaded graph, optimize compute graph.
graph->Optimize();

save

//some declarations
...
//Load graph.
...
// save a model
//save_model_path: the path to where your model is.
auto status = graph->save(save_model_path);

//Checking
if(!status){
  std::cout << "error" << std::endl;
  //do something...
}

Net

Net is the executor of the compute graph; input and output tensors are obtained through a Net object.

template<typename TargetType, DataType Dtype, Precision PType, OpRunType RunType = OpRunType::ASYNC>
class Net{
  //some implements
  ...

};
  • Precision specifies the precision of the Ops.
  • OpRunType is the synchronous/asynchronous execution type, and asynchronous is the default. OpRunType::SYNC means synchronous execution with a single stream on the GPU; OpRunType::ASYNC means asynchronous execution with multiple streams on the GPU.

Precision

Precision        | Op support
Precision::INT4  | NO
Precision::INT8  | NO
Precision::FP16  | NO
Precision::FP32  | YES
Precision::FP64  | NO

Currently the Ops only support FP32 precision; the remaining precisions will be supported in the future.

OpRunType

OpRunType        | Sync/Async   | Description
OpRunType::SYNC  | Synchronous  | single stream on GPU
OpRunType::ASYNC | Asynchronous | multiple streams on GPU
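
To run the graph synchronously instead of with the default asynchronous mode, the run type can be passed explicitly (a sketch, assuming a graph created as in the following sections):

//Executor that runs the graph on a single GPU stream (synchronous mode).
Net<NV, AK_FLOAT, Precision::FP32, OpRunType::SYNC> sync_executor(*graph);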

create an executor

//some declarations
...
//Create a pointer to a graph.
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
//do something...
...

//create an executor
Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);

get input tensor

//some declarations
...

//create an executor
//TargetType is NV [NVIDIA GPU]
Net<NV, AK_FLOAT, Precision::FP32> executor(*graph);

//Get the first input tensor.
//The following tensors (tensor_in0, tensor_in1, ...) are resident on the GPU.
//Note: the member function get_in returns a pointer to the tensor.
Tensor<NV, AK_FLOAT>* tensor_in0 = executor.get_in("input_0");

//If you have multiple input tensors,
//just repeat the call:
Tensor<NV, AK_FLOAT>* tensor_in1 = executor.get_in("input_1");
...
auto tensor_inn = executor.get_in("input_n");

fill input tensor

//This tensor is resident on the GPU.
auto tensor_d_in = executor.get_in("input_0");

//To feed the device tensor above, first fill a tensor resident on the host,
//then copy the host tensor to the device tensor.

//using Tensor4d = Tensor<Ttype, Dtype>;
Tensor4d<X86, AK_FLOAT> tensor_h_in; //host tensor
//Tensor<X86, AK_FLOAT> tensor_h_in;

//Allocate memory for host tensor.
tensor_h_in.re_alloc(tensor_d_in->valid_shape());
//Get a writable pointer to tensor.
float *h_data = tensor_h_in.mutable_data();

//Feed your tensor.
/** example
for(int i = 0; i < tensor_h_in.size(); i++){
  h_data[i] = 1.0f;
}
*/
//Copy host tensor's data to device tensor.
tensor_d_in->copy_from(tensor_h_in);

// Then the net is ready to run; see "execute graph" below.

get output tensor

//Note: this tensor is resident on the GPU.
Tensor<NV, AK_FLOAT>* tensor_out_d = executor.get_out("pred_out");

execute graph

executor.prediction();
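
The output tensor lives in GPU memory, so copy it into a host tensor before reading it (a sketch, following the same pattern the demo program uses later in this post):

//Copy the device output tensor to a host tensor and read the values.
Tensor4d<X86, AK_FLOAT> h_tensor_out;
h_tensor_out.re_alloc(tensor_out_d->valid_shape());
h_tensor_out.copy_from(*tensor_out_d);
auto out_data = h_tensor_out.data();
//out_data[0 .. h_tensor_out.valid_size()-1] now holds the result.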

code example

std::string model_path = "your_Anakin_models/xxxxx.anakin.bin";
// Create an empty graph object.
auto graph = new Graph<NV, AK_FLOAT, Precision::FP32>();
// Load Anakin model.
auto status = graph->load(model_path);
if(!status ) {
    LOG(FATAL) << " [ERROR] " << status.info();
}
// Reshape
graph->Reshape("input_0", {10, 384, 960, 10});
// You must optimize the graph the first time you load it.
graph->Optimize();
// Create an executor.
Net<NV, AK_FLOAT, Precision::FP32> net_executer(*graph);

//Get your input tensors through specific names such as "input_0", "input_1",
//and so on, and then feed the input tensors.
//If you don't know which inputs the names ("input_0", "input_1") correspond to,
//you can launch the dashboard to find out.
auto d_tensor_in_p = net_executer.get_in("input_0");
Tensor4d<X86, AK_FLOAT> h_tensor_in;
auto valid_shape_in = d_tensor_in_p->valid_shape();
for (int i=0; i<valid_shape_in.size(); i++) {
    LOG(INFO) << "detect input dims[" << i << "]" << valid_shape_in[i]; //see tensor's dimentions
}
h_tensor_in.re_alloc(valid_shape_in);
float* h_data = h_tensor_in.mutable_data();
for (int i=0; i<h_tensor_in.size(); i++) {
    h_data[i] = 1.0f;
}
d_tensor_in_p->copy_from(h_tensor_in);

//Do inference.
net_executer.prediction();

//Get the result tensor through the name of an output node.
//Check the dashboard again to find out how many output nodes there are and note their names.

//For example, if you have an output node named obj_pred_out,
//you can get its output tensor like this.
auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); //get_out returns a pointer to output tensor.
auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out"); //get_out returns a pointer to output tensor.
//......
// do something else ...
//...
//save model.
//You don't need to optimize the graph again when you load the saved model.
std::string save_model_path = model_path + std::string(".saved");
status = graph->save(save_model_path);
if (!status ) {
    LOG(FATAL) << " [ERROR] " << status.info();
}

anakin converter

cd anakin/tools/external_converter_v2
sudo pip install flask prettytable

vim config.yaml 
# ...
python converter.py

config.yaml

OPTIONS:
    Framework: CAFFE
    SavePath: ./output
    ResultName: mylenet
    Config:
        LaunchBoard: ON
        Server:
            ip: 0.0.0.0
            port: 8888
        OptimizedGraph: 
            enable: OFF
            path: ./anakin_optimized/lenet.anakin.bin.saved
    LOGGER:
        LogToPath: ./log/
        WithColor: ON 

TARGET:
    CAFFE:
        # path to proto files
        ProtoPaths:
            - /home/kezunlin/program/caffe/src/caffe/proto/caffe.proto
        PrototxtPath: /home/kezunlin/program/caffe/examples/mnist/lenet.prototxt
        ModelPath: /home/kezunlin/program/caffe/examples/mnist/lenet_iter_10000.caffemodel

    FLUID:
        # path of fluid inference model
        Debug: NULL                            # Generally no need to modify.
        ModelPath: /path/to/your/model/        # The upper path of a fluid inference model.
        NetType:                               # Generally no need to modify.

    LEGO:
        # path to proto files
        ProtoPath:
        PrototxtPath:
        ModelPath:

    TENSORFLOW:
        ProtoPaths: /
        PrototxtPath: /
        ModelPath: /
        OutPuts:

    ONNX:
        ProtoPath:
        PrototxtPath:
        ModelPath:

  • input: caffe.proto + lenet.prototxt + lenet_iter_10000.caffemodel
  • output: output/mylenet.anakin.bin + log/xxx.log

anakin test

model_test.cpp

cat Anakin/test/framework/net/model_test.cpp

cd gpu_output
./unit_test/model_test '/home/kezunlin/program/anakin/demo/model/' 

example_nv_cnn_net.cpp

cat Anakin/examples/cuda/example_nv_cnn_net.cpp

my example

my workspace

ls demo/
anakin_lib  build  cmake  CMakeLists.txt  image  model  src


tree demo/src/ demo/model/ demo/cmake demo/image
demo/src/
└── demo.cpp
demo/model/
└── mylenet.anakin.bin
demo/cmake
├── anakin-config.cmake
├── msg_color.cmake
├── statistic.cmake
└── utils.cmake
demo/image
├── big.jpg
└── cat.jpg

0 directories, 8 files

anakin_lib

Use ./tools/gpu_build.sh to generate gpu_build_sm61, then rename it to anakin_lib:

./tools/gpu_build.sh
# ...

mv gpu_build_sm61 anakin_lib

ls anakin_lib/
anakin_config.h  libanakin_saber_common.so        libanakin.so        log    unit_test
framework        libanakin_saber_common.so.0.1.2  libanakin.so.0.1.2  saber  utils

anakin-config.cmake

set(ANAKIN_FOUND TRUE) # auto 
set(ANAKIN_VERSION 0.1.2)
set(ANAKIN_ROOT_DIR "/home/kezunlin/program/anakin/demo/anakin_lib")

set(ANAKIN_ROOT ${ANAKIN_ROOT_DIR})
set(ANAKIN_FRAMEWORK ${ANAKIN_ROOT}/framework)
set(ANAKIN_SABER ${ANAKIN_ROOT}/saber)
set(ANAKIN_UTILS ${ANAKIN_ROOT}/utils)


set(ANAKIN_FRAMEWORK_CORE ${ANAKIN_FRAMEWORK}/core)
set(ANAKIN_FRAMEWORK_GRAPH ${ANAKIN_FRAMEWORK}/graph)
set(ANAKIN_FRAMEWORK_LITE ${ANAKIN_FRAMEWORK}/lite)
set(ANAKIN_FRAMEWORK_MODEL_PARSER ${ANAKIN_FRAMEWORK}/model_parser)
set(ANAKIN_FRAMEWORK_OPERATORS ${ANAKIN_FRAMEWORK}/operators)

set(ANAKIN_SABER_CORE ${ANAKIN_SABER}/core)
set(ANAKIN_SABER_FUNCS ${ANAKIN_SABER}/funcs)
set(ANAKIN_SABER_LITE ${ANAKIN_SABER}/lite)

set(ANAKIN_UTILS_LOGGER ${ANAKIN_UTILS}/logger)
set(ANAKIN_UTILS_UINT_TEST ${ANAKIN_UTILS}/unit_test)

#find_path(ANAKIN_INCLUDE_DIR NAMES anakin_config.h PATHS "${ANAKIN_ROOT_DIR}") 
mark_as_advanced(ANAKIN_INCLUDE_DIR) # show entry in cmake-gui

find_library(ANAKIN_SABER_COMMON_LIBRARY NAMES anakin_saber_common PATHS "${ANAKIN_ROOT_DIR}") 
mark_as_advanced(ANAKIN_SABER_COMMON_LIBRARY) # show entry in cmake-gui

find_library(ANAKIN_LIBRARY NAMES anakin PATHS "${ANAKIN_ROOT_DIR}") 
mark_as_advanced(ANAKIN_LIBRARY) # show entry in cmake-gui

# use xxx_INCLUDE_DIRS and xxx_LIBRARIES in CMakeLists.txt
set(ANAKIN_INCLUDE_DIRS 
    ${ANAKIN_ROOT} 
    ${ANAKIN_FRAMEWORK} 
    ${ANAKIN_SABER} 
    ${ANAKIN_UTILS} 

    ${ANAKIN_FRAMEWORK_CORE} 
    ${ANAKIN_FRAMEWORK_GRAPH} 
    ${ANAKIN_FRAMEWORK_LITE} 
    ${ANAKIN_FRAMEWORK_MODEL_PARSER} 
    ${ANAKIN_FRAMEWORK_OPERATORS} 

    ${ANAKIN_SABER_CORE} 
    ${ANAKIN_SABER_FUNCS} 
    ${ANAKIN_SABER_LITE} 

    ${ANAKIN_UTILS_LOGGER} 
    ${ANAKIN_UTILS_UINT_TEST} 
)

set(ANAKIN_LIBRARIES ${ANAKIN_SABER_COMMON_LIBRARY} ${ANAKIN_LIBRARY} )

message( "anakin-config.cmake " ${ANAKIN_ROOT_DIR})

CMakeLists.txt

cmake_minimum_required(VERSION 2.8.8)

project(demo)

include(cmake/msg_color.cmake)
include(cmake/utils.cmake)
include(cmake/statistic.cmake)

#add_definitions( -Dshared_DEBUG) # define macro

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

set(ROOT_CMAKE_DIR ./cmake)
set(CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH} "${ROOT_CMAKE_DIR};${CMAKE_PREFIX_PATH}")
MESSAGE( [cmake] " CMAKE_PREFIX_PATH = ${CMAKE_PREFIX_PATH} for find_package")

# Find includes in corresponding build directories
set(CMAKE_INCLUDE_CURRENT_DIR ON)

find_package(OpenCV REQUIRED COMPONENTS core highgui imgproc features2d calib3d) 
include_directories(${OpenCV_INCLUDE_DIRS})

# find anakin-config.cmake file
#include(cmake/anakin-config.cmake)
find_package(ANAKIN REQUIRED)
include_directories(${ANAKIN_INCLUDE_DIRS})

#message( [opencv] ${OpenCV_INCLUDE_DIRS} )
#message( [opencv] ${OpenCV_LIBS} )
#message( [anakin] ${ANAKIN_INCLUDE_DIRS} )
#message( [anakin] ${ANAKIN_LIBRARIES} )

add_executable(${PROJECT_NAME} 
    src/demo.cpp
)

# dl pthread 
# error with  -std=c++11 -lpthread -ldl 

target_link_libraries(${PROJECT_NAME} 
    dl 
    pthread
    ${OpenCV_LIBS} 
    ${ANAKIN_LIBRARIES}
)

src/demo.cpp

edited from Anakin/examples/cuda/example_nv_cnn_net.cpp

#include <iostream>
using namespace std;

// opencv
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
using namespace cv;

// anakin
#include "utils/logger/logger.h"
#include "framework/graph/graph.h"
#include "framework/core/net/net.h"

/*util to fill tensor*/
#include "saber/core/tensor_op.h"
using namespace anakin;
using namespace anakin::graph;
using namespace anakin::saber;

/*
+------------+-----------------+-------+-----------+
| Input Name |      Shape      | Alias | Data Type |
+------------+-----------------+-------+-----------+
|  input_0   | [64, 1, 28, 28] |  NULL |    NULL   |
+------------+-----------------+-------+-----------+
+-------------+
| Output Name |
+-------------+
|   prob_out  |
+-------------+
*/

int fill_tensor(Tensor4d<X86, AK_FLOAT>& h_tensor_in, const cv::Mat& image)
{
    // write data to tensor
    int height = image.rows;
    int width = image.cols;

    LOG(INFO)<<"height*width ="<< height*width <<std::endl;  // 784
    LOG(INFO)<<"h_tensor_in.size() ="<<h_tensor_in.size()<<std::endl; // 784

    float* tensor_ptr = h_tensor_in.mutable_data(); // int, float or double.

    const float* ptr;
    for (int h = 0; h < height; ++h)
    {
        ptr = image.ptr<float>(h); // row ptr
        for (int w = 0; w < width; ++w)
        {
            *tensor_ptr++ = *ptr++;
        }
    }

    return 1;
}

int main(int argc, const char** argv) {

    const char *model_path = "../model/mylenet.anakin.bin";

    Mat image = imread("../image/cat.jpg",0);
    cv::resize(image,image,Size(28,28));
    //imshow("image",image);
    //waitKey(0);

    /*init graph object, graph is the skeleton of model*/
    Graph<NV, AK_FLOAT, Precision::FP32> graph;

    /*load model from file to init the graph*/
    auto status = graph.load(model_path);
    if (!status) {
        LOG(FATAL) << " [ERROR] " << status.info();
    }

    /*set net input shape and use this shape to optimize the graph(fusion and init operator),shape is n,c,h,w*/
    graph.Reshape("input_0", {1, 1, 28, 28});
    graph.Optimize();

    /*net_executer is the executor object of model. use graph to init Net*/
    Net<NV, AK_FLOAT, Precision::FP32> net_executer(graph, true);

    /*use the input name to get the input tensor of the net. since we use NV as the target, the tensors of net_executer are in GPU memory*/
    auto d_tensor_in_p = net_executer.get_in("input_0");
    auto valid_shape_in = d_tensor_in_p->valid_shape();

    /*create tensor located in host*/
    Tensor4d<X86, AK_FLOAT> h_tensor_in;

    /*alloc for host tensor*/
    h_tensor_in.re_alloc(valid_shape_in);

    /*init host tensor by random*/
    //fill_tensor_host_rand(h_tensor_in, -1.0f, 1.0f);

    image.convertTo(image, CV_32FC1); // faster
    fill_tensor(h_tensor_in,image);

    /*use the host tensor to init the device tensor, which is the net input*/
    d_tensor_in_p->copy_from(h_tensor_in);

    /*run infer*/
    net_executer.prediction();

    LOG(INFO)<<"infer finish";

    /*get the output of the net, which is a device tensor*/
    auto d_out=net_executer.get_out("prob_out");

    /*create another host tensor, and copy the content of device tensor to host*/
    Tensor4d<X86, AK_FLOAT> h_tensor_out;
    h_tensor_out.re_alloc(d_out->valid_shape());
    h_tensor_out.copy_from(*d_out);

    /*show output content*/
    for(int i=0;i<h_tensor_out.valid_size();i++){
        LOG(INFO)<<"out ["<<i<<"] = "<<h_tensor_out.data()[i];
    }
}

compile demo

mkdir build
cd build 
cmake ..
make 
./demo 

output

ERR| 16:45:56.00581| 110838.067s|         37CBF8C0| operator_attr.h:94]  you have set the argument: is_reverse , so it's igrored by anakin
 ERR| 16:45:56.00581| 110838.067s|         37CBF8C0| operator_attr.h:94]  you have set the argument: is_reverse , so it's igrored by anakin
   0| 16:45:56.00681| 0.098s|         37CBF8C0| parser.cpp:96] graph name: LeNet
   0| 16:45:56.00681| 0.099s|         37CBF8C0| parser.cpp:101] graph in: input_0
   0| 16:45:56.00681| 0.099s|         37CBF8C0| parser.cpp:107] graph out: prob_out
   0| 16:45:56.00742| 0.159s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnormScaleReluPool
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnormScaleRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvReluPool
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnormScale
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : DeconvRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : PermutePower
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : ConvBatchnorm
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : EltwiseRelu
   0| 16:45:56.00742| 0.160s|         37CBF8C0| graph.cpp:153]  processing in-ordered fusion : EltwiseActivation
 WAN| 16:45:56.00743| 0.160s|         37CBF8C0| net.cpp:663] Detect and initial 1 lanes.
   0| 16:45:56.00743| 0.161s|         37CBF8C0| env.h:44] found 1 device(s)
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:45] Device id: 0 , name: GeForce GTX 1060
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:47] Multiprocessors: 10
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:50] frequency:1733MHz
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:52] CUDA Capability : 6.1
   0| 16:45:56.00743| 0.161s|         37CBF8C0| cuda_device.cpp:54] total global memory: 6078MBytes.
 WAN| 16:45:56.00743| 0.161s|         37CBF8C0| net.cpp:667] Current used device id : 0
 WAN| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:16] Parsing Input op parameter.
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [0]: 1
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [1]: 1
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [2]: 28
   0| 16:45:56.00744| 0.161s|         37CBF8C0| input.cpp:19]  |-- shape [3]: 28
 ERR| 16:45:56.00744| 0.161s|         37CBF8C0| net.cpp:210] node_ptr->get_op_name()  sass not support yet.
 ERR| 16:45:56.00744| 0.161s|         37CBF8C0| net.cpp:210] node_ptr->get_op_name()  sass not support yet.
 WAN| 16:45:57.00269| 0.686s|         37CBF8C0| context.h:40] device index exceeds the number of devices, set to default device(0)!
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:300] Temp mem used:        0 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:301] Original mem used:    0 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:302] Model mem used:       1 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| net.cpp:303] System mem used:      153 MB
   0| 16:45:57.00270| 0.687s|         37CBF8C0| demo.cpp:40] height*width =784
   0| 16:45:57.00270| 0.687s|         37CBF8C0| demo.cpp:41] h_tensor_in.size() =784
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:105] infer finish
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [0] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [1] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [2] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [3] = 1
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [4] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [5] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [6] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [7] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [8] = 0
   0| 16:45:57.00270| 0.688s|         37CBF8C0| demo.cpp:117] out [9] = 0

For Windows (skip)

version

  • windows 10
  • vs 2015
  • cmake 3.2.2
  • cuda 8.0 + cudnn 6.0.21 (same as caffe) sm_61
  • protobuf 3.4.0

protobuf

see compile protobuf-cpp on windows 10

compile

#git clone https://github.com/PaddlePaddle/Anakin.git anakin
git clone https://github.com/kezunlin/Anakin.git anakin
cd anakin 
mkdir build && cd build && cmake-gui ..

with options

CUDNN_ROOT "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/"
PROTOBUF_ROOT "C:/Program Files/protobuf" 

BUILD_SHARED ON
USE_GPU_PLACE ON
USE_OPENMP OFF
USE_OPENCV ON

Generate Anakin.sln and compile it with VS 2015 in x64 Release mode.

error fixes

We get 101 errors that are hard to fix, so skip this for now.

History

  • 20180903: created.
