
Guide

mlpack: a scalable C++ machine learning library

dependencies

  • Armadillo >= 6.500.0
  • Boost
  • CMake >= 3.3.2

Armadillo: C++ linear algebra library based on LAPACK and BLAS
If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

see OpenCV vs. Armadillo vs. Eigen on Linux

sudo apt-get install libarmadillo-dev

install

apt-get

sudo apt-get install libmlpack-dev

version: 2.0.1
By default mlpack installs to /usr/include/mlpack and /usr/lib.

compile

wget https://www.mlpack.org/files/mlpack-3.1.1.tar.gz
git clone https://github.com/mlpack/mlpack.git
mkdir build && cd build && cmake-gui ..
make -j8
sudo make install

configure and output

...
Found Armadillo: /usr/lib/libarmadillo.so (found suitable version "6.500.5", minimum required is "6.500.0") 
Armadillo libraries: /usr/lib/libarmadillo.so
...

version: 3.1.1
By default mlpack installs to /usr/local/include and /usr/local/lib/libmlpack.so.3.1.

usage

mlpack-config.cmake

#.rst:
# FindMLPACK
# -------------
#
# Find MLPACK
#
# Find the MLPACK C++ library
#
# Using MLPACK::
#
# find_package(MLPACK REQUIRED)
# include_directories(${MLPACK_INCLUDE_DIRS})
# add_executable(foo foo.cc)
# target_link_libraries(foo ${MLPACK_LIBRARIES})
#
# This module sets the following variables::
#
# MLPACK_FOUND - set to true if the library is found
# MLPACK_INCLUDE_DIRS - list of required include directories
# MLPACK_LIBRARIES - list of libraries to be linked
# MLPACK_VERSION_MAJOR - major version number
# MLPACK_VERSION_MINOR - minor version number
# MLPACK_VERSION_PATCH - patch version number
# MLPACK_VERSION_STRING - version number as a string (ex: "1.0.4")


# UNIX paths are standard, no need to specify them.
find_library(MLPACK_LIBRARY
NAMES mlpack
PATHS "$ENV{ProgramFiles}/mlpack/lib" "$ENV{ProgramFiles}/mlpack/lib64" "$ENV{ProgramFiles}/mlpack"
)
find_path(MLPACK_INCLUDE_DIR
NAMES mlpack/core.hpp mlpack/prereqs.hpp
PATHS "$ENV{ProgramFiles}/mlpack"
)


if(MLPACK_INCLUDE_DIR)
# Read and parse mlpack version header file for version number
file(STRINGS "${MLPACK_INCLUDE_DIR}/mlpack/core/util/version.hpp" _mlpack_HEADER_CONTENTS REGEX "#define MLPACK_VERSION_[A-Z]+ ")
string(REGEX REPLACE ".*#define MLPACK_VERSION_MAJOR ([0-9]+).*" "\\1" MLPACK_VERSION_MAJOR "${_mlpack_HEADER_CONTENTS}")
string(REGEX REPLACE ".*#define MLPACK_VERSION_MINOR ([0-9]+).*" "\\1" MLPACK_VERSION_MINOR "${_mlpack_HEADER_CONTENTS}")
string(REGEX REPLACE ".*#define MLPACK_VERSION_PATCH ([0-9]+).*" "\\1" MLPACK_VERSION_PATCH "${_mlpack_HEADER_CONTENTS}")

unset(_mlpack_HEADER_CONTENTS)

set(MLPACK_VERSION_STRING "${MLPACK_VERSION_MAJOR}.${MLPACK_VERSION_MINOR}.${MLPACK_VERSION_PATCH}")
endif()

find_package_handle_standard_args(MLPACK
REQUIRED_VARS MLPACK_LIBRARY MLPACK_INCLUDE_DIR
VERSION_VAR MLPACK_VERSION_STRING
)

if(MLPACK_FOUND)
set(MLPACK_INCLUDE_DIRS ${MLPACK_INCLUDE_DIR})
set(MLPACK_LIBRARIES ${MLPACK_LIBRARY})
endif()

# Hide internal variables
mark_as_advanced(
MLPACK_INCLUDE_DIR
MLPACK_LIBRARY
)

From here

CMakeLists.txt

find_package(MLPACK REQUIRED)
MESSAGE( [Main] " MLPACK_INCLUDE_DIRS = ${MLPACK_INCLUDE_DIRS}")
MESSAGE( [Main] " MLPACK_LIBRARIES = ${MLPACK_LIBRARIES}")
# /usr/local/include
# /usr/local/lib/libmlpack.so

mlpack clustering

see mlpack clustering

kmeans

skip for now.

meanshift

dbscan

sklearn clustering

from sklearn.cluster import MeanShift
from sklearn.cluster import DBSCAN
from sklearn.cluster import KMeans

see sklearn clustering

opencv clustering

  • cv::kmeans()

see opencv clustering

Reference

History

  • 20190520: created.

Linear Algebra

determinant

basic

determinant formula

determinant properties

online calculator

inverse/adjoint(adjugate) matrix

Only non-singular matrices have inverses. (det(A) != 0)

  • minor matrix
  • cofactor matrix
  • adjoint/adjugate matrix
  • inverse matrix
  • conjugate matrix

eigenvalue/eigenvector

A(n×n)
graph demo
eigenvector

eigenvector steps
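
A minimal numpy sketch of eigenvalues/eigenvectors for a square matrix (toy data):

import numpy as np

A = np.array([[2, 0], [0, 3]], dtype=float)  # n x n square matrix
vals, vecs = np.linalg.eig(A)
print(vals)        # [2. 3.]
print(vecs[:, 0])  # eigenvector for eigenvalue 2; A @ v = 2 * v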

svd

singular value decomposition
A(m×n), m != n
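
A minimal numpy sketch of SVD for a non-square matrix (toy data):

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # 2x3 matrix, m != n
U, s, Vt = np.linalg.svd(A)
print(U.shape, s, Vt.shape)  # (2, 2), the singular values, (3, 3)
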
svd

Probability Theory

random variable: discrete/continuous

  • probability mass function (pmf): e.g. Poisson, binomial distributions, for a discrete random variable
  • probability density function (pdf): e.g. normal, uniform distributions, for a continuous random variable
  • cumulative distribution function (cdf): for both discrete and continuous random variables

see pmf-cdf-pdf
distribution-function-terminology-pdf-cdf-pmf-etc

binomial: n independent Bernoulli trials, P(X=k) = C(n,k) * p^k * (1-p)^(n-k)
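
A quick numeric check of this pmf, as a minimal sketch using only the Python standard library:

from math import comb

n, p, k = 10, 0.5, 3
pmf = comb(n, k) * p**k * (1 - p)**(n - k)  # C(10,3) * 0.5^3 * 0.5^7
print(pmf)  # 0.1171875, i.e. P(X=3) in 10 fair coin flips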

  • marginal probability
  • joint probability
  • conditional probability
  • bayes theorem

see here

Marginal probability: the probability of an event occurring (p(A)), it may be thought of as an unconditional probability. It is not conditioned on another event. Example: the probability that a card drawn is red (p(red) = 0.5). Another example: the probability that a card drawn is a 4 (p(four)=1/13).

Joint probability: p(A and B). The probability of event A and event B occurring. It is the probability of the intersection of two or more events. The probability of the intersection of A and B may be written p(A ∩ B). Example: the probability that a card is a four and red =p(four and red) = 2/52=1/26. (There are two red fours in a deck of 52, the 4 of hearts and the 4 of diamonds).

Conditional probability: p(A|B) is the probability of event A occurring, given that event B occurs. Example: given that you drew a red card, what’s the probability that it’s a four (p(four|red))=2/26=1/13. So out of the 26 red cards (given a red card), there are two fours so 2/26=1/13.

bayes theorem: p(cancer)=0.01, p(positive test|cancer)=0.9, p(positive test|no cancer)=0.08
p(cancer|positive test)?
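
Working this out with the law of total probability, p(positive) = p(positive|cancer)p(cancer) + p(positive|no cancer)p(no cancer); a quick check in Python:

# Bayes theorem: p(cancer|positive) = p(positive|cancer) * p(cancer) / p(positive)
p_cancer = 0.01
p_pos_given_cancer = 0.9
p_pos_given_no_cancer = 0.08
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)
print(p_pos_given_cancer * p_cancer / p_pos)  # ~0.102: only about a 10% chance of cancer given a positive test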

basic-prob

Statistics

2 types of statistics

  • descriptive statistics
  • inferential statistics

descriptive statistics

basic

  • n, sum, min, max, range = max - min
  • mean, median, mode
  • variance, standard deviation
  • skewness, kurtosis

    from mean, median, mode

mean/median/mode

  • mean: regular meaning of “average”
  • median: middle value
  • mode: most often

2 types of data set: here

  • population: μ, σ², σ —> parameters
  • sample: x̄, s², s —> statistics

The population is fixed: its data never changes, and μ denotes the true population mean.
Population data is usually hard to obtain, so we use samples to infer the population; each sample differs, so x̄ denotes the mean of one particular sample.
Sample means x̄ vary from sample to sample, but the population mean μ never changes.

population

sample

see example

skewness vs kurtosis

  • skewness: the degree of asymmetry
  • kurtosis: the degree of peakedness/flatness


formula see skewness kurtosis formula

inferential statistics

inferential statistics
Each hypothesis test involves a null hypothesis (H0) and an alternative hypothesis (Ha).

  • H0: u1=u2=u3=…=un. It indicates that the group means of the various groups are NOT significantly different from each other based on statistical significance levels.
  • Ha: at least two group means are statistically significantly different from each other.

significance tests

  • H0: there is NO real difference
  • Ha: there is a difference

    Reject H0 at the 5% significance level if p-value < 5%: statistically significant
    Reject H0 at the 1% significance level if p-value < 1%: highly significant
    one-tailed tests vs two-tailed tests

one-way ANOVA test:

  • if p-value <= 5%, the result is statistically significant and we reject the null hypothesis in favor of the alternative hypothesis (Ha was correct)
  • otherwise, the result is not statistically significant and we fail to reject the null hypothesis (H0 stands)

demo
anova test

F-stat>4.737 or p-value<0.05, then reject H0

boxplot for anova test

parametric tests vs nonparametric tests

Data Mining

  • KDD: knowledge discovery in databases
  • CRISP-DM: cross-industry standard process for data mining

CRISP-DM_Process_Diagram.png

Machine Learning methods

with/without labels

  • supervised learning:
    • classification
    • regression
  • unsupervised learning
    • clustering
    • dimensionality reduction
    • anomaly detection
    • association rule mining / market basket analysis
  • semi-supervised learning
  • reinforcement learning

online/offline

  • batch learning/offline learning
  • online learning

instance/model

  • instance based learning
  • model based learning

EDA

statistics

  • descriptive statistics
  • inferential statistics

analysis

3 types

  • univariate analysis: n=1
  • bivariate analysis: n=2
  • multivariate analysis: n>=3

3 types of analysis

  • use histogram to visualize data
  • correlation matrix/heatmap

Model Evaluation

Classification

confusion matrix

  • accuracy
  • precision
  • recall
  • F1-score: harmonic mean of precision and recall

value range (0-1), the bigger, the better.
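
All four metrics are available in sklearn; a minimal sketch on toy labels (hypothetical data):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
print(confusion_matrix(y_true, y_pred))  # [[2 0], [1 3]]
print(accuracy_score(y_true, y_pred))    # 5/6
print(precision_score(y_true, y_pred))   # 3/3 = 1.0
print(recall_score(y_true, y_pred))      # 3/4 = 0.75
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall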

confusion matrix

precision vs recall curve
precision vs recall

another curve

  • roc: receiver operating characteristic; the TPR vs FPR curve
  • auc: area under the curve. value range (0-1), the bigger, the better.

roc basic
roc
auc
roc demo

all in one
roc, precision recall, f1-score

multi-class classification for ROC

  • micro-averaging: treat as binary
  • macro-averaging: equal weight
    roc for multi-class classification

Clustering

types

  • partition/centroid based clustering: k-means, k-medoids
  • hierarchical clustering: AgglomerativeClustering, affinity propagation
    • ward/single linkage
    • average linkage
    • complete linkage
  • distribution based clustering: Gaussian mixture models
  • density based clustering: DBSCAN, OPTICS

clustering

partition based clustering
hierarchical clustering dendrogram
linkages

external validation

with labels

  • homogeneity
  • completeness
  • v-measure: harmonic mean of homogeneity and completeness
    value range (0-1), the bigger, the better.

homogeneity completeness
v-measure

internal validation

no labels
2 most important traits:

  • compact groups
  • well separated groups

metric

  • silhouette coefficient (SC): value range (-1,1), the bigger, the better.
  • Calinski-Harabasz index (CHI): value range >0, the bigger, the better.
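
Both metrics are available in sklearn; a minimal sketch on synthetic blobs (note: recent sklearn spells it calinski_harabasz_score, older releases used calinski_harabaz_score):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(silhouette_score(X, labels))         # close to 1 for compact, well separated clusters
print(calinski_harabasz_score(X, labels))  # larger is better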

sc
sc vs number of clusters
sc and chi

Regression

metric:

  • mean squared error: MSE
  • root mean squared error: RMSE
  • coefficient of determination (R^2)
  • coefficient of correlation (r): value range (-1,1)

R2: value range (0,1), the bigger, the better.

for simple linear regression, R^2 = r^2
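
A quick sanity check of this identity on synthetic data (a sketch; the equality holds for the OLS fit):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
x = rng.rand(50, 1)
y = 3 * x[:, 0] + 0.5 + 0.1 * rng.randn(50)
y_pred = LinearRegression().fit(x, y).predict(x)
r = np.corrcoef(x[:, 0], y)[0, 1]
print(r2_score(y, y_pred), r ** 2)  # the two values match for simple linear regression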

formula:
r2 formula

correlation coefficient
r demo

r2 demo
r2
r2 demo

images from bing search.

regression analysis

types

  • simple linear regression
  • multiple linear regression
  • nonlinear regression

assumptions

  • the training dataset (sample) is representative of the population being modeled
  • x1, x2, …, xn are linearly independent: no multicollinearity
  • homoscedasticity of errors: residuals are random and show no patterns

multicollinearity: check the correlation matrix
variance inflation factor (VIF): VIFi = 1/(1-Ri^2). The larger the VIF, the more severe the collinearity. A common rule of thumb: 0 < VIF < 10 means no multicollinearity; 10 <= VIF < 100 means strong multicollinearity; VIF >= 100 means severe multicollinearity.
homoscedastic vs heteroscedastic: check the residual plot
homogeneous vs heterogeneous
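
VIF can be computed with statsmodels; a minimal sketch on hypothetical data where x3 is nearly collinear with x1:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.RandomState(0)
x1 = rng.rand(100)
df = pd.DataFrame({"x1": x1, "x2": rng.rand(100), "x3": 2 * x1 + 0.01 * rng.rand(100)})
X = add_constant(df)
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))  # huge VIF for x1 and x3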

correlation matrix/heatmap

VIF

homoscedasticity

evaluation analysis

  • residual analysis
  • normality tests (Q-Q plot)
  • R^2

QQ-plot

linear regression

y = kx + b, use OLS

decision tree based regression

linear vs non-linear regression:

  • linear regression
  • decision tree based regression (non-linear)

A decision tree can be used for both classification and regression (CART).

node splitting
for regression:

  • MSE: mean squared error
  • RMSE: root mean squared error
  • MAE: mean absolute error
  • MAPE: mean absolute percentage error

regression
mse and mae

for classification

  • information gain (entropy)
  • gini impurity/index (GINI)
  • misclassification error

ig and gini
bad vs good split

stopping criteria

  • max depth
  • min samples to split internal nodes
  • max leaf nodes

    use GridSearch to search for optimal hyperparameters

decision tree algorithms

  • CART
  • ID3
  • C4.5

ensemble learning

3 major families:

  • bagging: bootstrap aggregating, uses bootstrap sampling, e.g. RandomForest
  • boosting: e.g. Gradient Boosting Machine (GBM), AdaBoost
    • GBM variants: LightGBM, Extreme Gradient Boosting (XGBoost)
  • stacking

others

  • binning
  • blending
  • averaging
  • voting

see What is the difference between Bagging and Boosting?
see 集成学习-Boosting,Bagging与Stacking

bootstrap aggregating/bagging

boosting
boosting

model stacking
stacking

Model Tuning

decision trees

  • information gain: IG
  • gini impurity: GI

bias-variance tradeoff

The main causes of error in learning are noise, bias, and variance.

extreme cases of bias-variance

  • underfitting: high bias, low variance
  • overfitting: low bias, high variance

bias-variance tradeoff

bias-variance

bias-variance model complexity

see learnopencv

cross validation

train/validation/test

cross validation strategies:

  • leave-one-out CV: n-1 samples for training, 1 sample for validation
  • k-fold CV: split into k equal subsets; k-1 subsets for training, 1 subset for validation

    5-fold and 10-fold are common in practice
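
A minimal k-fold CV sketch with sklearn (iris is just a stand-in dataset):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)  # 5-fold CV
print(scores.mean(), scores.std())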

hyperparameter tuning strategies

  • grid search: manually specifying the grid, parallelizable
  • randomized search: automatic
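
A minimal sketch of both strategies in sklearn (the grid values here are arbitrary examples):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
gs = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)

# randomized search samples a fixed number of candidates instead of trying them all
rs = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_distributions=param_grid, n_iter=4, cv=5, random_state=0)
rs.fit(X, y)
print(rs.best_params_, rs.best_score_)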

Model Interpretation

tools

global vs local interpretation

  • global interpretation: based on the whole dataset (feature_importance, partial_dependence plots)
  • local interpretation: based on a single prediction

global interpretation
feature_importance

one-way partial_dependence plot

two-way partial_dependence plot

local interpretation

model decision surface/ hypersurface
model decision surface

Model Deployment

  • rest api
  • micro service
  • model deployment as a service; anything as a service (XaaS)

Real-world case studies

customer segmentation

clustering problem

factors

  • geographic
  • demographic
  • psychographic
  • behavioural

customer segmentation

RFM Model for customer value

  • recency
  • frequency
  • monetary value

RFM Model

association-rule mining

association rule mining / market basket analysis

basics

  • association rule: {item1,item2,item3 —> itemK}
  • itemset: {milk,bread} {beer,diaper}
  • frequent itemset: {milk,bread}

metrics

  • support = frq(X,Y)/N
  • confidence = support(X,Y)/support(X) = frq(X,Y)/frq(X)
  • lift = support(X,Y)/(support(X)*support(Y)) = N*frq(X,Y)/(frq(X)*frq(Y))

good rules: large confidence, large support, lift >1

lift(X->Y) = 0 means X and Y never occur together
lift(X->Y) = 1 means X and Y are independent of each other.
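
The three metrics are simple frequency ratios; a minimal sketch over toy transactions (hypothetical data):

transactions = [{"milk", "bread"}, {"milk", "bread", "beer"}, {"beer", "diaper"}, {"milk", "bread", "diaper"}, {"beer"}]
N = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / N  # fraction of transactions containing itemset

X, Y = {"milk"}, {"bread"}
print(support(X | Y))                              # 3/5
print(support(X | Y) / support(X))                 # confidence = 1.0
print(support(X | Y) / (support(X) * support(Y)))  # lift = 5/3 > 1: a good rule
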
support
demo

algorithms

  • Apriori algorithm: generates all 2^k itemsets; TOO EXPENSIVE
  • FP-growth: no need to generate all 2^k itemsets; uses the special FP-tree structure and a divide-and-conquer strategy

With k unique products there are 2^k possible itemsets.

recommender system

recommender systems/ recommendation engines

big data with pandas

how to process big data with pandas ?

import pandas as pd
for chunk in pd.read_csv(<filepath>, chunksize=<your_chunksize_here>):
    do_processing(chunk)
    train_algorithm(chunk)

read by chunk
see opening-a-20gb-file-for-analysis-with-pandas

other tools

other refs

types of recommendation engines

3 types

  • user-based recommendation engines
  • content-based recommendation engines
  • hybrid/collaborative filtering(协同过滤) recommendation engines

    based on similarity

different cases

  • popularity-based: most liked songs by all users
  • similarity-based: similar songs for given user
  • matrix factorization based: use SVD to get a low-rank approximation of the utility matrix

similarity

  • Jaccard Index/Jaccard similarity coefficient: value range (0-1)
  • cosine similarity

Jaccard Distance = 1 - Jaccard Index
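
A minimal sketch of both quantities with plain Python sets:

def jaccard_index(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(jaccard_index({1, 2, 3}, {2, 3, 4}))      # 0.5
print(1 - jaccard_index({1, 2, 3}, {2, 3, 4}))  # Jaccard distance = 0.5
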
Jaccard Index
demo

matrix factorization

Use matrix factorization to discover latent features between two different kinds of entities.

utility matrix

sparse matrix

matrix factorization

use SVD: matrix factorization, PCA
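
A minimal numpy sketch of a rank-k SVD approximation of a toy utility matrix (hypothetical ratings, 0 = unrated):

import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)  # rows = users, cols = items
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2  # keep the top-k singular values -> low-rank approximation
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_hat, 2))  # reconstructed scores, usable for recommendations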

implicit feedback: song play count —> likeness

recommendation engine libraries

  • scikit-surprise (Simple Python Recommendation System Engine)
  • lightfm
  • crab
  • rec_sys

time series forecasting

basics

predictive modeling

time series analysis/forecasting:

  • traditional approaches
    • Moving Average: MA
    • Exponential Smoothing: EWMA
    • Holt-Winters EWMA
    • Box-Jenkins methodologies: AR, MA, ARIMA, SARIMA
  • deep learning approaches: RNN, eg. LSTM
    • regression modeling (x1,x2,…,x6 —> x7): many-to-one
    • sequence modeling: sequence —> sequence

two domains

  • frequency domain: spectral and wavelet analysis
  • time domain: auto- and cross-correlation analysis

where to get data ?

  • Yahoo
  • quandl

tools to fetch data:

  • quandl: register for key first
  • pandas-datareader

time series components

3 major components:

  • seasonality
  • trend
  • residual

major components

smoothing techniques

  • Moving Average: MA
  • Exponential Smoothing: EWMA
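
Both are one-liners in pandas; a minimal sketch on a toy series:

import pandas as pd

s = pd.Series([3, 4, 6, 5, 8, 9, 7, 10])
print(s.rolling(window=3).mean())  # simple moving average over a 3-step window
print(s.ewm(span=3).mean())        # exponentially weighted moving average (EWMA)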

ARIMA

AR vs MA

  • auto regressive
  • moving average

    ARIMA: auto regressive integrated moving average

key concepts

  • Stationarity: one of the key assumptions behind ARIMA models. Stationarity refers to the property that a time series' mean, variance, and autocorrelation are time invariant. In other words, mean, variance, and autocorrelation do not change with time.
  • Differencing: widely used to stabilize the mean of a time series. We can then apply different tests to confirm whether the resulting series is stationary or not.
  • Unit Root Tests: statistical tests that help us understand whether a given series is stationary or not.
    • ad_fuller_test: the Augmented Dickey-Fuller test begins with a null hypothesis of the series being non-stationary,
    • kpss_test: while the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test has a null hypothesis that the series is stationary.
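
Both tests are in statsmodels; a minimal sketch on a toy random walk (non-stationary by construction), using differencing to make it stationary:

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.RandomState(0)
series = rng.randn(200).cumsum()                            # random walk: non-stationary
print("ADF p-value:", adfuller(series)[1])                  # large p: fail to reject H0 (non-stationary)
print("after differencing:", adfuller(np.diff(series))[1])  # small p: stationary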

ad_fuller_test
ad_fuller_test 1

not statistically significant, accept H0: non-stationary
validate 1

ad_fuller_test 2

statistically significant, reject H0 and accept Ha: stationary
validate 2

ARIMA(p,d,q) model
where,

  • p is the order of autoregression
  • d is the order of differencing
  • q is the order of the moving average

how to choose p and q?

  • ACF or Auto Correlation Function plot —> q = 1
  • PACF or the Partial Auto Correlation Function plot —> p = 1

ACF PACF

use grid search to choose p and q based on AIC

AIC, or the Akaike Information Criterion, measures goodness of fit and parsimony.
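
A minimal sketch of an AIC-based grid search over (p, d, q) with statsmodels (toy data; some orders may fail to converge, hence the try/except):

import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.RandomState(0)
y = rng.randn(200).cumsum()  # stand-in series

best_order, best_aic = None, np.inf
for order in itertools.product(range(3), range(2), range(3)):
    try:
        aic = ARIMA(y, order=order).fit().aic
    except Exception:
        continue
    if aic < best_aic:
        best_order, best_aic = order, aic
print(best_order, best_aic)  # pick the order with the lowest AIC
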
auto ARIMA

LSTM

The Efficient Market Hypothesis says that it is almost impossible to beat the market consistently; others disagree with it.

modeling

  • regression modeling
  • sequence modeling

regression modeling

(N,W,F) format as input

  • N: number of sequences
  • W: window, the length of each sequence
  • F: features per timestep

for regression
regression

for sequence
sequence

we need to pad the test sequences to match the input shape.

other time series tools

New Concepts

  • Linear Discriminant Analysis (LDA)
  • Quadratic Discriminant Analysis (QDA)

sklearn code

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

Reference

History

  • 20190516: created.

Guide

imagezmq

git clone https://github.com/jeffbass/imagezmq.git

imagezmq has been tested with:

  1. Python 3.5 and 3.6
  2. OpenCV 3.3
  3. Raspbian Stretch and Raspbian Jessie
  4. PyZMQ 16.0
  5. imutils 0.4.3 (used to get images from PiCamera)

install tools

workon py3cv3  # use your virtual environment name
pip install pyzmq
pip install imutils

test

# terminal 1
cd imagezmq/tests
python test_1_receive_images.py

# terminal 2
cd imagezmq/tests
python test_1_send_images.py

received image snapshot

receive image

receive image 2

Reference

History

  • 20190506: created.

Guide

sizeof(array)

#include <iostream>

void print_size1(int a[], int n)
{
    // decays to print_size1(int*, int)
    std::cout << sizeof(a) << std::endl; // we get sizeof(int*)
}

void print_size2(int *a, int n)
{
    // same signature: print_size2(int*, int)
    std::cout << sizeof(a) << std::endl; // we get sizeof(int*)
}

#define N_ELEMENTS(array) (sizeof(array)/sizeof((array)[0]))

void test_array()
{
    int a[5] = {1,2,3,4,5};
    int n = N_ELEMENTS(a);
    std::cout << "num = " << n << std::endl;  // 5
    std::cout << sizeof(int) << std::endl;    // int size
    std::cout << sizeof(int*) << std::endl;   // pointer size
    std::cout << sizeof(a) << std::endl;      // 20
    print_size1(a, 5);
    print_size2(a, 5);
}

An array type is implicitly converted to a pointer type when you pass it to a function: int a[] as a function parameter becomes int *a.

The compiler produces a warning:

warning: ‘sizeof’ on array function parameter ‘a’ will return size of ‘int*’ [-Wsizeof-array-argument]
  std::cout<< sizeof(a) << std::endl; // pointer size

char* string

void test_str1()
{
    // warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings]
    char* str = "Hello"; // Warning

    const char* str1 = "Hello"; // No warning

    // trying to modify a const string literal
    // gives a runtime error:
    // segmentation fault (core dumped)
    //str[1] = 'o';

    cout << str << endl;
}
/*
"Hello" is a string literal stored in the static data area; since string literals
never need to change, keeping them in static memory improves efficiency.
*/

void test_str2()
{
    char str1[] = "abc";
    char str2[] = "abc";
    const char str3[] = "abc";
    const char str4[] = "abc";
    const char *str5 = "abc";
    const char *str6 = "abc";
    cout << ( str1 == str2 ) << endl; // 0
    cout << ( str3 == str4 ) << endl; // 0
    cout << ( str5 == str6 ) << endl; // 1

    str1[1] = 'B'; // OK
    //str3[1] = 'B'; // Compiler ERROR
}
/*
str1, str2, str3 and str4 are array variables, each with its own memory;
str5 and str6 are pointers that point to the same string literal.
*/


const char* return_str()
{
    const char *p = "abc";
    return p;
}

void test_str3()
{
    const char *str = NULL;
    str = return_str();
    printf("%s\n", str); // abc
}
/*
"abc" is a string literal stored in the static data area, and its address is
assigned to the pointer. When return_str exits, the literal's memory is not
reclaimed, so it can still be accessed safely through the pointer.
*/

char* return_str2()
{
    char p[] = "abc";
    return p; // warning: address of local variable 'p' returned [-Wreturn-local-addr]
}

void test_str4()
{
    char *str = NULL;
    str = return_str2();
    printf("%s\n", str); // null
}
/*
"abc" is a string literal stored in the static data area, but here it is copied
into a local char[] array that lives on the stack. That is, `char p[] = "abc";`
keeps two copies of "abc" in memory: one in the stack frame and one in static
storage. This is the essential difference from return_str: when return_str2
exits, its stack frame is cleared and the local array is gone, so the function
returns the address of already-freed memory and the printout is null.
*/


char* return_str3()
{
    static char p[] = "abc"; // p lives in static storage with content "abc"
    return p;
}

void test_str5()
{
    char *str = NULL;
    str = return_str3();
    printf("%s\n", str); // abc
}
/*
If a function must return the address of a local variable, that variable has to
be declared static.
*/

char*-vs-stdstring
const string

NULL vs nullptr

nullptr
NULL: (void *)0

  • can be converted to integer types
  • can be used as a pointer

nullptr keyword

  • can be used as a pointer
  • CANNOT be converted to integer types
  • nullptr is convertible to bool.

const vs non-const

Use const whenever possible.

Declaring something const helps the compiler detect incorrect usage. const can be applied to objects at any scope, to function parameters, to function return types, and to member function bodies.
When const and non-const member functions have essentially equivalent implementations, have the non-const version call the const version to avoid code duplication; the reverse is not allowed.

code

class TextBlock {
public:
    ...
    // const version: same as before
    const char& operator[] (std::size_t position) const
    {
        ...
        return text[position];
    }

    // non-const version: now simply calls the const operator[]
    char& operator[] (std::size_t position)
    {
        return                                       // directly return
            const_cast<char&>(                       // (3) strip const from op[]'s return value
                static_cast<const TextBlock&>(*this) // (1) add const to *this
                    [position]                       // (2) call the const op[]
            );
    }
    ...
}

Two casts are involved:
First, const is added to this: *this is cast from its original type TextBlock& to const TextBlock&, so that the subsequent operator[] call resolves to the const version; this uses static_cast.
Second, the const is removed from the const operator[]'s return value, which is done with const_cast.

static

  • static: internal linkage
  • extern: external linkage

extern

extern

extern is present by default for C functions.
A declaration may appear any number of times, but a definition only once.

volatile

volatile
volatile cnblogs

volatile tells the compiler that a value may be changed outside the current thread, so it must not optimize accesses away: every read must come from the memory address itself rather than from a cached copy in a register.

Internal Linkage and External Linkage in C

internal linkage and external linkage
what-is-external-linkage-and-internal-linkage

scope is a property handled by compiler, whereas linkage is a property handled by linker.
external linkage means the symbol (function or global variable) is accessible throughout your program and internal linkage means that it’s only accessible in one translation unit.
You can explicitly control the linkage of a symbol by using the extern and static keywords. If the linkage isn’t specified then the default linkage is extern for non-const symbols and static (internal) for const symbols.
The keyword static plays a double role. (1) When used in the definitions of global variables, it specifies internal linkage. (2) When used in the definitions of the local variables, it specifies that the lifetime of the variable is going to be the duration of the program instead of being the duration of the function.

constexpr

constexpr

constexpr is a feature added in C++ 11. The main idea is performance improvement of programs by doing computations at compile time rather than run time.

constexpr vs inline functions

Both are for performance improvements, inline functions are request to compiler to expand at compile time and save time of function call overheads. In inline functions, expressions are always evaluated at run time. constexpr is different, here expressions are evaluated at compile time.

vtable and vptr

virtual-functions-and-runtime-polymorphism
what-are-vtable-and-vptr
calling-virtual-methods-in-constructordestructor-in-cpp

It is highly recommended to avoid calling virtual methods from constructor/destructor.

virtual-function-table
class-memory-layout

Virtual Constructor

Virtual Constructor

Virtual Constructor, NO
Can we make a class constructor virtual in C++ to create polymorphic objects? No. C++ being a statically typed language (the purpose of RTTI is different), it is meaningless to the C++ compiler to create an object polymorphically. The compiler must know the class type to create the object. In other words, what type of object to create is a compile-time decision from the C++ compiler's perspective. If we make a constructor virtual, the compiler flags an error.

Virtual Destructor

Virtual Destructor

Deleting a derived class object using a pointer to a base class that has a non-virtual destructor results in undefined behavior.

Advanced

thread

  • pass by value by default
  • pass by ref: std::ref(variable)

Reference

History

  • 20190429: created.

Guide

case1

#include <iostream>
#include <vector>
class A
{
public:
A (int x_arg) : x (x_arg) { std::cout << "A (x_arg)\n"; }
A () { x = 0; std::cout << "A ()\n"; }
A (const A &rhs) noexcept { x = rhs.x; std::cout << "A (A &)\n"; }
A (A &&rhs) noexcept { x = rhs.x; std::cout << "A (A &&)\n"; }
~A() { std::cout << "~A ()\n"; }

private:
int x;
};

void test_emplace_back_1()
{
// For emplace_back constructor A (int x_arg) will be called.
// And for push_back A (int x_arg) is called first and
// move A (A &&rhs) is called afterwards
{
std::vector<A> a;
std::cout << "call emplace_back:\n";
a.emplace_back(0);
// (1) direct object creation inside vector
}

{
std::vector<A> a;
std::cout << "call push_back:\n";
a.push_back(1);
// (1) create temp object and
// (2) then move copy to vector and
// (3) free temp object
}
}
/*
call emplace_back:
A (x_arg)
~A ()
call push_back:
A (x_arg)
A (A &&)
~A ()
~A ()
*/

see kezunlin

image from c-difference-between-emplace_back-and-push_back-function

case2

void test_emplace_back_2()
{
// emplace_back and push_back for `A(0)`, it's same.
// A (int x_arg) is called first and
// move A (A &&rhs) is called afterwards
{
std::vector<A> a;
std::cout << "call emplace_back:\n";
a.emplace_back(A(0));
// (1) create temp object and
// (2) then move copy to vector and
// (3) free temp object
}

{
std::vector<A> a;
std::cout << "call push_back:\n";
a.push_back(A(1));
// (1) create temp object and
// (2) then move copy to vector and
// (3) free temp object
}
}

/*
call emplace_back:
A (x_arg)
A (A &&)
~A ()
~A ()
call push_back:
A (x_arg)
A (A &&)
~A ()
~A ()
*/

case 3

void test_emplace_back_3()
{
// emplace_back and push_back for `A obj(0)`, it's same.
// A (int x_arg) is called first and
// copy constructor A (A &) is called afterwards
{
std::vector<A> a;
std::cout << "call emplace_back:\n";
A obj(0);
a.emplace_back(obj);
// copy constructor to vector
}

{
std::vector<A> a;
std::cout << "call push_back:\n";
A obj(1);
a.push_back(obj);
// copy constructor to vector
}
}
/*
call emplace_back:
A (x_arg)
A (A &)
~A ()
~A ()
call push_back:
A (x_arg)
A (A &)
~A ()
~A ()
*/

extract subvector

vector<int>::const_iterator first = myVec.begin() + 10;
vector<int>::const_iterator last = myVec.begin() + 15;
vector<int> newVec(first, last); // [10,15)

Reference

History

  • 20190422: created.

Series

Code Example

include headers

#include <assert.h>
#include <sys/stat.h>
#include <time.h>

#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <cmath>
#include <algorithm>

#include <cuda_runtime_api.h>

#include "NvCaffeParser.h"
#include "NvOnnxConfig.h"
#include "NvOnnxParser.h"
#include "NvInfer.h"
#include "common.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

static Logger gLogger;

// Attributes of MNIST Caffe model
static const int INPUT_H = 28;
static const int INPUT_W = 28;
static const int OUTPUT_SIZE = 10;
//const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
const std::string mnist_data_dir = "data/mnist/";


// Simple PGM (portable greyscale map) reader
void readPGMFile(const std::string& fileName, uint8_t buffer[INPUT_H * INPUT_W])
{
readPGMFile(fileName, buffer, INPUT_H, INPUT_W);
}

caffe model to tensorrt

void caffeToTRTModel(const std::string& deployFilepath,       // Path of Caffe prototxt file
const std::string& modelFilepath, // Path of Caffe model file
const std::vector<std::string>& outputs, // Names of network outputs
unsigned int maxBatchSize, // Note: Must be at least as large as the batch we want to run with
IHostMemory*& trtModelStream) // Output buffer for the TRT model
{
// Create builder
IBuilder* builder = createInferBuilder(gLogger);

// Parse caffe model to populate network, then set the outputs
std::cout << "Reading Caffe prototxt: " << deployFilepath << "\n";
std::cout << "Reading Caffe model: " << modelFilepath << "\n";
INetworkDefinition* network = builder->createNetwork();
ICaffeParser* parser = createCaffeParser();

bool useFp16 = builder->platformHasFastFp16();
std::cout << "platformHasFastFp16: " << useFp16 << "\n";

bool useInt8 = builder->platformHasFastInt8();
std::cout << "platformHasFastInt8: " << useInt8 << "\n";

// create a 16-bit model if it's natively supported
DataType modelDataType = useFp16 ? DataType::kHALF : DataType::kFLOAT;

const IBlobNameToTensor* blobNameToTensor = parser->parse(deployFilepath.c_str(),
modelFilepath.c_str(),
*network,
modelDataType);
// Specify output tensors of network
// ERROR: Network must have at least one output
for (auto& s : outputs){
std::cout<<"output = "<< s.c_str() << std::endl;
network->markOutput(*blobNameToTensor->find(s.c_str())); // prob
}

builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(1 << 20);

// set up the network for paired-fp16 format if available
if(useFp16)
builder->setFp16Mode(true);

// Build engine
ICudaEngine* engine = builder->buildCudaEngine(*network);
assert(engine);

// Destroy parser and network
network->destroy();
parser->destroy();

// Serialize engine and destroy it
trtModelStream = engine->serialize();
engine->destroy();
builder->destroy();

//shutdownProtobufLibrary();
}

pytorch onnx to tensorrt

void onnxToTRTModel( const std::string& modelFilepath,        // name of the onnx model 
unsigned int maxBatchSize, // batch size - NB must be at least as large as the batch we want to run with
IHostMemory *&trtModelStream) // output buffer for the TensorRT model
{
// create the builder
IBuilder* builder = createInferBuilder(gLogger);

nvonnxparser::IOnnxConfig* config = nvonnxparser::createONNXConfig();
config->setModelFileName(modelFilepath.c_str());

nvonnxparser::IONNXParser* parser = nvonnxparser::createONNXParser(*config);

//Optional - uncomment below lines to view network layer information
//config->setPrintLayerInfo(true);
//parser->reportParsingInfo();

if (!parser->parse(modelFilepath.c_str(), DataType::kFLOAT))
{
string msg("failed to parse onnx file");
gLogger.log(nvinfer1::ILogger::Severity::kERROR, msg.c_str());
exit(EXIT_FAILURE);
}

if (!parser->convertToTRTNetwork()) {
string msg("ERROR, failed to convert onnx network into TRT network");
gLogger.log(nvinfer1::ILogger::Severity::kERROR, msg.c_str());
exit(EXIT_FAILURE);
}
nvinfer1::INetworkDefinition* network = parser->getTRTNetwork();

// Build the engine
builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(1 << 20);

ICudaEngine* engine = builder->buildCudaEngine(*network);
assert(engine);

// we don't need the network any more, and we can destroy the parser
network->destroy();
parser->destroy();

// serialize the engine, then close everything down
trtModelStream = engine->serialize();
engine->destroy();
builder->destroy();

//shutdownProtobufLibrary();
}

do inference

void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
{
const ICudaEngine& engine = context.getEngine();
// Pointers to input and output device buffers to pass to engine.
// Engine requires exactly IEngine::getNbBindings() number of buffers.
assert(engine.getNbBindings() == 2);
void* buffers[2];

// In order to bind the buffers, we need to know the names of the input and output tensors.
// Note that indices are guaranteed to be less than IEngine::getNbBindings()
int inputIndex, outputIndex;

printf("Bindings after deserializing:\n");
for (int bi = 0; bi < engine.getNbBindings(); bi++)
{
if (engine.bindingIsInput(bi) == true)
{
inputIndex = bi;
printf("Binding %d (%s): Input.\n", bi, engine.getBindingName(bi));
} else
{
outputIndex = bi;
printf("Binding %d (%s): Output.\n", bi, engine.getBindingName(bi));
}
}

//const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
//const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

std::cout<<"inputIndex = "<< inputIndex << std::endl; // 0 data
std::cout<<"outputIndex = "<< outputIndex << std::endl; // 1 prob

// Create GPU buffers on device
CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));
CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));

// Create stream
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));

// DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
context.enqueue(batchSize, buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
cudaStreamSynchronize(stream);

// Release stream and buffers
cudaStreamDestroy(stream);
CHECK(cudaFree(buffers[inputIndex]));
CHECK(cudaFree(buffers[outputIndex]));
}

save and load engine

void SaveEngine(const nvinfer1::IHostMemory& trtModelStream, const std::string& engine_filepath)
{
std::ofstream file;
file.open(engine_filepath, std::ios::binary | std::ios::out);
if(!file.is_open())
{
std::cout << "read create engine file" << engine_filepath <<" failed" << std::endl;
return;
}
file.write((const char*)trtModelStream.data(), trtModelStream.size());
file.close();
};


ICudaEngine* LoadEngine(IRuntime& runtime, const std::string& engine_filepath)
{
ifstream file;
file.open(engine_filepath, ios::binary | ios::in);
file.seekg(0, ios::end);
int length = file.tellg();
file.seekg(0, ios::beg);

std::shared_ptr<char> data(new char[length], std::default_delete<char[]>());
file.read(data.get(), length);
file.close();

// runtime->deserializeCudaEngine(trtModelStream->data(), trtModelStream->size(), nullptr);
ICudaEngine* engine = runtime.deserializeCudaEngine(data.get(), length, nullptr);
assert(engine != nullptr);
return engine;
}

example

void demo_save_caffe_to_trt(const std::string& engine_filepath)
{
std::string deploy_filepath = mnist_data_dir + "mnist.prototxt";
std::string model_filepath = mnist_data_dir + "mnist.caffemodel";

// Create TRT model from caffe model and serialize it to a stream
IHostMemory* trtModelStream{nullptr};
caffeToTRTModel(deploy_filepath, model_filepath, std::vector<std::string>{OUTPUT_BLOB_NAME}, 1, trtModelStream);
assert(trtModelStream != nullptr);

SaveEngine(*trtModelStream, engine_filepath);

// destroy stream
trtModelStream->destroy();
}


void demo_save_onnx_to_trt(const std::string& engine_filepath)
{
std::string onnx_filepath = mnist_data_dir + "mnist.onnx";

// Create TRT model from caffe model and serialize it to a stream
IHostMemory* trtModelStream{nullptr};
onnxToTRTModel(onnx_filepath, 1, trtModelStream);
assert(trtModelStream != nullptr);

SaveEngine(*trtModelStream, engine_filepath);

// destroy stream
trtModelStream->destroy();
}


int mnist_demo()
{
bool use_caffe = false;
std::string engine_filepath;
if (use_caffe){
engine_filepath = "cfg/mnist/caffe_minist_fp32.trt";
demo_save_caffe_to_trt(engine_filepath);
} else {
engine_filepath = "cfg/mnist/onnx_minist_fp32.trt";
demo_save_onnx_to_trt(engine_filepath);
}
std::cout<<"[API] Save engine to "<< engine_filepath <<std::endl;

const int num = 6;
std::string digit_filepath = mnist_data_dir + std::to_string(num) + ".pgm";

// Read a digit file
uint8_t fileData[INPUT_H * INPUT_W];
readPGMFile(digit_filepath, fileData);
float data[INPUT_H * INPUT_W];

if (use_caffe){

std::string mean_filepath = mnist_data_dir + "mnist_mean.binaryproto";
// Parse mean file
ICaffeParser* parser = createCaffeParser();
IBinaryProtoBlob* meanBlob = parser->parseBinaryProto(mean_filepath.c_str());
parser->destroy();

// Subtract mean from image
const float* meanData = reinterpret_cast<const float*>(meanBlob->getData()); // size 786

for (int i = 0; i < INPUT_H * INPUT_W; i++)
data[i] = float(fileData[i]) - meanData[i];

meanBlob->destroy();
} else {

for (int i = 0; i < INPUT_H * INPUT_W; i++)
data[i] = 1.0 - float(fileData[i]/255.0);
}


// Deserialize engine we serialized earlier
IRuntime* runtime = createInferRuntime(gLogger);
assert(runtime != nullptr);

std::cout<<"[API] Load engine from "<< engine_filepath <<std::endl;
ICudaEngine* engine = LoadEngine(*runtime, engine_filepath);
assert(engine != nullptr);

IExecutionContext* context = engine->createExecutionContext();
assert(context != nullptr);

// Run inference on input data
float prob[OUTPUT_SIZE];
doInference(*context, data, prob, 1);

// Destroy the engine
context->destroy();
engine->destroy();
runtime->destroy();

// Print histogram of the output distribution
std::cout << "\nOutput:\n\n";

// for onnx,we get z as output, we need to use softmax to get probs
if ( !use_caffe){

//Calculate Softmax
float sum{0.0f};
for(int i = 0; i < OUTPUT_SIZE; i++)
{
prob[i] = exp(prob[i]);
sum += prob[i];
}
for(int i = 0; i < OUTPUT_SIZE; i++)
{
prob[i] /= sum;
}
}

// find max probs
float val{0.0f};
int idx{0};
for (unsigned int i = 0; i < 10; i++)
{
val = std::max(val, prob[i]);
if (val == prob[i]) {
idx = i;
}
cout << " Prob " << i << " "<< std::fixed << std::setw(5) << std::setprecision(4) << prob[i];
std::cout << i << ": " << std::string(int(std::floor(prob[i] * 10 + 0.5f)), '*') << "\n";
}
std::cout << std::endl;

return (idx == num && val > 0.9f) ? EXIT_SUCCESS : EXIT_FAILURE;
}


int main(int argc, char** argv)
{
mnist_demo();
return 0;
}

results

./bin/sample_mnist 
[API] Save engine to cfg/mnist/onnx_minist_fp32.trt
[API] Load engine from cfg/mnist/onnx_minist_fp32.trt
Bindings after deserializing:
Binding 0 (Input3): Input.
Binding 1 (Plus214_Output_0): Output.
inputIndex = 0
outputIndex = 1

Output:

Prob 0 0.00000:
Prob 1 0.00001:
Prob 2 0.00002:
Prob 3 0.00003:
Prob 4 0.00004:
Prob 5 0.00005:
Prob 6 1.00006: **********
Prob 7 0.00007:
Prob 8 0.00008:
Prob 9 0.00009:

Reference

History

  • 20190422 created.

Guide

main ui

messagebox
- showinfo()
- showwarning()
- showerror()
- askquestion()
- askokcancel()
- askyesno()
- askretrycancel()
- askyesnocancel()


filedialog
- asksaveasfilename()
- asksaveasfile()
- askopenfilename()
- askopenfile()
- askdirectory()
- askopenfilenames()
- askopenfiles()

demo


from numpy.random import seed, uniform
from numpy import uint8, uint16, load, save

from cv2 import imread, imwrite
from os import listdir, makedirs
from os.path import exists, basename

# for python 3
from tkinter import Tk, Frame, messagebox, filedialog, Button, Label, StringVar

class MyGUI():
    def __init__(self):
        self.root = Tk()

        sw = self.root.winfo_screenwidth()
        sh = self.root.winfo_screenheight()

        ww = 700
        wh = 200
        x = (sw-ww) / 2
        y = (sh-wh) / 2
        self.root.title('Image Compress Tool')
        # center the window on screen
        self.root.geometry("%dx%d+%d+%d" % (ww, wh, x, y))

        # frame1
        frame1 = Frame(self.root)
        frame1.grid(row=0, column=0, sticky='w')

        self.input_btn = Button(frame1, text="Input Folder", width=10, height=3, command=self.set_input_folder)
        self.input_btn.pack(side='left')

        self.input_label_text = StringVar()
        self.input_label_text.set("Input Folder")

        self.input_label = Label(frame1, textvariable=self.input_label_text, width=70, height=3)
        self.input_label.pack(side='left')

        # frame2
        frame2 = Frame(self.root)
        frame2.grid(row=1, column=0, sticky='w')

        self.output_btn = Button(frame2, text="Output Folder", width=10, height=3, command=self.set_output_folder)
        self.output_btn.pack(side='left')

        self.output_label_text = StringVar()
        self.output_label_text.set("Output Folder")

        self.output_label = Label(frame2, textvariable=self.output_label_text, width=70, height=3)
        self.output_label.pack(side='left')

        # frame3
        frame3 = Frame(self.root)
        frame3.grid(row=2, column=0, sticky='nw')

        self.run_btn = Button(frame3, text="Run", width=10, height=3, command=self.run_task)
        self.run_btn.pack(side='left')

        self.run_label_text = StringVar()
        self.run_label_text.set("Ready")

        self.run_label = Label(frame3, textvariable=self.run_label_text, width=70, height=3)
        self.run_label.pack(side='left')

    def mainloop(self):
        self.root.mainloop()

    def set_input_folder(self):
        result = filedialog.askdirectory()
        self.input_label_text.set(result)

    def set_output_folder(self):
        result = filedialog.askdirectory()
        self.output_label_text.set(result)

    def run_task(self):
        input_folder = self.input_label_text.get()
        output_folder = self.output_label_text.get()
        #print("input_folder: "+input_folder)
        #print("output_folder: "+output_folder)
        if exists(input_folder):
            #batch_compress(input_folder, output_folder)
            self.run_label_text.set("Compress OK.")
            messagebox.showinfo("Info", "Compress OK.")
        else:
            messagebox.showwarning("Warn", "Please input folder")

def gui():
    app = MyGUI()
    app.mainloop()

def main():
    gui()

if __name__ =="__main__":
    main()

snapshots
tkinter demo

Reference

History

  • 20190411: created.

Series

Guide

config

  • linux/window: cmake with CXX_FLAGS=-fopenmp
  • window VS: VS also support openmp, C/C++| Language | /openmp

usage

#include <omp.h>

#pragma omp parallel for
for loop ...

code

#include <iostream>
#include <omp.h>

int main()
{
omp_set_num_threads(4);
#pragma omp parallel for
for (int i = 0; i < 8; i++)
{
printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
}
printf("\n");

return 0;
}

/*
i = 0, I am Thread 0
i = 1, I am Thread 0
i = 4, I am Thread 2
i = 5, I am Thread 2
i = 6, I am Thread 3
i = 7, I am Thread 3
i = 2, I am Thread 1
i = 3, I am Thread 1
*/

CMakeLists.txt

use CXX_FLAGS=-fopenmp in CMakeLists.txt

cmake_minimum_required(VERSION 3.0.0)

project(hello)

find_package(OpenMP REQUIRED)
if(OPENMP_FOUND)
message("OPENMP FOUND")

message([main] " OpenMP_C_FLAGS=${OpenMP_C_FLAGS}") # -fopenmp
message([main] " OpenMP_CXX_FLAGS}=${OpenMP_CXX_FLAGS}") # -fopenmp
message([main] " OpenMP_EXE_LINKER_FLAGS=${OpenMP_EXE_LINKER_FLAGS}") # ***

# no use for xxx_INCLUDE_DIRS and xxx_libraries for OpenMP
message([main] " OpenMP_INCLUDE_DIRS=${OpenMP_INCLUDE_DIRS}") # ***
message([main] " OpenMP_LIBRARIES=${OpenMP_LIBRARIES}") # ***

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif()

add_executable(hello hello.cpp)
#target_link_libraries(hello xxx)

options
openmp

or use g++ hello.cpp -fopenmp to compile

view demo

list dynamic dependencies (ldd)

ldd hello 
    linux-vdso.so.1 =>  (0x00007ffd71365000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f8ea7f00000)
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f8ea7cde000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8ea7914000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8ea760b000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8ea8282000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8ea73f5000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f8ea71f1000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8ea6fd4000)

libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1

list names (nm)

nm hello 
0000000000602080 B __bss_start
0000000000602190 b completed.7594
                 U __cxa_atexit@@GLIBC_2.2.5
0000000000602070 D __data_start
0000000000602070 W data_start
0000000000400b00 t deregister_tm_clones
0000000000400b80 t __do_global_dtors_aux
0000000000601df8 t __do_global_dtors_aux_fini_array_entry
0000000000602078 d __dso_handle
0000000000601e08 d _DYNAMIC
0000000000602080 D _edata
0000000000602198 B _end
0000000000400d44 T _fini
0000000000400ba0 t frame_dummy
0000000000601de8 t __frame_dummy_init_array_entry
0000000000400f18 r __FRAME_END__
0000000000602000 d _GLOBAL_OFFSET_TABLE_
0000000000400c28 t _GLOBAL__sub_I_main
                 w __gmon_start__
0000000000400d54 r __GNU_EH_FRAME_HDR
                 U GOMP_parallel@@GOMP_4.0
                 U __gxx_personality_v0@@CXXABI_1.3
00000000004009e0 T _init
0000000000601df8 t __init_array_end
0000000000601de8 t __init_array_start
0000000000400d50 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000601e00 d __JCR_END__
0000000000601e00 d __JCR_LIST__
                 w _Jv_RegisterClasses
0000000000400d40 T __libc_csu_fini
0000000000400cd0 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
0000000000400bc6 T main
0000000000400c3d t main._omp_fn.0
                 U omp_get_num_threads@@OMP_1.0
                 U omp_get_thread_num@@OMP_1.0
0000000000400b40 t register_tm_clones
0000000000400ad0 T _start
0000000000602080 d __TMC_END__
0000000000400bea t _Z41__static_initialization_and_destruction_0ii
                 U _ZNSolsEPFRSoS_E@@GLIBCXX_3.4
                 U _ZNSt8ios_base4InitC1Ev@@GLIBCXX_3.4
                 U _ZNSt8ios_base4InitD1Ev@@GLIBCXX_3.4
0000000000602080 B _ZSt4cout@@GLIBCXX_3.4
                 U _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@@GLIBCXX_3.4
0000000000602191 b _ZStL8__ioinit
                 U _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c@@GLIBCXX_3.4
                 

omp_get_num_threads, omp_get_thread_num

OpenMP Introduction

OpenMP directive format

#pragma omp directive [clause[clause]…]
#pragma omp parallel private(i, j)

parallel is a directive; private is a clause

directive

  • parallel: placed before a code block, meaning the block will be executed by multiple threads in parallel
  • for: placed before a for loop; the iterations are distributed across multiple threads for parallel execution, and the iterations must be independent of each other
  • parallel for: the combination of parallel and for, placed before a for loop whose iterations will be executed by multiple threads in parallel
  • sections: placed before code blocks that may be executed in parallel
  • parallel sections: the combination of parallel and sections
  • critical: placed before a critical section of code
  • single: placed before a code block that should be executed by only a single thread
  • flush
  • barrier: synchronizes threads inside a parallel region; every thread stops at the barrier and execution continues only after all threads have reached it
  • atomic: specifies that a memory location is updated atomically
  • master: specifies that a code block is executed by the master thread only
  • ordered: specifies that the iterations of a parallel loop execute in order
  • threadprivate: specifies that a variable is private to each thread

parallel for

OpenMP places five requirements on loops that can be multithreaded:

  • the loop variable (i.e. i) must be a signed integer; nothing else will do.
  • the loop condition must be one of <, <=, >, >=
  • the loop increment must add or subtract a constant value (the same on every iteration).
  • if the comparison is < or <=, then i must increase each iteration; otherwise it must decrease
  • the loop must contain no odd control flow: no jumping from the inner loop to the outer loop; goto and break may only jump within the loop, and exceptions must be caught inside the loop.

If your loop does not meet these conditions, you will have to rewrite it.

avoid race condition

Even when a loop satisfies the five conditions above, data dependence can still prevent proper parallelization. This happens when data in one iteration depends on data from a different iteration.

// assume the array has been initialized to 1
#pragma omp parallel for
for (int i = 2; i < 10; i++) {
factorial[i] = i * factorial[i-1];
}

ERROR.

omp_set_num_threads(4);
#pragma omp parallel
{
#pragma omp for
for (int i = 0; i < 8; i++)
{
printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
}
}

same as

omp_set_num_threads(4);
#pragma omp parallel for
for (int i = 0; i < 8; i++)
{
printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
}

parallel sections

#pragma omp parallel sections // parallel
{
    #pragma omp section // thread-1
    {
        function1();
    }
    #pragma omp section // thread-2
    {
        function2();
    }
}

The contents of parallel sections run in parallel; the work is divided so that each thread executes one of the sections.

clause

  • private: each thread gets its own private copy of the variable.
  • firstprivate: each thread gets its own private copy, initialized with the master thread's value.
  • lastprivate: copies the value of the private variable back to the corresponding master-thread variable when the parallel region ends.
  • reduction: declares one or more variables private and applies the specified operation to combine them when the parallel region ends.
  • nowait: removes the implied barrier of a directive
  • num_threads: sets the number of threads
  • schedule: specifies how for-loop iterations are scheduled
  • shared: declares one or more variables to be shared among the threads
  • ordered: specifies that the for loop must execute in order
  • copyprivate: used with the single directive to broadcast a variable to the other threads
  • copyin: initializes a threadprivate variable with the master thread's value.
  • default: specifies the default treatment of variables inside a parallel region; the default is shared

private

#pragma omp parallel
{
int x; // private to each thread ? YES
}

#pragma omp parallel for
for (int i = 0; i < 1000; ++i)
{
int x; // private to each thread ? YES
}

local variables are automatically private to each thread.
The reason for the existence of the private clause is so that you don’t have to change your code.
see here

The only way to parallelize the following code without the private clause

int i,j;
#pragma omp parallel for private(j)
for(i = 0; i < n; i++) {
for(j = 0; j < n; j++) {
//do something
}
}

is to change the code. For example like this:

int i;
#pragma omp parallel for
for(i = 0; i < n; i++) {
int j; // mark j as local variable to worker thread
for(j = 0; j < n; j++) {
//do something
}
}

reduction

For example, accumulation:

int sum = 0;
for (int i = 0; i < 100; i++) {
sum += array[i]; // sum must be private for parallelization, yet shared to produce the correct result
}

In the program above, neither a shared nor a private sum is correct. To solve this, OpenMP provides the reduction clause:

int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < 100; i++) {
sum += array[i];
}

Internally, OpenMP gives each thread a private sum variable (initialized to 0); when a thread finishes, OpenMP adds the per-thread private sums together to produce the final result.

num_threads

num_threads(4) same as omp_set_num_threads(4)

// `num_threads(4)` same as `omp_set_num_threads(4)`
#pragma omp parallel num_threads(4)
{
printf("Hello, I am Thread %d\n", omp_get_thread_num()); // 0,1,2,3,
}

schedule

format

#pragma omp parallel for schedule(kind [, chunk size])

kind: see openmp-loop-scheduling and whats-the-difference-between-static-and-dynamic-schedule-in-openmp

  • static: Divide the loop into equal-sized chunks or as equal as possible in the case where the number of loop iterations is not evenly divisible by the number of threads multiplied by the chunk size. By default, chunk size is loop_count/number_of_threads.
  • dynamic: Use the internal work queue to give a chunk-sized block of loop iterations to each thread. When a thread is finished, it retrieves the next block of loop iterations from the top of the work queue. By default, the chunk size is 1. Be careful when using this scheduling type because of the extra overhead involved.
  • guided: special case of dynamic. Similar to dynamic scheduling, but the chunk size starts off large and decreases to better handle load imbalance between iterations. The optional chunk parameter specifies them minimum size chunk to use. By default the chunk size is approximately loop_count/number_of_threads.
  • auto: When schedule (auto) is specified, the decision regarding scheduling is delegated to the compiler. The programmer gives the compiler the freedom to choose any possible mapping of iterations to threads in the team.
  • runtime: with the environment variable OMP_SCHEDULE, we can test the 3 scheduling types (static, dynamic, guided) without recompiling the code.

The optional parameter (chunk), when specified, must be a positive integer.

By default, OpenMP assumes all loop iterations take the same time, so it splits iterations evenly across cores and distributes them to minimize memory-access conflicts. Loops generally access memory linearly, so assigning the first half and the second half of the loop to different cores minimizes conflicts. That may be best for memory access, but it is not necessarily best for load balance; conversely, the best load balance may hurt memory access, so a trade-off is required.

Memory access vs load balance: a trade-off to weigh.
The default schedule in OpenMP is implementation defined; gcc defaults to schedule(dynamic,1), i.e. dynamic scheduling with chunk size 1.
Do not use more threads than physical cores, otherwise you get oversubscription.

isprime gives a good demonstration of dynamic scheduling.

functions

  • omp_get_num_procs: returns the number of processors on the machine running the thread
  • omp_set_num_threads: sets the number of threads used to execute parallel code
  • omp_get_num_threads: returns the number of active threads in the current parallel region (1 if not set)
  • omp_get_thread_num: returns the thread number (0, 1, 2, …)
  • omp_init_lock: initializes a simple lock
  • omp_set_lock: acquires the lock
  • omp_unset_lock: releases the lock; must be paired with omp_set_lock
  • omp_destroy_lock: destroys a lock; must be paired with omp_init_lock

check cpu

cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c 
    8  Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

omp_get_num_procs return 8.

OpenMP Example

omp_get_num_threads

void test0()
{
printf("I am Thread %d, omp_get_num_threads = %d, omp_get_num_procs = %d\n",
omp_get_thread_num(),
omp_get_num_threads(),
omp_get_num_procs()
);
}
/*
I am Thread 0, omp_get_num_threads = 1, omp_get_num_procs = 8
*/

parallel

case1

void test1()
{
// `parallel` before a code block: the block will be executed by multiple threads in parallel
// if `omp_set_num_threads` is not set, by default `omp_get_num_procs` threads are used, e.g. 8
//omp_set_num_threads(4); // set thread count; usually no more than the number of CPU cores
#pragma omp parallel
{
printf("Hello, I am Thread %d, omp_get_num_threads = %d, omp_get_num_procs = %d\n",
omp_get_thread_num(),
omp_get_num_threads(),
omp_get_num_procs()
);
}
}
/*
Hello, I am Thread 3, omp_get_num_threads = 8, omp_get_num_procs = 8
Hello, I am Thread 7, omp_get_num_threads = 8, omp_get_num_procs = 8
Hello, I am Thread 1, omp_get_num_threads = 8, omp_get_num_procs = 8
Hello, I am Thread 6, omp_get_num_threads = 8, omp_get_num_procs = 8
Hello, I am Thread 5, omp_get_num_threads = 8, omp_get_num_procs = 8
Hello, I am Thread 4, omp_get_num_threads = 8, omp_get_num_procs = 8
Hello, I am Thread 2, omp_get_num_threads = 8, omp_get_num_procs = 8
Hello, I am Thread 0, omp_get_num_threads = 8, omp_get_num_procs = 8
*/

case2

void test1_2()
{
    // `parallel` before a block means the block will be executed by multiple threads in parallel.
    omp_set_num_threads(4); // set the thread count; usually no more than the number of CPU cores
    #pragma omp parallel
    {
        printf("Hello, I am Thread %d, omp_get_num_threads = %d, omp_get_num_procs = %d\n",
            omp_get_thread_num(),
            omp_get_num_threads(),
            omp_get_num_procs()
        );
        //std::cout << "Hello" << ", I am Thread " << omp_get_thread_num() << std::endl; // 0,1,2,3
    }
}
/*
# use `cout`
HelloHello, I am Thread Hello, I am Thread , I am Thread Hello, I am Thread 2
1
3
0
*/

/* use `printf`
Hello, I am Thread 0, omp_get_num_threads = 4, omp_get_num_procs = 8
Hello, I am Thread 3, omp_get_num_threads = 4, omp_get_num_procs = 8
Hello, I am Thread 1, omp_get_num_threads = 4, omp_get_num_procs = 8
Hello, I am Thread 2, omp_get_num_threads = 4, omp_get_num_procs = 8
*/

Note the difference between std::cout and printf: the chained << calls from different threads interleave mid-line, while each printf call prints its whole line atomically (see the sketch below for one way to keep the cout output intact).
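
A minimal sketch (my addition): a named critical section serializes the cout statement so each line prints intact, at the cost of running that statement one thread at a time:

#pragma omp parallel num_threads(4)
{
    #pragma omp critical(cout_lock) // only one thread at a time executes this block
    {
        std::cout << "Hello, I am Thread " << omp_get_thread_num() << std::endl;
    }
}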

case3

void test1_3()
{
    // `parallel` before a block means the block will be executed by multiple threads in parallel.
    omp_set_num_threads(4);
    #pragma omp parallel
    for (int i = 0; i < 3; i++)
    {
        printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
    }
}
/*
i = 0, I am Thread 1
i = 1, I am Thread 1
i = 2, I am Thread 1
i = 0, I am Thread 3
i = 1, I am Thread 3
i = 2, I am Thread 3
i = 0, I am Thread 2
i = 1, I am Thread 2
i = 2, I am Thread 2
i = 0, I am Thread 0
i = 1, I am Thread 0
i = 2, I am Thread 0
*/
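
Without a `for` work-sharing directive, the loop is not divided up: every one of the 4 threads executes the entire loop, which is why each thread prints i = 0, 1, 2.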

omp parallel/for

omp parallel + omp for

void test2()
{
    // `omp parallel` + `omp for` === `omp parallel for`
    // `omp for` before a for loop: the loop iterations are divided among the threads.
    // here the 8 iterations are split evenly across 4 threads, 2 iterations per thread
    /*
    iter    #thread id
    0,1     0
    2,3     1
    4,5     2
    6,7     3
    */
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 8; i++)
        {
            printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
        }
    }
}
/*
i = 0, I am Thread 0
i = 1, I am Thread 0
i = 2, I am Thread 1
i = 3, I am Thread 1
i = 6, I am Thread 3
i = 7, I am Thread 3
i = 4, I am Thread 2
i = 5, I am Thread 2
*/

omp parallel for

void test2_2()
{
    // `parallel for` before a for loop: the loop iterations are divided among the threads.
    // here the 8 iterations are split evenly across 4 threads, 2 iterations per thread
    /*
    iter    #thread id
    0,1     0
    2,3     1
    4,5     2
    6,7     3
    */
    omp_set_num_threads(4);
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
    {
        printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
    }
}
/*
i = 0, I am Thread 0
i = 1, I am Thread 0
i = 4, I am Thread 2
i = 5, I am Thread 2
i = 6, I am Thread 3
i = 7, I am Thread 3
i = 2, I am Thread 1
i = 3, I am Thread 1
*/

sqrt case

void base_sqrt()
{
    boost::posix_time::ptime pt1 = boost::posix_time::microsec_clock::local_time();

    float a = 0;
    for (int i = 0; i < 1000000000; i++)
        a = sqrt(i);

    boost::posix_time::ptime pt2 = boost::posix_time::microsec_clock::local_time();
    int64_t cost = (pt2 - pt1).total_milliseconds();
    printf("Worker Thread = %d, cost = %lld ms\n", omp_get_thread_num(), (long long)cost);
}

void test2_3()
{
    boost::posix_time::ptime pt1 = boost::posix_time::microsec_clock::local_time();

    omp_set_num_threads(8);
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        base_sqrt();

    boost::posix_time::ptime pt2 = boost::posix_time::microsec_clock::local_time();
    int64_t cost = (pt2 - pt1).total_milliseconds();
    printf("Main Thread = %d, cost = %lld ms\n", omp_get_thread_num(), (long long)cost);
}

sequential

time ./demo_openmp
Worker Thread = 0, cost = 1746 ms
Worker Thread = 0, cost = 1711 ms
Worker Thread = 0, cost = 1736 ms
Worker Thread = 0, cost = 1734 ms
Worker Thread = 0, cost = 1750 ms
Worker Thread = 0, cost = 1718 ms
Worker Thread = 0, cost = 1769 ms
Worker Thread = 0, cost = 1732 ms
Main Thread = 0, cost = 13899 ms
./demo_openmp  13.90s user 0.00s system 99% cpu 13.903 total

parallel

time ./demo_openmp
Worker Thread = 1, cost = 1875 ms
Worker Thread = 6, cost = 1876 ms
Worker Thread = 0, cost = 1876 ms
Worker Thread = 7, cost = 1876 ms
Worker Thread = 5, cost = 1877 ms
Worker Thread = 3, cost = 1963 ms
Worker Thread = 4, cost = 2000 ms
Worker Thread = 2, cost = 2027 ms
Main Thread = 0, cost = 2031 ms
./demo_openmp  15.10s user 0.01s system 740% cpu 2.041 total

2031 ms + 10 ms (system) ≈ 2041 ms (total)
2.041 s × 740% ≈ 15.10 s (user)
That is a roughly 6.8× speedup over the sequential run (13899 ms / 2031 ms) on 8 threads.

parallel sections

void test3()
{
    boost::posix_time::ptime pt1 = boost::posix_time::microsec_clock::local_time();

    omp_set_num_threads(4);
    // the contents of `parallel sections` run in parallel; each thread executes one `section`
    #pragma omp parallel sections // parallel
    {
        #pragma omp section // thread-0
        {
            base_sqrt();
        }

        #pragma omp section // thread-1
        {
            base_sqrt();
        }

        #pragma omp section // thread-2
        {
            base_sqrt();
        }

        #pragma omp section // thread-3
        {
            base_sqrt();
        }
    }

    boost::posix_time::ptime pt2 = boost::posix_time::microsec_clock::local_time();
    int64_t cost = (pt2 - pt1).total_milliseconds();
    printf("Main Thread = %d, cost = %lld ms\n", omp_get_thread_num(), (long long)cost);
}
/*
time ./demo_openmp
Worker Thread = 0, cost = 1843 ms
Worker Thread = 1, cost = 1843 ms
Worker Thread = 3, cost = 1844 ms
Worker Thread = 2, cost = 1845 ms
Main Thread = 0, cost = 1845 ms
./demo_openmp 7.39s user 0.00s system 398% cpu 1.855 total
*/

private

error case

void test4_error()
{
    int i, j;
    omp_set_num_threads(4);
    // we get a wrong result, because `j` is shared between all worker threads.
    #pragma omp parallel for
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 8; j++) {
            printf("Worker Thread = %d, j = %d ms\n", omp_get_thread_num(), j);
        }
    }
}
/*
Worker Thread = 3, j = 0 ms
Worker Thread = 3, j = 1 ms
Worker Thread = 0, j = 0 ms
Worker Thread = 0, j = 3 ms
Worker Thread = 0, j = 4 ms
Worker Thread = 0, j = 5 ms
Worker Thread = 3, j = 2 ms
Worker Thread = 3, j = 7 ms
Worker Thread = 0, j = 6 ms
Worker Thread = 1, j = 0 ms
Worker Thread = 2, j = 0 ms
*/

Wrong results: because j is shared, the four inner loops race on the same counter, so individual threads skip values (e.g. Thread 0 jumps from j = 0 to j = 3) and far fewer than 4 × 8 = 32 lines are printed.

fix1 by changing code

void test4_fix1()
{
    int i;
    omp_set_num_threads(4);
    // we get a wrong result, because `j` is shared between all worker threads.
    // fix1: change the original code to make j a local variable
    #pragma omp parallel for
    for (i = 0; i < 4; i++) {
        int j; // fix1: `int j`
        for (j = 0; j < 8; j++) {
            printf("Worker Thread = %d, j = %d ms\n", omp_get_thread_num(), j);
        }
    }
}

/*
Worker Thread = 0, j = 0 ms
Worker Thread = 0, j = 1 ms
Worker Thread = 2, j = 0 ms
Worker Thread = 2, j = 1 ms
Worker Thread = 1, j = 0 ms
Worker Thread = 1, j = 1 ms
Worker Thread = 1, j = 2 ms
Worker Thread = 1, j = 3 ms
Worker Thread = 1, j = 4 ms
Worker Thread = 1, j = 5 ms
Worker Thread = 1, j = 6 ms
Worker Thread = 1, j = 7 ms
Worker Thread = 2, j = 2 ms
Worker Thread = 2, j = 3 ms
Worker Thread = 2, j = 4 ms
Worker Thread = 2, j = 5 ms
Worker Thread = 2, j = 6 ms
Worker Thread = 2, j = 7 ms
Worker Thread = 0, j = 2 ms
Worker Thread = 0, j = 3 ms
Worker Thread = 0, j = 4 ms
Worker Thread = 0, j = 5 ms
Worker Thread = 0, j = 6 ms
Worker Thread = 0, j = 7 ms
Worker Thread = 3, j = 0 ms
Worker Thread = 3, j = 1 ms
Worker Thread = 3, j = 2 ms
Worker Thread = 3, j = 3 ms
Worker Thread = 3, j = 4 ms
Worker Thread = 3, j = 5 ms
Worker Thread = 3, j = 6 ms
Worker Thread = 3, j = 7 ms
*/

fix2 by private(j)

void test4_fix2()
{
    int i, j;
    omp_set_num_threads(4);
    // we get a wrong result, because `j` is shared between all worker threads.
    // fix1: change the original code to make j a local variable
    // fix2: use `private(j)`, no need to change the original code
    #pragma omp parallel for private(j) // fix2
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 8; j++) {
            printf("Worker Thread = %d, j = %d ms\n", omp_get_thread_num(), j);
        }
    }
}

/*
Worker Thread = 0, j = 0 ms
Worker Thread = 0, j = 1 ms
Worker Thread = 0, j = 2 ms
Worker Thread = 0, j = 3 ms
Worker Thread = 0, j = 4 ms
Worker Thread = 0, j = 5 ms
Worker Thread = 0, j = 6 ms
Worker Thread = 0, j = 7 ms
Worker Thread = 2, j = 0 ms
Worker Thread = 2, j = 1 ms
Worker Thread = 2, j = 2 ms
Worker Thread = 2, j = 3 ms
Worker Thread = 2, j = 4 ms
Worker Thread = 2, j = 5 ms
Worker Thread = 2, j = 6 ms
Worker Thread = 2, j = 7 ms
Worker Thread = 3, j = 0 ms
Worker Thread = 3, j = 1 ms
Worker Thread = 3, j = 2 ms
Worker Thread = 3, j = 3 ms
Worker Thread = 3, j = 4 ms
Worker Thread = 3, j = 5 ms
Worker Thread = 1, j = 0 ms
Worker Thread = 1, j = 1 ms
Worker Thread = 1, j = 2 ms
Worker Thread = 1, j = 3 ms
Worker Thread = 1, j = 4 ms
Worker Thread = 1, j = 5 ms
Worker Thread = 1, j = 6 ms
Worker Thread = 1, j = 7 ms
Worker Thread = 3, j = 6 ms
Worker Thread = 3, j = 7 ms
*/

reduction

error case

void test5_error()
{
    int array[8] = {0,1,2,3,4,5,6,7};

    int sum = 0;
    omp_set_num_threads(4);
    //#pragma omp parallel for reduction(+:sum)
    #pragma omp parallel for // ERROR: unsynchronized updates to the shared `sum`
    for (int i = 0; i < 8; i++) {
        sum += array[i];
        printf("Worker Thread = %d, sum = %d ms\n", omp_get_thread_num(), sum);
    }
    printf("Main Thread = %d, sum = %d ms\n", omp_get_thread_num(), sum);
}
/*
// ERROR RESULT
Worker Thread = 0, sum = 0 ms
Worker Thread = 0, sum = 9 ms
Worker Thread = 3, sum = 8 ms
Worker Thread = 3, sum = 16 ms
Worker Thread = 1, sum = 2 ms
Worker Thread = 1, sum = 19 ms
Worker Thread = 2, sum = 4 ms
Worker Thread = 2, sum = 24 ms
Main Thread = 0, sum = 24 ms
*/

reduction(+:sum)

void test5_fix()
{
    int array[8] = {0,1,2,3,4,5,6,7};

    int sum = 0;
    /*
    sum must be private for the loop to parallelize, yet it must be shared to
    produce the correct result; neither shared nor private alone works.
    To solve this, OpenMP provides the reduction clause: internally, each thread
    gets a private sum (initialized to 0), and when the threads finish, OpenMP
    adds the per-thread private sums together to produce the final result.
    */
    omp_set_num_threads(4);
    #pragma omp parallel for reduction(+:sum)
    //#pragma omp parallel for // ERROR
    for (int i = 0; i < 8; i++) {
        sum += array[i];
        printf("Worker Thread = %d, sum = %d ms\n", omp_get_thread_num(), sum);
    }
    printf("Main Thread = %d, sum = %d ms\n", omp_get_thread_num(), sum);
}

/*
Worker Thread = 0, sum = 0 ms
Worker Thread = 0, sum = 1 ms
Worker Thread = 1, sum = 2 ms
Worker Thread = 1, sum = 5 ms
Worker Thread = 3, sum = 6 ms
Worker Thread = 3, sum = 13 ms
Worker Thread = 2, sum = 4 ms
Worker Thread = 2, sum = 9 ms
Main Thread = 0, sum = 28 ms
*/
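
The printed values are per-thread partial sums: thread 0 accumulates 0+1 = 1, thread 1 accumulates 2+3 = 5, thread 2 accumulates 4+5 = 9, thread 3 accumulates 6+7 = 13, and the reduction combines them into 1+5+9+13 = 28 in the main thread.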

num_threads

void test6()
{
    // `num_threads(4)` same as `omp_set_num_threads(4)`
    #pragma omp parallel num_threads(4)
    {
        printf("Hello, I am Thread %d\n", omp_get_thread_num()); // 0,1,2,3
    }
}
/*
Hello, I am Thread 0
Hello, I am Thread 2
Hello, I am Thread 3
Hello, I am Thread 1
*/

schedule

(static,2)

void test7_1()
{
    omp_set_num_threads(4);
    // static: chunks of 2 iterations are handed round-robin to the 4 threads
    #pragma omp parallel for schedule(static,2)
    for (int i = 0; i < 8; i++)
    {
        printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
    }
}
/*
i = 2, I am Thread 1
i = 3, I am Thread 1
i = 6, I am Thread 3
i = 7, I am Thread 3
i = 4, I am Thread 2
i = 5, I am Thread 2
i = 0, I am Thread 0
i = 1, I am Thread 0
*/

(static,4)

void test7_2()
{
    omp_set_num_threads(4);
    // static: chunks of 4 iterations; with 8 iterations there are only 2 chunks
    #pragma omp parallel for schedule(static,4)
    for (int i = 0; i < 8; i++)
    {
        printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
    }
}
/*
i = 0, I am Thread 0
i = 1, I am Thread 0
i = 2, I am Thread 0
i = 3, I am Thread 0
i = 4, I am Thread 1
i = 5, I am Thread 1
i = 6, I am Thread 1
i = 7, I am Thread 1
*/
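
With a chunk size of 4 and only 8 iterations there are just two chunks, so threads 2 and 3 receive no work at all; the chunk size should be chosen so that every thread gets something to do.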

(dynamic,1)

void test7_3()
{
    omp_set_num_threads(4);
    // dynamic: each thread grabs the next single iteration from a work queue
    #pragma omp parallel for schedule(dynamic,1)
    for (int i = 0; i < 8; i++)
    {
        printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
    }
}
/*
i = 0, I am Thread 2
i = 4, I am Thread 2
i = 5, I am Thread 2
i = 6, I am Thread 2
i = 7, I am Thread 2
i = 3, I am Thread 3
i = 1, I am Thread 0
i = 2, I am Thread 1
*/

(dynamic,3)

void test7_4()
{
    omp_set_num_threads(4);
    // dynamic: each thread grabs the next chunk of 3 iterations from a work queue
    #pragma omp parallel for schedule(dynamic,3)
    for (int i = 0; i < 8; i++)
    {
        printf("i = %d, I am Thread %d\n", i, omp_get_thread_num());
    }
}
/*
i = 0, I am Thread 0
i = 1, I am Thread 0
i = 2, I am Thread 0
i = 6, I am Thread 2
i = 7, I am Thread 2
i = 3, I am Thread 1
i = 4, I am Thread 1
i = 5, I am Thread 1
*/

schedule compare

#define NUM 100000000

int isprime(int x)
{
    for (int y = 2; y * y <= x; y++)
    {
        if (x % y == 0)
            return 0;
    }
    return 1;
}

void test8()
{
    int sum = 0;

    #pragma omp parallel for reduction(+:sum) schedule(dynamic,1)
    for (int i = 2; i <= NUM; i++)
    {
        sum += isprime(i);
    }

    printf("Number of prime numbers: %d", sum);
}

no schedule

Number of prime numbers: 5761455./demo_openmp  151.64s user 0.04s system 582% cpu 26.048 total

schedule(static,1)

Number of prime numbers: 5761455./demo_openmp  111.13s user 0.00s system 399% cpu 27.799 total

schedule(dynamic,1)

Number of prime numbers: 5761455./demo_openmp  167.22s user 0.02s system 791% cpu 21.135 total

schedule(dynamic,200)

Number of prime numbers: 5761455./demo_openmp  165.96s user 0.02s system 791% cpu 20.981 total
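
The dynamic schedules reach the highest CPU utilization (791% vs. 399-582%) and the shortest wall time: the cost of isprime(i) varies a lot with i, so with a work queue the threads that draw cheap iterations simply come back for more instead of idling.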

OpenCV with OpenMP

see how-opencv-use-openmp-thread-to-get-performance

Three types of OpenCV implementation

  • sequential implementation: default (slowest)
  • parallel implementation: OpenMP / TBB
  • GPU implementation: CUDA(fastest) / OpenCL

With CMake-gui, Building OpenCV with the WITH_OPENMP flag means that the internal functions will use OpenMP to parallelize some of the algorithms, like cvCanny, cvSmooth and cvThreshold.

In OpenCV, an algorithm can have a sequential (slowest) implementation; a parallel implementation using OpenMP or TBB; and a GPU implementation using OpenCL or CUDA(fastest). You can decide with the WITH_XXX flags which version to use.

Of course, not every algorithm can be parallelized.

Now, if you want to parallelize your methods with OpenMP, you have to implement it yourself.

concepts

avoiding extra copying

from improving-image-processing-speed

One important way to increase speed in OpenCV is related neither to the processor nor to the algorithm: avoid extra copying when dealing with matrices. An example taken from the documentation:

“…by constructing a header for a part of another matrix. It can be a single row, single column, several rows, several columns, rectangular region in the matrix (called a minor in algebra) or a diagonal. Such operations are also O(1), because the new header will reference the same data. You can actually modify a part of the matrix using this feature, e.g.”
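
A minimal sketch (my addition; the file name and ROI size are illustrative) of such an O(1) header: the ROI shares storage with the parent matrix, so writing through it modifies the original image and no pixels are copied until you explicitly clone:

cv::Mat img = cv::imread("rgb0.jpg", CV_LOAD_IMAGE_GRAYSCALE);
cv::Mat roi = img(cv::Rect(0, 0, 100, 100)); // O(1): new header, same underlying data
roi.setTo(255);                              // whitens the top-left 100x100 block of `img` itself
cv::Mat copy = roi.clone();                  // only clone() actually copies the pixels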

parallel for

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/features2d/features2d.hpp"
#include <iostream>
#include <vector>
#include <omp.h>

void opencv_vector()
{
    int imNum = 2;
    std::vector<cv::Mat> imVec(imNum);
    std::vector<std::vector<cv::KeyPoint>> keypointVec(imNum);
    std::vector<cv::Mat> descriptorsVec(imNum);

    cv::Ptr<cv::ORB> detector = cv::ORB::create();
    cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create("BruteForce-Hamming");

    std::vector<cv::DMatch> matches;
    char filename[100];
    double t1 = omp_get_wtime();

    //#pragma omp parallel for // uncomment to load/detect/compute the images in parallel
    for (int i = 0; i < imNum; i++) {
        sprintf(filename, "rgb%d.jpg", i);
        imVec[i] = cv::imread(filename, CV_LOAD_IMAGE_GRAYSCALE);
        detector->detect(imVec[i], keypointVec[i]);
        detector->compute(imVec[i], keypointVec[i], descriptorsVec[i]);
        std::cout << "find " << keypointVec[i].size() << " keypoints in im" << i << std::endl;
    }

    double t2 = omp_get_wtime();
    std::cout << "time: " << t2 - t1 << std::endl;

    matcher->match(descriptorsVec[0], descriptorsVec[1], matches); // uchar descriptor Mat

    cv::Mat img_matches;
    cv::drawMatches(imVec[0], keypointVec[0], imVec[1], keypointVec[1], matches, img_matches);
    cv::namedWindow("Matches", CV_WINDOW_AUTOSIZE);
    cv::imshow("Matches", img_matches);
    cv::waitKey(0);
}

parallel sections

#pragma omp parallel sections
{
    #pragma omp section
    {
        std::cout << "processing im0" << std::endl;
        im0 = cv::imread("rgb0.jpg", CV_LOAD_IMAGE_GRAYSCALE);
        detector.detect(im0, keypoints0);
        extractor.compute(im0, keypoints0, descriptors0);
        std::cout << "find " << keypoints0.size() << " keypoints in im0" << std::endl;
    }

    #pragma omp section
    {
        std::cout << "processing im1" << std::endl;
        im1 = cv::imread("rgb1.jpg", CV_LOAD_IMAGE_GRAYSCALE);
        detector.detect(im1, keypoints1);
        extractor.compute(im1, keypoints1, descriptors1);
        std::cout << "find " << keypoints1.size() << " keypoints in im1" << std::endl;
    }
}
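
Note that this fragment calls the shared detector and extractor objects from two threads at once; OpenCV does not guarantee that every detector implementation is thread-safe, so using one detector instance per section is the safer pattern.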

Reference

History

  • 20190403: created.

Guide

nvidia-smi

> nvidia-smi
Thu Mar 21 09:41:18 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   65C    P0    30W /  N/A |    538MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

nvidia-ml-py

This is a wrapper around the NVML library.

Python methods wrap NVML functions, implemented in a C shared library.

Each function is used in the same way as its C counterpart, with one exception: instead of returning error codes, failures are raised as Python exceptions.

pip install nvidia-ml-py2 --user
pip install nvidia-ml-py3 --user

demo

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# pip install nvidia-ml-py3 --user

import pynvml

try:
    pynvml.nvmlInit()
except pynvml.NVMLError as error:
    print(error)
    # "Driver Not Loaded": the driver is not installed or is broken
    # "Insufficient Permissions": not running with administrator privileges
    # e.g. pynvml.NVMLError_DriverNotLoaded: Driver Not Loaded
    exit()

try:
    print(pynvml.nvmlDeviceGetCount())
except pynvml.NVMLError as error:
    print(error)

print(pynvml.nvmlDeviceGetCount())          # total gpu count = 1
print(pynvml.nvmlSystemGetDriverVersion())  # 396.54

GPU_ID = 0
handle = pynvml.nvmlDeviceGetHandleByIndex(GPU_ID)
print(pynvml.nvmlDeviceGetName(handle))     # GeForce GTX 1060

meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
MB_SIZE = 1024*1024
print(meminfo.total/MB_SIZE)  # 6078 MB
print(meminfo.used/MB_SIZE)   # 531 MB
print(meminfo.free/MB_SIZE)   # 5546 MB

pynvml.nvmlShutdown()
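
As a small extension sketch (my addition; nvmlDeviceGetUtilizationRates is part of NVML, but check that your pynvml version exposes it), the same handle can also report instantaneous utilization, queried before nvmlShutdown():

util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(util.gpu)     # GPU utilization, percent
print(util.memory)  # memory-controller utilization, percent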

Reference

History

  • 20190321: created.

Guide

MeanShift

  • python: git clone https://github.com/mattnedrich/MeanShift_py.git
  • cpp: git clone https://github.com/mattnedrich/MeanShift_cpp.git

cpp compile

cd MeanShift_cpp
mkdir build && cd build && cmake .. && make -j8

./MeanShift_cpp

Visualization for linux

sudo apt-get install gnuplot gnuplot-qt

gnuplot
plot 'test.csv' with points, 'result.csv' with points

python demo

import mean_shift as ms
import matplotlib.pyplot as plt
import numpy as np

def ms_cluster(data):
    # case(1) demo: kernel_bandwidth = 3.0, cluster_epsilon = 6
    # case(2) laneseg: kernel_bandwidth = 0.5, cluster_epsilon = 2
    mean_shifter = ms.MeanShift()
    mean_shift_result = mean_shifter.cluster(data, kernel_bandwidth=3, cluster_epsilon=6)
    return mean_shift_result

def sklearn_cluster(data):
    from sklearn.cluster import MeanShift
    from sklearn.cluster import estimate_bandwidth

    bandwidth = estimate_bandwidth(data, quantile=0.2, n_samples=data.shape[0])
    #print("bandwidth=", bandwidth) # 3
    mean_shifter = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    mean_shifter.fit(data)

    # get same results
    original_points = data
    cluster_centers = mean_shifter.cluster_centers_
    cluster_ids = mean_shifter.labels_

    mean_shift_result = ms.MeanShiftResult(original_points, cluster_centers, cluster_ids)
    return mean_shift_result

def cluster_api(data, use_sklearn=True):
    if use_sklearn:
        return sklearn_cluster(data)
    else:
        return ms_cluster(data)

def print_cluster_result(mean_shift_result):
    print("Original Point Shifted Point Cluster ID")
    print("============================================")
    for i in range(len(mean_shift_result.original_points)): # 125
        original_point = mean_shift_result.original_points[i] # 125
        cluster_id = mean_shift_result.cluster_ids[i] # 125 value=0,1,2
        cluster_center = mean_shift_result.cluster_centers[cluster_id] # 3

        print(
            "(%5.2f,%5.2f) -> (%5.2f,%5.2f) cluster %i" %
            (original_point[0], original_point[1],
             cluster_center[0], cluster_center[1],
             cluster_id)
        )
    print("============================================")

def main():
    use_sklearn = True
    data = np.genfromtxt('data.csv', delimiter=',')
    print("data.shape=", data.shape)

    mean_shift_result = cluster_api(data, use_sklearn)
    #print_cluster_result(mean_shift_result)

    original_points = mean_shift_result.original_points # (125, 2)
    cluster_centers = mean_shift_result.cluster_centers # (3, 2)
    cluster_ids = mean_shift_result.cluster_ids # (125,) value=[0,1,2]

    unique_ids = np.unique(cluster_ids) # (3,) value=[0,1,2]

    print("original_points.shape=", original_points.shape) # (125, 2)
    print(original_points[:10])

    print("cluster_centers.shape=", cluster_centers.shape) # (3, 2)
    print(cluster_centers)

    print("cluster_ids.shape=", cluster_ids.shape) # (125,)
    print(cluster_ids) # [0,0,0,...1,1,1,...,2,2,2,...] 0,1,2 cluster ids

    print("unique_ids.shape=", unique_ids.shape) # (3,)
    print(unique_ids) # 0,1,2

    x = original_points[:, 0]
    y = original_points[:, 1]

    fig = plt.figure()
    ax = fig.add_subplot(111)
    scatter = ax.scatter(x, y, c=cluster_ids, s=50)
    for cx, cy in cluster_centers:
        ax.scatter(cx, cy, s=50, c='red', marker='+')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    plt.colorbar(scatter)

    if use_sklearn:
        filename = "1_sklearn"
    else:
        filename = "2_ms"

    fig.savefig(filename)
    plt.show()
    print("OK " + filename)

if __name__ == "__main__":
    main()

meanshift_py

#===============================
# ms 
#===============================
('data.shape=', (125, 2))
('original_points.shape=', (125, 2))
[[10.91079039  8.38941202]
 [ 9.87500165  9.9092509 ]
 [ 7.8481223  10.4317483 ]
 [ 8.53412293  9.55908561]
 [10.38316846  9.61879086]
 [ 8.11061595  9.77471761]
 [10.02119468  9.53877962]
 [ 9.37705852  9.70853991]
 [ 7.67017034  9.60315231]
 [10.94308287 11.76207349]]
('cluster_centers.shape=', (3, 2))
[[-3.45216026  5.28851174]
 [ 5.02926925  3.56548696]
 [ 8.63149568  9.25488818]]
('cluster_ids.shape=', (125,))
[2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
('unique_ids.shape=', (3,))
[0 1 2]
OK 2_ms

(figure: scatter plot of the meanshift_py clustering result, saved as 2_ms)

sklearn

#===============================
# sklearn 
#===============================

('data.shape=', (125, 2))
('original_points.shape=', (125, 2))
[[10.91079039  8.38941202]
 [ 9.87500165  9.9092509 ]
 [ 7.8481223  10.4317483 ]
 [ 8.53412293  9.55908561]
 [10.38316846  9.61879086]
 [ 8.11061595  9.77471761]
 [10.02119468  9.53877962]
 [ 9.37705852  9.70853991]
 [ 7.67017034  9.60315231]
 [10.94308287 11.76207349]]
('cluster_centers.shape=', (3, 2))
[[ 4.79792283  3.01140269]
 [ 9.2548292  10.11312163]
 [-4.11368202  5.44826076]]
('cluster_ids.shape=', (125,))
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 0 2 2 2]
('unique_ids.shape=', (3,))
[0 1 2]
OK 1_sklearn

(figure: scatter plot of the sklearn clustering result, saved as 1_sklearn)

Reference

History

  • 20190318: created.