0%

how to install docker and nvidia-docker2 on ubuntu 16.04

Docker Guide

install docker

1
2
3
4
5
6
7
8
9
10
11
12
13
# step 1: install tools
sudo apt-get update
sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common

# step 2: install GPG
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

# Step 3: add apt repo
sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

# Step 4: install docker-ce
sudo apt-get -y update
sudo apt-get -y install docker-ce

install docker-ce for given version

1
2
3
4
5
6
7
# Step 1: search versions
# apt-cache madison docker-ce
# docker-ce | 17.03.1~ce-0~ubuntu-xenial | http://mirrors.aliyun.com/docker-ce/linux/ubuntu xenial/stable amd64 Packages
# docker-ce | 17.03.0~ce-0~ubuntu-xenial | http://mirrors.aliyun.com/docker-ce/linux/ubuntu xenial/stable amd64 Packages

# Step 2: install given version
# sudo apt-get -y install docker-ce=17.03.1~ce-0~ubuntu-xenial

test docker

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
sudo docker version
Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:24:56 2018
OS/Arch: linux/amd64
Experimental: false

Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:21 2018
OS/Arch: linux/amd64
Experimental: false

docker namespace

host

id
uid=1000(kezunlin) gid=1000(kezunlin) groups=1000(kezunlin),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare)

sudo docker images
sudo docker run -it --name kzl -v /home/kezunlin/workspace/:/home/kezunlin/workspace nvidia/cuda

container

root@6f167ef72a80:/home/kezunlin/workspace# ll
total 48
drwxrwxr-x 12 1000 1000 4096 Nov 30 10:04 ./
drwxr-xr-x  3 root root 4096 Nov 30 10:14 ../
drwxrwxr-x 10 1000 1000 4096 Dec  5  2017 MyGit/
drwxrwxr-x 12 1000 1000 4096 Oct 31 03:01 blog/
drwxrwxr-x  5 1000 1000 4096 Sep 20 07:33 opencv/
drwxrwxr-x  4 1000 1000 4096 Oct 31 07:55 openmp/
drwxrwxr-x  5 1000 1000 4096 Jan  9  2018 qt/
drwxrwxr-x  2 1000 1000 4096 Jan  4  2018 ros/
drwxrwxr-x  4 1000 1000 4096 Nov 16  2017 voc/
drwxrwxr-x  5 1000 1000 4096 Aug  7 03:19 vs/
root@6f167ef72a80:/home/kezunlin/workspace# touch 1.txt

root@6f167ef72a80:/home/kezunlin/workspace# id
uid=0(root) gid=0(root) groups=0(root)

host

ll /home/kezunlin/workspace/
total 48
drwxrwxr-x 12 kezunlin kezunlin 4096 11月 30 18:14 ./
drwxr-xr-x 47 kezunlin kezunlin 4096 11月 30 18:04 ../

-rw-r--r--  1 root     root        0 11月 30 18:14 1.txt

drwxrwxr-x 12 kezunlin kezunlin 4096 10月 31 11:01 blog/
drwxrwxr-x  5 kezunlin kezunlin 4096 9月  20 15:33 opencv/
drwxrwxr-x  4 kezunlin kezunlin 4096 10月 31 15:55 openmp/
drwxrwxr-x  5 kezunlin kezunlin 4096 1月   9  2018 qt/
drwxrwxr-x  2 kezunlin kezunlin 4096 1月   4  2018 ros/
drwxrwxr-x  4 kezunlin kezunlin 4096 11月 16  2017 voc/
drwxrwxr-x  5 kezunlin kezunlin 4096 8月   7 11:19 vs/

install nvidia-docker2

The machine running the CUDA container only requires the NVIDIA driver, the CUDA toolkit doesn’t have to be installed.
Host系统只需要安装NVIDIA driver即可运行CUDA container。

install

remove nvidia-docker 1.0

1
2
3
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

Add the package repositories

vim repo.sh

1
2
3
4
5
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list

run scripts

1
2
chmod +x repo.sh
./repo.sh

Install nvidia-docker2 and reload the Docker daemon configuration

1
2
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

test

1
sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Unable to find image 'nvidia/cuda:latest' locally
latest: Pulling from nvidia/cuda
8ee29e426c26: Pull complete
6e83b260b73b: Pull complete
e26b65fd1143: Pull complete
40dca07f8222: Pull complete
b420ae9e10b3: Pull complete
a579c1327556: Pull complete
b440bb8df79e: Pull complete
de3b2ccf9562: Pull complete
a69a544d350e: Pull complete
02348b5db71c: Pull complete
Digest: sha256:5996fa2fc0666972360502fe32118286177b879a8a1a834a176e7786021b8cee
Status: Downloaded newer image for nvidia/cuda:latest
Mon Sep 3 10:08:27 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 Off | 00000000:01:00.0 Off | N/A |
| N/A 59C P8 8W / N/A | 408MiB / 6072MiB | 40% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+

or by tty

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
sudo docker run --runtime=nvidia -t -i --privileged nvidia/cuda bash

root@8f3ebd5ecbb6:/# nvidia-smi
Tue Sep 4 01:26:31 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 Off | 00000000:01:00.0 Off | N/A |
| N/A 56C P0 31W / N/A | 374MiB / 6072MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+

Advanced Topics

Default runtime

The default runtime used by the Docker® Engine is runc, our runtime can become the default one by configuring the docker daemon with --default-runtime=nvidia. Doing so will remove the need to add the --runtime=nvidia argument to docker run. It is also the only way to have GPU access during docker build.

Environment variables

The behavior of the runtime can be modified through environment variables (such as NVIDIA_VISIBLE_DEVICES).
Those environment variables are consumed by nvidia-container-runtime and are documented here.
Our official CUDA images use default values for these variables.

docker command

1
2
3
4
5
6
7
8
sudo docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda latest 04a9ce0dec6d 3 weeks ago 1.96GB

sudo docker run -it --privileged nvidia/cuda bash


docker build --network=host -t anakin:$tag . -f $DockerfilePath

kubernetes with GPU

kubernetes 对于 GPU 的支持截止到 1.9 版本,算是经历了3个阶段:

  • kubernetes 1.3 版本开始支持GPU,但是只支持单个 GPU卡;

  • kubernetes 1.6 版本开始支持对多个GPU卡的支持;

  • kubernetes 1.8 版本以 device plugin 方式提供对GPU的支持。

    ls /dev/nvidia*
    /dev/nvidia0 /dev/nvidia2 /dev/nvidia4 /dev/nvidia6 /dev/nvidiactl
    /dev/nvidia1 /dev/nvidia3 /dev/nvidia5 /dev/nvidia7

  • Kubernetes 1.8~1.9,通过k8s-device-plugin 获取每个Node上GPU的信息,根据这些信息对GPU资源进行管理和调度。需要结合 nvidia-docker2 使用。

  • k8s-device-plugin也是由 nvidia 提供,在kubernetes中可以DaemonSet方式运行。

Reference

History

  • 20180903: created.