Hyperplane Setup

Mellanox OFED, NVIDIA Peer Memory, NVIDIA Driver, CUDA, cuDNN & Docker

Download the Mellanox OFED tarball
and copy it to /tmp. We’ll need it later:

cp MLNX_OFED_LINUX-5.1-2.3.7.1-ubuntu18.04-x86_64.tgz /tmp

NVIDIA Peer Memory

Clone the NVIDIA Peer Memory repository & compile it:

git clone https://github.com/Mellanox/nv_peer_memory -b 1.0-9
cd nv_peer_memory
./build_module.sh
mv /tmp/nvidia-peer-memory_1.0.orig.tar.gz .
tar xzf nvidia-peer-memory_1.0.orig.tar.gz
cd nvidia-peer-memory-1.0
dpkg-buildpackage -us -uc

Then copy the resulting .deb packages to /tmp as well:

sudo cp ../*.deb /tmp
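
Before running the installer, it can help to confirm that both artifacts actually landed in /tmp. The following is a minimal sketch, not part of the original procedure: missing_staged_files is a hypothetical helper, and the two file patterns are assumptions based on the file names used above.

```shell
#!/bin/sh
# Print every pattern that matches no file in a staging directory.
# Prints nothing when all patterns are satisfied.
# Usage: missing_staged_files DIR PATTERN...
missing_staged_files() {
    dir=$1
    shift
    for pattern in "$@"; do
        found=no
        # $pattern is left unquoted on purpose so the glob expands in DIR.
        for f in "$dir"/$pattern; do
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$pattern"
        fi
    done
}

# Assumed patterns: the OFED tarball and the peer-memory packages.
missing=$(missing_staged_files /tmp 'MLNX_OFED_LINUX-*.tgz' 'nvidia-peer-memory*.deb')
if [ -n "$missing" ]; then
    echo "Missing from /tmp:"
    echo "$missing"
else
    echo "All expected files are staged in /tmp."
fi
```

If anything is reported missing, repeat the corresponding copy step before continuing.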

Next, download the hyperplane_install.sh script to /tmp and run it:

cd /tmp
wget files.lambdalabs.com/scripts/hyperplane_install.sh
chmod +x hyperplane_install.sh
sudo ./hyperplane_install.sh

Fabric Manager

The Fabric Manager is available in NVIDIA’s machine learning repo (if you’ve run hyperplane_install.sh above, the repo is already added).

Once you add the repos & trust the keys, you can install the Fabric Manager like so:

sudo apt -y install nvidia-fabricmanager-450
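
The -450 suffix in the package name corresponds to the NVIDIA driver branch, and the Fabric Manager needs to match the installed driver. As a rough sketch of how the package name could be derived on a machine with a different driver, assuming nvidia-smi is on the PATH (fabricmanager_pkg is a hypothetical helper, not something the install script provides):

```shell
#!/bin/sh
# Map a driver version string (e.g. "450.80.02") to the Fabric Manager
# package name for that driver branch (e.g. "nvidia-fabricmanager-450").
fabricmanager_pkg() {
    driver_version=$1
    branch=${driver_version%%.*}   # keep everything before the first dot
    printf 'nvidia-fabricmanager-%s\n' "$branch"
}

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ -n "$version" ]; then
        echo "Suggested package: $(fabricmanager_pkg "$version")"
    fi
fi
```

After installation, the service can typically be started and enabled at boot with sudo systemctl enable --now nvidia-fabricmanager.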

DCGM

You can download the Data Center GPU Manager .deb file from NVIDIA (requires an NVIDIA account). Once downloaded, install it with:

sudo dpkg -i datacenter-gpu-manager_2.0.13_amd64.deb

Apt Mirrors

apt-mirror.sh configures nginx and apt-mirror.

wget files.lambdalabs.com/scripts/apt-mirror.sh
sh apt-mirror.sh

apt-sources.sh installs repos for the client machine. Pass it the hostname or IP of your apt-mirror server:

wget files.lambdalabs.com/scripts/apt-sources.sh
sh apt-sources.sh <apt-mirror host>

Apt

Adding these is unnecessary if you’ve run hyperplane_install.sh,
but we’re including them for clarity.

Repositories:

deb http://archive.ubuntu.com/ubuntu bionic main restricted universe multiverse
deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /
deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /
deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
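
One thing to watch when adding these by hand: the $(ARCH) placeholder is meant for apt to expand, but an unquoted $(ARCH) in a shell command is treated as a command substitution. A minimal sketch of writing the three container repo lines safely is below. The function name and the scratch-file approach are illustrative assumptions, not what hyperplane_install.sh does.

```shell
#!/bin/sh
# Write the container-related repository lines to the given file.
# Usage: write_nvidia_container_sources FILE
write_nvidia_container_sources() {
    target=$1
    # The quoted heredoc delimiter ('EOF') keeps the shell from running
    # $(ARCH) as a command, so the placeholder reaches the file literally.
    cat > "$target" <<'EOF'
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
EOF
}

# Write to a scratch file first so the result can be inspected.
scratch=$(mktemp)
write_nvidia_container_sources "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Once the output looks right, the same function can write to a file under /etc/apt/sources.list.d/ (for example nvidia-docker.list, run as root), followed by sudo apt update.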

Public keys for the repos:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
sudo apt-key adv --fetch-keys "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub"