How do I run an H100 with Docker in sm_86 compatibility mode?

I’m trying to use a PyTorch-based container.
The NVIDIA container runtime wasn’t automatically available,
so I downloaded and installed it manually on my H100 instance.
I then rebooted and started the container with --runtime=nvidia.
However, I get a complaint that PyTorch doesn’t work on sm_90 hardware,
which seems weird – wouldn’t it be backwards compatible? What am I missing?

+ docker run --runtime=nvidia --rm -it --name train -v /mpt:/mpt -v /mpt/cache:/root/.cache train composer train.py observe-help.yaml

==========
== CUDA ==
==========

CUDA Version 11.7.1

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/usr/lib/python3/dist-packages/torch/cuda/__init__.py:155: UserWarning: 
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
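For context (my own summary, not from the warning itself): PyTorch wheels ship precompiled GPU kernels (cubins) only for the architectures listed in the warning, and cubins are not forward-compatible across major architectures, so a build that stops at sm_86 (Ampere) cannot drive an sm_90 (Hopper) GPU. A pure-Python sketch of that compatibility check, with the arch list hard-coded from the warning above:

```python
# Architectures baked into the PyTorch build, taken from the warning above.
BUILD_ARCHS = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]

def build_supports(device_cc, archs):
    """Crude model of cubin compatibility: a binary runs only on its exact
    major architecture, at the same or a newer minor revision."""
    n = device_cc.split("_")[1]
    dev_major, dev_minor = int(n[:-1]), int(n[-1])
    for arch in archs:
        a = arch.split("_")[1]
        major, minor = int(a[:-1]), int(a[-1])
        if major == dev_major and minor <= dev_minor:
            return True
    return False

print(build_supports("sm_90", BUILD_ARCHS))  # H100 (Hopper) -> False
print(build_supports("sm_86", BUILD_ARCHS))  # RTX 30-class Ampere -> True
```

This is why nvidia-smi can look perfectly healthy (the driver sees the GPU fine) while PyTorch still refuses to use it: the limitation is in the wheel's compiled kernels, not the driver.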

Although nvidia-smi is saying it’s working …

ubuntu@209-20-157-250:~$ nvidia-smi
Thu May 25 23:27:16 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA H100 PCIe    On   | 00000000:06:00.0 Off |                    0 |
| N/A   67C    P0   289W / 350W |  36667MiB / 81559MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3686      C   /usr/bin/python3.10             36664MiB |
+-----------------------------------------------------------------------------+

Confused.

And then, no – the run fails anyway:

  File "/usr/lib/python3/dist-packages/triton_pre_mlir/compiler.py", line 901, in _compile
    name, asm, shared_mem = _triton.code_gen.compile_ttir(backend, module, device, num_warps, num_stages, extern_libs, cc)
RuntimeError: Internal Triton PTX codegen error: 
ptxas fatal   : Value 'sm_90' is not defined for option 'gpu-name'
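My reading of this second failure: the ptxas error comes from the CUDA 11.7 toolkit inside the container, whose assembler predates Hopper and simply does not know the sm_90 target; sm_90 (and sm_89) support arrived in CUDA 11.8. A rough pure-Python sketch of that relationship, where the version table is my own summary rather than anything authoritative:

```python
# Approximate newest compute capability each CUDA toolkit's ptxas can target
# (my own summary; consult NVIDIA's release notes for the definitive list).
MAX_SM = {"11.6": 86, "11.7": 87, "11.8": 90, "12.0": 90}

def ptxas_can_target(toolkit, sm):
    return sm <= MAX_SM[toolkit]

print(ptxas_can_target("11.7", 90))  # False -> "Value 'sm_90' is not defined"
print(ptxas_can_target("11.8", 90))  # True
```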

Hello @jwatte,

For H100 instances, you will need CUDA 11.8 or newer.
I would suggest using the newest container found here:

You can also set up a Conda environment with PyTorch 2.0 + CUDA 11.8. I can send the instructions if you are interested.

Let me know if this works for you.

Hi @JosephM, can you please share the instructions for setting up a Conda env with PyTorch 2.0 + CUDA 11.8 on H100 when you get a chance? Thanks. I tried setting up a conda env on H100 but got the following message:

NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.

I checked the PyTorch version, which is 2.0.1. Thanks :slight_smile:

CUDA 11.8 worked. Thanks :)

Hello @artemis, thanks for confirming that CUDA 11.8 worked.

Here are the steps to set up a Conda environment with PyTorch 2.0 + CUDA 11.8:

Install Miniconda. You can run it without superuser privileges and keep it local. See the link here on how to download and configure it: Miniconda — conda documentation.
For my example I will set up Python 3.9, so here’s what I downloaded and ran. Just follow the prompts:

Python 3.9
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_23.3.1-0-Linux-x86_64.sh
chmod u+x Miniconda3-py39_23.3.1-0-Linux-x86_64.sh
bash Miniconda3-py39_23.3.1-0-Linux-x86_64.sh

Create your environment. I created a Python 3.9 environment named ‘test’. Make sure you activate your environment:

conda create -n test python=3.9
conda activate test

Install the required PyTorch packages. Note that we use the test environment’s ‘pip’ so that the packages are installed only within the environment we are setting up:

/home/ubuntu/miniconda3/envs/test/bin/pip install torch==2.0.0+cu118 torchaudio==2.0.0+cu118 torchvision==0.15.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
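One way to sanity-check which build you got (a small illustration of mine, not part of the setup steps): the +cu118 suffix in the wheel’s version string records the CUDA flavor it was built against, while the plain PyPI torch wheel carries no suffix:

```python
# Extract the CUDA local-version tag from a torch version string.
def cuda_tag(version):
    return version.split("+")[1] if "+" in version else "none (plain PyPI wheel)"

print(cuda_tag("2.0.0+cu118"))  # the build installed above -> cu118
print(cuda_tag("2.0.1"))        # no tag: CUDA flavor not encoded in the version
```

On a live environment the same check is just: python -c "import torch; print(torch.__version__)"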

Install the other CUDA packages:

conda install cudnn
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
conda install -c "nvidia/label/cuda-11.8.0" cuda-nvcc
conda install -c "nvidia/label/cuda-11.8.0" cuda-runtime

After that, test the PyTorch installation:

python -c 'import torch ; print("PyTorch Version: ",torch.__version__) ; print("Is available: ", torch.cuda.is_available()) ; print("Current Device: ", torch.cuda.current_device()) ; print("Pytorch CUDA Compiled version: ", torch._C._cuda_getCompiledVersion()) ; print("Pytorch version: ", torch.version) ; print("pytorch file: ", torch.__file__) ; print("Number of GPUs: ",torch.cuda.device_count())'

Here’s what mine looks like:

(test) ubuntu@209-20-158-254:~$ python -c 'import torch ; print("PyTorch Verison: ",torch.__version__) ; print("Is available: ", torch.cuda.is_available()) ; print("Current Device: ", torch.cuda.current_device()) ; print("Pytorch CUDA Compiled version: ", torch._C._cuda_getCompiledVersion()) ; print("Pytorch version: ", torch.version) ; print("pytorch file: ", torch.__file__) ; print("Number of GPUs: ",torch.cuda.device_count())'
PyTorch Verison:  2.0.0+cu118
Is available:  True
Current Device:  0
Pytorch CUDA Compiled version:  11080
Pytorch version:  <module 'torch.version' from '/home/ubuntu/miniconda3/envs/test/lib/python3.9/site-packages/torch/version.py'>
pytorch file:  /home/ubuntu/miniconda3/envs/test/lib/python3.9/site-packages/torch/__init__.py
Number of GPUs:  1

Hi, I am running into the same issue, shown below. I am using a conda env:

NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.

CUDA version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Packages installed:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
accelerate                0.23.0                   pypi_0    pypi
bitsandbytes              0.41.1                   pypi_0    pypi
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2023.08.22           h06a4308_0  
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
cmake                     3.27.5                   pypi_0    pypi
cudatoolkit               11.8.0               h6a678d5_0  
filelock                  3.12.4                   pypi_0    pypi
fsspec                    2023.9.1                 pypi_0    pypi
huggingface-hub           0.17.2                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
jinja2                    3.1.2                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
lit                       16.0.6                   pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  3.1                      pypi_0    pypi
numpy                     1.26.0                   pypi_0    pypi
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
openssl                   3.0.10               h7f8727e_2  
packaging                 23.1                     pypi_0    pypi
pip                       23.2.1          py310h06a4308_0  
protobuf                  4.24.3                   pypi_0    pypi
psutil                    5.9.5                    pypi_0    pypi
python                    3.10.13              h955ad1f_0  
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
regex                     2023.8.8                 pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
safetensors               0.3.3                    pypi_0    pypi
scipy                     1.11.2                   pypi_0    pypi
sentencepiece             0.1.99                   pypi_0    pypi
setuptools                68.0.0          py310h06a4308_0  
sqlite                    3.41.2               h5eee18b_0  
sympy                     1.12                     pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
tokenizers                0.13.3                   pypi_0    pypi
torch                     2.0.1                    pypi_0    pypi
tqdm                      4.66.1                   pypi_0    pypi
transformers              4.33.2                   pypi_0    pypi
triton                    2.0.0                    pypi_0    pypi
typing-extensions         4.8.0                    pypi_0    pypi
tzdata                    2023c                h04d1e81_0  
urllib3                   2.0.4                    pypi_0    pypi
wheel                     0.38.4          py310h06a4308_0  
xz                        5.4.2                h5eee18b_0  
zlib                      1.2.13               h5eee18b_0
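One observation about the listing above (my diagnosis, so treat it as a guess): the nvidia-*-cu11 11.7.* entries are the runtime libraries that the plain PyPI torch wheel pulls in, i.e. this looks like a cu117 build of torch 2.0.1, which tops out at sm_86 and therefore trips the same sm_90 warning even though CUDA 11.8’s nvcc is installed. A small pure-Python check over such a listing (lines hard-coded for illustration):

```python
# Scan a "conda list"-style dump for the telltale cu117 runtime dependency.
listing = """\
nvidia-cuda-runtime-cu11  11.7.99   pypi_0    pypi
torch                     2.0.1     pypi_0    pypi
"""

for line in listing.splitlines():
    name, version = line.split()[:2]
    if name == "nvidia-cuda-runtime-cu11" and version.startswith("11.7"):
        print("cu117 torch wheel detected; reinstall from the cu118 index")
```

If that diagnosis is right, uninstalling torch and reinstalling it with the cu118 extra index used earlier in the thread should pick up the cu118 build.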