CUDA error after sudo apt-get update. UserWarning: CUDA initialization: CUDA unknown error....return torch._C._cuda_getDeviceCount() > 0

Hello,
To set up a PyTorch and TensorFlow environment, I did a fresh install of Ubuntu 20.04 and installed the Lambda Stack. The installation went through without any errors, and nvidia-smi shows driver 460.73.01 and CUDA 11.2. However, I got the following error.

torch.cuda.is_available()

/usr/lib/python3/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at …/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
False

I found a similar issue in the topic "CUDA pytorch not working on a fresh install RTX 3070" (Technical Help, DeepTalk), so I did another fresh install of Ubuntu 20.04 and installed the Lambda Stack again.
This time torch.cuda.is_available() returned True and everything apparently worked fine for a while, but after running

sudo apt-get update && sudo apt-get dist-upgrade
sudo reboot

torch.cuda.is_available() threw the same error again (i.e., UserWarning: CUDA initialization: CUDA unknown error … return torch._C._cuda_getDeviceCount() > 0).

What can I do to fix this issue?

The following are the details of my environment.

PyTorch version: 1.8.1
CUDA used to build PyTorch: 11.1

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: GeForce GTX 960
Nvidia driver version: 460.73.01
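
For anyone trying to reproduce this, here is a minimal sketch that records the result of torch.cuda.is_available() together with any warning text, so both can be pasted into a report. The helper name cuda_status is hypothetical (it is not part of PyTorch), and the function simply reports that torch is missing if it cannot be imported.

```python
import warnings


def cuda_status():
    """Return whether torch sees a GPU, plus any warnings raised while checking.

    `cuda_status` is a hypothetical helper for illustration, not a PyTorch API.
    """
    try:
        import torch
    except ImportError:
        # torch is not installed in this interpreter; nothing more to check.
        return {"torch_installed": False, "cuda_available": False, "warnings": []}

    with warnings.catch_warnings(record=True) as caught:
        # Record warnings in `caught` instead of printing them to stderr.
        warnings.simplefilter("always")
        available = torch.cuda.is_available()

    return {
        "torch_installed": True,
        "cuda_available": bool(available),
        "warnings": [str(w.message) for w in caught],
    }


if __name__ == "__main__":
    print(cuda_status())
```

On a machine showing the problem above, I would expect cuda_available to be False, with the "CUDA unknown error" text in the warnings list if the warning is raised during that call.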

Disabling Secure Boot just before installing the Lambda Stack solved the issue.
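
If you want to check whether Secure Boot might be involved on your own machine, here is a rough sketch. It assumes the mokutil tool is installed and that `mokutil --sb-state` prints a line containing "SecureBoot enabled" or "SecureBoot disabled". It also lists the NVIDIA kernel modules currently shown in /proc/modules. The function names are hypothetical helpers. One possible reading, which this thread does not confirm, is that with Secure Boot enabled the kernel may refuse to load unsigned NVIDIA modules.

```python
import subprocess
from typing import List, Optional


def parse_sb_state(output: str) -> Optional[bool]:
    """Interpret `mokutil --sb-state` output; None means the state is unknown."""
    text = output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def loaded_nvidia_modules(proc_modules_text: str) -> List[str]:
    """Return module names starting with 'nvidia' from /proc/modules content."""
    names = [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]
    return [name for name in names if name.startswith("nvidia")]


def secure_boot_enabled() -> Optional[bool]:
    """Run mokutil if it is available; return None when it is missing."""
    try:
        result = subprocess.run(
            ["mokutil", "--sb-state"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return parse_sb_state(result.stdout + result.stderr)


if __name__ == "__main__":
    print("Secure Boot enabled:", secure_boot_enabled())
    try:
        with open("/proc/modules") as fh:
            print("Loaded NVIDIA modules:", loaded_nvidia_modules(fh.read()))
    except OSError:
        print("Could not read /proc/modules")
```

If Secure Boot is reported as enabled and the module list is empty or incomplete after a dist-upgrade, that would be consistent with the fix above, although it does not prove the cause.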