Pytorch sometimes fails to recognize GPU

sgh42 · September 2, 2020, 1:01am

I’m using PyTorch on a Tensorbook, and sometimes it fails to see the GPU. That is, it becomes true that torch.cuda.is_available() == False. So far, I’ve found that rebooting reliably fixes the problem, but that’s of course not a very satisfactory long-term approach. Some googling indicates that the usual way of fixing the problem is to change the versions of CUDA libraries or other components (by reinstalling them). I’d rather not do this as I’m using the lambda stack and I don’t want to then have a non-standard installation. Is anyone else seeing this? I can post versions of various components if that’s helpful.

sabalaba · September 10, 2020, 11:45pm

Have you done any other installations alongside your Lambda Stack install? You can always email support@lambdalabs.com to report a Lambda Stack bug. Not sure if it’s a bug on our end. Is it a clean Ubuntu 20.04 LTS + Lambda Stack install?

sgh42 · September 14, 2020, 5:14pm

Thanks. Yes, it’s a brand-new Tensorbook. I installed my project using conda so it has local (to the project) versions of python (3.7.7) and pytorch (1.6.0). My conda environment doesn’t do anything with CUDA (I rely on the pytorch official Docker images when I deploy to the cloud).

sabalaba · September 18, 2020, 9:12am

Do you have this problem when you’re outside of conda and use the Lambda Stack versions of pytorch?

sgh42 · September 18, 2020, 2:24pm

I do see the same with /usr/bin/python, is that the correct python? I’m not sure where Lamba Stack installs stuff.

sabalaba · September 19, 2020, 11:45pm

Can you try updating to the latest version of Lambda Stack with this command:

sudo apt-get update && sudo apt-get upgrade -y

sgh42 · September 23, 2020, 4:32pm

Thanks. I’ve done that, it didn’t immediately fix the problem (CUDA was still unavailable) so I rebooted. I’ll let you know if the problem reoccurs post-reboot.

sabalaba · September 27, 2020, 4:46am

Did this fix the problem for you? Would be great to know!

sgh42 · September 28, 2020, 3:24pm

Unfortunately, I did the updates and it didn’t fix the problem :(, I still have to reboot sometimes to “reactivate” CUDA wrt PyTorch.

Topic		Replies	Views
Lambda Tensorbook - Unable to recognize GPU with PyTorch Technical Help	9	2918	March 4, 2022
Lambda workstation gpu not recognized Technical Help	1	1548	March 4, 2022
CUDA pytorch not working on a fresh install RTX 3070 Technical Help	1	1721	March 10, 2021
Can not keep pace with Pytorch Technical Help	2	2022	March 4, 2022
CUDA error after sudo apt-get update. UserWarning: CUDA initialization: CUDA unknown error....return torch._C._cuda_getDeviceCount() > 0 Technical Help	1	2661	May 6, 2021

Pytorch sometimes fails to recognize GPU

Related topics