NVIDIA driver not loaded after GCC upgrade

We have a Lambda Labs workstation running Ubuntu 18.04.6 LTS (Bionic Beaver), and the NVIDIA driver stopped loading after a user upgraded GCC a few days ago. I have tried uninstalling and reinstalling the Lambda Stack for deep learning, which is supposed to cover the NVIDIA drivers along with all the deep learning modules like CUDA, TensorFlow and PyTorch.

I have been researching this issue for a couple of days, and any help would be greatly appreciated. I have checked that Secure Boot is disabled and that nvidia is not blacklisted in modprobe.d, and I have tried other suggestions from other posts, but nothing has worked so far. As far as I can tell, the compiled driver is 515.65.01 and GCC is 9.4.0.
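The checks listed above (Secure Boot, modprobe.d blacklist, loaded driver version, GCC version) can be gathered in one read-only pass. This is a sketch, assuming an Ubuntu system where mokutil, dkms and gcc may or may not be installed; the helpers section, run_if_present and nvidia_diag are hypothetical names, not part of any NVIDIA or Lambda tool.

```shell
#!/usr/bin/env bash
# Read-only checks for why the nvidia kernel module might not load.
# Tools that are not installed are reported and skipped.

section() { printf '\n== %s ==\n' "$1"; }

run_if_present() {
  # Run "$@" only if the executable exists; never abort the report.
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || true
  else
    echo "($1 not installed)"
  fi
}

nvidia_diag() {
  section "Secure Boot"
  run_if_present mokutil --sb-state

  section "nvidia blacklist entries in /etc/modprobe.d"
  # Matches nvidia, nvidia_drm, nvidia_uvm, ... but not nvidiafb,
  # which Ubuntu blacklists by default.
  grep -rnE '^[[:space:]]*blacklist[[:space:]]+nvidia([_-][[:alnum:]_]*)?[[:space:]]*$' \
    /etc/modprobe.d/ 2>/dev/null || echo "(none found)"

  section "Loaded driver"
  if [ -r /proc/driver/nvidia/version ]; then
    cat /proc/driver/nvidia/version
  else
    echo "(nvidia module not loaded)"
  fi

  section "DKMS build status"
  run_if_present dkms status

  section "Kernel and compiler"
  echo "Running kernel: $(uname -r)"
  run_if_present gcc --version
}

nvidia_diag
```

Because every step is guarded, the output can be pasted into a forum post even when some tools are missing.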

I found a similar case on the NVIDIA Developer Forum (Unable to load Nvidia Driver for Ubuntu 20.04 LTS - #3 by pratheek.ponnuru - Linux - NVIDIA Developer Forums), and the solution was to recompile the driver against the standard kernel headers by installing the generic HWE kernel with:

sudo apt install --install-recommends linux-generic-hwe-20.04

I imagine I would need to change 20.04 to 18.04 to match my Ubuntu version. Does that look like a possible solution?
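The HWE metapackage name carries the release number, so on 18.04 the equivalent of the command above would use linux-generic-hwe-18.04. A minimal sketch that derives the name from /etc/os-release; hwe_pkg_for and current_release are hypothetical helper names, and the apt command is only printed, not run, so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Derive the HWE kernel metapackage for an Ubuntu release,
# e.g. 18.04 -> linux-generic-hwe-18.04.

hwe_pkg_for() {
  printf 'linux-generic-hwe-%s\n' "$1"
}

current_release() {
  # Source os-release in a subshell so its variables do not leak;
  # prints nothing on non-Ubuntu systems.
  ( . /etc/os-release 2>/dev/null && [ "${ID:-}" = ubuntu ] && printf '%s\n' "${VERSION_ID:-}" )
}

release="$(current_release || true)"
echo "Detected Ubuntu release: ${release:-unknown}"
if [ -n "$release" ]; then
  # Printed, not executed, so it can be reviewed before running.
  echo "sudo apt install --install-recommends $(hwe_pkg_for "$release")"
fi
```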

Thanks,


I would recommend sending an email to ‘support@lambdal.com’
with the ‘nvidia-bug-report.log.gz’ from ‘sudo nvidia-bug-report.sh’
and the ‘dpkg.txt’ from ‘sudo dpkg --list > dpkg.txt’.

Also, if you can, please post the output of:
$ sudo apt-get update && sudo apt-get upgrade

It should show any package conflicts.
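The files requested above can be collected into one folder before emailing. A sketch under these assumptions: nvidia-bug-report.sh writes nvidia-bug-report.log.gz into the current directory, dpkg --list needs no root, and apt-get -s upgrade (a simulation that installs nothing) stands in as a preview of the upgrade output. Run sudo apt-get update first so the package lists are current. The folder name lambda-support and the function collect_support_files are hypothetical.

```shell
#!/usr/bin/env bash
# Collect diagnostics for a support ticket into one directory.
# Missing tools are skipped and recorded in manifest.txt.

collect_support_files() {
  local outdir="$1"
  local manifest="$outdir/manifest.txt"
  mkdir -p "$outdir"
  : > "$manifest"

  # nvidia-bug-report.sh needs root and writes into the current directory.
  if command -v nvidia-bug-report.sh >/dev/null 2>&1; then
    if (cd "$outdir" && sudo nvidia-bug-report.sh); then
      echo "nvidia-bug-report.log.gz" >> "$manifest"
    else
      echo "failed: nvidia-bug-report.sh" >> "$manifest"
    fi
  else
    echo "skipped: nvidia-bug-report.sh not found" >> "$manifest"
  fi

  # Full list of installed packages and their versions.
  if command -v dpkg >/dev/null 2>&1; then
    dpkg --list > "$outdir/dpkg.txt" && echo "dpkg.txt" >> "$manifest"
  else
    echo "skipped: dpkg not found" >> "$manifest"
  fi

  # Simulated upgrade: lists what would change, including kept-back packages.
  if command -v apt-get >/dev/null 2>&1; then
    apt-get -s upgrade > "$outdir/apt-upgrade-simulation.txt" 2>&1 || true
    echo "apt-upgrade-simulation.txt" >> "$manifest"
  else
    echo "skipped: apt-get not found" >> "$manifest"
  fi
}

collect_support_files "lambda-support"
cat lambda-support/manifest.txt
```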

In the past, a common issue was that the driver build would only work with specific GCC versions, so as long as GCC is the default version that ships with your Ubuntu LTS release, it should work. (The linux-generic-hwe-* package should only be the one for your Ubuntu version, e.g. linux-generic-hwe-18.04 on 18.04.)
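One way to check for the GCC mismatch described above is to compare the GCC version recorded in /proc/version (the compiler that built the running kernel) with the default gcc. A sketch, assuming /proc/version looks like one of the example formats in the comments; kernel_gcc_version and same_major are hypothetical helpers, and a major-version mismatch is only a hint, not proof of the cause.

```shell
#!/usr/bin/env bash
# Compare the GCC that built the running kernel with the default gcc.

kernel_gcc_version() {
  # $1 is the contents of /proc/version. Two example formats are handled:
  #   older: "... (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) ..."
  #   newer: "... (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-...) 12.3.0, GNU ld ..."
  local v
  v="$(printf '%s\n' "$1" | sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p')"
  if [ -z "$v" ]; then
    v="$(printf '%s\n' "$1" | sed -n 's/.*gcc[-0-9]* ([^)]*) \([0-9][0-9.]*\).*/\1/p')"
  fi
  printf '%s\n' "$v"
}

same_major() {
  # True if two dotted versions share the same major number.
  [ "${1%%.*}" = "${2%%.*}" ]
}

kver="$(kernel_gcc_version "$(cat /proc/version 2>/dev/null)")"
cver=""
if command -v gcc >/dev/null 2>&1; then
  # -dumpfullversion prints X.Y.Z on GCC 7+; -dumpversion is the fallback.
  cver="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
fi

echo "Kernel built with GCC: ${kver:-unknown}"
echo "Default gcc:           ${cver:-not installed}"
if [ -n "$kver" ] && [ -n "$cver" ] && ! same_major "$kver" "$cver"; then
  echo "Major versions differ; modules built by DKMS may fail to build or load."
fi
```

On the machine in this thread, a report like "7.5.0" for the kernel against "9.4.0" for gcc would match the situation described below.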

Hi Mark,

Thank you for the input. The issue was resolved last Sunday with help from someone on the NVIDIA Developer Forum (Ubuntu 18.04 NVIDIA driver not loaded after GCC update - #9 by generix - Linux - NVIDIA Developer Forums). I had to set NVreg_ModifyDeviceFiles=1 in /etc/modprobe.d/50_nvidia.conf, update the initramfs, and reboot, and now nvidia-smi is working correctly.
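The fix described above can be scripted so it is safe to re-run. A sketch assuming the standard modprobe.d syntax (options <module> <parameter>=<value>) with the module named nvidia; the exact contents of the 50_nvidia.conf used here were not posted. It writes to a scratch file by default; set CONF=/etc/modprobe.d/50_nvidia.conf and run as root to apply it. If that file already sets NVreg_ModifyDeviceFiles=0, remove that setting first so the two do not conflict.

```shell
#!/usr/bin/env bash
# Add the nvidia module option to a modprobe.d file, idempotently.
# CONF defaults to a scratch path so the sketch can be tried safely.
CONF="${CONF:-./50_nvidia.conf}"
OPTION_LINE="options nvidia NVreg_ModifyDeviceFiles=1"

touch "$CONF"
# Append only if the exact line is not already present.
if ! grep -qxF "$OPTION_LINE" "$CONF"; then
  printf '%s\n' "$OPTION_LINE" >> "$CONF"
fi
cat "$CONF"

# Then, on the real system:
#   sudo update-initramfs -u
#   sudo reboot
#   nvidia-smi
```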

The cause might be what you described regarding the GCC version: it all started when the user updated GCC from the default 7.5 to 9.4.

Thanks,