I am developing models that require CUDA 12.1, but Lambda Stack seems to pin the CUDA version to 12.4. I have had no issues downgrading CUDA from 12.4 to 12.1 on an AWS EC2 instance. With Lambda Stack, however, some of the drivers seem to be pinned to a Lambda URL, and their versions are chosen automatically. Is there any way to use Lambda with a specific version of CUDA?
I clear all NVIDIA drivers with:
sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
sudo apt-get --purge remove "*nvidia*"
sudo apt-get --purge remove "libcuda*"
sudo apt-get remove --purge
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
I remove everything by hand because I cannot uninstall CUDA with the normal installer, since sudo is password-protected.
I try to install 12.1 with:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
but then I get:
cuda-drivers-530 : Depends: nvidia-dkms-530 (>= 530.30.02)
Depends: nvidia-kernel-common-530 (>= 530.30.02) but it is not installable
Depends: nvidia-kernel-source-530 (>= 530.30.02) but it is not installable or
nvidia-kernel-open-530 (>= 530.30.02) but it is not installable
Depends: nvidia-utils-530 (>= 530.30.02) but it is not installable
Depends: xserver-xorg-video-nvidia-530 (>= 530.30.02) but it is not installable
libnvidia-gl-550 : Depends: libnvidia-compute-550 (= 550.120-0lambda0.22.04.1) but it is not installable
nvidia-driver-550 : Depends: libnvidia-compute-550 (= 550.120-0lambda0.22.04.1) but it is not installable
Depends: nvidia-compute-utils-550 (= 550.120-0lambda0.22.04.1) but it is not installable
Depends: libnvidia-decode-550 (= 550.120-0lambda0.22.04.1) but it is not installable
Depends: libnvidia-encode-550 (= 550.120-0lambda0.22.04.1) but it is not installable
Depends: libnvidia-fbc1-550 (= 550.120-0lambda0.22.04.1) but it is not installable
Recommends: libnvidia-compute-550:i386 (= 550.120-0lambda0.22.04.1) but it is not installable
Recommends: libnvidia-decode-550:i386 (= 550.120-0lambda0.22.04.1) but it is not installable
Recommends: libnvidia-encode-550:i386 (= 550.120-0lambda0.22.04.1) but it is not installable
Recommends: libnvidia-extra-550:i386 (= 550.120-0lambda0.22.04.1) but it is not installable
Recommends: libnvidia-fbc1-550:i386 (= 550.120-0lambda0.22.04.1) but it is not installable
Recommends: libnvidia-gl-550:i386 (= 550.120-0lambda0.22.04.1) but it is not installable
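The first unmet dependency above comes from cuda-drivers-530, which the cuda metapackage pulls in alongside the toolkit. One thing that may be worth trying is NVIDIA's toolkit-only metapackage, which as far as I understand does not depend on a driver package. The sketch below only builds the package name and prints the command for review; the helper name toolkit_package is made up for illustration, and whether this coexists cleanly with the Lambda-provided 550 driver is an assumption I have not confirmed.

```shell
# Map a CUDA major.minor version such as "12.1" to the name of NVIDIA's
# toolkit-only metapackage (e.g. cuda-toolkit-12-1).
# Expects exactly major.minor; a patch component would produce a wrong name.
toolkit_package() {
    echo "cuda-toolkit-$(echo "$1" | tr '.' '-')"
}

# Print the install command instead of running it, so it can be reviewed first.
echo "sudo apt-get install -y $(toolkit_package 12.1)"
```

If the printed command looks right, it can be run by hand in place of the sudo apt-get -y install cuda step.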
So I gather that the instance comes with driver 550.120, and that doesn’t work with CUDA 12.1, so I try to install driver 530:
sudo apt install nvidia-driver-530
That works. I then try to remove 550 and reboot:
sudo apt autoremove
sudo reboot now
But driver 550 is still installed. Trying to change the driver from 550 to 535 or 550 seems to just install 550.
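To see what is actually still installed after the autoremove and reboot, and which of those packages are Lambda builds (the 0lambda0 suffix in the versions quoted in the error output), something like the following might help. It is a rough sketch: the function name tag_gpu_packages is made up, and treating "lambda" in the version string as the marker for a Lambda build is an assumption based only on the versions shown above.

```shell
# Filter "package version" lines (one per line on stdin) down to NVIDIA/CUDA
# packages, and tag those whose version string contains "lambda".
tag_gpu_packages() {
    grep -E '(nvidia|cuda|cublas)' | while read -r pkg ver; do
        case "$ver" in
            *lambda*) echo "$pkg $ver [lambda build]" ;;
            *)        echo "$pkg $ver" ;;
        esac
    done
}

# On a Debian/Ubuntu system, feed it the list of packages known to dpkg.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package} ${Version}\n' | tag_gpu_packages
fi
```

For any single package in that list, apt-cache policy nvidia-driver-550 shows which repository and pin priority is supplying the candidate version, and apt-mark showhold lists packages that are held.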
Alternatively, trying to install CUDA 12.1 directly without removing driver 550 gives an error saying that nvidia-dkms-530 could not be configured. Trying to configure it manually also fails:
sudo dpkg --configure nvidia-dkms-530
So I try to uninstall and reinstall the driver module with:
sudo dkms remove nvidia/530 --all
sudo dkms build nvidia/530
sudo dkms install nvidia/530
But that doesn’t work either.
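Since the 530 module will not build, it may be worth checking whether the driver that is already loaded is new enough for CUDA 12.1, in which case the driver might not need to change at all. This is a sketch under one stated assumption: that the minimum Linux driver listed in NVIDIA's release notes for CUDA 12.1 is 530.30.02, which should be verified against the current release notes. The names version_ge and check_driver are made up for illustration.

```shell
# Minimum Linux driver assumed for CUDA 12.1 (taken from NVIDIA's release
# notes as I remember them; verify before relying on it).
MIN_DRIVER_CUDA_12_1="530.30.02"

# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a given driver version meets the assumed minimum.
check_driver() {
    if version_ge "$1" "$MIN_DRIVER_CUDA_12_1"; then
        echo "driver $1 meets the CUDA 12.1 minimum ($MIN_DRIVER_CUDA_12_1)"
    else
        echo "driver $1 is older than the CUDA 12.1 minimum ($MIN_DRIVER_CUDA_12_1)"
    fi
}

# On a machine with a working driver, ask nvidia-smi for the loaded version.
if command -v nvidia-smi >/dev/null 2>&1; then
    check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
fi
```

The version of the kernel module that is actually loaded can also be read from /proc/driver/nvidia/version.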
I tried looking at the make.log with:
cat /var/lib/dkms/nvidia/530.30.02/build/make.log
but that just said:
var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-pat.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-vtophys.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-pci.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-usermap.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-vm.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-mmap.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-p2p.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-i2c.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv-procfs.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/os-interface.o] Error 1
make[3]: *** [scripts/Makefile.build:243: /var/lib/dkms/nvidia/530.30.02/build/nvidia/nv.o] Error 1
make[2]: *** [/usr/src/linux-headers-6.8.0-47-generic/Makefile:1925: /var/lib/dkms/nvidia/530.30.02/build] Error 2
make[1]: *** [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.8.0-47-generic'
make: *** [Makefile:82: modules] Error 2
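The lines above are only make's summary of which objects failed; the compiler messages that explain why should appear earlier in the same log. A small sketch for pulling those out follows. The function name first_compile_errors is made up, and it assumes the compiler marks its diagnostics with a lowercase "error:", as GCC does. A driver release that predates the running kernel (6.8.0-47 here) is one possible cause of this kind of failure, but the log excerpt does not confirm that.

```shell
# Print the first few compiler diagnostics from a DKMS make.log, prefixed with
# their line numbers. The match is case-sensitive, so make's own
# "Error 1" / "Error 2" summary lines are skipped.
# $1 = path to the log, $2 = how many matches to show (default 5).
first_compile_errors() {
    grep -n 'error:' "$1" | head -n "${2:-5}"
}

# Log location as quoted in the cat command above.
LOG=/var/lib/dkms/nvidia/530.30.02/build/make.log
if [ -r "$LOG" ]; then
    first_compile_errors "$LOG"
fi
```

The line numbers in the output make it easy to open the log at that point and read the surrounding context.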