Yes, Lambda Stack is primarily for Python development (deep learning), and PyTorch and TensorFlow must be built against the CUDA and cuDNN versions that are on the system.
Most of the confusion around NVIDIA is because NVIDIA does not follow best practices or standards.
Yes, the default NVIDIA install location is incorrect: “/usr/local” is supposed to be only for users'/sites' local applications.
Installing there has been against the filesystem hierarchy standard since the 1990s.
/usr or /opt are the proper locations.
And NVIDIA software has a number of path variables that may need to be set, e.g.:
make CUDAPATH=/usr
or sometimes CUDA_HOME, etc.
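As a sketch of how that plays out in practice against the Lambda Stack install in /usr (which variable a given project honors varies, so these lines are illustrative, not definitive):
$ make CUDA_HOME=/usr              # some Makefiles read CUDA_HOME instead of CUDAPATH
$ cmake -DCUDAToolkit_ROOT=/usr .  # CMake 3.17+ FindCUDAToolkit looks here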
- The NVIDIA driver is installed correctly with Lambda Stack, and its modules are in the kernel.
- It properly installs nvidia-persistenced and enables it on servers.
- The CUDA toolkit and nvcc are installed in normal locations (/usr/bin), not in the non-standard /usr/local.
- libcuda.so is installed properly under /usr, as normal software is.
These are all on the standard paths, but NVIDIA software often relies on hard-coded manual paths.
Hence the use of environment modules (or the spin-off Lmod) in HPC. However, fixing the paths for each application that hardcodes NVIDIA locations with different variables is always an issue.
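For example, a site using modules might expose a toolkit like this (the module name and version here are made up, and whether the modulefile sets CUDA_HOME depends on how the site wrote it):
$ module avail cuda
$ module load cuda/12.4
$ echo $CUDA_HOME
/usr/local/cuda-12.4
The checks below show the default Lambda Stack state, with no module loaded: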
$ which nvcc
/usr/bin/nvcc
$ whereis libcuda.so
libcuda: /usr/lib/x86_64-linux-gnu/libcuda.so
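You can also confirm that the persistence daemon mentioned above is running; on a Lambda Stack server you should see:
$ systemctl is-active nvidia-persistenced
active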
If you are not doing Python deep learning, you can use /usr/local or the standard /opt. /usr/local is probably easier, as it is hard to get NVIDIA or the various packages to fix their software; it has been this way since NVIDIA started.
Again, CUDA and the NVIDIA driver are already installed.
The biggest trick to also installing in a second location is to run the installer but not install the NVIDIA driver, as that causes problems for updates. Or you can remove lambda-stack if you do not use Python, or if you only use virtual environments.
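For example, with NVIDIA's runfile installer the --toolkit flag installs only the toolkit and leaves the driver alone (the filename is a placeholder; substitute whichever CUDA version you downloaded):
$ sudo sh cuda_<version>_linux.run --silent --toolkit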
I have instructions on how to overlay the two flavors: the cuda-toolkit and cuDNN in /usr/local, alongside the Lambda Stack in standard locations.
I hope that somewhat clarifies the history and the why. You are correct that cuDNN is not completely installed; only the version bundled with PyTorch and TensorFlow is. But you need to match your cuDNN version with your CUDA and PyTorch/TensorFlow builds.
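A quick way to check what a given build expects: torch exposes its CUDA and cuDNN versions directly, and recent TensorFlow (2.3+) has a similar build-info query:
$ python3 -c "import torch; print(torch.version.cuda, torch.backends.cudnn.version())"
$ python3 -c "import tensorflow as tf; print(tf.sysconfig.get_build_info()['cudnn_version'])"
$ nvcc --version   # the toolkit you would compile extensions against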
I have done 15+ years of C/C++ CUDA work (even Fortran) with MPI. So I can help you get that set up.
Just send an email to ‘support@lambdal.com’ and ask for Mark.