Lambda Stack and non-Python applications and frameworks

I have a brand new Lambda Vector machine with Lambda Stack installed. I am trying to understand the value of Lambda Stack, as it appears to be useful only when using Python. When using cuDNN or CUDA outside of the Python environment, the system reports missing CUDA drivers, and none of the NVIDIA post-install steps have been done. I also cannot find any documentation that indicates whether installing things like the cuda-toolkit will mess up the Lambda install.

If Lambda Stack is only useful for running things like cuDNN, TF, and Caffe from within Python, then it is of limited value to me, as I need to access these things in a standard way from outside of Python.

Can somebody at Lambda point me to any documentation that explains how Lambda Stack works and how it might coexist with other applications that need to use CUDA outside of Python?

Further update to this… I saw an older post that indicated that other versions of CUDA could be installed. For example, when downloading the NVIDIA CUDA samples and running make, it appears to be looking for nvcc in /usr/local/cuda, which does not exist. It also cannot find libcuda.so. If I install the latest version of CUDA from NVIDIA, will this break the Lambda Stack install?

Yes, Lambda Stack is primarily for Python development (deep learning), and the PyTorch and TensorFlow builds depend on the CUDA and cuDNN versions that are on the system.

Most of the confusion here is because NVIDIA does not follow best practices or standards.

Yes, the default NVIDIA install location is incorrect: /usr/local is supposed to be only for users' and sites' local applications. Installing there has been against the filesystem standard since the 1990s; /usr or /opt are the proper locations.

And NVIDIA software may look for any of several path variables:

make CUDAPATH=/usr

or sometimes CUDA_HOME, etc.
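
For instance, the Makefile-based NVIDIA samples have typically defaulted to CUDA_PATH=/usr/local/cuda, so (as a sketch, assuming a Makefile-based samples checkout; verify the variable name in your tree's Makefile) you can point them at Lambda Stack's install in /usr instead:

$ cd cuda-samples
$ make CUDA_PATH=/usr   # override the hard-coded /usr/local/cuda default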

  1. The NVIDIA driver is installed correctly with Lambda Stack and is loaded in the kernel.
  2. It properly installs nvidia-persistenced and enables it on servers (see the checks after this list).
  3. The CUDA toolkit and nvcc are installed in the normal locations (/usr/bin), not in the non-standard /usr/local.
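
A quick way to confirm the first two points (a sketch; exact output varies by driver and Ubuntu version):

$ nvidia-smi                            # driver is loaded and talking to the GPUs
$ systemctl status nvidia-persistenced  # persistence daemon is installed and enabled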

libcuda.so is likewise installed properly under /usr, where normal software lives.

These are all standard paths, but NVIDIA software often relies on hard-coded manual paths. That is why HPC sites use environment modules (or the spin-off Lmod). Even so, fixing the paths for each application that hardcodes NVIDIA locations with different variables is always an issue.
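
For applications that insist on such variables, here is a minimal sketch of what a modulefile effectively does, written as plain shell exports (the /usr/local/cuda path assumes a hypothetical side-by-side toolkit install; adjust to your layout):

$ export CUDA_HOME=/usr/local/cuda               # hypothetical second toolkit location
$ export PATH="$CUDA_HOME/bin:$PATH"             # pick up that toolkit's nvcc first
$ export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"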

$ which nvcc
/usr/bin/nvcc
$ whereis libcuda.so
libcuda: /usr/lib/x86_64-linux-gnu/libcuda.so

If you are not doing Python deep learning, you can use /usr/local or the standard /opt. /usr/local is probably easier, as it is hard to get NVIDIA or the various packages to fix their software; it has been this way since NVIDIA started.

Again, CUDA and the NVIDIA driver are already installed.
The biggest trick to also installing in a second location is to run the install but skip the NVIDIA driver, as that causes problems for updates; see the sketch below. Alternatively, you can remove lambda-stack if you do not use Python, or use only virtual environments.
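
With NVIDIA's runfile installer, skipping the driver looks roughly like this (a sketch; cuda_<version>_linux.run is a placeholder for the actual runfile name, and --silent --toolkit installs only the toolkit):

$ sudo sh cuda_<version>_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-<version>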

I have instructions on how to overlay the two: other flavors of the cuda-toolkit and cuDNN in /usr/local, alongside Lambda Stack in its standard locations.

I hope that somewhat clarifies the history and the why. You are correct that cuDNN is not completely installed; it is only installed with the bundled versions of PyTorch and TensorFlow. You need to match your cuDNN version to your CUDA and PyTorch/TensorFlow builds.
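
A quick way to see which CUDA and cuDNN versions your PyTorch build was compiled against (assumes the Lambda Stack PyTorch is importable):

$ python3 -c "import torch; print(torch.version.cuda, torch.backends.cudnn.version())"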

I have done 15+ years of C/C++ CUDA work (even Fortran) with MPI, so I can help you get that set up.
Just send an email to support@lambdal.com and ask for Mark.
