Best practices for other package management without breaking my lambda stack?

First of all, thank you so so much for making the lambda-stack installer available :clap:

I installed it a while ago on a fresh Ubuntu 22.04 using your instructions:

wget -nv -O- https://lambdalabs.com/install-lambda-stack.sh | sh -
sudo reboot

It worked beautifully and TensorFlow could talk to my GPU without any issues. Then I installed some additional libraries via pip: tensorflow_addons and tensorflow_probability. This was fine; pip freeze | grep tensor showed:

tensorboard-plugin-profile==2.11.1
tensorflow-addons==0.19.0
tensorflow-estimator==2.11.0
tensorflow-gpu==2.11.0
tensorflow-probability==0.19.0

However, when I installed tensorflow_graphics, it pulled in the pip version of tensorflow, and now TensorFlow cannot see my GPU.

pip freeze | grep tensor

tensorboard==2.13.0
tensorboard-data-server==0.7.1
tensorboard-plugin-profile==2.11.1
tensorflow==2.13.0
tensorflow-addons==0.19.0
tensorflow-estimator==2.13.0
tensorflow-gpu==2.11.0
tensorflow-graphics==1.0.0
tensorflow-io-gcs-filesystem==0.32.0
tensorflow-probability==0.19.0

For now I guess I could just reinstall the Lambda Stack with the same instructions, even though my computer is no longer a fresh Ubuntu install.

My question is, what is the recommended way to install pip packages so that they don’t touch any software from the lambda stack?

Thank you!

The recommended way is to use a Python virtual environment (venv) or a conda virtual environment.

Hope this helps!

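Yes, it is best practice not to pip install directly into your main account, but to use isolated environments (as Cody mentioned). To see what has already been installed outside of Lambda Stack:
pip -v list | grep -v "/usr/lib/python3/dist-packages"
* This shows every package in your current environment that does not come from Lambda Stack (the -v adds each package's install location to the listing).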
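
If you would rather do the same check from Python, here is a minimal sketch using the standard library's importlib.metadata. The function name packages_outside and the SYSTEM_DIR constant are just names I made up for illustration; the directory is the same one the grep above filters out, and it may differ on your system.

```python
from importlib import metadata
from pathlib import Path

# Same directory the grep above filters out; adjust if your system differs.
SYSTEM_DIR = "/usr/lib/python3/dist-packages"


def packages_outside(system_dir=SYSTEM_DIR):
    """Return sorted (name, version, location) tuples for every installed
    distribution whose files do not live under system_dir."""
    rows = []
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if not name or not version:
            continue
        # locate_file("") gives the directory the distribution is installed in.
        location = str(Path(str(dist.locate_file(""))).resolve())
        if not location.startswith(system_dir):
            rows.append((name, version, location))
    return sorted(set(rows))


if __name__ == "__main__":
    for name, version, location in packages_outside():
        print(f"{name}=={version}  ({location})")
```

Anything listed under ~/.local or /usr/local is a candidate for the kind of conflict described in the question.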

It is best to use:

  • Docker
  • Python venv
  • Anaconda/Miniconda

This gives you a separate environment for each project, so they do not conflict with one another.

  • Docker: images should be complete, or at least reset to their defaults each time the image is relaunched.
    NVIDIA NGC Tutorial: Run a PyTorch Docker Container using nvidia-container-toolkit on Ubuntu
  • Python venv: lets you try with or without the system-installed packages.
    • This sets up an environment that uses the system packages by default; you then add pip packages to see whether they are compatible. You may want much shorter environment names; I made them long for clarity. Note that these environments are still affected by pip installs in ~/.local or /usr/local:
      $ python -m venv --system-site-packages myenv-with-site-packages
      $ source ./myenv-with-site-packages/bin/activate
    • This sets up an independent environment, which is needed at times when libraries conflict:
      $ python -m venv myenv-independent
      $ source ./myenv-independent/bin/activate
  • Anaconda/Miniconda: a conda environment always uses its own packages in place of the system ones, except that software in /usr/local or ~/.local can still be picked up.
    • To set up conda, download and run their installer script. Then, for example, with CUDA 11.8:
      $ conda create --name torch_gpu pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
      $ conda activate torch_gpu
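
The two venv variants above can also be created from Python with the standard library's venv module. This is only a sketch: create_env, sees_system_packages and demo are hypothetical helper names, and demo builds throwaway environments in a temporary directory without pip just to show the difference between the two settings.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, system_site_packages, with_pip=True):
    """Create a venv; system_site_packages=True mirrors --system-site-packages."""
    builder = venv.EnvBuilder(system_site_packages=system_site_packages,
                              with_pip=with_pip)
    builder.create(env_dir)


def sees_system_packages(env_dir):
    """Read pyvenv.cfg to report whether the venv includes system packages."""
    cfg = Path(env_dir) / "pyvenv.cfg"
    for line in cfg.read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


def demo():
    """Build both kinds of environment in a temp dir and report their settings."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "myenv-with-site-packages"
        independent = Path(tmp) / "myenv-independent"
        # with_pip=False keeps the demo fast; use the default for real environments.
        create_env(shared, True, with_pip=False)
        create_env(independent, False, with_pip=False)
        return sees_system_packages(shared), sees_system_packages(independent)
```

Calling sees_system_packages on an existing environment is a quick way to check which of the two kinds it is before installing anything into it.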

PyTorch has a nice matrix showing how to install it (limited versions, but helpful):
Start Locally | PyTorch
pip is limited in what it can install, and at times you need to change LD_LIBRARY_PATH for pip packages (in conda or in a Python venv). Always make sure that what is in your ~/.local and /usr/local does not conflict with your environment.

Also check which python you are using. Example:
$ which python
/usr/bin/python
Versus
$ which python
/home/username/miniconda3/bin/python
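
The same check can be done from inside Python, which also tells you whether a venv is active and whether ~/.local is being searched. A rough sketch, with describe_interpreter as a made-up name; the conda-meta test is only a heuristic for a conda-managed Python:

```python
import os
import site
import sys


def describe_interpreter():
    """Summarize which Python is running and where it looks for packages."""
    return {
        # Equivalent of `which python` for the running interpreter.
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # Heuristic: conda installs and environments contain a conda-meta directory.
        "conda_python": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
        # True when the per-user site directory (under ~/.local on Linux) is searched.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If user_site_enabled is true, packages installed with pip install --user can still shadow or conflict with what the environment provides.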

I have additional examples, though they may break as versions and packages change:
https://github.com/markwdalton/lambdalabs/tree/main/documentation/software

Another useful tip: to find the available versions without any work, give pip "?" in place of a version number, and the error message will list the valid versions.
$ pip install tensorflow-gpu==?
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==? (from versions: 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.12.0)
ERROR: No matching distribution found for tensorflow-gpu==?
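
If you want to pick a version from that message in a script, the list can be pulled out with a small regex. This is just a sketch that assumes the "(from versions: ...)" wording shown above; available_versions is a hypothetical helper, not part of pip.

```python
import re


def available_versions(pip_error_text, stable_only=True):
    """Extract the version list from pip's 'Could not find a version' message."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error_text)
    if not match:
        return []
    versions = [v.strip() for v in match.group(1).split(",") if v.strip()]
    if stable_only:
        # Keep plain dotted numbers only, dropping release candidates such as 2.8.0rc0.
        versions = [v for v in versions if re.fullmatch(r"\d+(\.\d+)*", v)]
    return versions


if __name__ == "__main__":
    sample = ("ERROR: Could not find a version that satisfies the requirement "
              "tensorflow-gpu==? (from versions: 2.10.1, 2.11.0rc0, 2.11.0, 2.12.0)")
    print(available_versions(sample))
```

Passing stable_only=False returns the release candidates as well.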