We are having trouble getting TensorFlow to work in a virtual environment on our Lambda Labs GPU workstation running Ubuntu 20.04.
We test with the following minimal example, tftest.py:
from tensorflow.python.client import device_lib
print([x.name for x in device_lib.list_local_devices()])
Run directly, it works and reports GPU:0 and GPU:1:
asj@zlambda:~$ python3 tftest.py
... lots of tensorflow output ...
['/device:CPU:0', '/device:XLA_CPU:0', '/device:XLA_GPU:0', '/device:XLA_GPU:1', '/device:GPU:0', '/device:GPU:1']
However, if we run it in a virtual environment (created as described on the official page "Lambda Stack: an AI software stack that's always up-to-date", section "Using Lambda Stack with python virtual environments", followed by pip install tensorflow-gpu), we instead get the following output:
(venv) asj@zlambda:~$ python3 tftest.py
... lots of tensorflow output ...
2020-11-30 15:11:44.852707: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
... lots of tensorflow output ...
['/device:CPU:0', '/device:XLA_CPU:0', '/device:XLA_GPU:0', '/device:XLA_GPU:1']
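The dlerror line suggests that the pip-installed tensorflow-gpu wheel (built against CUDA 10.1) is looking for a libcudart.so.10.1 that the dynamic linker cannot find. As a diagnostic sketch (not part of our original test, and the library path is an assumption about where Lambda Stack installs CUDA), one can check what the linker actually sees:

```shell
# List every CUDA runtime library registered in the linker cache.
# If libcudart.so.10.1 is absent here, the pip-installed tensorflow-gpu
# wheel built against CUDA 10.1 has no way to load it.
ldconfig -p | grep libcudart || echo "no libcudart in linker cache"

# Lambda Stack's system CUDA libraries may also live outside the cache,
# e.g. under the distribution's library directory (path is a guess):
ls /usr/lib/x86_64-linux-gnu/libcudart* 2>/dev/null || true
```

If a libcudart with a different version suffix shows up (say, 10.2 or 11.0), that would explain why the wheel's 10.1 lookup fails even though the GPUs work system-wide.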
What should we do? How can we create a virtual environment with tensorflow-gpu on the Lambda machine when Lambda Stack is installed?
We are aware that we can use --system-site-packages, but our actual goal is to build Docker containers that can use the GPU. We have this working with PyTorch but not with TensorFlow, and since we also can't get TensorFlow working in a virtual environment, we suspect the two problems are related.
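For reference, the --system-site-packages workaround mentioned above looks like this (a sketch; the venv name is arbitrary, and it only helps the venv case, not the Docker case):

```shell
# Create a venv that inherits Lambda Stack's system-wide TensorFlow
# instead of pip-installing a tensorflow-gpu wheel with its own CUDA
# version requirements.
python3 -m venv --system-site-packages venv

# The flag is recorded in the venv's configuration file:
grep include-system-site-packages venv/pyvenv.cfg
# -> include-system-site-packages = true
```

With this, `import tensorflow` inside the venv resolves to the Lambda Stack build, which is linked against the CUDA libraries actually installed on the machine.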