Problem with tensorflow in virtual environment

I have an issue running TensorFlow 2.4 in a virtual environment on a workstation with Ubuntu 18.04.
The base Lambda Stack installation ships with tensorflow-gpu 1.15, but I need a 2.x TensorFlow for testing. So I created a virtual environment and installed TensorFlow manually with these commands:

python3 -m venv lambda-stack-without-tensorflow
source lambda-stack-without-tensorflow/bin/activate
pip install tensorflow-gpu

I tested it with a minimal example and get this error:

tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
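
The exact test doesn't matter much; even something as small as this reproduces the message, since TensorFlow already tries to dlopen libcudart.so.11.0 at import time (a minimal sketch):

import tensorflow as tf            # the dso_loader messages already appear at import time
print(tf.__version__)
x = tf.random.normal((4, 4))
print(tf.matmul(x, x).device)      # ends in /device:GPU:0 only if the CUDA libraries load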

The NVIDIA setup is:
driver version: 460.39
CUDA version: 11.2

What should I do? How do I get this to work?


I also have this issue, using pyenv. Specifically, the system tensorflow does this:

import tensorflow as tf
2021-03-04 02:09:28.476244: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-04 02:09:28.478779: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
tf.__version__
'2.4.1'
tf.test.is_gpu_available()
WARNING:tensorflow:From :1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.config.list_physical_devices('GPU') instead.
2021-03-04 02:09:40.285870: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-04 02:09:40.287042: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-04 02:09:40.297151: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-04 02:09:40.297886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-04 02:09:40.297923: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-04 02:09:40.301656: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-03-04 02:09:40.301770: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-03-04 02:09:40.303172: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-04 02:09:40.303563: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-04 02:09:40.307856: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-03-04 02:09:40.308859: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-03-04 02:09:40.309082: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-03-04 02:09:40.309238: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-04 02:09:40.309987: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-04 02:09:40.310630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-03-04 02:09:41.052871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-04 02:09:41.052917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-03-04 02:09:41.052925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-03-04 02:09:41.053184: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-04 02:09:41.053959: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-04 02:09:41.054651: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-04 02:09:41.055364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 14762 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:04.0, compute capability: 7.0)
True

But in a virtualenv with pip install tensorflow, I get:

import tensorflow as tf
2021-03-04 02:16:44.184936: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
tf.__version__
'2.4.1'
tf.test.is_gpu_available()
WARNING:tensorflow:From :1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.config.list_physical_devices('GPU') instead.
2021-03-04 02:16:56.753687: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-04 02:16:56.754247: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-04 02:16:56.755363: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-04 02:16:56.765109: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-04 02:16:56.765835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-03-04 02:16:56.765877: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-04 02:16:56.769396: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-03-04 02:16:56.769484: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-03-04 02:16:56.770882: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-04 02:16:56.771254: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-04 02:16:56.771409: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-03-04 02:16:56.772316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-03-04 02:16:56.772429: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-03-04 02:16:56.772457: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices…
2021-03-04 02:16:56.949265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-04 02:16:56.949304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-03-04 02:16:56.949312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
False

I can see that it's trying to load libcusolver.so.10 instead of .11, but why does libcudnn.so.8 fail as well?

Moreover, where is libcudnn?

ldconfig -p | grep libcudnn

returns nothing
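
ldconfig only searches the system linker paths, so a quick way to check whether a copy is hiding inside the Python package trees (on Lambda Stack these live under /usr/lib/python3/dist-packages) is something like:

import glob
# search the system dist-packages tree for any bundled cuDNN libraries
hits = glob.glob('/usr/lib/python3/dist-packages/**/libcudnn*', recursive=True)
print('\n'.join(hits) or 'nothing found')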

Breaking news:

It’s here:
/usr/lib/python3/dist-packages/tensorflow

Yes, cuDNN ships with both TensorFlow and PyTorch in Lambda Stack, and each package bundles the specific version it was built against.

Pytorch: /usr/lib/python3/dist-packages/torch/lib/libcudnn.so.8
Tensorflow: /usr/lib/python3/dist-packages/tensorflow/libcudnn.so.8

And yes, you can install a standalone copy from NVIDIA after registering and accepting the EULA:
CUDA Deep Neural Network (cuDNN) | NVIDIA Developer

See: Lambda Stack: an AI software stack that's always up-to-date

e.g.:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcudnn8_8.1.1.33-1+cuda11.2_amd64.deb
sudo dpkg -i libcudnn8_8.1.1.33-1+cuda11.2_amd64.deb
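
And a quick sanity check from inside the virtualenv afterwards (this should at least clear the libcudnn.so.8 warning; the libcusolver.so.10 mismatch is a separate issue):

import tensorflow as tf
# the GPU is only registered once every required CUDA library can be dlopen'ed
print(tf.config.list_physical_devices('GPU'))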