Fresh 20.04.1 LTS + Lambda Stack - libcudnn_adv_train.so.8

Hi folks.

I’ve set up a new workstation using a fresh install of Ubuntu Server 20.04.1 LTS and Lambda Stack. I had messed up my previous installation, so I decided to go with a clean setup, but it didn’t help.

A quick summary of my setup: Ryzen 5 3600, 16 GB RAM and a brand new Zotac Gaming GeForce RTX 3070 Twin Edge OC.

I’m having issues with deep learning examples. The Jupyter notebook I’m testing is here: Deep-Learning-Introduccion-practica-con-Keras/7.RedesNeuronalesRecurrentes.ipynb at master · jorditorresBCN/Deep-Learning-Introduccion-practica-con-Keras · GitHub

The problem is that libcudnn_adv_train.so.8 cannot be loaded, even though it is on the filesystem:

$> locate libcudnn_adv_train
/usr/lib/python3/dist-packages/tensorflow/libcudnn_adv_train.so.8
/usr/lib/python3/dist-packages/torch/lib/libcudnn_adv_train.so.8

As far as I know, using Lambda Stack means not having to modify paths or environment variables. Is that right?

This is the error output and stack trace:

2020-11-09 18:19:39.502546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7216 MB memory) -> physical GPU (device: 0, name: Graphics Device, pci bus id: 0000:29:00.0, compute capability: 8.6)
2020-11-09 18:19:39.844607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
Could not load library libcudnn_adv_train.so.8. Error: libcudnn_ops_train.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_adv_train.so.8 is in your library path!
[homelab1:03474] *** Process received signal ***
[homelab1:03474] Signal: Aborted (6)
[homelab1:03474] Signal code:  (-6)
[homelab1:03474] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f2e3cb33210]
[homelab1:03474] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f2e3cb3318b]
[homelab1:03474] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f2e3cb12859]
[homelab1:03474] [ 3] /usr/lib/python3/dist-packages/tensorflow/python/../libcudnn.so.8(cudnnRNNForwardTraining+0x230)[0x7f2dba9e80b0]
[homelab1:03474] [ 4] /usr/lib/python3/dist-packages/tensorflow/python/../libtensorflow_framework.so.2(cudnnRNNForwardTraining+0x8c)[0x7f2e29a25e1c]
[homelab1:03474] [ 5] /usr/lib/python3/dist-packages/tensorflow/python/../libtensorflow_framework.so.2(_ZN15stream_executor3gpu12CudnnSupport16DoRnnForwardImplIfEEN10tensorflow6StatusEPNS_6StreamERKNS0_18CudnnRnnDescriptorERKNS0_32CudnnRnnSequenceTensorDescriptorERKNS_12DeviceMemoryIT_EERKNS0_29CudnnRnnStateTensorDescriptorESH_SK_SH_SH_SC_PSF_SK_SL_SK_SL_bPNS_16ScratchAllocatorESN_PNS_3dnn13ProfileResultE+0xa95)[0x7f2e29a01cc5]
[homelab1:03474] [ 6] /usr/lib/python3/dist-packages/tensorflow/python/../libtensorflow_framework.so.2(_ZN15stream_executor3gpu12CudnnSupport12DoRnnForwardEPNS_6StreamERKNS_3dnn13RnnDescriptorERKNS4_27RnnSequenceTensorDescriptorERKNS_12DeviceMemoryIfEERKNS4_24RnnStateTensorDescriptorESE_SH_SE_SE_SA_PSC_SH_SI_SH_SI_bPNS_16ScratchAllocatorESK_PNS4_13ProfileResultE+0x5c)[0x7f2e29a022fc]
[homelab1:03474] [ 7] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN15stream_executor6Stream14ThenRnnForwardERKNS_3dnn13RnnDescriptorERKNS1_27RnnSequenceTensorDescriptorERKNS_12DeviceMemoryIfEERKNS1_24RnnStateTensorDescriptorESB_SE_SB_SB_S7_PS9_SE_SF_SE_SF_bPNS_16ScratchAllocatorESH_PNS1_13ProfileResultE+0xff)[0x7f2df691e22f]
[homelab1:03474] [ 8] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(+0xd893507)[0x7f2df5efc507]
[homelab1:03474] [ 9] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow17CudnnRNNForwardOpIN5Eigen9GpuDeviceEfE25ComputeAndReturnAlgorithmEPNS_15OpKernelContextEPN15stream_executor3dnn15AlgorithmConfigEbbi+0x5ef)[0x7f2df5f00aef]
[homelab1:03474] [10] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow17CudnnRNNForwardOpIN5Eigen9GpuDeviceEfE7ComputeEPNS_15OpKernelContextE+0x2d)[0x7f2df5f020fd]
[homelab1:03474] [11] /usr/lib/python3/dist-packages/tensorflow/python/../libtensorflow_framework.so.2(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x245)[0x7f2e2950d5b5]
[homelab1:03474] [12] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow17KernelAndDeviceOp3RunEPNS_19ScopedStepContainerERKNS_15EagerKernelArgsEPSt6vectorINS_6TensorESaIS7_EEPNS_19CancellationManagerERKN4absl14lts_2020_02_258optionalINS_25EagerRemoteFunctionParamsEEE+0x7bb)[0x7f2dec41996b]
[homelab1:03474] [13] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow18EagerKernelExecuteEPNS_12EagerContextERKN4absl14lts_2020_02_2513InlinedVectorIPNS_12TensorHandleELm4ESaIS6_EEERKNS3_8optionalINS_25EagerRemoteFunctionParamsEEERKSt10unique_ptrINS_15KernelAndDeviceENS_4core15RefCountDeleterEEPNS_14GraphCollectorEPNS_19CancellationManagerENS3_4SpanIS6_EE+0x22e)[0x7f2dec3e317e]
[homelab1:03474] [14] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow11ExecuteNode3RunEv+0x168)[0x7f2dec3e3d78]
[homelab1:03474] [15] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow13EagerExecutor11SyncExecuteEPNS_9EagerNodeE+0x1c0)[0x7f2dec413740]
[homelab1:03474] [16] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(+0x3d7782e)[0x7f2dec3e082e]
[homelab1:03474] [17] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow12EagerExecuteEPNS_14EagerOperationEPPNS_12TensorHandleEPi+0xea)[0x7f2dec3e262a]
[homelab1:03474] [18] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_ZN10tensorflow14EagerOperation7ExecuteEN4absl14lts_2020_02_254SpanIPNS_20AbstractTensorHandleEEEPi+0x16b)[0x7f2dec3d330b]
[homelab1:03474] [19] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(TFE_Execute+0x2d)[0x7f2debfe525d]
[homelab1:03474] [20] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.cpython-38-x86_64-linux-gnu.so(_Z24TFE_Py_ExecuteCancelableP11TFE_ContextPKcS2_PN4absl14lts_2020_02_2513InlinedVectorIP16TFE_TensorHandleLm4ESaIS7_EEEP7_objectP23TFE_CancellationManagerPNS5_IS7_Lm2ES8_EEP9TF_Status+0x454)[0x7f2debf753c4]
[homelab1:03474] [21] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tfe.cpython-38-x86_64-linux-gnu.so(+0x34cc3)[0x7f2da3fc5cc3]
[homelab1:03474] [22] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tfe.cpython-38-x86_64-linux-gnu.so(+0x598cf)[0x7f2da3fea8cf]
[homelab1:03474] [23] /usr/lib/python3/dist-packages/tensorflow/python/_pywrap_tfe.cpython-38-x86_64-linux-gnu.so(+0x5afd4)[0x7f2da3febfd4]
[homelab1:03474] [24] /usr/bin/python3(PyCFunction_Call+0x59)[0x5f4249]
[homelab1:03474] [25] /usr/bin/python3(_PyObject_MakeTpCall+0x296)[0x5f46d6]
[homelab1:03474] [26] /usr/bin/python3(_PyEval_EvalFrameDefault+0x5de6)[0x570936]
[homelab1:03474] [27] /usr/bin/python3(_PyEval_EvalCodeWithName+0x26a)[0x56955a]
[homelab1:03474] [28] /usr/bin/python3(_PyFunction_Vectorcall+0x393)[0x5f7323]
[homelab1:03474] [29] /usr/bin/python3(_PyEval_EvalFrameDefault+0x1901)[0x56c451]
[homelab1:03474] *** End of error message ***
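Note that the loader message above names libcudnn_ops_train.so.8 as the file it cannot open, so the failure may be in a dependency of libcudnn_adv_train.so.8 rather than in that file itself. A minimal sketch for checking which cuDNN 8 sub-libraries the dynamic loader can actually open from Python is shown below. The list of library names is an assumption about what a cuDNN 8 install ships (only libcudnn.so.8, libcudnn_ops_train.so.8 and libcudnn_adv_train.so.8 appear in the log), and `check_libs` is a hypothetical helper, not part of TensorFlow or Lambda Stack.

```python
import ctypes

# Sub-library names to probe. Three of these appear in the log above;
# the rest are assumed from the usual cuDNN 8 layout and may not match
# every install.
CUDNN_LIBS = [
    "libcudnn.so.8",
    "libcudnn_ops_infer.so.8",
    "libcudnn_ops_train.so.8",
    "libcudnn_adv_infer.so.8",
    "libcudnn_adv_train.so.8",
    "libcudnn_cnn_infer.so.8",
    "libcudnn_cnn_train.so.8",
]


def check_libs(names):
    """Try to open each shared library by name.

    Returns a dict mapping each name to None if the loader opened it,
    or to the loader's error message if it could not.
    """
    results = {}
    for name in names:
        try:
            # Uses the normal loader search path (ld.so cache,
            # LD_LIBRARY_PATH), not the Python package directories.
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, err in check_libs(CUDNN_LIBS).items():
        status = "OK" if err is None else f"FAILED: {err}"
        print(f"{name}: {status}")
```

Any name reported as FAILED is one the loader cannot resolve through its standard search path, regardless of whether `locate` finds a copy somewhere under /usr/lib/python3/dist-packages.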

It’s because Lambda Stack doesn’t currently have libcudnn version 8 support. It should be coming shortly.


Good to know. For now I’m going with the 455.34 drivers, nvidia-container-toolkit and the nvcr.io tensorflow 20.10 Docker images.

Any updates on when we can expect it to be supported?