Updating the Lambda Stack on GPU Cloud to the latest versions

I am using the latest Python (3.11.3) and TensorFlow (2.12.0), and I was hoping to use Lambda Labs to speed up training. The Lambda website says: “Lambda Stack: an always updated AI software stack”, but the Stack only has Python 3.8.10 and TensorFlow 2.9.1, so I guess it isn’t always updated.

If I run my training on the Stack as it is, my code doesn’t run because I use TF features that aren’t available in 2.9.1. Even if I work around those, training still doesn’t progress the way it does on my Mac. I haven’t been able to find out why, but I assume it’s down to the version differences.

If I try to update Python and TensorFlow myself, the GPU is no longer used. I’ve tried the fix described in “Why can’t my program find the NVIDIA cuDNN library?” (Lambda Docs), but without any luck. I’ve also tried apt upgrade and creating a virtual environment; everything I’ve tried has failed in a different way, and I cannot get a working, updated environment.
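
For what it’s worth, by “the GPU isn’t used” I mean a check along these lines (any similar check should do):

import tensorflow as tf

# A working GPU install should list at least one device here; if the CUDA or
# cuDNN libraries can't be loaded, TensorFlow skips registering the GPU and
# this comes back as an empty list.
print(tf.config.list_physical_devices("GPU"))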

It would be great if Lambda Labs honoured their statement of “an always updated AI software stack”. Does anyone know if that will be happening soon?

Or has anyone been successful in getting a newer version of TensorFlow to work?

Having said that, I really like the Lambda Labs GPU Cloud. It has the potential to be very easy to use, it has a great pricing structure, and it is very flexible. I hope I’ll be able to use it.

Julian

Here is one solution that I have now found.

pip install --upgrade tensorflow==2.10

# Make the cuDNN libraries under the Lambda Stack TensorFlow directory
# visible on the default library search path.
for cudnn_so in /usr/lib/python3/dist-packages/tensorflow/libcudnn*; do
  sudo ln -s "$cudnn_so" /usr/lib/x86_64-linux-gnu/
done

The loop sets up the symlinks described in “Why can’t my program find the NVIDIA cuDNN library?” (Lambda Docs).
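
With the symlinks in place, a quick sanity check I use (nothing official, just my own check) is to confirm that a small op actually runs on the GPU:

import tensorflow as tf

# If TensorFlow registered the GPU, this prints something like
# '/job:localhost/replica:0/task:0/device:GPU:0'; otherwise it falls back to the CPU.
x = tf.random.normal((1024, 1024))
print(tf.matmul(x, x).device)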

I then had to alter my optimiser to:

self.optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=self.learning_rate)

i.e. use the legacy version of the Adam optimizer.
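
In context, it looks roughly like this (the toy model below is just for illustration, not my actual network):

import tensorflow as tf

# Toy two-layer model, purely to show where the optimizer change goes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# Using the legacy Adam class instead of tf.keras.optimizers.Adam.
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss="mse")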

I am now seeing a huge improvement in training speed: 12x faster than on my M1 Mac (CPU only).