Problem with testing full capabilities of models on cloud H100.

lukigg · October 27, 2024, 3:11pm

Hi,

I built a Docker image of my models and tested it on my local PC to check the speed and make sure it works well. Then I wanted to run it in the cloud to test the speed on an H100. I use Lambda Labs and installed everything as they describe in "Lambda Stack: an AI software stack that's always up-to-date" and in "Set up a GPU accelerated Docker container using Lambda Stack + Lambda Stack Dockerfiles on Ubuntu 20.04 LTS".

I had some issues but got it working. In Docker, however, I got much higher compute times for the Llama model and Parler-TTS than on my 3090. I then checked nvidia-smi and nvidia --version, and the GPU was not visible. So I installed the packages from the requirements file directly on the cloud instance and copied my files there, since the GPU is visible there, but I got the same result.

As a base image I used NVIDIA's prebuilt image nvcr.io/nvidia/pytorch:23.08-py3, which should have everything set up and which worked on my PC. Am I missing something? I also have the conda environment that I used on my local PC; maybe I could use that as a base and it would work better?
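
For anyone who wants to reproduce the check, here is a minimal sketch of how GPU visibility could be verified from Python, both inside the container and directly on the instance. It assumes PyTorch may or may not be installed and that nvidia-smi may or may not be on the PATH; the function name gpu_report is hypothetical and only for illustration, not part of any library.

```python
import importlib.util
import shutil
import subprocess


def gpu_report():
    """Collect basic facts about GPU visibility in the current environment."""
    report = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "nvidia_smi_ok": False,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "device_name": None,
    }

    # nvidia-smi exits with a non-zero code when it cannot talk to the driver.
    if report["nvidia_smi_on_path"]:
        try:
            result = subprocess.run(
                ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass

    # Ask PyTorch whether it can actually use a CUDA device.
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
            if report["cuda_available"]:
                report["device_name"] = torch.cuda.get_device_name(0)
        except Exception:
            # A broken install is treated the same as "no usable GPU".
            pass

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If cuda_available comes back False, the models are most likely running on the CPU, which would explain the slow timings. Inside Docker, one possible cause is that the container was started without GPU access (for example, without `--gpus all` on `docker run`), but that is only a guess from the symptoms described here.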
