vLLM cluster using the vLLM production stack

Hello community,

I’m looking to rent multiple Lambda Labs machines (with H100 GPUs) and deploy them as a cluster using the vLLM production stack.

My goal is to be able to manually purchase more machines from Lambda Labs and connect them to the cluster as my infrastructure gets more load (→ more GPUs and more vLLM instances).

Has anyone achieved something similar and would be willing to give me a few hints? I followed the vLLM production stack guide, but I only managed to make it work with minikube (a single worker). I’d like to make it work with multiple workers.

Cheers!