Out there on the interweb, this seems the closest match: [Bug]: Cannot Load any model. IndexError with CUDA, multiple GPUs · Issue #4069 · vllm-project/vllm · GitHub
The author reported that they “Solved it with a fresh install with a new docker container”, but that’s not an option for me…