Hi everyone, I’m a complete novice when it comes to using LLMs.
I know that Lambda Labs has provided a script to run Llama on multiple GPUs. For context, this is needed because the models larger than 7B specify a model-parallel size (MP) greater than 1. However, from what I’ve read online, it seems that a single A6000 has enough memory to run the 13B model.
Is there any advice on getting a 13B model to work on a single GPU, rather than spreading it across several GPUs?
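For reference, here is the rough back-of-the-envelope arithmetic behind my memory claim. It is only a sketch under my own assumptions: weights stored in fp16 (2 bytes per parameter), 48 GB of memory on the A6000, and weights only, so activations, the KV cache, and framework overhead are not counted. The function name is just something I made up for illustration.

```python
# Rough estimate of GPU memory needed just to hold model weights.
# Assumptions (mine, not from any official source): fp16 weights at
# 2 bytes per parameter; activations and KV cache are ignored.

def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Return the size of the weights in GiB."""
    return n_params * bytes_per_param / 2**30

A6000_MEMORY_GIB = 48  # nominal memory of an RTX A6000

for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    needed = weight_memory_gib(n_params)
    fits = needed < A6000_MEMORY_GIB
    print(f"{name}: ~{needed:.1f} GiB for fp16 weights, fits on one A6000: {fits}")
```

By this estimate the 13B weights alone take roughly half of the card, which is why I suspect the MP>1 setting is about how the checkpoint is sharded rather than a hard memory limit, but I may well be missing something.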