Expose HTTP server via firewall

I’m really struggling to make Lambda Labs work for me… At first I kept trying to launch an H100 instance, but every instance had a dead GPU. A support ticket addressed that and I was finally able to launch an H100 instance with a working GPU. Support even gave me a $10 credit for the headache, so kudos to them. I got Llama 3 uploaded and I can run inference on the same machine, but I’m trying to expose llama.cpp’s server to my other machines and I can’t find a way to make that work.

I’ve tried every firewall configuration, including allowing ALL TCP ports, and while I can ping the machine’s IP address, I can’t connect to it over HTTP. I always get ECONNREFUSED. I’m not even trying to use HTTPS, just plain HTTP. Has anyone successfully gotten this to work?

I’ve given up for now and terminated my instance. I’m not going to pay someone almost $2,000 a month when basic HTTP doesn’t even work :frowning:

How are you running llama.cpp, like as a Docker container?

With it running, please run:

sudo ss -tlp

That command will show what IP address and port llama.cpp is using.
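
If it helps with reading that output, here is a rough Python sketch of how the Local Address column can be interpreted. It assumes numeric addresses (adding -n, as in sudo ss -tlnp, keeps ss from resolving names). The function name bind_scope is made up for illustration and is not part of ss or llama.cpp.

```python
import ipaddress


def bind_scope(local_address: str) -> str:
    """Classify a 'host:port' string from the Local Address column of `ss -tln`.

    Hypothetical helper for illustration only.
    """
    # Split on the last colon so IPv6 forms like "[::1]:8080" keep their host part.
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]")

    # Wildcard addresses: the socket accepts connections on every interface.
    if host in ("*", "0.0.0.0", "::"):
        return "all interfaces"

    # Drop an interface suffix such as "%lo" (e.g. "127.0.0.53%lo").
    host = host.split("%", 1)[0]
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "unknown"  # e.g. a resolved name when -n was not used

    return "loopback only" if ip.is_loopback else "specific interface"


if __name__ == "__main__":
    for addr in ("127.0.0.1:8080", "0.0.0.0:8080", "[::]:22"):
        print(addr, "->", bind_scope(addr))
```

A "loopback only" result for the llama.cpp port would mean other machines cannot reach it no matter how the firewall is set.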

I tore it down but you start it as an executable so:

./server -m -c -ngl 99 --host --port

I know this part is working because I can send it requests from another terminal window in the Jupyter Notebook. I just can’t connect to it from outside Lambda Labs.

I’ve done this same config and startup flow in literally 100 separate Ubuntu environments and it works fine. I’m not able to get through the firewall even though I’ve reconfigured it every way that seems possible.

It’s ok… I’m working with another hosting provider.

You need --host 0.0.0.0.

Otherwise, the server binds only to 127.0.0.1 (localhost), meaning only connections from the same machine are accepted.

See the llama.cpp server README:

  • --host: Set the hostname or ip address to listen. Default 127.0.0.1
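
For anyone debugging the same thing from the client side, here is a minimal sketch of a connection probe in Python. The helper name probe and the 3-second timeout are my own assumptions, not anything from llama.cpp or Lambda. A "refused" result generally means the host was reached but nothing was accepting on that address and port, which is what a loopback-only bind looks like from outside. A firewall that silently drops packets more often shows up as "timeout", though some firewalls actively reject instead.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and report how it ended (hypothetical helper)."""
    try:
        # create_connection opens the socket and connects in one call.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: this is ECONNREFUSED.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, such as no route to host or a DNS failure.
        return f"error: {exc}"


if __name__ == "__main__":
    # Placeholder address and port; substitute the instance's public IP
    # and whatever port the server was started with.
    print(probe("203.0.113.10", 8080))
```

Run it from the machine that gets ECONNREFUSED. If it prints "refused" while the same probe against 127.0.0.1 on the instance itself prints "open", the bind address is the likely culprit rather than the firewall.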

I can give that a try… thanks!