Lambda <> OpenRouter Woes

Lambda offers a great service through OpenRouter, with incredible models at incredible prices.

However, it’s not so great when I’m trying to build a platform on top of Lambda’s inference endpoints and I’m being rate limited every other second. My users (only ~100 in total) get constant 429s.

I’m using Hermes 405B and Llama 3.3 70B, both of which are heavily rate limited.

Is there a way to raise my key’s limits and I’m just being dumb, or is this how it’s meant to be?
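For now I’m papering over it client-side with retries. Here’s a minimal sketch of that, assuming the standard OpenRouter chat completions endpoint, an `OPENROUTER_API_KEY` env var, and a Hermes 405B model slug that you should verify against OpenRouter’s model list:

```python
import os
import time

import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = os.environ["OPENROUTER_API_KEY"]  # assumed env var name
MODEL = "nousresearch/hermes-3-llama-3.1-405b"  # slug is a guess; verify on OpenRouter


def chat(messages, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.post(
            OPENROUTER_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": MODEL, "messages": messages},
            timeout=120,
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Back off exponentially, honoring a numeric Retry-After if present.
        retry_after = resp.headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else 2 ** attempt
        time.sleep(min(delay, 30))
    raise RuntimeError("still rate limited after retries")
```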

@Jason Lambda’s engineers are looking into the issue. I’ll post an update in this thread as soon as I have one.


Thank you!! I’m excited to use the service again without issues 🙂

@Jason Quick update: Lambda’s engineers currently suspect the issue is with the DDoS protection implementation. If that’s the case, the rate limit will be bumped up for all customers.

If you happen to have Ray IDs, those would be helpful for diagnosing the issue.

@Jason Work was done last night to address the issue. Let me know in this thread if the issue appears to be fixed (or not).

Hey! I’m still seeing 429s right now. I didn’t see them earlier, but I’m seeing them now.

The errors come back through OpenRouter, so I don’t see a Ray ID; I just see the 429 being returned, with nothing else logged.

If it’s a rate limit, can I get my account’s limits increased? I’m happy to use my own key and route through OpenRouter if that’s possible.
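For reference, here’s a sketch of how I’m capturing what little diagnostic detail comes back on a 429 (same assumed endpoint, env var, and model slug as above); since OpenRouter doesn’t expose a Cloudflare Ray ID, the headers and body are all I can offer:

```python
import json
import os

import requests

API_KEY = os.environ["OPENROUTER_API_KEY"]  # assumed env var name
MODEL = "nousresearch/hermes-3-llama-3.1-405b"  # slug is a guess; verify

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": MODEL, "messages": [{"role": "user", "content": "ping"}]},
    timeout=120,
)
if resp.status_code == 429:
    # No Ray ID is visible through OpenRouter, so dump status, headers,
    # and body verbatim as the only diagnostics available.
    print(resp.status_code)
    print(json.dumps(dict(resp.headers), indent=2))
    print(resp.text)
```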

Hi, I’m also hitting the same issue - huge numbers of 429s. It seems to have become abruptly worse within the last day.

Yeah, it’s getting bad. I’m all for preventing DDoS, but I’d like some way to raise the limit on my inference requests. I’ve been using Lambda for a while now and I really, really want to keep using you guys, but the 429s are unacceptable at this point.

@cody_b @Jason @friendly_fox

Here’s the latest update I received at 11:38 AM PST:

[Rate limiting has been implemented as a] temporary measure as we recover from an outage. The team is on it and will remove rate limits again when everything is stable.

I’ll continue to provide updates in this thread.

Eagerly awaiting updates! My app normally routes well over 1B tokens/week to Hermes 405B via OpenRouter. I’ve had to temporarily switch it all to DeepInfra, which is generally slower and less reliable, so I would love to start pointing to Lambda again!
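In case it helps anyone making the same swap: I’m pinning traffic to DeepInfra with OpenRouter’s provider routing preferences (the `provider` object in the request body). The exact provider name strings are my assumption, so check OpenRouter’s provider list before relying on this:

```python
# Request payload sketch: prefer one upstream provider via OpenRouter's
# provider routing. "DeepInfra" is assumed to be the provider's exact name.
payload = {
    "model": "nousresearch/hermes-3-llama-3.1-405b",  # slug is a guess; verify
    "messages": [{"role": "user", "content": "ping"}],
    "provider": {
        "order": ["DeepInfra"],   # try DeepInfra first for now
        "allow_fallbacks": True,  # let OpenRouter fall back if it also 429s
    },
}
```

Flipping `order` back to `["Lambda"]` should be a one-line change once things stabilize.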

@Jason @friendly_fox Small update, but it’s all I have at the moment: Lambda’s engineers are slowly easing the rate limiting on the Inference product. I’m trying to get an ETA for when it should stop affecting your projects.