Inference API Limits?

Does anyone know the specific API limits, e.g. tokens per minute (TPM) or requests per minute (RPM), that apply to the Inference API for each model? I could not find any documentation on the website. Apologies if I missed anything.

@junruilee

There are no rate limits.

I've just added this to the docs: "No limits are placed on the rate of requests."
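
For anyone finding this thread later: even with no documented rate limits, it can still be worth retrying transient 429/503 responses client-side, since a 503 is also returned while a model is loading. Below is a minimal sketch, assuming the standard api-inference.huggingface.co endpoint and the `requests` library; the model ID and token are placeholders, not values from this thread:

```python
import time

import requests

# Placeholder model and token for illustration; substitute your own.
API_URL = "https://api-inference.huggingface.co/models/gpt2"
HEADERS = {"Authorization": "Bearer hf_xxx"}

def query(payload, max_retries=5):
    """POST to the Inference API, retrying on transient 429/503 responses.

    No rate limits are documented, but retrying defensively costs little
    and also covers the 503 returned while a model is still loading.
    """
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(API_URL, headers=HEADERS, json=payload)
        if response.status_code not in (429, 503):
            response.raise_for_status()  # surface any other HTTP error
            return response.json()
        time.sleep(delay)  # back off before the next attempt
        delay *= 2         # exponential backoff
    raise RuntimeError("Inference API request kept failing after retries")

print(query({"inputs": "Hello, world"}))
```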

Thank you for confirming.