I’ve noticed that in the last few days, the inference API streaming mode has changed in two ways:
- The streaming mode tends to send practically the entire response in one chunk instead of sending tokens as they are generated.
- The library I’m using (the Rust crate `async_openai`) is giving an error that the `[DONE]` token is not being received (a sketch of the stream loop is below).
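
For context, here is a minimal sketch of how the stream is being consumed, loosely following the crate's published streaming example (the model name is a placeholder and exact builder types may differ slightly by crate version); the missing-`[DONE]` error surfaces in the `Err` arm of the loop:

```rust
use async_openai::{
    types::{ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs},
    Client,
};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads the API key from the environment as usual.
    let client = Client::new();

    let request = CreateChatCompletionRequestArgs::default()
        .model("gpt-4o-mini") // placeholder model name
        .messages([ChatCompletionRequestUserMessageArgs::default()
            .content("Say hello")
            .build()?
            .into()])
        .build()?;

    // Server-sent events stream; each item is one chunk of the response.
    let mut stream = client.chat().create_stream(request).await?;

    while let Some(result) = stream.next().await {
        match result {
            Ok(chunk) => {
                // Normally each chunk carries a small incremental delta;
                // lately most of the text arrives in a single chunk.
                for choice in chunk.choices {
                    if let Some(content) = choice.delta.content {
                        print!("{content}");
                    }
                }
            }
            Err(err) => {
                // This is where the error about the stream ending without
                // a [DONE] token shows up.
                eprintln!("stream error: {err}");
            }
        }
    }
    Ok(())
}
```
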
Was this intentional, or should I open a support ticket?