Inference Streaming Behavior Changed

I’ve noticed that in the last few days, the inference API streaming mode has changed in two ways:

  • The streaming mode now tends to deliver practically the entire response in a single chunk instead of sending tokens as they are generated.
  • The library I’m using (the Rust crate async_openai) reports an error that the [DONE] token was never received; a rough sketch of how I’m consuming the stream is below.
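
For reference, this is roughly how I’m consuming the stream. It’s a minimal sketch, not my exact code: the model name is a placeholder and crate versions may differ, but the shape follows async_openai’s chat streaming API as I understand it.

```rust
use async_openai::{
    types::{ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs},
    Client,
};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Client picks up the API key / base URL from the environment.
    let client = Client::new();

    let request = CreateChatCompletionRequestArgs::default()
        .model("my-model") // placeholder model name
        .messages([ChatCompletionRequestUserMessageArgs::default()
            .content("Say hello")
            .build()?
            .into()])
        .build()?;

    // Each item should normally arrive as soon as the server emits a chunk.
    let mut stream = client.chat().create_stream(request).await?;
    while let Some(item) = stream.next().await {
        match item {
            Ok(chunk) => {
                for choice in &chunk.choices {
                    if let Some(content) = &choice.delta.content {
                        print!("{content}");
                    }
                }
            }
            // This is where the missing-[DONE] error surfaces for me.
            Err(e) => eprintln!("\nstream error: {e}"),
        }
    }
    Ok(())
}
```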

Was this intentional or should I open a support ticket?