I’ve noticed that in the last few days, the inference API streaming mode has changed in two ways:
- The streaming mode tends to send practically the entire response in one chunk instead of sending tokens as they are generated.
- The library I’m using (the Rust crate `async_openai`) is giving an error that the `[DONE]` token is not being received (a sketch of the stream loop is below).
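
For context, here is a minimal sketch of how the stream is being consumed, loosely following the crate's published streaming example (the model name is a placeholder and exact builder types may differ slightly by crate version); the missing-`[DONE]` error surfaces in the `Err` arm of the loop:

```rust
use async_openai::{
    types::{ChatCompletionRequestUserMessageArgs, CreateChatCompletionRequestArgs},
    Client,
};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads the API key from the environment as usual.
    let client = Client::new();

    let request = CreateChatCompletionRequestArgs::default()
        .model("gpt-4o-mini") // placeholder model name
        .messages([ChatCompletionRequestUserMessageArgs::default()
            .content("Say hello")
            .build()?
            .into()])
        .build()?;

    // Server-sent events stream; each item is one chunk of the response.
    let mut stream = client.chat().create_stream(request).await?;

    while let Some(result) = stream.next().await {
        match result {
            Ok(chunk) => {
                // Normally each chunk carries a small incremental delta;
                // lately most of the text arrives in a single chunk.
                for choice in chunk.choices {
                    if let Some(content) = choice.delta.content {
                        print!("{content}");
                    }
                }
            }
            Err(err) => {
                // This is where the error about the stream ending without
                // a [DONE] token shows up.
                eprintln!("stream error: {err}");
            }
        }
    }
    Ok(())
}
```
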
Was this intentional, or should I open a support ticket?