Did the inference API change recently?

I am no longer getting good responses from the inference completions API.

I’ve been using the following code for a couple of months now; nothing has changed on my end:

  const stream = await client.completions
    .create({
      model,
      prompt: txt,
      stream: true,
      seed: 1,
    })
    .catch((err) => log(err.message));

This now responds with roughly a single sentence and terminates with the following final chunk:

{
  "id": "cmpl-800ca8f6-c724-4f2b-addc-6df62cc9036c",
  "object": "text_completion",
  "created": 1742493982,
  "model": "qwen25-coder-32b-instruct",
  "choices": [
    {
      "text": ":\n\n",
      "index": 0,
      "finish_reason": "length",
      "logprobs": {
        "tokens": null,
        "token_logprobs": null,
        "top_logprobs": null,
        "text_offset": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}

The finish reason is now "length", and the response stops far too early: a single sentence and that's it.

Update

I was able to remedy this by adding max_tokens to the request, and now I get full responses and the expected finish reason.
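For reference, this is the shape of the fixed request. The model name and prompt here are stand-ins, and 2048 is an arbitrary ceiling I picked; size it for your longest expected reply:

```javascript
// Hypothetical stand-ins for the values used in the original snippet.
const model = "qwen25-coder-32b-instruct";
const txt = "prompt text";

// Same request as before, plus an explicit max_tokens cap. Without it,
// the server apparently fell back to a very small default, so generation
// hit that limit after about a sentence with finish_reason "length".
const params = {
  model,
  prompt: txt,
  stream: true,
  seed: 1,
  max_tokens: 2048, // arbitrary ceiling; adjust for your use case
};

// Passed to the client exactly as before:
//   const stream = await client.completions.create(params).catch((err) => log(err.message));
```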

However, prompt_tokens, completion_tokens, and total_tokens are no longer reported; they always show 0.
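One guess, assuming the server follows OpenAI-compatible streaming semantics: streamed responses often omit real usage numbers unless you opt in via stream_options. Whether this particular server honors that field is an assumption on my part, but it may be worth trying:

```javascript
// ASSUMPTION: the server supports OpenAI-style stream_options.
// With include_usage set, a compatible server sends a final chunk
// carrying the real token counts instead of zeros/nulls.
const params = {
  model: "qwen25-coder-32b-instruct", // stand-in model name
  prompt: "prompt text",              // stand-in prompt
  stream: true,
  seed: 1,
  max_tokens: 2048,
  stream_options: { include_usage: true },
};
```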