I am no longer getting good responses from the inference completion API.
I've been using the following code for a couple of months now, and nothing has changed on my end:
```js
const stream = await client.completions
  .create({
    model,
    prompt: txt,
    stream: true,
    seed: 1,
  })
  .catch((err) => log(err.message));
```
It now responds with roughly a single sentence and terminates with the following object:
```json
{
  "id": "cmpl-800ca8f6-c724-4f2b-addc-6df62cc9036c",
  "object": "text_completion",
  "created": 1742493982,
  "model": "qwen25-coder-32b-instruct",
  "choices": [
    {
      "text": ":\n\n",
      "index": 0,
      "finish_reason": "length",
      "logprobs": {
        "tokens": null,
        "token_logprobs": null,
        "top_logprobs": null,
        "text_offset": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
```
The finish reason is now `length`, and the response stops far too early: about a single sentence and that's it.
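For anyone hitting the same thing, a small helper makes the failure mode easy to spot when consuming the stream. This is a sketch assuming the OpenAI-style chunk shape shown in the object above; `explainFinish` is a hypothetical name, not part of any client library:

```javascript
// Sketch: interpret finish_reason on the final streamed chunk,
// assuming the OpenAI-style response shape shown above.
function explainFinish(chunk) {
  const reason = chunk.choices?.[0]?.finish_reason;
  if (reason === "length") return "hit the max_tokens cap"; // truncated output
  if (reason === "stop") return "completed naturally"; // full response
  return `other: ${reason}`;
}
```

A `length` finish reason generally means the generation ran into a token cap rather than reaching a natural stopping point.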
Update
I was able to remedy this by adding `max_tokens` to the request, and now I get full responses and the expected stop reason.
However, `prompt_tokens`, `completion_tokens`, and `total_tokens` are no longer reported; they always show 0.
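For reference, here is a sketch of the adjusted request parameters. The `max_tokens` value of 1024 is arbitrary, and the `stream_options` line is an assumption on my part: some OpenAI-compatible servers only include usage counts on streamed responses when `include_usage` is explicitly requested, so it may be worth trying for the zero-usage problem:

```javascript
// Hypothetical sketch of the adjusted request parameters; values are
// illustrative, not confirmed against this particular server.
const params = {
  model: "qwen25-coder-32b-instruct", // model name from the response above
  prompt: "Write a haiku about autumn.", // placeholder prompt
  stream: true,
  seed: 1,
  max_tokens: 1024, // explicit cap; omitting it appeared to default very low
  stream_options: { include_usage: true }, // assumption: may restore usage counts on streams
};

// The call itself is unchanged apart from the extra fields:
// const stream = await client.completions.create(params).catch((err) => log(err.message));
```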