I am currently using Lambda Labs credits on the Inference API for a hackathon, and I am finding that llama3.3-70b-instruct-fp8, as well as many other models, times out very quickly when used through the OpenAI API client. I cannot get past about 30 iterations of a pipeline with ~3k-token inputs and 20-token outputs, even with a time.sleep(1) between calls and the OpenAI client timeout set to 120s.
I initialize the client as follows:
import os

from openai import OpenAI

openai_api_key = os.getenv("LAMBDA_API_KEY")
openai_api_base = "https://api.lambda.ai/v1"

# Initialize the OpenAI client against the Lambda endpoint
openai_client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
    timeout=120,
)
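For completeness, the openai>=1.x client also accepts an httpx.Timeout object and a max_retries argument; here is a minimal sketch of that configuration (the connect timeout and retry count are arbitrary values of mine):

import httpx

# Sketch: separate the connect timeout from the overall timeout and let
# the client retry transient failures itself (values are arbitrary).
openai_client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
    timeout=httpx.Timeout(120.0, connect=10.0),
    max_retries=3,
)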
Then I query for a summary:
def get_gpt_summary(article, dataset, model) -> str:
    """Ask the model to summarize `article` using the dataset's system prompt."""
    history = [
        {"role": "system", "content": DATASET_SYSTEM_PROMPTS[dataset]},
        {
            "role": "user",
            "content": f"Article:\n{article}\n\nProvide only the summary with no other text.",
        },
    ]
    response = openai_client.chat.completions.create(
        model=model,
        messages=history,
    )
    return response.choices[0].message.content
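The driving loop is essentially the following (a minimal sketch; articles, DATASET, and MODEL stand in for my actual data, prompt key, and model name):

import time

# Minimal sketch of the calling loop; `articles`, `DATASET`, and `MODEL`
# are placeholders for my actual dataset, prompt key, and model name.
summaries = []
for article in articles:
    summaries.append(get_gpt_summary(article, DATASET, MODEL))
    time.sleep(1)  # one-second pause between requests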
Even with the time.sleep(1) between calls to get_gpt_summary, I still get a timeout error from the OpenAI client after anywhere from 5 to 50 iterations. This has been incredibly frustrating and is bottlenecking my research. The effect is the same regardless of model choice, and timeout=120 does not help. A single inference call normally takes 6-12 seconds, yet the timeout is triggered abnormally early, which suggests the real problem lies elsewhere.
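As a stopgap I can wrap each call in a manual retry along these lines (a rough sketch; the helper name, attempt count, and backoff schedule are all arbitrary choices of mine), but I would rather understand why the timeout fires at all:

import time
import openai

def get_summary_with_retry(article, dataset, model, max_attempts=3):
    # Rough sketch of a retry wrapper with exponential backoff;
    # the attempt count and backoff values are arbitrary.
    for attempt in range(max_attempts):
        try:
            return get_gpt_summary(article, dataset, model)
        except openai.APITimeoutError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...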
Please let me know if you need more information.