jamba-instruct-preview
POST {{baseUrl}}/chat/completions
Request Body
{"messages"=>[{"role"=>"user", "content"=>"Explain the importance of low latency LLMs"}], "model"=>"jamba-instruct-preview", "temperature"=>0.5, "max_tokens"=>1024, "top_p"=>1, "stream"=>false, "stop"=>nil}
RESPONSES
Status: 200 OK
{"id":"cmpl-c8b8e905a9e84be38253174671ddede4","choices":[{"index":0,"message":{"role":"assistant","content":"Low latency large language models (LLMs) are important because they can process and generate responses quickly, which is essential for many real-time applications such as natural language processing, speech recognition, and machine translation. Low latency also allows for more efficient use of resources, as the model can process more requests in a shorter amount of time. Additionally, low latency can improve the user experience by reducing the delay between the user's input and the model's response."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":106,"completion_tokens":97,"total_tokens":203}}