jamba-instruct-preview
POST {{baseUrl}}/chat/completions
Request Body
{"messages"=>[{"role"=>"user", "content"=>"Explain the importance of low latency LLMs"}], "model"=>"jamba-instruct-preview", "temperature"=>0.5, "max_tokens"=>1024, "top_p"=>1, "stream"=>false, "stop"=>nil}
RESPONSES
Status: 200 OK
{"id":"cmpl-c8b8e905a9e84be38253174671ddede4","choices":[{"index":0,"message":{"role":"assistant","content":"Low latency large language models (LLMs) are important because they can process and generate responses quickly, which is essential for many real-time applications such as natural language processing, speech recognition, and machine translation. Low latency also allows for more efficient use of resources, as the model can process more requests in a shorter amount of time. Additionally, low latency can improve the user experience by reducing the delay between the user's input and the model's response."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":106,"completion_tokens":97,"total_tokens":203}}