Introduction - Generative AI & LLM REST APIs Documentation

Generative AI & Large Language Model REST APIs

25+ inference & fine-tuning APIs, including AI21 Labs, Amazon Bedrock, Anthropic, Anyscale, Cohere, Deepgram, Deep Infra, Fireworks AI, Google Vertex AI, Groq, Hugging Face, IBM watsonx.ai, Lamini, Lepton AI, Mistral AI, NVIDIA AI, OctoAI, Ollama (localhost), OpenAI, Perplexity AI, Replicate, Stability AI, and Together AI.

This is work in progress and will be updated over time.

1. Initial Thoughts on GenAI & LLM REST APIs

1.1. API-First

At this time, few AI companies offer a full-featured REST API. Some offer a partial-featured REST API and some don't offer a REST API at all.

For example, uploading files for fine-tuning can be accomplished via a REST API, but in some cases a web-based GUI or Python SDK is required instead (not offered in addition to).

With API-first, all features are exposed via a REST API, which can be consumed from virtually any client. GUI, CLI, and SDKs for different programming languages make use of the API.

1.2. Standardization

At this time, few AI companies offer an OpenAPI Specification (formerly known as Swagger) to download and generate client-side code in different programming languages, or to import into a tool like Qodex.

Some AI companies have adopted OpenAI's Chat Completions API design, which makes it easy for clients to switch from one API to another. It is becomimg a de-facto standard for Generative AI and LLM REST APIs for inference.

1.3. Performance

[Groq is delivering ultra-low LLM inference latency] with the world's first Language Processing Unit (LPU), outperforming GPU-based processing (NVIDIA, AMD, Intel, etc.)

1.4. Security

At this time, few AI companies offer REST API security for zero-trust use cases (for example, in financial services).

Most AI companies require a bearer token, but it is a static, long-lived API key, not a dynamic, time-based bearer token. Others are explicit about it and call it what it is, an API key. Very few AI companies offer a bearer token such as OAuth and JWT.

2. List of GenAI & LLM REST APIs

[AI21 Labs API]

Models: Jamba and Jurrasic.

[Amazon Bedrock API]

Models include Titan (by Amazon), Claude (by Anthropic), Command (by Cohere), Jurrasic (by AI21 Labs), Llama (by Meta AI, open-weight), Mixtral (by Mistral AI, open-weight), and more.

[Anthropic API]

Models: Claude.

[Anyscale API]

Models include Llama & CodeLlama (by Meta AI, open-weight) and Mixtral (by Mistral AI, open-weight).

[Cohere API]

Models include Command, Embed, and Rerank.

[Deep Infra API]

100+ open models (open-source, open-data, and/or open-weight), including Yi (by 01.AI, open-weight), Gemma (by Google, open-weight), Llama (by Meta AI, open-weight), and Mixtral (by Mistral AI, open-weight).

[Deepgram API]

Models include Nova, Enhanced, Base, Custom, and Whisper.

[Fireworks AI API]

Models include Gemma (by Google, open-weight), Llama (by Meta AI, open-weight), Mistral & Mixtral (by Mistral AI, open-weight), and more.

[Google Vertex AI API]

70+ models, including Gemini (by Google), Claude (by Anthropic), Llama (by Meta AI, open-weight), and Mixtral (by Mistral AI, open-weight).

[Groq API]

Models include Gemma (by Google, open-weight), Llama (by Meta AI, open-weight) and Mixtral (by Mistral AI, open-weight).

[Hugging Face API]

1,000,000+ open models (open-source, open-data, and/or open-weight), including Gemma (by Google, open-weight), Llama (by Meta AI, open-weight), and Mixtral (by Mistral AI, open-weight).

[IBM watsonx.ai API]

Models include Granite (by IBM), Llama & CodeLlama (by Meta AI, open-weight), Mixtral (by Mistral AI, open-weight), and more.

[Lamini API]

Models include Hugging Face transformer-based models, Llama (by Meta AI, open-weight), Mistral (by Mistral AI, open-weight), and more.

[Lepton AI API]

Models include Llama (by Meta AI, open-weight), Mixtral (by Mistral AI, open-weight), SDXL (by Stability AI), and more.

[Mistral AI API]

Models include Mistral and Mixtral.

[NVIDIA AI API]

Models include Gemma (by Google, open-weight), Llama & CodeLlama (by Meta AI, open-weight), Mixtral (by Mistral AI, open-weight), SDXL Turbo & Stable Video Diffusion (by Stability AI), Kosmos (by Microsoft), Edify (by Getty Images), Edify 3D (by Shutterstock), and more.

[OctoAI API]

Models include Gemma (by Google, open-weight), Llama & CodeLlama (by Meta AI, open-weight), Mistral & Mixtral (by Mistral AI, open-weight), SDXL & Stable Video Diffusion (by Stability AI), and more.

[Ollama API (localhost)]

Models include Gemma (by Google, open-weight), Llama & CodeLlama (by Meta AI, open-weight), Mixtral (by Mistral AI, open-weight), Phi (by Microsoft), and more.

[OpenAI API]

Models include GPT-4 & GPT-4 Turbo, GPT-3.5 Turbo, DALL·E, TTS, Whisper, Embeddings, Moderation, and more.

[Perplexity AI API]

Models include Mistral & Mixtral (by Mistral AI, open-weight) and SONAR (by Meta AI, open-weight).

[Replicate API]

50+ models, including Llama (by Meta AI, open-weight) and Mixtral (by Mistral AI, open-weight).

[Stability AI API]

Models include SDXL Turbo, Stable Diffusion, Stable Audio, Stable LM, Stable Video Diffusion, and Stable Zero123.