DeepInfra hosts a large number of the most popular machine learning models. You can find the full list at deepinfra.com/models, conveniently split into categories by functionality. We are constantly adding more: DeepInfra is usually among the first to host a new model once it is available, and offers the best prices for open-source model inference.

Model pages

Each model has a dedicated page where you can:
  • Try it out interactively
  • See its API documentation
  • Grab ready-to-use code examples

Private models

We also support deploying custom models on DeepInfra infrastructure. Run your own fine-tuned or trained-from-scratch LLM on dedicated A100/H100/H200/B200/B300 GPUs.

Specifying model versions

Some models have more than one version available. You can infer against a particular version using the {"model": "MODEL_NAME:VERSION", ...} format. You can also infer against a deployment directly using {"model": "deploy_id:DEPLOY_ID", ...}. This is especially useful for Custom LLMs: you can start inferring before the deployment finishes, and before you have the model name + version pair.
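As a sketch, both forms only change the "model" field of the request body; everything else stays the same. The model name, version, and deploy ID below are placeholder values, and the request would be sent with your DeepInfra API token in the Authorization header:

```python
# Build chat-completion request bodies that pin a model version or
# target a deployment by its deploy ID. Values below are placeholders.
import json

def build_payload(model: str, prompt: str) -> dict:
    """Build a minimal chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Pin a specific version with the MODEL_NAME:VERSION format.
versioned = build_payload("meta-llama/Meta-Llama-3-8B-Instruct:SOME_VERSION", "Hello")

# Target a deployment directly with the deploy_id:DEPLOY_ID format,
# e.g. while a Custom LLM deployment is still in progress.
by_deploy = build_payload("deploy_id:MY_DEPLOY_ID", "Hello")

# You would POST either body (json.dumps(...)) to the inference
# endpoint with an "Authorization: Bearer <token>" header.
print(json.dumps(versioned))
```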

Model deprecation

Because the AI field moves quickly, newer and better models are released every day. Occasionally we have to deprecate older models to maintain quality and affordability. When a model is deprecated:
  • You’ll receive at least 1 week’s advance notice before the deprecation date
  • Your applications won’t break — after deprecation, inference requests are automatically forwarded to a recommended replacement model
  • Recent users of the model receive an email notification that includes the deprecation date
You can browse the current list of available models at deepinfra.com/models.

Suggest a model

If you think there is a model that we should run, let us know at info@deepinfra.com. We read every email.