The completions API is the legacy text generation interface — you provide a raw prompt string and the model continues it. For most use cases, the Chat Completions API is simpler and recommended instead.
The endpoint is:

```
POST https://api.deepinfra.com/v1/openai/completions
```
This is an advanced API: you need to know your model's exact prompt format, because different models expect different input formats. Check the API section on each model's page for the expected format.
Example
The example below uses deepseek-ai/DeepSeek-V3 with its prompt format:
```python
from openai import OpenAI

# Create a client pointed at the OpenAI-compatible DeepInfra endpoint.
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = True  # or False

completion = openai.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    prompt="<|begin▁of▁sentence|><|User|>Hello!<|Assistant|>",
    stop=["<|end▁of▁sentence|>"],
    stream=stream,
)

if stream:
    for event in completion:
        if event.choices[0].finish_reason:
            # The final event carries the finish reason and token usage.
            print(event.choices[0].finish_reason,
                  event.usage.prompt_tokens,
                  event.usage.completion_tokens)
        else:
            print(event.choices[0].text, end="", flush=True)
else:
    print(completion.choices[0].text)
    print(completion.usage.prompt_tokens, completion.usage.completion_tokens)
```
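If you prefer to call the endpoint directly over HTTP, the sketch below makes the same request with the requests library. It assumes bearer-token authorization (the convention the OpenAI-compatible client uses) and reads the token from a DEEPINFRA_TOKEN environment variable; the max_tokens value is an arbitrary illustration.

```python
import os

import requests

# Sketch of a raw HTTP call to the completions endpoint.
# Assumes the API token is in the DEEPINFRA_TOKEN environment variable.
resp = requests.post(
    "https://api.deepinfra.com/v1/openai/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPINFRA_TOKEN']}"},
    json={
        "model": "deepseek-ai/DeepSeek-V3",
        "prompt": "<|begin▁of▁sentence|><|User|>Hello!<|Assistant|>",
        "stop": ["<|end▁of▁sentence|>"],
        "max_tokens": 128,  # illustrative cap on generated tokens
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```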
Supported parameters
| Parameter | Notes |
|---|---|
| model | Model name or MODEL_NAME:VERSION |
| prompt | Raw prompt string in the model's expected format |
| max_tokens | Max tokens to generate. Defaults to the model's max context length minus the input length |
| stream | Stream output via SSE instead of returning the full response at once. Default: false |
| temperature | Sampling temperature between 0 and 2. Higher values produce more random output; lower values produce more deterministic output. Default: 1.0 |
| top_p | Nucleus sampling threshold: only tokens comprising the top top_p probability mass are considered. Default: 1.0 |
| stop | Up to 4 sequences where the API will stop generating further tokens |
| n | Number of completion sequences to return. Default: 1 |
| echo | If true, the prompt is included at the start of the returned text |
| logprobs | Return log probabilities for the generated tokens |
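To see several of these parameters used together, here is a minimal sketch; the sampling values are arbitrary, and logprobs is passed as an integer (the number of top log probabilities per token) following the legacy OpenAI convention:

```python
from openai import OpenAI

openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

# Illustrative values only; see the table above for parameter semantics.
completion = openai.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    prompt="<|begin▁of▁sentence|><|User|>Hello!<|Assistant|>",
    stop=["<|end▁of▁sentence|>"],
    max_tokens=64,    # cap on generated tokens
    temperature=0.7,  # lower = more deterministic
    top_p=0.9,        # nucleus sampling threshold
    n=2,              # return two completion sequences
    echo=True,        # prepend the prompt to each returned text
    logprobs=1,       # request log probabilities for generated tokens
)

for choice in completion.choices:
    print(choice.text)
```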
For every model, you can check its prompt format in the API section on its page.
For the complete parameter reference, see the API reference.