
JSON output

Set response_format to {"type": "json_object"} and the model's response content will be a valid JSON string instead of free-form text. This is useful for any downstream pipeline that parses the model output.

How it works

ezrouter's OpenAI-compatible API accepts response_format on /v1/chat/completions. ezrouter forwards it to the upstream provider, which applies its own JSON-mode constraint. This has been verified on the claude, deepseek, and glm model families in the catalog.

```python
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["EZROUTER_API_KEY"],
    base_url="https://www.ezrouter.dev/v1",
)

# The system prompt mentions JSON and shows the exact shape we want back.
system_prompt = """
The user will provide exam text. Extract the question and the
answer, then return JSON.

EXAMPLE INPUT:
Which is the highest mountain in the world? Mount Everest.

EXAMPLE JSON OUTPUT:
{"question": "Which is the highest mountain in the world?", "answer": "Mount Everest"}
"""

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Which is the longest river in the world? The Nile River."},
    ],
    response_format={"type": "json_object"},
)

# content is a JSON string: parse it, then re-serialize so it prints as JSON.
data = json.loads(response.choices[0].message.content)
print(json.dumps(data))
```

Output (exact strings can vary between runs and models):

```json
{"question": "Which is the longest river in the world?", "answer": "The Nile River"}
```
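If you are not using the OpenAI SDK, the same call is a plain POST to /v1/chat/completions with response_format in the JSON body. The sketch below uses only the standard library and builds the request without sending it. It assumes ezrouter accepts the same "Authorization: Bearer <key>" header the OpenAI SDK sends; build_json_mode_request is a hypothetical helper name, not part of any ezrouter client.

```python
import json
import os
import urllib.request

# Assumption: ezrouter authenticates with the Bearer header the OpenAI SDK sends.
BASE_URL = "https://www.ezrouter.dev/v1"


def build_json_mode_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with JSON mode enabled."""
    body = {
        "model": model,
        "messages": messages,
        "response_format": {"type": "json_object"},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_json_mode_request(
        api_key=os.environ.get("EZROUTER_API_KEY", ""),
        model="claude-sonnet-4-6",
        messages=[
            {"role": "system", "content": "Extract the question and answer as JSON."},
            {"role": "user", "content": "Which is the longest river in the world? The Nile River."},
        ],
    )
    # urllib.request.urlopen(req) would send it; the JSON string is then at
    # ["choices"][0]["message"]["content"] in the decoded response body.
    print(req.get_method(), req.full_url)
```

The model's JSON arrives as a string inside the OpenAI-style envelope, so it still needs a second json.loads after decoding the HTTP body.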

Rules that make it work

JSON mode is a decode-time constraint, not a schema validator. It only behaves reliably if you do all four of the following:

  1. Set response_format to {"type": "json_object"}.
  2. Include the word "json" (case-insensitive) in the system or user prompt. Some upstream providers reject the request otherwise.
  3. Show a JSON example in the system prompt. Without one, the model may emit valid JSON with the wrong shape.
  4. Set max_tokens high enough. A JSON object truncated mid-string will not parse. If you expect ~500 tokens of payload, set max_tokens to about 800, and treat any response whose finish_reason is "length" as truncated.
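The four rules can be checked mechanically before a request goes out. This is a minimal sketch, not an ezrouter feature: check_json_mode_params and was_truncated are hypothetical helpers that operate on the keyword arguments you would pass to client.chat.completions.create. Rule 3 can only be approximated, so the check just looks for a "{" in the system prompt.

```python
from typing import List


def check_json_mode_params(params: dict) -> List[str]:
    """Return problems found in a chat.completions.create kwargs dict; [] means it passes.

    Hypothetical pre-flight helper mirroring the four rules. Rule 3 is a
    heuristic: it only looks for a "{" somewhere in the system prompt.
    """
    problems = []
    # Rule 1: JSON mode must be requested explicitly.
    if params.get("response_format") != {"type": "json_object"}:
        problems.append('response_format is not {"type": "json_object"}')
    messages = params.get("messages", [])
    prompt_text = " ".join(
        str(m.get("content", "")) for m in messages if m.get("role") in ("system", "user")
    )
    # Rule 2: the word "json" must appear in the system or user prompt.
    if "json" not in prompt_text.lower():
        problems.append('no system or user message mentions "json"')
    system_text = " ".join(
        str(m.get("content", "")) for m in messages if m.get("role") == "system"
    )
    # Rule 3 (heuristic): an inline JSON example implies at least one "{".
    if "{" not in system_text:
        problems.append("system prompt does not appear to contain a JSON example")
    # Rule 4: leave headroom so the object is not cut off mid-string.
    if "max_tokens" not in params:
        problems.append("max_tokens is not set")
    return problems


def was_truncated(finish_reason: str) -> bool:
    """OpenAI-style responses report finish_reason == "length" when max_tokens was hit."""
    return finish_reason == "length"
```

In use, run check_json_mode_params(kwargs) before client.chat.completions.create(**kwargs), and call was_truncated(response.choices[0].finish_reason) before attempting to parse the content.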

Failure modes

  • Empty content. Occasionally the model returns choices[0].message.content == "". Retry the request or rephrase the prompt. This is an upstream behavior, not a gateway bug.

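  • Trailing tokens after the closing }. Some models append a newline or a stop sequence. json.loads accepts trailing whitespace but not other characters, so if you hit this, drop everything after the closing } before parsing (str.strip() alone will not help, since it only removes whitespace).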

  • Schema drift across models. A prompt that produces {"question": ..., "answer": ...} reliably on claude-sonnet-4-6 may add an unwanted "explanation" field on deepseek-v4-pro. Validate against your expected schema before trusting the output.
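The first two failure modes can be handled in one parsing step. A minimal sketch, assuming you only need the first JSON object in the reply: json.JSONDecoder().raw_decode parses one JSON value and reports where it ended, so trailing stop sequences are ignored, and an empty reply triggers a retry. parse_json_content, complete_json, and EmptyContentError are hypothetical names for illustration.

```python
import json
from typing import Callable, Optional


class EmptyContentError(ValueError):
    """The model returned an empty (or None) message content."""


def parse_json_content(raw: Optional[str]) -> dict:
    """Parse a JSON-mode reply, ignoring anything after the first complete JSON value."""
    text = (raw or "").strip()  # raw_decode rejects leading whitespace
    if not text:
        raise EmptyContentError("model returned empty content")
    # raw_decode returns (value, end_index), so characters after the closing }
    # (for example a stop sequence) are simply not consumed.
    value, _end = json.JSONDecoder().raw_decode(text)
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value


def complete_json(create: Callable[[], Optional[str]], attempts: int = 3) -> dict:
    """Retry `create` until its content parses.

    `create` is a hypothetical zero-argument callable that sends the request
    and returns choices[0].message.content.
    """
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return parse_json_content(create())
        except (EmptyContentError, json.JSONDecodeError) as exc:
            last_error = exc  # empty or unparseable reply: ask again
    raise RuntimeError(f"no parseable JSON after {attempts} attempts") from last_error
```

With the client from the first example, this would be called as complete_json(lambda: client.chat.completions.create(**kwargs).choices[0].message.content). Schema drift is not handled here; that still needs a shape check on the parsed object.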

Validating in client code

JSON mode guarantees parseability, not shape. Wrap the call with a schema check:

```python
from pydantic import BaseModel, ValidationError

class Extract(BaseModel):
    question: str
    answer: str

# `response` is the result of the chat.completions.create call in the first example.
raw = response.choices[0].message.content
try:
    payload = Extract.model_validate_json(raw)
except ValidationError as e:
    # Raised for both invalid JSON and a wrong shape.
    # Retry with a stricter prompt, or fall back.
    ...
```
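The retry branch can be sketched without pydantic. Note that a pydantic BaseModel ignores unknown fields by default, so the Extract model above accepts a reply carrying an extra "explanation" key; the stdlib check below rejects it instead, then retries once with a stricter instruction appended to the system prompt. validate_extract, extract_with_retry, STRICT_SUFFIX, and the ask callable are all hypothetical names used for illustration.

```python
import json
from typing import Callable

# Expected shape from the example above: {"question": str, "answer": str}
EXPECTED_FIELDS = {"question": str, "answer": str}

# Hypothetical stricter instruction appended on retry.
STRICT_SUFFIX = (
    "\nReturn ONLY a JSON object with exactly the keys "
    '"question" and "answer". Do not add any other keys.'
)


def validate_extract(raw: str) -> dict:
    """Parse and check the reply; raise ValueError describing the first problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    extra = sorted(set(data) - set(EXPECTED_FIELDS))
    if extra:
        raise ValueError(f"unexpected fields: {extra}")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or not a {expected_type.__name__}")
    return data


def extract_with_retry(ask: Callable[[str], str], system_prompt: str, attempts: int = 2) -> dict:
    """`ask` is hypothetical: it sends `system_prompt` plus the user message and returns content."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    prompt = system_prompt
    for attempt in range(attempts):
        try:
            return validate_extract(ask(prompt))
        except ValueError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last validation error
            # Tighten the instructions for the next attempt.
            prompt = system_prompt + STRICT_SUFFIX
```

Rejecting extra fields is a design choice; if the additional keys are harmless for your pipeline, dropping them is simpler than retrying.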

When not to use JSON mode

  • For tool calls. Use tool calling instead; the model returns a structured argument object the SDK already parses.

  • For very long outputs. JSON mode adds decoding overhead and is more likely to truncate at the max_tokens boundary. For generative answers with embedded JSON, ask for markdown with a fenced code block and parse that.

signatures.

the parameter's documented shape.
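For the long-output case, where the reply is markdown with JSON inside a fenced block, the JSON can be pulled out with a regular expression. A minimal sketch, assuming the model labels the block "json" and you only want the first such block; extract_fenced_json is a hypothetical helper. The fence is built at runtime so the example can itself sit inside a markdown code block.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built at runtime to avoid closing this code block

# Matches the first fenced block labelled json and captures its body (non-greedy).
JSON_BLOCK = re.compile(FENCE + r"json[ \t]*\r?\n(.*?)\r?\n" + FENCE, re.DOTALL)


def extract_fenced_json(markdown: str):
    """Return the parsed contents of the first json-fenced block in a free-form reply."""
    match = JSON_BLOCK.search(markdown)
    if match is None:
        raise ValueError("no json code block found")
    return json.loads(match.group(1))
```

Because this path does not use response_format, nothing guarantees the block is valid JSON; json.loads still raises on malformed content, and the validation step from the previous section still applies.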