JSON output
Set response_format to {"type": "json_object"} and the model's response content will be a valid JSON string instead of free-form text. Useful for any downstream pipeline that parses the model output.
How it works
The OpenAI surface accepts response_format on /v1/chat/completions. ezrouter forwards it to the upstream provider, which applies its own JSON-mode constraint. Verified on the catalog's claude, deepseek, and glm families.
import json
import os

from openai import OpenAI

# Point the OpenAI SDK at the ezrouter gateway.
client = OpenAI(
    api_key=os.environ["EZROUTER_API_KEY"],
    base_url="https://www.ezrouter.dev/v1",
)
system_prompt = """
The user will provide exam text. Extract the question and the
answer, then return JSON.
EXAMPLE INPUT:
Which is the highest mountain in the world? Mount Everest.
EXAMPLE JSON OUTPUT:
{"question": "Which is the highest mountain in the world?", "answer": "Mount Everest"}
"""
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Which is the longest river in the world? The Nile River."},
    ],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))
Output:
{"question": "Which is the longest river in the world?", "answer": "The Nile River"}
Rules that make it work
JSON mode is a constraint at decode time, not a schema validator. Four things you must do for it to behave:
- Set response_format to {"type": "json_object"}.
- Include the word "json" (case-insensitive) in the system or user prompt. Some upstream providers reject the request otherwise.
- Show a JSON example in the system prompt. Without one the model may emit valid JSON with the wrong shape.
- Set max_tokens high enough. A JSON object truncated mid-string will not parse. If you expect ~500 tokens of payload, set max_tokens: 800 and re-check.
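The four rules above can be checked before a request is sent. The sketch below is one possible pre-flight check, not part of ezrouter or the OpenAI SDK: check_json_mode_request is a hypothetical helper, the "{" test for a JSON example is a crude heuristic, and the 1.5x headroom factor is an assumption loosely based on the 500-to-800 figure above.

```python
def check_json_mode_request(kwargs: dict, expected_payload_tokens: int) -> list[str]:
    """Return the problems found; an empty list means all four rules pass."""
    problems = []
    messages = kwargs.get("messages", [])

    def text_for(roles):
        # Concatenate the plain-string content of messages with the given roles.
        return " ".join(
            m["content"]
            for m in messages
            if m.get("role") in roles and isinstance(m.get("content"), str)
        )

    # Rule 1: response_format must request JSON mode.
    if kwargs.get("response_format") != {"type": "json_object"}:
        problems.append("response_format is not json_object")

    # Rule 2: the word "json" must appear in the system or user prompt.
    if "json" not in text_for(("system", "user")).lower():
        problems.append('no "json" in system or user prompt')

    # Rule 3 (heuristic, an assumption): a "{" suggests a JSON example is shown.
    if "{" not in text_for(("system",)):
        problems.append("system prompt shows no JSON example")

    # Rule 4 (assumed headroom factor of 1.5): leave room above the payload size.
    max_tokens = kwargs.get("max_tokens")
    if max_tokens is None or max_tokens < 1.5 * expected_payload_tokens:
        problems.append("max_tokens missing or too low for the expected payload")

    return problems
```

Calling it with the same keyword arguments you would pass to client.chat.completions.create, plus a rough payload estimate, returns a list you can log or assert on in tests.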
Failure modes
- Empty content. Occasionally the model returns choices[0].message.content == "". Retry the request or rephrase the prompt. This is an upstream behavior, not a gateway bug.
- Trailing tokens after the closing }. Some models append a newline or a stop sequence. json.loads accepts trailing whitespace but not other characters; if you hit this, drop everything after the closing } before parsing.
- Schema drift across models. A prompt that produces {"question": ..., "answer": ...} reliably on claude-sonnet-4-6 may add an unwanted "explanation" field on deepseek-v4-pro. Validate against your expected schema before trusting the output.
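A minimal sketch of handling these three failure modes in one place, using only the standard library. parse_json_content and EXPECTED_KEYS are hypothetical names, and the expected keys are assumed from the question/answer example in this guide. It uses json.JSONDecoder.raw_decode, which parses the first JSON value and ignores whatever follows it.

```python
import json

# Assumption: the shape used in this guide's example prompt.
EXPECTED_KEYS = {"question", "answer"}


def parse_json_content(content):
    """Parse a JSON-mode response body.

    Returns (payload, extra_keys). Raises ValueError on empty content,
    unparseable content, or a top-level value that is not an object.
    """
    # Failure mode 1: empty content. The caller should retry the request.
    if not content or not content.strip():
        raise ValueError("empty content; retry the request")

    # Failure mode 2: trailing tokens. raw_decode stops at the end of the
    # first JSON value, so anything after the closing brace is ignored.
    # It does not skip leading whitespace, hence the lstrip().
    payload, _end = json.JSONDecoder().raw_decode(content.lstrip())
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")

    # Failure mode 3: schema drift. Report unexpected keys to the caller.
    extra_keys = sorted(set(payload) - EXPECTED_KEYS)
    return payload, extra_keys
```

Truncated output still raises json.JSONDecodeError, which is a subclass of ValueError, so a single except ValueError around the call covers both the empty and the truncated case.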
Validating in client code
JSON mode constrains the output to be parseable JSON, not to match a particular shape. Wrap the call with a schema check:
import json
from pydantic import BaseModel, ValidationError

class Extract(BaseModel):
    question: str
    answer: str

raw = response.choices[0].message.content
try:
    payload = Extract.model_validate_json(raw)
except ValidationError as e:
    # retry with a stricter prompt, or fall back
    ...
When not to use JSON mode
- For tool calls. Use tool calls instead: the model returns a structured argument object the SDK already parses.
- For very long outputs. JSON mode adds decoding overhead and is more likely to truncate at the max_tokens boundary. For generative answers with embedded JSON, ask for markdown with a fenced code block and parse that.
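For the markdown alternative in the last bullet, the fenced block has to be pulled out of the reply before parsing. A minimal sketch, assuming the model emits one fenced block that is either untagged or tagged json; extract_fenced_json is a hypothetical helper, and the pattern is written as `{3} so the example contains no literal fence.

```python
import json
import re

# Matches an opening fence (optionally tagged "json"), captures the body
# lazily, and stops at the next run of three backticks.
_FENCED = re.compile(r"`{3}(?:json)?[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_fenced_json(markdown: str):
    """Return the parsed JSON from the first fenced block in a markdown reply."""
    match = _FENCED.search(markdown)
    if match is None:
        raise ValueError("no fenced code block found")
    # json.loads tolerates the trailing newline left before the closing fence.
    return json.loads(match.group(1))
```

Prose before and after the block is ignored, so the model can explain its answer at length without affecting the parse.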
Related
- Tool calls: structured output via function signatures.
the parameter's documented shape.