Chat completion in Python

ezrouter does not ship its own Python SDK. Use the official openai package with a base_url override.

Install

bash

pip install openai

Minimal example

python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["EZROUTER_API_KEY"],
    base_url="https://www.ezrouter.dev/v1",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(response.choices[0].message.content)

Set EZROUTER_API_KEY in your environment to a key from the dashboard.
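
With the minimal example, a missing variable surfaces as a bare KeyError when the client is constructed. The sketch below fails early with a clearer message instead. Note that load_api_key is a hypothetical helper written for this page, not part of ezrouter or the openai package.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the ezrouter key from the environment, or fail with a clear message.

    load_api_key is a hypothetical helper for illustration only.
    """
    key = env.get("EZROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "EZROUTER_API_KEY is not set; export a key from the dashboard first."
        )
    return key
```

The result can be passed as api_key when constructing the client, in place of the direct os.environ lookup.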

Streaming

ezrouter always returns SSE on the chat-completions endpoint. The openai SDK abstracts this for you when you pass stream=True:

python

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()

When you call without stream=True, the SDK consumes the SSE stream internally and assembles the final response object — you get the non-streaming ergonomics on top of the always-streaming surface.
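
If you want to stream and still end up with the full text, the deltas can be joined by hand. The following is a minimal sketch: collect_stream and _fake_chunk are hypothetical names, and the fake chunks only imitate the attribute shape used in the loop above so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Join the text deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in stream:
        # Guard against chunks that carry no choices (an assumption, not a
        # documented ezrouter behavior).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip chunks whose delta has no text
            parts.append(delta)
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in object shaped like a stream chunk, for offline use."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


fake_stream = [_fake_chunk("1, "), _fake_chunk(None), _fake_chunk("2")]
print(collect_stream(fake_stream))
```

With a real client, the stream returned by a stream=True call would be passed in place of fake_stream.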

Multi-turn

Append each completed turn to a running messages list:

python

history = [
    {"role": "system", "content": "You are a helpful assistant."},
]

def ask(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    resp = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=history,
    )
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("What is 2+2?"))
print(ask("And 3+3?"))
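
The running list grows with every turn, so long conversations eventually send a large prompt. One way to bound it is sketched below, assuming that dropping the oldest turns is acceptable for your application. Both trim_history and max_turns are hypothetical names, not ezrouter features.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(history: List[Message], max_turns: int) -> List[Message]:
    """Keep system messages plus the last max_turns user/assistant pairs."""
    system = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    # One turn is a user message and its assistant reply, i.e. two entries.
    # The explicit guard matters because dialogue[-0:] would keep everything.
    kept = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system + kept
```

Apply it between completed turns, for example with history[:] = trim_history(history, 10) at the end of ask, so the kept slice always starts on a user message.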

Reading `usage`

The OpenAI SDK exposes the standard token counts on resp.usage. ezrouter extensions hang off the same object and may be absent, so read them defensively:

python

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
)

print(resp.usage.prompt_tokens, resp.usage.completion_tokens)
# Cached portion, when present
cached = getattr(resp.usage, "prompt_tokens_details", None)
if cached:
    print("cached:", cached.cached_tokens)

The Anthropic-aliased output_tokens field on usage is unreliable on this surface (often reads 0); read completion_tokens instead.
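
For logging or cost tracking, the reads above can be folded into one function. This is a sketch under the assumption that the usage object has the attribute shape shown in this section. The summarize_usage helper and the fake_usage stand-in are hypothetical, and the numbers are made up for illustration.

```python
from types import SimpleNamespace


def summarize_usage(usage) -> dict:
    """Extract token counts from a usage object, tolerating missing extensions."""
    details = getattr(usage, "prompt_tokens_details", None)
    # Falls back to 0 when the details object or its cached_tokens is absent.
    cached = getattr(details, "cached_tokens", None) or 0
    return {
        "prompt": usage.prompt_tokens,
        # Read completion_tokens, not the aliased output_tokens field.
        "completion": usage.completion_tokens,
        "cached": cached,
    }


# Stand-in usage object for offline illustration; real values come from resp.usage.
fake_usage = SimpleNamespace(
    prompt_tokens=12,
    completion_tokens=5,
    prompt_tokens_details=SimpleNamespace(cached_tokens=8),
)
print(summarize_usage(fake_usage))
```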

Error handling

Catch openai.APIError (status-bearing failures arrive as its subclass APIStatusError) and switch on the HTTP status. The error envelope ezrouter returns is documented in error codes. Do not add retry handling for 429, because the gateway does not emit it. You may see a 5xx during a gateway redeploy, and that is the correct retry target.

python

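import time

from openai import APIConnectionError, APIStatusError

def safe_complete(**kwargs):
    """Call chat.completions.create, retrying connection failures and 5xx responses."""
    for attempt in range(5):
        try:
            return client.chat.completions.create(**kwargs)
        except APIConnectionError:
            # Network-level failure (timeouts included): back off and retry.
            time.sleep(2 ** attempt)
        except APIStatusError as e:
            # APIStatusError is the APIError subclass that carries status_code;
            # the base APIError does not guarantee that attribute.
            if 500 <= e.status_code < 600:
                time.sleep(2 ** attempt)
                continue
            raise  # 4xx responses are not retryable
    raise RuntimeError("exceeded retry budget")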
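
The retry policy described in this section can also be factored into two small pure functions, which makes it checkable without network access. This is only a sketch: should_retry, backoff_delay, and the 30-second cap are illustrative choices, not ezrouter recommendations.

```python
def should_retry(status_code: int) -> bool:
    """Retry only 5xx responses; 4xx (including 429) are treated as final."""
    return 500 <= status_code < 600


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential delay in seconds for a zero-based attempt, capped at cap."""
    return min(cap, base * (2 ** attempt))
```

A retry loop would then sleep for backoff_delay(attempt) whenever should_retry(e.status_code) is true, and re-raise otherwise.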

Anthropic surface alternative

For Claude models with extended thinking or prompt caching, the Anthropic surface offers a richer feature set. See the anthropic-api guide.

Next steps

Node.js example: same call from JavaScript.
curl example: bare-metal HTTP without an SDK.
API reference: every parameter explained.