TokenRouter Documentation

Route every request to the fastest, most cost-efficient model—without rewriting your prompts. These docs cover the core Responses API, SDKs, provider key management, routing automation, and firewall controls.

Getting started

Overview

TokenRouter wraps the OpenAI Responses API with an intelligent routing layer. Send a standard Responses request to https://api.tokenrouter.io/v1/responses and we will score eligible providers, enforce your firewall rules, and return a fully compatible payload.

The router monitors live provider pricing, historical latency, and your account limits so you always get the best trade-off for each prompt. You can adjust the strategy per-request, define deterministic rules in the console, or pin a provider when compliance requires it.

Authentication

Generate an API key from the Console → API Keys page. Pass it as a bearer token: Authorization: Bearer TOKENROUTER_API_KEY. Keys inherit your workspace quota and firewall settings, so rotate them regularly and keep them scoped to the environments where you deploy them.

All requests must be made over HTTPS. We automatically issue request IDs, idempotency keys, and latency telemetry so you can correlate traffic in the Logs view.

OpenAI compatibility

TokenRouter mirrors the Responses schema, including tool calls, multimodal inputs, and SSE streaming. If you already use the official OpenAI SDKs, point them at https://api.tokenrouter.io/v1 and add OpenAI-Beta: responses=v1 to your default headers.

The router is drop-in compatible with OpenAI’s retry helpers, so you can keep your existing middleware. You’ll just gain routing decisions, provider telemetry, and cost controls without touching your prompts.
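For example, here is what that looks like with the official OpenAI Python SDK. This sketch assumes openai v1.x or later; only the base URL, default header, and model string change from a stock OpenAI integration.

Python
import os

from openai import OpenAI

# Point the official SDK at TokenRouter instead of api.openai.com.
client = OpenAI(
    api_key=os.environ["TOKENROUTER_API_KEY"],
    base_url="https://api.tokenrouter.io/v1",
    default_headers={"OpenAI-Beta": "responses=v1"},
)

response = client.responses.create(
    model="auto:balance",
    input="Summarize what TokenRouter does in one sentence.",
)
print(response.output_text)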

Responses API

The /v1/responses endpoint accepts the same payloads as OpenAI. Below are the most common request patterns, shown as cURL calls with matching Python sketches that use the OpenAI SDK; the TokenRouter SDKs accept the same payloads.

Creating a standard request

Send a synchronous request to the Responses API. TokenRouter mirrors the OpenAI Responses schema, so anything that works against OpenAI works here.

cURL
curl https://api.tokenrouter.io/v1/responses \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: responses=v1" \
  -d '{
    "model": "auto:balance",
    "input": [
      {
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Summarize what TokenRouter does in one sentence." }
        ]
      }
    ]
  }'

Auto routing

Let TokenRouter pick the best provider for every request. Use metadata to give the router more context, or pin a preferred vendor when you need to.

cURL
curl https://api.tokenrouter.io/v1/responses \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: responses=v1" \
  -d '{
    "model": "auto:quality",
    "metadata": {
      "workspace": "docs-demo",
      "priority": "launch"
    },
    "input": [
      {
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Draft an onboarding email for new customers." }
        ]
      }
    ]
  }'
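The equivalent call through the SDK client configured above. The metadata keys shown here match the cURL example; use whatever keys your routing rules and logs care about.

Python
# Reuses the `client` from the OpenAI compatibility example above.
response = client.responses.create(
    model="auto:quality",
    metadata={"workspace": "docs-demo", "priority": "launch"},
    input="Draft an onboarding email for new customers.",
)
print(response.output_text)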

Modes

Balance cost, latency, or quality for each call. Use the `model` shortcut (`auto:latency`) or pass `router_mode` explicitly.

cURL
curl https://api.tokenrouter.io/v1/responses \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: responses=v1" \
  -d '{
    "model": "auto:latency",
    "router_mode": "latency",
    "input": [
      {
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Respond quickly with a short acknowledgement." }
        ]
      }
    ]
  }'
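router_mode is not a named parameter in the OpenAI SDKs, so when you use them against TokenRouter, pass it through extra_body, which the Python SDK merges into the request JSON.

Python
# `router_mode` is TokenRouter-specific; extra_body carries it in the payload.
response = client.responses.create(
    model="auto:latency",
    input="Respond quickly with a short acknowledgement.",
    extra_body={"router_mode": "latency"},
)
print(response.output_text)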

Creating a stream request

Stream responses over Server-Sent Events. TokenRouter emits the same event types as OpenAI: metadata, message.start, content.delta, usage, and done.

cURL
curl https://api.tokenrouter.io/v1/responses \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -H "OpenAI-Beta: responses=v1" \
  -d '{
    "model": "auto:balance",
    "stream": true,
    "input": [
      {
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Stream a short haiku about resilient infrastructure." }
        ]
      }
    ]
  }'
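If you would rather consume the SSE stream directly than go through an SDK helper, a minimal Python sketch looks like this. The content.delta event type comes from the list above; the field names inside each event payload are an assumption, so check a real stream in Console → Logs before relying on them.

Python
import json
import os

import requests

resp = requests.post(
    "https://api.tokenrouter.io/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ['TOKENROUTER_API_KEY']}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
        "OpenAI-Beta": "responses=v1",
    },
    json={
        "model": "auto:balance",
        "stream": True,
        "input": [{
            "role": "user",
            "content": [{"type": "input_text", "text": "Stream a short haiku about resilient infrastructure."}],
        }],
    },
    stream=True,
)
resp.raise_for_status()

# SSE frames of interest start with "data: "; parse each one as JSON.
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    event = json.loads(payload)
    # "delta" as the text field is an assumption, not confirmed by these docs.
    if event.get("type") == "content.delta":
        print(event.get("delta", ""), end="", flush=True)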

Creating a tool request

TokenRouter normalises tool definitions so every provider receives compatible JSON. Keep tool schemas tight to get the best results across vendors.

cURL
curl https://api.tokenrouter.io/v1/responses \
  -H "Authorization: Bearer $TOKENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: responses=v1" \
  -d '{
    "model": "auto:quality",
    "tools": [
      {
        "type": "function",
        "name": "lookup_customer",
        "description": "Fetch CRM profile by email",
        "parameters": {
          "type": "object",
          "properties": {
            "email": { "type": "string", "format": "email" }
          },
          "required": ["email"],
          "additionalProperties": false
        }
      }
    ],
    "tool_choice": "auto",
    "input": [
      {
        "role": "user",
        "content": [
          { "type": "input_text", "text": "Find the CRM record for sara@acme.dev and suggest next steps." }
        ]
      }
    ]
  }'
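When the model decides to call lookup_customer, the response's output array carries a function_call item. The round trip below follows the OpenAI Responses schema, which TokenRouter mirrors; run_lookup_customer is a hypothetical stand-in for your own CRM call, and previous_response_id assumes the router persists response state the way OpenAI does (if it does not, resend the full conversation instead).

Python
import json

# Reuses the `client` from the OpenAI compatibility example above.
tools = [{
    "type": "function",
    "name": "lookup_customer",
    "description": "Fetch CRM profile by email",
    "parameters": {
        "type": "object",
        "properties": {"email": {"type": "string", "format": "email"}},
        "required": ["email"],
        "additionalProperties": False,
    },
}]

response = client.responses.create(
    model="auto:quality",
    tools=tools,
    tool_choice="auto",
    input="Find the CRM record for sara@acme.dev and suggest next steps.",
)

for item in response.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)
        result = run_lookup_customer(args["email"])  # hypothetical CRM helper
        # Feed the tool result back so the model can finish its answer.
        followup = client.responses.create(
            model="auto:quality",
            tools=tools,
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps(result),
            }],
        )
        print(followup.output_text)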

Provider keys

Including via headers

Bring your own provider keys per request to test new vendors without storing credentials server-side. Add the appropriate header and the router will prefer that provider when routing candidates are equivalent.

  • X-OpenAI-Key
  • X-Anthropic-Key
  • X-Gemini-Key
  • X-DeepSeek-Key

We automatically decrypt and compare headers against stored keys so you can rotate credentials without breaking traffic.
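With the OpenAI Python SDK, per-request headers travel via extra_headers. A sketch passing an Anthropic key for a single call:

Python
import os

# Per-request provider key; the router prefers this vendor when candidates tie.
response = client.responses.create(
    model="auto:balance",
    input="Draft a two-line status update.",
    extra_headers={"X-Anthropic-Key": os.environ["ANTHROPIC_API_KEY"]},
)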

Configure in console

Navigate to Console → Provider Keys to add encrypted provider secrets. Choose a primary key per provider and mark fallbacks. Saved keys are surfaced to the router so calls succeed even when you do not send headers.

Enterprise plans can scope keys to projects and restrict traffic to specific regions for compliance.

Errors

Viewing errors

Failed calls return an error object with a normalised structure:

{
  "error": {
    "type": "routing_error",
    "message": "No eligible providers available for this request.",
    "http_status": 422,
    "provider": null,
    "raw": null
  }
}

Every response also includes request_id so you can inspect the full trace inside Console → Logs.

Handling errors

SDKs raise typed exceptions (AuthenticationError, QuotaExceededError, APIStatusError) so you can branch on retryable errors. When calling the API directly, check for 429 and back off using Retry-After headers.

Routing and firewall errors surface the offending rule in the rule_trace metadata so you can tune matchers without guessing.
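When you call the API directly, a minimal retry loop that honours Retry-After might look like the sketch below. It sticks to plain HTTP since the typed exceptions above belong to the TokenRouter SDKs; create_response and its defaults are illustrative, not part of any SDK.

Python
import os
import time

import requests

URL = "https://api.tokenrouter.io/v1/responses"
HEADERS = {
    "Authorization": f"Bearer {os.environ['TOKENROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "OpenAI-Beta": "responses=v1",
}

def create_response(payload: dict, max_retries: int = 3) -> dict:
    for attempt in range(max_retries + 1):
        resp = requests.post(URL, headers=HEADERS, json=payload, timeout=60)
        # 429: wait as long as Retry-After asks (exponential fallback), retry.
        if resp.status_code == 429 and attempt < max_retries:
            time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        if resp.status_code >= 400:
            error = resp.json().get("error", {})
            raise RuntimeError(f"{error.get('type')}: {error.get('message')}")
        return resp.json()

result = create_response({"model": "auto:balance", "input": "Say hello."})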

Rules (Console)

Routing rules run before the classifier to enforce business logic. Head to Console → Routing Rules to create weighted overrides—pin models for VIP customers, fall back to Anthropic for long contexts, or redirect traffic that triggers specific metadata.

Rules are ordered by priority. Each condition (input contains, metadata equals, mode equals, etc.) is logged in the rule trace so you can see exactly why a rule executed. Use the “Preview decision” drawer in the console to simulate routing without sending live traffic.

Firewall (Console)

The firewall inspects prompts before they reach any provider. Create policies under Console → Firewall to block secrets, mask sensitive data, or warn analysts when compliance filters match.

Rules support substring and regex matching across prompt, response, or metadata scopes. Actions include warn, mask, and block. Every applied rule is recorded alongside the response so auditors can review exactly what happened.