Chat Completions
POST /v1/chat/completions
Fully compatible with the OpenAI Chat Completions API. Supports streaming, tool calling, JSON mode, and reasoning-token models.
Request
{
  "model": "zai-org/GLM-5-FP8",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": true
}
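Because the endpoint is OpenAI-compatible, any OpenAI SDK works once it is pointed at the Router. A minimal sketch with the official openai Python package, using the base URL and model from the examples on this page (substitute your own API key):

```python
from openai import OpenAI

# Point the standard OpenAI client at the 0G Router.
client = OpenAI(
    base_url="https://router-api.0g.ai/v1",
    api_key="sk-YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="zai-org/GLM-5-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.7,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```

The later Python sketches on this page reuse this client.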
All standard OpenAI fields are accepted — temperature, top_p, n, stop, presence_penalty, frequency_penalty, logit_bias, user, response_format, and so on.
0G Router Extensions
These optional top-level fields are stripped before the request is forwarded to the provider, so they never conflict with the OpenAI schema.
| Field | Type | Description |
|---|---|---|
| provider | object | Control provider routing; see Routing |
| verify_tee | boolean | Ask the Router to synchronously verify the provider's TEE signature; see Verifiable Execution |
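Typed SDKs won't expose these fields as named parameters, but the openai Python package forwards arbitrary top-level fields via extra_body. A sketch using the client from above (only verify_tee is shown; the shape of the provider object is covered in Routing):

```python
response = client.chat.completions.create(
    model="zai-org/GLM-5-FP8",
    messages=[{"role": "user", "content": "Hello"}],
    # Router-only extension, sent as a top-level field; the Router
    # strips it before forwarding the request to the provider.
    extra_body={"verify_tee": True},
)
```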
Streaming
Set "stream": true to receive Server-Sent Events in the OpenAI SSE format. Any OpenAI client library that handles streaming will work unchanged.
curl https://router-api.0g.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YOUR_API_KEY" \
  -d '{
    "model": "zai-org/GLM-5-FP8",
    "messages": [{"role": "user", "content": "Write a haiku about decentralization"}],
    "stream": true
  }'
Some models (e.g. GLM-5) emit a reasoning_content field in streaming deltas before the final content. Client libraries that know about reasoning tokens will surface both separately.
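A sketch of a streaming loop that surfaces reasoning and answer tokens separately, reusing the client from above. delta.reasoning_content is read with getattr because not every SDK version models it as a typed field:

```python
stream = client.chat.completions.create(
    model="zai-org/GLM-5-FP8",
    messages=[{"role": "user", "content": "Write a haiku about decentralization"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Thinking models emit reasoning tokens before the final answer.
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
print()
```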
Tool Calling
Models that advertise tool-calling capability accept the standard OpenAI tools / tool_choice fields.
{
  "model": "zai-org/GLM-5-FP8",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}
The response shape (tool_calls in the assistant message, tool role for results) matches OpenAI exactly.
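A sketch of the full round trip under those shapes, reusing the client from above: the model returns tool_calls, the client runs the function locally (stubbed here, since get_weather is only a schema in this example), and the result goes back in a tool-role message:

```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

first = client.chat.completions.create(
    model="zai-org/GLM-5-FP8", messages=messages, tools=tools
)
assistant = first.choices[0].message

if assistant.tool_calls:  # the model may also answer directly
    call = assistant.tool_calls[0]
    args = json.loads(call.function.arguments)
    # Append the assistant turn, then the tool result keyed by call id.
    messages.append(assistant)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": json.dumps({"city": args["city"], "temp_c": 21}),  # stubbed result
    })
    final = client.chat.completions.create(
        model="zai-org/GLM-5-FP8", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)
```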
JSON Mode
For models that support structured output:
{
  "model": "zai-org/GLM-5-FP8",
  "messages": [
    {"role": "system", "content": "Respond with JSON only."},
    {"role": "user", "content": "List three colors and their hex codes."}
  ],
  "response_format": {"type": "json_object"}
}
As with tool calling, check the model's capability flags before relying on JSON mode.
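A sketch of the same request through the client from above; the message content comes back as a string that parses with json.loads:

```python
import json

response = client.chat.completions.create(
    model="zai-org/GLM-5-FP8",
    messages=[
        {"role": "system", "content": "Respond with JSON only."},
        {"role": "user", "content": "List three colors and their hex codes."},
    ],
    response_format={"type": "json_object"},
)
colors = json.loads(response.choices[0].message.content)
print(colors)
```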
Response Shape
Responses are OpenAI-compatible (choices[], usage, model, id, object, created). The Router adds two fields of its own on top:
x_0g_trace (always present)
Every Router response carries an x_0g_trace object with metadata about the request's execution:
"x_0g_trace": {
"request_id": "0852f405-6c56-40c2-a800-e6fd70785065",
"provider": "0xd9966e13a6026Fcca4b13E7ff95c94DE268C471C",
"billing": {
"input_cost": "19000000000000",
"output_cost": "1916800000000000",
"total_cost": "1935800000000000"
}
}
| Field | Description |
|---|---|
| request_id | Unique ID for this request. Quote it in any support ticket or bug report. |
| provider | On-chain address of the provider that served the request |
| billing.input_cost / output_cost / total_cost | Exact cost in neuron for this specific request |
| tee_verified | Present only when verify_tee: true was set; see Verifiable Execution |
This means you don't need to compute costs yourself: the Router tells you exactly what was charged. That is handy for per-request logging and client-side budget tracking.
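A sketch of per-request cost logging using a plain HTTP call (used here because typed SDKs may not expose top-level fields they don't model; costs are printed as raw neuron strings since this page gives no conversion rate):

```python
import requests

resp = requests.post(
    "https://router-api.0g.ai/v1/chat/completions",
    headers={"Authorization": "Bearer sk-YOUR_API_KEY"},
    json={
        "model": "zai-org/GLM-5-FP8",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
trace = resp.json()["x_0g_trace"]

# request_id is what support will ask for; billing values are in neuron.
print("request_id:", trace["request_id"])
print("provider:  ", trace["provider"])
print("total_cost:", trace["billing"]["total_cost"], "neuron")
```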
reasoning_content (thinking models)
For models with thinking support (e.g. zai-org/GLM-5-FP8), the Router returns the reasoning trace alongside the final answer. It appears in two equivalent places:
"choices": [{
"message": {
"role": "assistant",
"content": "{ \"colors\": [ ... ] }",
"reasoning_content": "The user wants a JSON response...",
"provider_specific_fields": {
"reasoning_content": "The user wants a JSON response..."
}
}
}]
Both fields contain the same text; most OpenAI SDKs surface reasoning_content directly on the message. You can ignore it for production output, log it for debugging, or display it to the user as "thinking".
Thinking is on by default for GLM-5. If you don't want it (disabling it saves tokens and latency), pass chat_template_kwargs: {"enable_thinking": false} in the request body; GLM-5 advertises this parameter in its supported_parameters.
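A sketch of turning thinking off, passing the field through extra_body just like the Router extensions above:

```python
response = client.chat.completions.create(
    model="zai-org/GLM-5-FP8",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    # GLM-5 lists chat_template_kwargs in its supported_parameters;
    # disabling thinking skips reasoning tokens entirely.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)  # no reasoning_content expected
```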