Skip to main content

Chat Completions

POST /v1/chat/completions

Fully compatible with the OpenAI Chat Completions API. Supports streaming, tool calling, JSON mode, and reasoning-token models.

Request

{
"model": "zai-org/GLM-5-FP8",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"temperature": 0.7,
"max_tokens": 1024,
"stream": true
}

All standard OpenAI fields are accepted — temperature, top_p, n, stop, presence_penalty, frequency_penalty, logit_bias, user, response_format, and so on.

0G Router Extensions

Optional top-level fields. Stripped before the request is forwarded to the provider, so they never conflict with the OpenAI schema.

FieldTypeDescription
providerobjectControl provider routing — see Routing
verify_teebooleanAsk the Router to synchronously verify the provider's TEE signature — see Verifiable Execution

Streaming

Set "stream": true to receive Server-Sent Events in the OpenAI SSE format. Any OpenAI client library that handles streaming will work unchanged.

curl https://router-api.0g.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-YOUR_API_KEY" \
-d '{
"model": "zai-org/GLM-5-FP8",
"messages": [{"role": "user", "content": "Write a haiku about decentralization"}],
"stream": true
}'
Reasoning models

Some models (e.g. GLM-5) emit a reasoning_content field in streaming deltas before the final content. Client libraries that know about reasoning tokens will surface both separately.

Tool Calling

Models that advertise tool-calling capability accept the standard OpenAI tools / tool_choice fields.

{
"model": "zai-org/GLM-5-FP8",
"messages": [
{"role": "user", "content": "What's the weather in Tokyo?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"}
},
"required": ["city"]
}
}
}
]
}
Check model capabilities

Not every model supports tool calling. Before sending a request with tools, verify the model's capability flags in the catalog or on pc.0g.ai. Sending tools to a model that doesn't support it returns 400 Bad Request.

The response shape (tool_calls in the assistant message, tool role for results) matches OpenAI exactly.

JSON Mode

For models that support structured output:

{
"model": "zai-org/GLM-5-FP8",
"messages": [
{"role": "system", "content": "Respond with JSON only."},
{"role": "user", "content": "List three colors and their hex codes."}
],
"response_format": {"type": "json_object"}
}

As with tool calling, check capability flags before using.

Response shape

Responses are OpenAI-compatible (choices[], usage, model, id, object, created). The Router adds two Router-specific additions on top:

x_0g_trace (always present)

Every Router response carries an x_0g_trace object with metadata about the request's execution:

"x_0g_trace": {
"request_id": "0852f405-6c56-40c2-a800-e6fd70785065",
"provider": "0xd9966e13a6026Fcca4b13E7ff95c94DE268C471C",
"billing": {
"input_cost": "19000000000000",
"output_cost": "1916800000000000",
"total_cost": "1935800000000000"
}
}
FieldDescription
request_idUnique ID for this request. Quote it in any support ticket or bug report.
providerOn-chain address of the provider that served the request
billing.input_cost / output_cost / total_costExact cost in neuron for this specific request
tee_verifiedPresent only when verify_tee: true was set — see Verifiable Execution

This means you don't need to compute costs yourself — the Router tells you exactly what was charged. Handy for per-request logging and client-side budget tracking.

reasoning_content (thinking models)

For models with thinking support (e.g. zai-org/GLM-5-FP8), the Router returns the reasoning trace alongside the final answer. It appears in two equivalent places:

"choices": [{
"message": {
"role": "assistant",
"content": "{ \"colors\": [ ... ] }",
"reasoning_content": "The user wants a JSON response...",
"provider_specific_fields": {
"reasoning_content": "The user wants a JSON response..."
}
}
}]

Both fields contain the same text; most OpenAI SDKs surface reasoning_content directly on the message. You can ignore it for production output, log it for debugging, or display it to the user as "thinking".

Disable thinking for GLM-5

Thinking is on by default for GLM-5. If you don't want it (saves tokens and latency), pass chat_template_kwargs: {"enable_thinking": false} in the request body — GLM-5 advertises this in its supported_parameters.

  • Routing — choose your provider
  • Errors — response codes and error shapes