
0G Compute Inference

0G Compute Network provides decentralized AI inference services, supporting various AI models including Large Language Models (LLM), text-to-image generation, and speech-to-text processing.

Prerequisites

  • Node.js >= 22.0.0
  • A wallet with 0G tokens (either testnet or mainnet)
  • EVM-compatible wallet (for the Web UI)

Supported Service Types

  • Chatbot Services: Conversational AI with models like GPT, DeepSeek, and others
  • Text-to-Image: Generate images from text descriptions using Stable Diffusion and similar models
  • Speech-to-Text: Transcribe audio to text using Whisper and other speech recognition models

Available Services

Testnet Services (2 available)

| # | Model | Type | Provider | Input (per 1M tokens) | Output (per 1M tokens) |
|---|-------|------|----------|-----------------------|------------------------|
| 1 | qwen-2.5-7b-instruct | Chatbot | 0xa48f01... | 0.05 0G | 0.10 0G |
| 2 | qwen-image-edit-2511 | Image-Edit | 0x4b2a9... | - | 0.005 0G/image |

Available Models by Type:

Chatbots (1 model):

  • Qwen 2.5 7B Instruct: Fast and efficient conversational model

Image-Edit (1 model):

  • Qwen Image Edit 2511: Advanced image editing and manipulation model

All testnet services feature TeeML verifiability and are ideal for development and testing.

Mainnet Services (7 available)

| # | Model | Type | Verification | Provider | Input (per 1M tokens) | Output (per 1M tokens) |
|---|-------|------|--------------|----------|-----------------------|------------------------|
| 1 | GLM-5-FP8 | Chatbot | TeeML | 0xd9966e... | 1.00 0G | 3.20 0G |
| 2 | deepseek-chat-v3-0324 | Chatbot | TeeML | 0x1B3AAe... | 0.30 0G | 1.00 0G |
| 3 | gpt-oss-120b | Chatbot | TeeML | 0xBB3f5b... | 0.10 0G | 0.49 0G |
| 4 | qwen3-vl-30b-a3b-instruct | Chatbot | TeeML | 0x4415ef... | 0.49 0G | 0.49 0G |
| 5 | qwen3.6-plus | Chatbot | TeeTLS | 0x992e63... | 0.80 0G¹ | 4.80 0G¹ |
| 6 | whisper-large-v3 | Speech-to-Text | TeeML | 0x36aCff... | 0.05 0G | 0.11 0G |
| 7 | z-image | Text-to-Image | TeeML | 0xE29a72... | - | 0.003 0G/image |

¹ Tiered Pricing: qwen3.6-plus uses input-length-based tiered pricing. Input ≤ 256k tokens: 0.80 / 4.80 0G. Input > 256k tokens: 3.20 / 9.60 0G (×4 input, ×2 output).
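The tiered-pricing rule above can be sketched as a small cost calculator. This is an illustrative helper, not part of the SDK; the rates are taken from the table, and `quoteCost` is a hypothetical name.

```typescript
// Sketch of qwen3.6-plus tiered pricing. Rates are in 0G per 1M tokens;
// the 256k-token threshold applies to the input length only.
const TIER_THRESHOLD = 256_000;

function quoteCost(inputTokens: number, outputTokens: number): number {
  const overThreshold = inputTokens > TIER_THRESHOLD;
  const inputRate = overThreshold ? 3.2 : 0.8;  // ×4 input above 256k
  const outputRate = overThreshold ? 9.6 : 4.8; // ×2 output above 256k
  return (inputTokens / 1e6) * inputRate + (outputTokens / 1e6) * outputRate;
}

// A 100k-token prompt with a 2k-token reply stays in the base tier:
console.log(quoteCost(100_000, 2_000).toFixed(4)); // 0.0896
```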

Available Models by Type:

Chatbots (5 models):

  • GLM-5-FP8: High-performance reasoning model (FP8 quantized)
  • GPT-OSS-120B: Large-scale open-source GPT model
  • Qwen3 VL 30B A3B Instruct: Efficient conversational model (text-only; image input is not yet supported)
  • Qwen3.6-Plus (TeeTLS): Alibaba's flagship LLM with hybrid linear attention and sparse MoE routing, optimized for agentic coding, multi-step workflows, and complex reasoning. 1M token context window, 119 languages. Powered by Alibaba Cloud Model Studio.
  • DeepSeek Chat V3: Optimized conversational model

Speech-to-Text (1 model):

  • Whisper Large V3: OpenAI's state-of-the-art transcription model

Text-to-Image (1 model):

  • Z-Image: Fast high-quality image generation

Verification Modes

0G Compute supports two TEE verification modes depending on how the model is hosted:

TeeML: The AI model runs directly inside a Trusted Execution Environment (TEE). The TEE guarantees that both the model and the computation are protected, and responses are signed by the TEE's private key. Used by self-hosted models (e.g., GLM-5-FP8, DeepSeek, GPT-OSS-120B).

TeeTLS: The Broker runs inside a TEE and proxies requests to a centralized LLM provider (e.g., Alibaba Cloud Model Studio) over HTTPS. This provides cryptographic proof that responses genuinely came from the real provider, with the following guarantees:

  • Authentic routing: During the TLS handshake, the Broker verifies the provider's certificate against trusted Certificate Authorities, ensuring the connection reaches the real provider, not an imposter.
  • Cryptographic proof: The Broker captures the provider's TLS certificate fingerprint and bundles it together with the request hash, response hash, and provider identity into a signed routing proof using its TEE-protected private key.
  • Privacy preservation: Since the Broker runs inside a TEE, it cannot inspect or tamper with user data in transit; 0G acts as a verifiable relay, not a middleman. This is conceptually similar to zkTLS but with stronger privacy properties, as the TEE ensures the relay itself is trustworthy.
  • End-to-end integrity: The TEE attestation proves the Broker is running unmodified code, the CA/TLS system guarantees only the real provider holds a valid certificate for their domain, and the TEE signature binds everything together; a verifier can confirm the proof came from a genuine TEE and that the fingerprint belongs to the expected provider.

Used by centralized API-backed models (e.g., qwen3.6-plus via Alibaba Cloud).
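To make the routing-proof idea concrete, here is a minimal sketch of signing and verifying such a proof. The field names, payload encoding, and curve choice are assumptions for illustration only; the actual 0G wire format is not shown here, and in production the public key would come from the TEE attestation rather than being generated locally.

```typescript
import { createHash, createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Hypothetical shape of a TeeTLS routing proof (illustrative, not the real format).
interface RoutingProof {
  requestHash: string;      // SHA-256 of the request body
  responseHash: string;     // SHA-256 of the response body
  certFingerprint: string;  // provider's TLS certificate fingerprint
  providerDomain: string;   // the upstream provider's domain
  signature: string;        // TEE signature over the fields above
}

const sha256 = (data: string) => createHash("sha256").update(data).digest("hex");

// Deterministic payload the signature covers; a real protocol would use a
// canonical serialization rather than simple joining.
const proofPayload = (p: Omit<RoutingProof, "signature">) =>
  [p.requestHash, p.responseHash, p.certFingerprint, p.providerDomain].join("|");

// Simulated TEE signing key. In reality, the attestation proves which public
// key belongs to the genuine enclave code.
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });

function signProof(fields: Omit<RoutingProof, "signature">): RoutingProof {
  const signature = createSign("sha256").update(proofPayload(fields)).sign(privateKey, "hex");
  return { ...fields, signature };
}

function verifyProof(p: RoutingProof, request: string, response: string): boolean {
  return (
    p.requestHash === sha256(request) &&   // proof covers this exact request
    p.responseHash === sha256(response) && // ...and this exact response
    createVerify("sha256").update(proofPayload(p)).verify(publicKey, p.signature, "hex")
  );
}
```

A verifier recomputes the request and response hashes, checks them against the proof, and then checks the TEE signature, so any tampering with either payload or the certificate fingerprint invalidates the proof.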

Choose Your Interface

| Feature | Web UI | CLI | SDK |
|---------|--------|-----|-----|
| Setup time | ~1 min | ~2 min | ~5 min |
| Interactive chat | ✅ | ❌ | ❌ |
| Automation | ❌ | ✅ | ✅ |
| App integration | ❌ | ❌ | ✅ |
| Direct API access | ❌ | ❌ | ✅ |

The Web UI is best for quick testing, experimentation, and direct frontend integration.

Option 1: Use the Hosted Web UI

Visit the official 0G Compute Marketplace directly; no installation required:

https://compute-marketplace.0g.ai/inference

Option 2: Run Locally

Installation

pnpm add @0glabs/0g-serving-broker -g

Launch Web UI

0g-compute-cli ui start-web

Open http://localhost:3090 in your browser.

Getting Started

1. Connect & Fund

  1. Connect your wallet (MetaMask recommended)
  2. Deposit some 0G tokens using the account dashboard
  3. Browse available AI models and their pricing

2. Start Using AI Services

Option A: Chat Interface

  • Click "Chat" on any chatbot provider
  • Start conversations immediately
  • Perfect for testing and experimentation

Option B: Get API Integration

  • Click "Build" on any provider
  • Get step-by-step integration guides
  • Copy-paste ready code examples

Understanding Delayed Fee Settlement

How Fee Settlement Works

0G Compute Network uses delayed (batch) settlement for provider fees. This means:

  • Fees are not deducted immediately after each inference request. Instead, the provider accumulates usage fees and settles them on-chain in batches.
  • Your sub-account balance may appear to drop suddenly when a batch settlement occurs. For example, if you make 10 requests and the provider settles all at once, you'll see a single larger deduction rather than 10 small ones.
  • You are only charged for actual usage; no extra fees are deducted. The total amount settled always matches the sum of your individual request costs.
  • This is by design, to reduce on-chain transaction costs and improve efficiency for both users and providers.

What this means in practice:

  • After making requests, your provider sub-account balance may temporarily appear higher than your "true" available balance
  • When settlement occurs, the balance updates to reflect all accumulated fees at once
  • If you see a sudden balance decrease, check your usage history; the total will match your actual usage

This behavior is visible in the Web UI (provider sub-account balances), CLI (get-account), and SDK (getAccount()).
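Because settlement is batched, a sanity check is simply "does the on-chain deduction equal the sum of my logged request costs?" The sketch below assumes a hypothetical local usage log; amounts use bigint smallest-unit values (wei-style) to avoid floating-point drift.

```typescript
// Illustrative local usage record; the SDK does not define this type.
interface UsageRecord {
  requestId: string;
  cost: bigint; // fee quoted for this single request, in smallest units
}

// Delayed settlement deducts the accumulated total in one transaction, so
// the observed deduction should equal the sum of the individual costs.
function reconcileSettlement(log: UsageRecord[], settledAmount: bigint): boolean {
  const expected = log.reduce((sum, r) => sum + r.cost, 0n);
  return expected === settledAmount;
}
```

If this check ever fails, compare your log against the provider sub-account history (Web UI, `get-account`, or `getAccount()`) before assuming an overcharge.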

Rate Limits

Per-User Rate Limits

Each provider enforces per-user rate limits to ensure fair resource sharing across all users. The default limits are:

  • 30 requests per minute per user (sustained)
  • Burst allowance of 5 requests (short spikes allowed)
  • 5 concurrent requests per user

If you exceed these limits, the provider will return HTTP 429 Too Many Requests. Wait briefly and retry. These limits are set by individual providers and may vary.
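A simple way to handle 429 responses in client code is exponential backoff. This is a generic sketch, not an SDK feature: `doRequest` stands in for any function that returns a fetch-style response with a `status` field.

```typescript
// Retry on HTTP 429 Too Many Requests with exponential backoff.
// maxAttempts and baseDelayMs are illustrative defaults.
async function withRetry<T extends { status: number }>(
  doRequest: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    // Return on success, or give up once attempts are exhausted.
    if (res.status !== 429 || attempt + 1 >= maxAttempts) return res;
    // Back off: 1s, 2s, 4s, ... before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}
```

For batch workloads, combine this with a cap of at most 5 in-flight requests so the concurrency limit is never hit in the first place.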

Troubleshooting

Common Issues

Error: Too many requests (429)

You are sending requests too quickly. Each provider enforces per-user rate limits (default: 30 requests/min, 5 concurrent).

  • Wait a few seconds and retry
  • Reduce request frequency: for batch workloads, add a delay between requests
  • Check concurrent requests: ensure you are not sending more than 5 simultaneous requests

Error: Insufficient balance

Your provider sub-account doesn't have enough funds. Each provider requires a minimum locked balance of 1 0G to serve requests.

CLI:

Deposit to Main Account

0g-compute-cli deposit --amount 10
0g-compute-cli transfer-fund --provider <PROVIDER_ADDRESS> --amount 1

SDK:

// Deposit to main account
await broker.ledger.depositFund(10);
// Transfer to provider sub-account (minimum 1 0G recommended)
await broker.ledger.transferFund(providerAddress, 'inference', BigInt(1) * BigInt(10 ** 18));

Note: In Node.js, the SDK provides background auto-funding that periodically checks sub-account balances and tops up when insufficient. In browser environments, you must transfer funds manually.
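In a browser, the manual top-up can be wrapped in a small check-then-fund helper. This is a sketch under stated assumptions: `getBalance` and `topUp` are injected stand-ins for the broker's ledger calls (e.g. `getAccount` / `transferFund`), and the 1 0G minimum matches the locked-balance requirement above.

```typescript
// 1 0G minimum locked balance, in wei-style smallest units (18 decimals).
const MIN_BALANCE = 10n ** 18n;

// Ensure a provider sub-account holds at least the minimum balance.
// Returns the balance after any top-up.
async function ensureFunded(
  getBalance: () => Promise<bigint>,
  topUp: (amount: bigint) => Promise<void>,
): Promise<bigint> {
  const balance = await getBalance();
  if (balance < MIN_BALANCE) {
    // Top up exactly the shortfall so the sub-account reaches the minimum.
    await topUp(MIN_BALANCE - balance);
    return MIN_BALANCE;
  }
  return balance;
}
```

Calling this before each batch of requests approximates what the Node.js background auto-funding does for you.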

Error: Provider not acknowledged

You need to acknowledge the provider before using their service. The easiest way is to transfer funds, which auto-acknowledges:

CLI:

0g-compute-cli transfer-fund --provider <PROVIDER_ADDRESS> --amount 1

SDK:

// transferFund auto-acknowledges the provider's TEE signer
await broker.ledger.transferFund(providerAddress, 'inference', BigInt(1) * BigInt(10 ** 18));

Error: No funds in provider sub-account

Transfer funds to the specific provider sub-account:

0g-compute-cli transfer-fund --provider <PROVIDER_ADDRESS> --amount 1

Check your account balance:

0g-compute-cli get-account

Web UI not starting

If the web UI fails to start:

  1. Check whether another service is using port 3090; if so, start on a different port:
0g-compute-cli ui start-web --port 3091
  2. Ensure the package was installed globally:
pnpm add @0glabs/0g-serving-broker -g

Next Steps


Questions? Join our Discord for support.