Qwen3-235B: The Model Mined by Gonka
What is Qwen3-235B
Qwen3-235B-A22B-Instruct-2507-FP8 is a large language model (LLM) from the Qwen3 family, developed by the Qwen team at Alibaba Cloud. The full name breaks down as follows: Qwen3 is the third generation of the series; 235B means 235 billion parameters in total; A22B means 22 billion active parameters per request; Instruct marks a version trained to follow instructions; 2507 is the July 2025 release; FP8 denotes 8-bit quantization for memory optimization.
The key architectural feature is MoE (Mixture of Experts). Unlike 'dense' models (GPT-5.4, Claude Sonnet 4.5) where every token passes through all parameters, an MoE model activates only a subset of 'experts' — specialized neural network blocks — for each request. In the case of Qwen3-235B, out of 235 billion parameters, only 22 billion are activated per token — less than 10%. This delivers the quality level of models with 200B+ parameters at the computational cost of a 22B model.
Practically, this means the model is smarter than one might expect from its speed. It processes requests significantly faster than dense models of comparable quality, while requiring dramatically less VRAM for inference. This is why MoE became the dominant architecture for the largest models of 2025-2026.
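The routing idea behind MoE can be sketched in a few lines of Python. This is a toy illustration, not Qwen3's actual router: a gating layer scores every expert, and only the top-k highest-scoring experts run for a given token, so most parameters stay idle on each forward pass.

```python
def moe_forward(token, experts, gate_scores, k=2):
    """Toy MoE layer: run only the top-k scored experts for this token.

    experts:     list of callables, one per expert block
    gate_scores: one score per expert (in a real model these come from
                 a learned gating layer, not from us)
    """
    # Select the k highest-scoring experts.
    top_k = sorted(range(len(experts)),
                   key=lambda i: gate_scores[i], reverse=True)[:k]
    # Combine only those experts' outputs, weighted by their gate scores.
    total_weight = sum(gate_scores[i] for i in top_k)
    return sum(gate_scores[i] * experts[i](token) for i in top_k) / total_weight

# Eight toy "experts": each just scales its input differently.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
scores = [0.05, 0.30, 0.10, 0.02, 0.25, 0.08, 0.15, 0.05]
out = moe_forward(1.0, experts, scores, k=2)  # only experts 1 and 4 fire
```

With k=2 of 8 experts, only a quarter of the expert parameters are touched per token; Qwen3-235B's ratio (22B of 235B) works the same way at a much larger scale.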
The context window of Qwen3-235B is 131,072 tokens (~100,000 words) — enough to analyze entire books, codebases, or long legal documents in a single query. The model supports 119 languages, including Russian, English, Chinese, Arabic, Hindi, and dozens of others — making it one of the most multilingual models on the market.
Characteristics and Benchmarks
Qwen3-235B competes with the largest closed and open models. Here's a comparison of key characteristics:
| Model | Parameters | Context | MoE | Open Source | Price (per 1M tokens) |
|---|---|---|---|---|---|
| Qwen3-235B (via JoinGonka) | 235B (22B active) | 131K | Yes | Yes (Apache 2.0) | $0.001 |
| GPT-5.4 (OpenAI) | ~1.8T (estimate) | 128K | Yes (presumed) | No | $2.50 |
| Claude Sonnet 4.5 (Anthropic) | Undisclosed | 200K | No (presumed) | No | $3.00 |
| Llama 4 Maverick (Meta) | 400B (17B active) | 1M | Yes | Yes (Llama License) | $0.20+ (hosting) |
| DeepSeek-R1 (DeepSeek) | 671B (37B active) | 128K | Yes | Yes (MIT) | $0.55 |
Qwen3-235B demonstrates a level of quality comparable to GPT-5.4 and Claude Sonnet 4.5 on most benchmarks, while its cost via JoinGonka Gateway is 2,500 times lower than that of GPT-5.4. This is possible due to two factors: the MoE architecture reduces computational costs, and the decentralized Gonka network eliminates data center margins.
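The price multipliers quoted here are simple arithmetic over the per-million-token rates from the comparison table:

```python
# Per-1M-token prices from the comparison table above (USD).
price_gonka = 0.001   # Qwen3-235B via JoinGonka
price_gpt = 2.50      # GPT-5.4
price_claude = 3.00   # Claude Sonnet 4.5

print(round(price_gpt / price_gonka))     # → 2500 (times cheaper than GPT-5.4)
print(round(price_claude / price_gonka))  # → 3000 (times cheaper than Claude Sonnet 4.5)
```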
On MMLU-Pro, HumanEval, MATH-500, and GSM8K benchmarks, the model ranks among the top three open-source models, trailing only DeepSeek-R1 in mathematical reasoning tasks. In code generation, translation, and instruction-following tasks, Qwen3-235B consistently outperforms Llama 4 Maverick and is comparable to Claude Sonnet 4.5.
How Gonka Uses Qwen3-235B
The Qwen3-235B model operates on the Gonka network in a distributed manner, via the DiLoCo protocol adapted for inference. The full model in FP8 format requires approximately 640 GB of video memory (VRAM), which cannot fit on a single GPU: even an H100 (80 GB) or H200 (141 GB) falls far short. The model is therefore split layer-wise (tensor parallelism + pipeline parallelism) across several MLNodes.
In practice, Qwen3-235B runs on a cluster of 8-16 GPU nodes, each with at least 40 GB of VRAM. Transfer Agents route a request to the right cluster, vLLM on each node processes its shard of the model, and the results are aggregated and returned to the user. The whole round trip takes hundreds of milliseconds; the user never notices that the request was processed by a dozen GPUs at different points on the planet.
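A back-of-the-envelope check of these cluster sizes (a sketch only: real deployments also need VRAM headroom for the KV cache and activations, so practical node counts run higher):

```python
import math

def min_nodes(model_vram_gb: float, vram_per_node_gb: float) -> int:
    """Smallest number of nodes whose combined VRAM can hold the model weights."""
    return math.ceil(model_vram_gb / vram_per_node_gb)

# ~640 GB of FP8 weights:
print(min_nodes(640, 40))  # 16 nodes at the 40 GB minimum
print(min_nodes(640, 80))  # 8 nodes of H100-class 80 GB GPUs
```

This matches the 8-16 node range above: the floor depends on how much VRAM each MLNode contributes.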
An important technical detail: Gonka uses vLLM as the engine for serving. vLLM is an open-source project that provides high-performance text generation through PagedAttention — an algorithm that optimizes VRAM usage when processing multiple requests in parallel. This allows the network to serve thousands of concurrent users without quality degradation.
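The idea behind PagedAttention can be illustrated with a toy block allocator. This is a simplification of what vLLM actually does: instead of reserving one large contiguous VRAM region per request, the KV cache is split into fixed-size blocks that are handed out on demand and returned when a sequence finishes, so memory is never wasted on unused context.

```python
class PagedKVCache:
    """Toy paged KV cache: fixed-size blocks allocated on demand per sequence."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of unused block ids
        self.tables = {}   # seq_id -> list of block ids (the "page table")
        self.lengths = {}  # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> None:
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # current block full (or first token)
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def free_seq(self, seq_id: int) -> None:
        # Finished sequences return their blocks to the shared pool.
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=64, block_size=16)
for _ in range(40):            # a 40-token sequence needs ceil(40/16) = 3 blocks
    cache.append_token(seq_id=1)
```

Because blocks are pooled rather than pre-reserved per request, many concurrent sequences of very different lengths can share the same VRAM, which is what lets a node serve thousands of users in parallel.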
The model supports native tool calling — calling functions and tools directly from the model's response. This capability was added in Gonka via PR #767 with a threshold of 0.958 for detecting tool calls. This means developers can build AI agents that interact with external APIs, databases, and tools — all through a single request to Qwen3-235B.
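A tool-calling request has the same shape as a plain chat completion, plus a `tools` array in the standard OpenAI function-calling format. The sketch below builds such a payload for the Gateway endpoint; the `get_weather` function and its schema are invented here purely for illustration.

```python
import json

# Hypothetical tool definition (OpenAI function-calling schema).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = {
    "model": "qwen3-235b-a22b",
    "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
    "tools": [get_weather_tool],
}

# POST this JSON to https://gate.joingonka.ai/api/v1/chat/completions with an
# Authorization: Bearer header; instead of plain text, the model may answer
# with a tool_calls entry naming the function and its arguments.
body = json.dumps(payload)
```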
The current Gonka network has over 4,000 GPUs (H100, H200, A100, RTX 4090 and others), combined into 120+ MLNode. This is one of the largest distributed GPU networks for AI inference in the world — and all this power is directed at serving Qwen3-235B.
How to Try Qwen3-235B
The easiest way to try Qwen3-235B is through the JoinGonka API Gateway. The Gateway provides an OpenAI-compatible API, which means any code written for OpenAI works with Qwen3-235B without changes — just replace the URL and API key.
Example request:

```bash
curl https://gate.joingonka.ai/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-235b-a22b",
    "messages": [{"role": "user", "content": "Explain MoE architecture"}]
  }'
```

Cost: $0.001 per 1 million tokens — 2,500 times cheaper than GPT-5.4 ($2.50/1M) and 3,000 times cheaper than Claude Sonnet 4.5 ($3.00/1M). Upon registration, you receive 10 million free tokens for testing.
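The same call can be made from Python. The sketch below uses only the standard library and the endpoint and model name from the curl example; the OpenAI SDK works equally well when pointed at the Gateway base URL.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # issued in the JoinGonka Dashboard

def chat_request(prompt: str) -> urllib.request.Request:
    """Build the same chat-completion call as the curl example above."""
    body = json.dumps({
        "model": "qwen3-235b-a22b",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://gate.joingonka.ai/api/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = chat_request("Explain MoE architecture")
# To actually send it:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```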
The Gateway is compatible with popular development tools: Quick Start describes connection via Python, Node.js, and curl. IDE integrations are also supported — Cursor, Continue, Cline, Aider, and Claude Code — and frameworks for AI agents: LangChain, n8n, LibreChat, Open WebUI.
For a quick start:
- Register on gate.joingonka.ai (connect a wallet or create a new one)
- Get an API key in the Dashboard
- Replace `api.openai.com` with `gate.joingonka.ai/api` in your code
- Use the model `qwen3-235b-a22b`
Qwen3-235B through JoinGonka is enterprise-level AI at the price of a hobby project.