Kimi K2.6: The Second Model in the Gonka Network

For a long time, the Gonka network operated on a single model: Qwen3-235B from Alibaba Cloud. In May 2026, this changed: support for multiple models was launched through the DevShards mechanism, and the first to arrive was Kimi K2.6 from the Chinese company Moonshot AI. This article covers what the model is, how it differs from Qwen3-235B, how Gonka implemented multi-model support, and how to try the new model through our API Gateway.

What is Kimi K2.6 from Moonshot AI

Kimi K2.6 is a Large Language Model (LLM) from the Kimi series, developed by the Beijing-based company Moonshot AI. Moonshot AI is one of China's leading AI laboratories, founded in 2023 by a team of researchers led by Yang Zhilin. The company has attracted funding from Alibaba, Tencent, and other major investors, and has been listed among the 'Chinese AI tigers' — companies that set the pace for AI development in Asia.

The Kimi series has been known since 2024. Early versions (K1, K1.5) immediately attracted attention with their exceptionally long context window: up to 200,000 tokens in a single request, which at the time of release was a record for publicly available models. In practice, a context that long lets you analyze an entire book, a medium-sized codebase, or a collection of legal documents in one request.

Version K2 appeared in 2025 and brought a fundamental architectural change: the transition to MoE (Mixture of Experts). The same architecture underpins Qwen3-235B and DeepSeek-R1, and it has become the de facto standard for the largest models of 2025–2026. An MoE model holds hundreds of billions of parameters in total but activates only a subset (usually 5–10%) for each token, which sharply reduces the computational cost of inference while maintaining comparable quality.
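To put a number on the "active subset", here is a small arithmetic sketch. It uses the Qwen3-235B-A22B figures from the comparison table (235B total, 22B active); the helper name active_fraction is only illustrative.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of an MoE model's parameters that is activated per token (0..1)."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_params_b / total_params_b


# Qwen3-235B-A22B: 235B parameters in total, 22B active per token.
qwen_fraction = active_fraction(235, 22)
print(f"{qwen_fraction:.1%}")  # 9.4%
```

The result sits at the upper end of the 5–10% range mentioned above.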

K2.6 is the latest iteration of the K2 series at the time of writing. Public statements from Moonshot AI indicate that this version has improved the model's capabilities in reasoning, code generation, and native tool calling. In the Gonka network, the model is identified as moonshotai/Kimi-K2.6 — this is the name you need to pass in the model field of your API request.

Comparison of Kimi K2.6 and Qwen3-235B

Both models represent flagship developments from major Chinese AI laboratories and are both available through a unified OpenAI-compatible interface, JoinGonka Gateway. However, they have different strengths and different legacies, making the choice between them not a matter of 'which is better', but 'which is suitable for the task'.

| Characteristic | Kimi K2.6 | Qwen3-235B-A22B |
|---|---|---|
| Manufacturer | Moonshot AI (Beijing) | Alibaba Cloud (Hangzhou) |
| Year company founded | 2023 | 2009 (Alibaba Cloud) |
| Architecture | MoE | MoE (235B total, 22B active) |
| Context window | Long context (Kimi series' hallmark) | 131,072 tokens (~100,000 words) |
| Strong suit | Reasoning, long context, code generation | Universal, multilingual (119 languages), stable tool calling |
| Price via JoinGonka | $0.001 per 1M tokens | $0.001 per 1M tokens |
| API identifier | moonshotai/Kimi-K2.6 | Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 |
| Tool calling | In refinement (auto-choice) | Native, stable (PR #767) |
| Status in Gonka network | Launched via DevShards (May 2026) | Stable since August 2025 |

On reasoning benchmarks (MATH-500, GSM8K, AIME), the Kimi K2 series historically shows results in the top tier of open-weights models, competing with DeepSeek-R1 and o1-style models. On code generation tasks (HumanEval, MBPP), both models perform at similar levels. In multilingualism and translation, Qwen3-235B has an advantage due to training on 119 languages, while Kimi is more optimized for Chinese and English.

An important caveat about benchmarks in 2026: the gap between top models in public tests has narrowed to single-digit percentages, and this difference often falls within the statistical error of the benchmarks themselves. For practical work, what matters is not 'who is 2% higher in MMLU', but the nature of the tasks: what context you pass to the model, how complex the logical chains are, whether a long dialogue history is needed, and what languages are used. Therefore, the table above does not rank the models — it helps quickly understand which task profile each of them is optimized for.
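The "within statistical error" point can be made concrete with a rough sketch that uses the normal approximation for a binomial proportion. The 90% score below is a made-up illustration, not a measured result for either model. The 500-question size is the one implied by the MATH-500 name.

```python
import math


def margin_of_error(accuracy: float, n_questions: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a benchmark accuracy.

    Uses the normal approximation to the binomial: z * sqrt(p * (1 - p) / n).
    """
    if not 0.0 <= accuracy <= 1.0 or n_questions <= 0:
        raise ValueError("accuracy must be in [0, 1] and n_questions positive")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


# Hypothetical score: 90% on a 500-question benchmark.
margin = margin_of_error(0.90, 500)
print(f"±{margin:.1%}")  # ±2.6%
```

Under these assumptions, a 2-point gap between two models on such a benchmark is smaller than the uncertainty of either score.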

For practical selection: if the task requires a long context (analysis of large documents, reading extensive codebases, long dialogues with history retention) or complex reasoning tasks — it's worth starting with Kimi K2.6. For universal tasks, translations, multilingual work, and stable tool calling in production — Qwen3-235B currently appears to be a more proven option, as it has been operating longer in the Gonka network. A good strategy in production is to have both models in your code: quickly switching via the model parameter allows you to alternate between them depending on the task without changing the application architecture.
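One way to keep both models in the same codebase is a small selection helper. This is a sketch of the rule of thumb above, not an official recommendation engine: the function pick_model and its flags are hypothetical, and only the two model identifiers come from the Gateway.

```python
KIMI = "moonshotai/Kimi-K2.6"
QWEN = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"


def pick_model(
    *,
    needs_tools: bool = False,
    multilingual: bool = False,
    long_context: bool = False,
    heavy_reasoning: bool = False,
) -> str:
    """Return the model identifier to pass in the `model` field of a request."""
    # Tool calling and multilingual work: Qwen is currently the more proven option.
    if needs_tools or multilingual:
        return QWEN
    # Large documents, long dialogues, or complex reasoning: start with Kimi.
    if long_context or heavy_reasoning:
        return KIMI
    # General-purpose default (an assumption of this sketch).
    return QWEN
```

Because the result is only a string for the model parameter, the rest of the application does not change when the rule does.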

DevShards: How Gonka Launched the Second Model

Until spring 2026, the entire Gonka network served exactly one model: Qwen3-235B. From an architectural standpoint, this was a sound decision: distributed inference via DiLoCo requires all network participants to hold the same model in VRAM, otherwise it's impossible to guarantee that any node can process any request. That is already a heavy commitment for every MLNode: a full Qwen3-235B deployment in FP8 occupies about 640 GB of VRAM. (The weights alone come to roughly 235 GB at one byte per parameter, so the figure presumably also covers KV cache and serving headroom.)

To transition to a multi-model network, a mechanism was needed that would allow several models to be held simultaneously, but would not require every host to run all of them. This mechanism became DevShards — separate shards of the network, each specializing in one model. Nodes within a single shard work on the same model, and the network router directs requests to the shard with the required model.
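The routing idea can be sketched in a few lines. This is a conceptual illustration, not Gonka's actual router: the Shard and Router classes, the host names, and the round-robin choice of host are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """A group of hosts that all serve the same model."""
    model: str
    hosts: list[str]
    _next: int = 0

    def __post_init__(self) -> None:
        if not self.hosts:
            raise ValueError(f"shard for {self.model!r} has no hosts")

    def next_host(self) -> str:
        # Round-robin over the shard's hosts (a simplification).
        host = self.hosts[self._next % len(self.hosts)]
        self._next += 1
        return host


class Router:
    """Directs each request to the shard that serves the requested model."""

    def __init__(self, shards: list[Shard]) -> None:
        self._by_model = {shard.model: shard for shard in shards}

    def route(self, model: str) -> str:
        shard = self._by_model.get(model)
        if shard is None:
            raise LookupError(f"no shard serves model {model!r}")
        return shard.next_host()


router = Router([
    Shard("Qwen/Qwen3-235B-A22B-Instruct-2507-FP8", ["qwen-host-1", "qwen-host-2"]),
    Shard("moonshotai/Kimi-K2.6", ["kimi-host-1"]),  # hypothetical host names
])
```

Adding a third model in this picture means registering one more Shard. Hosts in the existing shards are not affected.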

The idea did not come out of nowhere — it was formalized in Gonka Improvement Proposal #800 "Multi-Model PoC", put to community vote in spring 2026. The proposal received support from network participants and validators and was implemented in April–May 2026. Kimi K2.6 became the first model launched on a separate DevShard — effectively a test implementation of the new approach. If the experience proves successful, nothing prevents the launch of a third, fourth, and so on — each on its own shard, with its own set of hosts, its own economics, and its own roadmap.

What this means for users and developers:

  • One API — multiple models. Through the JoinGonka Gateway, there's no need to change endpoints or keys: simply specify a different model in the request body. The OpenAI-compatible format is fully preserved.
  • Same price. Currently, Kimi K2.6 in the network is priced at the same rate as Qwen3-235B — $0.001 per 1M tokens through the Gateway. In the future, prices may vary by model, but unified pricing at launch is a conscious decision to simplify user migration.
  • Stability depends on shard load. In its early stage, the Kimi shard has fewer hosts than the main Qwen shard, so during traffic spikes the model may temporarily return 429 too many concurrent requests. This is expected for a newly launched model: as interest grows, more hosts will join the Kimi shard and the limits will rise.
  • Tool calling — in refinement. At the time of writing, Kimi K2.6 in the Gonka network has minor issues with automatic tool selection (tool_choice: "auto"). The Gonka team is working to bring the behavior to OpenAI standards; for production-critical scenarios with tool calling, it is recommended to use Qwen3-235B for now.

How to Try Kimi K2.6 via Gonka

The most direct path is through the JoinGonka API Gateway. The Gateway provides an OpenAI-compatible API, which means that the same code that works with GPT, Claude, or Qwen will work with Kimi after changing the model field value in the request body.

Minimal example via curl:

curl https://gate.joingonka.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.6",
    "messages": [
      {"role": "user", "content": "Explain the difference between MoE and dense models"}
    ]
  }'

The same request with Python via the openai library:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gate.joingonka.ai/v1",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[{"role": "user", "content": "Hello, Kimi"}],
)
print(response.choices[0].message.content)

Streaming (Server-Sent Events) — for interactive interfaces and chats where you want to show the response as it's generated:

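# Reuses the `client` created in the previous example.
stream = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[{"role": "user", "content": "Write an essay about MoE"}],
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)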

The cost of Kimi K2.6 is the same $0.001 per 1 million tokens as Qwen3-235B. This is ~2,500 times cheaper than GPT-5.4 and ~3,000 times cheaper than Claude Sonnet 4.5. Upon registration with JoinGonka Gateway, you receive 10 million free tokens to test any network models — this is enough for several hours of intensive work or tens of thousands of regular requests.
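For budgeting, the arithmetic behind these figures is simple enough to sketch. The price is the one quoted above. The 500 tokens per request is an assumed average for a "regular" request, and the function names are illustrative.

```python
PRICE_PER_MILLION_USD = 0.001  # Gateway price quoted above, for both models
FREE_TOKENS = 10_000_000       # allowance granted at registration


def cost_usd(tokens: int) -> float:
    """Cost of processing `tokens` tokens at the flat per-million price."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD


def requests_covered(token_budget: int, tokens_per_request: int) -> int:
    """How many requests of a given size fit into a token budget."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return token_budget // tokens_per_request


print(f"${cost_usd(FREE_TOKENS):.2f}")     # $0.01
print(requests_covered(FREE_TOKENS, 500))  # 20000
```

With a 500-token average, the free allowance covers 20,000 requests, in line with the "tens of thousands" estimate.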

Compatibility with development tools: everything that works with the OpenAI API also works with Kimi via the Gateway. At the model level, it's enough to change the model parameter:

  • Cursor: in Custom Model settings, specify moonshotai/Kimi-K2.6
  • Claude Code: environment variable ANTHROPIC_MODEL or flag --model
  • OpenClaw, Cline, Continue.dev: change the model name in the CustomChatModel config
  • LangChain, n8n: model parameter in client initialization
  • Open WebUI, LibreChat: the model appears in the dropdown list after adding Gonka as a custom provider

The list of available models is always up-to-date at the GET /v1/models endpoint of your Gateway instance — it's convenient to dynamically pull it into your application's UI so users see the full list and can choose the model themselves.
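A minimal sketch of pulling that list follows. It assumes the Gateway returns the usual OpenAI-style list shape ({"object": "list", "data": [{"id": ...}]}); if your instance responds differently, adjust the parsing. The function names are illustrative.

```python
import json
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract sorted model identifiers from an OpenAI-style /v1/models response."""
    return sorted(item["id"] for item in payload.get("data", []) if "id" in item)


def fetch_model_ids(base_url: str, api_key: str) -> list[str]:
    """Call GET {base_url}/models and return the available model identifiers."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return model_ids(json.load(response))


# Offline example with a response of the assumed shape:
sample = {
    "object": "list",
    "data": [
        {"id": "moonshotai/Kimi-K2.6", "object": "model"},
        {"id": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8", "object": "model"},
    ],
}
print(model_ids(sample))
```

In an application you would call fetch_model_ids("https://gate.joingonka.ai/v1", "YOUR_API_KEY") and feed the result into the model selector.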

The demo chat on the /try page works only with Qwen3-235B at the time of publication; a multi-model selector in the widget is on the roadmap. To try Kimi right now, use the Gateway API: the 10M free tokens are enough for several hours of experimentation. A 429 too many concurrent requests response means the Kimi shard is temporarily at capacity, which is expected while the shard is still small. Repeat the request after a few seconds or wait for a period of lower load.
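The "repeat after a few seconds" advice can be automated with exponential backoff. The sketch below is deliberately generic: RateLimited is a stand-in exception defined here, and call_with_retry is a hypothetical helper, not part of any Gateway SDK.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for whatever error your HTTP client raises on a 429 response."""


def call_with_retry(
    send,
    *,
    retry_on=RateLimited,
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep=time.sleep,
):
    """Call `send()` and retry with exponential backoff while it is rate limited."""
    for attempt in range(attempts):
        try:
            return send()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so that many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

With the openai library (v1 and later) you could pass retry_on=openai.RateLimitError and wrap the client.chat.completions.create(...) call in a lambda. The delays here (1 s, 2 s, 4 s and so on, capped at 30 s) are arbitrary starting points.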

What's next for the Gonka network: the success of DevShards for Kimi opens the door to other models. DeepSeek-V3/R1, Llama 4, and specialized code models are being discussed within the community. Each new model means a new shard, new hosts, new opportunities for users, and a new revenue stream for GPU providers. Multi-model architecture is also strategically important: a network tied to a single model is fundamentally fragile (a new version release means a migration crisis), while a network capable of holding multiple models simultaneously evolves smoothly and continuously.

Kimi K2.6 is an MoE model from Moonshot AI with a long context and strong reasoning capabilities. In May 2026, it became the second model in the Gonka network after Qwen3-235B, launched through the DevShards mechanism (a separate shard per model). It is available via the JoinGonka Gateway through an OpenAI-compatible API at $0.001 per 1M tokens, the same price as Qwen. The API model identifier is moonshotai/Kimi-K2.6. In the early stage, temporary 429 errors may occur during traffic spikes, and tool calling is still being refined.
