MiniMax M2.7: Gonka Network's Third Model
In spring 2026, the Gonka network moved from serving a single model to serving several. Kimi K2.6 was added first, alongside the flagship Qwen3-235B, and in late May 2026 a third model followed: MiniMax M2.7 from the Chinese laboratory MiniMax. For the first time in its history, the network serves three independent large language models simultaneously.
Let's break down what MiniMax M2.7 is, who is behind its development, what its characteristics are specifically within the Gonka network, how it differs from the two already functioning models, and how to access it via our API Gateway using an OpenAI-compatible protocol.
What is MiniMax M2.7 and who is behind the model
MiniMax M2.7 is a large language model (LLM) from MiniMax, a company based in Shanghai. MiniMax was founded in 2021 by a team of researchers led by Yan Junjie (formerly of SenseTime) and quickly became one of China's leading AI laboratories. The company has attracted funding from Alibaba, Tencent, and HongShan — the same circle of strategic investors behind other "Chinese AI tigers," including Moonshot AI, the developer of Kimi K2.6.
Beyond pure language models, MiniMax is known for consumer products: chat assistants Talkie and Hailuo, as well as one of the most prominent video generators in the industry. But for the Gonka network, the M-series of text models — successors to earlier abab models — is particularly important.
The main architectural feature of the M-series is its focus on an efficient attention mechanism. While early large models used classic quadratic attention (computational cost grows proportionally to the square of context length), MiniMax was one of the first to release a hybrid linear attention mechanism. This allows for processing very long sequences without an explosive increase in computational cost — a historical hallmark of the lineup. Like Qwen3-235B and Kimi K2.6, the model is built on the MoE (Mixture of Experts) architecture: hundreds of billions of parameters "on paper," but only a small fraction of them are activated for each query, drastically reducing the cost of inference.
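To make the scaling difference concrete, here is a toy calculation. It ignores constant factors and model dimensions, and it ignores that a hybrid design mixes both kinds of layers, so it only illustrates growth rates and says nothing about MiniMax's actual implementation. All function names are illustrative.

```python
def quadratic_attention_cost(context_len: int) -> int:
    """Relative cost of classic attention: every token attends to every token."""
    return context_len * context_len


def linear_attention_cost(context_len: int) -> int:
    """Relative cost of an idealized linear attention: one unit of work per token."""
    return context_len


def growth_factor(cost_fn, short_len: int, long_len: int) -> float:
    """How many times the cost grows when the context goes from short_len to long_len."""
    return cost_fn(long_len) / cost_fn(short_len)


if __name__ == "__main__":
    short_ctx, long_ctx = 8_192, 131_072  # a 16x longer context
    print(growth_factor(quadratic_attention_cost, short_ctx, long_ctx))  # 256.0
    print(growth_factor(linear_attention_cost, short_ctx, long_ctx))     # 16.0
```

Under these simplifying assumptions, a 16x longer context costs 256x more with quadratic attention but only 16x more with linear attention.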
In the Gonka network, the model is identified as MiniMaxAI/MiniMax-M2.7 — this is the string you need to pass in the model field of your API request. Version M2.7 is the latest iteration of the M-series at the time of publication.
Characteristics of MiniMax M2.7 in the Gonka Network
It's important to distinguish between the characteristics of the model "out of the box" and the characteristics with which it is deployed in a specific network. When the model operates in the decentralized Gonka network, its operational parameters are set by the vLLM inference configuration on the GPU hosts, not solely by the model's architecture. Here are the actual values provided by our Gateway:
- Context window: 131,072 tokens (approximately 100,000 words). This is the subnet configuration in the Gonka network. The MiniMax architecture itself supports significantly longer contexts, but the practical ceiling at any given moment is determined by the inference settings on the hosts.
- Maximum output: 4,096 tokens per response. This figure was measured empirically: a request that forced a long generation stopped at the ceiling with finish_reason: length. For comparison, the ceiling is 8,192 tokens for Qwen3-235B and 3,072 tokens for Kimi K2.6. This is a vLLM subnet setting, not a limit of the model itself.
- Host VRAM requirement: approximately 320 GB VRAM per node. This is a typical requirement for a large MoE model in FP8 quantization — the same 320 GB are needed for Qwen3-235B and Kimi K2.6. In practice, this means several H100/H200 class GPUs combined into a single node.
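The output ceilings above can be encoded on the client side so that a request never asks for more than the deployment will return. The sketch below is a minimal illustration: the helper names are hypothetical, and the numbers are a snapshot of the figures in this article, which may change as host configurations change.

```python
# Output ceilings as reported in this article. They are deployment settings,
# not model limits, so treat this table as a snapshot rather than a spec.
MAX_OUTPUT_TOKENS = {
    "MiniMaxAI/MiniMax-M2.7": 4096,
    "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8": 8192,
    "moonshotai/Kimi-K2.6": 3072,
}


def clamp_max_tokens(model: str, requested: int) -> int:
    """Cap the requested output length at the model's known ceiling, if any."""
    ceiling = MAX_OUTPUT_TOKENS.get(model)
    return requested if ceiling is None else min(requested, ceiling)


def was_truncated(finish_reason) -> bool:
    """True when generation stopped because it hit the output limit."""
    return finish_reason == "length"
```

With the openai Python client, this could be used as `max_tokens=clamp_max_tokens(model, 6000)` in the request, followed by a check of `was_truncated(response.choices[0].finish_reason)` to decide whether to ask the model to continue.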
The cost of inference in the Gonka network does not depend on the model choice and is determined by network parameters: through the JoinGonka Gateway, MiniMax M2.7 is available at the same rate as Qwen and Kimi. The unified price follows from how the network itself prices computational work; it is not a price set by a particular model vendor.
MiniMax M2.7, Qwen3-235B, and Kimi K2.6 — Comparison of the Three Gonka Models
For the first time, a Gonka network user has a choice of three flagship models, and all three are available through a unified OpenAI-compatible interface, the JoinGonka Gateway. The comparison below helps to understand not "which is better," but for which task profile each is optimized.
| Characteristic | MiniMax M2.7 | Qwen3-235B | Kimi K2.6 |
|---|---|---|---|
| Developer | MiniMax (Shanghai) | Alibaba Cloud (Hangzhou) | Moonshot AI (Beijing) |
| Architecture | MoE + linear attention | MoE (235B/22B active) | MoE |
| Context in Gonka | 131,072 tokens | 131,072 tokens | 131,072 tokens |
| Max. output | 4,096 tokens | 8,192 tokens | 3,072 tokens |
| Historical strength | Long context, efficient attention | Multilingual (119 languages), tool calling | Reasoning, long context |
| API Identifier | MiniMaxAI/MiniMax-M2.7 | Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 | moonshotai/Kimi-K2.6 |
| Network Status | Launched via upgrade v0.2.13 (May 2026) | Stable since August 2025 | Launched via DevShards (May 2026) |
An important caveat regarding benchmarks in 2026: the gap between top open-weights models in public tests has narrowed to single-digit percentages, and this difference often falls within the statistical error of the benchmarks themselves. For practical work, what matters is not the absolute rank on the MMLU leaderboard but the nature of the task: context length, complexity of the reasoning chains, the required language, and whether tool calling is needed.
Practical guidance: for tasks involving very long documents and streaming large volumes of text, it makes sense to test MiniMax M2.7, since its efficient attention mechanism is historically optimized for such scenarios. For universal multilingual work and stable production tool calling, Qwen3-235B is a proven option. For reasoning tasks with complex logic, include Kimi K2.6 in the comparison. The best strategy in production is to keep all three models in the code and switch between them with a single model parameter, without changing the application's architecture.
How Gonka launched the third model: upgrade v0.2.13
Adding MiniMax M2.7 is not a 'file upload to the server', but the result of a network upgrade that passed through on-chain voting. Model support was included in protocol release v0.2.13, approved by proposal #54: it was accepted on May 21, 2026 (approx. 63% 'yes' votes) and activated at a specified block height. This is the same governance mechanism through which the network adopts any significant changes — from tariffs to new models.
Multi-model support is a fundamental step for a decentralized network. A network tied to a single model is fragile: every new model version turns into a migration crisis, and any failure of that one model takes down the entire service. A network that can host several models at once evolves smoothly: new models are added as additional 'lanes', old ones keep running, and GPU hosts choose what to serve. Technically, each model lives in its own network shard; the same mechanism (DevShards) was previously used to launch Kimi K2.6.
A nuance specific to the early stages: there can be a lag between the moment a model appears in the network list and the moment it opens to all clients. Initially, MiniMax M2.7 inference in broker mode was available only to privileged keys and returned an error for regular requests, which is a normal break-in phase. By the end of May 2026, public access opened and the model became available to all Gateway clients. More details on how the network is structured and why models are launched this way can be found in the article on Gonka network architecture.
For comparison, the same MiniMax M2.7 via OpenRouter costs $0.279/$1.20 per 1M tokens, versus $0.001 with JoinGonka.
How to use MiniMax M2.7 via JoinGonka Gateway
The most direct way is through the JoinGonka API Gateway. Since the Gateway provides an OpenAI-compatible API, the same code that works with GPT, Claude, Qwen, or Kimi will start working with MiniMax after changing the model field value.
Minimal example using curl:
```bash
curl https://gate.joingonka.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2.7",
    "messages": [
      {"role": "user", "content": "Briefly explain what linear attention is"}
    ]
  }'
```

The same request in Python using the openai library:
```python
from openai import OpenAI

# Point the standard OpenAI client at the Gateway instead of api.openai.com.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gate.joingonka.ai/v1",
)

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.7",
    messages=[{"role": "user", "content": "Hello, MiniMax"}],
)
print(response.choices[0].message.content)
```

Streaming (Server-Sent Events) suits interactive interfaces where the response is shown as it is generated:
```python
# Reuses the `client` object created in the previous example.
stream = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.7",
    messages=[{"role": "user", "content": "Write a short essay about long context"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue  # some chunks may carry no choices; skip them
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Upon registration with JoinGonka Gateway, you receive 10 million free tokens to test any of the network's models, enough to compare all three on your own tasks.
Compatibility with development tools: everything that works with the OpenAI API also works with MiniMax via the Gateway. Simply change the model parameter:
- Cursor: in Custom Model settings, specify MiniMaxAI/MiniMax-M2.7
- Claude Code, Cline, Continue.dev: model name in the config
- LangChain, n8n: model parameter when initializing the client
The current list of models is always available at the GET /v1/models endpoint. It is convenient to pull the list from there dynamically, so that your application's UI always displays the latest set. If you receive a 429 response (too many concurrent requests), this is normal for a new model during the network's early growth: retry the request after a few seconds.
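The retry advice for 429 responses can be sketched as a small wrapper with exponential backoff. This is a minimal sketch rather than the Gateway's prescribed policy: the helper names, the number of attempts, and the delays are illustrative choices, and it assumes the client library exposes the HTTP status as a `status_code` attribute on the raised exception (the openai Python client's status errors do).

```python
import time


def is_rate_limited(exc: Exception) -> bool:
    """Assumption: the client library exposes the HTTP status as `status_code`."""
    return getattr(exc, "status_code", None) == 429


def call_with_retry(fn, attempts: int = 4, base_delay: float = 2.0, sleep=time.sleep):
    """Call fn(); on a 429, wait and retry with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_try = attempt == attempts - 1
            if last_try or not is_rate_limited(exc):
                raise  # out of attempts, or an error that retrying will not fix
            sleep(base_delay * (2 ** attempt))
```

A chat request could then be wrapped as `call_with_retry(lambda: client.chat.completions.create(...))`, and the model list fetched the same way with `client.models.list()`. The `sleep` parameter exists only so the delay can be replaced in tests.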
When to choose MiniMax M2.7 — practical scenarios
Having three models in one network is valuable because different tools can be chosen for different tasks without changing the provider or integration code. Here are scenarios where it makes sense to start testing with MiniMax M2.7.
Analysis of long documents. If the task is to summarize contracts, parse technical documentation, or process large legal or financial texts, the efficient attention of the M-series is historically optimized for retaining long contexts without a sharp increase in cost. Pass the entire document in one request and ask the model to work with the entire volume at once, rather than in pieces.
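Before sending a whole document in one request, it helps to check that it is likely to fit. The sketch below uses the rough words-to-tokens ratio implied by the figures in this article (131,072 tokens for about 100,000 words) and assumes the context window has to hold both the prompt and the reply. Real token counts depend on the tokenizer and the language, so treat the estimate as a first-pass filter only; the function names are illustrative.

```python
CONTEXT_WINDOW = 131_072             # tokens, per the Gateway figures in this article
MAX_OUTPUT = 4_096                   # output ceiling for MiniMax M2.7 on Gonka
TOKENS_PER_WORD = 131_072 / 100_000  # rough ratio implied by "~100,000 words"


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from a whitespace word count."""
    return int(len(text.split()) * TOKENS_PER_WORD) + 1


def fits_in_one_request(document: str, instructions: str = "",
                        reserve_output: int = MAX_OUTPUT) -> bool:
    """Check whether document + instructions still leave room for the reply."""
    needed = estimate_tokens(document) + estimate_tokens(instructions) + reserve_output
    return needed <= CONTEXT_WINDOW
```

If the check fails, the document has to be split or shortened before it is sent.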
RAG and working with knowledge bases. In retrieval-augmented scenarios, where dozens of fragments from a vector database are incorporated into the context, the model's ability to retain many disparate pieces of text directly affects the quality of the response. This is a natural niche for models with long contexts.
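One way to assemble such a request is to pack retrieved fragments into the prompt until a token budget is used up. The sketch below assumes the fragments arrive already sorted by relevance and, by default, counts tokens by splitting on whitespace, which is a crude stand-in for a real tokenizer. The function names and the system prompt wording are illustrative.

```python
def pack_fragments(fragments, budget_tokens, count_tokens=lambda t: len(t.split())):
    """Greedily keep fragments (assumed sorted by relevance) until the budget is spent."""
    kept, used = [], 0
    for fragment in fragments:
        cost = count_tokens(fragment)
        if used + cost > budget_tokens:
            continue  # skip this one; a shorter later fragment may still fit
        kept.append(fragment)
        used += cost
    return kept


def build_messages(question, fragments):
    """Assemble an OpenAI-style chat payload with numbered context fragments."""
    context = "\n\n".join(f"[{i}] {frag}" for i, frag in enumerate(fragments, start=1))
    return [
        {"role": "system", "content": "Answer using only the numbered context fragments."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ]
```

The resulting list can be passed as the `messages` argument of a chat completion request. Numbering the fragments makes it possible to ask the model to cite which ones it used.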
Processing transcripts and logs. Call transcripts, long support dialogues, streaming logs — these are tasks where the input volume is large, but the response is usually short. Here, the 4,096-token output ceiling does not hinder: much goes in, a summary or extracted facts come out.
When to choose another model. If your application needs a very long response in one request (a large generated document, a voluminous piece of code), remember the output ceiling of 4,096 tokens; Qwen3-235B allows twice that (8,192). If stable native tool calling in production is key, Qwen3-235B has the longer track record. For tasks with complex multi-step reasoning, include Kimi K2.6 in the comparison. Universal advice: run the same set of your real queries through all three models and compare the results. The free 10 million tokens upon registration are enough for a comprehensive comparative test.
Technically, switching between models is changing a single line in the model field. Therefore, a well-designed application architecture on the Gonka network does not "choose a model forever," but allows routing requests between Qwen, Kimi, and MiniMax depending on the task type — cheap inference makes such routing economically advantageous.
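Such routing can be as simple as a lookup table. The sketch below is one possible shape: the task labels are hypothetical, the mapping follows the guidance in this article, and using Qwen3-235B as the fallback is an assumption for illustration. The mapping should be tuned against your own evaluation set.

```python
# Hypothetical task labels mapped to model identifiers from this article.
ROUTES = {
    "long_document": "MiniMaxAI/MiniMax-M2.7",
    "tool_calling": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    "long_output": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    "reasoning": "moonshotai/Kimi-K2.6",
}
DEFAULT_MODEL = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"  # assumed fallback


def pick_model(task_type: str) -> str:
    """Return the model for a task label, falling back to the default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


def build_request(task_type: str, messages: list) -> dict:
    """Keyword arguments for an OpenAI-compatible chat completion call."""
    return {"model": pick_model(task_type), "messages": messages}
```

With the openai Python client, the result can be unpacked directly: `client.chat.completions.create(**build_request("long_document", messages))`. Changing the routing policy then means editing the table, not the calling code.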