MiniMax M2.7: The Third Gonka Network Model

In the spring of 2026, the Gonka network evolved from a single-model network into a multi-model one. First, Kimi K2.6 joined the flagship Qwen3-235B; then, at the end of May 2026, a third model arrived: MiniMax M2.7 from the Chinese laboratory MiniMax. This is the first time in the network's history that it has supported three independent large language models simultaneously.

Let's examine what MiniMax M2.7 is, who is behind its development, what its specifications are in the Gonka network, how it differs from the two models already running there, and how to access it via our API Gateway using the OpenAI-compatible protocol.

What is MiniMax M2.7 and who is behind the model

MiniMax M2.7 is a large language model (LLM) from MiniMax, a company based in Shanghai. MiniMax was founded in 2021 by a team of researchers led by Yan Junjie (formerly at SenseTime) and quickly became one of China's leading AI laboratories. The company has attracted funding from Alibaba, Tencent, and HongShan — the same circle of strategic investors behind other 'Chinese AI tigers,' including Moonshot AI, the developer of Kimi K2.6.

Beyond pure language models, MiniMax is known for consumer products: chatbot assistants Talkie and Hailuo, and one of the industry's most prominent video generators. But for the Gonka network, the M series of text models — successors to earlier abab models — are particularly important.

The main architectural feature of the M series is its focus on an efficient attention mechanism. Early large models used classic quadratic attention, whose computation cost grows with the square of the context length. MiniMax was one of the first to offer a hybrid linear attention mechanism, which lets the model process very long sequences without an explosive increase in computational cost; this is a historical hallmark of the series. Like Qwen3-235B and Kimi K2.6, the model is built on the MoE (Mixture of Experts) architecture: it has hundreds of billions of parameters 'on paper', but a router activates only a small fraction of them for each token, which drastically reduces inference costs.
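To make the quadratic-versus-linear distinction concrete, here is a minimal pure-Python sketch of generic kernelized causal linear attention, using the elu(x) + 1 feature map from the linear-transformer literature. It is an illustration of the general technique, not MiniMax's own attention kernel or any production implementation. Both functions compute the same weighting: the first revisits every earlier key for each position (quadratic in sequence length), while the second keeps a fixed-size running state and does constant work per position (linear in sequence length).

```python
import math
from typing import List

Matrix = List[List[float]]

def phi(x: List[float]) -> List[float]:
    """Positive feature map elu(x) + 1: x + 1 for x > 0, exp(x) otherwise."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def kernel_attention_quadratic(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Causal kernel attention computed pairwise: position i visits i + 1 keys, O(n^2) total."""
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        weights = [dot(fq, phi(K[j])) for j in range(i + 1)]
        total = sum(weights)
        out.append([sum(weights[j] * V[j][d] for j in range(i + 1)) / total
                    for d in range(len(V[0]))])
    return out

def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Same result from a fixed-size running state: O(n) in sequence length."""
    dk, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(dk)]  # running sum of outer(phi(k), v)
    z = [0.0] * dk                        # running sum of phi(k), used to normalize
    out = []
    for q, k, v in zip(Q, K, V):
        fk = phi(k)
        for a in range(dk):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][b] for a in range(dk)) / denom for b in range(dv)])
    return out

# Toy input: 3 positions with 2-dimensional queries, keys and values
Q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
K = [[0.2, 0.1], [-0.1, 0.4], [0.5, -0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
quad = kernel_attention_quadratic(Q, K, V)
lin = linear_attention(Q, K, V)
```

On the toy input the two functions agree up to floating-point rounding; the difference is only in how much work each position costs as the sequence grows.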

In the Gonka network, the model is identified as MiniMaxAI/MiniMax-M2.7 — this is the string that needs to be passed in the model field of the API request. Version M2.7 is the latest iteration of the M series at the time of this article's publication.

MiniMax M2.7 Specifications in the Gonka Network

It's important to distinguish between the model's characteristics 'out of the box' and the characteristics with which it is deployed in a specific network. When a model runs in the decentralized Gonka network, its operational limits are set by the vLLM inference configuration on the GPU hosts, not solely by the model's architecture. Here are the values observed through our Gateway:

  • Context Window: 131,072 tokens (approximately 100,000 words). This is the subnet configuration within the Gonka network. The MiniMax architecture itself supports significantly longer contexts, but the practical ceiling at any given moment is set by the inference configuration on the hosts.
  • Maximum Output: 4,096 tokens per response. This figure was measured empirically: a request that forced a long generation stopped at this ceiling with finish_reason: length. For comparison, the ceiling is 8,192 tokens for Qwen3-235B and 3,072 tokens for Kimi K2.6. This is not a model limit but a vLLM subnet configuration.
  • Host VRAM Requirement: Approximately 320 GB VRAM per node. This is a typical requirement for a large MoE model in FP8 quantization — the same 320 GB are needed for Qwen3-235B and Kimi K2.6. In practice, this means several H100/H200 class GPUs combined into a single node.
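If your code relies on these limits, a few small helpers keep requests within them. This is a sketch, assuming you copy the values above into a dictionary yourself; GONKA_LIMITS and the helper names are hypothetical, and the real subnet limits may change. The last helper does the raw memory arithmetic behind 'several H100/H200 class GPUs', using the 80 GB H100 and 141 GB H200 memory sizes; it ignores parallelism layout and other overheads.

```python
import math

# Values observed through JoinGonka Gateway in May 2026. They are vLLM subnet
# settings on the Gonka network, not model limits, and may change.
GONKA_LIMITS = {
    "MiniMaxAI/MiniMax-M2.7": {"context": 131_072, "max_output": 4_096},
    "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8": {"context": 131_072, "max_output": 8_192},
    "moonshotai/Kimi-K2.6": {"context": 131_072, "max_output": 3_072},
}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Never ask for more output tokens than the subnet will generate."""
    return min(requested, GONKA_LIMITS[model]["max_output"])

def was_truncated(finish_reason: str) -> bool:
    """finish_reason == "length" means generation stopped at the token ceiling."""
    return finish_reason == "length"

def min_gpus(vram_needed_gb: float, vram_per_gpu_gb: float) -> int:
    """Raw capacity arithmetic only: how many GPUs of this size hold the model."""
    return math.ceil(vram_needed_gb / vram_per_gpu_gb)

print(clamp_max_tokens("MiniMaxAI/MiniMax-M2.7", 10_000))  # 4096
print(min_gpus(320, 80), min_gpus(320, 141))                # 4 3
```

In a real request you would pass max_tokens=clamp_max_tokens(model, n) to client.chat.completions.create and check was_truncated(response.choices[0].finish_reason) on the result.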

The cost of inference in the Gonka network does not depend on the model choice; it is determined by network parameters. Through the JoinGonka Gateway, MiniMax M2.7 is available at the same rate as Qwen and Kimi. The uniform price follows from the network pricing all computational work with a single common calculation, rather than from any specific vendor's price list.

MiniMax M2.7, Qwen3-235B, and Kimi K2.6 — Comparison of the Three Gonka Models

For the first time, a Gonka network user can choose among three flagship models, all accessible through a single OpenAI-compatible interface: JoinGonka Gateway. The comparison below is not meant to show 'which is better' but which task profile each model is optimized for.

| Characteristic | MiniMax M2.7 | Qwen3-235B | Kimi K2.6 |
|---|---|---|---|
| Developer | MiniMax (Shanghai) | Alibaba Cloud (Hangzhou) | Moonshot AI (Beijing) |
| Architecture | MoE + linear attention | MoE (235B total / 22B active) | MoE |
| Context in Gonka | 131,072 tokens | 131,072 tokens | 131,072 tokens |
| Max output in Gonka | 4,096 tokens | 8,192 tokens | 3,072 tokens |
| Historical strength | Long context, efficient attention | Multilingual (119 languages), tool calling | Reasoning, long context |
| API identifier | MiniMaxAI/MiniMax-M2.7 | Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 | moonshotai/Kimi-K2.6 |
| Network status | Launched via upgrade v0.2.13 (May 2026) | Stable since August 2025 | Launched via DevShards (May 2026) |

An important caveat about benchmarks in 2026: the gap between top open-weights models in public tests has shrunk to single-digit percentage points, and this difference often falls within the statistical error of the benchmarks themselves. For practical work, what matters is not a model's absolute rank on MMLU but the nature of the task: context length, complexity of the logical chains, required language, and availability of tool calling.

Practical guidance: for tasks involving very long documents and streaming processing of large text volumes, it makes sense to test MiniMax M2.7, since the M series and its efficient attention have historically been optimized for such scenarios. For universal multilingual work and stable tool calling in production, the proven option is Qwen3-235B. For reasoning tasks with complex logic, Kimi K2.6 is a good choice. The best production strategy is to keep all three models in your code and switch between them through the single model parameter, without changing the application's architecture.
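One way to keep all three models in code is a small routing table. The sketch below is hypothetical: the task labels (long_document, multilingual, tool_calling, reasoning), the choice of Qwen3-235B as the default for unknown tasks, and the fallback rule are illustrative choices that encode the guidance above and the output ceilings measured on Gonka. Only the model IDs come from this article.

```python
# Model identifiers as used on the Gonka network
MINIMAX = "MiniMaxAI/MiniMax-M2.7"
QWEN = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
KIMI = "moonshotai/Kimi-K2.6"

# Output ceilings observed on the Gonka network (subnet settings, may change)
MAX_OUTPUT = {MINIMAX: 4_096, QWEN: 8_192, KIMI: 3_072}

# Hypothetical task profiles mapped to a starting model
DEFAULT_MODEL = {
    "long_document": MINIMAX,
    "multilingual": QWEN,
    "tool_calling": QWEN,
    "reasoning": KIMI,
}

def choose_model(task: str, expected_output_tokens: int = 1_000) -> str:
    """Pick a model for a task; unknown tasks go to Qwen3-235B (an assumption)."""
    model = DEFAULT_MODEL.get(task, QWEN)
    # If the answer would not fit under this model's output ceiling,
    # fall back to the model with the largest ceiling.
    if expected_output_tokens > MAX_OUTPUT[model]:
        model = max(MAX_OUTPUT, key=MAX_OUTPUT.get)
    return model

print(choose_model("long_document"))                               # MiniMaxAI/MiniMax-M2.7
print(choose_model("long_document", expected_output_tokens=6_000))  # Qwen/Qwen3-235B-A22B-Instruct-2507-FP8
```

The rest of the request stays identical: only the string passed as model to client.chat.completions.create changes.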

How Gonka Launched the Third Model: Upgrade v0.2.13

Adding MiniMax M2.7 is not a 'file upload to the server' but the result of a network upgrade passed through on-chain voting. Model support was included in protocol release v0.2.13, ratified by proposal #54: it was accepted on May 21, 2026 (about 63% 'yes' votes) and activated at a specified block height. This is the same governance mechanism through which the network adopts any significant changes — from tariffs to new models.

Multi-model support is a fundamental step for a decentralized network. A network tied to a single model is inherently fragile: every new model version becomes a migration crisis, and any failure of the sole model takes down the entire service. A network that can host several models simultaneously evolves gracefully: new models are added as additional 'lanes', older ones keep operating, and GPU hosts get to choose what to serve. Technically, each model lives in its own network shard; the same mechanism (DevShards) was previously used to launch Kimi K2.6.

A specific nuance of early stages: there might be a lag between 'model appeared on the network list' and 'model is open to all clients'. Initially, MiniMax M2.7 inference in broker mode was only available to privileged keys and returned an error for regular requests — a normal rollout phase. By the end of May 2026, public access opened, and the model became available to all Gateway clients. More details on how the network is structured and why models are launched this way can be found in the article on Gonka network architecture.

How to Use MiniMax M2.7 via JoinGonka Gateway

The most direct way is via the JoinGonka API Gateway. Since the Gateway provides an OpenAI-compatible API, the same code that works with GPT, Claude, Qwen, or Kimi will start working with MiniMax after changing the model field's value.

Minimal example using curl:

curl https://gate.joingonka.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2.7",
    "messages": [
      {"role": "user", "content": "Briefly explain what linear attention is"}
    ]
  }'

The same request in Python using the openai library:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gate.joingonka.ai/v1",
)

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.7",
    messages=[{"role": "user", "content": "Hello, MiniMax"}],
)
print(response.choices[0].message.content)

Streaming (Server-Sent Events) — for interactive interfaces where the response is displayed as it is generated:

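# Reuses the client created in the previous example
stream = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.7",
    messages=[{"role": "user", "content": "Write a short essay about long context"}],
    stream=True,
)
for chunk in stream:
    # Some OpenAI-compatible servers send chunks with an empty choices list
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()  # finish the line after the last token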

Upon registration with JoinGonka Gateway, you receive 10 million free tokens to test any of the network's models, enough to compare all three models on your own tasks.

Compatibility with development tools: tools and SDKs built for the OpenAI API also work with MiniMax via the Gateway, as long as they let you set a custom base URL. Point them at the Gateway and change the model parameter.

The current list of models is always available at the GET /v1/models endpoint. Fetching it dynamically lets your application's UI display the latest set automatically. If you receive HTTP 429 (too many concurrent requests), this is normal for a new model during the network's early growth: retry the request after a few seconds.
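Both habits from the paragraph above fit into two small helpers. This is a sketch, assuming the official openai Python client from the earlier examples: client.models.list() returns the models served at GET /v1/models, and the client's HTTP errors carry a status_code attribute. The helper names, the attempt count and the backoff schedule (2 s, 4 s, 8 s and so on, plus up to 1 s of random jitter) are our own illustrative choices. Note that the openai client already retries 429 responses a couple of times on its own (controlled by its max_retries constructor argument); an explicit helper is useful when you want a longer, visible schedule.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

def available_models(client) -> List[str]:
    """IDs of the models the Gateway currently serves (GET /v1/models)."""
    return [model.id for model in client.models.list()]

def with_retry_on_429(call: Callable[[], T], attempts: int = 5,
                      base_delay: float = 2.0, sleep=time.sleep) -> T:
    """Run call(); on HTTP 429 wait base_delay * 2**attempt (+ jitter) and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            # openai-python's APIStatusError subclasses expose the HTTP status here
            is_429 = getattr(exc, "status_code", None) == 429
            if not is_429 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError("attempts must be at least 1")

# Usage with the client from the earlier examples:
# print(available_models(client))
# response = with_retry_on_429(lambda: client.chat.completions.create(
#     model="MiniMaxAI/MiniMax-M2.7",
#     messages=[{"role": "user", "content": "Hello, MiniMax"}],
# ))
```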

When to Choose MiniMax M2.7 — Practical Scenarios

Having three models in one network is valuable because you can choose different tools for different tasks without changing the provider or integration code. Here are scenarios where it makes sense to start testing with MiniMax M2.7.

Long Document Analysis. If the task is summarizing contracts, parsing technical documentation, or processing large legal or financial texts, the efficient attention of the M series is historically designed to handle long contexts without a sharp increase in cost. As long as the document plus the expected answer fits within the 131,072-token window, pass the entire document in one request and ask the model to work with the whole volume at once rather than in chunks.
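Before sending a whole document, it helps to check that it actually fits. The sketch below assumes a rough heuristic of about 4 characters per token for English text, which is not MiniMax's real tokenizer, and reserves the full 4,096-token output ceiling plus a 10% safety margin; the margin and the function names are illustrative assumptions. If the check fails, it falls back to fixed-size chunks.

```python
CONTEXT_WINDOW = 131_072  # MiniMax M2.7 context on the Gonka network
MAX_OUTPUT = 4_096        # output ceiling on the Gonka network
CHARS_PER_TOKEN = 4       # rough heuristic for English text, not the real tokenizer

def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN + 1

def fits_in_one_request(document: str, instructions: str, safety_margin: float = 0.9) -> bool:
    """True if the prompt plus the full output ceiling fits in the context window."""
    budget = int(CONTEXT_WINDOW * safety_margin) - MAX_OUTPUT
    return estimate_tokens(document) + estimate_tokens(instructions) <= budget

def split_into_chunks(document: str, max_tokens_per_chunk: int) -> list:
    """Fallback: fixed-size character chunks of roughly max_tokens_per_chunk tokens."""
    max_chars = max_tokens_per_chunk * CHARS_PER_TOKEN
    return [document[i:i + max_chars] for i in range(0, len(document), max_chars)]

document = "Clause 1. The parties agree to the following terms. " * 1_000
instructions = "Summarize the key obligations of each party."
if fits_in_one_request(document, instructions):
    parts = [document]  # one request, whole document in context
else:
    parts = split_into_chunks(document, max_tokens_per_chunk=30_000)
```

For production use, replace estimate_tokens with a count from the model's actual tokenizer; the heuristic can be off considerably for code or non-English text.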

RAG and Knowledge Base Interaction. In retrieval-augmented scenarios, where dozens of fragments from a vector database are injected into the context, the model's ability to retain many disparate pieces of text directly affects the quality of the response. This is a natural niche for models with long contexts.
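In a RAG pipeline, the practical question is how many retrieved fragments to inject. A minimal sketch, assuming your vector database returns fragments with a relevance score and using the same rough 4-characters-per-token heuristic (not a real tokenizer): keep the highest-scoring fragments that fit a token budget, then number them in the system message so the model can refer to them. The Fragment type, the budget and the prompt wording are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Fragment:
    text: str
    score: float  # relevance from your vector database (higher = more relevant)

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); swap in a real tokenizer if you have one
    return len(text) // 4 + 1

def pack_fragments(fragments: List[Fragment], token_budget: int) -> List[Fragment]:
    """Greedily keep the most relevant fragments that still fit into token_budget."""
    packed, used = [], 0
    for frag in sorted(fragments, key=lambda f: f.score, reverse=True):
        cost = estimate_tokens(frag.text)
        if used + cost <= token_budget:
            packed.append(frag)
            used += cost
    return packed

def build_messages(question: str, fragments: List[Fragment]) -> List[Dict[str, str]]:
    """Chat messages with numbered sources in the system prompt."""
    context = "\n\n".join(f"[{i + 1}] {f.text}" for i, f in enumerate(fragments))
    return [
        {"role": "system", "content": "Answer using only the numbered sources below.\n\n" + context},
        {"role": "user", "content": question},
    ]
```

The resulting list can be passed directly as messages to client.chat.completions.create with model="MiniMaxAI/MiniMax-M2.7".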

Processing Transcripts and Logs. Call transcripts, long support dialogues, streaming logs – these are tasks where the input volume is large, but the output is usually short. Here, the output ceiling of 4,096 tokens does not interfere: much goes in, a summary or extracted facts come out.

When to choose another model. If your application needs a very long response in a single request (a large generated document or a voluminous piece of code), remember the 4,096-token output ceiling: Qwen3-235B offers twice that (8,192). If stable native tool calling in production is key, Qwen3-235B has the longer track record. For tasks with complex multi-step reasoning, compare results with Kimi K2.6. Universal advice: run the same set of your real queries through all three models and compare the results; the 10 million free tokens you receive on registration are enough for a full comparative test.
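A side-by-side test like this can be scripted in a few lines. In this sketch, compare_models and openai_completer are hypothetical helpers of our own; the only real API used is client.chat.completions.create from the openai package, pointed at the Gateway as in the earlier examples. Each answer is flagged when finish_reason is "length", which means it hit that model's output ceiling rather than finishing naturally.

```python
from typing import Callable, Dict, List, Tuple

MODELS = [
    "MiniMaxAI/MiniMax-M2.7",
    "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    "moonshotai/Kimi-K2.6",
]

# complete(model, prompt) -> (answer_text, finish_reason)
Completer = Callable[[str, str], Tuple[str, str]]

def compare_models(prompts: List[str], complete: Completer,
                   models: List[str] = MODELS) -> Dict[str, List[dict]]:
    """Run every prompt through every model and collect the answers side by side."""
    results: Dict[str, List[dict]] = {}
    for prompt in prompts:
        rows = []
        for model in models:
            text, finish_reason = complete(model, prompt)
            rows.append({
                "model": model,
                "answer": text,
                # "length" means the answer was cut off at the output ceiling
                "truncated": finish_reason == "length",
            })
        results[prompt] = rows
    return results

def openai_completer(client) -> Completer:
    """Adapter for an openai.OpenAI client whose base_url points at the Gateway."""
    def complete(model: str, prompt: str) -> Tuple[str, str]:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        choice = response.choices[0]
        return choice.message.content or "", choice.finish_reason
    return complete

# Usage: results = compare_models(my_prompts, openai_completer(client))
```

Keeping the completer as a plain function makes the harness easy to run offline against canned answers before spending tokens.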

Technically, switching between models is changing a single string in the model field. Therefore, a well-designed application architecture on the Gonka network does not 'choose a model forever,' but allows routing requests between Qwen, Kimi, and MiniMax depending on the type of task — cheap inference makes such routing economically viable.

MiniMax M2.7 is an MoE model from the Shanghai laboratory MiniMax and the third model on the Gonka network after Qwen3-235B and Kimi K2.6. Support was included in protocol upgrade v0.2.13 (proposal #54, May 2026), and by the end of May public inference was open to everyone. On the Gonka network, the model runs with a 131,072-token context and a 4,096-token output ceiling, on host nodes with about 320 GB of VRAM. It is available via JoinGonka Gateway through an OpenAI-compatible API under the model identifier MiniMaxAI/MiniMax-M2.7. The M series is historically strong in efficient attention and long context.
