MiniMax M2.7: Gonka Network Model

In the spring of 2026, the Gonka network evolved from a single-model network into a multi-model one. First, Kimi K2.6 was added alongside the flagship Qwen3-235B, and at the end of May 2026, MiniMax M2.7 from the Chinese laboratory MiniMax arrived. Qwen3-235B was later removed from the network, and today Gonka serves two models simultaneously: Kimi K2.6 and MiniMax M2.7.

Let's examine what MiniMax M2.7 is, who develops it, how it is configured within the Gonka network, how it differs from the other active network model, Kimi K2.6, and how to access it via our API Gateway using the OpenAI-compatible protocol.

What is MiniMax M2.7 and who is behind the model

MiniMax M2.7 is a large language model (LLM) from MiniMax, a company based in Shanghai. MiniMax was founded in 2021 by a team of researchers led by Yan Junjie (formerly of SenseTime) and quickly became one of China's leading AI laboratories. The company has attracted funding from Alibaba, Tencent, and HongShan — the same circle of strategic investors behind other "Chinese AI tigers," including Moonshot AI, the developer of Kimi K2.6.

Beyond pure language models, MiniMax is known for consumer products: chat assistants Talkie and Hailuo, as well as one of the most prominent video generators in the industry. But for the Gonka network, the M-series of text models — successors to earlier abab models — is particularly important.

The main architectural feature of the M-series is its focus on an efficient attention mechanism. While early large models used classic quadratic attention (computational cost grows proportionally to the square of context length), MiniMax was one of the first to release a hybrid linear attention mechanism. This allows for processing very long sequences without an explosive increase in computational cost — a historical hallmark of the lineup. Like Qwen3-235B and Kimi K2.6, the model is built on the MoE (Mixture of Experts) architecture: hundreds of billions of parameters "on paper," but only a small fraction of them are activated for each query, drastically reducing the cost of inference.
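
To make the scaling difference concrete, here is a rough sketch comparing how attention cost grows with context length under the two regimes. It counts only token interactions and ignores constants, hidden dimensions, and the fact that a hybrid design still mixes in some quadratic layers, so it illustrates growth rates rather than MiniMax's actual compute cost. The function names are illustrative.

```python
def quadratic_cost(n_tokens: int) -> int:
    """Relative cost of classic attention: every token attends to every token."""
    return n_tokens * n_tokens


def linear_cost(n_tokens: int) -> int:
    """Relative cost of an idealized linear attention: one unit of work per token."""
    return n_tokens


def growth_factor(cost_fn, short: int, long: int) -> float:
    """How many times more expensive the long context is than the short one."""
    return cost_fn(long) / cost_fn(short)


if __name__ == "__main__":
    short, long = 20_000, 200_000  # a 10x longer context
    print("quadratic:", growth_factor(quadratic_cost, short, long))  # 100.0
    print("linear:   ", growth_factor(linear_cost, short, long))     # 10.0
```

A context that is 10 times longer costs about 100 times more under quadratic attention, but only about 10 times more under the idealized linear variant.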

In the Gonka network, the model is identified as MiniMaxAI/MiniMax-M2.7 — this is the string you need to pass in the model field of your API request. Version M2.7 is the latest iteration of the M-series at the time of publication.

Characteristics of MiniMax M2.7 in the Gonka Network

It is important to distinguish between the characteristics of the model "out of the box" and those of a specific deployment. When a model runs in the decentralized Gonka network, its operational parameters are defined by the vLLM inference configuration on the GPU host side, not only by the model architecture. Here are the actual values returned by our Gateway:

Context window: 200,000 tokens (about 150,000 words). This is the subnet configuration in the Gonka network. The MiniMax architecture itself supports a significantly longer context, but the practical ceiling at any given moment is determined by the inference settings on the hosts.
Maximum output: 8,192 tokens per response. This figure was measured empirically, by forcing a long generation until the request hit the ceiling (finish_reason: length). The ceiling is currently the same for all models in the network. It is not a limit of the model itself but of the vLLM subnet configuration.
Host VRAM requirement: about 320 GB of VRAM per node. This is typical for a large MoE model in FP8 quantization, and Kimi K2.6 needs the same 320 GB. In practice, this means several H100/H200 class GPUs combined into one node.
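
These limits translate into a simple budget check on the client side. The sketch below assumes that the prompt and the response share the same 200,000-token window (the usual behavior of vLLM-style deployments, but not confirmed here for Gonka), and the helper names are illustrative.

```python
CONTEXT_WINDOW = 200_000  # tokens, as reported for the Gonka deployment
MAX_OUTPUT = 8_192        # tokens per response, as measured in the article


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Prompt budget left after reserving room for the response.

    Assumes prompt and response share one context window.
    """
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 1 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def was_truncated(finish_reason: str) -> bool:
    """True when generation stopped because it hit the output ceiling."""
    return finish_reason == "length"
```

With the full output reserved, `max_prompt_tokens()` leaves 191,808 tokens for the prompt. Checking `was_truncated(response.choices[0].finish_reason)` after each call tells you whether the answer was cut off at the ceiling.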

The price of inference in the Gonka network does not depend on the model choice and is determined by network parameters: via JoinGonka Gateway, MiniMax M2.7 is available at the same rate as Kimi K2.6. The price is unified because the network prices computing power uniformly rather than following a specific vendor's price list.

MiniMax M2.7 and Kimi K2.6 — Gonka model comparison

Gonka network users have a choice of two flagship models, both accessible via a unified OpenAI-compatible JoinGonka Gateway interface. The comparison below helps to understand not "which one is better," but for which task profile each is optimized.

| Feature | MiniMax M2.7 | Kimi K2.6 |
| --- | --- | --- |
| Manufacturer | MiniMax (Shanghai) | Moonshot AI (Beijing) |
| Architecture | MoE + linear attention | MoE |
| Context in Gonka | 200,000 tokens | 200,000 tokens |
| Max output | 8,192 tokens | 8,192 tokens |
| Historical strength | Long context, efficient attention | Reasoning, long context |
| API identifier | `MiniMaxAI/MiniMax-M2.7` | `moonshotai/Kimi-K2.6` |
| Network status | Launched via v0.2.13 upgrade (May 2026) | Launched via DevShards (May 2026) |

An important caveat regarding benchmarks in 2026: the gap between top open-weights models in public tests has shrunk to single percentage points, and this difference is often within the statistical margin of error of the benchmarks themselves. For practical work, what matters is not a model's absolute position in the MMLU ranking but the nature of the task: context length, complexity of logic chains, required language, and tool calling availability.

Practical guideline: for tasks with very long documents and streaming processing of large volumes of text, it makes sense to test MiniMax M2.7 — the efficient attention of its series is historically tailored to such scenarios. For reasoning tasks with complex logic and long context, you should compare responses with Kimi K2.6. The best strategy in production is to keep both models in the code and switch between them by a single model parameter without changing the application architecture.

How Gonka launched MiniMax M2.7: v0.2.13 upgrade

Adding MiniMax M2.7 is not a "file upload to a server" task, but the result of a network upgrade processed via on-chain governance. Support for the model was included in the v0.2.13 protocol release, approved by proposal #54: it was accepted on May 21, 2026 (approximately 63% "for" votes) and activated at a specific block height. This is the same governance mechanism the network uses to adopt all significant changes — from pricing to new models.

Multi-model support is a fundamental step for a decentralized network. A network tied to a single model is inherently fragile: the release of a new model version becomes a migration crisis, and any failure of the sole model takes down the entire service. A network that can serve several models simultaneously evolves gracefully: new models are added as additional "tracks," older ones continue to operate, and GPU hosts choose what to serve. Technically, each model lives in its own network shard. The same mechanism (DevShards) was previously used to launch Kimi K2.6.

A separate nuance in the early stages: there may be a lag between "the model appeared in the network list" and "the model is open to all clients." Initially, MiniMax M2.7 inference in broker mode was available only to privileged keys and returned an error for regular requests — a normal "shakedown" phase. By the end of May 2026, public access was opened, and the model became available to all Gateway clients. Learn more about how the network is structured and why models are launched this way in our article on Gonka network architecture.

The same MiniMax M2.7 costs $0.279/$1.20 per 1M tokens via OpenRouter, compared to $0.003/$0.009 per 1M tokens via JoinGonka.
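
To see what this gap means for a real bill, the prices can be plugged into a small calculator. This sketch assumes the two figures in each pair are the input and output prices, which the text does not state explicitly, and the workload is a hypothetical example.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price: float, output_price: float) -> float:
    """Total cost in USD, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices per 1M tokens as quoted in the article; the input/output split is assumed.
OPENROUTER = (0.279, 1.20)
JOINGONKA = (0.003, 0.009)

if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens, 5M output tokens.
    workload = (50_000_000, 5_000_000)
    for name, prices in (("OpenRouter", OPENROUTER), ("JoinGonka", JOINGONKA)):
        print(f"{name}: ${cost_usd(*workload, *prices):.3f}")
```

For this workload the totals come to roughly $19.95 versus $0.195, about a hundredfold difference under the stated assumption.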

How to use MiniMax M2.7 via JoinGonka Gateway

The most direct way is via the JoinGonka API Gateway. Since the Gateway provides an OpenAI-compatible API, the same code that works with GPT, Claude, or Kimi will work with MiniMax after changing the value of the model field.

A minimal example via curl:

curl https://gate.joingonka.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2.7",
    "messages": [
      {"role": "user", "content": "Briefly explain what linear attention is"}
    ]
  }'

The same request in Python using the openai library:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gate.joingonka.ai/v1",
)

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.7",
    messages=[{"role": "user", "content": "Hello, MiniMax"}],
)
print(response.choices[0].message.content)

Streaming (Server-Sent Events) suits interactive interfaces where the response is displayed as it is generated:

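# Reuses the `client` object created in the previous example.
stream = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.7",
    messages=[{"role": "user", "content": "Write a short essay about long context"}],
    stream=True,
)
for chunk in stream:
    # Some servers send chunks with no choices (for example, usage info); skip them.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)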

By registering with JoinGonka Gateway, you receive 10 million free tokens for testing any model in the network, which is enough to compare both models on your own tasks.

Compatibility with development tools: everything that works with the OpenAI API also works with MiniMax via the Gateway. Simply change the model parameter:

Cursor: in Custom Model settings, specify MiniMaxAI/MiniMax-M2.7
Claude Code, Cline, Continue.dev: model name in the config
LangChain, n8n: model parameter when initializing the client

The current list of models is always available at the GET /v1/models endpoint. It is convenient to fetch it dynamically so that your application's UI always shows the current set. If you receive a 429 "too many concurrent requests" response, this is normal for a newly launched model at an early stage of network growth: retry the request after a few seconds.
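
The "retry after a few seconds" advice can be wrapped in a small helper. The sketch below is a generic exponential backoff and does not depend on any SDK. The helper name and default values are illustrative, and the exception type that signals a 429 depends on the client library you use.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff on the given exception types."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("attempts must be at least 1")
```

With the openai Python library, the exception to pass would likely be `openai.RateLimitError`, for example `call_with_retry(lambda: client.models.list(), (openai.RateLimitError,))`. The `sleep` parameter is injectable so the helper can be tested without real delays.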

When to choose MiniMax M2.7 — practical scenarios

Having two models in one network is valuable because you can choose different tools for different tasks without changing the provider or the integration code. Here are scenarios where it makes sense to start testing with MiniMax M2.7.

Long document analysis. If the task is contract summarization, technical documentation analysis, or processing large legal or financial texts, the efficient attention of the M-series is historically designed to maintain long context without a sharp increase in cost. Pass the entire document in one request and ask the model to process the whole volume at once, not in chunks.

RAG and knowledge base interaction. In retrieval-augmented scenarios, where dozens of fragments from a vector database are mixed into the context, the model's ability to hold many heterogeneous pieces of text directly affects response quality. This is a natural niche for long-context models.
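
Even with a large window, retrieved fragments still have to fit the budget. Below is a minimal sketch of greedy packing in ranked order. The token estimate uses the rough ratio of 200,000 tokens to about 150,000 words and is only an approximation (a real tokenizer gives different counts), and the function names are illustrative.

```python
import math
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 tokens for every 3 words."""
    return math.ceil(len(text.split()) * 4 / 3)


def pack_fragments(fragments: List[str], budget_tokens: int) -> List[str]:
    """Keep fragments in ranked order until the token budget is spent."""
    packed: List[str] = []
    used = 0
    for fragment in fragments:
        cost = estimate_tokens(fragment)
        if used + cost > budget_tokens:
            break  # stop at the first fragment that no longer fits
        packed.append(fragment)
        used += cost
    return packed
```

Stopping at the first fragment that does not fit keeps the ranking intact. Skipping it and trying smaller ones would fill the window more tightly, at the price of reordering by relevance.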

Transcript and log processing. Call transcripts, long support dialogues, streaming logs — these are tasks where the input volume is large but the output is usually short. Here, the 8,192 token output limit is not a hindrance: you get high input volumes with concise summaries or extracted facts as output.

When to choose another model. Currently, all network models provide up to 8,192 tokens in a single response, so if your application requires a very long response in one request (a large generated document, a bulky piece of code), plan for this limit in your architecture and split the generation into parts. For tasks with complex multi-step reasoning, it is worth comparing answers with Kimi K2.6. A universal tip: run the same set of your real requests through both models and compare the results. The 10 million free tokens upon registration are enough for a full comparative test.
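
One way to split a long generation into parts is a continuation loop: when a response stops with finish_reason "length", the partial answer is fed back and the model is asked to go on. This is a sketch under assumptions. `complete` is a hypothetical adapter around your API client that returns the text and the finish reason, and the continuation prompt is only an example.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
# Hypothetical adapter: takes the message list, returns (text, finish_reason).
CompleteFn = Callable[[List[Message]], Tuple[str, str]]


def generate_long(prompt: str, complete: CompleteFn, max_rounds: int = 5) -> str:
    """Stitch a long answer together from several capped responses."""
    messages: List[Message] = [{"role": "user", "content": prompt}]
    parts: List[str] = []
    for _ in range(max_rounds):
        text, finish_reason = complete(messages)
        parts.append(text)
        if finish_reason != "length":
            break  # the model finished on its own
        # Feed the partial answer back and ask for the rest.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you stopped."})
    return "".join(parts)
```

The `max_rounds` cap bounds the total cost if the model never stops on its own. Seams between parts may need cleanup, since nothing guarantees that the model resumes at exactly the right character.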

Technically, switching between models is just changing one string in the model field. Therefore, a competent application architecture on the Gonka network doesn't "choose a model forever," but allows routing requests between Kimi K2.6 and MiniMax M2.7 depending on the task type — cheap inference makes such routing economically viable.
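
Such routing can start as a plain lookup table. In this sketch the task labels are hypothetical and the mapping only mirrors the rule of thumb from this article. Treat it as a starting point to validate against your own side-by-side comparison.

```python
KIMI = "moonshotai/Kimi-K2.6"
MINIMAX = "MiniMaxAI/MiniMax-M2.7"

# Hypothetical task labels; adjust the mapping after testing on real requests.
ROUTES = {
    "long_document": MINIMAX,
    "rag": MINIMAX,
    "log_processing": MINIMAX,
    "multi_step_reasoning": KIMI,
}


def pick_model(task_type: str, default: str = MINIMAX) -> str:
    """Return the model identifier to pass in the `model` field."""
    return ROUTES.get(task_type, default)
```

The returned string goes straight into the `model` parameter of the request, so the rest of the integration code stays the same for both models.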

MiniMax M2.7 is an MoE model from the Shanghai-based lab MiniMax, added to the Gonka network in May 2026 alongside Kimi K2.6 (support was included in protocol upgrade v0.2.13, proposal #54); public inference opened to everyone by the end of May. On the Gonka network, the model runs with a 200,000-token context and an 8,192-token output limit, on nodes with about 320 GB of VRAM. It is accessible via the OpenAI-compatible API through the JoinGonka Gateway; the model identifier is MiniMaxAI/MiniMax-M2.7. The M-series is historically strong in efficient attention and long context.
