Knowledge Base Sections ▾
For Beginners
For Investors
- Where does GNK token value come from
- Gonka vs Competitors: Render, Akash, io.net
- The Libermans: from biophysics to decentralized AI
- GNK Tokenomics
- Risks and Prospects of Gonka: Objective Analysis
- Gonka vs Render Network: Detailed Comparison
- Gonka vs Akash: AI Inference vs Containers
- Gonka vs io.net: Inference vs GPU Marketplace
- Gonka vs Bittensor: A Detailed Comparison of Two Approaches to AI
- Gonka vs Flux: Two Approaches to Useful Mining
- Governance in Gonka: How a Decentralized Network is Managed
Technical
Analytics
Tools
- Cursor + Gonka AI - cheap LLM for coding
- Claude Code + Gonka AI - LLM for the terminal
- OpenClaw + Gonka AI - affordable AI agents
- OpenCode + Gonka AI - free AI for code
- Continue.dev + Gonka AI - AI for VS Code/JetBrains
- Cline + Gonka AI - AI agent in VS Code
- Aider + Gonka AI - pair programming with AI
- LangChain + Gonka AI - AI applications for pennies
- n8n + Gonka AI - automation with cheap AI
- Open WebUI + Gonka AI - your own ChatGPT
- LibreChat + Gonka AI — open-source ChatGPT
- Hermes Agent + Gonka AI — Autonomous Agent for Pennies
- Kilo Code + Gonka AI — AI-Agent in VS Code
- Roo Code + Gonka AI — Autonomous AI Agent in VS Code
- LlamaIndex + Gonka AI — RAG applications for pennies
- PydanticAI + Gonka — typed AI agents for pennies
- Vercel AI SDK + Gonka AI — AI applications in TypeScript for pennies
- TanStack AI + Gonka — AI applications in TypeScript for pennies
- API quick start — curl, Python, TypeScript
- JoinGonka Gateway — a full overview
- Management Keys — SaaS on Gonka
- Cheapest AI API: Provider Comparison 2026
- Cursor Pro request limit reached — real breakdown and cheap alternative
- Claude Code cheaper alternative — bill breakdown and switch
- Cline burned through dollars — why the agent burns money
- OpenClaw too expensive — why the agent burns tokens and how to save
- OpenRouter cheaper alternative — comparison vs JoinGonka Gateway
Tools
LlamaIndex + Gonka AI — RAG applications for pennies
LlamaIndex is a leading framework for building RAG applications and AI agents in Python (a TypeScript version, LlamaIndex.TS, is also available). It handles document loading, chunking, indexing, vector search, and response assembly—you describe the data, and LlamaIndex transforms it into a question-answering system on top of any LLM.
There's one problem—the cost of inference. RAG by its nature is resource-intensive: for each question, the query plus several found context fragments go into the model, and for indexing large collections, embeddings are added. At production volumes, this means thousands of requests per day. With OpenAI ($2.50–15 per 1M tokens) or Anthropic ($3–15 per 1M), even a modest Q&A service turns into tens of thousands of dollars per month.
LlamaIndex natively works with any OpenAI-compatible endpoint via the OpenAILike class. This means that JoinGonka Gateway connects with a few lines of code—without custom providers or patches. The result: the same RAG system works for $0.0005/1M input tokens (output ×3) through the decentralized Gonka network—hundreds to thousands of times cheaper than cloud APIs.
Quick Start: Connecting via OpenAILike
JoinGonka API key: register at gate.joingonka.ai/register—we provide 10M free tokens to start—and create a jg-xxx key in the Dashboard.
Installation:
pip install llama-index llama-index-llms-openai-likeFor an arbitrary OpenAI-compatible API, LlamaIndex provides the OpenAILike class from the llama_index.llms.openai_like package. A minimal example of a request to Gonka:
from llama_index.llms.openai_like import OpenAILike
llm = OpenAILike(
api_base="https://gate.joingonka.ai/v1",
api_key="jg-your-key",
model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
is_chat_model=True, # Gonka is a chat endpoint
is_function_calling_model=True, # native tool calling is supported
context_window=131072, # 128K for Qwen3-235B
max_tokens=8192, # output ceiling via Gateway (Qwen)
)
response = llm.complete("Explain what RAG is in three sentences.")
print(response)Important about OpenAILike: be sure to specify is_chat_model=True—otherwise, LlamaIndex will go to the completion endpoint, which we don't have. is_function_calling_model=True enables native tool calls. Set context_window according to the model so LlamaIndex correctly handles context.
Example: RAG pipeline with query engine
A classic LlamaIndex scenario is an index over your documents and queries to it via query_engine. The global LLM is set once via Settings.llm, then the entire pipeline automatically uses Gonka.
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
Settings,
)
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# 1. LLM via Gonka (once — globally)
Settings.llm = OpenAILike(
api_base="https://gate.joingonka.ai/v1",
api_key="jg-your-key",
model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
is_chat_model=True,
context_window=131072,
max_tokens=8192,
)
# 2. Local embeddings (free, without OpenAI)
Settings.embed_model = HuggingFaceEmbedding(
model_name="BAAI/bge-small-en-v1.5"
)
# 3. Loading and indexing documents from the ./data folder
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
# 4. Querying the knowledge base
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)Critical nuance about embeddings: by default, VectorStoreIndex uses OpenAI embeddings (text-embedding-ada-002)—these are separate paid calls to OpenAI, not to Gonka. To completely move away from OpenAI, set a local embedding model via Settings.embed_model (as in the example above—HuggingFaceEmbedding, package pip install llama-index-embeddings-huggingface). Then generation goes through Gonka, and vectorization is local and free.
Cost: one RAG pipeline query (search + generation) consumes ~2–5K LLM tokens. Through Gonka, this is fractions of a cent; through OpenAI/Anthropic, it's 3–4 orders of magnitude more expensive. For a stream of thousands of queries per day, the difference turns into tens of thousands of dollars in monthly savings.
Comparison of RAG Load Costs
A RAG application is not a one-time chat but a continuous stream of requests: each user question pulls ~2–5K LLM tokens (the question itself plus found context fragments). Let's calculate typical volumes and what they cost on different providers. Gonka prices via JoinGonka Gateway: input ~$0.0005/1M, output ×3.
| Scenario | LLM Tokens | OpenAI / Anthropic | JoinGonka Gonka |
|---|---|---|---|
| One question to the knowledge base | ~4K | $0.01 — $0.06 | ~$0.000005 |
| Support chatbot (1K queries/day) | ~4M/day | $10 — $60 per day | ~$0.005 per day |
| Indexing + Q&A on corpus (1M words) | ~5M | $12 — $75 | ~$0.006 |
| Production service, 50K queries/month | ~200M/month | $500 — $3,000 per month | ~$0.25 per month |
With 10M free tokens, you can debug the entire RAG pipeline, index a test corpus, and run thousands of queries—without spending a cent. At production volumes, JoinGonka Gateway turns RAG from an expensive service into an expense item you might not even notice.
Agents, tool calling and model selection
LlamaIndex can not only answer based on documents but also build agents with tools. All three Gonka models support native tool calling—agents call functions structurally, without text parsing. Example of an agent with a tool:
import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai_like import OpenAILike
llm = OpenAILike(
api_base="https://gate.joingonka.ai/v1",
api_key="jg-your-key",
model="Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
is_chat_model=True,
is_function_calling_model=True,
context_window=131072,
max_tokens=8192,
)
def multiply(a: float, b: float) -> float:
"""Multiplies two numbers."""
return a * b
agent = FunctionAgent(
tools=[multiply],
llm=llm,
system_prompt="You are a helpful assistant. Use tools for calculations.",
)
async def main():
result = await agent.run("What is 1234 multiplied by 5678?")
print(result)
asyncio.run(main())Model selection (model field and corresponding context_window / max_tokens limits):
Model (model) | Context | Max Output | When to use |
|---|---|---|---|
Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 | 128K | 8192 | Default: RAG, agents, long answers |
moonshotai/Kimi-K2.6 | 128K | 3072 | Strong reasoning and tool calling |
MiniMaxAI/MiniMax-M2.7 | 128K | 4096 | Alternative for agent tasks |
The max_tokens limit via Gateway is up to 8192 for Qwen3; for Kimi and MiniMax, specify 3072 and 4096 respectively. If max_tokens is not specified for a non-streaming request, up to 1500 tokens will be returned by default—for RAG answers and agent steps, set the value explicitly.
TypeScript: For LlamaIndex.TS, there's a mirroring path—the OpenAI class from the @llamaindex/openai package accepts baseURL and apiKey (or reads OPENAI_BASE_URL / OPENAI_API_KEY variables), so the same Gateway connects in the Node.js stack. If you're building AI applications and on Python frameworks, also check out the guide for LangChain.