1M Context Model Free: Trillions of Tokens Burned in Three Weeks, This Company Still Hasn't Charged
1M context. Free.
Putting these two words together feels unusual in the AI scene of 2026. 1M context means you can fit the entire Three-Body Problem trilogy in with room to spare. Among models that currently support this length, GPT-4.1 charges $2 per million input tokens, and Gemini 2.5 Pro, while offering 1M context, is also usage-based. Agnes AI's Agnes-2.0-Flash is completely free — no credit card required, no trial period, no feature restrictions.
Three weeks ago I wrote about this Singapore-based company — text, image, and video model APIs all free. The most common comment then was: "They can't last long, can they?"
Three weeks later. They haven't shut down, and they've even updated with two major capabilities.
What's new this time
Text model: Context window expanded from 128K to 1M (1 million tokens), with max output also increased to 65.5K tokens. It can read long texts and write long outputs.
Image model: New Agnes-Image-2.1-Flash, output resolution increased from 2K to 4K (3840×2160), with four options: 1K/2K/3K/4K, and multiple aspect ratios: 1:1/3:4/4:3/16:9/9:16. It can also do background replacement, partial edits, image text editing, and inpainting.
Video model Agnes-Video-V2.0 didn't have major changes, but in the first week alone it generated 2 million seconds of video.
What volume was achieved in three weeks
Agnes AI started offering free access on June 1st. The official LinkedIn post shared first-week data:
- Text model processed over 1 trillion tokens
- Image model generated over 2 million images
- Video model generated over 2 million seconds of video
By the third week, weekly token usage had grown to 3.12 trillion. This scale is no joke.
How much does 1 trillion tokens cost in server compute? By the most conservative estimate, GPU time for bfloat16 inference would cost the text model alone hundreds of thousands of dollars per week. Image and video models are even more expensive.
Over three weeks, inference costs alone are already in the seven-figure USD range. Free. And still running.
What problem does 1M context actually solve?
Many people think "longer context" just means "can input more text." That's the biggest misunderstanding.
128K context is roughly the length of a novel. Enough for daily document processing, but when dealing with long documents, code projects, or multi-turn agent conversations, it starts to fall short — you have to chunk, summarize, and stitch back, often losing information along the way.
1M context truly alleviates engineering burdens:
Long document analysis — product documentation, technical specs, meeting minutes, industry reports — just throw them all in and let the model find the key points. No need for you to pre-summarize or split into segments.
Code project understanding — put multiple source files, API docs, and change logs into context together, letting the model understand the whole project before explaining code or locating bugs. Previously, when using Claude Code for agentic programming, limited context meant constant compression and cleanup; now you can fit everything in one go.
Multi-step agent tasks — agents need to plan, call tools, read results, and adjust plans during execution. The longer the context, the less likely the model is to "forget." With 128K context, an agent might start losing early information after step 20. With 1M context, it stays stable even after hundreds of steps.
One easily overlooked point: Agnes-2.0-Flash natively supports Function Calling, Tool Use, and structured output. This means it's not just a chat-only text model; it's a versatile model capable of building agent workflows directly — multi-step tool calls, web search, file processing, custom knowledge bases — all supported at the API level.
4K image generation: from "decent" to "usable"
2K images look blurry on large screens and have limited cropping flexibility. For people making posters, e-commerce images, or social media covers, resolution directly determines whether the generated image is a "reference material" or a "deliverable."
4K output brings direct benefits:
- Generated images are closer to ready-to-deliver assets, no need for upscaling
- Cropping doesn't cause blurring, leaving more room for post-editing
- More detail preserved — cleaner text, textures, and edges
Agnes-Image-2.1-Flash also supports image editing — background replacement, local modifications, image text editing, and inpainting. It's essentially a generation + retouching tool combined, and still free.
How to integrate: just change one URL line
Agnes AI's API is compatible with OpenAI's format. Your existing toolchain requires almost no changes — just swap the base_url and model to start:
- API endpoint:
https://apihub.agnes-ai.com/v1 - Text model:
agnes-2.0-flash(1M context) - Image model:
agnes-image-2.1-flash(4K output) - Video model:
agnes-video-v2.0 - Signup link: https://platform.agnes-ai.com/
It supports direct integration with tools like WorkBuddy, Claude CLI, Cherry Studio, Cursor, Codex CLI, etc. If you're using OpenAI's API, just change base_url from https://api.openai.com/v1 to https://apihub.agnes-ai.com/v1, and set the model to agnes-2.0-flash. No other code changes needed.
If you're using OpenModel (https://www.openmodel.ai?ref=wYOxNxlv), Agnes series models are already listed; switch with a single command.
How long can the free offer last?
This is the question on everyone's mind. I'm not sure either.
But here are a few facts to consider:
- Agnes AI is backed by Sapiens AI, a Singapore-based company currently raising funds at a $100 million valuation
- Agora (NASDAQ-listed API company) provides underlying real-time communication infrastructure for Agnes AI
- The official statement says "indefinitely free" — not a trial period
- First week: 1 trillion tokens; third week: 3.12 trillion — this is not a hollow promise
The business logic behind the free model is likely similar to the API gateway industry — use free models to attract developers, then build value-added services and enterprise editions on top of the ecosystem. Domestic CDN and cloud storage companies have followed this path: grow the user base first, then monetize.
For developers, the current stage is pure bonus. 1M context + Function Calling + 4K image generation + video generation — all free. When they eventually start charging, the only loss is the cost of changing your base_url back.
A reminder
Free doesn't mean unlimited. Agnes AI currently has rate limits; under high concurrency you might encounter 429 errors. If you're running production-level services, still consider fallback options.
Also, check the privacy policy of the free model for yourself. Whether sensitive data goes through training or is used for model improvement is explained in the official documentation — it's recommended to review before use.
The fact that 1M context is openly free, no matter how long it lasts, has already raised the industry's bar. From now on, anyone trying to charge for 1M context will have to answer one question: if others offer it for free, why should you charge?
Signup link: https://platform.agnes-ai.com/
API documentation: https://agnes-ai.com/doc/agnes-20-flash
暂无评论。