An AI Image Generation Model Under 1GB That Does the Work of 8GB
A 7.75GB model compressed to 0.93GB, with only a 12% quality loss.
In return, it runs on your phone.
PrismML just released Bonsai Image 4B, accomplishing something many thought impossible: quantizing the transformer weights of the 7.75GB FLUX.2 Klein 4B image generation model down to just two values, -1 and +1.
You read that correctly: the transformer's weights have gone from floating-point numbers to bare signs, backed by a small set of FP16 scaling factors.
The compressed model generates a 512×512 image in 9.4 seconds on an iPhone 17 Pro Max and in 6 seconds on an M4 Pro Mac. It can even run in a browser: open the HuggingFace WebGPU demo, type a prompt, and it generates an image, with no registration or API key required.
It is open source under Apache 2.0 and free to use.
How Can It Still Generate Images with Only Two Weight Values?
First, understand what it does, then decide if it's a gimmick.
Typical diffusion models (like FLUX.2) store each weight in FP16: 16-bit floating point, 2 bytes per parameter. Bonsai Image skips retraining and applies extreme quantization directly to the existing model weights.
Specifically, transformer layer weights are mapped to {-1, +1} (1-bit version) or {-1, 0, +1} (ternary version), paired with a set of FP16 scaling factors to compensate for precision loss.
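To make the mapping concrete, here is a minimal sketch of sign and ternary quantization with one scale per group of weights. It is not PrismML's actual procedure, which the article does not spell out. The mean-absolute-value scale and the 0.7 threshold ratio are assumptions borrowed from common textbook schemes, and every function name is hypothetical.

```python
from statistics import fmean

def quantize_binary(group):
    """Map each weight to -1 or +1 and keep one scale for the whole group."""
    scale = fmean(abs(w) for w in group)  # a real model would store this as FP16
    signs = [1 if w >= 0 else -1 for w in group]
    return signs, scale

def quantize_ternary(group, threshold_ratio=0.7):
    """Map each weight to -1, 0, or +1; weights near zero snap to 0.

    threshold_ratio is an assumed hyperparameter, not a value from the source.
    """
    threshold = threshold_ratio * fmean(abs(w) for w in group)
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in group]
    kept = [abs(w) for w, c in zip(group, codes) if c != 0]
    scale = fmean(kept) if kept else 0.0
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate weights from codes and the group scale."""
    return [c * scale for c in codes]

def mean_abs_error(original, restored):
    return fmean(abs(a - b) for a, b in zip(original, restored))

# Toy group of eight weights (illustrative values only).
weights = [0.42, -0.07, 0.31, -0.55, 0.02, -0.38, 0.11, 0.60]

b_codes, b_scale = quantize_binary(weights)
t_codes, t_scale = quantize_ternary(weights)

print("binary :", b_codes, round(b_scale, 4))
print("ternary:", t_codes, round(t_scale, 4))
print("binary error :", mean_abs_error(weights, dequantize(b_codes, b_scale)))
print("ternary error:", mean_abs_error(weights, dequantize(t_codes, t_scale)))
```

On this toy group the ternary reconstruction error comes out lower than the binary one, because small weights are allowed to become zero instead of being forced to a full-magnitude sign.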
It's like turning a high-definition photo into a pixel art image – information is indeed lost, but the outline and essence remain.
Data comparison for the two versions:
| Version | Size | Compression Ratio | Quality Retention |
|---|---|---|---|
| 1-bit version | 0.93 GB | 8.3x | ~88% |
| Ternary version | 1.21 GB | 6.4x | ~95% |
| Original FLUX.2 | 7.75 GB | — | 100% |
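The ratios in the table follow directly from the file sizes, and the original size is consistent with a roughly 4B-parameter model stored at 2 bytes per weight. A quick check, assuming "GB" means 10^9 bytes (the source does not say which convention it uses); the helper names are made up for illustration:

```python
ORIGINAL_GB = 7.75  # FLUX.2 Klein 4B, as quoted in the table

def compression_ratio(compressed_gb, original_gb=ORIGINAL_GB):
    """How many times smaller the compressed file is than the original."""
    return original_gb / compressed_gb

def fp16_param_count(size_gb, bytes_per_param=2):
    """Rough parameter count implied by an FP16 file size (assumes 1 GB = 1e9 bytes)."""
    return size_gb * 1e9 / bytes_per_param

for name, size_gb in [("1-bit", 0.93), ("ternary", 1.21)]:
    print(f"{name}: {compression_ratio(size_gb):.1f}x smaller")

print(f"implied parameters: {fp16_param_count(ORIGINAL_GB) / 1e9:.3f} billion")
```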
Including the text encoder and VAE, running fully on Apple Silicon, the 1-bit version requires only 3.42GB of memory – the original FLUX.2 needs nearly 16GB.
On three professional benchmarks (GenEval for object composition, HPSv3 for human preference, DPG-Bench for prompt adherence), the ternary version retains 88%, 95%, and 99.8% of the original's performance respectively. Overall, that is roughly 95% quality retention.
The 1-bit version is equivalent to 1.125 bits/weight, the ternary version to 1.71 bits/weight. Adding one more state (zero) significantly boosts expressiveness.
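One accounting that reproduces both figures is the ideal code length per weight plus one FP16 scale shared by a group of 128 weights. The group size and the ideal log2(3) packing for ternary values are assumptions here. The source gives only the final numbers, so treat this as a plausible reading, not a description of the actual file format.

```python
import math

def effective_bits(states, scale_bits=16, group_size=128):
    """Bits per weight: ideal code length plus the amortized per-group scale.

    group_size=128 is an assumption chosen because it matches the quoted figures.
    """
    return math.log2(states) + scale_bits / group_size

print(round(effective_bits(2), 3))  # 1-bit version:   1.0 + 0.125
print(round(effective_bits(3), 2))  # ternary version: log2(3) + 0.125
```

Under these assumptions the extra zero state costs a little under 0.6 bits per weight, which is the price of the quality gain described above.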
In plain terms: unless you compare two images side by side, you basically won't notice the difference.
How to Use It? Simpler Than You Think
The Bonsai Image GitHub repository provides complete one-click scripts, supporting macOS, Linux, and Windows. Windows doesn't even require WSL2.
Setting up the environment:
macOS / Linux:
git clone https://github.com/PrismML-Eng/Bonsai-image-demo.git
cd Bonsai-image-demo
./setup.sh
Windows (PowerShell):
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned
.\setup.ps1
The setup script automatically pulls the model weights – macOS uses MLX format, Linux/Windows uses Gemlite format.
Downloading the model:
# Recommended: ternary version for better quality
./scripts/download_model.sh
# For ultimate compactness, choose 1-bit version
./scripts/download_model.sh binary
Running it:
Launch the Web Studio (FastAPI backend + Next.js frontend) with one command:
./scripts/serve.sh
# Open browser at localhost:3000
Or generate directly from the command line:
./scripts/generate.sh -p "A crystalline dragon perched on a snowy peak, cinematic lighting." --size 1024x1024 --seed 42
The default output size is 512×512, which suits quick previews; use 1024×1024 for final output. Both dimensions must be multiples of 32.
Windows users: NVIDIA drivers must be up to date, and the Visual C++ Redistributable (vcredist) must be installed. GPUs with less than 4GB of VRAM may run out of memory at 1024×1024; if that happens, drop back to 512×512.
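If you script generation, it helps to snap sizes to a valid multiple of 32 before calling the CLI. The sketch below only builds command lines, using the `-p`, `--size`, and `--seed` flags shown above. `snap_to_32` and `build_generate_cmd` are hypothetical helpers, not part of the repository, and rounding down is just one reasonable choice.

```python
import shlex

def snap_to_32(value, minimum=32):
    """Round a pixel dimension down to a multiple of 32 (never below `minimum`)."""
    return max(minimum, (int(value) // 32) * 32)

def build_generate_cmd(prompt, width, height, seed, script="./scripts/generate.sh"):
    """Build an argument list for generate.sh with a valid --size value."""
    size = f"{snap_to_32(width)}x{snap_to_32(height)}"
    return [script, "-p", prompt, "--size", size, "--seed", str(seed)]

prompt = "A crystalline dragon perched on a snowy peak, cinematic lighting."

# Sweep a few seeds at a size that is not a multiple of 32 (1000 becomes 992).
for seed in (42, 43, 44):
    cmd = build_generate_cmd(prompt, 1000, 1000, seed)
    print(shlex.join(cmd))
```

To actually run a command from the repository root, pass the list to `subprocess.run(cmd, check=True)`.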
Don't want to set it up? HuggingFace has a WebGPU browser version – open the page and it runs, all local inference. iOS users can also download the Bonsai Studio App directly.
The Real Significance
Running on phones, browsers, and older GPUs – that's certainly cool. But Bonsai Image 4B truly changes the economic model of image generation.
Before: Image generation = cloud API. Every call costs money, every iteration waits for latency, every output goes over the network. Batch generation? You'd need to rent an A100.
Now: Local inference, marginal cost is zero. Change a prompt without waiting in queue, switch a seed without counting costs, tweak a parameter without checking the bill.
Image generation is inherently iterative – you never generate just one image and call it done. You repeatedly adjust prompts, change seeds, compare results. Local inference transforms "change and wait" into "change and get it instantly", completely altering the creative rhythm.
PrismML's announcement puts it well:
"Cloud APIs will continue to be the right choice for many products. But cloud-only generation imposes certain product constraints: every prompt is a remote request, every iteration carries marginal serving cost, and every interaction adds round-trip latency."
In other words: cloud APIs have their place, but if every image means sending a request, waiting for a response, and paying, the creative loop is broken. Local inference lets you experiment freely, with no marginal cost for trial and error.
The "Poor Man's GPT Moment" for Image Generation
Many people consider 1-bit quantization a toy for academia – too much precision loss to be practically usable.
Bonsai Image 4B shows that good enough is enough.
It trades a 12% quality loss (88% retention) for 8.3x size compression. Going from "must have a high-VRAM GPU" to "runs on a phone or in a browser" is not an incremental change; it is a qualitative leap.
Think about the path language models took: GPT-4 full version is powerful, but what truly changed the world were the small models quantized to 4-bit or 2-bit, running on consumer hardware. Image generation is walking the same path.
Apache 2.0 open-source. 9.4 seconds, 1.5GB memory, on a phone. This isn't publishing a paper; it's shipping a product.
The barrier to image generation has been pushed down to the floor.
Reference links:
PrismML official announcement: https://prismml.com/news/bonsai-image-4b
WebGPU browser Demo: https://huggingface.co/spaces/webml-community/bonsai-image-webgpu
GitHub: https://github.com/PrismML-Eng/Bonsai-image-demo