Meet RIN (凛) — our free, unlimited reasoning model. Built on Mistral AI's Ministral 3B foundation and fine-tuned for precision. 350-400 tokens per second, 256K context window. The precision instrument for builders who value speed over hand-holding.
RIN (凛) in Japanese means dignified, cold, and severe. It commands respect without seeking approval. One syllable, impossible to mispronounce, easy to remember. Like “Grok” or “Claude” — it just sticks.
This isn’t a friendly chatbot name. It’s the psychology of the sharp knife in a chef’s hand, the tuned sports car engine, the perfectly balanced mechanical keyboard. Tools that respect the craftsman.
Built on Mistral AI's foundation. We didn't build Ministral 3B from scratch—Mistral AI did. What we did was take their exceptional 3B parameter model and fine-tune it for our users. Think of it as taking a high-performance engine and tuning it for your specific track.
| Metric | Value | Comparison |
|---|---|---|
| Inference Speed | 350-400 tok/s | 3.5x faster than GPT-4 |
| Context Window | 256K tokens | 2x larger than GPT-4 Turbo |
| Model Size | 3B parameters | 50x smaller, same quality |
| Latency (TTFT) | <100ms | Near-instant responses |
RIN (Ministral 3B 2512) punches above its weight class, competing with models 10x its size across reasoning, comprehension, and multilingual tasks.
Large-scale reading comprehension dataset with 650K+ question-answer-evidence triples
Human-centric benchmark evaluating foundation models on standardized exams (GaoKao, SAT, LSAT, etc.)
12,500 challenging competition mathematics problems requiring step-by-step reasoning
Comprehensive multilingual benchmark covering 29 diverse languages with 11,829 questions
Built on Mistral AI's Ministral 3B (2512) foundation. We've fine-tuned it for precision, speed, and reasoning—delivering flagship-class performance in a compact 3B parameter model.
Blazing fast inference. No waiting. No buffering. No thinking delays. RIN responds at the speed of thought, keeping you in flow state instead of watching a spinner.
Rate-limited per hour, but unlimited overall. No credits, no quotas, no upgrade prompts. RIN is our gift to builders who need a sharp, fast tool without the billing anxiety.
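Per-hour throttling means well-behaved clients should back off when the limit is hit. Here's a minimal retry sketch with exponential backoff and jitter; the `RateLimitError` type and the retry parameters are hypothetical, since this post doesn't specify LeemerChat's actual error semantics:

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error the API raises on throttling."""

def with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, wait exponentially longer (with jitter)
    and retry, up to max_retries attempts."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus up to 1s of jitter to avoid thundering herds
            sleep(base_delay * 2 ** attempt + random.random())
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` applies the real delay.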
RIN isn't trying to be your friend. It's a precision instrument. No warm greetings, no apologies, no disclaimers. Just fast, accurate responses.
Like a chef's blade or a tuned engine, RIN is about surgical precision. It cuts through complexity without the bloat of models 10x its size.
Every delay is a micro-frustration. At 350-400 tok/s, RIN keeps up with your thoughts. You're driving, not waiting.
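To make the throughput claim concrete, here's a back-of-the-envelope estimate (illustrative arithmetic only, not a published latency model):

```python
def generation_time(tokens: int, tok_per_s: float, ttft_s: float = 0.1) -> float:
    """Estimated wall-clock seconds to stream `tokens` of output:
    time-to-first-token plus decode time at a steady rate."""
    return ttft_s + tokens / tok_per_s

# A 500-token answer at the quoted 350-400 tok/s and <100ms TTFT:
slow = generation_time(500, 350)  # ~1.53 s
fast = generation_time(500, 400)  # ~1.35 s
```

At those rates a full-length answer streams in well under two seconds, which is the practical meaning of "keeping up with your thoughts."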
Every AI has a personality. RIN occupies the “performance tool” quadrant — not a chatbot, a system.
| Model | Psychology | Feeling |
|---|---|---|
| ChatGPT | The helpful assistant | Safe, friendly |
| Claude | The thoughtful colleague | Warm, ethical |
| Gemini | The Google ecosystem | Integrated, familiar |
| RIN | The precision instrument | Sharp, fast, uncompromising |
Use cases:

- **Code review:** Paste a function, get instant feedback. RIN's speed means you can iterate on code without breaking your flow.
- **Brainstorming:** Rapid-fire idea generation. Ask, get, refine, repeat. No waiting means more iterations per minute.
- **Learning:** Ask questions freely without worrying about credits. Perfect for students, hobbyists, and curious minds.
- **Writing:** First drafts, outlines, summaries. RIN's thinking capability produces structured, reasoned output fast.
| Spec | Value |
|---|---|
| Base Model | mistralai/ministral-3b-2512 |
| Architecture | Dense fine-tune (not MoE) |
| Context Window | 256,000 tokens |
| Inference Speed | 350-400 tokens/second |
| Parameters | 3 billion (all active) |
| Capabilities | Reasoning, Thinking (no web/files) |
| Training | Fine-tuned on curated datasets |
| Rate Limit | Per-hour throttling, unlimited overall |
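This post doesn't include an official API reference, so purely as a sketch: assuming an OpenAI-compatible chat-completions endpoint, a request might look like the following. The URL, the model id `"rin"`, and the auth header are all placeholders, not documented values:

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder URL

payload = {
    "model": "rin",      # placeholder model id
    "max_tokens": 512,   # well under the 256K context window
    "messages": [
        {"role": "user", "content": "Review this function for bugs: ..."},
    ],
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder
    },
)
# with urllib.request.urlopen(req) as resp:   # uncomment with a real endpoint
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The request is built but not sent; swap in the real endpoint and key before uncommenting the call.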
RIN doesn’t waste your time, energy, or tokens. It’s the AI for people who don’t want to think about the AI. Sharp. Fast. Precise. That’s it.