Model Launch

RIN: Sharp. Fast. Precise.

Meet RIN (凛) — our free, unlimited reasoning model. Built on Mistral AI's Ministral 3B foundation and fine-tuned for precision. 350-400 tokens per second, 256K context window. The precision instrument for builders who value speed over hand-holding.

Base Model: Ministral 3B
Speed: 350-400 tok/s
Context: 256K tokens
Price: Free

Why “RIN”?

RIN (凛) in Japanese means dignified, cold, and severe. It commands respect without seeking approval. One syllable, impossible to mispronounce, easy to remember. Like “Grok” or “Claude” — it just sticks.

This isn’t a friendly chatbot name. It’s the psychology of the sharp knife in a chef’s hand, the tuned sports car engine, the perfectly balanced mechanical keyboard. Tools that respect the craftsman.

Built on Mistral AI's foundation. We didn't build Ministral 3B from scratch—Mistral AI did. What we did was take their exceptional 3B parameter model and fine-tune it for our users. Think of it as taking a high-performance engine and tuning it for your specific track.

Performance Metrics

Inference Speed: 350-400 tok/s (roughly 3.5x faster than GPT-4's typical serving throughput)

Context Window: 256K tokens (2x GPT-4 Turbo's 128K)

Model Size: 3B parameters (50x smaller than flagship models, with comparable quality on the benchmarks below)

Latency (TTFT): <100ms (near-instant first token)
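Both headline numbers here, time-to-first-token and sustained throughput, can be verified client-side from any streaming response. A minimal sketch; the token source below is a stand-in generator, not RIN's actual API:

```python
import time

def measure_stream(token_iter):
    """Consume a token stream, returning (ttft_seconds, tokens_per_second)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_iter:
        if first is None:
            first = time.perf_counter()  # timestamp of the first token (TTFT)
        count += 1
    elapsed = time.perf_counter() - start
    ttft = (first - start) if first is not None else float("inf")
    tps = count / elapsed if elapsed > 0 else 0.0
    return ttft, tps

# Stand-in stream: replace with the real streaming response iterator.
def fake_stream(n=100, delay=0.0001):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

ttft, tps = measure_stream(fake_stream())
```

Measuring from the client like this includes network latency, so numbers will land somewhat below the server-side figures quoted above.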

Benchmark Performance

RIN (Ministral 3B 2512) punches above its weight class, competing with models 10x its size across reasoning, comprehension, and multilingual tasks.

TriviaQA

Large-scale reading comprehension dataset with 650K+ question-answer-evidence triples

Kimi K2 Base: 75%
Gemma 2 27B: 73%
Mistral Small 3.1 24B: 68%
Ministral 3 (3B Base 2512): 55%
Granite 3.3 8B Base: 52%

AGIEval

Human-centric benchmark evaluating foundation models on standardized exams (GaoKao, SAT, LSAT, etc.)

Mistral Small 3 24B: 58%
Ministral 3 (14B Base): 56%
Ministral 3 (3B Base 2512): 54%
Gemma 2 27B: 52%
Granite 3.3 8B Base: 48%

MATH (CoT)

12,500 challenging competition mathematics problems requiring step-by-step reasoning

Ministral 3 (3B Base 2512): 58%
Llama 3.1 70B Instruct: 52%
Ministral 3 (14B Base): 48%
Ministral 3 (8B Base): 42%
Llama 3.1 8B Instruct: 35%

Multilingual MMLU

Comprehensive multilingual benchmark covering 29 diverse languages with 11,829 questions

o3-mini: 85%
Ministral 3 (3B Base 2512): 75%
Ministral 3 (14B Base): 72%
Ministral 3 (8B Base): 68%
Phi 4 Mini: 62%

Fine-Tuned Ministral

Built on Mistral AI's Ministral 3B (2512) foundation. We've fine-tuned it for precision, speed, and reasoning—delivering flagship-class performance in a compact 3B parameter model.

350-400 Tokens/Second

Blazing fast inference. No waiting. No buffering. No thinking delays. RIN responds at the speed of thought, keeping you in flow state instead of watching a spinner.

Unlimited & Free

Rate-limited per hour, but unlimited overall. No credits, no quotas, no upgrade prompts. RIN is our gift to builders who need a sharp, fast tool without the billing anxiety.
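"Rate-limited per hour, but unlimited overall" means a client should expect occasional rejections near the hourly cap and simply retry after a pause rather than give up. A minimal backoff sketch; the `RateLimited` exception and the wrapped call are hypothetical stand-ins for whatever client library you use:

```python
import time

class RateLimited(Exception):
    """Hypothetical error raised when the hourly limit is hit."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff when rate-limited."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RateLimited(f"gave up after {max_retries} retries")

# Example: a stand-in call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"

result = with_backoff(flaky_call, base_delay=0.001)
```

Exponential backoff keeps retries cheap for brief throttles while avoiding a tight retry loop when the hourly window has genuinely run out.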

The RIN Philosophy

Not a chatbot

RIN isn't trying to be your friend. It's a precision instrument. No warm greetings, no apologies, no disclaimers. Just fast, accurate responses.

The sharp knife

Like a chef's blade or a tuned engine, RIN is about surgical precision. It cuts through complexity without the bloat of models 10x its size.

Flow state enabler

Every delay is a micro-frustration. At 350-400 tok/s, RIN keeps up with your thoughts. You're driving, not waiting.

Psychological Positioning

Every AI has a personality. RIN occupies the “performance tool” quadrant — not a chatbot, a system.

Model | Psychology | Feeling
ChatGPT | The helpful assistant | Safe, friendly
Claude | The thoughtful colleague | Warm, ethical
Gemini | The Google ecosystem | Integrated, familiar
RIN | The precision instrument | Sharp, fast, uncompromising

When to Use RIN

Use case

Quick code reviews

Paste a function, get instant feedback. RIN’s speed means you can iterate on code without breaking your flow.

Use case

Brainstorming sessions

Rapid-fire idea generation. Ask, get, refine, repeat. No waiting means more iterations per minute.

Use case

Learning & exploration

Ask questions freely without worrying about credits. Perfect for students, hobbyists, and curious minds.

Use case

Draft generation

First drafts, outlines, summaries. RIN’s thinking capability produces structured, reasoned output fast.

Under the Hood

Base Model: mistralai/ministral-3b-2512
Architecture: Dense transformer, fine-tuned (not MoE)
Context Window: 256,000 tokens
Inference Speed: 350-400 tokens/second
Parameters: 3 billion (all active)
Capabilities: Reasoning and thinking (no web browsing or file tools)
Training: Fine-tuned on curated datasets
Rate Limit: Per-hour throttling, unlimited overall
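For a concrete sense of how these specs surface at the API level, here is a sketch of a chat-completions-style request body. This assumes RIN sits behind an OpenAI-compatible endpoint; the `"rin"` model id and the field names are illustrative assumptions, not documented values:

```python
import json

def build_rin_request(messages, stream=True, max_tokens=1024):
    """Build a chat-completions-style payload for RIN.

    The model id is a hypothetical placeholder; the actual id
    exposed by the service may differ.
    """
    return {
        "model": "rin",            # hypothetical model id
        "messages": messages,
        "stream": stream,          # stream tokens to benefit from <100ms TTFT
        "max_tokens": max_tokens,  # well under the 256K-token context window
    }

payload = build_rin_request(
    [{"role": "user", "content": "Review this function for bugs: ..."}]
)
body = json.dumps(payload)
```

Streaming is on by default in this sketch because the sub-100ms first token is the point: a non-streaming call would hide it behind the full generation time.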

The RIN Promise

RIN doesn’t waste your time, energy, or tokens. It’s the AI for people who don’t want to think about the AI. Sharp. Fast. Precise. That’s it.
