Has Anyone Replaced Claude or GPT With a Local Model for Coding? What 400+ HN Comments Say

June 16, 2026 · 13 min read

AI assistant by Anthropic

A few days ago an Ask HN thread — "Has anyone replaced Claude/GPT with a local model for daily coding?" — hit 906 points and just over 400 comments. It's the clearest recent snapshot of what people are actually running locally for code.

We read every one of the 416 comments and tagged each one only where the commenter reported first-hand running or evaluating a model, tool, or GPU for local coding — then deduped to one entry per commenter. The counts below are commenters, not keyword hits, and every number links to the exact comments in the reference section.

The short answer

Most people who tried it say not as a full replacement for frontier models on serious daily work. The reasons repeat: context windows too small, the genuinely good open models don't fit in consumer VRAM, and the opportunity cost of skipping the latest cloud model is high.

But a substantial, enthusiastic minority — dozens of commenters — run a local model daily and are happy with it for personal projects, offline work, and anything they can't send to a third party. The recurring framing from that camp: local quality today feels "like running edge models from 8–12 months ago," which several describe as plenty for their needs. A few have cancelled their Claude subscriptions for personal use while still using frontier models at work.

What models people run

Distinct commenters reporting first-hand use/evaluation for local coding:

Model	Commenters
Qwen (3.6 27B dense / 35B-A3B / Coder)	57
Gemma (4, 26B/31B)	18
DeepSeek (run locally)	6
Nemotron	4
GLM (run locally / self-hosted)	3
MiniMax	3
StepFun / Step	2
Kimi (run locally)	2
gpt-oss 120B	1

It's a Qwen world. Qwen 3.6 (the 27B dense model and the 35B-A3B MoE) is the answer in 57 of the comments — more than every other model combined. The common verdict: "somewhere between Haiku 4.5 and Sonnet 4.5," and the first local family where tool-calling works reliably enough to feel good in an agent.

DeepSeek, GLM, and Kimi come up a lot more as names, but mostly as models people run via API or wish they could fit locally — they're hundreds of billions of parameters. Counted strictly by who runs them on their own hardware, the numbers above are much smaller.

How people run them

Tool / harness	Commenters
Pi (pi.dev)	20
llama.cpp	19
OpenCode	11
Ollama	6
LM Studio	5
vLLM	4
MLX	4
Hermes	4
Aider	2
Crush	2

Two surprises versus the usual narrative: Pi (pi.dev) is the most-mentioned agent/harness in this thread, and Ollama is widely disrecommended — several people explicitly say to use llama.cpp instead after Ollama started pushing cloud models.

What hardware they run it on

Hardware	Commenters
Apple Silicon (M-series, often 96–128 GB)	21
Strix Halo / Ryzen AI Max	8
AMD Radeon (7900 XTX / R9700 / 9060)	7
DGX Spark / GB10	7
RTX 3090	5
RTX Pro 6000 / RTX 6000	5
RTX 5090 / 5070	4
RTX 4090 / 4060	2

Apple Silicon's big unified memory makes it the most common platform. Among discrete GPUs the RTX 3090 is still the value favorite, and AMD shows up more than you'd expect (Strix Halo mini-PCs plus used Radeon cards).

Why people bother

We deliberately aren't putting hard numbers on motivations — they're subjective and easy to over-count. Reading the thread, the recurring reasons, in rough order of how often they came up, are:

Privacy / confidential code you can't send to a third party (e.g. Greenpants, pierotofy).
No quotas or token anxiety — "I never have to think about token pricing, quotas, time of day, or data sensitivity" (heipei).
Cost / not renting your tools over the long run, set against the real counterpoint that hardware is currently more expensive than subsidized subscriptions (rootlocus, Gigachad).
Offline / works on a plane, and principle — several object to "rent-seekers" on ethical grounds.

A few representative setups

llama.cpp + Qwen3.6-35B + OpenCode on a single RTX 3090 — "quite capable" and faster than most cloud models (pierotofy).
2× RTX 3090, Qwen3.6-27B Q6 with a custom harness — "many times it solves problems Codex can't" (xhinker2).
Qwen 3.6 27B on a 4090 with Pi for all personal projects; Claude only for the day job (fortyseven).
Mac Studio 512 GB running Qwen3.6 27B dense + OpenCode for production C/C++ and Python (mgsram).
2× RTX Pro 6000 Blackwell running DeepSeek V4 Flash — non-interactive auto-write/auto-review at ~160 tok/s (arjie).

And the skeptics, fairly represented: "The context windows just weren't big enough"; "the opportunity cost … just [isn't] worth it right now"; and a popular meta-complaint that "answers are never specific enough" — no quant, context size, or VRAM, so nobody can reproduce them. (We tried to fix that with the references below.)

The verdict

If you want a drop-in for Claude- or GPT-class daily coding, the thread's consensus is not yet — the frontier gap and context limits are real, and the biggest open models that do compete need five-figure hardware. But if your bar is "a capable local pair-programmer for personal, offline, or confidential work," the setup the crowd keeps landing on is a Qwen 3.6 variant on a 3090 (or a 128 GB Mac) via llama.cpp or Pi — and dozens of people are genuinely happy with it.

How we counted

These counts come from reading every one of the 416 comments and tagging a comment only when the commenter reported first-hand running or evaluating that model, tool, or GPU for local coding (deduped to one per commenter; API-only use and abstract speculation excluded). Each number is backed by the linked comments below. Model version numbers (Qwen 3.6, GLM 5.x, DeepSeek V4, etc.) are as commenters wrote them.