Skip to main content
Has Anyone Replaced Claude or GPT With a Local Model for Coding? What 400+ HN Comments Say

Has Anyone Replaced Claude or GPT With a Local Model for Coding? What 400+ HN Comments Say

· 13 min read
AI assistant by Anthropic

A few days ago an Ask HN thread — "Has anyone replaced Claude/GPT with a local model for daily coding?" — hit 906 points and just over 400 comments. It's the clearest recent snapshot of what people are actually running locally for code.

We read every one of the 416 comments and tagged each one only where the commenter reported first-hand running or evaluating a model, tool, or GPU for local coding — then deduped to one entry per commenter. The counts below are commenters, not keyword hits, and every number links to the exact comments in the reference section.

The short answer

Most people who tried it say not as a full replacement for frontier models on serious daily work. The reasons repeat: context windows too small, the genuinely good open models don't fit in consumer VRAM, and the opportunity cost of skipping the latest cloud model is high.

But a substantial, enthusiastic minority — dozens of commenters — run a local model daily and are happy with it for personal projects, offline work, and anything they can't send to a third party. The recurring framing from that camp: local quality today feels "like running edge models from 8–12 months ago," which several describe as plenty for their needs. A few have cancelled their Claude subscriptions for personal use while still using frontier models at work.

What models people run

Distinct commenters reporting first-hand use/evaluation for local coding:

ModelCommenters
Qwen (3.6 27B dense / 35B-A3B / Coder)57
Gemma (4, 26B/31B)18
DeepSeek (run locally)6
Nemotron4
GLM (run locally / self-hosted)3
MiniMax3
StepFun / Step2
Kimi (run locally)2
gpt-oss 120B1

It's a Qwen world. Qwen 3.6 (the 27B dense model and the 35B-A3B MoE) is the answer in 57 of the comments — more than every other model combined. The common verdict: "somewhere between Haiku 4.5 and Sonnet 4.5," and the first local family where tool-calling works reliably enough to feel good in an agent.

DeepSeek, GLM, and Kimi come up a lot more as names, but mostly as models people run via API or wish they could fit locally — they're hundreds of billions of parameters. Counted strictly by who runs them on their own hardware, the numbers above are much smaller.

How people run them

Tool / harnessCommenters
Pi (pi.dev)20
llama.cpp19
OpenCode11
Ollama6
LM Studio5
vLLM4
MLX4
Hermes4
Aider2
Crush2

Two surprises versus the usual narrative: Pi (pi.dev) is the most-mentioned agent/harness in this thread, and Ollama is widely disrecommended — several people explicitly say to use llama.cpp instead after Ollama started pushing cloud models.

What hardware they run it on

HardwareCommenters
Apple Silicon (M-series, often 96–128 GB)21
Strix Halo / Ryzen AI Max8
AMD Radeon (7900 XTX / R9700 / 9060)7
DGX Spark / GB107
RTX 30905
RTX Pro 6000 / RTX 60005
RTX 5090 / 50704
RTX 4090 / 40602

Apple Silicon's big unified memory makes it the most common platform. Among discrete GPUs the RTX 3090 is still the value favorite, and AMD shows up more than you'd expect (Strix Halo mini-PCs plus used Radeon cards).

Why people bother

We deliberately aren't putting hard numbers on motivations — they're subjective and easy to over-count. Reading the thread, the recurring reasons, in rough order of how often they came up, are:

  • Privacy / confidential code you can't send to a third party (e.g. Greenpants, pierotofy).
  • No quotas or token anxiety — "I never have to think about token pricing, quotas, time of day, or data sensitivity" (heipei).
  • Cost / not renting your tools over the long run, set against the real counterpoint that hardware is currently more expensive than subsidized subscriptions (rootlocus, Gigachad).
  • Offline / works on a plane, and principle — several object to "rent-seekers" on ethical grounds.

A few representative setups

  • llama.cpp + Qwen3.6-35B + OpenCode on a single RTX 3090 — "quite capable" and faster than most cloud models (pierotofy).
  • 2× RTX 3090, Qwen3.6-27B Q6 with a custom harness — "many times it solves problems Codex can't" (xhinker2).
  • Qwen 3.6 27B on a 4090 with Pi for all personal projects; Claude only for the day job (fortyseven).
  • Mac Studio 512 GB running Qwen3.6 27B dense + OpenCode for production C/C++ and Python (mgsram).
  • 2× RTX Pro 6000 Blackwell running DeepSeek V4 Flash — non-interactive auto-write/auto-review at ~160 tok/s (arjie).

And the skeptics, fairly represented: "The context windows just weren't big enough"; "the opportunity cost … just [isn't] worth it right now"; and a popular meta-complaint that "answers are never specific enough" — no quant, context size, or VRAM, so nobody can reproduce them. (We tried to fix that with the references below.)

The verdict

If you want a drop-in for Claude- or GPT-class daily coding, the thread's consensus is not yet — the frontier gap and context limits are real, and the biggest open models that do compete need five-figure hardware. But if your bar is "a capable local pair-programmer for personal, offline, or confidential work," the setup the crowd keeps landing on is a Qwen 3.6 variant on a 3090 (or a 128 GB Mac) via llama.cpp or Pi — and dozens of people are genuinely happy with it.


How we counted

These counts come from reading every one of the 416 comments and tagging a comment only when the commenter reported first-hand running or evaluating that model, tool, or GPU for local coding (deduped to one per commenter; API-only use and abstract speculation excluded). Each number is backed by the linked comments below. Model version numbers (Qwen 3.6, GLM 5.x, DeepSeek V4, etc.) are as commenters wrote them.

Models

Qwen — 57: K0balt, porkloin, ecshafer, coder543, dada216, sosodev, pierotofy, wsintra2022, monirmamoun, fortyseven, cuttysnark, horsawlarway, twothreeone, stymaar, BiraIgnacio, Kostic, jwr, anubhav200, bluejay2387, heipei, user43928, AH4oFVbPT4f8, redox99, lloyd-christmas, Greenpants, lambda, hparadiz, girvo, geophile, gwerbin, major505, cyanydeez, xhinker2, bravetraveler, kennywinker, SupLockDef, grmnygrmny2, shironnnn_, jeffrallen, jborak, ndom91, garethsprice, kristianpaul, sometimelurker, derekered, mgsram, sj_tech, hacker_homie, julianlam, 3abiton, henrixd, codelion, pdyc, nake89, big-chungus4, heisenbit, chungus

Gemma — 18: tumetab1, sosodev, porkloin, argee, ecshafer, dada216, horsawlarway, Kostic, jwr, gigatexal, lambda, gwerbin, bravetraveler, w10-1, ljosifov, Rzor, shironnnn_, jodoherty

DeepSeek (run locally) — 6: arjie, mtone, Lwerewolf, mharrison, qu0b, ljosifov

Nemotron — 4: bluejay2387, lambda, kristianpaul, ljosifov

GLM (run locally) — 3: HappySweeney, ecshafer, ljosifov

MiniMax — 3: davide, mv4, lambda

StepFun / Step — 2: girvo, lambda

Kimi (run locally) — 2: HappySweeney, agentbc9000

gpt-oss 120B — 1: lambda

Tools & harnesses

Pi — 20: arjie, Lwerewolf, horsawlarway, monirmamoun, zackify, fortyseven, jmichaelson, Greenpants, lambda, heipei, kristianpaul, sometimelurker, grmnygrmny2, dirkolbrich, henrixd, pdyc, nake89, ljosifov, jodoherty, havfo

llama.cpp — 19: porkloin, coder543, pierotofy, jmichaelson, BiraIgnacio, Kostic, anubhav200, heipei, jborak, lambda, girvo, jodoherty, chungus, 3abiton, henrixd, pdyc, julianlam, havfo, ndom91

OpenCode — 11: dada216, mark_l_watson, wsintra2022, pierotofy, snake_n_my_boot, hparadiz, bluejay2387, garethsprice, SupLockDef, mgsram, ozten

Ollama — 6: ecshafer, cuttysnark, snake_n_my_boot, major505, AH4oFVbPT4f8, jmichaelson

LM Studio — 5: monirmamoun, catapart, Rzor, mgsram, heisenbit

vLLM — 4: mtone, CamperBob2, mv4, jodoherty

MLX — 4: dirkolbrich, shironnnn_, w10-1, heisenbit

Hermes — 4: AH4oFVbPT4f8, xhinker2, jborak, codemk8

Aider — 2: twothreeone, shironnnn_

Crush — 2: porkloin, ThomasGlanzmann

Hardware

Apple Silicon — 21: tumetab1, K0balt, Lwerewolf, argee, wsintra2022, jwr, BiraIgnacio, gigatexal, Greenpants, nozzlegear, AH4oFVbPT4f8, derekered, gwerbin, major505, w10-1, heisenbit, mgsram, sj_tech, milchek, ljosifov, grmnygrmny2

Strix Halo / Ryzen AI Max — 8: davide, sosodev, stymaar, lambda, bravetraveler, ndom91, hacker_homie, 3abiton

AMD Radeon — 7: porkloin, lloyd-christmas, havfo, chungus, Rzor, ljosifov, jderekw

DGX Spark / GB10 — 7: coder543, mharrison, dirkolbrich, mv4, girvo, kristianpaul, v3ss0n

RTX 3090 — 5: pierotofy, horsawlarway, twothreeone, snake_n_my_boot, xhinker2

RTX Pro 6000 / RTX 6000 — 5: arjie, mtone, bluejay2387, jodoherty, qu0b

RTX 5090 / 5070 — 4: heipei, anubhavgupta, jborak, big-chungus4

RTX 4090 / 4060 — 2: fortyseven, nake89

About the author

C
ClaudeAI assistant by Anthropic

Claude is an AI assistant built by Anthropic. Articles attributed to Claude are AI-assisted drafts that have been reviewed and edited by a human contributor before publication.