
Coding Agent Harness Comparison 2026: Claude Code, Codex, Amp, OpenCode, Gemini CLI, Pi, Command Code, Factory, and Aider
In 2023, there was one serious terminal coding agent: Aider. By May 2026, there are at least nine, representing every possible combination of philosophy, funding, and model strategy. Some are backed by the largest AI labs in the world. Some were built by a single developer who got frustrated with the others. Some will charge you $200 a month; some are free software you can read line by line.
We compared Claude Code, OpenAI Codex CLI, Amp, OpenCode, Gemini CLI, Pi, Command Code, Factory, and Aider across open source status, model flexibility, supported models, release history, funding, and real-world GitHub issue patterns. Here's what actually differentiates them.
What is a coding agent harness?
A coding agent harness is a wrapper around an LLM that gives it tools to read and write files, run shell commands, and iterate on code autonomously. You describe what you want; the agent plans, edits, runs tests, fixes errors, and commits — without you approving every line.
The "harness" is everything except the model: the agent loop, the tool definitions, the context management, the system prompt, the interface. The harness can be a simple 200-token system prompt with four tools (Pi's philosophy) or a 10,000-token orchestration layer with sub-agents, hooks, MCP plugins, and cloud scheduling (Claude Code's current state).
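The loop at the heart of every one of these harnesses is small. Here is a minimal sketch with the model call stubbed out; the tool names mirror Pi's minimal set (the `edit` tool is omitted for brevity), and none of this is any specific harness's real code:

```python
import subprocess

def run_tool(name, args):
    """Execute one tool call. Tool names are illustrative, not any harness's API."""
    if name == "read":
        return open(args["path"]).read()
    if name == "write":
        open(args["path"], "w").write(args["content"])
        return "ok"
    if name == "bash":
        return subprocess.run(args["cmd"], shell=True,
                              capture_output=True, text=True).stdout
    return f"unknown tool {name}"

def agent_loop(task, call_model):
    """Send the task to the model; execute tool calls until it declares done."""
    history = [{"role": "user", "content": task}]
    while True:
        reply = call_model(history)  # {"tool": ..., "args": ...} or {"done": text}
        if "done" in reply:
            return reply["done"]
        result = run_tool(reply["tool"], reply["args"])
        history.append({"role": "tool", "content": result})

# Stub model: writes a file on the first turn, then declares success.
def stub_model(history):
    if len(history) == 1:
        return {"tool": "write",
                "args": {"path": "/tmp/hello.txt", "content": "hi"}}
    return {"done": "wrote /tmp/hello.txt"}

print(agent_loop("create hello.txt", stub_model))  # → wrote /tmp/hello.txt
```

Everything the tools below differentiate on (sub-agents, hooks, session trees, routing) is layered on top of a loop shaped roughly like this one.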
The interesting question in 2026 is whether the harness matters at all. Amp's public thesis is that frontier models are now good enough that "nearly any agent can get good results" — the wrapper is no longer the limiting factor. If that's true, the interesting differences are pricing, model flexibility, and organizational features rather than prompt engineering.
The nine tools at a glance
| Tool | Company | Open source | Released | Funding / backer |
|---|---|---|---|---|
| Claude Code | Anthropic | No | Feb 2025 | $72B+ raised, $380B valuation |
| OpenAI Codex CLI | OpenAI | Yes (Apache 2.0) | Apr 2025 | $180B+ raised, $852B valuation |
| Amp | Amp Inc. (ex-Sourcegraph) | No | May 2025 | Sourcegraph raised $223M; Amp independent funding undisclosed |
| OpenCode | Anomaly (ex-SST) | Yes (MIT) | Jun 2025 | Undisclosed round |
| Gemini CLI | Google | Yes (Apache 2.0) | Jun 2025 | Google (no separate round) |
| Pi | Earendil Inc. | Yes (MIT) | Nov 2025 | None (bootstrapped) |
| Command Code | CommandCodeAI (ex-Langbase) | No | May 2026 | $5M seed |
| Factory | Factory AI | No | May 2025 | $200M+ raised, $1.5B valuation |
| Aider | Independent (Paul Gauthier) | Yes (Apache 2.0) | Jun 2023 | None |
Open source tools
The open source tools range from the original (Aider, 2023) to the largest by GitHub stars (OpenCode, 156K) to the most surprising lab entrant (Codex CLI, Apache 2.0). They share a core property: the harness code is readable and forkable. What varies is whether the models behind them are locked.
| Tool | License | Stars | Open issues | Open PRs | Model flexibility | Entry cost |
|---|---|---|---|---|---|---|
| OpenCode | MIT | 156,000 | 4,700 | 200+ | Full — 75+ providers, BYOK | Free (pay provider) |
| Gemini CLI | Apache 2.0 | 103,000 | 2,100 | 402 | None — Gemini only | Free (1,000 req/day) |
| Codex CLI | Apache 2.0 | 80,600 | 3,500 | 356 | Partial — OpenAI default; Ollama officially supported | $20/mo (ChatGPT Plus) |
| Pi | MIT | 45,700 | 13 | — | Full — 15+ providers, 324 models | Free (pay provider) |
| Aider | Apache 2.0 | 44,500 | — | — | Full — 100+ via LiteLLM | Free (pay provider) |
Note: Claude Code's GitHub repo has 121,000 stars but contains only documentation and issue tracking, not source code.
Aider
Released: June 2023 | Creator: Paul Gauthier | Funding: None
Aider is the original. It appeared in June 2023 when the idea of a CLI coding agent was still experimental, and it established the git-first philosophy that most later tools adopted: every AI edit becomes a proper git commit with a descriptive message. You can review, revert, or squash the AI's work the same way you'd handle a junior engineer's PR.
Six million pip installs and fifteen billion tokens per week in 2026 suggest it is not a historical curiosity. It remains widely used because it makes one architectural bet that nothing else fully copies: the LLM should operate as a commit-level collaborator, not a file-level autocomplete.
Model support runs to 100+ providers via LiteLLM. Aider maintains its own benchmark (the Aider Polyglot Benchmark, 225 Exercism exercises across six languages) to rank LLMs on editing ability — a genuinely useful resource that has outlasted the tool's early benchmark leadership. Current SWE-bench scores for agent systems with the best models are in the 60–80% range; Aider's earlier ~26% on SWE-bench Lite reflected what was possible in 2023.
The main criticism is that it's a pair programmer, not an autonomous agent. It waits for input rather than completing multi-step tasks independently. If you want to hand off a ticket and come back to a PR, Aider is not the tool.
One sentence: The OG git-first pair programming agent, model-agnostic, 100+ providers, six million installs.
Notable GitHub activity: Aider hosts the Aider Polyglot Leaderboard — a respected external benchmark used to compare LLMs on code editing. Active maintenance, well-organized issues.
Pi
Released: November 2025 | Creator: Mario Zechner (also made libGDX) | Funding: None
Pi began as a personal frustration project. Mario Zechner — who had previously built the libGDX game framework with 23,000 GitHub stars — found that Claude Code was becoming increasingly unpredictable as features accumulated. He wanted a coding agent he could fully understand. The original repository was named shittycodingagent before being rebranded.
The core design constraint is transparency: Pi ships with exactly four tools (read, write, edit, bash) and a system prompt under 1,000 tokens. By comparison, Claude Code's system prompt is over 10,000 tokens and injects context the user cannot fully inspect. Pi's sessions are stored as JSONL trees that support branching — you can fork a session mid-task and try a different approach. Every aspect of the model interaction is visible.
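Branching session trees fall out naturally from an append-only JSONL log where each entry points at its parent. The field names below are illustrative, not Pi's actual schema:

```python
import json

# Append-only JSONL session log. Each entry references its parent, so the
# flat log is really a tree, and a session can be forked from any entry.
# Field names are illustrative, not Pi's actual on-disk schema.
log = [
    {"id": 1, "parent": None, "role": "user", "text": "add a /health endpoint"},
    {"id": 2, "parent": 1, "role": "assistant", "text": "edit server.py ..."},
    {"id": 3, "parent": 2, "role": "user", "text": "use FastAPI instead"},  # branch A
    {"id": 4, "parent": 2, "role": "user", "text": "keep stdlib only"},     # branch B
]
jsonl = "\n".join(json.dumps(e) for e in log)

def branch(entries, leaf_id):
    """Walk parent pointers from a leaf to reconstruct one conversation path."""
    by_id = {e["id"]: e for e in entries}
    path = []
    while leaf_id is not None:
        path.append(by_id[leaf_id])
        leaf_id = path[-1]["parent"]
    return list(reversed(path))

entries = [json.loads(line) for line in jsonl.splitlines()]
print([e["id"] for e in branch(entries, 4)])  # → [1, 2, 4]
```

Forking mid-task is just appending a new entry whose parent is an earlier entry; both branches share history up to the fork point.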
At 45,700 stars and 3.17 million monthly npm downloads with only 13 open issues, Pi's GitHub health is exceptional. The low issue count despite high stars reflects either aggressive triage or genuinely fewer surface-area bugs — likely both.
Pi supports 15+ providers and 324 models. The oh-my-pi community fork adds LSP for 40 languages, Python execution, browser automation, and subagents while keeping the same minimal core.
One sentence: A minimal, fully transparent terminal agent — four tools, tiny system prompt, MIT licensed, 324 models, no SaaS backend.
Notable GitHub issues: Most open issues are TUI/terminal compatibility (Kitty, tmux, Zsh, Windows). Fast resolution. #4142: image paste can hard-abort Pi on macOS.
OpenCode
Released: June 19, 2025 | Company: Anomaly (founders: Jay V, Frank Wang, ex-SST/YC 2021) | Funding: Undisclosed
OpenCode is the fastest-growing open source coding agent by stars: 156,000 as of May 2026, compared to Claude Code's docs-only tracker repo at 121,000. It reached 6.5 million monthly developers in under a year.
The founders previously built Serverless Stack (SST) and Terminal.shop — a coffee subscription service run entirely in the terminal that generated $100,000 in first-year sales. This terminal-first product DNA is evident in OpenCode's TUI, which was built by Neovim users for Neovim users.
The philosophical stance is explicit: "Not an AI product — a product designed to use AI." OpenCode connects to 75+ providers and stores no code on its servers in BYOK mode. For teams wanting managed models, OpenCode Zen offers 40+ curated models with transparent per-token pricing and zero retention.
The client/server architecture is a technical differentiator: OpenCode runs as a server, clients connect to it. This enables Docker-sandboxed execution and remote sessions from a phone or browser — things that Claude Code's terminal-only architecture cannot support without additional tooling.
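The split is easy to picture: the agent process owns session state and exposes it over HTTP; any client (TUI, browser, phone) just sends requests. A toy version, with an endpoint shape that is illustrative rather than OpenCode's actual API:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# The server owns session state; clients only speak HTTP to it.
sessions = {"s1": {"messages": []}}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        sessions[body["session"]]["messages"].append(body["text"])
        out = json.dumps(
            {"count": len(sessions[body["session"]]["messages"])}).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any client, anywhere, can now drive the session.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/",
    data=json.dumps({"session": "s1", "text": "fix the failing test"}).encode())
reply = json.loads(urllib.request.urlopen(req).read())
print(reply)  # → {'count': 1}
server.shutdown()
```

Because the server is just a process listening on a socket, running it inside a Docker sandbox or reaching it from a phone browser requires no changes to the agent itself — which is the architectural point.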
One sentence: The largest open source coding agent by stars, 75+ providers, client/server architecture for Docker and remote execution, MIT licensed.
Notable GitHub issues: #3176 (open): git add . runs on arbitrarily large directories for session snapshots, causing severe slowdowns on large repos (54K files, 45GB). #5690: Claude model hallucinates GitHub token errors inside CI workflows that actually have the correct permissions.
Gemini CLI
Released: June 25, 2025 | Company: Google | Funding: Google (no separate round)
Gemini CLI is Google's answer to Claude Code, and it is genuinely open source (Apache 2.0, 103,000 stars). The full agent code, prompts, and memory store are public and forkable. What is not open is the model: Gemini CLI calls Google's cloud APIs exclusively.
At launch, Google announced the "industry's largest free allowance" — 1,000 requests per day with a personal Google account. By March 25, 2026, they restricted free-tier users to Flash models only; Pro models require a paid Gemini Code Assist plan. The community reaction was stark: the announcement discussion received 963 downvotes against 136 upvotes, and multiple users documented switching to Claude Code or Codex CLI.
On SWE-bench Verified, Gemini 3.1 Pro scores ~80.6%, essentially tied with Claude Code at 80.9% and Codex CLI at ~80%. The practical differences show up in token efficiency: Gemini CLI uses approximately 432K tokens per task versus Claude Code's 261K for equivalent work.
One sentence: Google's Apache 2.0 CLI agent, Gemini-only, strong free tier at launch (now restricted), 103K GitHub stars.
Notable GitHub issues: Discussion #22970: rate limiting crisis announcement, 963 downvotes. Issue #26619 (CRITICAL, May 7, 2026): Gemini CLI silently falls back to Flash quota even when explicitly locked to Pro, then hits rate limits without warning. Issue #26621: infinite loop bug. Issue #26633: paying customers hitting same capacity errors as free users.
Codex CLI
Released: April 16, 2025 | Company: OpenAI | Funding: $180B+ raised, $852B valuation
Codex CLI is the most surprising entry on this list. OpenAI published a full CLI coding agent under Apache 2.0, built in Rust (96% of the codebase), and it is genuinely open source — not a docs-only tracker like Claude Code's GitHub repository. The VS Code extension remains closed source (issue #5822 requests this be changed), but the CLI itself is forkable and commercially usable.
The default models are OpenAI's — GPT-5.5, GPT-5.4, GPT-5.3-Codex — but Ollama and any OpenAI-compatible local server are officially documented and supported. Community forks route to Claude, Gemini, and Llama via gateway proxies. This puts Codex CLI in an intermediate position: it leans toward OpenAI, but it is not locked.
At 3 million weekly active users and 14.5 million monthly npm downloads, it is the second most-used CLI coding agent after Claude Code. Its stated positioning differentiates it from Claude Code by being async and delegation-focused rather than interactive: hand off a task, review the result. The GitHub PR automation is reportedly stronger than Claude Code's, with inline commenting and code review on cloud tasks.
One sentence: OpenAI's Apache 2.0 CLI agent, built in Rust, OpenAI models by default with official Ollama support, 80.6K stars.
Notable GitHub issues: Issue #7278: CLI and Codex-max hang after v5.1 upgrade. Issue #20865: agent started generating tangent designs without asking — "excessive token waste over 2-3 weeks." Issues #13725, #13832, #14209: server-side reconnection loops, still active after multiple versions. Issue #5822: request to open-source the VS Code extension.
Closed source tools
The closed source tools span the range from the most-used coding agent in the world (Claude Code) to one launched days ago (Command Code). What they share is that the harness code is not public. What varies is whether the models behind them are locked.
| Tool | Company | Released | Funding | Valuation | Model flexibility | Entry cost |
|---|---|---|---|---|---|---|
| Claude Code | Anthropic | Feb 2025 | $72B+ total | $380B | None — Claude only | $20/mo (rate-limited) |
| Amp | Amp Inc. | May 2025 | $223M (Sourcegraph) | undisclosed | Partial — mode-based multi-provider | Free ($10/day) |
| Factory | Factory AI | May 2025 | $200M+ | $1.5B | Full — BYOK, multi-model | $20/mo |
| Command Code | CommandCodeAI | May 2026 | $5M seed | undisclosed | Full — BYOK, 10+ providers | $1/mo |
Claude Code
Released: February 24, 2025 | Company: Anthropic | Funding: $72B+ raised across 18 rounds; $380B post-money valuation (February 2026)
Claude Code is the market-leading coding agent by usage: 46% of developers in the Pragmatic Engineer's February 2026 survey of 15,000 developers named it their "most loved" tool, compared to 19% for Cursor and 9% for GitHub Copilot. It drives a reported ~$2B in annualized revenue as of January 2026.
The GitHub repository at anthropics/claude-code has 121,000 stars and 5,000+ open issues, but the repository contains only documentation and an issue tracker — not the source code. This is a point of community frustration: issue #19073 is titled "[DOCS] Admit that claude-code isn't open source." In March 2026, a missing .npmignore entry accidentally published 512,000 lines of unobfuscated TypeScript source code in an npm package. Security researcher Chaofan Shou discovered it; the code was mirrored tens of thousands of times before being pulled. Clean-room rewrites hit 50,000 GitHub stars within hours.
Model flexibility is the clearest limitation. Claude Code routes exclusively to Anthropic's Claude models — directly, or via AWS Bedrock, GCP Vertex, or Azure Foundry. All of these are Claude. There is no official path to GPT or Gemini. Third-party proxies (LiteLLM, Bifrost) exist but are unsupported and break with updates.
At $20/month (Pro), rate limits hit after two to three hours of intensive use. Anthropic added weekly caps in August 2025, leading to reports of $200/month subscribers hitting weekly ceilings mid-week. The practical floor for daily professional use is the $100/month Max plan.
One sentence: The most-used CLI coding agent, deepest feature set (sub-agents, hooks, MCP, scheduling, Slack integration), Claude-only, $100-200/mo for professional use.
Notable GitHub activity: 5,000+ open issues, 525 open PRs. Issue #22002: community request to open source under Apache 2.0 or MIT. The accidental source leak in March 2026 is the most significant incident.
Amp
Released: May 2025 (Sourcegraph preview); December 2025 (independent company); February 2026 (CLI-only pivot)
Company: Amp Inc., spun out of Sourcegraph on December 2, 2025
Funding: Sourcegraph (parent) raised $223M over 5 rounds including a $125M Series D at $2.6B valuation. Amp's independent funding is undisclosed as of May 2026.
Amp's most significant product event was February 19, 2026: a public announcement titled "The Coding Agent Is Dead." The argument was that frontier models are now good enough that the agent wrapper no longer differentiates outcomes. Amp shut down its VS Code and Cursor extensions (they self-destructed on March 5, 2026) and pivoted to CLI-only.
The remaining supported interfaces — CLI, JetBrains, Neovim, Zed — reflect a deliberate choice to serve developers who already live in the terminal or in power-user editors. The VS Code exit removed a major adoption vector but sharpened the positioning.
Model routing is Amp's core technical differentiator. You select a mode, not a model: Smart mode runs Claude Opus 4.7, Deep mode runs GPT-5.5, Oracle mode runs GPT-5.4. Amp controls the routing; you cannot directly swap to an arbitrary model. But the underlying infrastructure uses multiple providers, unlike Claude Code's single-provider dependency.
Pricing is transparent pass-through: Amp charges exactly what the model provider charges, with no markup. The free tier provides $10/day for all modes for active users. Enterprise adds a 50% markup with SSO and zero data retention.
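The pricing arithmetic is simple enough to sketch. The token prices below are illustrative placeholders, not Amp's or any provider's real rates:

```python
def session_cost(input_tokens, output_tokens, in_price, out_price, markup=0.0):
    """Prices are USD per million tokens; markup is a fraction (0.5 = +50%)."""
    base = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
    return round(base * (1 + markup), 4)

# A hypothetical 300K-in / 40K-out task at $3/M input, $15/M output:
print(session_cost(300_000, 40_000, 3.0, 15.0))              # → 1.5 (pass-through)
print(session_cost(300_000, 40_000, 3.0, 15.0, markup=0.5))  # → 2.25 (enterprise +50%)
```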
One sentence: Multi-model routing (Claude + OpenAI via modes, not BYOK), pay-at-cost, CLI-only since March 2026, strong team collaboration features.
Notable: "The Coding Agent Is Dead" post is a significant piece of industry thinking regardless of whether you agree with the conclusion.
Factory
Released: GA May 28, 2025 | Company: Factory AI (founded April 2023, Matan Grinberg + Eno Reyes) | Funding: $200M+ across seed, Series B ($50M, September 2025, NEA-led), Series C ($150M, April 2026, Khosla-led). Valuation: $1.5B.
Factory is the most enterprise-targeted tool in this comparison. The architecture is a coordinator agent that decomposes tasks and dispatches to specialized "Droids": Code, Review, Docs, Test, and Knowledge. The goal is not to help individual developers go faster — it is to replace manual steps in enterprise software delivery pipelines.
Customer names illustrate the target: Morgan Stanley, Ernst & Young, Palo Alto Networks, Nvidia, Adobe, MongoDB. Factory integrates natively with GitHub, GitLab, Jira, Linear, Slack, and PagerDuty. "Droid Computers" provide persistent cloud machines that retain package installs, repositories, and running services between sessions — solving the cold-start problem for long-horizon tasks.
Model flexibility is a selling point: Factory supports Claude, GPT-5 family, Gemini, open models (GLM, Kimi, MiniMax), and BYOK. The Series C announcement specifically called out "ability to switch between different foundation models" as a differentiator.
Factory's GitHub repository (843 stars) is client tooling only — CLI, SDKs, IDE extensions. The coordination logic is proprietary and cloud-hosted.
One sentence: Enterprise SDLC automation via specialized multi-agent "Droids," $1.5B valuation, BYOK multi-model, native integrations with GitHub/Jira/Slack.
Notable GitHub issues: #1090, #1094: Linux and Windows segfaults (Bun 1.3.13). #1091: DeepSeek V4 Flash 400 errors on tool calls. #276: droid exec cannot use custom droids in headless mode. Issue pattern suggests BYOK reliability and cross-platform stability are ongoing work.
Command Code
Released: ~May 2026 | Company: CommandCodeAI (formerly Langbase) | Funding: $5M seed, led by Tom Preston-Werner (GitHub co-founder). Angels include Amjad Masad (Replit CEO) and Luca Maestri (Apple CFO).
Command Code is the newest tool on this list by a large margin — the GitHub repository had 11 commits as of May 7, 2026, and the open issues were filed within 48 hours of each other, indicating a very recent first release.
The core differentiator is taste-1: a proprietary meta neuro-symbolic model that learns from your accepts, rejects, and edits over time. Taste profiles are stored as human-readable markdown in .commandcode/taste/taste.md. The pitch is that every other coding agent generates statistically average code — Command Code is the first to continuously adapt to your specific coding patterns.
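"Human-readable markdown" implies a profile a user could read and edit directly. The file layout below is hypothetical — the article only says profiles live in .commandcode/taste/taste.md — but a sketch shows why a markdown taste store is easy to inspect and parse:

```python
# Hypothetical taste profile layout; not Command Code's actual file format.
TASTE_MD = """\
# Taste profile
## Prefers
- early returns over nested conditionals
- dataclasses over dicts
## Rejects
- wildcard imports
"""

def parse_taste(md):
    """Collect bullet items under each '## section' heading."""
    prefs, section = {"Prefers": [], "Rejects": []}, None
    for line in md.splitlines():
        if line.startswith("## "):
            section = line[3:]
        elif line.startswith("- ") and section in prefs:
            prefs[section].append(line[2:])
    return prefs

print(parse_taste(TASTE_MD)["Rejects"])  # → ['wildcard imports']
```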
The underlying infrastructure is not new: the company previously operated as Langbase, processing ~1.2 billion agent runs per month. The pivot is the taste layer and CLI-first interface on top of a known-working backend.
Model flexibility is strong: Claude, GPT-5, Gemini, Grok, DeepSeek, Qwen, Kimi, GLM, MiniMax, and BYOK. The $1/month Go plan gives access to open-source models only; $15/month Pro adds Claude and GPT-5.
One sentence: The newest entrant, differentiates via a personalization model that learns your coding style, BYOK multi-model, very early-stage with known crasher bugs.
Notable GitHub issues: #335: Cmd-V paste causes panic crash. #334, #325: paste functionality broken or overwrites instead of inserting. #332: VS Code integration not functioning. All filed May 6-7, 2026 — this is a product in its first public days.
Model flexibility: the real spectrum
"Model-agnostic" means different things across these tools. Here's the precise state as of May 2026:
| Tool | Can you use Claude? | OpenAI? | Gemini? | Local models? | Summary |
|---|---|---|---|---|---|
| Aider | Yes (BYOK) | Yes (BYOK) | Yes (BYOK) | Yes (Ollama) | Fully BYOK, 100+ via LiteLLM |
| Pi | Yes (BYOK) | Yes (BYOK) | Yes (BYOK) | Yes (Ollama) | Fully BYOK, 324 models |
| OpenCode | Yes (BYOK or Zen) | Yes | Yes | Yes | 75+ providers, Zen hosted option |
| Codex CLI | Community only | Yes (official) | Community only | Yes (official, Ollama) | OpenAI default; Ollama documented |
| Gemini CLI | No | No | Yes (only option) | No (unofficial) | Gemini-only |
| Command Code | Yes (BYOK) | Yes | Yes | Yes | BYOK, 10+ providers |
| Factory | Yes (BYOK) | Yes | Yes | Yes (open models) | BYOK, multiplier-based pricing |
| Amp | Via Smart mode | Via Deep/Oracle modes | No | No | Mode-based routing, no BYOK |
| Claude Code | Yes (only option) | No | No | No | Claude-only |
The two most locked-in tools (Claude Code and Gemini CLI) are also the two backed by AI labs with a direct financial interest in token consumption. Codex CLI is in an intermediate position: OpenAI-backed but with official local model support. Amp is an interesting case: it uses multiple providers but you cannot freely substitute; Amp routes for you.
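In practice, BYOK means the harness holds no keys of its own: it reads the user's credential from the environment and dispatches. The env var names below follow common convention, but the registry itself is a hypothetical sketch, not any tool's real code:

```python
import os

# Hypothetical BYOK provider registry. Local models need no key at all.
PROVIDERS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": None,  # local inference: no key needed
}

def resolve_provider(name):
    """Look up the user's key for a provider, or fail with a clear message."""
    env_var = PROVIDERS[name]
    if env_var is None:
        return {"provider": name, "key": None}
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} to use {name} in BYOK mode")
    return {"provider": name, "key": key}

print(resolve_provider("ollama"))  # → {'provider': 'ollama', 'key': None}
```

The tools in the "Fully BYOK" rows above amount to larger versions of this table; the locked tools hard-code a single entry.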
GitHub health: warning signs and green flags
For open source projects, GitHub issue patterns reveal a lot about maintenance quality and fundamental design choices. Here are the most notable patterns across the five open source tools:
Design-level concerns:
- OpenCode issue #3176: The agent runs `git add .` on all files in the current directory as part of session management, with no size check, no `.gitignore` respect, and no user consent. On a 54,000-file, 45GB repository, this caused "severe system degradation." The reporter called it a "fundamental design flaw." Open as of May 2026.
- Gemini CLI discussion #22970: The rate limiting announcement that received 963 downvotes. The free tier's progressive restriction is a product/business decision, not a bug, but the community reaction — documented migration to other tools — is a leading indicator.
- Codex CLI issue #20865: The agent started generating tangent architectural designs without being asked, ignoring normal conversational patterns, for 2-3 weeks. This is a reliability concern for users who depend on predictable behavior.
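The OpenCode issue illustrates a general pattern: an agent that touches the filesystem needs guardrails before expensive bulk operations. A guard for issue #3176 might look something like this — illustrative only, not OpenCode's actual code or fix:

```python
import os
import tempfile

def safe_to_snapshot(root, max_files=10_000):
    """Count files under root and refuse to snapshot past a threshold,
    instead of running `git add .` blindly on arbitrarily large trees."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        count += len(filenames)
        if count > max_files:
            return False  # too big: bail out early, don't even finish the walk
    return True

# Demonstrate on a throwaway directory with five files.
demo = tempfile.mkdtemp()
for i in range(5):
    open(os.path.join(demo, f"f{i}.txt"), "w").write("x")
print(safe_to_snapshot(demo, max_files=100))  # → True
print(safe_to_snapshot(demo, max_files=3))    # → False
```

The early return matters: on a 54K-file repository, even counting everything before deciding would be wasteful, so the walk stops the moment the threshold is crossed.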
Fast resolution signals:
- Pi has 13 open issues against 45,700 stars. Most are TUI compatibility edge cases (Kitty, tmux, Windows). The "closed-because-bigrefactor" label on older issues suggests proactive housekeeping.
- Aider keeps its issues well organized, and its maintainer has shipped 3+ years of continuous releases from a solo starting point.
Brand-new product signals:
- Command Code has 25 open issues, all filed within 48 hours, all focusing on paste/crash regressions and integration failures. This is normal for a product that shipped days ago, but it's worth noting for any early adopters.
How to choose
If model flexibility is your top priority: Pi, OpenCode, or Aider. All three are free software with no proprietary backend. Pi supports 324 models across 15+ providers. OpenCode supports 75+ providers. Aider uses LiteLLM to cover 100+. You pay your provider directly.
If you want the most polished product and don't mind Claude-only: Claude Code at $100/month (Max) is the most feature-complete tool with sub-agents, hooks, skills, MCP integration, Slack triggers, and cloud scheduling. The 46% "most loved" share in the Pragmatic Engineer survey reflects real usage, not marketing.
If you're building for a team and want enterprise integrations: Factory or Amp. Factory's multi-agent Droid architecture with Jira/GitHub/Slack integrations is purpose-built for enterprise SDLC. Amp's thread sharing, team workspaces, and zero-markup pricing scale well for collaborative workflows.
If you want open source from a major lab and some model flexibility: Codex CLI. It's genuinely Apache 2.0, genuinely open source, built in Rust, and officially supports Ollama for local models alongside its OpenAI default.
If cost is the priority: Gemini CLI's free tier (1,000 requests/day, now Flash-only after March 2026) is the cheapest managed option. Pi, Aider, and OpenCode in BYOK mode are free software — you only pay your model provider.
If you care about minimal surface area and auditability: Pi. Four tools, under 1,000 token system prompt, every interaction inspectable, session trees that support branching, MIT licensed.
If you're watching the space and want to bet on the learning angle: Command Code. Very early, has real bugs, but the taste-1 personalization model is a genuinely different approach from everything else on this list. Tom Preston-Werner as lead investor suggests serious backing.
Conclusion
The coding agent harness market went from one serious tool to nine in three years. The clearest divisions:
By model strategy: Three groups. The lab tools locked to their own models (Claude Code, Gemini CLI). The lab tool with a foot in both camps (Codex CLI, OpenAI models by default but Ollama supported). The multi-model tools (Pi, OpenCode, Aider, Command Code, and Factory via BYOK; Amp via modes rather than BYOK).
By open source status: The tools that open-sourced the harness (Aider, Pi, OpenCode, Codex CLI, Gemini CLI). The tools that kept it closed (Claude Code, Amp, Command Code, Factory). Note the anomaly: Claude Code's tracker repo is the second most-starred coding agent repository on GitHub despite containing no source code.
By target user: Individual developers who want control and transparency (Pi, Aider, OpenCode). The terminal-native developer who wants the best models and doesn't mind paying (Claude Code, Codex CLI). Teams doing collaborative development (Amp, OpenCode, Factory). Enterprises with existing SDLC tooling and compliance requirements (Factory, Claude Code Enterprise).
The fastest-moving signal is the model flexibility axis. In 2024, Claude Code being Claude-only was a smaller concern — Anthropic had the best models. In 2026, GPT-5.5, Gemini 3 Pro, and DeepSeek are all competitive. Paying $200/month for a harness that can only use one provider's models looks increasingly different from paying $0 for a harness that can switch on a per-task basis.
Amp's "The Coding Agent Is Dead" thesis may be premature — but the direction it points is credible. As the models converge, the harness choice will increasingly be about pricing, organizational features, model flexibility, and trust. Which of those you weight highest depends on whether you're a solo developer paying out of pocket, a team lead managing costs, or an engineering director procuring enterprise tooling.
All nine tools will look different by the end of 2026. Check the GitHub issue trackers before you commit.