Introducing the AX Benchmark: Measuring Agent Experience for Developer Platforms
Your product has great documentation. Your API follows REST conventions. Your SDK has TypeScript types. You've invested in Developer Experience (DevX) and it shows—developers love working with your platform.
But here's the question nobody's asking: can an AI agent use it?
We've spent the last few months using coding agents to build integrations with dozens of developer tools. Email senders, observability platforms, browser automation services, search APIs. The pattern became clear: products optimized for human developers often fail catastrophically when an agent tries to use them.
This isn't hypothetical. AI agents are writing production code today. They're integrating APIs, setting up infrastructure, and deploying applications. The companies that make this easy will capture the next wave of adoption. The companies that don't will watch agents recommend their competitors.
We're launching the AX Benchmark to measure this systematically. AX stands for Agent Experience—how well AI agents can discover, sign up for, onboard with, and integrate a platform without human intervention.
Why AX Matters Now
Developer Experience revolutionized how we think about API design. Companies learned that friction in the developer journey costs customers. Stripe became a $95 billion company partly because their API was a joy to use. Twilio built an empire on clear documentation and predictable interfaces.
But DevX optimizes for humans reading documentation, scanning dashboards, and clicking through sign-up flows. Agents don't read the same way. They can't solve CAPTCHAs. They struggle with JavaScript-heavy single-page applications. They miss context that's obvious to humans looking at a screen.
Consider what happens when an agent tries to integrate your email API:
- Discovery: The agent searches for "transactional email API." Can it find your product? Can it understand what you offer from the search results and landing page?
- Sign-up: The agent navigates to your registration page. Is there a CAPTCHA? Phone verification? A multi-step wizard with dynamic form validation?
- Onboarding: The agent creates an account. Where's the API key? Is it on the dashboard? Behind another verification step? Does the quickstart actually work?
- Integration: The agent writes code to send an email. Are the endpoints predictable? Do error messages explain what went wrong? Is there an SDK, or does it need to construct HTTP requests manually?
Each step has failure modes that don't exist for human developers. A person sees a CAPTCHA and solves it in three seconds. An agent sees a CAPTCHA and the entire workflow stops.
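To ground that last integration step: here is roughly the kind of code an agent ends up writing once it has a key in hand. The endpoint, payload shape, and environment variable below are hypothetical; the pattern is what matters.

```typescript
// Hypothetical transactional email API; the endpoint, payload, and env var are illustrative.
async function sendEmail(to: string, subject: string, html: string): Promise<void> {
  const res = await fetch("https://api.example-email.com/v1/send", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.EMAIL_API_KEY}`, // predictable auth: a plain Bearer token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ from: "hello@yourdomain.com", to, subject, html }),
  });

  if (!res.ok) {
    // A usable API explains what to fix here, not just a status code.
    throw new Error(`Send failed (HTTP ${res.status}): ${await res.text()}`);
  }
}
```

Every metric below measures how much friction stands between an agent and that first successful call.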
The Five AX Metrics
We evaluate platforms across five dimensions, each scored from 1 to 4. Here's what we measure and why it matters.
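In code terms, each evaluation boils down to a small scorecard, something like the sketch below (the field names are ours, not a published schema).

```typescript
// Sketch of the scorecard each evaluation produces; field names are ours and may change.
type AxScore = 1 | 2 | 3 | 4;

interface AxScorecard {
  platform: string;
  discoverability: AxScore;
  signupAccessibility: AxScore;
  onboardingClarity: AxScore;
  apiDesignQuality: AxScore;
  sdkAndTooling: AxScore;
  notes: string[]; // specific friction points observed during the run
}
```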
1. Discoverability (1-4)
Can an agent find your product and understand what it does?
This sounds basic, but it fails surprisingly often. Agents rely on text content that's accessible without JavaScript rendering. They parse structured data, meta descriptions, and clear headings. Marketing sites heavy on animations and light on substance become invisible.
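A rough way to see what an agent sees is to fetch the raw HTML and check whether the essentials survive without a JavaScript runtime; the URL and keyword list in this sketch are placeholders.

```typescript
// Rough check of what an agent "sees": raw HTML only, no JavaScript execution.
// The URL and keyword list are placeholders.
async function checkStaticLegibility(url: string, keywords: string[]): Promise<void> {
  const html = await (await fetch(url)).text();
  const text = html.replace(/<[^>]+>/g, " ").toLowerCase(); // crude tag stripping

  for (const kw of keywords) {
    console.log(`"${kw}": ${text.includes(kw.toLowerCase()) ? "present" : "MISSING"}`);
  }
}

checkStaticLegibility("https://example-email.com", [
  "transactional email",
  "pricing",
  "api reference",
]).catch(console.error);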
What "good" looks like:
- Plain HTML content that describes your product's core functionality
- Structured pricing information (not "contact sales" for every tier)
- Documentation that's indexable and logically organized
- Clear differentiation from competitors in accessible text
What kills discoverability:
- JavaScript-rendered content with no server-side alternative
- Pricing hidden behind sales calls
- Documentation locked behind authentication
- Generic marketing copy that doesn't explain what you actually do
Comparison to UX/DevX: UX focuses on visual hierarchy and scannability. DevX emphasizes comprehensive docs. AX requires that the essential information exists as parseable text, not just pretty layouts.
Score guide:
- 4: Agent can fully understand product, pricing, and capabilities from public pages
- 3: Core information accessible, some gaps in pricing or feature details
- 2: Requires significant inference; key information in PDFs or rendered JS
- 1: Product essentially invisible to agents
2. Sign-up Accessibility (1-4)
Can an agent create an account without human intervention?
This is where most platforms fail. CAPTCHAs exist specifically to block automated access. Phone verification requires human interaction. Complex multi-step wizards with client-side validation break agent workflows.
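The ideal, from an agent's point of view, is registration that is itself an API call. A minimal sketch, assuming a hypothetical /v1/signup endpoint that returns a provisional key:

```typescript
// Hypothetical API-based registration; the endpoint and response shape are assumptions.
async function signUp(email: string): Promise<string> {
  const res = await fetch("https://api.example-email.com/v1/signup", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email }),
  });
  if (!res.ok) throw new Error(`Sign-up failed: HTTP ${res.status}`);

  // Assumed flow: a provisional key comes back now; the emailed magic link activates it.
  const { apiKey } = (await res.json()) as { apiKey: string };
  return apiKey;
}
```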
What "good" looks like:
- API-based registration (sign up via POST request)
- Email-only verification with magic links
- OAuth options that agents can handle (GitHub, Google with proper scopes)
- Minimal required fields
What kills sign-up:
- CAPTCHAs (reCAPTCHA, hCaptcha, custom puzzles)
- Phone number verification
- Multi-page wizards with JavaScript validation
- Required fields that need human context (company size, use case dropdowns)
Comparison to UX/DevX: UX optimizes sign-up for speed and simplicity. DevX might add "developer-focused" questions to personalize the experience. AX asks whether the entire flow can complete programmatically.
Score guide:
- 4: API-based or simple form submission, no CAPTCHA, email verification only
- 3: Web form works without CAPTCHA, minimal fields, straightforward flow
- 2: CAPTCHA present but can sometimes be bypassed; phone verification optional
- 1: Hard blocks (required CAPTCHA, mandatory phone verification)
3. Onboarding Clarity (1-4)
Once signed up, how quickly can an agent reach a working state?
The path from "account created" to "first successful API call" varies wildly. Some platforms surface the API key immediately on the dashboard. Others hide it behind profile settings, require additional verification, or force you through an interactive tutorial.
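The test for onboarding is blunt: can the quickstart be pasted and run? Something like the sketch below, where the only manual step is exporting the key the dashboard showed at login (the service name and endpoint are illustrative).

```typescript
// Illustrative quickstart: the only manual step is copying the key shown at login.
//   export EXAMPLE_API_KEY=test_...   (a clearly prefixed test-mode key helps here too)
async function main(): Promise<void> {
  const apiKey = process.env.EXAMPLE_API_KEY;
  if (!apiKey) throw new Error("Set EXAMPLE_API_KEY before running this quickstart.");

  const res = await fetch("https://api.example-service.com/v1/ping", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  console.log(res.ok ? "Key works; account is live." : `Something is off: HTTP ${res.status}`);
}

main().catch(console.error);
```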
What "good" looks like:
- API key visible immediately after login
- Quickstart that actually works (copy-paste code that runs)
- Clear sandbox/test mode with realistic behavior
- No mandatory interactive tutorials
What kills onboarding:
- API keys hidden in nested settings menus
- Quickstarts with outdated code or missing dependencies
- Required "getting started" wizards that must be clicked through
- Test modes that behave differently from production
Comparison to UX/DevX: UX designs onboarding for engagement and education. DevX ensures quickstarts are accurate and comprehensive. AX measures whether an agent can skip the education and get to work.
Score guide:
- 4: API key immediate, quickstart works without modification, clear test mode
- 3: API key findable, quickstart mostly works, some manual steps
- 2: API key requires hunting, quickstart needs debugging, unclear test mode
- 1: Onboarding requires human intervention to complete
4. API Design Quality (1-4)
Is the API predictable enough that an agent can use it correctly?
This overlaps heavily with traditional API design principles, but the emphasis differs. Agents benefit from consistency, clear error messages, and conventional patterns. They struggle with APIs that require implicit knowledge or have inconsistent behavior across endpoints.
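As a sketch of what that means in practice: error bodies that carry a machine-readable code plus a human-readable fix, and writes that accept an idempotency key so retries are safe. The field names and header below are illustrative, not any specific vendor's format.

```typescript
// Illustrative error shape and retry-safe write; field names and headers are assumptions.
import { randomUUID } from "node:crypto";

interface ApiError {
  code: string;    // machine-readable, e.g. "invalid_recipient"
  message: string; // what went wrong
  hint?: string;   // how to fix it, e.g. "Verify your sending domain first"
}

async function createMessage(to: string, body: string): Promise<unknown> {
  const res = await fetch("https://api.example-service.com/v1/messages", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.EXAMPLE_API_KEY}`,
      "Content-Type": "application/json",
      "Idempotency-Key": randomUUID(), // a retried request won't create a duplicate
    },
    body: JSON.stringify({ to, body }),
  });

  if (!res.ok) {
    const err = (await res.json()) as ApiError;
    // An agent can branch on err.code and act on err.hint instead of guessing.
    throw new Error(`${err.code}: ${err.message}${err.hint ? ` (${err.hint})` : ""}`);
  }
  return res.json();
}
```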
What "good" looks like:
- RESTful conventions (or consistent GraphQL schema)
- Error responses that explain what went wrong and how to fix it
- Predictable authentication (Bearer tokens, not custom schemes)
- Idempotent operations where possible
What kills API quality:
- Custom authentication schemes
- Error codes without explanatory messages
- Inconsistent response formats across endpoints
- Operations that require specific ordering without documentation
Comparison to UX/DevX: DevX values comprehensive documentation and flexibility. AX values predictability over flexibility—an agent would rather have one obvious way to do something than three clever options.
Score guide:
- 4: Conventional patterns, excellent error messages, consistent behavior
- 3: Standard REST/GraphQL, adequate errors, minor inconsistencies
- 2: Workable but requires inference, sparse error details
- 1: Custom patterns, cryptic errors, inconsistent behavior
5. SDK and Tooling (1-4)
Are there official SDKs, and do they improve the agent experience?
SDKs matter more for agents than they do for humans. A human can read documentation and construct HTTP requests; an agent benefits enormously from typed SDKs with autocomplete-friendly method names and clear parameter requirements.
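A hypothetical typed SDK call makes the point: the types tell the agent what the parameters are before it reads a single docs page. The package, client class, and method names here are invented for illustration.

```typescript
// Hypothetical typed SDK; the package name, client class, and method are invented for illustration.
import { ExampleEmail } from "@example/email-sdk";

const client = new ExampleEmail(process.env.EXAMPLE_API_KEY ?? "");

// Typed parameters let the compiler catch mistakes before a request is ever sent,
// instead of the agent discovering them one 400 response at a time.
await client.messages.send({
  from: "hello@yourdomain.com",
  to: "user@example.com",
  subject: "Welcome",
  html: "<p>Thanks for signing up.</p>",
});
```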
What "good" looks like:
- Official SDKs for major languages (at minimum: Python, JavaScript/TypeScript)
- SDKs published to standard package managers (npm, PyPI)
- Types included (TypeScript definitions, Python type hints)
- SDK methods that mirror API capabilities
What kills SDK quality:
- No official SDKs
- SDKs that lag behind API features
- SDKs only available via manual download
- Wrapper libraries with different abstractions than the API
Comparison to UX/DevX: DevX might argue that a well-documented API doesn't need SDKs. For AX, SDKs are force multipliers—they let agents write correct code faster with fewer iterations.
Score guide:
- 4: Official typed SDKs for Python/JS, well-maintained, feature-complete
- 3: Official SDKs exist but missing types or lagging features
- 2: Community SDKs only, or official SDKs poorly maintained
- 1: No SDKs, HTTP-only integration
How We Test
Our benchmarks aren't theoretical. We actually build integrations.
For each platform we evaluate, we give a coding agent (currently Claude Code) a task: integrate this service from scratch. We measure:
- Time to working integration: Clock starts at "integrate [service]" and stops at first successful operation
- Human interventions required: Every time we have to help the agent (solve a CAPTCHA, find a hidden setting, explain an error)
- Iterations to success: How many attempts before the code works
We document the specific friction points. We note where the agent gets stuck. We record what information was missing or inaccessible.
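Each run reduces to a small record along these lines (a sketch; the field names are ours):

```typescript
// Sketch of what we capture per integration attempt; field names are ours.
interface BenchmarkRun {
  platform: string;
  task: string;                 // e.g. "send a transactional email from a Node script"
  minutesToWorkingIntegration: number;
  humanInterventions: { step: string; reason: string }[];
  iterationsToSuccess: number;
  frictionPoints: string[];
}
```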
What's Coming
We're starting with email sending platforms. Resend, SendGrid, Postmark, Amazon SES, Mailgun—evaluated head-to-head on AX metrics.
After that: observability tools, browser automation platforms, search APIs, and project management integrations. Each category will get a comprehensive benchmark comparing the major players.
The goal isn't just to rank platforms. It's to establish what good looks like so companies can improve. Every platform we test gets specific, actionable feedback on their AX gaps.
The Opportunity
Here's the uncomfortable truth for developer tools companies: your competitors are one good AX score away from capturing agent-driven adoption.
When a developer asks an agent to "set up error tracking for this project," the agent will recommend whatever it can successfully integrate. If Sentry has a CAPTCHA and Highlight doesn't, Highlight wins. If Datadog's quickstart is broken and Grafana Cloud's works, Grafana Cloud wins.
This isn't about replacing human developers. It's about recognizing that the developer workflow is changing. Agents are becoming the first integration touchpoint. The platforms that optimize for this reality will grow faster than those that don't.
AX isn't a replacement for DevX. It's the next layer. You still need great documentation, intuitive dashboards, and responsive support. But you also need to ask: what happens when the first "user" to try your product isn't human at all?
We'll publish our first AX Benchmark comparing email sending platforms next week. Follow along as we systematically evaluate how well the developer tools ecosystem serves the agent era.