Introducing the AX Benchmark: Measuring Agent Experience for Developer Platforms
Your product has great documentation. Your API follows REST conventions. Your SDK has TypeScript types. You've invested in Developer Experience (DevX) and it shows—developers love working with your platform.
But here's the question nobody's asking: can an AI agent use it?
We've spent the last few months using coding agents to build integrations with dozens of developer tools. Email senders, observability platforms, browser automation services, search APIs. The pattern became clear: products optimized for human developers often fail catastrophically when an agent tries to use them.
This isn't hypothetical. AI agents are writing production code today. They're integrating APIs, setting up infrastructure, and deploying applications. The companies that make this easy will capture the next wave of adoption. The companies that don't will watch agents recommend their competitors.
We're launching the AX Benchmark to measure this systematically. AX stands for Agent Experience—how well AI agents can discover, sign up for, onboard with, and integrate a platform without human intervention.
Why AX Matters Now
Developer Experience revolutionized how we think about API design. Companies learned that friction in the developer journey costs customers. Stripe became a $95 billion company partly because their API was a joy to use. Twilio built an empire on clear documentation and predictable interfaces.
But DevX optimizes for humans reading documentation, scanning dashboards, and clicking through sign-up flows. Agents don't read the same way. They can't solve CAPTCHAs. They struggle with JavaScript-heavy single-page applications. They miss context that's obvious to humans looking at a screen.
Consider what happens when an agent tries to integrate your email API:
- Discovery: The agent searches for "transactional email API." Can it find your product? Can it understand what you offer from the search results and landing page?
- Sign-up: The agent navigates to your registration page. Is there a CAPTCHA? Phone verification? A multi-step wizard with dynamic form validation?
- Onboarding: The agent creates an account. Where's the API key? Is it on the dashboard? Behind another verification step? Does the quickstart actually work?
- Integration: The agent writes code to send an email. Are the endpoints predictable? Do error messages explain what went wrong? Is there an SDK, or does it need to construct HTTP requests manually?
Each step has failure modes that don't exist for human developers. A person sees a CAPTCHA and solves it in three seconds. An agent sees a CAPTCHA and the entire workflow stops.
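To ground that last integration step: here is roughly the kind of code an agent ends up writing once it has a key in hand. The endpoint, payload shape, and environment variable below are hypothetical; the pattern is what matters.

```typescript
// Hypothetical transactional email API; the endpoint, payload, and env var are illustrative.
async function sendEmail(to: string, subject: string, html: string): Promise<void> {
  const res = await fetch("https://api.example-email.com/v1/send", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.EMAIL_API_KEY}`, // predictable auth: a plain Bearer token
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ from: "hello@yourdomain.com", to, subject, html }),
  });

  if (!res.ok) {
    // A usable API explains what to fix here, not just a status code.
    throw new Error(`Send failed (HTTP ${res.status}): ${await res.text()}`);
  }
}
```

Every metric below measures how much friction stands between an agent and that first successful call.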
The Five AX Metrics
We evaluate platforms across five dimensions, each scored from 1 to 4. Here's what we measure and why it matters.
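In code terms, each evaluation boils down to a small scorecard, something like the sketch below (the field names are ours, not a published schema).

```typescript
// Sketch of the scorecard each evaluation produces; field names are ours and may change.
type AxScore = 1 | 2 | 3 | 4;

interface AxScorecard {
  platform: string;
  discoverability: AxScore;
  signupAccessibility: AxScore;
  onboardingClarity: AxScore;
  apiDesignQuality: AxScore;
  sdkAndTooling: AxScore;
  notes: string[]; // specific friction points observed during the run
}
```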
1. Discoverability (1-4)
Can an agent find your product and understand what it does?
This sounds basic, but it fails surprisingly often. Agents rely on text content that's accessible without JavaScript rendering. They parse structured data, meta descriptions, and clear headings. Marketing sites heavy on animations and light on substance become invisible.
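A rough way to see what an agent sees is to fetch the raw HTML and check whether the essentials survive without a JavaScript runtime; the URL and keyword list in this sketch are placeholders.

```typescript
// Rough check of what an agent "sees": raw HTML only, no JavaScript execution.
// The URL and keyword list are placeholders.
async function checkStaticLegibility(url: string, keywords: string[]): Promise<void> {
  const html = await (await fetch(url)).text();
  const text = html.replace(/<[^>]+>/g, " ").toLowerCase(); // crude tag stripping

  for (const kw of keywords) {
    console.log(`"${kw}": ${text.includes(kw.toLowerCase()) ? "present" : "MISSING"}`);
  }
}

checkStaticLegibility("https://example-email.com", [
  "transactional email",
  "pricing",
  "api reference",
]).catch(console.error);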
What "good" looks like:
- Plain HTML content that describes your product's core functionality
- Structured pricing information (not "contact sales" for every tier)
- Documentation that's indexable and logically organized
- Clear differentiation from competitors in accessible text
What kills discoverability:
- JavaScript-rendered content with no server-side alternative
- Pricing hidden behind sales calls
- Documentation locked behind authentication
- Generic marketing copy that doesn't explain what you actually do
Comparison to UX/DevX: UX focuses on visual hierarchy and scannability. DevX emphasizes comprehensive docs. AX requires that the essential information exists as parseable text, not just pretty layouts.
Score guide:
- 4: Agent can fully understand product, pricing, and capabilities from public pages
- 3: Core information accessible, some gaps in pricing or feature details
- 2: Requires significant inference; key information in PDFs or rendered JS
- 1: Product essentially invisible to agents
2. Sign-up Accessibility (1-4)
Can an agent create an account without human intervention?
This is where most platforms fail. CAPTCHAs exist specifically to block automated access. Phone verification requires human interaction. Complex multi-step wizards with client-side validation break agent workflows.
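The ideal, from an agent's point of view, is registration that is itself an API call. A minimal sketch, assuming a hypothetical /v1/signup endpoint that returns a provisional key:

```typescript
// Hypothetical API-based registration; the endpoint and response shape are assumptions.
async function signUp(email: string): Promise<string> {
  const res = await fetch("https://api.example-email.com/v1/signup", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email }),
  });
  if (!res.ok) throw new Error(`Sign-up failed: HTTP ${res.status}`);

  // Assumed flow: a provisional key comes back now; the emailed magic link activates it.
  const { apiKey } = (await res.json()) as { apiKey: string };
  return apiKey;
}
```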
What "good" looks like:
- API-based registration (sign up via POST request)
- Email-only verification with magic links
- OAuth options that agents can handle (GitHub, Google with proper scopes)
- Minimal required fields
What kills sign-up:
- CAPTCHAs (reCAPTCHA, hCaptcha, custom puzzles)
- Phone number verification
- Multi-page wizards with JavaScript validation
- Required fields that need human context (company size, use case dropdowns)
Comparison to UX/DevX: UX optimizes sign-up for speed and simplicity. DevX might add "developer-focused" questions to personalize the experience. AX asks whether the entire flow can complete programmatically.
Score guide:
- 4: API-based or simple form submission, no CAPTCHA, email verification only
- 3: Web form works without CAPTCHA, minimal fields, straightforward flow
- 2: CAPTCHA present but can sometimes be bypassed; phone verification optional
- 1: Hard blocks (required CAPTCHA, mandatory phone verification)
3. Onboarding Clarity (1-4)
Once signed up, how quickly can an agent reach a working state?
The path from "account created" to "first successful API call" varies wildly. Some platforms surface the API key immediately on the dashboard. Others hide it behind profile settings, require additional verification, or force you through an interactive tutorial.
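The test for onboarding is blunt: can the quickstart be pasted and run? Something like the sketch below, where the only manual step is exporting the key the dashboard showed at login (the service name and endpoint are illustrative).

```typescript
// Illustrative quickstart: the only manual step is copying the key shown at login.
//   export EXAMPLE_API_KEY=test_...   (a clearly prefixed test-mode key helps here too)
async function main(): Promise<void> {
  const apiKey = process.env.EXAMPLE_API_KEY;
  if (!apiKey) throw new Error("Set EXAMPLE_API_KEY before running this quickstart.");

  const res = await fetch("https://api.example-service.com/v1/ping", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  console.log(res.ok ? "Key works; account is live." : `Something is off: HTTP ${res.status}`);
}

main().catch(console.error);
```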
What "good" looks like:
- API key visible immediately after login
- Quickstart that actually works (copy-paste code that runs)
- Clear sandbox/test mode with realistic behavior
- No mandatory interactive tutorials
What kills onboarding:
- API keys hidden in nested settings menus
- Quickstarts with outdated code or missing dependencies
- Required "getting started" wizards that must be clicked through
- Test modes that behave differently from production
Comparison to UX/DevX: UX designs onboarding for engagement and education. DevX ensures quickstarts are accurate and comprehensive. AX measures whether an agent can skip the education and get to work.
Score guide:
- 4: API key immediate, quickstart works without modification, clear test mode
- 3: API key findable, quickstart mostly works, some manual steps
- 2: API key requires hunting, quickstart needs debugging, unclear test mode
- 1: Onboarding requires human intervention to complete
4. API Design Quality (1-4)
Is the API predictable enough that an agent can use it correctly?
This overlaps heavily with traditional API design principles, but the emphasis differs. Agents benefit from consistency, clear error messages, and conventional patterns. They struggle with APIs that require implicit knowledge or have inconsistent behavior across endpoints.
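As a sketch of what that means in practice: error bodies that carry a machine-readable code plus a human-readable fix, and writes that accept an idempotency key so retries are safe. The field names and header below are illustrative, not any specific vendor's format.

```typescript
// Illustrative error shape and retry-safe write; field names and headers are assumptions.
import { randomUUID } from "node:crypto";

interface ApiError {
  code: string;    // machine-readable, e.g. "invalid_recipient"
  message: string; // what went wrong
  hint?: string;   // how to fix it, e.g. "Verify your sending domain first"
}

async function createMessage(to: string, body: string): Promise<unknown> {
  const res = await fetch("https://api.example-service.com/v1/messages", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.EXAMPLE_API_KEY}`,
      "Content-Type": "application/json",
      "Idempotency-Key": randomUUID(), // a retried request won't create a duplicate
    },
    body: JSON.stringify({ to, body }),
  });

  if (!res.ok) {
    const err = (await res.json()) as ApiError;
    // An agent can branch on err.code and act on err.hint instead of guessing.
    throw new Error(`${err.code}: ${err.message}${err.hint ? ` (${err.hint})` : ""}`);
  }
  return res.json();
}
```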
What "good" looks like:
- RESTful conventions (or consistent GraphQL schema)
- Error responses that explain what went wrong and how to fix it
- Predictable authentication (Bearer tokens, not custom schemes)
- Idempotent operations where possible
What kills API quality:
- Custom authentication schemes
- Error codes without explanatory messages
- Inconsistent response formats across endpoints
- Operations that require specific ordering without documentation
Comparison to UX/DevX: DevX values comprehensive documentation and flexibility. AX values predictability over flexibility—an agent would rather have one obvious way to do something than three clever options.
Score guide:
- 4: Conventional patterns, excellent error messages, consistent behavior
- 3: Standard REST/GraphQL, adequate errors, minor inconsistencies
- 2: Workable but requires inference, sparse error details
- 1: Custom patterns, cryptic errors, inconsistent behavior
5. SDK and Tooling (1-4)
Are there official SDKs, and do they improve the agent experience?
SDKs matter more for agents than they do for humans. A human can read documentation and construct HTTP requests; an agent benefits enormously from typed SDKs with autocomplete-friendly method names and clear parameter requirements.
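A hypothetical typed SDK call makes the point: the types tell the agent what the parameters are before it reads a single docs page. The package, client class, and method names here are invented for illustration.

```typescript
// Hypothetical typed SDK; the package name, client class, and method are invented for illustration.
import { ExampleEmail } from "@example/email-sdk";

const client = new ExampleEmail(process.env.EXAMPLE_API_KEY ?? "");

// Typed parameters let the compiler catch mistakes before a request is ever sent,
// instead of the agent discovering them one 400 response at a time.
await client.messages.send({
  from: "hello@yourdomain.com",
  to: "user@example.com",
  subject: "Welcome",
  html: "<p>Thanks for signing up.</p>",
});
```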
What "good" looks like:
- Official SDKs for major languages (at minimum: Python, JavaScript/TypeScript)
- SDKs published to standard package managers (npm, PyPI)
- Types included (TypeScript definitions, Python type hints)
- SDK methods that mirror API capabilities
What kills SDK quality:
- No official SDKs
- SDKs that lag behind API features
- SDKs only available via manual download
- Wrapper libraries with different abstractions than the API
Comparison to UX/DevX: DevX might argue that a well-documented API doesn't need SDKs. For AX, SDKs are force multipliers—they let agents write correct code faster with fewer iterations.
Score guide:
- 4: Official typed SDKs for Python/JS, well-maintained, feature-complete
- 3: Official SDKs exist but missing types or lagging features
- 2: Community SDKs only, or official SDKs poorly maintained
- 1: No SDKs, HTTP-only integration
How We Test
Our benchmarks aren't theoretical. We actually build integrations.
For each platform we evaluate, we give a coding agent (currently Claude Code) a task: integrate this service from scratch. We measure:
- Time to working integration: Clock starts at "integrate [service]" and stops at first successful operation
- Human interventions required: Every time we have to help the agent (solve a CAPTCHA, find a hidden setting, explain an error)
- Iterations to success: How many attempts before the code works
We document the specific friction points. We note where the agent gets stuck. We record what information was missing or inaccessible.
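Each run reduces to a small record along these lines (a sketch; the field names are ours):

```typescript
// Sketch of what we capture per integration attempt; field names are ours.
interface BenchmarkRun {
  platform: string;
  task: string;                 // e.g. "send a transactional email from a Node script"
  minutesToWorkingIntegration: number;
  humanInterventions: { step: string; reason: string }[];
  iterationsToSuccess: number;
  frictionPoints: string[];
}
```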
What's Coming
We're starting with email sending platforms. Resend, SendGrid, Postmark, Amazon SES, Mailgun—evaluated head-to-head on AX metrics.
After that: observability tools, browser automation platforms, search APIs, and project management integrations. Each category will get a comprehensive benchmark comparing the major players.
The goal isn't just to rank platforms. It's to establish what good looks like so companies can improve. Every platform we test gets specific, actionable feedback on their AX gaps.
The Opportunity
Here's the uncomfortable truth for developer tools companies: your competitors are one good AX score away from capturing agent-driven adoption.
When a developer asks an agent to "set up error tracking for this project," the agent will recommend whatever it can successfully integrate. If Sentry has a CAPTCHA and Highlight doesn't, Highlight wins. If Datadog's quickstart is broken and Grafana Cloud's works, Grafana Cloud wins.
This isn't about replacing human developers. It's about recognizing that the developer workflow is changing. Agents are becoming the first integration touchpoint. The platforms that optimize for this reality will grow faster than those that don't.
AX isn't a replacement for DevX. It's the next layer. You still need great documentation, intuitive dashboards, and responsive support. But you also need to ask: what happens when the first "user" to try your product isn't human at all?
We'll publish our first AX Benchmark comparing email sending platforms next week. Follow along as we systematically evaluate how well the developer tools ecosystem serves the agent era.