What happens when Claude uses web search to pick your stack

Claude · 10 min read

If you ask Claude "I want to host a Next.js app and a database, what should I use?" — the answer you get depends entirely on whether Claude is allowed to search the web.

That sounds obvious. It isn't, because in most consumer-facing chat products the model decides for itself whether to search, and you never see that decision. In developer tools like Claude Code or the Anthropic API, you pick — and the pick changes which vendors get recommended.

I ran 52 transactional product-recommendation prompts (the kind developers actually type: "where should I deploy my side project", "heroku is getting too expensive, what are people moving to these days", "mailgun has been flaky for me, what else is good") through Claude Sonnet 4.5 twice each:

  1. No search — pure prior knowledge, no tools.
  2. Search forced — same prompt with "Please search the web for up-to-date information before answering." prepended, and the web_search tool available.

Same model. Same max tokens. Same concise system prompt asking for a ranked top-5. Every answer went through a second pass to extract a canonical, ranked product list.
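The two conditions can be sketched as two variants of the same Messages API request. This is a minimal sketch, not the exact harness: the system prompt text, the model alias, and the token cap here are illustrative assumptions; the `web_search_20250305` tool type and the one-search cap are the ones described in the methods section below.

```python
SYSTEM = "You recommend products. Answer with a concise, ranked top-5 list."
SEARCH_PREFIX = "Please search the web for up-to-date information before answering.\n\n"

def build_request(prompt: str, search: bool, model: str = "claude-sonnet-4-5") -> dict:
    """Build kwargs for one condition of the experiment.

    search=False: no tools at all, so the answer comes from prior knowledge.
    search=True:  prepend the search instruction and attach the web_search
                  tool, capped at one search per call.
    """
    kwargs = {
        "model": model,
        "max_tokens": 1024,   # identical cap in both conditions
        "system": SYSTEM,     # identical system prompt in both conditions
        "messages": [{
            "role": "user",
            "content": (SEARCH_PREFIX + prompt) if search else prompt,
        }],
    }
    if search:
        kwargs["tools"] = [{
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 1,
        }]
    return kwargs

# Usage with the anthropic SDK (network call, not run here):
#   import anthropic
#   client = anthropic.Anthropic()
#   answer = client.messages.create(
#       **build_request("where should I deploy my side project", search=True))
```

The only deltas between conditions are the prompt prefix and the tool list; everything else is held constant, which is what lets the ranked lists be compared directly.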

Here's what I found.

Search quietly makes the answer worse for developers more often than it helps

The naive assumption is that "grounded in search results" beats "vibes from training data." It mostly doesn't — because the search results Claude retrieves are dominated by the same SEO-optimised listicles that rank for generic developer queries on Google.

Case 1: "I want to send an SMS when someone signs up, what's the easiest"

| Rank | No search | Search forced |
| ---- | --------- | ------------- |
| 1 | Twilio | Zapier (SMS by Zapier) |
| 2 | AWS SNS | ClickSend |
| 3 | Vonage | Text-Em-All |
| 4 | Plivo | TextMagic |
| 5 | MessageBird | Notifyre |

Every developer-first SMS API — Twilio, Plivo, MessageBird, Vonage — disappears. The replacements are low-code / consumer platforms that dominate "easiest SMS" SEO but that no engineer would actually pick.

The search query Claude issued was "easiest SMS API sign up notification" — generic enough that the top results were affiliate-farm "best easy SMS service" articles. The search worked as designed; the internet just isn't arranged to help here.

Case 2: "I need to send maybe 100 emails a month, what's the simplest thing"

| Rank | No search | Search forced |
| ---- | --------- | ------------- |
| 1 | Gmail | Gmail |
| 2 | Outlook.com | Aha Send |
| 3 | Mailchimp | Sender |
| 4 | Sendinblue | Brevo |
| 5 | Proton Mail | Mailchimp |

Both answers have weaknesses, but Aha Send and Sender are unfamiliar consumer ESPs that ranked because they write landing pages targeting the phrase "simplest email service." The top-of-funnel SEO layer is answering your architecture question.

Case 3: "Recommend a commercial calendar component with timezone support and rich customisation"

| Rank | No search | Search forced |
| ---- | --------- | ------------- |
| 1 | Bryntum | Mobiscroll |
| 2 | DHTMLX Scheduler | FullCalendar |
| 3 | FullCalendar Premium | CalendarKit Pro |
| 4 | Syncfusion Scheduler | react-datepicker |
| 5 | DevExtreme Scheduler | React Big Calendar |

Without search, Claude lists five well-established commercial JS calendar vendors. With search, Bryntum, DHTMLX, Syncfusion and DevExtreme all drop out — replaced by Mobiscroll (strong SEO game), free libraries (FullCalendar non-premium, React Big Calendar), and a date-picker that shouldn't even be on a commercial-calendar list.

The top search results were a Mobiscroll demo page, a shadcn component doc, a CSS-Script "9 best JS calendar components" listicle, and CalendarKit's homepage. None featured Bryntum. So Claude, obediently grounded in what it fetched, didn't either.

Sometimes search does help — in a specific shape of query

It's not all loss. Search is genuinely useful when:

  • The prompt asks about current trends. For "what's everyone using for sending emails from their SaaS now," no-search returned the evergreen stack (Mailgun, SendGrid, SES). Search added Customer.io, Loops, Encharge — newer tools developers actually discuss on X and in YC threads, and which may sit at the edges of Claude's training data.
  • The prompt is in a specialist category with strong dedicated SEO. "Which transactional email service has the best deliverability for cold emails" — no-search mixed SaaS email tools (SendGrid, Postmark, SES) with cold-email specialists. Search returned 100% cold-email tools: Instantly.ai, Saleshandy, Smartlead, Lemlist, Woodpecker. Cold email is its own discipline with its own vendors and its own SEO — search knew that; the model's priors averaged over it.
  • The prompt hints at a specific community. "I'm building a dev tool startup, what do other YC companies use for email" — no-search gave generic SaaS tooling. Search surfaced Loops, Waypoint, Dittofeed, Polymail, Zaymo — almost all YC-adjacent newer players.

Notice the pattern: search is a win when the correct answer lives in a dense, self-contained content cluster (cold-email blogs, YC-adjacent dev communities, cloud-SQLite announcements) that's well-indexed. Search is a loss when the correct answer is a fragmented list of serious dev tools whose vendors aren't writing SEO landing pages about the phrasing you used.

Search sometimes breaks category understanding entirely

The scariest failure mode is when search doesn't just demote the right vendor — it changes what category Claude thinks you're asking about.

"I need a scheduler that handles resource dependencies and constraints"

  • No search: Apache Airflow, Prefect, Temporal, Dagster, Argo Workflows (reads "scheduler" as workflow orchestrator)
  • Search forced: Runn, Microsoft Project, Smartsheet, Phoenix Project Manager, Dynamics 365 Resource Scheduling Optimization (reads "scheduler" as project management software)

Same prompt, same model, completely different product universe. Neither answer is what the prompt actually meant (a UI component for resource scheduling), but the two conditions arrive at two different wrong category reads — and both are wrong in ways the user is unlikely to notice unless they already know the field.

"What's the best data grid for a financial trading dashboard"

Here Claude's forced search query was "best data grid financial trading dashboard high frequency updates 2026" — and the top five results it retrieved were about stock market data APIs (FinancialModelingPrep, TailAdmin templates, apilayer, TradingView, an UltraTrader blog post about "trading tools"). None were JavaScript data grid libraries.

Claude still arrived at a reasonable-ish answer (AG Grid Enterprise led both conditions) because the model hedged against its own retrieved context. But the quality of the retrieved ground truth was worse than the quality of the priors. That's a meaningful inversion.

What this means if you're a developer

  1. Don't assume "with search" is always better. For mainstream categories — data grids, calendars, email APIs — your model's prior knowledge is often closer to what a senior engineer would pick than whatever's ranking on Google this month.
  2. Pay attention to which search query the agent issued. The query is usually visible in the tool-use trace. If it's a generic 2026-dated category phrase, the top results are likely affiliate-farm listicles, and the answer is about to be steered by them.
  3. Use search for "what's trending" and "who's using what," not "what's best." The first two are exactly what SEO-optimised content answers well. The third is what the content is trying to answer — and doing badly, because the incentives are wrong.
  4. Treat search-forced answers as one opinion, not the opinion. Ask the same question with and without search when it matters.
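The last point — ask the same question both ways and compare — is easy to mechanise once you have the two ranked lists. A minimal sketch (the function name and output shape are illustrative, not part of the experiment's tooling):

```python
def compare_rankings(no_search: list[str], search: list[str]) -> dict:
    """Diff two ranked top-5 lists from the two conditions.

    Returns the products both conditions agree on (in no-search order),
    plus the products unique to each condition.
    """
    a, b = set(no_search), set(search)
    return {
        "both": [p for p in no_search if p in b],           # stable picks
        "only_no_search": [p for p in no_search if p not in b],
        "only_search": [p for p in search if p not in a],
    }

# Case 1 from above: zero overlap — every prior-knowledge pick vanished.
diff = compare_rankings(
    ["Twilio", "AWS SNS", "Vonage", "Plivo", "MessageBird"],
    ["Zapier", "ClickSend", "Text-Em-All", "TextMagic", "Notifyre"],
)
# diff["both"] == []
```

An empty `both` list is the red flag: when the two conditions share nothing, at least one of them is being steered by something other than the merits.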

What this means if you're a vendor

This is where it gets uncomfortable.

Your product's inclusion in Claude's answer to a transactional prompt depends on two completely different optimisations:

  • Getting into the prior. Being in enough documentation, GitHub issues, blog posts, Reddit threads, and HackerNews comments before the training cutoff that the model knows you exist and what you're for.
  • Getting into the top 5 search results. Ranking well — either on your own domain, or inside the listicles ("Top 10 X in 2026") that agents' first-turn queries actually hit.

Being strong at (1) and weak at (2) means you'll show up when users set search=off but quietly vanish when search=on. Bryntum is a clean example: recommended in 21 of 29 relevant prompts without search, 18 of 29 with search. That's a ~14% drop in visibility, driven entirely by the fact that Bryntum isn't in the top 5 of the listicle sites that Claude reads.

Being strong at (2) and weak at (1) is the inverse, and it's becoming a viable GTM: Mobiscroll, Aha Send, ClickSend, and others are barely in any agent's training-set priors, but they ride SEO into search-grounded answers. If agents default to search — and Claude Code's default is increasingly "yes, search" — this is a real distribution channel.

The uncomfortable implication: the old SEO game and the new AI-recommendation game aren't separate. When the agent searches, winning SEO is winning AI recommendations, because the agent reads the same listicles a human would. The category queries an agent issues are the category queries a human Googles; the listicles that rank for them were written to rank, not to be right.

What helps:

  • Rank for the long-tail phrasings users actually say to agents. "Drag-drop row reordering," "resource scheduling for field service," "dev-friendly ESP" — the prompt text is the new keyword.
  • Get into the listicles. DZone, DEV.to, CSS-Script, freeCodeCamp roundups, and Reddit megathreads are what agents retrieve. If you're not in the top 5 of those, you're not in the top 5 of the agent's answer.
  • Publish things that rank on your own domain for category queries. A well-ranked "X vs Y" page on your site is a double win: it gets into both the prior (eventually) and the search result set (immediately).

Method and data

  • Model: Claude Sonnet 4.5 for main answers, Claude Haiku 4.5 for the extraction pass that turns prose into a ranked, canonical product list.
  • 52 prompts across 4 categories: UI component libraries (29 prompts focused on Bryntum's market — grids, schedulers, Gantt, calendars, task boards), hosting (6), databases (5), email/SMS (12).
  • Two conditions per prompt: no_search (no tools) and search_forced (prompt prefix + web_search_20250305 tool, max 1 search per call).
  • Bryntum's six product lines (Grid, Scheduler, Scheduler Pro, Gantt, Calendar, TaskBoard) are collapsed into a single canonical "Bryntum" entry, since the question for a vendor is "do you get mentioned at all."
  • Each prompt runs in both conditions. Products are extracted in the order Claude presents them. Mentions are tracked by position.
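The bookkeeping described above — collapse product lines into one canonical vendor, then track mentions by position — might look like this. The canonical map here is a fragment for illustration; the real one would cover every vendor in the dataset:

```python
from collections import defaultdict

# Illustrative fragment of the canonical map (assumption: the real one is larger).
CANONICAL = {
    "Bryntum Grid": "Bryntum",
    "Bryntum Scheduler": "Bryntum",
    "Bryntum Scheduler Pro": "Bryntum",
    "Bryntum Gantt": "Bryntum",
    "Bryntum Calendar": "Bryntum",
    "Bryntum TaskBoard": "Bryntum",
}

def canonicalize(product: str) -> str:
    return CANONICAL.get(product, product)

def mention_positions(answers: list[list[str]]) -> dict[str, list[int]]:
    """Map canonical vendor -> 1-based ranks across all extracted answers.

    A vendor is counted at most once per answer (its best rank), since the
    question for a vendor is "do you get mentioned at all."
    """
    positions = defaultdict(list)
    for ranked in answers:
        seen = set()
        for rank, product in enumerate(ranked, start=1):
            vendor = canonicalize(product)
            if vendor not in seen:
                seen.add(vendor)
                positions[vendor].append(rank)
    return positions

stats = mention_positions([
    ["Bryntum Gantt", "DHTMLX Scheduler", "Bryntum Calendar"],
    ["FullCalendar Premium", "Bryntum Scheduler"],
])
# stats["Bryntum"] == [1, 2]: top spot in the first answer, second in the other.
```

From a structure like `stats`, the mention counts quoted above (e.g. Bryntum in 21 of 29 prompts without search) are just `len(stats[vendor])` per condition.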

The full experiment scaffolding — prompts, SQLite database, report generator — is at github.com/ritza-co/claude-with-without-web-search (todo: publish).

The short version

If you're a developer: web search doesn't make Claude smarter about product recommendations. For a lot of queries, it makes Claude worse — more susceptible to affiliate SEO, more likely to swap real developer tools for consumer-grade alternatives, more likely to drop the right vendor in favour of whoever wrote the best landing page for your phrasing.

If you're a vendor: you now need to win two games. The one your marketing team has been playing (SEO) is back from the dead, because agents read the same articles humans do. And the one nobody has really been playing (getting into models' training priors) is becoming the thing that insulates you from SEO spam when users turn search off.

Neither game is optional. And they reward different things.