## What is content negotiation for agents?

Content negotiation means serving different responses from the same URL based on what the client asks for. For AI agents, this means returning markdown or plain text — the format LLMs already parse well — when the request comes from an agent, while still serving HTML to browsers. Same URL, different representation, per [RFC 9110 §12.5](https://www.rfc-editor.org/rfc/rfc9110#name-content-negotiation).

## Why it matters

LLM-driven agents typically receive only the response body, not HTTP headers, status codes, or redirect chains. Serving HTML forces the agent to parse DOM structure, strip layout, and discard navigation — wasting tokens before the model sees anything useful. A clean text response gives the model the actual content directly.

## The Accept preference order gotcha

The most common content-negotiation bug is treating the Accept header as a simple substring check. Take Claude Code's WebFetch tool — it sends:

```
Accept: text/markdown, text/html, */*
```

This is the client saying, in preference order: "I'd prefer markdown if you have it, otherwise HTML, otherwise anything." A naive check like `if (accept.includes('text/html'))` sees `text/html` in the string and serves HTML — ignoring that `text/markdown` was listed first.

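[RFC 9110 §12.5.1](https://www.rfc-editor.org/rfc/rfc9110#name-accept) gives every type without an explicit q-value a weight of 1 and treats equal weights as equally acceptable. The RFC itself does not rank types by position. A common convention, and what the `negotiator` package behind Express's `req.accepts()` does, is to break such ties by **order in the header**, which is how clients like WebFetch express preference. A correct implementation parses the Accept list, applies q-values, and on ties picks the leftmost type the server can serve.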
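The parse-then-pick step looks roughly like this. This is a minimal sketch, not production code. `negotiate()` and its helpers are hypothetical names, media-type parameters other than `q` are ignored, and for real traffic a maintained library such as `negotiator` is the safer choice. It ranks each offered type by q-value, then by specificity (exact type, then `type/*`, then `*/*`), then by position in the header, and it drops anything with `q=0`.

```javascript
// Minimal Accept negotiation sketch (hypothetical helpers, not a library API).
function parseAccept(header) {
  return header
    .split(',')
    .map((part, index) => {
      const [range, ...params] = part.trim().split(';').map((s) => s.trim());
      let q = 1; // weight defaults to 1 when no q parameter is present
      for (const p of params) {
        const [key, value] = p.split('=').map((s) => s.trim());
        if (key.toLowerCase() === 'q') q = Number(value);
      }
      return { range: range.toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.range.includes('/')); // drop empty or malformed entries
}

// How specifically a range matches a concrete type: 2 exact, 1 type/*, 0 */*, -1 no match.
function specificity(range, type) {
  const [rType, rSub] = range.split('/');
  const [tType, tSub] = type.split('/');
  if (rType === tType && rSub === tSub) return 2;
  if (rType === tType && rSub === '*') return 1;
  if (rType === '*' && rSub === '*') return 0;
  return -1;
}

// Return the offered type the client prefers most, or null if none is acceptable.
function negotiate(acceptHeader, offered) {
  if (!acceptHeader) return offered[0] ?? null; // no Accept header: use the server default
  const ranges = parseAccept(acceptHeader);
  let best = null;
  offered.forEach((type) => {
    // The most specific matching range decides this type's weight.
    let match = null;
    for (const r of ranges) {
      const s = specificity(r.range, type.toLowerCase());
      if (s < 0) continue;
      if (!match || s > match.s || (s === match.s && r.index < match.index)) {
        match = { s, q: r.q, index: r.index };
      }
    }
    if (!match || match.q <= 0) return; // q=0 means "not acceptable"
    // Rank: higher q, then more specific, then earlier in the header.
    // Full ties keep the earlier entry in `offered`.
    if (
      !best ||
      match.q > best.q ||
      (match.q === best.q && match.s > best.s) ||
      (match.q === best.q && match.s === best.s && match.index < best.index)
    ) {
      best = { type, ...match };
    }
  });
  return best ? best.type : null;
}
```

With WebFetch's header, `negotiate('text/markdown, text/html, */*', ['text/html', 'text/markdown'])` returns `'text/markdown'`. A typical browser header such as `text/html,application/xhtml+xml,*/*;q=0.8` still gets `'text/html'`.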

## What AgentGrade checks

**Agent UA gets non-HTML** — We send `User-Agent: claude-code/1.0.0` with `Accept: text/markdown, text/html, */*` to your homepage. The check passes if you serve `text/markdown`, `text/plain`, or `application/json` with body ≥20 bytes. Sites that substring-match Accept and serve HTML fail this.

**Accept: JSON returns JSON** — We send `Accept: application/json` and check for valid JSON.

**Accept: text returns text** — We send `Accept: text/plain` and check for plain text or markdown.

**Accept: markdown returns markdown** — We send `Accept: text/markdown` and check for markdown or plain text.

**Vary: Accept set** — When you negotiate, the response must include `Vary: Accept` so shared caches key entries correctly.

## How to implement it correctly

Use a proper Accept negotiator instead of substring matching:

```javascript
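// Express: req.accepts() uses the negotiator package under the hood
const path = require('path');
const express = require('express');
const app = express();

// Hypothetical helper: returns the same markdown body you serve at /llms.txt
async function buildLlmsTxt() {
  return '# Your Service\n\n> One-line summary for agents.\n';
}
app.get('/', async (req, res) => {
  res.vary('Accept'); // the body depends on Accept, so caches must key on it
  // text/html goes first: with no Accept header, req.accepts() returns the first entry
  const best = req.accepts(['text/html', 'text/markdown', 'text/plain', 'application/json']);
  if (best === 'text/markdown' || best === 'text/plain') {
    return res.type(best).send(await buildLlmsTxt());
  }
  if (best === 'application/json') {
    return res.json({ name: 'Your Service', api: '/openapi.json' });
  }
  // sendFile requires an absolute path (or a root option)
  res.sendFile(path.join(__dirname, 'index.html'));
});
app.listen(3000);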
```

Other ecosystems:
- **Node.js (no framework):** [`negotiator`](https://www.npmjs.com/package/negotiator) npm package
- **Python:** Werkzeug / Flask `request.accept_mimetypes.best_match(...)`
- **Go:** [`github.com/markusthoemmes/goautoneg`](https://github.com/markusthoemmes/goautoneg)
- **Ruby on Rails:** `respond_to do |format|` blocks handle response generation, but Rails treats `*/*` in Accept as a license to serve HTML — `Accept: text/markdown, */*` returns HTML even though markdown is preferred. Fix by setting `request.format` explicitly in a `before_action` based on the first non-wildcard Accept type, before `respond_to` runs. Reordering `format.X` blocks alone won't override the `*/*` fallback.
- **Cloudflare Workers:** parse `request.headers.get('Accept')` manually or use the `accept` npm package

## Inline vs redirect — pick inline

There are two ways to serve agent-friendly content. Inline is better:

**Inline (recommended):** Same URL serves different bodies based on Accept.

```
GET / → 200 OK
  Content-Type: text/html (browser) | text/markdown (agent)
```

**Redirect (legacy):** Send agents to `/llms.txt`.

```
GET / → 302 Found, Location: /llms.txt
GET /llms.txt → 200 OK
```

Inline wins because: (1) one fetch instead of two — half the latency; (2) the URL the agent reports to the user is the URL they were asked about, not a redirect target; (3) caching is cleaner with `Vary: Accept`. The `/llms.txt` route still exists for tools that fetch it directly — both routes call the same content function so there's one source of truth.

## Vary: Accept is load-bearing

Whenever the same URL returns different bodies based on Accept, set `Vary: Accept`. This tells shared caches (CDNs, proxies, browsers) that the cache key must include the Accept header value.

Without it, a CDN could cache the markdown response from one agent fetch and serve it to a browser visit — or the reverse. The Vary header is the only thing that keeps cache entries from being interchangeable when the bodies are not.
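To see why, here is a toy model of how a shared cache builds its key. It is a simplified sketch, not how any particular CDN works. `cacheKey()` is a hypothetical helper that appends the request's value for each header named in the stored response's `Vary`.

```javascript
// Toy model of a shared cache key (hypothetical; real CDNs normalize and limit Vary differently).
// requestHeaders uses lowercase header names, as Node's req.headers does.
function cacheKey(url, requestHeaders, varyHeader) {
  const varied = (varyHeader || '')
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .sort();
  // RFC 9111: a stored response with "Vary: *" never matches a later request.
  if (varied.includes('*')) return null;
  // The key is the URL plus the request's value for every header named in Vary.
  const parts = varied.map((name) => `${name}=${requestHeaders[name] ?? ''}`);
  return [url, ...parts].join('|');
}

const url = 'https://example.com/';
const agent = { accept: 'text/markdown, text/html, */*' };
const browser = { accept: 'text/html,application/xhtml+xml,*/*;q=0.8' };

// Without Vary, both requests share one entry: whichever body is cached first is served to both.
console.log(cacheKey(url, agent, undefined) === cacheKey(url, browser, undefined)); // true
// With Vary: Accept, each Accept value gets its own entry.
console.log(cacheKey(url, agent, 'Accept') === cacheKey(url, browser, 'Accept')); // false
```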

## Known AI agent User-Agents

| Agent | User-Agent | Purpose |
|---|---|---|
| ClaudeBot | `Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)` | Anthropic training crawler |
| Claude-User | `Mozilla/5.0 ... (compatible; Claude-User/1.0; +Claude-User@anthropic.com)` | claude.ai web_fetch, Claude API web_search page reads |
| Claude-SearchBot | (string not published) | Anthropic search index crawler |
| claude-code | `claude-code/<version>` | Claude Code CLI WebFetch tool |
| ChatGPT-User | `Mozilla/5.0 ... (compatible; ChatGPT-User/1.0; +https://openai.com/bot)` | User-initiated ChatGPT browse |
| OAI-SearchBot | `Mozilla/5.0 ... (compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot)` | OpenAI search index |
| OAI-AdsBot | `Mozilla/5.0 ... (compatible; OAI-AdsBot/1.0; +https://openai.com/adsbot)` | OpenAI ads crawler |
| GPTBot | `Mozilla/5.0 ... (compatible; GPTBot/1.3; +https://openai.com/gptbot)` | OpenAI training crawler |
| PerplexityBot | `Mozilla/5.0 ... (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)` | Perplexity search-results crawler |
| Perplexity-User | `Mozilla/5.0 ... (compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)` | User-initiated Perplexity fetches |
| Google-Extended | (uses Googlebot UA) | Google Gemini training, controlled via robots.txt |
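If you also want a User-Agent hint, a sketch like the one below looks for the product tokens in the table. `agentFromUserAgent()` is a hypothetical helper, the token list is only as current as the table, and UA strings are trivially spoofed, so treat the result as a hint alongside Accept negotiation rather than a replacement for it. `Google-Extended` is left out because it has no User-Agent of its own, and the `Claude-SearchBot` entry assumes its unpublished string contains that token.

```javascript
// Product tokens from the table above. Google-Extended is omitted (robots.txt token only).
// Claude-SearchBot's full string is not published; matching on its name is an assumption.
const AGENT_TOKENS = [
  'ClaudeBot', 'Claude-User', 'Claude-SearchBot', 'claude-code',
  'ChatGPT-User', 'OAI-SearchBot', 'OAI-AdsBot', 'GPTBot',
  'PerplexityBot', 'Perplexity-User',
];

// Return the first known agent token found in the UA (case-insensitive), or null.
function agentFromUserAgent(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  for (const token of AGENT_TOKENS) {
    if (ua.includes(token.toLowerCase())) return token;
  }
  return null;
}
```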

## Web Bot Auth — the next signal

A growing list of agents (ChatGPT Agent at the time of writing; Anthropic, Perplexity, and Google expected) cryptographically sign their requests per [RFC 9421 HTTP Message Signatures](https://www.rfc-editor.org/rfc/rfc9421). The signal is the `Signature-Agent` request header:

```
Signature-Agent: "https://chatgpt.com"
Signature-Input: sig=("@authority" "signature-agent"); keyid="..."; tag="web-bot-auth"
Signature: sig=...
```

If you see `Signature-Agent` on an incoming request, treat the client as a known agent even if the UA looks like a browser. For full verification, fetch the JWKS at the host named in `Signature-Agent` (`/.well-known/http-message-signatures-directory`) and verify the signature with the [`web-bot-auth`](https://www.npmjs.com/package/web-bot-auth) npm package. For content-negotiation purposes, presence of the header alone is a sufficient soft signal.
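A sketch of the soft-signal check: it reads the `Signature-Agent` value, strips the quotes, and derives the directory URL you would fetch for full verification. It performs no cryptographic verification; that is the job of the `web-bot-auth` package. `signatureAgentInfo()` is a hypothetical helper, and it assumes the single quoted-URL form shown in the example above, returning `null` for anything else.

```javascript
// Soft-signal check for Web Bot Auth: reads Signature-Agent, does NOT verify the signature.
// Assumes the single quoted-URL form, e.g. Signature-Agent: "https://chatgpt.com"
function signatureAgentInfo(headerValue) {
  if (!headerValue) return null;
  const match = /^"([^"]+)"$/.exec(headerValue.trim());
  if (!match) return null; // another form (or malformed); not handled here
  let origin;
  try {
    origin = new URL(match[1]).origin;
  } catch {
    return null; // quoted value is not a URL
  }
  return {
    agent: origin,
    // Where the signing keys are published, per the well-known path above
    directoryUrl: `${origin}/.well-known/http-message-signatures-directory`,
  };
}

// Workers / Fetch API:  signatureAgentInfo(request.headers.get('Signature-Agent'))
// Express:              signatureAgentInfo(req.get('Signature-Agent'))
```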

## Learn more

- [RFC 9110 §12.5 — Content Negotiation](https://www.rfc-editor.org/rfc/rfc9110#name-content-negotiation)
- [llms.txt specification](https://llmstxt.org/)
- [Anthropic crawler docs](https://support.claude.com/en/articles/8896518)
- [OpenAI bot docs](https://developers.openai.com/api/docs/bots)
- [Perplexity crawler docs](https://docs.perplexity.ai/docs/resources/perplexity-crawlers)
- [Cloudflare Web Bot Auth](https://blog.cloudflare.com/web-bot-auth/)

## Preferred type vs non-HTML — the next bar

"Agent UA gets non-HTML" is a basic check: did the server serve anything other than HTML? "Returns preferred Content-Type" is the strict version: did the response Content-Type match the leading type the client signaled?

For example, when the client sends `Accept: text/markdown, text/html, */*`:

- Server returns `Content-Type: text/markdown` → passes both checks
- Server returns `Content-Type: text/plain` → passes the basic check, fails the strict one
- Server returns `Content-Type: text/html` → fails both
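The two checks can be written as a small grading function. This is a hypothetical sketch of the rules as this page describes them, not AgentGrade's code. The basic check accepts `text/markdown`, `text/plain`, or `application/json` with a body of at least 20 bytes. The strict check compares the response's media type (with parameters such as `charset` stripped) against the client's leading type.

```javascript
// Hypothetical grading of one probe, following the rules described on this page.
const NON_HTML_TYPES = new Set(['text/markdown', 'text/plain', 'application/json']);

// "text/markdown; charset=utf-8" -> "text/markdown"
function mediaType(contentType) {
  return (contentType || '').split(';')[0].trim().toLowerCase();
}

function gradeProbe({ contentType, body }, leadingType) {
  const type = mediaType(contentType);
  const bytes = new TextEncoder().encode(body || '').length; // body size in UTF-8 bytes
  return {
    basic: NON_HTML_TYPES.has(type) && bytes >= 20,
    strict: type === leadingType.toLowerCase(),
  };
}

const body = '# Example\n\nAgent-readable summary.';
console.log(gradeProbe({ contentType: 'text/markdown; charset=utf-8', body }, 'text/markdown')); // { basic: true, strict: true }
console.log(gradeProbe({ contentType: 'text/plain', body }, 'text/markdown'));                   // { basic: true, strict: false }
console.log(gradeProbe({ contentType: 'text/html', body }, 'text/markdown'));                    // { basic: false, strict: false }
```

Only the strict result is meaningful for a probe whose leading type is `text/html`.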

The scanner runs four probes, and all four must pass:

- **Markdown leading:** the Claude Code / Cursor pattern.
- **HTML leading, markdown listed second:** the browser / ChatGPT Agent pattern. Catches sites that ignore client order and apply server-side preference instead.
- **Explicit q-values favoring HTML over markdown:** catches sites that ignore q-values entirely.
- **JSON leading:** the programmatic discovery pattern.

Today the Content-Type label is mostly decorative for LLM-based agents, which parse the body bytes regardless of MIME type. But browser-based AI extensions and emerging MCP tools branch on Content-Type, and the gap will widen as the ecosystem matures.

The fix is a one-line change in your handler: set the response Content-Type from the negotiated type, not a hardcoded value. If your code already returns `text/plain` for both `Accept: text/plain` and `Accept: text/markdown`, branch on the negotiated type and label accordingly.

This check is required — failing it costs points in the Content Negotiation group.

## Diagnosing your bug — q-values and the three patterns

### How q-values work

When a client sends multiple types in Accept, it can attach **q-values** (quality factors) between 0.0 and 1.0 to express *relative* preference:

```
Accept: text/markdown;q=1.0, text/html;q=0.5, */*;q=0.1
```

Meaning: "I really want markdown. I'll take HTML as a backup. Anything else is a last resort."

When no q-value is given, it defaults to 1.0. So `Accept: text/markdown, text/html, */*` gives all three the same weight. RFC 9110 leaves ties between equal weights to the server, and the common convention is to break them by **order in the header**. A server following that convention picks markdown.

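A proper Accept negotiator (Express `req.accepts()`, the `negotiator` npm package, Python `werkzeug`, Go `goautoneg`) handles the parsing automatically: it reads q-values, ranks specific types above wildcards, and picks the best match the server can serve. Tie-breaking between equal q-values is where libraries differ. `negotiator` (and therefore Express) uses header order, while others may fall back to the order of the server's own list, so test the tie case against your library before relying on it.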

### The three bug patterns we see in the wild

If your site fails the "Agent UA gets non-HTML" check, the cause is almost always one of these:

**Pattern 1: Substring matching.** Code that checks if the Accept header *contains* a type, in a fixed if-else order. Example:

```javascript
// WRONG — order of checks, not order in Accept, wins
if (accept.includes('text/html')) return html;
else if (accept.includes('text/markdown')) return markdown;
```

Client sends `Accept: text/markdown, text/html` → server returns HTML because `text/html` is in the string. Preference order from the client is ignored entirely.

**Pattern 2: Framework default that serves HTML on `*/*`.** Some frameworks treat `*/*` in Accept as a license to fall back to HTML, even when explicit non-HTML types are listed earlier. Rails 8's `respond_to` is a notable example:

```
Accept: text/markdown, */*       → Rails returns HTML (markdown ignored)
Accept: text/markdown, text/html → Rails returns markdown (no */*, honors order)
```

**Pattern 3: Server-internal preference order + q-values ignored.** The server has its own priority list (often hardcoded) and picks whichever type from the Accept header ranks highest *on the server's list*, not on the client's. q-values aren't parsed at all:

```
Accept: text/plain;q=0.9, text/html;q=0.5 → returns HTML
                                            (server prefers html despite q-values
                                             explicitly favoring plain)
```

The smoking gun for pattern 3 is q-values being ignored. If the same site returns HTML for the row above and markdown for `Accept: text/plain, text/markdown` (markdown won despite plain being listed first), it's pattern 3.

### A quick diagnostic test

Run these five curl commands against your homepage. The pattern in the responses tells you which bug you have:

```bash
curl -sI -H "Accept: text/markdown" YOUR_SITE/
curl -sI -H "Accept: text/markdown, text/html, */*" YOUR_SITE/
curl -sI -H "Accept: text/plain, text/markdown" YOUR_SITE/
curl -sI -H "Accept: text/markdown, */*" YOUR_SITE/
curl -sI -H "Accept: text/plain;q=0.9, text/html;q=0.5" YOUR_SITE/
```

- If only the first returns markdown: pattern 1 (substring matching).
- If the first three return markdown but the fourth returns HTML: pattern 2 (`*/*` fallback).
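- If you serve `text/plain` but the fifth returns HTML (its q-values favor plain) and the third returns markdown (plain is listed first): pattern 3 (server preference + q-values ignored). If you serve only markdown, HTML is the correct answer to the fifth probe, so swap `text/plain` for `text/markdown` in that command to run the same test.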
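The same five probes can be scripted. This is a hedged sketch that assumes Node 18+ for the global `fetch`. `runProbes()` and `diagnose()` are hypothetical helpers. `diagnose()` reads "returns markdown" loosely as "returns anything but HTML", and it flags pattern 3 on the fifth probe alone, which assumes your site serves `text/plain`. Treat its answer as a starting point, not a verdict.

```javascript
// The five diagnostic probes from the curl commands above, in the same order.
const PROBES = [
  'text/markdown',
  'text/markdown, text/html, */*',
  'text/plain, text/markdown',
  'text/markdown, */*',
  'text/plain;q=0.9, text/html;q=0.5',
];

// Send each probe as a HEAD request (like curl -I) and collect the response media types.
// fetchImpl defaults to the global fetch; pass a stub to test offline.
async function runProbes(siteUrl, fetchImpl = fetch) {
  const types = [];
  for (const accept of PROBES) {
    const res = await fetchImpl(siteUrl, { method: 'HEAD', headers: { Accept: accept } });
    types.push((res.headers.get('content-type') || '').split(';')[0].trim().toLowerCase());
  }
  return types;
}

// Rough heuristic mirroring the bullets above ("markdown" read as "anything but HTML").
// Pattern 3 assumes the site serves text/plain.
function diagnose(types) {
  const html = types.map((t) => t === 'text/html');
  if (!html[0] && html.slice(1).every(Boolean)) return 'pattern 1: substring matching';
  if (!html[0] && !html[1] && !html[2] && html[3]) return 'pattern 2: */* falls back to HTML';
  if (html[4]) return 'pattern 3: server preference, q-values ignored';
  return 'no known pattern';
}

// Usage: runProbes('https://example.com/').then((types) => console.log(types, diagnose(types)));
```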

The fix recipe is the same in all three cases: replace whatever ad-hoc selection logic you have with a proper Accept negotiator from the list above.

## Inline vs 302 redirect — what to do

Two patterns for serving agent-friendly content at your homepage:

**Inline** — same URL returns different bodies based on Accept header.

```
GET /  + Accept: text/html      →  200 + HTML
GET /  + Accept: text/markdown  →  200 + markdown
```

**Redirect** — server sends agent requests to a separate canonical URL.

```
GET /  + Accept: text/markdown  →  302, Location: /llms.txt
GET /llms.txt                   →  200 + markdown
```

**Use inline.** RFC 9110 does not mandate either approach, but [§12.2](https://www.rfc-editor.org/rfc/rfc9110#name-reactive-negotiation) spells out the costs of redirect-style (reactive) negotiation, where the client has to make a second request to reach the representation it wants. It "suffers from the disadvantages of transmitting a list of alternatives to the user agent ... and needing a second request to obtain an alternate representation," and the specification "does not define a mechanism for supporting automatic selection."

Every major content-negotiation-aware site we tested uses inline:

- **GitHub API** — same URL varies on Accept (`application/vnd.github+json` vs `application/vnd.github.html+json`), no redirect
- **Stripe docs** — `docs.stripe.com/api` returns HTML or markdown from the same URL with `Vary: Accept`
- **Cloudflare developer docs** — edge converts inline, same URL
- **Vercel**, **Mintlify**, **Sanity** — all recommend inline in their public guidance for agent-friendly pages

## Why inline wins concretely

1. **Half the latency.** One HTTP fetch instead of two. HTTP/2 and HTTP/3 multiplexing do not eliminate the redirect cost — the client still has to receive the 302, parse `Location`, and issue a new request.
2. **URL fidelity.** The URL the agent reports to the user is the URL the user actually asked about. With 302, the agent ends up at `/llms.txt` — a different URL than the homepage.
3. **Cleaner caching.** Inline with `Vary: Accept` lets caches store both representations under one URL key. With 302, the response at `/` still depends on Accept (so it still needs `Vary: Accept`), and caches also have to keep a second URL coherent with it.
4. **No magic agent-only URL.** Inline keeps the URL space unified — humans and agents hit the same URL; the server picks what to serve based on Accept.

## What AgentGrade checks

The `Inline content negotiation` check sends an agent-shaped request (`claude-code/1.0.0` UA with `Accept: text/markdown, text/html, */*`) and verifies the response does not end up at a different URL than a browser request would. Specifically: if the agent fetch was redirected to a path that the browser fetch was not, the check fails.

Universal redirects that affect everyone (HTTPS upgrade, trailing-slash normalization) are not penalized — only agent-specific redirects to a separate URL.
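A rough sketch of this comparison, assuming a `fetch` that follows redirects and exposes the final address as `response.url`, as the Fetch API does. `inlineCheck()` and `landing()` are hypothetical helpers, not AgentGrade's implementation, and the browser UA string is an illustrative placeholder. Ignoring the scheme and a trailing slash keeps HTTPS upgrades and slash normalization from counting as failures.

```javascript
// Illustrative request profiles: a browser-like fetch and the agent fetch described above.
const BROWSER_HEADERS = {
  'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
  Accept: 'text/html,application/xhtml+xml,*/*;q=0.8',
};
const AGENT_HEADERS = {
  'User-Agent': 'claude-code/1.0.0',
  Accept: 'text/markdown, text/html, */*',
};

// Host + path of a final URL, ignoring the scheme and any trailing slash.
function landing(finalUrl) {
  const u = new URL(finalUrl);
  return u.host + (u.pathname.replace(/\/+$/, '') || '/');
}

// Passes when the agent ends up where a browser ends up.
async function inlineCheck(siteUrl, fetchImpl = fetch) {
  const [browser, agent] = await Promise.all([
    fetchImpl(siteUrl, { headers: BROWSER_HEADERS, redirect: 'follow' }),
    fetchImpl(siteUrl, { headers: AGENT_HEADERS, redirect: 'follow' }),
  ]);
  return landing(agent.url) === landing(browser.url);
}
```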

## How to fix

Replace your 302 logic with inline negotiation. Express example:

```javascript
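// `app` is your Express app; buildLlmsTxt() is your own content function (hypothetical name)
const path = require('path');

app.get('/', async (req, res) => {
  res.vary('Accept'); // the body depends on Accept, so caches must key on it
  const best = req.accepts(['text/html', 'text/markdown', 'text/plain']);
  if (best === 'text/markdown' || best === 'text/plain') {
    // Serve inline, labeled with the negotiated type, instead of redirecting to /llms.txt
    return res.type(best).send(await buildLlmsTxt());
  }
  // Browsers (and requests with no Accept header) get HTML; sendFile needs an absolute path
  res.sendFile(path.join(__dirname, 'index.html'));
});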
```

The `/llms.txt` route can still exist as a separate URL — both routes call the same content function. Sites that fetch `/llms.txt` directly still work; sites that hit `/` with an agent Accept also work, in one request.

This check is emerging (optional) today — it does not yet penalize sites that use 302. It will graduate to required once industry adoption of inline is broad enough that the few remaining 302-based sites are clearly the outliers.
