Nine things every site needs to be findable in 2026 — five from classic SEO hygiene, four from AI-search hygiene. You can check all nine yourself in about 45 minutes.

This is the same checklist the AskVolume audit runs automatically. We're publishing it as a standalone reference because (a) you should be able to spot-check your own site without us, and (b) if you understand the underlying signal each item tests for, the recommendations make more sense.

Print this and run it against your homepage and your top three category pages. Score each page out of 9. A score of 6 or below is a problem; 4 or below means most AI search engines can't cite you reliably.

1. Unique title tag, 45–60 characters

Check it

View source on your page, find <title>. Count characters. The text inside should be unique to this page (not repeated across your site) and between roughly 45 and 60 characters.
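If you'd rather not count characters by hand, the length half of this check can be scripted. The following is a minimal sketch using only Python's standard library; the names TitleParser and check_title are ours for illustration, and the 45 and 60 defaults simply mirror the range above. It does not check uniqueness across the site, which still requires comparing titles between pages.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones (e.g. inside inline SVG) are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_title(html, min_len=45, max_len=60):
    """Return (title, length, in_range) for the given HTML source."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join(parser.title.split())  # collapse runs of whitespace
    return title, len(title), min_len <= len(title) <= max_len
```

Pass it the page source as a string (saved from View source, for example) and read the third value of the result as the pass/fail for length.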

Why it matters

The title is the single strongest on-page signal for both Google and AI engines. It tells the engine what the page is about. Too short and you give up snippet real estate; too long and Google truncates it.

Fix

Rewrite the title to: specific topic + brand. For a company page about pricing: "Pricing — [Company]" (often too short). Better: "Pricing for teams of 5–500 — [Company]" (specific, scannable, within range).

2. Meta description that survives quotation

Check it

Find <meta name="description"> in the head. Read the content. Imagine an AI engine quoting it as the snippet for your page. Does it stand on its own? Does it contain a verb? A concrete claim?
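Whether the description stands on its own is a judgment call, but pulling it out of the source so you can read it in isolation is easy to script. A minimal sketch with Python's standard library follows; MetaDescriptionParser and meta_descriptions are illustrative names, not part of any tool mentioned here.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of every <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The name attribute's value is matched case-insensitively.
        if (attr.get("name") or "").lower() == "description":
            self.descriptions.append((attr.get("content") or "").strip())


def meta_descriptions(html):
    """Return all meta descriptions found in the HTML source, in order."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.descriptions
```

An empty list means the tag is missing. More than one entry usually means a template and a plugin are both emitting a description, which is worth cleaning up before you rewrite the text.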

Why it matters

AI engines frequently quote the meta description verbatim as the citation snippet. Google rewrites it often (roughly 70% of the time by some estimates), but AI engines respect it more. A weak meta description leaks into every AI-generated answer that cites you.

Fix

Rewrite as a self-contained sentence with a concrete claim. Bad: "Acme makes great products for businesses." Good: "Acme helps SaaS teams cut customer-support time by 40% with an AI knowledge base that learns from past tickets."

3. Exactly one H1 per page

Check it

In the page source, count <h1> tags. There should be exactly one. Some frameworks accidentally render two (one in the header, one in the body).
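Counting by eye is error-prone on long pages, so here is a minimal sketch of the same count in Python. H1Counter and count_h1 are illustrative names. Note the assumption that you feed it the same HTML a crawler would receive; an H1 injected later by client-side JavaScript will not appear in the raw source.

```python
from html.parser import HTMLParser


class H1Counter(HTMLParser):
    """Counts opening <h1> tags in a page."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H1> is counted too.
        if tag == "h1":
            self.count += 1


def count_h1(html):
    """Return the number of <h1> elements in the HTML source."""
    counter = H1Counter()
    counter.feed(html)
    counter.close()
    return counter.count
```

The check passes only when count_h1 returns exactly 1.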

Why it matters

H1 is the unambiguous "this is what the page is about" signal. Multiple H1s split the signal; zero H1s force the engine to guess from text density. Both hurt citation accuracy.

Fix

Demote duplicate H1s to H2s. If your site logo is wrapped in an H1, wrap it in a regular <span> instead and reserve the H1 for the page's actual headline.

4. Article (or Product, FAQ, HowTo) JSON-LD schema

Check it

Search the page source for application/ld+json. There should be at least one such script block, ideally containing an Article, Product, FAQPage, or HowTo schema relevant to the page's content.
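To see which schema types a page actually declares, you can extract the JSON-LD blocks and list their @type values. This is a rough sketch using Python's standard library; JsonLdParser and jsonld_types are illustrative names, and the handling of "@graph" is an assumption about how some sites bundle several entities in one block. It is not a substitute for a real validator.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buf))

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)


def jsonld_types(html):
    """Return the @type values declared in the page's JSON-LD blocks."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    types = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # a block that is not valid JSON contributes nothing
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            nodes = item.get("@graph", [item])
            if not isinstance(nodes, list):
                nodes = [nodes]
            for node in nodes:
                if not isinstance(node, dict):
                    continue
                t = node.get("@type")
                if isinstance(t, list):
                    types.extend(t)
                elif t:
                    types.append(t)
    return types
```

An empty result means either no JSON-LD is present or the blocks failed to parse; both count as a fail for this item.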

Why it matters

Structured data lets engines extract entities, dates, prices, and relationships without parsing prose. It's also a credibility signal: sites that bother with schema are more carefully built.

Fix

For a blog post, add this minimum Article schema:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your article title",
  "datePublished": "2026-05-25",
  "dateModified": "2026-05-25",
  "author": { "@type": "Organization", "name": "Your Brand" },
  "publisher": { "@type": "Organization", "name": "Your Brand" }
}
</script>

For a product page, swap to Product schema with name, description, offers, and a brand. For an FAQ section, FAQPage. Test your markup in Google's Rich Results Test.

5. Working /sitemap.xml

Check it

Visit https://yourdomain.com/sitemap.xml. You should see an XML document listing every important URL on your site, with <lastmod> dates.
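On a large site it is hard to eyeball which URLs are listed and which lack dates. The sketch below parses a sitemap with Python's standard library and reports entries without <lastmod>. The function names are illustrative, and it assumes a plain <urlset> sitemap in the standard sitemaps.org namespace; a sitemap index file that points to child sitemaps would need one more level of parsing.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text):
    """Return a list of (loc, lastmod_or_None) tuples from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


def missing_lastmod(entries):
    """Return the URLs that have no <lastmod> value."""
    return [loc for loc, lastmod in entries if lastmod is None]
```

Compare the returned URLs against the pages you consider important; anything absent from the list is invisible to crawlers that rely on the sitemap.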

Why it matters

Sitemaps tell crawlers your full content surface. Without one, deep pages on large sites stay partially indexed. AI engines also use sitemaps to know which URLs to consider citing for follow-up questions.

Fix

Next.js: add app/sitemap.ts that exports a function returning the URL list. WordPress: install Yoast or Rank Math; both generate sitemaps automatically. Static sites: most build tools have a sitemap plugin. Then submit your sitemap in Google Search Console under Sitemaps.

6. /robots.txt that explicitly allows what you want

Check it

Visit https://yourdomain.com/robots.txt. It should exist, allow the major search crawlers (Googlebot, Bingbot), and ideally allow the AI crawlers you want (GPTBot, ClaudeBot, PerplexityBot, anthropic-ai). It should also reference your sitemap with Sitemap: ....

Why it matters

No robots.txt means crawlers guess. Some are conservative and skip sections. A clear, permissive robots.txt removes ambiguity.

Fix

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Adjust if you want to opt out of AI training: put Disallow: / under the specific AI user-agents you want to exclude. Note that this blocks each of those crawlers entirely, so if the same crawler also feeds an answer engine's citations, you lose those citations too.
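Before deploying a robots.txt, you can test how it will be read for each crawler. This is a minimal sketch built on Python's urllib.robotparser; the check_robots name, the default agent list, and the yourdomain.com URL are placeholders. Real crawlers may interpret edge cases differently from this parser, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Crawlers named in this checklist; extend the tuple as needed.
DEFAULT_AGENTS = ("Googlebot", "Bingbot", "GPTBot", "ClaudeBot", "PerplexityBot")


def check_robots(robots_txt, agents=DEFAULT_AGENTS, url="https://yourdomain.com/"):
    """Return ({agent: allowed}, [sitemap URLs]) for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    access = {agent: parser.can_fetch(agent, url) for agent in agents}
    sitemaps = parser.site_maps() or []  # site_maps() returns None when absent
    return access, sitemaps
```

Running it on a file that disallows only GPTBot shows that crawler as blocked while the others stay allowed, which is a quick way to confirm an opt-out does exactly what you intended and nothing more.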

7. /llms.txt with a clean canonical summary

Check it

Visit https://yourdomain.com/llms.txt. It should exist and contain a one-sentence brand description, your canonical URLs, and citation guidance.

Why it matters

See our full guide: What is llms.txt. Short version: it gives AI engines the citation cheat-sheet they have to infer otherwise.

Fix

~20 minutes of work. The guide above has a template. Drop it at your root.

8. At least 200 words of body content per page

Check it

View source, strip out scripts and styles, count the words in the body. Or use a browser extension like "Word Count".
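The same count can be scripted. The sketch below, using Python's standard library, counts whitespace-separated words inside <body> while skipping <script> and <style> content; BodyWordCounter and count_body_words are illustrative names. Like the manual method, it also counts navigation and footer text, so treat a result close to 200 as a fail.

```python
from html.parser import HTMLParser


class BodyWordCounter(HTMLParser):
    """Counts words inside <body>, ignoring <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth:
            self.words += len(data.split())


def count_body_words(html):
    """Return the number of visible words in the page body."""
    counter = BodyWordCounter()
    counter.feed(html)
    counter.close()
    return counter.words
```

The item passes when count_body_words(html) is at least 200.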

Why it matters

Pages with fewer than ~200 words of body content rarely get cited. AI engines need enough text to extract a quotable answer and verify relevance. Thin pages get skipped even if they rank well in traditional search.

Fix

Add a real answer to the page's implicit question. If the page is "Pricing," that means: who is the pricing for, what does each tier include, what's the rough ROI break-even, common gotchas. 300–500 well-chosen words.

9. Fast first byte (TTFB under 800 ms, LCP under 2.5 seconds)

Check it

Run your homepage through pagespeed.web.dev. Look at the "Time to First Byte" (TTFB) and Largest Contentful Paint (LCP). Aim for TTFB under 800ms and LCP under 2.5 seconds.
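For a quick check from your own machine, TTFB can be approximated in a few lines of Python. This is a rough sketch: measure_ttfb_ms and speed_check are illustrative names, the timing includes DNS lookup, connection setup, TLS, and any redirects, and a single measurement from one location is noisier than the figures PageSpeed reports. LCP cannot be measured this way because it requires a rendering browser, so that number still has to come from pagespeed.web.dev.

```python
import time
import urllib.request


def measure_ttfb_ms(url, timeout=10.0):
    """Approximate TTFB: milliseconds from starting the request to the first body byte."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)  # stop timing as soon as one byte of the body arrives
    return (time.perf_counter() - start) * 1000.0


def speed_check(ttfb_ms, lcp_ms, ttfb_limit_ms=800.0, lcp_limit_ms=2500.0):
    """Compare measured values against the targets from this checklist."""
    ttfb_ok = ttfb_ms < ttfb_limit_ms
    lcp_ok = lcp_ms < lcp_limit_ms
    return {"ttfb_ok": ttfb_ok, "lcp_ok": lcp_ok, "passes": ttfb_ok and lcp_ok}
```

Call measure_ttfb_ms a few times and take the median before passing it to speed_check along with the LCP value you read from PageSpeed.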

Why it matters

AI crawlers have per-page time budgets. Slow pages get partially loaded or skipped. Google also penalizes slow pages in ranking, which keeps them out of the candidate pool for AI engines.

Fix

The usual suspects: enable a CDN (Cloudflare, Vercel, Fastly), cache HTML at the edge, defer non-critical JavaScript, compress images. Modern frameworks (Next.js, Astro, SvelteKit) handle most of this automatically; older WordPress sites are where most of the speed problems live.

Scoring

Run all nine against your most important page. Each one is a binary pass/fail.

  • 9/9: You're ahead of 95% of the web. Focus on content depth and link-earning.
  • 7–8/9: Solid. Fix the gaps in a single afternoon and move on to content.
  • 5–6/9: Hygiene problems are likely capping your ceiling. Worth a focused week of work.
  • 3–4/9: AI engines probably can't cite you reliably. Start with items 1, 3, 4, and 8.
  • 0–2/9: Your site is invisible to AI search. This is the most fixable state: pick up the cheapest two and you'll see movement within weeks.
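If you record each check as pass or fail, turning the results into a band is simple arithmetic. The sketch below encodes the bands listed above; the score function and the check names in the usage example are hypothetical labels, not identifiers from any tool.

```python
# (minimum passes, band label, short advice), ordered from best to worst.
BANDS = [
    (9, "9/9", "Focus on content depth and link-earning."),
    (7, "7-8/9", "Fix the gaps, then move on to content."),
    (5, "5-6/9", "Hygiene problems are likely capping your ceiling."),
    (3, "3-4/9", "Start with items 1, 3, 4, and 8."),
    (0, "0-2/9", "Pick up the cheapest two fixes first."),
]


def score(results):
    """Return (passed, band, advice) for a dict of nine pass/fail results."""
    if len(results) != 9:
        raise ValueError("expected exactly nine check results")
    passed = sum(1 for ok in results.values() if ok)
    for floor, band, advice in BANDS:
        if passed >= floor:
            return passed, band, advice
```

For example, a page that fails only the llms.txt and speed checks would be scored with a dict such as {"title": True, "meta": True, "h1": True, "schema": True, "sitemap": True, "robots": True, "llms": False, "words": True, "speed": False}, which lands in the 7-8/9 band.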

If you'd rather not score yourself, the AskVolume audit runs all nine automatically and adds a 10th — checking whether AI engines actually cite your domain for relevant queries today. Free, no signup, report in your inbox.