
How to get your products cited by AI search (ChatGPT, Perplexity, Google AI)

A growing share of product research now happens inside an answer engine. A buyer types "best 90-minute fire-rated door closer for a hospital corridor" into ChatGPT, Perplexity, or Google's AI Mode, reads one synthesized answer, and clicks two or three cited links. If your product is in that answer, you're on the shortlist before a salesperson is involved. If it isn't, you don't get a second chance to rank lower — you're just absent.

This is not SEO with a new coat of paint, and it is not a content-marketing problem. Answer engines don't reward the cleverest copy; they reward the page a model can read, extract a specific fact from, and trust enough to attribute. That makes citation mostly a product-data and technical-hygiene problem, which is good news: it's fixable with concrete work rather than guesswork.

This guide walks through how these engines actually decide what to cite, the steps to make your catalog citable, where the major engines differ, how to measure whether any of it is working, and the pitfalls that quietly keep good products invisible. It's written to be even-handed: some of this is plumbing you should fix regardless of vendor, and some of it is genuinely hard at catalog scale, which is where tooling earns its keep.

How AI search actually decides what to cite

Every major answer engine runs a version of the same three-step pipeline. Understanding it tells you exactly where to intervene.

  1. Retrieve. The engine turns the buyer's prompt into one or more search queries and pulls a set of candidate pages from an index — Bing and OpenAI's own index for ChatGPT, Google's index for AI Overviews and AI Mode, Perplexity's blended index plus live fetches. If your page isn't retrievable for the query, nothing else matters.
  2. Read. The engine fetches the candidate pages and parses the rendered content. It is looking for specific, extractable facts: a spec, a compatibility statement, a price, a use case. Anything locked in an image or a downloadable PDF, or that appears only after JavaScript runs, is often invisible at this step.
  3. Synthesize and attribute. The model writes one answer and cites the sources it actually used for each claim. Citation is earned at the sentence level: the model pulls a fact and attributes it to the page that fact came from.

The practical takeaway: a citation requires you to win all three steps. Be in the index, be readable when fetched, and carry the exact fact the model needs to answer the question. Most catalogs fail at step two or three even when they rank fine in classic search.

Step 1: Make sure the engine can crawl, fetch, and render your pages

Before optimizing content, confirm the machines can see it. This is where a surprising number of catalogs lose silently.

  • Don't block the retrieval crawlers. Separate the two kinds of AI bots. Training controls (the GPTBot and ClaudeBot crawlers, plus Google-Extended, which is a robots.txt token rather than a separate crawler) govern data collected to train models. Retrieval/search crawlers (OAI-SearchBot and ChatGPT-User for ChatGPT, PerplexityBot and Perplexity-User for Perplexity, plain Googlebot for Google's AI features) fetch pages to answer live queries. To be cited, you must allow the retrieval crawlers. Blocking them in robots.txt or, more commonly, at the CDN/WAF bot-management layer removes you from AI answers entirely. Check both places: bot rules in products such as Cloudflare and Akamai can block AI agents by default, depending on the configuration.
  • Server-render the facts. Open a product page, view source (not the inspector), and search the raw HTML for your key spec, price, and availability. If they only show up in the rendered DOM after JS executes, assume the fetcher missed them. ChatGPT-User and PerplexityBot do limited or no JavaScript rendering. Server-side render or pre-render the content that matters.
  • Get the facts out of images and PDFs. A spec sheet attached as a PDF, or a dimensions table baked into a product photo, is effectively unreadable. Put every attribute in real HTML text.
  • Keep pages fast and fetchable. Timeouts, aggressive rate limiting, and login walls all read as "no content" to a crawler on a deadline.
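The view-source check above can be scripted. The sketch below is one possible approach, not a description of how any engine's fetcher works. It assumes you already have the raw, un-rendered HTML as a string (fetched however you like), drops script and style content using the standard-library HTMLParser, and reports which key facts are missing from the remaining text. The helper name missing_facts, the sample markup, and the DC-90 product are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_facts(raw_html, facts):
    """Return the facts that do not appear in the server-rendered text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace and compare case-insensitively.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [fact for fact in facts if fact.lower() not in text]


# Hypothetical pages: one server-rendered, one that injects facts with JS.
SERVER_RENDERED = (
    "<html><body><h1>DC-90 Door Closer</h1><table>"
    "<tr><th>Fire rating</th><td>90-minute fire-rated</td></tr>"
    "<tr><th>Price</th><td>$149.00</td></tr>"
    "</table></body></html>"
)
JS_ONLY = (
    '<html><body><div id="root"></div><script>'
    'render({"rating": "90-minute fire-rated", "price": "$149.00"})'
    "</script></body></html>"
)
FACTS = ["90-minute fire-rated", "$149.00"]

print(missing_facts(SERVER_RENDERED, FACTS))
print(missing_facts(JS_ONLY, FACTS))
```

Any fact this reports as missing is worth checking by hand. The script deliberately ignores script content, so a value that exists only in a hydration payload counts as absent, which matches the conservative assumption in the bullet above.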

Step 2: Put extractable facts on the page — structure beats prose

Once a page is reachable, the model needs to lift a specific, unambiguous fact from it. Marketing prose buries facts; structure surfaces them.

  • Use a real spec table. A labeled HTML table with attribute, value, and unit (Voltage | 24 V DC, Thread | 1/2" NPT) is the single highest-leverage format. Models parse it cleanly and quote it confidently.
  • Add structured data. Mark up Product with Offer, AggregateRating, and Review, and add FAQPage for the questions buyers ask. Schema is strongest for Google's AI features, which lean directly on the structured index; for ChatGPT and Perplexity it's a secondary signal but still improves how reliably your facts are parsed. Validate with Google's Rich Results Test so it isn't silently broken.
  • State identifiers explicitly. GTIN/UPC, MPN, brand, and series should appear as labeled text on the page. These are how an engine resolves "this page is about that product" and reconciles it with reviews and mentions elsewhere.
  • Write the answer the buyer is searching for. Add a short, plain-language block covering what the product is for, what it's compatible with, and when to choose it over the alternative. Models lift these sentences almost verbatim into "best for…" and "X vs Y" answers.
  • Don't make the model infer. If a closer is rated for 90-minute fire doors, say "90-minute fire-rated" in text. Don't rely on a certification logo or a linked datasheet to carry it.
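One way to keep the spec table and the structured data from drifting apart is to generate both from the same attribute rows. The following is a minimal sketch under stated assumptions: the function names, the product, the price, and the placeholder GTIN are all hypothetical, and availability is hard-coded to in stock for brevity. The property names (Product, Brand, Offer, PropertyValue, mpn, gtin, additionalProperty, unitText) come from the schema.org vocabulary. Validate real output before relying on it.

```python
import json
from html import escape


def spec_table_html(specs):
    """Render (attribute, value, unit) rows as a labeled HTML table."""
    rows = []
    for name, value, unit in specs:
        cell = f"{value} {unit}".strip()
        rows.append(f"<tr><th>{escape(name)}</th><td>{escape(cell)}</td></tr>")
    return "<table><tbody>" + "".join(rows) + "</tbody></table>"


def product_jsonld(name, brand, mpn, gtin, specs, price, currency):
    """Build a JSON-LD script tag for a schema.org Product with one Offer."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "mpn": mpn,
        "gtin": gtin,
        "additionalProperty": [
            {"@type": "PropertyValue", "name": n, "value": v, "unitText": u}
            for n, v, u in specs
        ],
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            # Assumption for the sketch: the item is in stock.
            "availability": "https://schema.org/InStock",
        },
    }
    # Escape "</" so the payload cannot close the script tag early.
    payload = json.dumps(data).replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


# Hypothetical product; the spec rows reuse the examples from the text.
SPECS = [("Voltage", "24", "V DC"), ("Thread", '1/2"', "NPT")]
table = spec_table_html(SPECS)
jsonld = product_jsonld(
    "Example Valve EV-24", "ExampleBrand", "EV-24", "00012345678905",
    SPECS, "149.00", "USD",
)
print(table)
print(jsonld)
```

Because both outputs come from SPECS, a corrected value appears in the visible table and in the markup at the same time.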

Step 3: Match the buyer's question, not your catalog

Answer engines are queried in natural language about a situation, not a SKU. The gap between how you describe a product and how a buyer describes their problem is where citations are won or lost — especially in B2B.

  • Cover application and compatibility. B2B buyers ask in terms of the job: "pump for chemical transfer at 80°C," "breaker compatible with a Square D QO panel," "gasket for a 4-inch flange, food-grade." If your page only lists the model number and a parts-list spec, the model can't connect it to that intent.
  • Include cross-references and substitutions. Part-number cross-references, "replaces" relationships, and equivalents are high-value because buyers and engines both search by the competitor's number.
  • Answer the comparison directly. For categories where "X vs Y" is a real query, a clear, factual comparison (not a hedge-everything page) gives the model something to cite on both sides.
  • Use the buyer's vocabulary and units. Mirror the terms, abbreviations, and units your customers actually type. A model matching a query to a page rewards that overlap.
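The cross-reference point can be made concrete with a small lookup structure. This is only an illustrative sketch: the part numbers are invented, and the normalization rule (uppercase, keep only A-Z and 0-9) is an assumption that may be too aggressive for catalogs where punctuation distinguishes parts.

```python
import re
from collections import defaultdict


def normalize_part_number(part_number):
    """Uppercase and drop every character that is not A-Z or 0-9."""
    return re.sub(r"[^A-Z0-9]", "", part_number.upper())


def build_cross_reference(pairs):
    """Map each normalized competitor number to the SKUs that replace it."""
    index = defaultdict(list)
    for own_sku, competitor_pn in pairs:
        index[normalize_part_number(competitor_pn)].append(own_sku)
    return dict(index)


def replaces_line(own_sku, pairs):
    """Plain-text sentence to publish on the product page, or '' if none."""
    others = [pn for sku, pn in pairs if sku == own_sku]
    return f"{own_sku} replaces: {', '.join(others)}." if others else ""


# Hypothetical (own SKU, competitor part number) pairs.
PAIRS = [("AB-1200", "XZ 450-B"), ("AB-1200", "Q9.771"), ("AB-1300", "XZ-460")]
index = build_cross_reference(PAIRS)

print(index.get(normalize_part_number("xz450b")))
print(replaces_line("AB-1200", PAIRS))
```

The index serves lookups by a competitor's number however it is typed, while replaces_line produces the visible text that puts the relationship on the page where a fetcher can read it.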

This is the step where Anglera is honestly relevant: enriching every SKU against buyer signals — how the buyer searches, compares, and decides — and writing that structured content back to your PIM is exactly the work that turns a thin catalog row into something a model can match to a real question. The point isn't to publish more words; it's to make each SKU answer the question being asked.

Step 4: Earn corroboration off your own site

Engines don't trust a single self-published page on its own, particularly for a recommendation. They cross-check. Third-party signals are often the deciding factor in which of several sellers gets cited for the same product.

  • Keep your facts consistent everywhere the product appears. Manufacturer page, distributor listings, marketplaces, directories — same GTIN, same name, same specs. Inconsistency fractures entity resolution and makes the model less confident citing any of you.
  • Win the sources these engines over-index on. ChatGPT and Perplexity lean heavily on Reddit, forums, YouTube, and trade-specific communities. A genuinely helpful, accurate answer in the right subreddit or industry forum can get pulled into an AI answer that your own page never would. This is participation, not spam — low-effort link drops get filtered and can hurt you.
  • Pursue reviews and ratings. AggregateRating and real review text give the model corroboration it can quote, and "best" queries strongly favor products with visible review signal.
  • Get listed in the directories and comparison sites that already rank for your category. Those pages are frequent citation sources, and being in them is a second path into the answer.
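The consistency check in the first bullet lends itself to a simple audit. The sketch below assumes you can export each channel's listing as a record with a channel name, a GTIN, and an attribute dictionary. That record shape, the find_conflicts name, and the sample data are hypothetical. It groups listings by GTIN and flags any attribute that carries more than one distinct value after light normalization.

```python
from collections import defaultdict


def find_conflicts(listings):
    """Return {gtin: {attribute: {value: [channels]}}} for disagreeing attributes."""
    seen = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for item in listings:
        for attr, value in item["attributes"].items():
            # Ignore differences in case and whitespace only.
            norm = " ".join(str(value).split()).casefold()
            seen[item["gtin"]][attr][norm].append(item["channel"])

    conflicts = {}
    for gtin, attrs in seen.items():
        bad = {a: dict(vals) for a, vals in attrs.items() if len(vals) > 1}
        if bad:
            conflicts[gtin] = bad
    return conflicts


# Hypothetical listings for one product across three channels.
LISTINGS = [
    {"channel": "manufacturer", "gtin": "00012345678905",
     "attributes": {"name": "Example Valve EV-24", "voltage": "24 V DC"}},
    {"channel": "distributor", "gtin": "00012345678905",
     "attributes": {"name": "Example Valve EV-24", "voltage": "24 v dc"}},
    {"channel": "marketplace", "gtin": "00012345678905",
     "attributes": {"name": "Example Valve EV-24", "voltage": "12 V DC"}},
]

conflicts = find_conflicts(LISTINGS)
print(conflicts)
```

Here only the voltage is flagged, with the marketplace listing as the outlier. Stricter or looser normalization (unit conversion, abbreviation handling) is a judgment call that depends on the catalog.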

Where the engines differ (and how to prioritize)

The fundamentals are shared, but the three big surfaces weight things differently. Optimize for the pipeline first, then tune.

  • Google AI Overviews / AI Mode. Built on Google's existing index, so what already ranks and what's marked up with valid schema has a head start. AI Mode uses query fan-out (it decomposes one prompt into many sub-queries), which rewards deep, complete coverage of a topic and its edge cases. Note: AI Overviews draw on pages crawled by regular Googlebot, so blocking Google-Extended (a training control) does not remove you from them.
  • ChatGPT search. Uses Bing plus OpenAI's own index and crawler. Confirm Bing actually indexes your pages (many sites neglect Bing Webmaster Tools). ChatGPT favors authoritative, well-structured pages and pulls noticeably from Reddit and established reference sources.
  • Perplexity. The most citation-dense engine and the most real-time. It rewards freshness, clear structure, and forum/community corroboration, and it shows its sources prominently, so a citation here drives visible referral clicks.

If you have to sequence the work: fix crawlability and rendering (helps all three), then schema and Bing indexing (Google and ChatGPT), then community corroboration (ChatGPT and Perplexity).

How to measure whether it's working

AI citation is harder to measure than rankings, but "unmeasurable" is an excuse, not a fact. Run three loops.

  1. Prompt panel. Write a fixed set of 20–40 real buyer prompts for your category — application-based, comparison, and substitution queries. Run them monthly across ChatGPT, Perplexity, and Google AI Mode and log, per prompt: were you cited, which page, and which competitors showed up. This is your scoreboard. Use a clean/logged-out session so personalization doesn't flatter you.
  2. Server logs and referral analytics. Track hits from OAI-SearchBot, ChatGPT-User, PerplexityBot, and Googlebot to confirm the fetchers are actually reaching your product pages. Watch referral traffic from chatgpt.com, perplexity.ai, and gemini.google.com — note that some assistant traffic arrives without a referrer, so logs plus a brand-tracking question ("how did you hear about us") fill the gap.
  3. Tooling, with eyes open. A category of AI-visibility trackers (Profound, Peec, Otterly, and others) automates the prompt-panel loop. They're useful for scale and trend lines, but they sample and estimate — treat them as directional, and keep a small manual panel as ground truth.
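For the manual prompt panel, the log can be as simple as one record per prompt per engine. The sketch below shows one possible shape for that record and a summary function. The PanelResult fields, the scoreboard name, and the sample rows are all hypothetical, and the results are assumed to be entered by hand after each monthly run.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PanelResult:
    engine: str
    prompt: str
    cited: bool
    cited_page: str = ""
    competitors: list = field(default_factory=list)


def scoreboard(results):
    """Return (citation rate per engine, competitors ranked by appearances)."""
    runs, hits, rivals = Counter(), Counter(), Counter()
    for r in results:
        runs[r.engine] += 1
        hits[r.engine] += int(r.cited)
        # Count each competitor once per answer, even if listed twice.
        rivals.update(set(r.competitors))
    rates = {engine: hits[engine] / runs[engine] for engine in runs}
    return rates, rivals.most_common()


# Hypothetical results from one monthly run.
RESULTS = [
    PanelResult("ChatGPT", "closer for 90-minute fire door", True,
                "/products/dc-90", ["Rival A"]),
    PanelResult("ChatGPT", "DC-90 vs Rival A closer", False, "",
                ["Rival A", "Rival B"]),
    PanelResult("Perplexity", "closer for 90-minute fire door", True,
                "/products/dc-90", []),
    PanelResult("Google AI Mode", "closer for 90-minute fire door", False, "",
                ["Rival A"]),
]

rates, rivals = scoreboard(RESULTS)
print(rates)
print(rivals)
```

Keeping the prompt list fixed from month to month is what makes the rates comparable over time.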

Expect lag. Re-crawl and re-synthesis take weeks, and AI referral volume is still a minority of traffic for most B2B catalogs today — growing, but not yet where you stop caring about classic search.
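The server-log loop can start with a short script. The following sketch assumes access logs in the common "combined" format and product pages under a /products/ path. Both are assumptions to adjust for your stack. The sample log lines use simplified, made-up user-agent strings that merely contain the bot tokens named above. User agents can be spoofed, so treat the counts as indicative rather than verified.

```python
import re
from collections import Counter

RETRIEVAL_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Perplexity-User", "Googlebot",
]

# Combined log format: ... "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def bot_hits(log_lines, path_prefix="/products/"):
    """Count product-page requests per (bot token, HTTP status)."""
    hits = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match or not match["path"].startswith(path_prefix):
            continue
        user_agent = match["ua"].lower()
        for bot in RETRIEVAL_BOTS:
            if bot.lower() in user_agent:
                hits[(bot, match["status"])] += 1
                break
    return hits


# Hypothetical log lines with simplified user-agent strings.
LOG = [
    '203.0.113.5 - - [01/May/2025:10:00:00 +0000] "GET /products/dc-90 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.9 - - [01/May/2025:10:01:00 +0000] "GET /products/dc-90 HTTP/1.1" '
    '403 310 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.7 - - [01/May/2025:10:02:00 +0000] "GET /about HTTP/1.1" '
    '200 900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '198.51.100.8 - - [01/May/2025:10:03:00 +0000] "GET /products/dc-90 HTTP/1.1" '
    '200 5120 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = bot_hits(LOG)
print(hits)
```

Grouping by status code is deliberate: a retrieval bot that shows up only with 403 or 429 responses suggests a block at the CDN/WAF layer rather than a content problem.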

Common pitfalls — and the distributor's specific problem

Most failures cluster into a short list:

  • Facts trapped in PDFs and images. The most common and most fixable. If the spec isn't HTML text, it doesn't exist to the model.
  • JS-gated content. Pages that look complete in a browser but are empty in raw HTML. The fetcher sees the empty version.
  • Accidental crawler blocks at the WAF/CDN. Teams add bot protection, sweep up the retrieval crawlers, and vanish from AI answers without realizing it.
  • Marketing fluff with no extractable fact. "Industry-leading performance" gives a model nothing to quote. A number does.
  • Inconsistent attributes across channels that break entity resolution, so no version of you is confidently cited.

For distributors and many retailers there's a harder, structural pitfall: you publish the manufacturer's copy verbatim, identical to ten competitors selling the same SKU. When every listing is the same words, an engine has no reason to cite you specifically, and the model often defaults to the manufacturer or the most-corroborated seller. Thin, duplicated catalog rows are invisible by construction. The fix is real per-SKU enrichment — distinct, structured, application-aware content tied to consistent identifiers, at catalog scale rather than on a few hero pages. That scale is the genuinely hard part: doing it by hand across tens of thousands of SKUs is where the project usually stalls, and where an enrichment layer that fills the data and writes it back to your PIM is worth evaluating against the cost of staying invisible.

Step-by-step checklist

  • Confirm the retrieval crawlers (OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, Googlebot) are allowed in robots.txt AND at your CDN/WAF bot-management layer
  • View raw page source and verify each SKU's key specs, price, and availability are in the server-rendered HTML, not JS-injected
  • Move every spec out of PDFs and images into a labeled HTML table with attribute, value, and unit
  • Add and validate Product, Offer, AggregateRating, Review, and FAQPage schema (Rich Results Test)
  • State GTIN/UPC, MPN, brand, and series as visible text, and keep them identical everywhere the product appears
  • Write a plain-language block on each page covering what it's for, what it's compatible with, and when to pick it over the alternative
  • Add part-number cross-references, equivalents, and 'replaces' relationships so substitution queries match
  • Ensure your pages are indexed in Bing (Bing Webmaster Tools), not just Google
  • Cover the whole catalog with distinct, application-aware content — replace verbatim manufacturer copy on duplicated SKUs
  • Seed accurate corroboration: consistent manufacturer/marketplace listings, relevant directories, and genuinely helpful Reddit/forum answers
  • Run a fixed 20–40 prompt panel monthly across ChatGPT, Perplexity, and Google AI Mode; log citations and competitors
  • Monitor server logs and referral analytics for AI bot hits and assistant referrals to confirm pages are being fetched

Frequently asked questions

Does blocking Google-Extended remove my products from Google's AI Overviews?

No. Google-Extended only governs whether your content is used for Gemini model training and grounding. AI Overviews and AI Mode draw on pages crawled by ordinary Googlebot and held in your existing search index, so blocking Google-Extended doesn't take you out of those answers. What matters for actual AI citation is access for the per-engine retrieval bots, both in robots.txt and in your CDN/WAF rules.

If I'm worried about my content being used to train models, can I still get cited?

Yes, if you separate the two crawler types. You can block the training crawlers (GPTBot, ClaudeBot, Google-Extended) while still allowing the retrieval/search crawlers (OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, Googlebot) that fetch pages to answer live queries. Citation depends on allowing the retrieval crawlers; blocking those is what makes you invisible.
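As an illustration of that split, the sketch below embeds one possible robots.txt and checks it with Python's standard-library urllib.robotparser. The file contents and the example.com URL are assumptions for the example, not a recommended policy. Python's parser is only an approximation of how each crawler interprets the file, and the check says nothing about CDN/WAF rules, which need to be verified separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block training tokens, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

TRAINING = ["GPTBot", "ClaudeBot", "Google-Extended"]
RETRIEVAL = [
    "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Perplexity-User", "Googlebot",
]


def crawl_permissions(robots_txt, agents, url="https://example.com/products/dc-90"):
    """Return {agent: True/False} for whether robots.txt lets the agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


training = crawl_permissions(ROBOTS_TXT, TRAINING)
retrieval = crawl_permissions(ROBOTS_TXT, RETRIEVAL)
print(training)
print(retrieval)
```

Running the same function against your live robots.txt text is a quick regression check after any edit to the file.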

Is schema markup required to be cited?

No engine strictly requires it, but it matters most for Google's AI features and is a strong helper everywhere. Google's AI Overviews and AI Mode lean directly on the structured index, so valid Product/Offer/Review/FAQ schema meaningfully improves your odds. ChatGPT and Perplexity rely on it less directly, but clean schema still makes your facts easier to parse and quote, so there's no scenario where correct markup hurts.

How long after I make changes will products start showing up in AI answers?

Plan on weeks, not days. The engines have to re-crawl your updated pages, re-index them, and incorporate them into freshly synthesized answers. Perplexity tends to reflect changes fastest because it fetches in near real time; Google and ChatGPT lag more. Use a monthly prompt panel to track the trend rather than checking daily.

Why do my competitors get cited for the exact same product I sell?

Usually because their page carries an extractable, distinct fact and yours carries verbatim manufacturer copy that ten other sellers also publish. When listings are identical, the engine has no reason to single you out and often defaults to the manufacturer or the most-corroborated seller. Distinct, application-aware, structured content per SKU — plus consistent identifiers and some third-party corroboration — is what gives a model a reason to cite you specifically.

Will optimizing for AI search hurt my traditional SEO?

No. The work overlaps heavily with good SEO and structured-data hygiene: faster pages, server-rendered facts, valid schema, complete attributes, and content that matches real queries all help classic rankings too. AI citation is additive. Traditional search is still the larger traffic source for most B2B catalogs today, so treat answer engine optimization (AEO) as a complement you build on top of solid SEO, not a replacement for it.
