
Ecommerce SEO and AI Search: A Technical Playbook

There is no page two in AI search

Google gives a shopper ten links to choose from. An AI answer engine gives them a paragraph. For a growing share of queries, the first thing a shopper sees is a synthesized answer that names two or three brands, not a list of ten. You are one of the brands named, or you do not exist for that query. There is no scrolling to position four. There is no page two.

That single shift rewrites the ecommerce SEO playbook. Backlink counts and domain authority matter less than they did. What matters now is whether a model can fetch your page, extract a clean answer from it, and find enough credible third parties saying good things about you to feel confident recommending you. This guide is the technical playbook for getting there. It moves from how answer engines actually work, to what our own data says about why most stores are invisible, to the specific fixes that move the needle, to how you measure any of it.

It is grounded in proprietary data. LimeLight's AI Search Readiness platform, built by our team and led by Senior Director and Partner John Kuefler, has extracted and scored content from thousands of ecommerce pages and run thousands of buyer-intent queries through the major answer engines. Where we cite a number below, it comes from that platform unless we link an outside source. The picture it paints is consistent: the technical bar for AI search is much higher than most brands realize, and almost no one is clearing it yet.

How AI answer engines actually work

To optimize for AI search you have to understand the pipeline. When a shopper asks an assistant "what is the best washable rug for a high-traffic entryway," roughly five things happen:

  1. Query understanding. The model rewrites the question into one or more search queries. Matching is semantic, typically against an embedding index, rather than an exact-match keyword lookup.
  2. Live retrieval. The engine fetches a batch of pages, often ten to twenty, many in real time.
  3. Extraction. Each page runs through a content extractor that strips navigation, footer, and ads and pulls the readable body. Structured data is parsed separately as clean facts.
  4. Synthesis. The model reads what survived extraction, writes an answer, and decides which sources to cite.
  5. Output. The shopper sees a short answer naming a few brands, with citation links to the sources the model leaned on.

Two things should jump out. If your content does not survive step three, you cannot appear in step four. And the citations in step five are usually third-party: the brand named in the answer is often not the source the model cited. The model cited a review site, a forum thread, or a video that recommended the brand. Both facts drive everything that follows.

The engines are not interchangeable, either. In our query testing, ChatGPT, Claude, and Perplexity recommend meaningfully different brands and cite very different sources for the same questions. A one-engine audit gives you a false read. We unpack the strategic shift in "The future of organic search in an AI world."

What our data shows: most stores are invisible

We score ecommerce pages on the dimensions answer-engine extractors actually care about, each on a 0 to 100 scale. Across the corpus, the averages are bleak. A representative slice:

Average scores across the ecommerce pages in our corpus (0 to 100)
| Dimension | What it measures | Avg score |
|---|---|---|
| Answer readiness | Can an engine extract a clean question-to-answer pair from the page? | 8.7 |
| Freshness signals | Visible publish and update dates plus dateModified schema | 11.9 |
| Review proof | Machine-readable reviews and aggregate ratings | 14.3 |
| Schema depth | Completeness of the JSON-LD on the page | 31.7 |
| Specs density | Real HTML tables for product attributes | 37.8 |
| Survivability | Body content survives the trip through extractors | 43.6 |
| Heading hierarchy | One H1, no skipped levels, descriptive H2s | 77.1 |

Read the top row again. The average page scores under nine out of one hundred on answer readiness, the dimension most predictive of getting cited. Headings, the thing SEO teams have drilled for a decade, are the one bright spot; the newer signals are barely addressed. In our corpus, fewer than one in ten pages clears the survivability bar a page needs to be reliably citable, and complete schema markup is vanishingly rare. The opportunity is wide open, and almost no one is doing the technical work to take it.

The engines also behave differently enough to matter. Same query set, three very different worldviews:

How the three engines behave on the same buyer-intent queries (our query testing)
| Engine (model tested) | Answers naming a brand | Citations per answer | Leans hardest on |
|---|---|---|---|
| ChatGPT (GPT-4o-mini) | 30.9% | 2.8 | Major publishers, Wikipedia |
| Claude (Haiku 4.5) | 21.5% | 6.1 | Amazon, major news media |
| Perplexity (Sonar) | 20.7% | 4.4 | YouTube, shopping retailers |

ChatGPT names brands most often but cites the fewest sources. Claude cites the most and hedges. Perplexity leans hard on video. The takeaway: a credible AI search program is really three overlapping ones, each weighted to where your buyers actually research. Most stores never realize they are invisible because they only ever checked one engine. We dig into why in "Your website was not built for AI."

The technical foundation still wins

None of this replaces the fundamentals. Answer engines and classic crawlers read the same web, so a site that is slow, hard to crawl, or rendered entirely in JavaScript is invisible to both. Technical SEO is still where ecommerce visibility is won or lost. The essentials we prioritize:

  • Server-render the content that matters. If your product description, specs, and reviews only appear after JavaScript hydrates, most AI crawlers never see them. View source on a product page and search for a sentence from the description. If it is not in the raw HTML, you are not server-rendering it. Shopify, WooCommerce, Next.js with SSR, Remix, and Astro handle this well by default; pure client-side React or Vue single-page apps are the worst offenders we see.
  • Crawlability and clean architecture. A logical category and product hierarchy, no orphan pages, and an internal link graph that connects related products and content.
  • Site speed and Core Web Vitals. Fast, stable pages that hold up on mobile, where most ecommerce traffic lives.
  • Indexation hygiene. Correct canonicals, no duplicate product URLs competing with each other, and a clean, current sitemap.
  • Crawl access for AI. Confirm your robots rules and your CDN or firewall are not silently blocking the crawlers that feed AI answers.

That last point quietly sinks more brands than any other. Each crawler that feeds AI answers declares a user-agent, and the major providers say their crawlers honor robots.txt. A defensive WAF rule or a stray Disallow added eighteen months ago can make you invisible to half the AI ecosystem without anyone noticing. At minimum, make sure the retrieval crawlers can reach you:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

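Note: providers separate training crawlers (GPTBot, ClaudeBot) from the search and retrieval agents used at query time (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot). Google-Extended is different again. It is a robots.txt control token rather than a separate crawler. It governs whether content Google already crawls may be used for its Gemini models, not whether you appear in Google Search. Decide training consent deliberately, but if you want to appear in AI answers, the retrieval crawlers must be allowed. We cover the ecommerce specifics in "Why your ecommerce site needs a technical SEO strategy" and platform tactics in "SEO tips for Shopify sites."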
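You can confirm that a robots.txt file does what you intend with Python's standard-library urllib.robotparser. This is a minimal sketch, not a full crawler emulation, and it has three limits:

- The agent list in AI_AGENTS is an assumption based on the user-agent names providers document today, so extend it from each provider's documentation.
- Python's parser applies Allow and Disallow lines in file order rather than the longest-match rule in RFC 9309, so it can disagree with real crawlers when rules overlap.
- It cannot see firewall or CDN blocks, which only show up on a live request.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI user-agent tokens; check each provider's docs for updates.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Google-Extended",
]


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list:
    """Return the agents that this robots.txt text would refuse for `url`."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

For example, pass the text of your live robots.txt and a best-selling product URL. An empty list means robots.txt is not what is hiding that page; a non-empty list names the agents your rules turn away.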

Structured data: how you talk to machines

If technical SEO makes your site readable, structured data makes it understandable. Schema.org markup labels what each part of a page is: this is a product, this is its price, this is a review, this is the author and date of an article. According to Google Search Central, Google uses the structured data it finds to understand a page and power rich results, and it recommends JSON-LD as the format. Google publishes click-through examples too, including one brand that saw an 82 percent higher click-through rate on pages that show as rich results.

Here is the part most teams miss: structured data is now arguably more important for AI than for Google. Google has gotten good enough at reading unstructured content that schema is mostly a tiebreaker for it. Answer engines, working at scale, lean on schema heavily because it is a deterministic shortcut to clean facts (name, price, rating, author, date) without parsing and guessing. In our data, pages with deep schema get cited more often than pages with shallow or no schema.

For ecommerce that means real Product, Offer, Review, and BreadcrumbList markup on product and category pages, and Article markup with author, datePublished, and dateModified on blogs and guides. The fields that matter most for citation are aggregateRating, offers with availability and price, brand, and a genuine description rather than just the product name. Most platforms emit partial schema by default; you almost always have to extend it. Validate every template with Google's Rich Results Test and the Schema.org validator. The same structured thinking powers a smarter on-site search experience, which we cover in "Using AI to optimize ecommerce search."
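To make the citation-critical fields concrete, here is a sketch of rendering Product JSON-LD with aggregateRating, an Offer carrying price and availability, a Brand, and a real description. The input keys are hypothetical; map them from whatever your platform exposes. Treat the output as a starting point to run through the Rich Results Test, not as a complete Product template.

```python
import json


def product_jsonld(product: dict) -> str:
    """Render a Product JSON-LD <script> tag from one product record.

    The keys of `product` (name, description, sku, brand, rating,
    review_count, url, price, currency, in_stock) are hypothetical.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        # A genuine description, not just the product name repeated.
        "description": product["description"],
        "sku": product["sku"],
        "brand": {"@type": "Brand", "name": product["brand"]},
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": product["rating"],
            "reviewCount": product["review_count"],
        },
        "offers": {
            "@type": "Offer",
            "url": product["url"],
            "price": f"{product['price']:.2f}",
            "priceCurrency": product["currency"],
            "availability": (
                "https://schema.org/InStock"
                if product["in_stock"]
                else "https://schema.org/OutOfStock"
            ),
        },
    }
    # Escape "</" so text like "</script>" in a description cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Generating the markup from the same product record that renders the page keeps price and availability in the schema from drifting away from what shoppers see.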

Answer engine optimization: write for extraction

Answer engine optimization (AEO) is the practice of earning your brand a place inside AI-generated answers. Some call it generative engine optimization (GEO); same idea, different acronym. Where classic SEO aims for a ranking, AEO aims to be the source an assistant quotes. Three habits do most of the work:

  • Structure pages around buyer questions, not keywords. Models extract question-to-answer pairs, so use the real question verbatim as an H2 and answer it in the very next paragraph. "Are these shoes good for flat feet?" beats a heading that just says "Features." A genuine FAQ section with the actual questions buyers ask, in the order they ask them, is the single highest-impact content change most stores can make in a week.
  • Put real data in real HTML tables. Specs, dimensions, compatibility, sizing, and comparisons belong in actual table elements, not prose and not images of tables. Tables are the highest-density format an extractor handles, and they parse cleanly. The same facts buried in a sentence parse inconsistently.
  • Add freshness signals. Answer engines prefer recent content. Show a visible last-updated date, set datePublished and dateModified in your schema, and actually re-review pages on a cadence rather than bumping a date. Freshness is one of the lowest-scoring dimensions in our corpus, which means it is one of the easiest places to stand out.
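A rough self-check for these three habits can be built on Python's standard-library HTMLParser. It reports three things:

- Which H2s are phrased as questions, and whether the very next element is a paragraph.
- How many real table elements the page contains.
- Whether any JSON-LD block carries dateModified.

This is a heuristic sketch under simple assumptions: a question H2 is one ending in "?", and it only inspects the HTML it is given. It is not the scoring method behind our platform's numbers.

```python
import json
from html.parser import HTMLParser
from typing import Optional


def _has_key(node, key: str) -> bool:
    """True if `key` appears anywhere in a parsed JSON-LD structure."""
    if isinstance(node, dict):
        return key in node or any(_has_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(_has_key(v, key) for v in node)
    return False


class AnswerReadinessAudit(HTMLParser):
    """Heuristic check of question H2s, HTML tables, and dateModified schema."""

    def __init__(self) -> None:
        super().__init__()
        self.question_h2s = []      # H2 text phrased as a question
        self.answered = []          # question H2s followed directly by a <p>
        self.tables = 0
        self.has_date_modified = False
        self._in_h2 = False
        self._h2_text = ""
        self._pending: Optional[str] = None
        self._in_jsonld = False
        self._jsonld = ""

    def handle_starttag(self, tag, attrs):
        if self._pending is not None:
            # The first element after a question H2 should be the answer paragraph.
            if tag == "p":
                self.answered.append(self._pending)
            self._pending = None
        if tag == "h2":
            self._in_h2, self._h2_text = True, ""
        elif tag == "table":
            self.tables += 1
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld, self._jsonld = True, ""

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._in_h2 = False
            text = " ".join(self._h2_text.split())
            if text.endswith("?"):
                self.question_h2s.append(text)
                self._pending = text
        elif tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                if _has_key(json.loads(self._jsonld), "dateModified"):
                    self.has_date_modified = True
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is a separate problem worth fixing

    def handle_data(self, data):
        if self._in_h2:
            self._h2_text += data
        elif self._in_jsonld:
            self._jsonld += data


def audit_page(page_html: str) -> dict:
    """Summarize the three AEO habits for one page's raw HTML."""
    parser = AnswerReadinessAudit()
    parser.feed(page_html)
    parser.close()
    return {
        "question_h2s": len(parser.question_h2s),
        "answered_immediately": len(parser.answered),
        "html_tables": parser.tables,
        "dateModified_in_schema": parser.has_date_modified,
    }
```

A page where answered_immediately trails question_h2s has question headings whose answers are buried behind widgets or wrappers. That is the pattern to fix first.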

Underneath all three is specificity. Concrete, verifiable claims get extracted and trusted; vague marketing copy gets filtered out. That is also why purely AI-written content tends to underperform here: it skews generic and hedge-y, exactly what answer engines drop. Use AI to draft and structure, then have a human add the specifics, the proprietary data, and the real buyer questions.

The citation graph: earned media is now a performance channel

This is the fix most brands skip because it is the slowest to pay off, and it is also the one that compounds and is hardest to copy. The deciding signal in most AI answers is third-party. When we bucket every cited source in our data, owned brand and merchant pages account for the large majority of citations, but a meaningful slice, on the order of fifteen percent, goes to a small set of named third parties: review sites, forums, video reviewers, major publishers, and reference sites. Owning your share of that slice is the highest-leverage move in AI search, because that is where the model forms its opinion of who to recommend.

Which third parties matter depends on the engine and the category. Perplexity leans on video reviewers and shopping retailers; Claude surfaces Amazon listings and major news media constantly; ChatGPT favors major publishers and Wikipedia. Within a vertical the gatekeepers narrow further: supplements live or die on health publishers and PubMed, mattresses on sleep-review sites, outdoor gear on a handful of dedicated review publications. A practical earned-media program looks like this:

  • Run your real buyer-intent queries through all three engines and log every source they cite. That list is your target media plan.
  • Build a relationship strategy per source: pitch the writers, send product, sponsor honest reviews, contribute expert quotes, and track which placements actually result in AI citations.
  • Show up authentically in the three to five forums and subreddits where your category is discussed. Answer questions where your product is genuinely the right answer, and disclose your affiliation. Communities detect astroturfing instantly, and it backfires.
  • Identify the ten to twenty video reviewers an engine cites in your space, send product, and earn honest long-form reviews.
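Turning the logged citations into that target list is simple counting. Here is a minimal sketch, assuming you have saved each engine's citation URLs as plain strings (one entry per citation, repeats included). It folds "www." hosts together and drops your own domains; other subdomains, such as mobile hosts, stay separate.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(citations_by_engine: dict, own_domains=()) -> Counter:
    """Count citations per third-party domain across all engines.

    citations_by_engine maps an engine name to the citation URLs logged
    from its answers for your buyer-intent query set.
    """
    own = {d.lower().removeprefix("www.") for d in own_domains}
    counts = Counter()
    for urls in citations_by_engine.values():
        for url in urls:
            host = urlparse(url).netloc.lower().removeprefix("www.")
            if host and host not in own:
                counts[host] += 1
    return counts
```

Calling cited_domains(log, own_domains={"your-store.com"}).most_common(20) gives a ranked starting media plan. Calling it with a single engine's entries shows where each engine's opinion is formed.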

A brand often has zero control over what a cited reviewer says, but full control over whether it invests in being on that reviewer's radar at all, and that gap is where most stores leave AI visibility on the table.

AI shopping and agentic commerce

The step beyond answers is action. Assistants are beginning to shop: comparing products, assembling carts, and in some flows completing checkout on a shopper's behalf. Google has started folding AI directly into the buying journey, which raises a blunt question for every store: can an agent actually use your checkout? A flow that is clear, fast, and standards-based now decides whether an agent can buy from you at all, on top of helping humans convert. Hidden steps, custom JavaScript widgets, and unlabeled form fields that merely annoy a human can stop an agent cold. We break down what Google's move means for merchants in "Google AI is changing ecommerce: when search becomes the checkout."
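Unlabeled fields are one of the easier checkout problems to detect automatically. Here is a sketch using Python's standard-library HTMLParser. It treats a field as labeled if any of the following is true:

- a label element's for attribute points at the field's id;
- the field sits inside a label element;
- the field has aria-label or aria-labelledby.

These are assumptions that mirror common accessibility practice. The sketch does not verify that an aria-labelledby target exists, and it only sees server-rendered HTML, not fields injected later by JavaScript.

```python
from html.parser import HTMLParser

FIELD_TAGS = {"input", "select", "textarea"}
# Input types that never need a visible label.
SKIP_INPUT_TYPES = {"hidden", "submit", "button", "reset", "image"}


class FieldLabelAudit(HTMLParser):
    """Collect form fields and whether each has an accessible label."""

    def __init__(self) -> None:
        super().__init__()
        self.label_targets = set()   # ids referenced by <label for="...">
        self.fields = []             # (id, display name, labeled without for=)
        self._label_depth = 0        # > 0 while inside a wrapping <label>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._label_depth += 1
            if a.get("for"):
                self.label_targets.add(a["for"])
        elif tag in FIELD_TAGS:
            if tag == "input" and (a.get("type") or "text").lower() in SKIP_INPUT_TYPES:
                return
            # A placeholder is not a label, so it is deliberately ignored here.
            labeled = bool(self._label_depth or a.get("aria-label") or a.get("aria-labelledby"))
            name = a.get("name") or a.get("id") or tag
            self.fields.append((a.get("id"), name, labeled))

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth:
            self._label_depth -= 1


def unlabeled_fields(page_html: str) -> list:
    """Names of form fields with no label an agent could read."""
    parser = FieldLabelAudit()
    parser.feed(page_html)
    parser.close()
    return [name for field_id, name, labeled in parser.fields
            if not labeled and field_id not in parser.label_targets]
```

Run it against the raw HTML of your cart and checkout steps. Each name it returns is a field an agent would have to guess at.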

Test your own site the way an engine sees it

You can see roughly what an answer engine sees with a few lines of Python, using the same kind of open-source extractor the pipelines rely on. Run it against your top product pages and read what comes back:

# pip install requests trafilatura
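import requests
import trafilatura

URL = "https://your-store.com/products/your-best-seller"
resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=15)
resp.raise_for_status()  # stop on 4xx/5xx instead of extracting an error page

# What the extractor keeps after stripping nav, footer, and ads.
# extract() returns None when it finds no main content; "markdown" output needs a recent trafilatura.
text = trafilatura.extract(resp.text, url=URL, include_links=True, output_format="markdown")
print(text if text else "Nothing extracted: the page body did not survive.")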

If the output is blank, or full of menu items and boilerplate instead of your product copy and specs, your survivability is broken and no amount of keyword work will fix it. Then simulate the real crawlers by swapping the User-Agent for GPTBot, ClaudeBot, or PerplexityBot and watch the status code: a 403 or 451, or a body much smaller than a normal browser request, means a firewall or bot rule is blocking the very systems you want to appear in.
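The user-agent swap above and the view-source check from the server-rendering bullet can be combined into one script. It fetches the raw HTML (no JavaScript runs) under several user agents and flags three things: error statuses, bodies much smaller than the browser baseline, and a description sentence missing from the markup outside scripts.

This is a sketch with several assumptions:

- The user-agent strings are simplified stand-ins. Real crawlers send longer strings containing these tokens, so copy the exact values from each provider's documentation.
- The 50 percent size threshold is arbitrary.
- The HTML handling is a regex heuristic, and decoding assumes UTF-8.
- Spoofing a user-agent only exercises rules keyed on that header. Firewalls that verify crawlers by IP range may treat your test differently from the real bot, in either direction.

```python
import html
import re
import urllib.error
import urllib.request

# Simplified stand-ins (assumption); real crawlers send longer strings.
USER_AGENTS = {
    "browser": "Mozilla/5.0",
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
}


def fetch(url: str, user_agent: str, timeout: float = 15.0) -> tuple:
    """Return (HTTP status, raw HTML) without executing any JavaScript."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code and usually a body.
        return err.code, err.read().decode("utf-8", errors="replace")


def visible_text(raw_html: str) -> str:
    """Rough text of raw HTML: drop script/style bodies and tags, unescape."""
    no_code = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", raw_html)
    text = html.unescape(re.sub(r"<[^>]+>", "", no_code))
    return re.sub(r"\s+", " ", text).strip().lower()


def audit(results: dict, sentence: str, min_ratio: float = 0.5) -> dict:
    """Flag problems per agent; `results` maps agent name to (status, body)
    and must include a "browser" baseline."""
    needle = re.sub(r"\s+", " ", sentence).strip().lower()
    baseline = len(results["browser"][1])
    report = {}
    for name, (status, body) in results.items():
        problems = []
        if status >= 400:
            problems.append(f"HTTP {status}")
        elif name != "browser" and len(body) < baseline * min_ratio:
            problems.append("body much smaller than browser response")
        if needle not in visible_text(body):
            problems.append("sentence missing from raw HTML")
        report[name] = problems
    return report
```

To use it, build results = {name: fetch(URL, ua) for name, ua in USER_AGENTS.items()} and pass it to audit() with a sentence copied from your product description. If the browser row alone reports a missing sentence, the content is rendered client-side. If only the crawler rows report problems, a bot rule is the likelier cause.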

Share of voice: the KPI you can defend

There is no position one in AI search, so ranking is the wrong thing to report. The metric that works is share of voice. Define thirty to one hundred buyer-intent questions in your category, run them through the major engines every month, and count how often each brand gets named. Your share of voice is your slice of that pie. It is binary (named or not), comparable across competitors, and it produces a quarter-over-quarter trendline a CMO can actually defend: "we were named in 40 percent of category questions last quarter and 55 percent this quarter, here is what changed." Weight position into the score, because being the first brand named is worth far more than being the fifth. Everything else in this guide (survivability, schema, freshness, citations) is a leading indicator. Share of voice is the lagging one that proves the work paid off.
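Here is a minimal sketch of the arithmetic. Each run is one question on one engine, recorded as the brands the answer named in order, with an empty list when none were named.

- Unweighted, the score is the fraction of runs that named the brand.
- Weighted, each mention counts 1/position: first place 1.0, second 0.5, third about 0.33. That weighting is an assumption chosen for illustration, not an industry standard, so pick one and keep it fixed so quarters stay comparable.

```python
from collections import defaultdict


def share_of_voice(runs: list, weighted: bool = False) -> dict:
    """Score each brand across answer runs.

    runs: one list per (question, engine) run, holding the brands the
    answer named in the order they were named.
    Unweighted: fraction of runs naming the brand.
    Weighted: each mention counts 1/position (assumed weighting).
    """
    if not runs:
        return {}
    scores = defaultdict(float)
    for brands in runs:
        # dict.fromkeys drops repeat mentions of a brand within one answer.
        for position, brand in enumerate(dict.fromkeys(brands), start=1):
            scores[brand] += 1.0 / position if weighted else 1.0
    return {brand: score / len(runs) for brand, score in scores.items()}
```

For example, across the four runs [["Acme", "Birch"], ["Birch"], ["Birch", "Acme", "Cedar"], []], Birch is named in 75 percent of runs unweighted. Its weighted score of 0.625 reflects that it was named second once.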

A 30, 60, 90 day plan

Here is what we would actually do, mapped to a calendar, if we took over an ecommerce brand on Monday:

Days 1 to 30, foundation. Audit robots.txt and allow the retrieval crawlers on day one. Run the extractor across your top 25 pages and build a fix list. Move all critical content (descriptions, specs, prices, reviews) to server-rendered HTML. Publish a first /llms.txt file, an emerging, still-unproven standard that almost no competitor has adopted yet, so it is a cheap flag to plant.
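Since /llms.txt is still a proposal, here is a sketch that follows the layout described at llmstxt.org as we understand it:

- an H1 with the site name;
- a blockquote summary;
- H2 sections of markdown links with optional notes;
- an "Optional" section for links a model may skip.

The store name, summary, and URLs below are hypothetical placeholders.

```python
def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt file.

    sections maps an H2 heading to a list of (title, url, note) tuples;
    use an empty note to omit the ": note" suffix.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical example store.
example = build_llms_txt(
    "Example Store",
    "Washable rugs for busy homes.",
    {
        "Products": [("Washable Rugs", "https://your-store.com/collections/washable-rugs",
                      "Full catalog with sizes and care")],
        "Optional": [("Blog", "https://your-store.com/blog", "")],
    },
)
```

Serve the result as plain text at the site root, and regenerate it when collections or key guides change so it does not go stale.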

Days 31 to 60, schema and structure. Add complete Product schema to every product page and validate each one. Add real buyer-question FAQs with the questions as H2s. Convert spec lists, comparison content, and sizing charts to real HTML tables. Add visible dates and dateModified across the site.

Days 61 to 90, earned media and measurement. Identify the ten most-cited third parties in your category and start pitching and sending product. Establish authentic presence in three to five relevant communities. Set up monthly share-of-voice tracking against a fixed query set and take a baseline.

By day ninety you have a measurable program with a real KPI on the wall, not a vague "we are doing AI search now."

Content, clusters, and topical authority

All of the above rewards brands that are genuine authorities on their topic, and authority is built with content. The most durable structure is the hub and cluster model: a comprehensive hub page on a core topic, like this one, supported by focused articles that go deep on each sub-topic and link back to the hub. This helps shoppers, classic search, and answer engines alike. It signals the full shape of your expertise, spreads ranking strength across related pages through internal links, and gives an answer engine a well-organized body of work to draw from. Pair it with a habit of keeping pages accurate and fresh, and visibility compounds instead of being chased one post at a time.

Frequently asked questions

How do you rank in AI search?

You do not rank, you get named. Make your content extractable with server-rendered HTML, complete schema, and real tables; structure it around buyer questions with the answer right after the question; and invest in being talked about on the third-party sources the engines cite. Then track share of voice across the major engines every month.

What is AEO and how is it different from SEO?

AEO (answer engine optimization) is optimizing for AI answer engines like Perplexity, ChatGPT, and Claude. The core difference from SEO is that AEO targets binary visibility, named or not named, rather than a ranking position, and it weights structured data and third-party citations more heavily than backlinks.

Does schema markup matter more for AI than for Google?

Increasingly, yes. Google can now read unstructured content well, so schema is largely a tiebreaker for it. Answer engines lean on schema heavily because it hands them clean, deterministic facts at scale. In our data, deeper schema correlates with more citations.

Should I block AI crawlers to protect my content?

For ecommerce, almost never. Blocking AI crawlers does little to protect content that was likely scraped years ago, and blocking the retrieval crawlers immediately makes you invisible in those engines' real-time answers. Decide training consent deliberately, but keep the search crawlers allowed.

Can I just use ChatGPT to write content optimized for AI search?

Use it to draft and structure, not to finish. AI-written copy tends to be generic and hedge-y, which is exactly what answer engines filter out. A human has to add the specifics, the proprietary data, and the real buyer questions. That combination is what gets cited.

How often should I run an AI search audit?

Quarterly at a minimum, monthly while you are actively making changes. The engines refresh on different cadences, and share of voice can move week to week in competitive categories.

Your next move

Ecommerce SEO is not going away. It is widening to include how machines read, understand, cite, and even buy from your store. The brands that win the next few years will treat technical health, structured data, answer-ready content, the third-party citation graph, and an agent-friendly checkout as one connected system. In most agencies those capabilities live on different teams, and the gap between them is where strategy falls apart.

We built LimeLight to operate across that gap, with the SEO and engineering to ship the technical work, the content strategy to make your store the clearest answer in your category, and the proprietary platform that measures whether any of it is working. If your organic visibility has slipped, or you are not sure how AI search is affecting your store, book a strategy call and we will show you exactly where you stand against the data and what it would take to win.
