What Vectorizing Our Entire Website Showed Us About SEO

SEO AI

Published on June 23, 2026

John Kuefler

Search no longer just matches keywords. It reads meaning. That sounds like a small change, and it is not, so I wanted to see what it actually meant for us. We took every page on our site, turned each one into a vector, did the same to the search queries we show up for, and looked at the whole thing as a map. It told me more about our content in an afternoon than years of keyword reports ever had, and it changed how I think about SEO heading into the AI era. Here is what we found, and what it means for your site.

What a vector actually is, and what cosine similarity measures

Let me define the two terms this whole thing rests on, because most articles skip them and then lose you.

A vector, here, is just a long list of numbers that stands in for a piece of text. You run your text through an embedding model and it hands back a few hundred to a few thousand numbers that, together, describe what the text means. The useful part is that text with similar meaning gets similar numbers, so it lands in nearly the same spot in this mathematical space, and text that means something different lands somewhere else. "Running shoes" and "sneakers" end up close together. "Running shoes" and "garden hose" end up far apart. The model worked that out from how the words are actually used, not from a dictionary.

Cosine similarity is just how you measure how close two of those vectors are. Picture each vector as an arrow pointing out from a center point. If two arrows point in nearly the same direction, the angle between them is small and their cosine similarity is high, which means the two pieces of text mean close to the same thing. If they point in very different directions, the angle is wide and the score is low. That is the whole trick: meaning turns into geometry, and "how related are these two things" turns into "what is the angle between these two arrows."
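For anyone who does want the formula: cosine similarity is the dot product of two vectors divided by the product of their lengths, which works out to the cosine of the angle between the two arrows. Here is a minimal Python sketch. The three-number vectors are hypothetical, hand-picked for illustration (real embeddings have hundreds or thousands of dimensions), so only the relative scores mean anything.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings": hand-picked so the two footwear phrases point
# in a similar direction and the garden hose points somewhere else.
running_shoes = [0.9, 0.8, 0.1]
sneakers = [0.85, 0.75, 0.2]
garden_hose = [0.1, 0.2, 0.95]

print(round(cosine_similarity(running_shoes, sneakers), 3))     # high
print(round(cosine_similarity(running_shoes, garden_hose), 3))  # low
```

A score near 1 means the arrows point the same way; a score near 0 means the texts are unrelated. Because the lengths divide out, only direction counts, which is what makes it a good fit for comparing meaning.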

Figure: Cosine similarity in one picture. Three phrases are drawn as arrows from an origin. Phrases that mean similar things point in nearly the same direction (a small angle, a high score); unrelated phrases point in very different directions (a wide angle, a low score).

You do not need the math past that. What matters is the shift it allows: a machine can now judge whether two pieces of content are about the same thing without them sharing a single keyword. And that is exactly what modern search does with your pages.

How AI search actually uses this

When someone asks Google's AI Mode or an assistant like ChatGPT a question, it does not simply hunt for your keywords. It turns the question into a vector, turns candidate passages from across the web into vectors, and pulls back the ones whose meaning sits closest by cosine similarity. Then it writes its answer from those passages. That is the pattern under most AI answers, usually called retrieval-augmented generation (RAG): chunk the content, embed it, and at question time fetch the closest matches to ground the response.
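Here is that chunk, embed, retrieve loop as a small Python sketch. It is not how Google or ChatGPT implement it. The `toy_embed` function is a hypothetical stand-in for a real embedding model: it maps a handful of words onto hand-made three-number "meaning" vectors so the example runs without one. Splitting pages at blank lines is also just one assumed chunking choice, and the pages themselves are made up.

```python
import math
import re

# Hypothetical stand-in for an embedding model: each known word maps to a
# hand-made 3-d "meaning" vector (footwear, gardening, pricing).
TOY_LEXICON = {
    "shoes": (1.0, 0.0, 0.0), "sneakers": (1.0, 0.0, 0.0), "trainers": (1.0, 0.0, 0.0),
    "running": (0.8, 0.0, 0.1), "hose": (0.0, 1.0, 0.0), "garden": (0.0, 1.0, 0.0),
    "watering": (0.0, 0.9, 0.0), "price": (0.0, 0.0, 1.0), "cost": (0.0, 0.0, 1.0),
}

def toy_embed(text):
    """Average the vectors of recognized words (zero vector if none are known)."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return (0.0, 0.0, 0.0)
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def chunk(page_text):
    """Split a page into passages at blank lines."""
    return [p.strip() for p in page_text.split("\n\n") if p.strip()]

def retrieve(question, pages, k=2):
    """Embed every passage and the question; return the k closest passages."""
    index = [(url, passage, toy_embed(passage))
             for url, text in pages.items() for passage in chunk(text)]
    q = toy_embed(question)
    scored = sorted(((cosine(q, vec), url, passage) for url, passage, vec in index),
                    reverse=True)
    return scored[:k]

pages = {
    "/shoes": "Our trainers are built for running.\n\nEvery pair ships free, whatever the cost.",
    "/garden": "A kink-free garden hose for watering beds.\n\nSee the price list for bulk orders.",
}
for score, url, passage in retrieve("Which sneakers are good for jogging?", pages):
    print(f"{score:.2f}  {url}  {passage}")
```

The question says "sneakers" and the winning passage says "trainers". They share no keyword, but their vectors point the same way, which is the behavior a real embedding model gives you at far larger scale.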

A couple of things make this less forgiving than old search. These systems usually do not pull whole pages, they pull passages, so every section of your page is competing on its own. And they tend to fan one question out into a dozen related sub-questions and retrieve for each, so you get matched against questions you never thought to target. You are not optimizing a page for a keyword anymore. You are trying to make every part of it clearly mean something a machine can use.

And the reason this matters now, not someday: Pew Research found that when Google shows an AI summary, people click a traditional search result in just 8 percent of visits, down from 15 percent when there is no summary, and they click a link inside the summary in only about 1 percent of visits. Being the source the AI pulls from is quickly becoming worth more than ranking in the links underneath it.

So we vectorized our entire site

I wanted to stop reading about this and look at it. So we embedded all 327 of our live pages with an embedding model, embedded the real Search Console queries we show up for, and put them in the same space. Then we flattened that space down to a two-dimensional map so we could actually see it. Below is our whole website, drawn by meaning alone, with nobody tagging or sorting anything.
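The post does not say which method flattened the map; UMAP and t-SNE are common choices for this kind of picture. As a swapped-in stand-in, here is principal component analysis (PCA), the simplest linear way to squash vectors down to two dimensions, written in plain Python with power iteration so it needs no libraries. The sample page vectors are hypothetical.

```python
import math
import random

def pca_2d(vectors, iterations=200, seed=0):
    """Project vectors onto their top two principal components.

    Power iteration with deflation on the covariance matrix, standard library
    only. Fine for a few hundred pages; use a numerical library beyond that.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # d x d sample covariance matrix of the centered vectors.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        vec = [rng.random() - 0.5 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iterations):
            # Repeated multiplication by the covariance pulls vec toward the
            # direction of greatest remaining variance.
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0:
                break
            vec = [x / norm for x in nxt]
        eigval = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        components.append(vec)
        # Deflate: remove this component so the next pass finds the second one.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    # Each page's 2-D coordinates are its projections onto the two components.
    return [tuple(sum(row[j] * comp[j] for j in range(d)) for comp in components)
            for row in centered]

# Hypothetical 4-d page vectors forming two topic neighborhoods.
page_vectors = [
    (0.0, 0.1, 0.0, 0.2), (0.1, 0.0, 0.1, 0.1), (0.0, 0.0, 0.2, 0.0),
    (5.0, 5.1, 5.0, 4.9), (5.1, 5.0, 4.9, 5.0), (4.9, 5.0, 5.1, 5.1),
]
points = pca_2d(page_vectors)
for x, y in points:
    print(f"({x:.2f}, {y:.2f})")
```

PCA keeps the directions along which the pages vary most, so pages far apart in meaning stay far apart on the map. Nonlinear methods like UMAP tend to draw tighter, more readable neighborhoods, at the cost of global distances that are harder to interpret.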

Figure: Every LimeLight page placed by meaning, clustering into topic neighborhoods. The neighborhoods formed on their own; the faint lines connect pages that mean nearly the same thing. Color is content type.

Nobody designed that layout. It fell out of the math. Our brand and lifestyle content pulled into one corner, our BigCommerce and migration content into another, our AI and SEO work into a third. The model grouped our content the way a sharp reader would, with zero instructions, which is a good sign the meaning it is reading is real.

What it exposed

The map was interesting to look at. The diagnostics underneath it were the part that actually mattered, and they were a little humbling. Three things jumped out, and they are problems most content libraries have and cannot see:

Cannibalization. A bunch of our pages sat almost on top of each other because they mean nearly the same thing. We have five separate posts circling "how to choose an ecommerce marketing agency." Each one is fine on its own, but to a vector engine they are not five strong answers, they are five blurry half-answers to one question, splitting the signal between them.
Traffic going to the wrong page. For some queries we actually get impressions on, the page that ranks is only loosely related, while a better page for that question sits somewhere else on the site. We had a broad query landing on our homepage when a dedicated service page answered it far better. The visitor was being sent to the wrong door.
Real gaps. A handful of questions had no page anywhere on the site that matched them well. Not a weak page, no page. That is not an optimization problem, that is content we have not written yet.
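Each of those three checks can be written as a simple rule over page and query vectors. This is a minimal Python sketch of the idea, not a description of the exact pipeline behind our map: the vectors, URLs, and queries are hypothetical, the page that ranks for each query is assumed to come from Search Console data, and the thresholds (0.9, 0.05, 0.6) are illustrative guesses that would need tuning for whatever embedding model you use.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def best_page(query_vec, pages):
    """(score, url) of the page whose meaning sits closest to the query."""
    return max((cosine(query_vec, vec), url) for url, vec in pages.items())

def cannibalization(pages, threshold=0.9):
    """Page pairs that mean nearly the same thing: candidates to merge."""
    pairs = [(a, b, cosine(pages[a], pages[b]))
             for a, b in combinations(sorted(pages), 2)]
    return sorted((p for p in pairs if p[2] >= threshold),
                  key=lambda p: p[2], reverse=True)

def misrouted(queries, pages, margin=0.05):
    """Queries where a different page fits clearly better than the one ranking."""
    flagged = []
    for text, vec, ranking_url in queries:
        best_score, best_url = best_page(vec, pages)
        if best_url != ranking_url and best_score - cosine(vec, pages[ranking_url]) > margin:
            flagged.append((text, ranking_url, best_url))
    return flagged

def gaps(queries, pages, threshold=0.6):
    """Queries that no page matches well: content not written yet."""
    return [text for text, vec, _ in queries if best_page(vec, pages)[0] < threshold]

# Hypothetical page vectors and (query, query vector, page that ranks) rows.
pages = {
    "/blog/choose-agency": (0.9, 0.1, 0.0),
    "/blog/picking-an-agency": (0.88, 0.12, 0.02),
    "/services/migration": (0.0, 1.0, 0.1),
    "/": (0.4, 0.4, 0.4),
}
queries = [
    ("bigcommerce migration service", (0.05, 0.95, 0.1), "/"),
    ("how to choose an ecommerce agency", (0.9, 0.1, 0.0), "/blog/choose-agency"),
    ("subscription billing setup", (0.0, 0.0, 1.0), "/"),
]
print(cannibalization(pages))
print(misrouted(queries, pages))
print(gaps(queries, pages))
```

In this toy data the two agency posts get flagged as near-duplicates, the migration query gets flagged as landing on the homepage instead of the service page, and the billing query gets flagged as a gap because even its best match is weak.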

None of that came from a keyword tool or a hunch. It came from comparing what our content actually means against what people are actually asking. That is a more honest content audit than I have ever been able to run.

It also made something obvious that I had felt for years but could not prove: the more a page tries to cover, the blurrier it looks to a machine. A focused page reads as a confident answer. A page that wanders reads as vaguely about nothing, because all that range gets averaged into one fuzzy point. Depth on one topic beats breadth across ten.
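To see why range turns into blur, average a few passage vectors and compare. This Python sketch assumes a page is represented by the mean of its passage vectors, which is one common pooling choice but not necessarily what any particular engine does; the three axes and all the vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def mean_vector(vectors):
    """Average vectors dimension by dimension (mean pooling)."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

# Hypothetical axes: 0 = choosing an agency, 1 = platform migration,
# 2 = brand storytelling. Each tuple is one passage on the page.
focused_page = [(0.95, 0.05, 0.0), (0.9, 0.1, 0.05), (0.92, 0.0, 0.1)]
wandering_page = [(0.95, 0.05, 0.0), (0.05, 0.95, 0.0), (0.0, 0.05, 0.95)]
question = (1.0, 0.0, 0.0)  # "How do I choose an ecommerce agency?"

focused_score = cosine(question, mean_vector(focused_page))
wandering_score = cosine(question, mean_vector(wandering_page))
best_single_passage = max(cosine(question, p) for p in wandering_page)
print(f"focused page:   {focused_score:.2f}")
print(f"wandering page: {wandering_score:.2f}")
print(f"wandering page, best passage alone: {best_single_passage:.2f}")
```

The wandering page's first passage, scored on its own, still matches the question almost perfectly. The blur comes from treating the page as one point, which is another reason to give each topic its own clear page.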

What a lot of AI SEO advice gets wrong

A lot of advice about this gets the next part wrong, and some of it comes from our own industry. You will see people imply you can feed engines your vectors, or stuff "semantic signals" into your markup, and win. That is not how it works, and chasing it will waste your money.

The vectors we made are ours. They are a diagnostic, for us. Google and the AI tools do not read your vectors, they compute their own, from your actual words, every time someone searches. There is no vector you can publish that makes you rank. I am being specific about this because the opposite gets implied constantly, and it sends people down the wrong road.

So where is the actual leverage? It is exactly what the diagnostic points at. Make your content genuinely clearer, more complete, and more distinct, so that when an engine reads it, the meaning it picks up is obvious and whole. Vectorization is not a trick you apply to content. It is just a way to check whether your content really says what you think it says.

What to actually do about it

The to-do list is not exotic. The difference now is that all of it is measurable instead of guesswork.

One page, one job. Find your near-duplicates and merge them into one definitive page. Five competing posts become one that wins.
Answer the whole question. Go deep enough on a topic that an engine can lift a complete, useful passage straight out of the page. Thin pages do not get cited.
Separate by intent, not wording. Two pages can share words and still both deserve to exist, as long as they answer genuinely different questions. Make that difference obvious in how you frame them.
Send each query to its best page. If a good query is landing on a weak page, build or strengthen the page that actually fits it, and link to it clearly.
Write in plain, complete answers. The passage-by-passage way these engines pull content rewards pages built as clear answers to real questions, not as keyword scaffolding.

We can do this for your site

The honest catch is that the map is the easy part to look at and the harder part to build. You need every page and a real pull of the queries you show up for, embedded into the same space, before any of those patterns show up. We built that for ourselves first, on purpose, before offering it to anyone.

If you want the same picture of your own site, that is something we do. We vectorize your pages and your search data, hand you the map, and walk you through where you are cannibalizing yourself, where traffic is hitting the wrong page, and where the real gaps are. Then we help you fix it. It is the engine behind what we call agentic SEO, and if you are trying to stay visible as search turns into answers, it is a good place to start. If you want to see more of how we think about this, we wrote up why most sites were not built for AI and a fuller ecommerce SEO and AI search guide.

The brands that do well over the next few years will not be the ones who gamed the most keywords. They will be the ones whose content means something clear enough that a machine can repeat it. That is a higher bar, and honestly a better one.

Sources

OpenAI, Vector embeddings: how embeddings represent meaning and how distance measures relatedness.
Pew Research Center, Google users are less likely to click on links when an AI summary appears in the results (2025).
iPullRank, How AI Mode works: query fan-out and passage-level semantic retrieval.
Qdrant, What is RAG: embedding, vector retrieval, and cosine similarity.