How I Built a Publishing Stack for AI Search
I spent twelve years in ad tech. Banners to email to mobile to social — I watched attention move from one interface to the next, and I watched the infrastructure scramble to keep up.
When OpenAI announced they were putting ads in ChatGPT, that told me something specific: ChatGPT had become a place where people go to find things. Not just to mess around with a chatbot — to actually look stuff up, compare options, make decisions. The same behavior that used to happen on Google.
And it's not just ChatGPT. Perplexity is growing fast. Gemini is baked into Google's search results now — AI Overviews are pulling answers directly into the page before anyone clicks a link.
Discovery is moving from "search and scan" to "ask and get an answer."
The difference between getting ranked and getting cited
Here's what tripped me up at first. I assumed SEO and AEO were basically the same problem. They're not.
SEO is about convincing Google's crawler to rank your page. AEO — Answer Engine Optimization — is about giving an AI engine enough structured information that it decides to cite you. To pull a line from your article and put it in front of someone as the answer to their question.
When an AI doesn't cite you, it doesn't just skip you. It paraphrases your ideas using someone else's content. There's no "page 2" to check — you're either in the answer or you're not.
What I actually built
I wanted to understand this problem by building for it, so I built publishing infrastructure around it.
Every article goes through a pipeline that handles 13 AEO signals automatically. Here's what that looks like.
There's a schema layer — FAQ Schema, NewsArticle, BreadcrumbList, Organization, SpeakableSpecification — all injected as JSON-LD. This tells an AI engine "here's exactly what this article is, who wrote it, what it covers, and which parts you can extract."
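To make that concrete, here's a minimal sketch of what injecting one of those payloads might look like. The schema.org field names (NewsArticle, SpeakableSpecification, speakable, cssSelector) are real; the function name and arguments are my own illustration, not the actual pipeline code.

```python
import json

def build_article_jsonld(title, author, url, published, speakable_selectors):
    """Render a NewsArticle JSON-LD script tag with a speakable section.

    Field names follow schema.org; everything else here (function name,
    argument names) is illustrative, not the production pipeline.
    """
    payload = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": title,
        "author": {"@type": "Person", "name": author},
        "mainEntityOfPage": url,
        "datePublished": published,
        "speakable": {
            "@type": "SpeakableSpecification",
            # CSS selectors for the parts an engine may extract verbatim
            "cssSelector": speakable_selectors,
        },
    }
    return f'<script type="application/ld+json">{json.dumps(payload)}</script>'
```

The speakable block is the interesting part: it's the article explicitly telling an engine which passages are safe to lift as an answer.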
Then there's the discovery layer. I built an llms.txt file — a plain-text index that does for AI crawlers roughly what robots.txt does for search bots — that tells language models what the site is and where to look. There's a companion llms-full.txt that regenerates from the database on every request, so AI engines always have a current index. Plus a Google News sitemap for anything time-sensitive.
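A sketch of what regenerating that index on request could look like. The article records and their field names are placeholders; the output shape follows the llms.txt convention of a title line, a summary, then a linked list of pages.

```python
def render_llms_full(site_name, tagline, articles):
    """Render an llms-full.txt body from article records.

    `articles` is a list of dicts with title/url/summary keys -- in a real
    pipeline these would come from a database query on each request, so
    the index is never stale. Structure follows the llms.txt convention.
    """
    lines = [f"# {site_name}", "", f"> {tagline}", "", "## Articles", ""]
    for article in articles:
        lines.append(f"- [{article['title']}]({article['url']}): {article['summary']}")
    return "\n".join(lines) + "\n"
```

Serving this from a route handler, rather than a static file, is what keeps the index current without a publish step.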
And then the part I haven't seen elsewhere: content enrichment. Every article gets its key points extracted as structured data. Actionable insights get segmented by audience. Companies and people mentioned in the article get tagged as entities. All of this lives in structured database fields, not buried in HTML paragraphs. It's machine-readable by design.
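Shape-wise, the enrichment might look something like this. The field names are my guesses at the idea, not the actual schema — the point is that each signal gets its own machine-readable column instead of living inside rendered HTML.

```python
from dataclasses import dataclass, field

@dataclass
class ArticleEnrichment:
    """Structured enrichment stored alongside the article body.

    Illustrative field names only. Each signal is a separate
    machine-readable field, not text buried in HTML paragraphs.
    """
    key_points: list[str] = field(default_factory=list)  # extracted takeaways
    # insights segmented by audience, e.g. {"marketers": [...], "engineers": [...]}
    insights_by_audience: dict[str, list[str]] = field(default_factory=dict)
    # tagged entities, e.g. {"name": "OpenAI", "type": "Organization"}
    entities: list[dict] = field(default_factory=list)
```

Because these are queryable fields, the same data can feed the JSON-LD layer and the llms-full.txt index without re-parsing the article.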
It's not fully hands-off — I'm still the human in the loop, reviewing output and catching the spots where the AI gets it wrong. But the pipeline handles the heavy lifting. There are self-healing properties built in — quality gates that catch bad HTML, citation audits that strip fabricated URLs, schema validation that runs before anything goes live.
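As a toy version of one of those gates, here's what a citation audit could look like: any link whose URL isn't in a verified set gets unwrapped to plain text, so a fabricated URL never ships. In the real pipeline the verified set would come from an actual verification step; here it's just passed in.

```python
import re

def audit_citations(html, verified_urls):
    """Unwrap any <a> tag whose href is not in the verified set.

    A simplified sketch: real HTML should go through a proper parser,
    and `verified_urls` would be built by actually checking each link.
    """
    def repair(match):
        url, text = match.group(1), match.group(2)
        # keep verified links; strip unverified ones down to their text
        return match.group(0) if url in verified_urls else text

    return re.sub(r'<a href="([^"]+)">([^<]*)</a>', repair, html)
```

The same pattern — validate, and degrade to something safe rather than block — is what lets the default output stay publishable.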
It's not perfect yet. But it's close enough that the default output is publishable, and my job is mostly tightening the edges.
What I'm still figuring out
AI engines are building their trust graphs right now. They're learning which domains to cite, which sources to trust, which content to pull from. I don't know exactly how that shakes out long-term.
What I do know is that the infrastructure I built gives me a way to measure it. Real-time visibility tracking across ChatGPT, Perplexity, Claude, and Gemini — so I can see what's working and what isn't, instead of guessing.
That's the part I care about most. Not the prediction. The measurement.