Three years ago, a client asked me whether they needed to worry about "AI search". I told them it was early days and that traditional SEO would remain dominant for the foreseeable future. I was right about traditional SEO remaining important. I underestimated how fast the landscape would shift.
By the end of 2025, AI Overviews were appearing on over 30% of Google search results pages. Perplexity was processing over 100 million queries per month. ChatGPT's web browsing mode was being used by tens of millions of users to get sourced, cited answers to questions that would previously have led to a Google search.
GEO (Generative Engine Optimisation) is the practice of making your site the preferred citation source for AI-generated answers. When a user asks Perplexity "what is the best emergency preparedness kit for a family of four?", GEO determines whether your site appears in the cited sources or not.
The volume of queries going to AI engines rather than traditional search is growing every quarter. Sites that are not visible to AI engines are not just losing a marginal traffic channel: they are becoming invisible to an increasingly large share of their target audience.
The most important finding from studying AI citation patterns across hundreds of sites is this: AI engines predominantly cite sites that already rank in the top 10 organic results. In the data I have collected, approximately 63% of Perplexity citations go to sites ranking in the top 10 for the queried keyword. For Google AI Overviews, that figure is closer to 70%.
This is not a coincidence. It reflects how AI engines are built:
GPTBot, ClaudeBot, and PerplexityBot crawl pages on the open web. They crawl the same pages that Googlebot crawls. If a page is not crawlable, not indexed, or blocked in robots.txt, it is invisible to AI engines for exactly the same reasons it is invisible to Google.
When multiple pages could plausibly answer a query, AI engines favour the more authoritative source. Authoritativeness is largely determined by the same signals Google uses: the quality and quantity of inbound links, domain authority, brand recognition, and E-E-A-T signals. These are traditional SEO signals.
A site with hundreds of duplicate pages, redirect chains, broken internal links, and thin content is harder for any crawler to process efficiently. Crawl budget problems that hurt Google rankings hurt AI citation frequency for the same structural reasons.
The practical implication is clear: invest in SEO first, GEO-specific optimisations second. A site with a clean technical foundation, quality content, and strong backlinks will see GEO improvements from basic GEO actions. A site with fundamental SEO problems will see minimal GEO benefit from adding FAQPage schema.
Understanding the selection process helps you understand why each GEO signal matters. Different AI engines have slightly different architectures, but the core selection criteria are consistent.
Before any GEO optimisation can help, the page must be crawlable and indexed. AI crawlers respect robots.txt. A page blocked to OAI-SearchBot, the crawler OpenAI documents for ChatGPT search, is simply not available to ChatGPT search for citation, regardless of how good the content is. This is the first gate, and it is entirely an SEO problem.
The AI must determine that your page is relevant to the query. This uses the same signals as search rankings: keyword coverage, topical depth, semantic relevance. Strong on-site SEO and keyword research make this gate easier to pass.
Among relevant pages, AI engines select those they trust most. Trust is built from a combination of traditional SEO signals (domain authority, backlinks, brand recognition) and AI-specific signals (author and organisation markup that supports E-E-A-T, sameAs profiles, brand mentions, recency of content updates).
The final selection criterion is how easily the AI can extract a useful answer from your page. This is where GEO-specific optimisations matter most. Pages with FAQPage schema, clear heading hierarchies, direct answers in the opening paragraphs, and HowTo schema are significantly easier for AI engines to cite accurately. A page with the right content but poor structure loses at this final stage to a page with equivalent content and better structure.
With a solid SEO foundation in place, GEO-specific actions improve your citation rate in five areas: structured data, AI crawler access, answer-first content, brand entity signals, and llms.txt.
FAQPage JSON-LD is the single highest-impact GEO action. AI engines are built to answer questions. FAQPage schema gives them pre-formatted Q&A pairs that map directly to the output format they produce. My analysis across 200 sites found that pages with FAQPage schema are cited at approximately 3.4 times the rate of equivalent pages without it.
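To make the markup concrete, here is a minimal sketch that builds a FAQPage JSON-LD script tag from question and answer pairs. The function name `faq_jsonld` and the sample question are illustrative assumptions, not part of any tool mentioned in this article; the property names (`mainEntity`, `acceptedAnswer`, `text`) are the ones schema.org defines for FAQPage.

```python
import json


def faq_jsonld(pairs):
    """Build a FAQPage JSON-LD <script> tag from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" so an answer containing "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape, so the payload still parses to the same text.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Placeholder content: replace with the FAQ text that is visible on the page.
    sample = [
        (
            "What is the best emergency preparedness kit for a family of four?",
            "Placeholder answer text that also appears in the page body.",
        )
    ]
    print(faq_jsonld(sample))
```

The output is meant to be pasted into the page template alongside the visible FAQ section, not in place of it.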
HowTo schema has a similar effect for procedural queries. When a user asks an AI engine "how to do X", HowTo markup makes your steps directly extractable: the AI reads the structured data rather than interpreting unstructured text.
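As a rough illustration of what that structured data looks like, the sketch below turns an ordered list of steps into a schema.org HowTo object. The helper name `howto_jsonld` and the sample steps are hypothetical; `HowTo`, `HowToStep`, `step`, `position`, `name`, and `text` are schema.org vocabulary.

```python
import json


def howto_jsonld(title, steps):
    """Return a schema.org HowTo dict from ordered (step_name, step_text) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": title,
        "step": [
            {
                "@type": "HowToStep",
                "position": index,  # 1-based order of the step
                "name": step_name,
                "text": step_text,
            }
            for index, (step_name, step_text) in enumerate(steps, start=1)
        ],
    }


if __name__ == "__main__":
    # Placeholder steps: each one should also exist in the visible page body.
    markup = howto_jsonld(
        "How to assemble an emergency preparedness kit",
        [
            ("Gather supplies", "Collect water, food, and first-aid items."),
            ("Pack the kit", "Place everything in a waterproof container."),
        ],
    )
    print(json.dumps(markup, indent=2))
```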
Many sites that would never intentionally block AI crawlers are blocking them accidentally. A robots.txt file that includes User-agent: * / Disallow: / for maintenance purposes, or a staging domain firewall that leaked to production, or a WordPress security plugin that rate-limits crawlers: all of these can result in AI crawler blocks that eliminate your site from AI citation entirely.
The fix is simple: explicitly allow each major AI crawler by name in your robots.txt. GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, and Amazonbot each need their own allow directive. Do not rely on wildcard user-agent inheritance: be explicit.
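A quick way to catch an accidental block is to run your robots.txt through a parser before it goes live. The sketch below uses Python's standard `urllib.robotparser`. `AI_CRAWLERS` is an illustrative list of user-agent tokens, so edit it to match the crawlers you care about. The helper names are my own, and Python's parser is only an approximation of how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI crawler user-agent tokens; adjust to your own policy.
AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "Amazonbot",
]


def blocked_ai_crawlers(robots_txt, url="https://example.com/"):
    """Return the crawlers from AI_CRAWLERS that may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


def explicit_allow_rules(crawlers=AI_CRAWLERS):
    """Render one 'User-agent / Allow' group per crawler."""
    return "\n\n".join(f"User-agent: {bot}\nAllow: /" for bot in crawlers) + "\n"


if __name__ == "__main__":
    # A leftover maintenance rule blocks every crawler, AI crawlers included.
    maintenance = "User-agent: *\nDisallow: /\n"
    print("Blocked before fix:", blocked_ai_crawlers(maintenance))

    # Named groups take precedence over the wildcard group for those crawlers.
    fixed = explicit_allow_rules() + "\n" + maintenance
    print("Blocked after fix:", blocked_ai_crawlers(fixed))
```

In this example the wildcard rule stays in place for everything else, which is rarely what you want in production; the point is only that the named groups stop the AI crawlers from inheriting it.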
AI engines prefer content that answers the question immediately. The inverted pyramid (answer first, elaboration second) is not just good editorial practice for readers. It matches exactly how AI engines extract and present information. Opening paragraphs are cited more than any other section of a page.
Question-style H2 and H3 headings serve a similar function. "What Is an Emergency Preparedness Kit?" is not just a heading that matches user query phrasing: it tells the AI exactly what question this section answers. Generic headings like "Overview" or "Introduction" provide no such signal.
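If you want to audit existing pages for this, a small script can list the H2 and H3 headings that are not phrased as questions. This is a rough heuristic sketch: it assumes a heading counts as question-style when it ends with a question mark, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h2> and <h3> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # tag of the heading being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append("".join(self._buffer).strip())
            self._current = None


def non_question_headings(html):
    """Return H2/H3 headings that do not end with a question mark."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return [text for text in collector.headings if not text.endswith("?")]


if __name__ == "__main__":
    page = (
        "<h1>Emergency Kits</h1>"
        "<h2>Overview</h2><p>...</p>"
        "<h2>What Is an Emergency Preparedness Kit?</h2><p>...</p>"
        "<h3>Introduction</h3>"
    )
    print(non_question_headings(page))
```

Not every flagged heading needs rewriting; treat the list as a prompt for editorial review rather than a rule.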
AI engines do not just evaluate individual pages: they evaluate brands. Organization schema with sameAs links is how AI engines connect your website to your LinkedIn profile, your Wikipedia page (if you have one), your Crunchbase entry, your Wikidata entity, and your social profiles. This cross-referencing is how AI engines verify that you are a legitimate, identifiable brand rather than an anonymous website.
Unlinked brand mentions (your brand name appearing on credible third-party sites without a hyperlink) are also read by AI engines as brand authority signals. Press coverage, podcast appearances, and industry directory listings all contribute to this signal.
The llms.txt standard is still emerging, but adoption is accelerating. The file lives at yourdomain.com/llms.txt and provides AI engines with a structured overview of your site: what it is, who it serves, and which pages are most authoritative. Think of it as the counterpart to robots.txt: where robots.txt sets restrictions, llms.txt offers positive guidance.
Creating an llms.txt file takes under an hour and is one of the clearest signals you can send that your site is deliberately optimised for AI engine access.
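For a sense of what the file contains, here is a minimal sketch that renders one from a site name, a one-line summary, and grouped links. It follows the layout in the public llms.txt proposal as I understand it (a title line, a blockquote summary, then sections of Markdown links). The function name, site name, and URLs are placeholders.

```python
def build_llms_txt(site_name, summary, sections):
    """Render llms.txt content.

    `sections` maps a section heading to a list of (title, url, note) tuples;
    `note` may be an empty string.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder site details; list your own most authoritative pages.
    content = build_llms_txt(
        "Example Prep Co",
        "Guides and checklists for household emergency preparedness.",
        {
            "Guides": [
                (
                    "Family kit guide",
                    "https://example.com/guides/family-kit",
                    "Checklist for a family of four",
                ),
            ],
            "About": [("About us", "https://example.com/about", "")],
        },
    )
    print(content)
```

Save the output as llms.txt in the site root so that it resolves at yourdomain.com/llms.txt.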
Based on implementation data across hundreds of sites, here are the GEO actions ranked by the size and speed of citation rate improvement:
If your robots.txt is blocking any major AI crawler, fixing this is the single action with the largest potential impact. It is also the fastest to implement. Use our AI Bot Checker to test your current robots.txt against all major AI crawlers instantly.
Every service page, blog post, and product page should have a FAQ section with matching FAQPage JSON-LD. Write questions exactly as AI users would phrase them. Answers should be 40 to 150 words, complete, and self-contained.
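The 40 to 150 word guideline is easy to check automatically before publishing. The sketch below flags answers outside that range; the function name and the whitespace-based word count are my own simplifications.

```python
def answer_length_issues(pairs, min_words=40, max_words=150):
    """Return {question: word_count} for answers outside the target range."""
    issues = {}
    for question, answer in pairs:
        word_count = len(answer.split())  # simple whitespace-based count
        if not min_words <= word_count <= max_words:
            issues[question] = word_count
    return issues


if __name__ == "__main__":
    # Placeholder FAQ content for illustration only.
    faqs = [
        ("How long does water keep in storage?", "About six months."),
        ("What belongs in a family kit?", " ".join(["word"] * 60)),
    ]
    for question, count in answer_length_issues(faqs).items():
        print(f"{question!r} has {count} words")
```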
Organization schema with sameAs links establishes your brand identity in machine-readable format. Include all your authoritative profiles in the sameAs array. Use our Schema Builder to generate this in minutes.
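For reference, the markup itself is small. The sketch below assembles an Organization object with a sameAs array; the helper name, company name, and profile URLs are placeholders, while `Organization`, `name`, `url`, `logo`, and `sameAs` are schema.org vocabulary.

```python
import json


def organization_jsonld(name, url, same_as, logo=None):
    """Return Organization JSON-LD as a string, with a sameAs profile list."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),  # profile URLs that identify the same brand
    }
    if logo:
        data["logo"] = logo
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Placeholder brand and profile URLs; replace with your real profiles.
    print(
        organization_jsonld(
            "Example Prep Co",
            "https://example.com",
            [
                "https://www.linkedin.com/company/example-prep-co",
                "https://www.wikidata.org/wiki/Q00000000",
            ],
        )
    )
```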
Rewrite key pages to answer the primary question in the first paragraph. Add question-style H2 headings. This is an ongoing content improvement that compounds over time.
Use our llms.txt Generator to create a properly formatted file. List your most authoritative pages. Upload to your root directory.
Author attribution in schema is an E-E-A-T signal. AI engines favour attributed content from named authors with verifiable credentials over anonymous content.
Any step-by-step guide, tutorial, recipe, or setup process should have HowTo JSON-LD. The AI extraction improvement is particularly noticeable for instructional queries.
The clients who see the best GEO results are those who have done the SEO work first. Strong domain authority, quality backlinks, well-structured content, and clean technical health are not optional extras: they are the bedrock that GEO actions build on. A site with a domain rating of 10, no backlinks, and thin content will not become a trusted AI citation source by adding FAQPage schema. GEO magnifies existing SEO strength; it does not substitute for it.
Google's guidelines, and by extension AI engine trust signals, require that schema content matches the visible page content. FAQ questions in your JSON-LD that do not appear as readable text on the page are treated as spammy markup. Every FAQ item in your FAQPage schema must appear as visible HTML. Every HowTo step must exist in the body of the page. The schema is the machine-readable layer on top of the human-readable content, not a replacement for it.
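This rule can be checked mechanically. The sketch below scans a page's HTML, reads any FAQPage JSON-LD, and reports questions that do not appear in the visible text. It is a simplified illustration: it assumes each JSON-LD script holds a single object, compares text case-insensitively after collapsing whitespace, and uses hypothetical class and function names.

```python
import json
import re
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Separate visible text from JSON-LD blocks in an HTML document."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.jsonld = []
        self._skip = None  # "script" or "style" while inside one
        self._is_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = tag
            self._is_jsonld = (
                tag == "script" and dict(attrs).get("type") == "application/ld+json"
            )
            self._buffer = []

    def handle_data(self, data):
        if self._skip is None:
            self.visible.append(data)
        elif self._is_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._skip:
            if self._is_jsonld:
                self.jsonld.append(json.loads("".join(self._buffer)))
            self._skip = None
            self._is_jsonld = False


def normalise(text):
    return re.sub(r"\s+", " ", text).strip().lower()


def hidden_faq_questions(html):
    """Return FAQPage questions that are missing from the visible page text."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    # Join without separators so inline tags inside a question do not split it.
    visible = normalise("".join(scanner.visible))
    missing = []
    for block in scanner.jsonld:
        if not isinstance(block, dict) or block.get("@type") != "FAQPage":
            continue
        for item in block.get("mainEntity", []):
            question = item.get("name", "")
            if normalise(question) not in visible:
                missing.append(question)
    return missing
```

Calling `hidden_faq_questions(page_html)` on a rendered page returns an empty list when every marked-up question is visible; the same approach extends to answers and HowTo steps.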
There is a legitimate debate about blocking AI training crawlers (like CCBot or GPTBot) while allowing AI search crawlers (like OAI-SearchBot or PerplexityBot). The risk is that a misconfigured robots.txt intended to block training crawlers ends up blocking search crawlers too. Always be explicit in your robots.txt rather than relying on wildcard rules. Test every change with our AI Bot Checker before deploying.
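One way to guard against that misconfiguration is to write the intended policy down as two lists and test the robots.txt against both. The sketch below does this with Python's standard `urllib.robotparser`. Which crawler belongs in which list is your own policy decision, so the lists here are only an example, and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def policy_violations(robots_txt, must_block, must_allow, url="https://example.com/"):
    """Compare a robots.txt against the crawlers you intend to block and allow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent in must_block:
        if parser.can_fetch(agent, url):
            problems.append(f"{agent} should be blocked but is allowed")
    for agent in must_allow:
        if not parser.can_fetch(agent, url):
            problems.append(f"{agent} should be allowed but is blocked")
    return problems


# Example policy: block one training crawler, allow two search-facing agents.
MUST_BLOCK = ["CCBot"]
MUST_ALLOW = ["PerplexityBot", "ChatGPT-User"]

INTENDED = """\
User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
User-agent: ChatGPT-User
Allow: /
"""

# A wildcard rule meant to stop training crawlers also stops everything else.
MISCONFIGURED = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    print(policy_violations(INTENDED, MUST_BLOCK, MUST_ALLOW))
    for problem in policy_violations(MISCONFIGURED, MUST_BLOCK, MUST_ALLOW):
        print(problem)
```

Running a check like this in a deployment pipeline means a wildcard rule cannot silently reach production.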