Schema Markup for AI Search: What Actually Works (and What’s Just Noise)

A research-backed guide to structured data in the age of AI search, including the data the industry doesn’t want to talk about.

Here’s the uncomfortable truth about schema markup and AI search: the industry consensus that structured data improves AI citation rates is based mostly on assumptions, not evidence. A February 2026 empirical study of 730 AI citations found that generic schema (Article, Organization, BreadcrumbList) provides zero measurable citation advantage. A December 2025 Search Atlas analysis of millions of LLM responses across OpenAI, Gemini, and Perplexity found the same thing: schema coverage has no correlation with how often AI systems cite a domain.

So should you skip schema entirely? No. But you need to understand what it actually does, what it doesn’t, and where your time is better spent. That’s what this guide is for.

Schema markup is still valuable. It still matters for Google’s rich results, for Knowledge Graph recognition, for helping machines parse your content accurately. Google and Microsoft both confirmed in March 2025 that they use structured data for their generative AI features. But “uses structured data” and “cites you more often because of schema” are two very different claims, and the research increasingly shows the gap between them is wide.

This guide covers the real research, the honest trade-offs, and the implementation that actually makes a difference. If you want inflated statistics and unverifiable claims, there are about 500 other schema guides that will happily provide them. If you want to know what the data actually says, keep reading.

By Tim Dini | Last updated February 2026

What Schema Markup Actually Is (and Why It Exists)

Schema markup is code you add to your website that tells machines what your content means, not just what it says. Think of it like labeling boxes when you move. A box labeled “kitchen” tells the movers something useful. A box with no label? They have to open it and figure it out themselves. Schema does the labeling for search engines and AI systems.

The technical version: schema uses a standardized vocabulary (Schema.org, created in 2011 by Google, Microsoft, Yahoo, and Yandex) to add structured metadata to your pages in a machine-readable format. The preferred format is JSON-LD (JavaScript Object Notation for Linked Data), which sits in your page’s code separately from the visible content.

The practical version: when you tell Google “this page is an Article, written by Tim Dini, published on February 25, 2026, about schema markup for AI search,” you’re not hoping Google figures that out from reading the page. You’re stating it explicitly in a format machines can process instantly.
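To make that concrete, here is a minimal Python sketch that states those same facts as JSON-LD and prints a block ready to paste. The headline wording and the to_script_tag helper are my own illustrative assumptions, not something a particular plugin produces.

```python
import json

# The facts from the sentence above, stated explicitly.
# The headline text is an assumed placeholder.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Schema Markup for AI Search",
    "author": {"@type": "Person", "name": "Tim Dini"},
    "datePublished": "2026-02-25",
    "about": {"@type": "Thing", "name": "Schema markup for AI search"},
}


def to_script_tag(data: dict) -> str:
    """Serialize a dict as a JSON-LD script block (hypothetical helper)."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = to_script_tag(article)
print(snippet)
```

The printed block is what goes into the page. Nothing in it is inferred by the machine: the type, the author, and the date are all declared.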

There are over 800 schema types on Schema.org today, up from the original 297 when it launched. Most of them don’t matter for your purposes. The ones that do are surprisingly straightforward, and we’ll cover them in detail.

JSON-LD vs. Microdata vs. RDFa: Skip the Debate

There are three formats for implementing schema markup: JSON-LD, Microdata, and RDFa. Use JSON-LD. End of discussion.

Google explicitly recommends JSON-LD. Microsoft’s Bing uses it. It sits in a clean script block, usually in your page’s <head>, separate from your visible HTML. It’s easier to add, easier to maintain, easier to debug, and won’t break your page layout if something goes wrong. Microdata and RDFa require you to weave markup into your existing HTML, which is messy, fragile, and creates maintenance headaches.

If your CMS or plugin generates Microdata instead of JSON-LD, either switch tools or accept the limitation. But if you’re adding schema manually or choosing a plugin, JSON-LD is the only format worth your time.

Schema Is Not a Ranking Factor (But That’s Not the Whole Story)

John Mueller, Google’s Search Advocate, confirmed in 2025 that structured data is not a direct ranking factor. Implementing schema will not, by itself, move you up in search results.

But schema unlocks things that do affect your visibility. Rich results (those enhanced search listings with star ratings, FAQ dropdowns, recipe cards, and product prices) require schema. And rich results get more clicks. Schema App’s quarterly business reviews documented significant CTR improvements when rich results were awarded. A SearchPilot test found a 20% traffic increase from Review schema on product pages.

Schema also feeds Google’s Knowledge Graph, which is the structured database Google uses to understand entities and their relationships. If Google’s Knowledge Graph knows your business is a local plumber in Chicago with 15 years of experience and 200+ five-star reviews, that understanding influences how Google represents you across all its products, including AI Overviews.

So schema isn’t a ranking factor in the traditional sense. It’s infrastructure. Skip it and you’re leaving real visibility on the table, even if it’s not the direct ranking signal some people claim.

Schema and AI Citations: What the Research Actually Says

This is where most schema guides go off the rails. They cite statistics like “schema improves AI citation rates by 2.5x” or “structured data increases AI visibility by 44%” without linking to the original studies. I tried to trace these numbers back to their sources. Most of them lead to other blog posts citing other blog posts, in an echo chamber where the same unverified claims get repeated until they sound like established facts.

A Growth Marshal study from February 2026 actually called this out by name: the practitioner consensus that schema improves AI visibility “originated through an LLM feedback loop in which AI platforms reproduced untested SEO recommendations from their training data.” In other words, AI tools started repeating claims that hadn’t been tested, and those claims became “conventional wisdom” because AI said them confidently.

Here’s what the actual empirical research says. And I should note: this is an area where new data is coming in regularly, so these findings represent the best available evidence right now, not permanent conclusions.

The Studies That Challenge the Consensus

Search Atlas (December 2025): Analyzed millions of LLM responses across OpenAI, Gemini, and Perplexity. Found that schema markup coverage has no correlation with how often a domain is cited by LLMs. Domains with complete schema coverage performed no better than domains with minimal or no schema. Important caveat: this study measured schema presence, not schema type or quality.

Growth Marshal (February 2026): Studied 730 AI citations across ChatGPT and Gemini, covering 1,006 pages and 75 commercial queries. Found that generic schema (Article, Organization, BreadcrumbList) provides zero measurable citation advantage. Pages with generic schema were actually cited less often (41.6%) than pages with no schema at all (59.8%). However, attribute-rich schema (Product and Review types with populated pricing, ratings, and specifications) showed a significant advantage at 61.7%, about 20 points above generic schema though only about 2 points above no schema, especially for lower-authority domains.

SearchVIU (October 2025): Built a test page with product information placed exclusively in JSON-LD schema markup (not visible on the page). Result: none of the tested AI systems (including ChatGPT, Claude, Perplexity, and Gemini) extracted the schema-only data during direct page fetch. The AI systems read visible page content and ignored the hidden structured data entirely.

AccuraCast (December 2025): Analyzed 2,000+ prompts and 9,000 citations across ChatGPT, Google AI Overviews, and Perplexity. Found 81% of cited pages included schema markup. But as they noted, this is correlation, not causation. Wikipedia dominates AI citations and uses minimal schema. The high-authority sites that get cited most also tend to have schema because they tend to have mature SEO programs.

The Studies That Support Schema’s Value

Microsoft Bing (March 2025, SMX Munich): Fabrice Canel, Principal Product Manager at Microsoft Bing, confirmed on stage that “Schema markup helps Microsoft’s LLMs understand content.” This is an official statement from one of the two companies whose search infrastructure powers most AI search products (Bing powers ChatGPT’s web search).

Google (March 2025, Search Central Live NYC): Ryan Levering, Google’s structured data engineer, said “a lot of our systems run much better with structured data.” Google was explicit: structured data is critical for modern search features because it is efficient, precise, and easy for machines to process.

Schema App (January 2026 retrospective): Documented that throughout 2025, both Google and Microsoft publicly confirmed using schema for generative AI features. ChatGPT also confirmed using structured data to determine which products appear in its results. Schema App argues (and they have a business interest here, which I note for transparency) that schema’s real value is building a Content Knowledge Graph that helps AI understand entities and relationships, not directly triggering citations.

Search Engine Land (September 2025): Ran a head-to-head experiment comparing a page with well-structured schema against one without. The schema page showed improved visibility in AI Overviews, though the test was limited in scope.

What This Means for You (The Honest Assessment)

Here’s my synthesis after going through all of this research:

Generic, default-plugin schema (the kind most websites have) does not appear to improve AI citation rates. Adding Article schema because your WordPress plugin does it automatically is not an AI search strategy. It’s a checkbox.

Attribute-rich schema with real, populated data (pricing, ratings, specifications, detailed entity properties) shows genuine promise, especially for smaller domains that need every signal they can get. If you’re a local business competing against national brands, this is where schema earns its keep.

Schema’s biggest value for AI is indirect. It feeds Google’s Knowledge Graph, which influences how Google’s AI systems understand your business. It powers rich results that improve click-through rates. It helps machines parse your content accurately, which reduces errors when AI systems process your pages. None of that shows up as a direct “schema = more AI citations” correlation, but all of it matters.

The single biggest predictor of AI citations, according to the Growth Marshal study, is organic search rank position. Each position drop reduces citation odds by approximately 24%. Position-1 pages get cited 43% of the time; position-7 pages get cited 5% of the time. If you’re choosing between improving your schema and improving your organic rankings, rankings win. But ideally, you do both, because they support each other.

How AI Systems Actually Use Your Structured Data

Understanding how schema actually flows through AI systems is the difference between implementing it strategically and implementing it because a blog post told you to. The process isn’t what most people think.

AI search operates on a Retrieval-Augmented Generation (RAG) pipeline. When someone asks ChatGPT a question, it doesn’t scan the entire internet in real time. It queries a search backend (Bing, in ChatGPT’s case), retrieves candidate pages, extracts relevant information, and then generates a response with citations. Schema is theoretically relevant at the extraction and entity-resolution stages, where machine-readable labels reduce the work the AI system has to do.

But here’s the critical detail the SearchVIU test revealed: when AI systems fetch a page directly (what researchers call “Phase 4”), they read the visible content. They do not extract hidden JSON-LD data during that direct fetch. Zero out of five tested systems (ChatGPT, Claude, Perplexity, Gemini, Google AI Mode) could find product pricing that existed only in schema markup and wasn’t visible on the page.

The Four Phases Where Schema Matters (and Doesn’t)

Phase 1: Web Crawling. Search engine crawlers collect billions of pages for LLM training data. Schema is incorporated into this training data. This is where schema gets baked into the AI’s foundational knowledge.

Phase 2: Search Index. Search engines store metadata and structured data in their indexes. This is where schema gets extracted and stored for later use. Google’s Knowledge Graph is built partly from schema data collected at this phase.

Phase 3: Retrieval. When an AI system searches for relevant pages to answer a query, it uses the search engine’s index. Schema data stored at Phase 2 can influence which pages the search engine considers relevant and how it ranks them for the AI’s retrieval step.

Phase 4: Direct Fetch. The AI system retrieves the actual page and reads the content. Based on the SearchVIU tests, schema is NOT processed at this phase. The AI reads your visible text, your HTML structure, your headings, your lists. It does not parse your JSON-LD block.

This means schema’s value for AI operates through Phases 1-3 (the search index pipeline), not Phase 4 (the direct read). Your schema feeds Google’s Knowledge Graph, influences how Bing indexes your pages (and Bing powers ChatGPT), and helps search engines understand your content’s structure. But the AI model itself, when it’s reading your page to generate a response, is reading your visible content.

A Recent Experiment That Reinforces This

In early 2026, an SEO researcher created a test page for a fictional product called “DUCKYEA t-shirts.” He placed a fake company address only within JSON-LD schema markup (not visible on the page). Then he prompted both ChatGPT and Perplexity to find the address.

Both AI systems found the address. But not because they parsed the JSON-LD as structured data. They read the JSON-LD block as text on the page, the same way they’d read any other text. Since the schema wasn’t valid and contained made-up information, the researcher concluded that “the LLM agent is simply picking up whatever you are listing in the HTML. It does not matter if it is valid schema.”

This lines up with the SearchVIU tests on the point that matters. The two experiments differ on whether the raw JSON-LD text gets picked up at all during a direct fetch, but neither found AI systems processing it as structured data the way search engines do. When they do read it, they treat it as more text on the page. Which means your schema helps AI indirectly (through search engines) but doesn’t give you a direct advantage during the AI’s own processing.
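The practical takeaway is that anything you want an AI system to quote should exist in your visible content, not only in your markup. Here is a rough Python sketch, using only the standard library, that lists values present in a page’s JSON-LD but absent from its visible text. The PageSplitter class, the helper functions, and the sample page are all hypothetical, and the substring match is deliberately crude.

```python
import json
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Separates visible text from the contents of JSON-LD script blocks."""

    def __init__(self):
        super().__init__()
        self.visible = []  # text a reader would see
        self.jsonld = []   # raw contents, one entry per JSON-LD block
        self._mode = "visible"

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            if dict(attrs).get("type") == "application/ld+json":
                self._mode = "jsonld"
                self.jsonld.append("")
            else:
                self._mode = "skip"
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._mode = "visible"

    def handle_data(self, data):
        if self._mode == "jsonld":
            self.jsonld[-1] += data
        elif self._mode == "visible":
            self.visible.append(data)


def leaf_values(node):
    """Yield every leaf value as a string, skipping @context, @type, and @id."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("@"):
                yield from leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item)
    else:
        yield str(node)


def schema_only_values(html: str) -> list:
    """Return schema values that never appear in the visible page text."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    visible_text = " ".join(" ".join(parser.visible).split()).lower()
    missing = []
    for block in parser.jsonld:
        for value in leaf_values(json.loads(block)):
            if value.lower() not in visible_text:
                missing.append(value)
    return missing


# Sample page: the price lives only in the markup.
sample = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "USD"}}
</script>
</head><body>
<h1>Example Widget</h1>
<p>A sturdy widget for everyday use.</p>
</body></html>
"""

print(schema_only_values(sample))  # ['49.00', 'USD']
```

Any value this reports is one that, per the tests above, an AI system reading the page directly may never see.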

The Schema Types That Actually Matter

Google deprecated several schema types starting January 2026 (Practice Problem, Dataset for general search, Sitelinks Search Box, SpecialAnnouncement, Q&A). Some people panicked. They shouldn’t have. Google was pruning low-usage types, not abandoning structured data. The core schema types that drive real business value are fully supported and, if anything, more important than ever.

Here’s what to implement, in priority order, based on which types have the most documented impact on visibility, rich results, and (where data exists) AI search performance.

Tier 1: Implement These First (Highest Impact)

Organization schema tells Google who you are as a business entity. Your legal name, logo, contact information, social profiles, founding date, and physical location. This feeds directly into Google’s Knowledge Graph and is the foundation for entity recognition. If Google doesn’t understand your organization as an entity, everything else is built on sand. Include sameAs links to your official profiles (LinkedIn, social media, Wikipedia if applicable) to strengthen entity connections.

LocalBusiness schema (for businesses with physical locations) extends Organization with location-specific details: address, service area, hours of operation, geo-coordinates. If your clients are plumbers, lawyers, dentists, or any business that serves a geographic area (see the YMYL guide for why these industries face higher scrutiny), this is non-negotiable. It powers the local pack results, Google Business Profile connections, and helps AI systems answer “near me” queries accurately.

Person schema identifies authors, experts, and key people. The AccuraCast study found ChatGPT showed a particular preference for Person schema, with 70.4% of cited sources including it. Whether or not that’s causal, Person schema directly supports E-E-A-T signals by connecting your content to verifiable author identities. If your name is on the content, your Person schema should be on the site.

Product and Review schema are where the Growth Marshal study found the biggest impact. Product schema with populated attributes (pricing, availability, ratings, specifications) showed a 20-percentage-point advantage over generic schema in AI citation rates. For e-commerce and service businesses with concrete offerings, this is the highest-ROI schema investment for AI visibility.

Tier 2: Implement These Next (Strong Supporting Value)

Article schema establishes content type, authorship, publication dates, and publisher information. For any content-focused page (guides, blog posts, research), Article schema helps search engines categorize and attribute your content correctly. It’s straightforward to implement, especially with tools like Rank Math, and it reinforces the author and publication signals that AI systems evaluate.

FAQPage schema structures your frequently asked questions in a format that search engines can extract directly. Even though Google removed FAQ rich results for most sites in 2023, the schema still helps AI systems identify question-answer pairs on your page. Keep answers concise (40-60 words for optimal extraction, according to multiple sources) and make sure the FAQ content is visible on the page, not hidden behind accordions that AI crawlers might miss.

HowTo schema structures step-by-step instructions. If your content walks someone through a process, HowTo schema makes each step explicitly extractable. Number steps clearly and keep each to 1-2 sentences. This format aligns well with how AI systems chunk and extract procedural information.

BreadcrumbList schema shows the page’s position in the site hierarchy. It helps search engines understand site structure and can influence how your site is displayed in results. Simple, quick to implement, and it provides navigational context that benefits both users and machines.

Tier 3: Implement When You’re Ready to Go Deeper

sameAs links connect your entity to verified external profiles. Linking your Organization schema to your Wikipedia page, Wikidata entry, LinkedIn, and other authoritative profiles creates a web of corroborating signals. The Growth Marshal study noted that fewer than 4% of pages implemented sophisticated entity-linking techniques like Wikidata sameAs identifiers. This is essentially uncontested territory.

Service schema (for service businesses without physical products) lets you describe what you offer in structured terms. If you’re a marketing agency, law firm, or consulting practice, Service schema can specify your service types, pricing models, and service areas.

VideoObject and ImageObject schema help AI systems process your multimedia content. As AI systems become more multimodal, properly structured media metadata will matter more. For now, these are nice-to-have unless video or images are core to your content strategy.

How to Implement Schema Without Losing Your Mind

Let me save you some time. You do not need to hand-code JSON-LD for every page on your site. You need to understand what the code does (so you can verify it’s correct), but the actual generation can and should be handled by tools built for the job.

Here’s the honest breakdown of your implementation options, ranked by effort and reliability.

Option 1: WordPress SEO Plugins (Easiest Path)

If you’re on WordPress, plugins like Rank Math Pro, Yoast SEO Premium, or AIOSEO handle the most common schema types automatically. Rank Math Pro is what this site uses, and it generates Article, Organization, Person, FAQPage, and BreadcrumbList schema from fields you fill in through the editor. No code required.

The limitation: plugins generate schema based on the fields they offer. If you need custom attributes (like detailed product specifications or complex entity relationships), you’ll either need the plugin’s advanced settings or a manual JSON-LD block. For most business websites, the plugin-generated schema covers 80-90% of what you need.

The critical step most people skip: verify that your plugin’s output is actually correct. Just because Rank Math says it generated Article schema doesn’t mean all the fields are populated properly. Run every page through Google’s Rich Results Test after configuration. Garbage in, garbage out applies to schema just as much as anything else.

Option 2: Schema Generation Tools (For More Control)

Tools like Google’s Structured Data Markup Helper, Schema App, and Merkle’s Schema Generator let you build JSON-LD blocks without writing code from scratch. You select the schema type, fill in the properties, and the tool generates the code for you to paste into your site.

These are useful when you need schema types your CMS plugin doesn’t support, or when you want more granular control over the properties included. The output still needs to be validated, and you’ll need to know where to paste the code (typically in the page’s head section or via a custom HTML block).

Option 3: Manual JSON-LD (For Specific Needs)

Sometimes you need to write JSON-LD by hand: for entity-graph connections with sameAs and @id cross-referencing, for nested schemas that link Organization to Person to Article, or for highly specific Product attributes that no tool generates automatically.

If you’re going this route, start by studying working examples on sites that already implement well. Google’s Search Central documentation has code samples for every supported schema type. Use the Schema.org reference to see all available properties for each type. And test relentlessly. One misplaced bracket or missing comma breaks the entire block.
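Before a hand-written block goes anywhere near a validator, a quick syntax check catches the bracket-and-comma class of error. This is a small Python sketch using the standard json module. The check_jsonld function name and both sample strings are made up for illustration, and it only checks that the block is valid JSON, not that the schema properties are correct.

```python
import json


def check_jsonld(raw: str):
    """Return (True, None) if the block parses, else (False, where it failed)."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


good = """{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Business Name"
}"""

# Same block with the comma after "Organization" removed.
bad = """{
  "@context": "https://schema.org",
  "@type": "Organization"
  "name": "Your Business Name"
}"""

print(check_jsonld(good))
print(check_jsonld(bad))
```

The second call reports the line and column where parsing stopped, which is usually right after the actual mistake.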

See the JSON-LD code examples later in this guide, after the validation section.

The Validation Habit You Need to Build

Implementing schema without validating it is like installing a water heater without checking for leaks. It might work. It might flood the basement. You won’t know until something goes wrong.

Run every page with schema through two tools:

Google’s Rich Results Test (search.google.com/test/rich-results) shows whether Google can read your structured data and whether it qualifies for rich results. This is the tool that matters for Google search visibility.

The Schema.org Validator (validator.schema.org) checks whether your markup is technically correct according to Schema.org standards, regardless of which search engine will process it. It catches errors that Google’s tool might miss.

Check both your live URL and your raw code. Sometimes the URL test shows different results than code testing, especially if JavaScript rendering is involved. Make validation a habit, not a one-time thing. Run it after every significant page update.

One more thing worth doing while you’re in implementation mode: make sure your page is using semantic HTML tags properly. Wrapping your main content in <article> and <main> tags helps AI parsers identify what’s primary content versus navigation, sidebars, and footer boilerplate. It won’t move the needle dramatically on its own, but it removes friction for systems reading your page. Think of it as clearing the path rather than paving a new one.

JSON-LD Code Examples You Can Actually Use

The implementation section above tells you which tools to use. This section gives you the actual code.

Every example below is a working JSON-LD block you can copy, customize with your own information, and paste into your site. If you’re using Rank Math Pro or another SEO plugin, these examples show you what the plugin should be generating (so you can verify it’s doing its job). If you’re adding schema manually, these are your starting templates.

I’ve organized them by the same tier system from earlier in this guide. Tier 1 first, because that’s where the impact is.

Tier 1: The Code That Matters Most

Organization Schema

This is your business entity’s digital ID card. It tells Google’s Knowledge Graph who you are, where to find you, and which online profiles belong to you. The sameAs links are the part most people skip. Don’t skip them. They’re how Google connects your website to your LinkedIn, your social profiles, and (if you have one) your Wikipedia entry. Fewer than 4% of sites do this well, according to the Growth Marshal study. That’s free competitive space.
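As a starting point, here is a Python sketch that builds an Organization block and prints the JSON-LD to paste into your site. Every name, URL, date, phone number, and address in it is a placeholder I made up, and the set of properties is one reasonable selection rather than a required list.

```python
import json

# All values below are placeholders to replace with real information.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://yoursite.com/#organization",
    "name": "Your Business Name",
    "legalName": "Your Business Name LLC",
    "url": "https://yoursite.com/",
    "logo": "https://yoursite.com/images/your-logo.png",
    "foundingDate": "2015-01-01",
    "contactPoint": {
        "@type": "ContactPoint",
        "contactType": "customer service",
        "telephone": "+1-555-555-0100",
        "email": "hello@yoursite.com",
    },
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Your Street",
        "addressLocality": "Your City",
        "addressRegion": "IL",
        "postalCode": "60601",
        "addressCountry": "US",
    },
    # Only list profiles you actually maintain.
    "sameAs": [
        "https://www.linkedin.com/company/your-company",
        "https://www.facebook.com/yourpage",
        "https://x.com/yourhandle",
    ],
}

print('<script type="application/ld+json">')
print(json.dumps(organization, indent=2))
print("</script>")
```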

What to customize: Replace every “your” placeholder with real info. The @id field (“https://yoursite.com/#organization”) is an internal identifier that lets other schema blocks on your site reference this entity. Keep the format but use your actual URL. Add or remove sameAs links based on which profiles you actually maintain. Don’t link to a Facebook page you haven’t posted on since 2019.

LocalBusiness Schema

If you have a physical location (or serve a specific geographic area), this is your most important schema type. LocalBusiness extends Organization with the details that matter for “near me” queries and local pack results: your address, your hours, your service area, and your geo-coordinates. AI systems answering local queries pull this information directly.

The example below uses “Plumber” as the @type because, well, you can guess why. Replace it with your actual business type. Schema.org has specific types for hundreds of businesses: LegalService, Dentist, AccountingService, InsuranceAgency, AutoRepair, RealEstateAgent, and many more. Use the most specific type that fits.
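A hedged Python sketch of that block follows. The business name, address, coordinates, hours, and directory links are invented placeholders (the coordinates are simply central Chicago), and listing areaServed as named cities is one option among several that Schema.org allows.

```python
import json

# All values below are placeholders to replace with real information.
local_business = {
    "@context": "https://schema.org",
    "@type": "Plumber",  # swap for the most specific type that fits
    "@id": "https://yoursite.com/#localbusiness",
    "name": "Your Plumbing Company",
    "url": "https://yoursite.com/",
    "telephone": "+1-555-555-0100",
    "priceRange": "$$",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Your Street",
        "addressLocality": "Chicago",
        "addressRegion": "IL",
        "postalCode": "60601",
        "addressCountry": "US",
    },
    "geo": {
        "@type": "GeoCoordinates",
        "latitude": 41.8781,
        "longitude": -87.6298,
    },
    "openingHoursSpecification": [
        {
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
            "opens": "08:00",
            "closes": "17:00",
        }
    ],
    "areaServed": [
        {"@type": "City", "name": "Chicago"},
        {"@type": "City", "name": "Evanston"},
    ],
    # Points back at the Organization entity for multi-location setups.
    "parentOrganization": {"@id": "https://yoursite.com/#organization"},
    "sameAs": [
        "https://www.yelp.com/biz/your-business",
        "https://www.facebook.com/yourpage",
    ],
}

print('<script type="application/ld+json">')
print(json.dumps(local_business, indent=2))
print("</script>")
```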

What to customize: Change “Plumber” to your Schema.org business type. Get your exact geo-coordinates from Google Maps (right-click your location, click the coordinates to copy). Set openingHoursSpecification to your real hours. The areaServed field tells AI systems your service radius, which is particularly useful for businesses that travel to customers rather than having walk-in traffic. The sameAs links should include your Google Business Profile and major directory listings.

Person Schema

Person schema connects your content to a real human being with verifiable credentials. The AccuraCast study found 70.4% of ChatGPT-cited sources included Person schema. Whether that’s causal or just correlation with high-authority sites, it costs you nothing to implement and directly supports the E-E-A-T signals AI systems evaluate.

Put this on your About page and anywhere your author bio appears. The @id is critical here because your Article schema (covered below) will reference it to create the author-to-content connection.
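Here is a Python sketch of a Person block along those lines. The name, job title, expertise list, and profile URLs are placeholders, and the @id values follow the yoursite.com pattern used throughout this guide.

```python
import json

# All values below are placeholders to replace with real information.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://yoursite.com/#person-yourname",
    "name": "Your Name",
    "jobTitle": "Your Job Title",
    "url": "https://yoursite.com/about/",
    "image": "https://yoursite.com/images/your-headshot.jpg",
    "description": "One or two sentences about your real experience.",
    # A bare @id reference to the Organization entity defined elsewhere.
    "worksFor": {"@id": "https://yoursite.com/#organization"},
    # Real areas of expertise, not aspirational ones.
    "knowsAbout": [
        "Your First Area of Expertise",
        "Your Second Area of Expertise",
    ],
    "sameAs": [
        "https://www.linkedin.com/in/yourname",
        "https://x.com/yourhandle",
    ],
}

print('<script type="application/ld+json">')
print(json.dumps(person, indent=2))
print("</script>")
```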

What to customize: Notice that worksFor references the Organization @id from the first example. That’s the cross-referencing that builds a Content Knowledge Graph. The knowsAbout field is underused and genuinely helpful for AI systems trying to determine whether you’re a credible source on a topic. List real areas of expertise, not aspirational ones. And the sameAs links matter here just as much as they do for Organization. LinkedIn is particularly valuable for Person schema because it’s independently verifiable.

Product Schema (with Review)

This is where the Growth Marshal study found the real AI advantage. Product schema with populated attributes (actual pricing, real ratings, specific details) showed a 20-percentage-point citation advantage over generic schema. The key word is “populated.” An empty Product schema with just a name and nothing else is useless. The value comes from the data in the fields.

This example combines Product with AggregateRating (for star ratings based on multiple reviews) and individual Review markup. If you sell products or services with concrete pricing, this is your highest-ROI schema investment.
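The Python sketch below builds such a block and adds a small empty_fields check, since the whole point is that the fields are populated. The product, price, rating, and review are invented placeholders, and empty_fields is a hypothetical helper of mine, not part of any tool.

```python
import json

# All values below are placeholders; use the real figures shown on your page.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Your Product Name",
    "description": "A specific, factual description of the product.",
    "image": "https://yoursite.com/images/your-product.jpg",
    "sku": "YOUR-SKU-001",
    "brand": {"@type": "Brand", "name": "Your Brand"},
    "offers": {
        "@type": "Offer",
        "url": "https://yoursite.com/products/your-product/",
        "price": "49.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.8",
        "reviewCount": "47",
    },
    "review": [
        {
            "@type": "Review",
            "author": {"@type": "Person", "name": "Customer Name"},
            "datePublished": "2026-01-15",
            "reviewRating": {"@type": "Rating", "ratingValue": "5", "bestRating": "5"},
            "reviewBody": "The actual text of a real customer review.",
        }
    ],
}


def empty_fields(node, path=""):
    """Return the paths of any empty values in a nested structure."""
    found = []
    if isinstance(node, dict):
        if not node:
            found.append(path)
        for key, value in node.items():
            found += empty_fields(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        if not node:
            found.append(path)
        for index, item in enumerate(node):
            found += empty_fields(item, f"{path}[{index}]")
    elif node in ("", None):
        found.append(path)
    return found


print("Empty fields:", empty_fields(product))
print(json.dumps(product, indent=2))
```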

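What to customize: Everything in this block should reflect real data. Real price, real rating, real review count, real customer review. This is where Mistake #2 from later in the guide (schema that doesn’t match your visible content) is most dangerous. If your page shows 47 reviews, your schema says 47 reviews. If your price changed last week, update the schema. Google’s structured data guidelines require markup to match what users can see on the page, and a mismatch can cost you rich-result eligibility. Don’t count on a validator to catch it: the Rich Results Test checks syntax and required properties, not whether your numbers agree with your page, and any system that reads both your schema and your visible text can see the discrepancy.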

Tier 2: The Supporting Code

Article Schema

Article schema tells search engines and AI systems: this is a piece of content, here’s who wrote it, here’s when it was published and last updated, and here’s who stands behind it. If you’re publishing guides, blog posts, or any content-focused pages, every single one should have this.

Notice how the author and publisher fields reference the @id values from the Person and Organization examples above. That’s the three-layer chain (Article to Person to Organization) that gives AI systems a complete path to verify your content’s credibility.
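Here is a Python sketch of an Article block that points at those @id values. The headline, URLs, dates, word count, and keywords are placeholders, and the date check at the end is my own addition to catch a dateModified that predates datePublished.

```python
import json
from datetime import date

# All values below are placeholders to replace with real information.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "@id": "https://yoursite.com/your-article/#article",
    "headline": "Your Article Title",
    "description": "A one-sentence summary of what the article covers.",
    "image": "https://yoursite.com/images/your-article.jpg",
    "mainEntityOfPage": "https://yoursite.com/your-article/",
    "datePublished": "2026-02-25",
    "dateModified": "2026-03-10",
    # Bare @id references to the Person and Organization entities.
    "author": {"@id": "https://yoursite.com/#person-yourname"},
    "publisher": {"@id": "https://yoursite.com/#organization"},
    "wordCount": 2500,                            # optional
    "keywords": ["your topic", "your subtopic"],  # optional
}

# Sanity check: a page cannot be modified before it was published.
published = date.fromisoformat(article["datePublished"])
modified = date.fromisoformat(article["dateModified"])
if modified < published:
    raise ValueError("dateModified is earlier than datePublished")

print('<script type="application/ld+json">')
print(json.dumps(article, indent=2))
print("</script>")
```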

What to customize: The dateModified field matters more than most people realize. AI systems and search engines both use it to assess content freshness. Update it every time you make a meaningful edit. Don’t update it when you fix a typo. The wordCount field is optional but useful as a content-quality signal. The keywords field is also optional; use it if your content covers a specific topic cluster, skip it if you’d just be stuffing terms.

FAQPage Schema

FAQ schema structures your question-and-answer content in a format AI systems can extract directly. Even though Google removed FAQ rich results for most sites in 2023, the structured data still helps AI systems identify Q&A pairs on your page. The key is keeping answers concise: 40 to 60 words for optimal extraction, according to multiple sources.

One important detail: the FAQ content in your schema must be visible on the page. Don’t hide it behind accordions or tabs that AI crawlers might not expand. If it’s in your schema, it should be on the page in plain view.
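A Python sketch of a FAQPage block follows, with a word-count check against the 40 to 60 word guideline. The two questions and answers are sample content I wrote from points made earlier in this guide, and word_count is a simple whitespace split, so treat the range check as approximate.

```python
import json

# Sample questions and answers; replace with what customers actually ask.
faqs = [
    (
        "Is schema markup a ranking factor?",
        "No. Google has said structured data is not a direct ranking factor, "
        "so adding schema will not move a page up by itself. It does make "
        "pages eligible for rich results, which tend to earn more clicks, and "
        "it feeds the Knowledge Graph that Google uses to understand entities.",
    ),
    (
        "Which schema format should I use?",
        "Use JSON-LD. Google recommends it, and it sits in its own script "
        "block instead of being woven into your HTML like Microdata or RDFa. "
        "That separation makes it easier to add, maintain, and debug, and a "
        "mistake in the block will not break your page layout.",
    ),
]


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for question, answer in faqs
    ],
}

# Questions whose answers fall outside the 40 to 60 word guideline.
outside_range = [q for q, a in faqs if not 40 <= word_count(a) <= 60]

print("Answers outside 40-60 words:", outside_range)
print(json.dumps(faq_page, indent=2))
```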

What to customize: Write your FAQs based on questions your customers actually ask. Not questions you wish they’d ask. If you have a customer service team, ask them what comes up most. If you don’t, check Google’s “People Also Ask” for your main keywords, or look at your competitors’ FAQ sections. The answers should be genuinely useful. If someone reads your FAQ answer and still needs to call you to understand the basics, the answer isn’t good enough.

HowTo Schema

HowTo schema is your best friend for any step-by-step content. It maps directly to procedural queries (“how do I…” questions), which are exactly the kind of queries AI systems handle well. Each step is explicitly extractable, which means AI can pull step 3 out of your 8-step process and cite it in a response without needing to summarize your entire page.

Keep each step to 1 to 2 sentences. Number them clearly. If a step has sub-steps, nest them. The structure should be so clear that someone could follow it without reading anything else on the page.
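Here is a Python sketch that builds a HowTo block from a plain list of steps, numbering them automatically so positions stay sequential. The guide title, page URL, step wording, and 30-minute total are placeholders, and the #step-N anchors assume matching IDs exist on your page headings.

```python
import json

PAGE_URL = "https://yoursite.com/guides/your-guide/"  # placeholder

# (name, text) pairs; keep each text to one or two sentences.
steps = [
    ("Pick the schema type",
     "Choose the most specific Schema.org type that matches the page."),
    ("Fill in the properties",
     "Populate every property with real data that also appears on the page."),
    ("Validate the markup",
     "Run the page through the Rich Results Test and the Schema.org Validator."),
]

how_to = {
    "@context": "https://schema.org",
    "@type": "HowTo",
    "name": "How to Add Schema Markup to a Page",
    "description": "A short walkthrough of adding and checking structured data.",
    "totalTime": "PT30M",  # ISO 8601 duration: 30 minutes
    "step": [
        {
            "@type": "HowToStep",
            "position": position,
            "name": name,
            "text": text,
            "url": f"{PAGE_URL}#step-{position}",
        }
        for position, (name, text) in enumerate(steps, start=1)
    ],
}

print('<script type="application/ld+json">')
print(json.dumps(how_to, indent=2))
print("</script>")
```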

What to customize: The totalTime field uses ISO 8601 duration format: PT30M means 30 minutes, PT2H means 2 hours, PT1H30M means 90 minutes. Be honest about the time. If your “5-minute guide” actually takes 45 minutes, your schema is lying and people will notice. The url field on each step is optional but valuable; it lets AI systems deep-link to specific sections of your guide. Use anchor IDs (#step-1, #step-2) on your page headings to make this work.
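If you would rather not hand-write durations, a tiny helper can produce them from a number of minutes. This Python sketch covers only hours and minutes, which is all a totalTime value usually needs; iso_duration is a hypothetical name of my own.

```python
def iso_duration(minutes: int) -> str:
    """Format a positive number of minutes as an ISO 8601 duration."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    hours, remainder = divmod(minutes, 60)
    result = "PT"
    if hours:
        result += f"{hours}H"
    if remainder:
        result += f"{remainder}M"
    return result


print(iso_duration(30))   # PT30M
print(iso_duration(120))  # PT2H
print(iso_duration(90))   # PT1H30M
```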

BreadcrumbList Schema

BreadcrumbList is the simplest schema on this page and takes about two minutes to implement manually. It tells search engines where a page sits in your site hierarchy. The payoff is small but real: breadcrumb-enhanced search results give users navigational context and can improve click-through rates.

If you’re using Rank Math Pro, this is likely already generated for you. Verify it in the Rich Results Test. If not, here’s the code.
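A Python sketch for generating it from a simple list of pages, so the position numbers are always sequential. The page names and URLs are placeholders for your own hierarchy.

```python
import json

# (name, url) pairs from the top of the site down to the current page.
trail = [
    ("Home", "https://yoursite.com/"),
    ("Guides", "https://yoursite.com/guides/"),
    ("Your Guide Title", "https://yoursite.com/guides/your-guide/"),
]

breadcrumbs = {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": position, "name": name, "item": url}
        for position, (name, url) in enumerate(trail, start=1)
    ],
}

print('<script type="application/ld+json">')
print(json.dumps(breadcrumbs, indent=2))
print("</script>")
```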

What to customize: Match your actual site hierarchy. The last item in the list is the current page. The position numbers must be sequential. If your site doesn’t have a logical hierarchy (everything is a top-level page), you probably don’t need BreadcrumbList. But if you have categories, guides, or any nested structure, this is worth the two minutes.

Putting It All Together: The Entity Chain

Here’s what makes the difference between “my site has schema” and “my site has a Content Knowledge Graph.” Look at how the @id fields connect across the examples above:

Organization (@id: yoursite.com/#organization) defines the business entity.

Person (@id: yoursite.com/#person-yourname) links to Organization via worksFor.

Article links to Person via author and to Organization via publisher.

LocalBusiness can link to Organization via parentOrganization for multi-location setups.

That’s the three-layer chain: content points to author, author points to organization. AI systems can follow these connections to verify that a real person at a real company wrote the content they’re considering citing. The Growth Marshal study found that fewer than 4% of pages implement this kind of deliberate entity-linking. It’s essentially uncontested territory.
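To make the chain concrete, here is a hedged sketch that builds the three linked entities in one @graph and checks that every @id reference points at an entity that is actually defined. The domain, names, headline, and the helper functions find_refs and unresolved_references are placeholders for illustration, not part of any plugin or validator.

```python
import json

SITE = "https://example.com"              # placeholder domain
ORG_ID = f"{SITE}/#organization"
PERSON_ID = f"{SITE}/#person-yourname"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": ORG_ID, "name": "Example Co", "url": SITE},
        {
            "@type": "Person",
            "@id": PERSON_ID,
            "name": "Your Name",
            "worksFor": {"@id": ORG_ID},       # author points to organization
        },
        {
            "@type": "Article",
            "@id": f"{SITE}/guides/schema-markup/#article",
            "headline": "Schema Markup for AI Search",
            "author": {"@id": PERSON_ID},      # content points to author
            "publisher": {"@id": ORG_ID},      # content points to organization
        },
    ],
}


def find_refs(value):
    """Yield every {"@id": ...} reference nested anywhere inside a node."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for child in value.values():
                yield from find_refs(child)
    elif isinstance(value, list):
        for child in value:
            yield from find_refs(child)


def unresolved_references(doc):
    """Return the @id references that no node in the @graph defines."""
    nodes = doc["@graph"]
    defined = {node["@id"] for node in nodes if "@id" in node}
    return sorted({ref for node in nodes for ref in find_refs(node) if ref not in defined})


print(json.dumps(graph, indent=2))
print("Unresolved references:", unresolved_references(graph))
```

A typo in a single @id silently breaks the chain, which is why a check like this is worth running whenever the markup is assembled from separate templates.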

You don’t need to add all of these on day one. Start with Organization (on every page, via your site header or footer). Add Person to your About page and author bios. Add Article to every content page. Then layer in FAQPage, HowTo, and Product as the content warrants. Each piece you add strengthens the whole structure.

Then validate everything: every page, both tools (Google’s Rich Results Test and the Schema.org Validator), every time. Skipping that step is the first mistake covered in the next section.

Four Schema Mistakes Costing You Visibility

Most schema mistakes fall into the same categories. Fix these four and you’re ahead of the majority of sites I’ve audited.

Mistake #1: Implementing Schema and Never Checking It

You installed Rank Math or Yoast, toggled on the schema settings, and assumed everything was working. Maybe it was, initially. Then you changed your site theme, updated the plugin, or restructured a page, and the schema broke silently. Nobody noticed because nobody was checking.

Schema errors don’t send you a notification. They don’t crash your site. They just quietly stop working, which means your rich results disappear and your structured data stops feeding the Knowledge Graph. Google Search Console will flag some errors in its Rich Results reports, but only for the schema types Google supports. Other errors go undetected unless you’re actively validating.

Set a quarterly reminder to run your key pages through both validation tools. If you have more than 50 pages with schema, consider a crawl tool like Screaming Frog that can audit structured data at scale.
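Between quarterly audits, a small script can catch the most basic silent breakage: JSON-LD that has disappeared or no longer parses. The sketch below uses only the Python standard library and checks syntax only; it does not replace the Rich Results Test or the Schema.org Validator, and the class and function names are illustrative assumptions.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def audit_html(html):
    """Return a list of problems found in a page's JSON-LD blocks (empty list = none found)."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ["no JSON-LD found"]
    problems = []
    for index, raw in enumerate(parser.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or ("@type" not in item and "@graph" not in item):
                problems.append(f"block {index}: missing @type")
    return problems


# Inline sample pages; in practice you would pass in the HTML of your own key pages.
good_page = '<head><script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}</script></head>'
broken_page = '<head><script type="application/ld+json">{"@type": "Organization",}</script></head>'
print(audit_html(good_page))
print(audit_html(broken_page))
```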

Mistake #2: Schema That Doesn’t Match Your Visible Content

If your page shows a 4.5-star rating but your schema says 4.8, you have a mismatch. If your schema claims you have 200 reviews but your page shows 47, that’s a problem. If your Article schema lists an author who isn’t credited anywhere on the visible page, AI systems notice.

Google explicitly warns about this. Mismatched schema and visible content can result in manual actions (penalties) and will absolutely undermine your credibility with both search engines and AI systems. AI platforms cross-reference claims against visible content. If your schema says one thing and your page says another, the charitable interpretation is sloppy implementation. The uncharitable interpretation is attempted manipulation.

The fix is simple: your schema should describe exactly what’s on the page. Not an aspirational version of it. Not what you plan to update it to. What’s actually there right now.
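One way to catch this kind of drift is to compare the claims in your schema against the text a visitor actually sees. The following is a rough sketch under stated assumptions: it only checks rating, review count, and author name, it expects you to supply the visible text yourself, and the function names and sample data are hypothetical.

```python
import re


def appears_on_page(value, visible_text):
    """True if value occurs as a standalone token, so '4.5' does not match '14.5' or '4.55'."""
    pattern = r"(?<![\w.])" + re.escape(value) + r"(?!\w|\.\d)"
    return re.search(pattern, visible_text) is not None


def find_mismatches(schema, visible_text):
    """Return schema claims (rating, review count, author) that are not visible on the page."""
    claims = {}
    rating = schema.get("aggregateRating") or {}
    for field in ("ratingValue", "reviewCount"):
        if field in rating:
            claims[field] = str(rating[field])
    author = schema.get("author")
    if isinstance(author, dict) and "name" in author:
        claims["author"] = author["name"]
    return [
        f"{field}: schema says {value!r}, not found on page"
        for field, value in claims.items()
        if not appears_on_page(value, visible_text)
    ]


# Sample data mirroring the mismatch described above (page shows 4.5 and 47; schema claims 4.8 and 200).
schema = {
    "@type": "Product",
    "name": "Example Widget",
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.8, "reviewCount": 200},
}
visible_text = "Example Widget. Rated 4.5 out of 5 based on 47 reviews."

for problem in find_mismatches(schema, visible_text):
    print(problem)
```

A text match like this is deliberately crude. It flags candidates for a human to review rather than proving a mismatch.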

Mistake #3: Using Schema as a Substitute for Good Content

No amount of schema markup will make thin, unhelpful content visible to AI systems. The Growth Marshal study left the majority of citation variance unexplained, and identified content quality as the most likely dominant factor. Answer-first heading structure, entity clarity in running text, factual density, and modular extractability are all stronger citation predictors than any schema implementation.

I’ve seen sites spend weeks perfecting their schema while the underlying content is generic, unoriginal, and fails to answer the questions their audience is actually asking. That’s optimizing the label on a box that’s empty.

Get the content right first. Structure it clearly. Answer questions directly. Cite your sources. Then add schema to make the structure machine-readable. The order matters.

While you’re thinking about giving AI systems something to work with, don’t overlook image alt text. An AI parser reading your page sees alt="chart" and gets nothing. It sees alt="Chart showing 40% increase in organic traffic after schema implementation" and gets context it can actually use. Not schema, but the same principle: describe what’s there. Don’t make the machine guess.
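A quick way to find the worst offenders is to scan a page for images whose alt text is missing or generic. This sketch uses Python’s standard library; the list of generic words is an assumption you should adjust, and an explicitly empty alt attribute is skipped because that is the accepted way to mark purely decorative images.

```python
from html.parser import HTMLParser

# Assumed list of alt values that carry no information; extend it for your own site.
GENERIC_ALT = {"chart", "image", "photo", "picture", "graphic", "img", "screenshot"}


class AltTextAuditor(HTMLParser):
    """Collect the src of every <img> whose alt text is missing or generic."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or "(no src)"
        if "alt" not in attributes:
            self.flagged.append(src)          # no alt attribute at all
            return
        alt = (attributes["alt"] or "").strip().lower()
        if alt in GENERIC_ALT:
            self.flagged.append(src)          # alt present but says nothing useful


def audit_alt_text(html):
    """Return the image sources that need better alt text."""
    auditor = AltTextAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.flagged


sample = (
    '<img src="a.png" alt="chart">'
    '<img src="b.png" alt="Chart showing 40% increase in organic traffic after schema implementation">'
    '<img src="c.png">'
    '<img src="divider.png" alt="">'
)
print(audit_alt_text(sample))
```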

Mistake #4: Treating Schema as a One-Time Project

Schema needs maintenance. Google deprecates types (they removed seven in June 2025, more in January 2026). Schema.org adds new vocabulary. Your business information changes. New pages get published without schema because nobody included it in the content workflow.

Build schema into your publishing process. When a new page goes live, schema implementation and validation should be part of the checklist, not an afterthought you might get to eventually. When your business information changes (new address, new services, updated hours), update the schema. When Google deprecates a type, audit your implementation.

The sites that win at structured data are the ones that treat it as ongoing maintenance, not a project with a completion date.

What’s Coming Next for Schema and AI

The structured data landscape is shifting in ways that go beyond traditional schema markup. Here’s what’s worth watching and what’s still too early to act on.

NLWeb: Microsoft’s Bet on Conversational Schema

Microsoft’s NLWeb initiative, led by RV Guha (the creator of Schema.org itself), is an open project built on Schema.org vocabulary that enables conversational AI interfaces to query website content in natural language. Early adopters are already using it for on-site search.

The implication: structured data isn’t just about how search engines display your content anymore. It’s about how AI agents interact with it. If NLWeb gains traction, your Schema.org markup becomes the API that AI agents use to have conversations with your website.

Status: early stage, worth watching, too early to implement unless you’re building tools for developers. But it signals where the larger ecosystem is heading.

Content Knowledge Graphs: The Next Frontier

Schema App’s CEO Martha van Berkel has been pushing a concept that’s gaining traction: Content Knowledge Graphs. Instead of treating schema as isolated markup on individual pages, you build a connected data layer across your entire site that maps entities, relationships, and topical authority.

Think of it this way: basic schema says “this page is an Article about schema markup.” A Content Knowledge Graph says “this Article about schema markup is written by Tim Dini, who is a Person associated with the Organization SearchLab Digital, which provides Services in the industries of automobile dealerships, legal, medical, and home services, and this Article is part of a collection of Guides about AI search optimization.” Everything is connected. Everything is contextualized.

The Growth Marshal study found that fewer than 4% of pages implemented anything resembling deliberate entity-linking. Firms that deploy Wikidata-linked sameAs identifiers, genuine @id cross-referencing across schema blocks, and nested entity structures are operating in essentially uncontested territory. If you want a competitive edge that almost nobody is exploiting, this is it.

llms.txt: Worth Mentioning, Not Worth Prioritizing (Yet)

The llms.txt specification is a proposed file (like robots.txt) that tells AI systems about your site’s most important content. It’s elegant in concept. The problem: as of mid-2025, an analysis of 1,000 domains showed zero visits from major LLM crawlers (GPTBot, ClaudeBot, PerplexityBot) to llms.txt files. Only about 951 domains had published them.

I’ve added one to this site because the effort is minimal and the downside is zero. But if you’re prioritizing your time, schema markup on your key pages will deliver measurably more value than an llms.txt file that AI crawlers aren’t reading yet.

Resources and Tools

Validation and Testing

Google Rich Results Test – The definitive tool for testing whether Google can read your structured data and generate rich results from it.

Schema.org Validator – Tests your markup against Schema.org standards regardless of which search engine will process it. Catches errors Google’s tool might miss.

Google Search Console – Rich Results reports show structured data errors and validation issues across your entire site.

Schema Generation

Rank Math Pro – WordPress SEO plugin with built-in schema generation for Article, Organization, Person, FAQPage, HowTo, and more. What this site uses.

Google Structured Data Markup Helper – Free tool from Google that generates JSON-LD from your page content.

Schema App – Enterprise-level schema management. Overkill for small sites, but worth knowing about for its Content Knowledge Graph approach.

Reference Documentation

Schema.org – The official vocabulary reference. Over 800 types with full property documentation.

Google Search Central: Structured Data – Google’s documentation on which schema types they support and how to implement them.

Google Search Central: Succeeding in AI Search – Google’s (limited) guidance on AI search optimization.

AI Citation Monitoring (Evolving)

Semrush AI Visibility Toolkit – Tracks visibility across AI platforms. Available as a separate add-on ($99/month as of early 2026).

Search Atlas LLM Visibility – Monitors brand presence across ChatGPT, Claude, Gemini, and Perplexity. Included on all plans.

Otterly.ai – Tracks AI search responses and brand mentions across platforms.

This category is evolving fast. I’ll update this section as I test these tools and have honest assessments to share.

Frequently Asked Questions About Schema Markup and AI Search

Does schema markup directly improve my rankings?

No. Google confirmed in 2025 that structured data is not a direct ranking factor. But schema unlocks rich results (which get more clicks), feeds the Knowledge Graph (which influences how Google understands your business), and helps AI systems parse your content accurately. The value is real; it’s just indirect.

Will schema markup get my site cited more by ChatGPT and other AI tools?

The honest answer: generic schema (Article, Organization, BreadcrumbList) shows no measurable citation advantage in current studies. Attribute-rich schema (Product, Review with populated data) shows a significant advantage, especially for smaller domains. The biggest citation predictor is your organic search ranking, not your schema implementation.

Which schema type should I implement first?

Organization schema, because it establishes your business entity in Google’s Knowledge Graph. If you have a physical location, add LocalBusiness. If you publish content, add Article and Person. If you sell products or services with concrete attributes, add Product with detailed specifications. Start with what establishes who you are, then expand to what you offer and create.

Do I need a developer to implement schema markup?

For most implementations, no. WordPress plugins like Rank Math Pro handle the common schema types through settings panels. You need a developer (or developer-level comfort) for advanced implementations: custom entity-graph connections, nested schema relationships, or schema types your plugin doesn’t support. Start with the plugin; go custom when you hit its limits.

Google deprecated some schema types. Should I be worried?

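No. Google pruned low-usage types like Practice Problem, Dataset (for general search), and SpecialAnnouncement. The core types that drive business value (Product, Article, Organization, Person, Review, LocalBusiness) remain fully supported. FAQPage and HowTo remain valid Schema.org vocabulary, but Google restricted FAQ rich results to a narrow set of sites and retired HowTo rich results in 2023, so treat them as machine-readable structure rather than a source of rich results. John Mueller specifically clarified that the deprecations are visual and functional refinements, not algorithmic penalties.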

How do I know if my schema is working?

Run your pages through Google’s Rich Results Test and the Schema.org Validator. Check Google Search Console’s Rich Results reports for errors. If Google awards rich results for your schema type, it’s working. If you see errors or warnings, fix them. Make this a quarterly habit, not a one-time check.

Is the ‘GPT-4 improves from 16% to 54% with structured data’ stat real?

I tried to trace this claim to its original source. It’s attributed to a “Data World study” and gets cited across dozens of schema guides. I could not find the primary research. The stat may be real, but I can’t verify it, so I won’t cite it as fact. That’s the kind of thing this guide exists to call out. When you see a stat repeated everywhere without a link to the original study, be skeptical.

Key Research Sources Referenced in This Guide

Primary Empirical Research

Growth Marshal: Schema for AI Citation (February 2026) – 730 AI citations across ChatGPT and Gemini, 1,006 pages, 75 commercial queries. Found generic schema provides no citation advantage; attribute-rich schema outperforms by 20 percentage points.

Search Atlas: Limits of Schema Markup for AI Search (December 2025) – Analysis of millions of LLM responses across OpenAI, Gemini, and Perplexity. Schema coverage showed no correlation with LLM citation frequency.

SearchVIU: Schema Markup and AI in 2025 (December 2025) – Controlled test with fictional product data placed exclusively in schema markup. No AI system extracted schema-only data during direct fetch.

AccuraCast: Schema Markup Impact on AI Search (December 2025) – 2,000+ prompts and 9,000 citations across ChatGPT, Google AI Overviews, and Perplexity. 81% of cited pages included schema; Person schema showed 70.4% presence on ChatGPT-cited sources.

Search Engine Land: Schema and AI Overviews Experiment (September 2025) – Head-to-head test of AI Overview visibility for a page with structured schema versus one without.

Official Platform Confirmations

Search Engine Roundtable: Microsoft Bing/Copilot Use Schema (March 2025) – Fabrice Canel, Principal Product Manager at Microsoft Bing, confirmed at SMX Munich that schema markup helps Microsoft’s LLMs understand content.

Schema App: What 2025 Revealed About AI Search (January 2026) – Retrospective documenting Google, Microsoft, and ChatGPT confirmations of structured data use in generative AI features throughout 2025.

SE Roundtable: ChatGPT/Perplexity Treat Schema as Text (February 2026) – Test showing that, during a direct page fetch, AI systems read JSON-LD as plain text rather than parsing it as structured data.

Google Documentation and Deprecation Updates

Google Search Central: Structured Data Introduction – Official documentation on structured data implementation and supported types.

Google Search Central: Documentation Updates (Ongoing) – Changelog including January 2026 schema deprecations (Practice Problem, Dataset, Sitelinks Search Box).

Search Engine Journal: Google Deprecates Practice Problem (November 2025) – Coverage of November 2025 deprecation announcements with John Mueller’s clarification that changes are refinements, not penalties.

Foundational References

Schema.org – The official standardized vocabulary for structured data, maintained by Google, Microsoft, Yahoo, and Yandex.

Schema App: Semantic Value of Schema Markup in 2025 – Analysis of schema’s role in Knowledge Graph feeding and AI content understanding.

The Bottom Line

Schema markup is not the AI search silver bullet the industry wants it to be. Generic schema doesn’t move the needle for AI citations. The research is clear on that.

But schema is still genuinely valuable. It powers rich results that get more clicks. It feeds Knowledge Graphs that help machines understand your business. It provides the machine-readable structure that Google and Microsoft have both confirmed their AI systems use. And attribute-rich schema, the kind with real data in the fields, shows measurable advantages for smaller domains competing against established players.

The practical advice is straightforward: implement the Tier 1 schema types (Organization, LocalBusiness, Person, Product) with fully populated attributes. Validate everything. Maintain it over time. Don’t expect schema to compensate for thin content or poor rankings. And don’t believe inflated statistics that can’t be traced to their original research.

Schema is infrastructure. Treat it that way. Build it right, maintain it properly, and focus most of your energy on the thing that drives the outcomes schema supports: great content.

Use the Keep Learning: Related Guides stack below to go much deeper into AEO, GEO, E-E-A-T, and YMYL.

If you want a monthly update on what’s working, join The Punch List monthly email newsletter. One email a month, no spam, genuinely useful.

And if you’ve got a question this guide didn’t answer, reach out. I read everything.

Keep Learning: Related Guides

The Complete AEO Guide – The anchor guide for everything AI search optimization

The Complete GEO Guide – Practical strategies for getting cited by AI systems

E-E-A-T for AI Search – Building authority that AI systems actually recognize

YMYL Guide – Why AI holds your industry to a higher standard