AI bots are crawling the web at an unprecedented scale, and your Shopify store is almost certainly being visited by them right now. Bots from OpenAI (GPTBot, ChatGPT-User), Anthropic (ClaudeBot), Perplexity, Google, ByteDance, and others regularly scan product pages, blog posts, and collection pages to train large language models or power AI-driven search engines. Whether that access is a good or bad thing for your business depends entirely on your strategy. As of 2025, an estimated 40-50% of all web traffic comes from bots, and AI-specific crawlers account for a rapidly growing share of that total.
This free tool fetches your site's robots.txt file and checks it against all major AI crawlers. You will see exactly which bots are allowed, which are blocked, and which are not mentioned at all. If a bot is not explicitly disallowed, it has full access by default. Many Shopify store owners are surprised to discover they have never configured any AI bot rules, leaving their product descriptions, image metadata, and pricing information available for training data. A recent analysis of 10,000 ecommerce sites found that fewer than 15% had any AI-specific rules in their robots.txt files.
Understanding your AI bot exposure is especially important if you sell products with unique descriptions, custom photography, or niche expertise. Your carefully written product copy could end up training a model that then helps your competitors generate similar content. On the other hand, blocking AI search bots like PerplexityBot or OAI-SearchBot could reduce your visibility in the rapidly growing AI search channel. The right configuration depends on your goals, and this tool helps you see where you stand today.
The AI search landscape is evolving faster than traditional search ever did. Perplexity AI processes millions of search queries daily, ChatGPT's browsing feature is used by over 100 million weekly active users, and Google's AI Overviews now appear in a significant portion of search results. Each of these platforms has its own crawler, and each respects (or does not respect) robots.txt differently. Having a clear picture of which bots can access your store is no longer optional. It is a strategic business decision that affects both content protection and revenue generation.
This tool is designed for any website, not just Shopify stores. Whether you run a WordPress blog, a BigCommerce store, a custom-built website, or any other platform, the AI bot checker will fetch and parse your robots.txt file and give you a clear report of your AI crawler exposure. The tool checks nine major AI bots that represent the most significant crawlers active on the web today, covering training bots, search bots, and hybrid crawlers from OpenAI, Anthropic, Google, ByteDance, and others.
AI Bot Checker Quick Facts
| Metric | Detail |
|---|---|
| AI Bots Checked | 9 major crawlers from 6 organizations |
| Protocol Used | Robots Exclusion Protocol (robots.txt) |
| Default Bot Access | Allowed unless explicitly blocked |
| Wildcard Detection | Checks User-agent: * rules for catch-all blocks |
| Shopify robots.txt Location | robots.txt.liquid in theme Templates folder |
| Customization Available Since | Shopify enabled robots.txt.liquid editing in 2021 |
| Ecommerce Sites With AI Rules | Fewer than 15% (estimated, 2025) |
| Compatible With | Any publicly accessible website with robots.txt |
How This Tool Works
When you enter a URL, this tool fetches the robots.txt file from the root of that domain. The robots.txt file is a plain text document that website owners use to communicate crawling rules to bots. It follows the Robots Exclusion Protocol, a standard that has been in use since the early days of the web (originally proposed by Martijn Koster in 1994). Each rule specifies a User-agent (the bot name) and a directive (Allow or Disallow) for specific URL paths.
The tool then parses every User-agent block in the file and cross-references them against nine major AI bots. It checks for both specific bot rules (like User-agent: GPTBot) and wildcard rules (like User-agent: * with Disallow: /). A wildcard disallow blocks all bots that are not explicitly allowed elsewhere in the file. The results are displayed in a clear table showing each bot's name, its purpose (training vs. search), and its current access status on your site.
Keep in mind that robots.txt is a voluntary standard. Well-behaved bots from major companies like OpenAI, Anthropic, and Google respect these rules, but not all crawlers do. The tool gives you visibility into what your declared policy is, which is the first step toward making intentional decisions about AI access to your store content.
The fetching process works through a server-side proxy to avoid CORS restrictions that would prevent direct browser-to-site requests. Your URL is sent securely, the robots.txt content is retrieved, and all parsing happens in your browser. No data is stored or logged. The entire analysis takes just a few seconds.
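The parsing approach described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the tool's actual code: the function names, the three-way status labels, and the simplified group handling are all assumptions made for clarity.

```python
# Illustrative sketch of robots.txt parsing against a list of AI bots.
# Names and status labels are hypothetical, not the tool's actual implementation.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot",
    "ClaudeBot", "anthropic-ai", "Google-Extended", "Bytespider", "cohere-ai",
]

def parse_robots(text):
    """Return a dict mapping lowercase user-agent -> list of Disallow paths."""
    groups = {}
    current = []            # user-agents the next rules apply to
    expecting_agents = True  # consecutive User-agent lines share one rule group
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if expecting_agents:
                current.append(value.lower())
            else:
                current = [value.lower()]
                expecting_agents = True
            for agent in current:
                groups.setdefault(agent, [])
        elif field in ("disallow", "allow"):
            expecting_agents = False
            if field == "disallow" and value:  # empty Disallow means allow all
                for agent in current:
                    groups.setdefault(agent, []).append(value)
    return groups

def bot_status(groups, bot):
    """Classify a bot as blocked, partially blocked, allowed, or not mentioned."""
    agent = bot.lower()
    if agent in groups:
        paths = groups[agent]
        if "/" in paths:
            return "blocked"
        if paths:
            return "partially blocked"
        return "allowed"
    if "/" in groups.get("*", []):
        return "blocked by wildcard"
    return "not mentioned (allowed)"

sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""
for bot in AI_BOTS:
    print(f"{bot}: {bot_status(parse_robots(sample), bot)}")
```

The key detail the sketch captures is the three-way distinction the tool reports: an explicit Disallow, a wildcard catch-all, and the default-allow case where a bot is simply not mentioned.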
Step-by-Step Guide to Checking Your AI Bot Exposure
- Enter your website URL. Type your store's domain in the input field. You can enter just the domain (example.com) or the full URL (https://example.com). The tool will automatically add the https:// prefix if needed and append /robots.txt to fetch the file.
- Click "Check robots.txt." The tool fetches your site's robots.txt file and parses it for AI-specific rules. This typically takes 1-3 seconds depending on your server's response time.
- Review the summary. The top-level result shows how many of the 9 AI bots can currently access your site. A high number (7-9 allowed) means you have minimal AI bot blocking. A low number (0-2 allowed) means you have aggressive blocking in place.
- Examine the detailed table. Each row shows a specific AI bot, its purpose (training, search, or browsing), and its current access status. "Blocked" means an explicit Disallow rule exists. "Not mentioned" means no specific rule exists, so the bot is allowed by default.
- Decide on your strategy. Based on the results, determine whether you want to block training bots, allow search bots, or make other changes. See the strategy section below for guidance.
- Implement changes if needed. Edit your Shopify theme's robots.txt.liquid file to add or modify AI bot rules. Re-run this tool after making changes to verify they are applied correctly.
Real-World Examples
Different stores have different AI bot strategies. Here are three common configurations and the reasoning behind each.
Example 1: Open Access (Maximize AI Search Visibility)
A fashion brand selling trendy accessories decides to allow all AI bots, including training bots. Their reasoning: they want maximum exposure in AI search results, they update their product descriptions frequently so training data becomes stale quickly, and they believe the traffic benefits of AI search visibility outweigh the content protection risks. Their robots.txt has no AI-specific rules.
| Bot | Status | Reasoning |
|---|---|---|
| GPTBot | Allowed | Training data exposure accepted for search visibility |
| PerplexityBot | Allowed | Growing traffic source from Perplexity search |
| OAI-SearchBot | Allowed | SearchGPT becoming a product discovery channel |
| Google-Extended | Allowed | AI Overviews visibility prioritized |
Example 2: Selective Blocking (Protect Content, Allow Search)
A specialty coffee roaster with detailed product descriptions and brewing guides blocks training bots but allows search bots. Their unique content is a competitive advantage, and they do not want it used to train models that could help competitors generate similar copy. But they want to appear in AI search results when customers ask questions like "best single-origin coffee beans."
| Bot | Status | Reasoning |
|---|---|---|
| GPTBot | Blocked | Prevents training data extraction |
| Google-Extended | Blocked | Prevents Google AI training use |
| Bytespider | Blocked | Prevents ByteDance/TikTok AI training |
| PerplexityBot | Allowed | Perplexity sends measurable referral traffic |
| OAI-SearchBot | Allowed | SearchGPT is a growing discovery channel |
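A robots.txt implementing this selective strategy might look like the following sketch. The bot list mirrors Example 2; adapt it to your own strategy.

```text
# Block AI training bots
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# AI search bots (PerplexityBot, OAI-SearchBot) are not mentioned,
# so they remain allowed by default
```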
Example 3: Full Block (Maximum Content Protection)
A luxury jewelry brand with proprietary product descriptions, professional photography metadata, and extensive gemstone education content blocks all AI bots. Their content represents years of expert writing and significant photography investment. They rely primarily on direct traffic, email marketing, and traditional Google search for customer acquisition. AI search is not yet a meaningful traffic source for their niche.
| Bot | Status | Reasoning |
|---|---|---|
| All 9 AI bots | Blocked | Content protection is the top priority |
| Googlebot (standard) | Allowed | Traditional Google search remains primary traffic source |
AI Bot Types: Training vs. Search vs. Browsing
Understanding the different purposes of AI bots helps you make an informed blocking decision. Not all AI crawlers serve the same function.
| Bot Name | Organization | Primary Purpose | Blocking Impact |
|---|---|---|---|
| GPTBot | OpenAI | Training data collection | Prevents content from training future GPT models |
| ChatGPT-User | OpenAI | Real-time browsing for ChatGPT users | Prevents ChatGPT from reading your pages during conversations |
| OAI-SearchBot | OpenAI | SearchGPT search index | Removes your store from SearchGPT results |
| PerplexityBot | Perplexity AI | AI-powered search engine index | Removes your store from Perplexity search results |
| ClaudeBot | Anthropic | Training data collection | Prevents content from training Claude models |
| anthropic-ai | Anthropic | Training data collection | Prevents broader Anthropic training crawls |
| Google-Extended | Google | AI training (Gemini, AI Overviews) | Removes content from Google AI training; may reduce visibility in AI Overviews |
| Bytespider | ByteDance | AI training for TikTok/ByteDance models | Prevents ByteDance from using your content for training |
| cohere-ai | Cohere | Enterprise AI training | Prevents Cohere from using your content for training |
The key distinction is between training bots (which ingest your content to improve AI models) and search bots (which index your content to show it in AI search results). Blocking training bots protects your intellectual property. Blocking search bots reduces your discoverability. Many merchants choose to block training bots while allowing search bots, getting the best of both worlds.
Why This Matters for Your Shopify Store
For Shopify merchants, AI crawling has direct implications for both content protection and discoverability. If you have invested in professional product photography, detailed descriptions, or expert buying guides, that content has real value. AI training bots like GPTBot and Google-Extended can ingest this content and use it to generate responses that compete with your original work. At the same time, AI search platforms like Perplexity and SearchGPT are becoming significant traffic sources. Blocking their crawlers means your products will not appear in AI-generated shopping recommendations or product comparisons.
This is particularly relevant for stores using Rubik Combined Listings to organize large catalogs. When you have separate product pages for each color or style, each page carries unique content that could be indexed by AI search engines. A store with 50 combined product pages has 50 opportunities to appear in AI search results. Understanding which bots can access those pages helps you make informed decisions about your crawling policy, whether you want maximum AI search visibility or maximum content protection.
The financial stakes are growing. Early data from Shopify stores that appear in AI search results shows referral traffic from Perplexity and ChatGPT growing 30-50% quarter-over-quarter. While this traffic is still small compared to traditional Google search for most stores, it is the fastest-growing referral channel in ecommerce. Stores that block AI search bots today may find it harder to reclaim visibility later, as AI search algorithms develop their own ranking factors based on historical crawl data.
Common Mistakes to Avoid
- Blocking all bots with a wildcard without understanding the impact. Adding User-agent: * with Disallow: / blocks every bot, including Google's standard search crawler. This would remove your store from Google search entirely. Always use specific User-agent directives for AI bots rather than broad wildcards, unless you also add explicit Allow rules for bots you want to keep.
- Confusing Google-Extended with Googlebot. Google-Extended is specifically for AI training and AI Overviews. Googlebot is the standard search crawler. Blocking Google-Extended does not affect your regular Google search rankings. Many merchants avoid blocking Google-Extended out of fear it will hurt their SEO, but the two crawlers are completely separate.
- Forgetting to check after theme changes. Shopify theme updates, app installations, and migrations can overwrite your robots.txt.liquid customizations. Always re-check your robots.txt after any significant change to your theme or app stack.
- Assuming robots.txt provides absolute protection. Robots.txt is a voluntary standard. While major companies like OpenAI, Google, and Anthropic respect it, smaller or less reputable crawlers may ignore it entirely. For stronger protection, you need server-side blocking through a CDN like Cloudflare or a firewall that blocks specific User-agent strings at the network level.
- Not having any AI bot strategy at all. The worst approach is having no strategy. If you have never thought about AI bot access, you are making a decision by default: full access to all crawlers. Run this tool, understand your current exposure, and make an intentional choice based on your business goals.
- Blocking AI search bots without checking referral traffic first. Before blocking PerplexityBot or OAI-SearchBot, check your analytics for referral traffic from perplexity.ai, chatgpt.com, and searchgpt.com. If these sources already send you customers, blocking their crawlers will cut off that revenue stream immediately.
- Setting and forgetting your robots.txt. The AI landscape changes rapidly. New bots appear, existing bots change their behavior, and new AI search platforms launch regularly. Review your robots.txt configuration at least quarterly to ensure it reflects your current strategy and covers the latest crawlers.
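On the wildcard mistake above: if a catch-all block is unavoidable, the Robots Exclusion Protocol applies the most specific matching group to each bot, so you can re-allow essential crawlers by giving them their own groups with an empty Disallow. A sketch:

```text
# Catch-all block for any bot without its own group
User-agent: *
Disallow: /

# A bot follows its most specific matching group, so these
# empty-Disallow groups override the wildcard for search crawlers
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:
```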
Tips and Best Practices
- Separate training bots from search bots. Consider blocking training-only bots (GPTBot, Google-Extended, anthropic-ai, Bytespider) while keeping search bots (OAI-SearchBot, PerplexityBot, ChatGPT-User) allowed. This protects your content from being used as training data while maintaining visibility in AI search results.
- Review your robots.txt after every theme or app change. Some Shopify apps and themes modify robots.txt through the theme's robots.txt.liquid file. Always re-check your configuration after installing new apps or switching themes to make sure your AI bot rules are still in place.
- Use Shopify's robots.txt.liquid for customization. Since 2021, Shopify allows merchants to customize robots.txt through the robots.txt.liquid template in their theme. This is the proper way to add AI bot rules rather than relying on third-party apps or workarounds.
- Monitor AI search traffic in your analytics. Before deciding to block AI search bots, check your referral traffic from sources like perplexity.ai, chatgpt.com, and searchgpt.com. If these channels already send you customers, blocking their crawlers would cut off that traffic.
- Re-check quarterly as the landscape evolves. New AI bots appear regularly. A robots.txt file written in 2024 may not cover bots launched in 2025. Run this tool every few months to make sure your policy covers the latest crawlers and aligns with your current business goals.
- Consider partial blocking for specific paths. Instead of blocking an AI bot entirely, you can block access to specific directories. For example, blocking GPTBot from /blogs/ protects your written content while allowing it to crawl /products/ for potential AI search inclusion. This granular approach gives you more control.
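The partial-blocking tip above translates into a short robots.txt fragment. This sketch uses the /blogs/ path from the example; substitute whichever directories hold your most valuable content.

```text
# GPTBot may crawl product pages but not blog content
User-agent: GPTBot
Disallow: /blogs/
```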
When to Use This Tool
| Scenario | Why This Tool Helps | Priority |
|---|---|---|
| You have never checked your AI bot exposure | Establishes a baseline understanding of your current configuration | High |
| You recently changed your Shopify theme | Verifies that custom robots.txt rules survived the theme change | High |
| You installed a new SEO or marketing app | Checks whether the app modified your robots.txt | Medium |
| You want to start blocking AI training bots | Shows which bots currently have access before you make changes | High |
| You heard about a new AI crawler | Verifies whether the new bot is covered by your existing rules | Medium |
| Quarterly robots.txt audit | Routine check to ensure your configuration still matches your strategy | Medium |
| Competitor analysis | Check what AI bot strategy your competitors are using | Low |
Related Free Tools
- Robots.txt Generator - Generate a custom robots.txt configuration for your Shopify store, including AI bot blocking rules. Use the checker to audit your current state, then use the generator to create an improved configuration.
- AI Readiness Checker - Evaluate how well your Shopify store is prepared for the AI-driven search era. Checks structured data, content quality, and technical factors that affect AI search visibility.
- SEO Checker - Run a comprehensive SEO audit on any Shopify page. Checks meta tags, headings, images, schema markup, and other factors that affect both traditional and AI search performance.
What AI bots does this tool check for?
It checks for GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI), PerplexityBot, ClaudeBot, anthropic-ai (Anthropic), Google-Extended, Bytespider (ByteDance), and cohere-ai. These are the most common AI crawlers active on the web today, representing the major companies in the AI space.
What does "Not mentioned" mean?
If a bot is not mentioned in your robots.txt, it is allowed to crawl your site by default. The Robots Exclusion Protocol operates on an opt-out basis. Only explicit Disallow rules can block a bot. Most Shopify stores have no AI-specific rules in their robots.txt, meaning all nine bots have full access to every page on the store.
Should I block AI bots?
It depends on your goals. Blocking AI training bots prevents your content from being used to train models, which protects your unique product descriptions and imagery. However, blocking AI search bots may reduce your visibility in AI-powered search engines like Perplexity and SearchGPT, which are becoming meaningful traffic sources for ecommerce stores. The most common strategy is to block training bots while allowing search bots.
Does this work with any website?
Yes. Any publicly accessible website with a robots.txt file can be checked, not just Shopify stores. The tool works with WordPress sites, custom-built stores, BigCommerce, WooCommerce, Squarespace, Wix, and any other platform that serves a robots.txt file at the domain root.
What is the difference between GPTBot and ChatGPT-User?
GPTBot is used by OpenAI to crawl web content for training its models. ChatGPT-User is used when a ChatGPT user asks the AI to browse the web during a conversation. Blocking GPTBot stops training, while blocking ChatGPT-User prevents your site from appearing when users ask ChatGPT to look something up online. They serve fundamentally different purposes.
How do I edit robots.txt on Shopify?
In your Shopify admin, go to Online Store > Themes > Actions > Edit code, then find or create a file called robots.txt.liquid in the Templates folder. You can add User-agent and Disallow directives there. Shopify has documentation on the exact syntax and Liquid tags required. Changes take effect immediately after saving the file.
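As a sketch, a robots.txt.liquid that keeps Shopify's default rules and appends AI bot blocks might look like this. The Liquid loop is adapted from Shopify's documented default template (robots.default.groups); verify the object names against current Shopify documentation before using.

```liquid
{%- comment -%} Render Shopify's default robots.txt rules {%- endcomment -%}
{% for group in robots.default.groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules %}
    {{ rule }}
  {%- endfor %}
  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif %}
{% endfor %}

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```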
Will blocking AI bots affect my Google rankings?
No. Blocking AI-specific bots like GPTBot, ClaudeBot, or PerplexityBot has no effect on Google's standard search crawler (Googlebot). Google-Extended is specifically for AI training purposes and is separate from Googlebot. Your regular Google search rankings will not be affected by blocking AI crawlers. Google has explicitly confirmed this distinction.
Can AI bots ignore robots.txt?
Technically yes, but reputable companies (OpenAI, Anthropic, Google, Perplexity) have publicly committed to respecting robots.txt. Less well-known crawlers may not follow the rules. For stronger protection, you would need server-side blocking via your CDN or firewall, but robots.txt is the standard first line of defense and is respected by all major AI companies.
How often should I check my robots.txt?
Check at least every quarter, or whenever you make changes to your Shopify theme, install new apps, or hear about new AI crawlers entering the market. The AI landscape is evolving rapidly, and new bots appear frequently. Setting a quarterly calendar reminder ensures you do not fall behind on configuration updates.
Does blocking AI bots affect combined listings or product pages?
Blocking AI search bots means your individual product pages will not appear in AI-powered search results. If you use Combined Listings to create separate pages per color or style, each of those pages could potentially rank in AI search. Blocking crawlers removes that opportunity, so weigh the trade-off between content protection and AI search visibility carefully for your specific catalog size.
What about AI bots that are not on this list?
New AI crawlers appear regularly. This tool checks the nine most significant and widely deployed AI bots as of 2025. If you are concerned about lesser-known crawlers, you can add Disallow rules for their specific User-agents in your robots.txt as they appear. Re-running this tool periodically keeps you informed about the major crawlers, and we update the bot list as new significant crawlers emerge.
Can I allow an AI bot for some pages but block it for others?
Yes. Robots.txt supports path-specific rules. For example, you can block GPTBot from /blogs/ (protecting your written content) while allowing it to crawl /products/ (keeping product pages accessible for AI search). This granular approach lets you protect your most valuable original content while maintaining visibility where it benefits your business.
What happens if I block a bot that was already crawling my site?
The bot will stop crawling new pages and stop refreshing its index of your existing pages. However, content that was already crawled before you added the block may remain in the bot's training data or search index for some time. Robots.txt prevents future crawling, not retroactive data deletion. For data removal requests, you would need to contact each AI company directly.
Is there a difference between Disallow: / and Disallow: /products/?
Yes. Disallow: / blocks the bot from accessing any page on your entire site. Disallow: /products/ only blocks access to URLs under the /products/ path, leaving your homepage, collection pages, blog posts, and other pages accessible. Path-specific blocking gives you fine-grained control over what AI bots can and cannot access.
How do AI search bots differ from traditional search crawlers like Googlebot?
Traditional crawlers like Googlebot index your pages and show them as blue links in search results. AI search bots like PerplexityBot and OAI-SearchBot index your pages to generate AI-powered responses that may include your products in conversational search results, comparison tables, or shopping recommendations. The display format and user experience are different, but both types drive potential customers to your site through referral traffic.