AI companies are crawling your Shopify store right now. Some bots scrape your product descriptions and images to train large language models. Others crawl your pages so AI assistants can recommend your products to shoppers. These are two very different activities, and blocking them all with one setting is a mistake.
This guide breaks down which AI bots are visiting your store, what they do with your content, and how to make an informed decision about blocking or allowing each type.
In this post
- What are AI bots?
- Training bots vs search bots
- Common AI bots crawling Shopify stores
- Pros of blocking AI bots
- Cons of blocking AI bots
- How to check your current bot exposure
- Recommended approach
- How to block bots with robots.txt
- How this connects to your broader SEO strategy
- Frequently asked questions
- Related reading
What are AI bots?
AI bots are automated crawlers that visit websites to collect content. They work like traditional search engine crawlers (Googlebot, Bingbot) but serve different purposes. Instead of indexing pages for a search engine results page, AI bots either collect training data for language models or gather real-time information for AI-powered search and assistants.
These bots read your product titles, descriptions, prices, images, and blog content. They follow links, parse structured data, and can process thousands of pages per hour. Most identify themselves in the user-agent string, but not all of them do. Some are polite crawlers that respect robots.txt directives. Others ignore them entirely.
Training bots vs search bots
This is the most important distinction. Training bots and search bots have completely different goals, and treating them the same will cost you traffic.
Training bots collect content to train or fine-tune AI models. Your product descriptions become part of a dataset used to teach the model how to write, reason, and respond. Once your content is in the training data, it is used without attribution or compensation. Examples include GPTBot (OpenAI’s training crawler), CCBot (Common Crawl), and Google-Extended (the robots.txt token Google provides for opting out of AI training). These bots take your content and give nothing back.
Search bots collect content to display in AI-powered search results and assistant responses. When someone asks an AI assistant “what is the best waterproof backpack under $100,” the AI might recommend your product – but only if its search bot was allowed to read your product page. Examples include ChatGPT-User (OpenAI’s browsing feature), PerplexityBot, and Applebot-Extended (Apple Intelligence). These bots can send you traffic.
Blocking training bots protects your content. Blocking search bots removes you from a growing discovery channel. That is why a blanket block-all-AI-bots approach backfires.
Common AI bots crawling Shopify stores
Here are the bots most likely to be crawling your store:
| Bot name | Company | Purpose | Recommended action |
|---|---|---|---|
| GPTBot | OpenAI | Training data | Block |
| ChatGPT-User | OpenAI | Real-time search/browsing | Allow |
| Google-Extended | Google | AI training (Gemini) | Block |
| Googlebot | Google | Search indexing | Allow (always) |
| PerplexityBot | Perplexity | AI search results | Allow |
| Applebot-Extended | Apple | Apple Intelligence | Allow |
| CCBot | Common Crawl | Training data | Block |
| ClaudeBot | Anthropic | Training data | Block |
| Bytespider | ByteDance | Training data | Block |
| cohere-ai | Cohere | Training data | Block |
This table gives you a starting point. The bot landscape changes fast. New crawlers appear regularly, and existing ones change their behavior. Use the AI Bot Checker tool to see which bots your store currently allows or blocks.
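If you can see request user-agent strings (for example through a proxy or an analytics export), the table can be turned into a simple lookup. The following is a minimal Python sketch, not an authoritative bot registry. The category names and the `classify_user_agent` helper are assumptions made for illustration, and the sample user-agent string is invented. Google-Extended and Applebot-Extended are left out because their vendors document them as robots.txt control tokens that do not appear in request headers.

```python
# Substrings to look for in a request's User-Agent header, grouped by the
# categories used in this guide. Illustrative only; the list will go stale.
BOT_CATEGORIES = {
    "training": ["GPTBot", "CCBot", "ClaudeBot", "Bytespider", "cohere-ai"],
    "ai_search": ["ChatGPT-User", "PerplexityBot"],
    "traditional_search": ["Googlebot", "Bingbot"],
}


def classify_user_agent(user_agent: str) -> str:
    """Return the category of the first known bot token found, else 'unknown'."""
    ua = user_agent.lower()
    for category, tokens in BOT_CATEGORIES.items():
        for token in tokens:
            if token.lower() in ua:
                return category
    return "unknown"


if __name__ == "__main__":
    # Invented example string; real crawler user agents vary by version.
    sample = "Mozilla/5.0 (compatible; GPTBot/1.0)"
    print(classify_user_agent(sample))
```

Because user-agent strings can be spoofed, treat the result as a hint about who is crawling, not as proof.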
Pros of blocking AI bots
Protect your original content. If you write unique product descriptions, blog posts, or buying guides, training bots can absorb that content into their models. Your competitors might eventually get AI-generated descriptions that sound like yours because the model learned from your work.
Reduce server load. AI crawlers can be aggressive. Some send hundreds of requests per minute, which uses server resources and can slow down your store for real shoppers. Shopify handles this at the infrastructure level, but aggressive crawling still affects your bandwidth allocation.
Protect product data. Your pricing, inventory descriptions, and product photography are business assets. Training bots that scrape this data contribute to AI tools that could help competitors generate similar content at scale.
Cons of blocking AI bots
Lost visibility in AI search. AI-powered search is growing. ChatGPT, Perplexity, and Google AI Overviews all pull from web content. If you block the bots that power these features, your products will not appear when shoppers ask AI assistants for recommendations. This is a real cost that grows as more consumers shift from traditional search to AI assistants.
No backlinks or citations. Some AI search tools cite sources and link back to the original page. If you block their crawlers, you lose those citations. Perplexity, for example, shows source links for every answer. Those links drive traffic.
Enforcement is imperfect. Robots.txt is a polite request, not a firewall. Some bots ignore it. Others spoof their user-agent to look like regular browsers. Blocking well-behaved bots while rogue ones get through is a frustrating outcome. Blocking in robots.txt works best against reputable companies that follow the rules.
How to check your current bot exposure
Before making changes, find out where you stand. Two tools help:
AI Bot Checker – Enter your store URL and the tool analyzes your current robots.txt file to show which AI bots you are currently blocking and allowing. It flags bots you might want to reconsider and shows your overall exposure level.
AI Readiness Checker on CraftShift – A broader assessment that looks at your store’s readiness for AI-powered search. It checks structured data, content quality, and bot access together. Useful for understanding the full picture beyond just robots.txt.
If you have not touched your robots.txt, Shopify’s default configuration allows most bots. That means training bots have been reading your entire catalog. This is the default for most Shopify stores.
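You can also run a rough version of this check yourself. The sketch below uses Python's standard `urllib.robotparser` to report which of the bots named in this guide a given robots.txt would let through. The `bot_exposure` helper and the sample file contents are hypothetical, and Python's matching rules are a simplification, so individual crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Bot tokens named in this guide.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "PerplexityBot",
    "Applebot-Extended", "CCBot", "ClaudeBot", "Bytespider", "cohere-ai",
]


def bot_exposure(robots_txt: str, path: str = "/") -> dict:
    """Map each bot token to True if robots_txt lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_BOTS}


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /cart

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

if __name__ == "__main__":
    for bot, allowed in bot_exposure(SAMPLE, "/products/example").items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To check a live store instead of a pasted string, `RobotFileParser` can fetch the file itself through its `set_url()` and `read()` methods.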
Recommended approach
Here is the approach that works for most Shopify stores:
- Block training bots. GPTBot, Google-Extended, CCBot, ClaudeBot, Bytespider, cohere-ai. These take your content for model training and give nothing back. Block them.
- Allow search bots. ChatGPT-User, PerplexityBot, Applebot-Extended. These power AI search features that can recommend your products to shoppers. Keep them allowed.
- Never block Googlebot or Bingbot. These are traditional search engines. Blocking them removes you from Google and Bing search results entirely. This is not about AI – it is basic SEO survival.
- Review quarterly. New bots appear, existing bots change behavior, and the AI search landscape shifts. Check your robots.txt every few months and update as needed.
This balanced approach protects your content from being used as free training data while keeping your store visible in the growing AI search channel. It is not all-or-nothing. Be selective.
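For the quarterly review, the policy above can be written down as data and compared against what your robots.txt currently does. This is a minimal sketch under the assumptions of this guide. The `RECOMMENDED` mapping restates the list above, and the `audit` helper and its input format (bot token mapped to allowed or blocked) are invented for illustration.

```python
# Recommended policy from this guide: True means "should be allowed".
RECOMMENDED = {
    "GPTBot": False, "Google-Extended": False, "CCBot": False,
    "ClaudeBot": False, "Bytespider": False, "cohere-ai": False,
    "ChatGPT-User": True, "PerplexityBot": True, "Applebot-Extended": True,
    "Googlebot": True, "Bingbot": True,
}


def audit(current: dict) -> list:
    """Return one message per bot whose current status differs from RECOMMENDED.

    `current` maps a bot token to True (allowed) or False (blocked). Bots
    missing from `current` are skipped rather than guessed.
    """
    findings = []
    for bot, should_allow in RECOMMENDED.items():
        if bot not in current or current[bot] == should_allow:
            continue
        action = "allow" if should_allow else "block"
        findings.append(f"{bot}: consider changing to {action}")
    return findings


if __name__ == "__main__":
    # Hypothetical current state of a store.
    for line in audit({"GPTBot": True, "ChatGPT-User": True, "Googlebot": False}):
        print(line)
```

Keeping the policy in one mapping means a quarterly update is a one-line change when a new bot appears or an existing one changes purpose.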
How to block bots with robots.txt
On Shopify, you can customize your robots.txt file through your theme’s robots.txt.liquid template. Here is a basic example that blocks training bots while allowing search bots:
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: cohere-ai
Disallow: /
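Do not add “Disallow: /” to the default (User-agent: *) group, or you will block every crawler, including Googlebot. Give each bot its own User-agent group, and add these groups alongside the default rules in robots.txt.liquid rather than replacing them.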
If you are not comfortable editing template files, the Robots.txt Generator on CraftShift can create the right directives for you. Select which bots to block, and it generates the code you need to paste into your robots.txt.liquid file.
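A generator of this kind does very little work. The sketch below shows the idea in Python; it is not how any particular tool is implemented, and the `build_block_rules` name is made up for this example. It refuses the `*` token so the output can never block every crawler by accident.

```python
def build_block_rules(bots):
    """Return robots.txt text with one 'Disallow: /' group per bot token."""
    rules = []
    for bot in bots:
        if bot.strip() == "*":
            # Blocking the default group would block all crawlers, Google included.
            raise ValueError("refusing to block the default '*' group")
        rules.append(f"User-agent: {bot}\nDisallow: /")
    return "\n\n".join(rules) + ("\n" if rules else "")


if __name__ == "__main__":
    print(build_block_rules(["GPTBot", "Google-Extended", "CCBot"]))
```

The printed text has the same shape as the example above and is meant to be pasted into robots.txt.liquid next to the default rules.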
Keep in mind that the file itself updates as soon as you save it, but crawlers cache robots.txt and re-fetch it on their own schedule, so it can take days to weeks before every bot follows the new rules. There is no way to force an immediate update.
How this connects to your broader SEO strategy
AI bot management is one piece of your store’s SEO picture. The same structured data that helps Google understand your products also helps AI search bots give accurate recommendations. Clean URLs, proper meta titles, and well-organized product data all feed into both traditional and AI search.
If you are using combined listings to give each product variation its own URL, you already have more pages for AI search bots to discover and recommend. Each separate product page is an opportunity for an AI assistant to recommend a specific color, size, or material to a shopper. This is one of the hidden benefits of the separate products approach – more surface area for both Google and AI discovery.
For stores tracking how visitors interact with product options, the swatch click tracking guide covers setting up Google Analytics events for swatch interactions.
Frequently asked questions
Should I block all AI bots on my Shopify store?
No. Block training bots (GPTBot, Google-Extended, CCBot) that scrape content for model training. Allow search bots (ChatGPT-User, PerplexityBot, Applebot-Extended) that power AI search features. A selective approach protects your content while keeping you visible in AI-powered discovery.
What is the difference between AI training bots and AI search bots?
Training bots collect your content to train language models. Your data becomes part of the model and is used without attribution. Search bots collect content to display in AI-powered search results, often with a link back to your store. Training bots take. Search bots can give.
How do I edit robots.txt on Shopify?
Go to Online Store > Themes > Actions > Edit code, then find or create the robots.txt.liquid file. Add user-agent and disallow directives for each bot you want to block. You can also use a robots.txt generator tool to create the right code.
Will blocking AI bots affect my Google rankings?
Not if you only block AI-specific bots. Never block Googlebot. Google-Extended is a separate robots.txt token that only controls whether your content is used for AI training (Gemini). Blocking Google-Extended does not affect your Google Search rankings.
How can I check which AI bots are crawling my store?
Use the AI Bot Checker to analyze your current robots.txt and see which bots you allow or block. This shows which bots are permitted to crawl, not which ones have actually visited. For a broader assessment, try the AI Readiness Checker on CraftShift.
Do AI bots actually respect robots.txt?
Major companies (OpenAI, Google, Anthropic, Apple) state that their crawlers respect robots.txt directives. Smaller or less reputable crawlers may not. Robots.txt is a convention, not a technical barrier. It works best against bots from established organizations.