I covered some of the potential arguments either way in my previous post, but the truth is that, given how little traffic these models are driving right now, blocking them is probably not hugely impactful in the short term. If you look at Moz’s robots.txt file at the time of writing, you can see that we block GPTBot from our learn center and blog. This is a compromise position, but one from which we haven’t really seen any benefit or harm so far, and nor would we expect to in the short term. I certainly don’t think the comparison to blocking Googlebot is fair: LLMs are primarily a content generation tool, not primarily a traffic referral tool. Indeed, Google has suggested that even its AI Overviews are not affected by Google-Extended, but instead by regular Googlebot. Similarly, at the time of writing, OpenAI has just announced its direct Google competitor, “SearchGPT,” and has also confirmed that, like Google, it crawls with a separate user agent from the one used for its other generative AI tools: in this case, “OAI-SearchBot.”
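To make the “separate user agent” point concrete, here is a minimal sketch that checks a robots.txt policy with Python’s standard-library `urllib.robotparser`. The robots.txt content is a hypothetical example, loosely modelled on the policy described above; the `/learn/` and `/blog/` paths and the `example.com` domain are illustrative assumptions, not a copy of Moz’s actual file. It shows that a rule aimed at GPTBot does not, by itself, restrict OAI-SearchBot or Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (assumption, not Moz's real file):
# GPTBot is kept out of two sections; every other crawler is unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /learn/
Disallow: /blog/

User-agent: *
Disallow:
"""


def check(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    # Pass the crawler's product token (e.g. "GPTBot"), not a full UA string.
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "https://www.example.com"  # placeholder domain
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        for path in ("/blog/some-post", "/products/"):
            allowed = check(ROBOTS_TXT, agent, base + path)
            print(f"{agent:14} {path:18} {'allowed' if allowed else 'blocked'}")
```

Only GPTBot is refused, and only under `/learn/` and `/blog/`; the other agents fall through to the `*` group, whose empty `Disallow:` permits everything. The standard-library parser’s user-agent matching is simpler than some crawlers’ own, so treat this as a rough sanity check rather than a guarantee of how any particular bot behaves.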
What I didn’t cover in that article is the case of large publishers. If you are a large publisher who thinks you have leverage and may be able to strike a deal, you may wish to set a precedent: that these tools are not owed free access unless they reach a formal arrangement. For example, Vox Media, The Verge’s parent company, publicly said it was blocking access before eventually striking a deal. The robots.txt file on theverge.com still explicitly blocks most other AI bots, but no longer blocks GPTBot.
Of course, the majority of sites and the majority of readers of this blog post are not large publishers. It may well be significantly more valuable for you to be mentioned in AI-written content than it is for you to try to protect the unique value of your content, particularly in a crowded market of competitors with no such qualms. Still, it’s interesting to see the precedents being set here, and it will be even more interesting to see how it plays out.
Source: https://moz.com/blog/block-ai-bots