I’ve been thinking a lot about AI recently. There are definitely plenty of great uses for it, and I use ChatGPT quite regularly. But despite it being a useful tool, I’m very aware that a lot of the content used to train AI models has simply been slurped up without any consent from the people who published it.
Microsoft’s AI CEO Mustafa Suleyman said at a conference back in April:
“With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That’s been the understanding,”
So his perspective is that content shared publicly on the web is available for AI training by default, unless the publisher specifically says it should not be used. I’m pretty sure copyright law disagrees with his take, but there you go.
With this stance in mind, I’ve been wondering whether any consideration has been given to including AI bot blocking in the standard ‘robots.txt’ file that WordPress generates. It might seem a little like closing the gate after the horse has bolted, given how much content has already been consumed, but people are still publishing new content, more and more every day.
My view is that blocking AI bots by default in WordPress would be a strong stand against companies like OpenAI, Perplexity, Google and Apple mass-scraping people’s content for AI training without their consent.
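As a rough illustration only, a “blocked by default” addition to the WordPress-generated robots.txt might look something like the snippet below. The user agent tokens shown are the publicly documented crawlers for the companies mentioned above (GPTBot for OpenAI, PerplexityBot for Perplexity, Google-Extended for Google’s AI training, Applebot-Extended for Apple); exactly which bots a real default should list is an open question.

```
# Hypothetical default additions to the WordPress-generated robots.txt
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
```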
I’m aware that plugins already exist for people who wish to block these bots, but that only helps people who are aware of the issue and choose to act on it. Consent should be requested by these companies and explicitly given, rather than the default being that they can presume it’s OK to scrape any website that doesn’t specifically say “no”.
Having 43%+ of websites on the internet suddenly say “no” by default seems like a strong message to send out. I realise that robots.txt blocking isn’t going to stop the anonymous bots that ignore it, but at least the legitimate companies who intend to honour it will take notice. With the news that OpenAI is switching from being a non-profit organisation to a for-profit company, I think a stronger stance is needed on the default permissions for content published using WordPress.
So whilst the default would be to block the AI bots, publishers who want to opt in could still allow access to their content using the same methods currently available for modifying ‘robots.txt’ in WordPress: plugins, custom code and so on.
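For anyone curious what the “custom code” route looks like today, WordPress already passes its virtual robots.txt output through the `robots_txt` filter, so opting a bot back in would only take a few lines. A minimal sketch, assuming GPTBot is the bot you want to re-allow (note that a physical robots.txt file on disk bypasses this filter entirely):

```php
<?php
/**
 * Sketch: re-allow a single AI crawler on top of a hypothetical
 * "blocked by default" rule set, using WordPress's existing
 * robots_txt filter. Only affects the virtual robots.txt that
 * WordPress generates, not a physical file on the server.
 */
add_filter( 'robots_txt', function ( $output, $public ) {
	// Append an explicit allow rule for OpenAI's crawler.
	$output .= "\nUser-agent: GPTBot\nAllow: /\n";
	return $output;
}, 10, 2 );
```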
That’s my perspective / thought process anyway; I’m curious to see what others’ thoughts are.
- The potential irony of using partially AI-generated imagery as the main feature image in this particular post is not lost on me. The mass-scraping of images and video is possibly an even bigger issue than the content-scraping of websites when it comes to mass copyright violation. ↩︎