AI Bots: Disallow

I recently wrote a post, “Should WordPress block AI bots by default?”, with some thoughts about whether WordPress should block AI bots via the robots.txt file by default.

Since writing that, I decided that rather than just talking about it I should go ahead and submit some code to the WordPress project that does exactly that. I’ve done WordPress development for 14+ years, and whilst I’ve created my own plugins and added them to the WordPress plugin repository, I’d never submitted anything to the core codebase before, so it was an interesting process to go through and a good bit of experience.

I won’t go through every step in detail, but basically it involves forking the WordPress codebase on GitHub, making the changes in a local development environment, pushing the code to GitHub and opening a Pull Request for those changes.

As well as pushing the code change to GitHub, you also need to create a ticket in WordPress’s Trac ticketing system, which is used to track code issues such as bugs, updates and feature requests. I created a new Trac ticket for the PR, but as it turns out a similar idea had previously been suggested in an earlier Trac ticket, so mine has been marked as a duplicate of that original one.

This original ticket has some good ideas in it, although no code has been written, so I’m glad to have submitted a PR to go along with it. I also think the arguments in my ticket are a bit more forceful than in the original; I really do think this should be added. However, I am approaching this from the perspective of trying to create some discussion around it, so I don’t at all expect the code in my PR to be exactly how this feature should work. The original Trac ticket suggests adding another checkbox to the “Reading” options in WordPress, “Discourage AI services from indexing this site”, which I think makes perfect sense.

I did wonder whether there should be a specific way to manage the list of AI bots, though. Whilst the “Discourage search engines…” option is similar, there is a difference: in the ‘robots.txt’ file it only takes a couple of lines to block all search engine user agents:

User-agent: *
Disallow: /

So if you wanted to block all search engines and AI bots, you could use just those couple of lines. But presuming you still want search engines to index your site[1], you need to list every AI bot user agent to be blocked individually. Something like this should block most known AI bots (at the time of writing in October 2024, anyway):

User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
User-agent: Amazonbot
User-agent: anthropic-ai
User-agent: AlphaAI
User-agent: Applebot
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: Diffbot
User-agent: FacebookBot
User-agent: facebookexternalhit
User-agent: FriendlyCrawler
User-agent: GPTBot
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
User-agent: iaskspider/2.0
User-agent: ICC-Crawler
User-agent: ISSCyberRiskCrawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: Kangaroo Bot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: OAI-SearchBot
User-agent: omgili
User-agent: omgilibot
User-agent: PerplexityBot
User-agent: PetalBot
User-agent: Scrapy
User-agent: Sidetrade indexer bot
User-agent: Timpibot
User-agent: VelenPublicWebCrawler
User-agent: Webzio-Extended
User-agent: YouBot
Disallow: /
[2]
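One way to sanity-check a grouped block like this, where several User-agent lines share a single Disallow rule, is to feed it to Python’s standard-library urllib.robotparser and ask which crawlers may fetch a page. This is a minimal sketch, assuming a shortened subset of the list above and a placeholder URL (https://example.com/); the can_crawl() helper is a hypothetical name for illustration.

```python
from urllib.robotparser import RobotFileParser

# A shortened subset of the AI bot block above. Several User-agent lines
# followed by one Disallow rule form a single group in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-post/"  # placeholder URL
    for agent in ("GPTBot", "ClaudeBot", "CCBot", "Googlebot", "Bingbot"):
        verdict = "allowed" if can_crawl(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

The listed AI bots come back as blocked, while Googlebot and Bingbot, which match no group, are allowed. urllib.robotparser uses its own fairly simple matching rules, which won’t be identical to every crawler’s (RFC 9309 describes the standard), so treat this as a quick check rather than proof of how each bot behaves.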

Users might want to allow certain bots and disallow others, so the original Trac ticket also suggests making this list filterable, so that plugins etc. can modify it.

I don’t think adding any UI to core beyond the checkbox would be desirable, as managing the list is exactly the kind of extended functionality that plugins are intended for. The basic feature of blocking AI bots will work, and if users need more they can find a plugin or write their own code to do what they need. One consideration is whether the default list of AI bots should be updated outwith the regular core WordPress development cycle, but new AI bots probably(?) don’t appear that frequently, and the fairly frequent interim point releases in the WordPress development cycle would allow the block list to be updated.

If you’re reading this and think it’s an enhancement worth supporting, please do leave a comment on the original Trac ticket if you can, or reshare this post anywhere you think it might help draw attention to it.


  1. I acknowledge there is a lot of discussion about whether blocking AI bots will one day have the same impact that blocking search engines does now, in that your site basically won’t show up in any search results. The intention of blocking AI bots by default is so that users can make an informed choice about how their content is used.
  2. These are the droids we are looking for?

Cross-platform, cross-browser website testing with BrowserStack

Like many web designers / developers, I’ve made use of virtualisation applications like VirtualBox, Parallels Desktop and VMware Fusion for Mac in order to test websites in the various versions of Internet Explorer. Using these apps requires buying the relevant Windows licences for the various virtual machines, and also carries the overhead of keeping them current with the latest OS updates and browser / plugin updates too.

However, I recently did a fresh install of OSX on my Mac and decided to remove all of the virtual machines because of the amount of space they used, fully intending to reinstall them fresh and carry on working that way. But project demands at work left me no time to do it, and as I needed to test a project I looked around for alternatives. I had previously used Adobe’s BrowserLab tool for quick static testing of layout issues in browsers, but I needed something that let me browse sites and actually interact with the pages, and that’s where BrowserStack fits the bill perfectly.

How does BrowserStack work?

BrowserStack lets you connect to browsers running in virtual machines directly from your own browser, a bit like connecting to a machine via remote desktop. There are basically three steps to testing a site:

1.) Select the OS version you want:

BrowserStack's Operating System choices menu

2.) Choose from the available web browsers for that platform:

BrowserStack's Browser choices menu

3.) Enter the URL of the site you want to test and hit the "Start testing" button:

BrowserStack's URL entry fields screen

The connection is then made to the virtual machine and rendered via the Flash plugin in your browser, allowing you to interact with the site remotely. You can then easily choose a different OS and / or browser version and hit the "Update" button in the left-hand menu; BrowserStack automatically grabs the URL you are currently browsing and opens it using your new selection.

This is how the site is viewed within BrowserStack:

Preview of site within BrowserStack

Overall it’s just a really easy-to-use system, and it lets you switch between different OS / browser variations much more quickly and with much less system overhead than using locally installed virtual machines.

It’s worth weighing up some of the pros and cons of BrowserStack, as there are a few issues that might still lead you to run local virtual machines instead:

Pros:

  • A wide range of OS options, including Win XP, Win 7, Win 8, OSX 10.6 Snow Leopard, OSX 10.7 Lion and OSX 10.8 Mountain Lion, as well as mobile / tablet OS and browser options for iOS, Android and Opera.
  • Quick to load up OS / Browser options.
  • Low overhead on your computer compared to running one or more VMs natively.
  • Developer tools such as Firebug installed on all browsers.
  • No new OS licences to purchase.
  • No OS updates or browser update hassles.
  • Low cost (from $19 per month for a single user).

Cons:

  • No Linux OS / Browser options.
  • You only get one browser option at a time, so if you’re using Win 7 with IE9 and want to test Firefox you need to select Firefox from the browser list and hit "Update" to initialise a fresh connection with only that browser.
  • Slow refresh rate for moving / animated content so it’s not great if you want to preview how well videos or animations run on your desired test platform. This is probably the biggest reason you may wish to test on local VMs instead.

So, that’s a basic overview of BrowserStack, but the best thing to do of course is try it yourself using the free trial which gives you 30 minutes (non-consecutive) to try out the full system.

Remotely debugging mobile devices: Remote web inspection in Safari and iOS6

It doesn’t take a genius to note that mobile devices are pretty much overtaking the web, and that a huge number of people (the majority, depending on which statistics you pay attention to) are accessing the web via a mobile device such as an iPhone, iPad or other smartphone / tablet.

As such there has been a huge buzz about responsive design: making sites adapt well to a range of screen sizes and resolutions, and moving away from designing for a fixed screen width, such as with the ubiquitous 960 pixel grid framework. One of the biggest challenges in this new era of web design and development has been the lack of good tools to help you create responsive, adaptive websites.

Media Queries and Responsive Design

It has been possible for a while to write media queries in code to handle various device widths and their related styles, but it can be a bit mind-melting trying to keep track of all of this and to test as you work through the development process. Fortunately we are now beginning to see a range of tools to help you develop responsive sites, one of the most recent being Adobe’s new Edge Reflow, a simple, focused app that lets you visually resize the viewport and tweak the CSS of the various media queries as you go. At the time of writing it hasn’t been released yet, but when I get a chance I will definitely be checking it out and writing something about it.

Remotely Debugging Code on Mobile Devices

Another challenge when working with mobile devices is that you really need to test on actual devices to understand how they truly behave. Although you can set your desktop browser to the same width as an iPhone, it won’t necessarily behave exactly the same way, due to differences in the way the browsers handle CSS or JavaScript.

One of the difficulties in testing on the devices themselves is that it’s not very easy to debug when things don’t work as expected. On a desktop browser such as Safari you can use the Web Inspector to see the live code as you interact with it and also see any JavaScript errors that are triggered, but on a mobile browser there is often little available to help you detect errors.

Thankfully, tools are now being developed that allow remote access to the code running on the device itself. Adobe developed a tool codenamed ‘Shadow’ (now formally released as Edge Inspect), which works by providing apps for various mobile devices such as iPhone, iPad, and Android phones and tablets.

With these apps installed on your devices, you then run a desktop app on your computer as well as an extension in Google Chrome. Websites you view in Chrome are then simultaneously displayed on the mobile devices running the apps, but the key feature is that you can remotely inspect the HTML, CSS and JavaScript running in the app on those devices. A really excellent tool.

Remotely Debugging in Safari

Adobe Edge Inspect (formerly Shadow) is a great tool, but what if, like me, you prefer to work in desktop Safari as your main browser and can’t or don’t want to use Chrome to test sites? In case anyone thinks this is just a matter of personal browser preference, here’s a legitimate example of why having to use Chrome can be a problem: Chrome’s built-in Flash support gets in the way of testing fallback or alternative content meant for desktop browsers that don’t have Flash.

Screenshot of Web Inspector settings in iOS6 on an iPhone

Fortunately, the recent release of iOS6 offers a new feature that enables remote web inspection of mobile Safari on iPhone or iPad.

To use it, go into the ‘Settings’ app on your iOS device and drill down into ‘Safari->Advanced’, where you’ll find a new toggle for ‘Web Inspector’ (this replaces the old ‘Debug’ option in mobile Safari, which offered little real functionality).

Switch this on and you’ll see a small paragraph of text appearing which explains that you need to connect your device to your computer with a cable for this feature to work.

Once you’ve enabled Web Inspector on your iOS device(s), you should find them listed in the Develop menu in Safari on your computer. It should look something like this:

Select the device and a menu opens showing the applications that Web Inspector can connect to. Note that you need to have mobile Safari open on your iOS device for any sites to be listed in the menu; if it’s not open, you’ll get a message saying ‘No Inspectable Applications’.

Once you select a site from the menu, the familiar Web Inspector window will open in Safari on your computer; the difference is that you are seeing the HTML, CSS and JavaScript from your iOS device. You can then browse around and interact with the site on your iOS device and inspect all of the changes as they occur right in Web Inspector. Here’s a view of the HTML from a site on my iPhone:

Debugging via the console remotely

Just as with the ‘regular’ Web Inspector, you can view, interact with and update HTML and CSS, and see these temporary tweaks appear right on your iOS device. The main benefit for me has been in debugging JavaScript / jQuery code: I can use console.log messages and debug via the console just as I would when working on my computer:

In a recent jQuery Mobile-based site I was developing, I encountered code that was failing in mobile Safari but working fine in Safari on my Mac, exactly the kind of situation I mentioned earlier where code is handled differently in mobile Safari. Thanks to this new remote web inspector functionality, I was able to easily add some console messages, figure out what was going on and adjust the code to work around the problem.

A Great Solution for iOS Web Development and Testing

iOS6’s remote web inspection functionality is definitely a huge improvement if you are making sites that you need to test on iOS devices. With the increase in Android-based devices such as Google’s Nexus 7 and Amazon’s Kindle Fire tablets, you will of course need to test on other devices besides iPhones and iPads, so tools like Adobe’s Edge Inspect are definitely something you will need to make use of too for testing across the various platforms and devices. But this iOS-specific testing workflow is very simple to set up and work with. A definite two thumbs up from me!

I’m going to take a look at some of the other tools available to aid the contemporary web development workflow of responsive, mobile-friendly design, and will write some more posts about them soon. In particular, I’ll take an updated look at Adobe’s Edge Animate tool, and also look at the Edge Reflow tool once it has been released.

How to install an SSL certificate on Plesk 9.5.4 (without going completely nuts)

I just had to set up an SSL certificate for a client site and found the documentation really poor when trying to get it up and running. I’m not sure whether it just presumed some prior knowledge that I didn’t have, but I thought I’d document what I did to resolve it. If nothing else I can come back to this myself in future as a reminder, and perhaps it will be of use to someone else.

Server configuration & SSL Certificate:

My server is a VPS running Plesk 9.5.4, and the SSL certificate I was installing in this case was a GeoTrust QuickSSL Premium certificate. I don’t think there will be too much difference with other certificate providers; however, the installation process in Plesk is quite specific, so these instructions won’t necessarily be that helpful if you’re not using Plesk.

Instructions

The instructions supplied for installing the certificate come as a PDF document. Simplified, they basically say this:

  1. Go to your domain.
  2. Under ‘Additional Tools’, click ‘SSL Certificates’, add a certificate entry.
  3. Edit the certificate entry you just made, browse to the supplied certificate file in the ‘Certificate’ file field.
  4. Go to ‘Web Hosting Settings’ for your domain, select the certificate from the drop down, enable SSL support on the domain, click OK.

So, following those steps should get you up and running with SSL on your domain. One small difference for me, however, was that the supplied SSL certificate didn’t come as a text file; it was simply in the body of an email, so I had to carefully copy and paste it into a text field on the certificate page instead. Not a huge deal, but given that you have to be careful to paste all of it in, it’s odd that this isn’t mentioned in the instructions.

Not so fast…

Following those instructions seemed to work at first: I tested it in various browsers and the certificate seemed to be accepted fine. Except that in Firefox on Mac, when I tried to access a secure page, it showed me the following message:

The certificate is not trusted because no issuer chain was provided.
[Error code: sec_error_unknown_issuer]

A bit of Google searching came up with some answers: it seems that Firefox (on Mac at least) objects when the Trusted Root and Intermediate CA certificates haven’t been installed alongside the site’s certificate. Who knew??!!! Certainly not me, as there was no mention of this in any of the instructions provided!

Installing Trusted Root and Intermediate CA Certificates

GeoTrust have a page with instructions on how to install an SSL certificate on Plesk, and funnily enough they state that you require the Trusted Root and Intermediate CA certificates (why wasn’t this included in the previous PDF instructions? Who knows…). Here’s the link to the page (I know it refers to Plesk 9.2, but it worked fine for me on Plesk 9.5.4):

GeoTrust Support: Install certificate on Plesk 9.2

Follow the instructions on that page and you’ll be able to get your SSL certificate up and running. It’s important to make sure you download the right version of Trusted Root and Intermediate CA for the type of SSL Certificate you’ve installed, so double-check which one you’re getting.
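If you want to confirm the result outside a browser, one quick check is to attempt a verified TLS connection with Python’s standard-library ssl module. This is a minimal sketch, not part of the GeoTrust or Plesk instructions: it assumes the system’s default trusted root certificates, and chain_verifies() is a hypothetical helper name. Python’s ssl module doesn’t download missing intermediate certificates by itself, so a server that isn’t sending its Intermediate CA certificate should typically fail with a verification error along the lines of “unable to get local issuer certificate”.

```python
import socket
import ssl
from typing import Tuple


def chain_verifies(host: str, port: int = 443, timeout: float = 10.0) -> Tuple[bool, str]:
    """Attempt a TLS handshake that verifies the server's certificate chain.

    Returns (True, "") if the handshake and verification succeed,
    otherwise (False, reason).
    """
    # Default context: system root CAs, with chain and hostname checks enabled.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake, and therefore certificate verification, happens here.
            with context.wrap_socket(sock, server_hostname=host):
                pass
        return True, ""
    except ssl.SSLCertVerificationError as exc:
        # A missing intermediate typically shows up here, e.g. as
        # "unable to get local issuer certificate".
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts and other SSL errors.
        return False, f"connection failed: {exc}"
```

For example, `ok, reason = chain_verifies("www.example.com")` (a placeholder hostname) returns `(True, "")` once the full chain is being served, and a `certificate verification failed: …` reason if it isn’t.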

The only issue I encountered following those instructions was that when I submitted all the files to add the certificates, Plesk showed the following warning:

Warning: the CA certificate does not sign the certificate.

However, I found that if you leave the certificate editing view in Plesk and go back in, the warning is no longer shown. I tested the page in Firefox and it was now working correctly, so I’m just going to ignore that warning as it seems to be only a temporary issue.

I hope this helps someone out, it took me way, way longer than anticipated to get this all set up, but hopefully it’ll be easier next time around :)

The Highland Fling 2011 Web Conference – Back to Basics

The Highland Fling 2011 Web Conference takes place on Friday 8th July at Symposium Hall, part of the Royal College of Surgeons in Edinburgh.

It’s great to have an event like this in Scotland for once so don’t miss out on the opportunity to go along!

The Highland Fling prides itself on exciting speakers and valuable topics that we can all learn from. Sound good? Whether you’re a front-end whizz or a project manager wanting to find out a bit more about what your team could be getting up to, The Highland Fling 2011 – Back to Basics will take you on a journey through modern web development that you can’t afford to miss.

There’s a great line-up of speakers, and the event will be hosted by Christian Heilmann of Mozilla, who is also a great speaker.

There’s a special last minute discount deal, just enter the discount code LASTCALL to receive 10% off the standard ticket price.

You can book tickets online at:

http://thehighlandfling2011.eventbrite.com

Here’s the list of speakers and what they’re talking about:

  • Steve Marshal – Why Simple Isn’t
  • Jack Osborne – Lifting the lid on HTML5
  • Mike Rundle – From Websites To Apps: The "Apple Look"
  • Remy Sharp – Interaction Implementation
  • Rachel Andrew – Choosing the right Content Management System
  • James Edwards – Accessibility in modern interfaces

For more details about the conference please visit:

http://thehighlandfling.com

Don’t miss it!