Should WordPress block AI bots by default?

I’ve been thinking a lot about AI recently. There are definitely a lot of great uses for it and I use ChatGPT quite regularly. Despite it being a useful tool, I’m very aware that a lot of the content used to train AI models has just been slurped up without any user consent being given.

Microsoft’s AI CEO Mustafa Suleyman said at a conference back in April:

“With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That’s been the understanding,”

So his perspective is that content which has been shared publicly on the web is available to be used for AI training by default, unless the publisher specifically says that it should not be used. I’m pretty sure copyright law disagrees with his take but there you go.

So with this stance in mind I have been giving it some thought and wondering if any consideration had been given to including any AI bot blocking in the standard ‘robots.txt’ file for WordPress? It might seem a little like “closing the gate after the horse has bolted” seeing as so much content has already been consumed, but people are still publishing content, more and more every day.

An AI bot image generated by AI? 1

My perspective is that having AI bots blocked by default in WordPress would be a strong stand against the mass scraping of people’s content for use in AI training without their consent by companies like OpenAI, Perplexity, Google and Apple.

I’m aware that plugins already exist if people wish to block these bots, but they are only useful for people who are aware of the issue and choose to act on it. Consent should be requested by these companies and given, rather than the default being that companies can just presume it’s ok and scrape any website that doesn’t specifically say “no”.

Having 43%+ of websites on the internet suddenly say “no” by default seems like a strong message to send out. I realise that robots.txt blocking isn’t going to stop the anonymous bots that scrape regardless, but at least the legitimate companies who intend to honour it will take notice. With the news that OpenAI is switching from being a non-profit organisation to a for-profit company, I think a stronger stance is needed on the default permissions for content that is published using WordPress.

So whilst the default would be to block the AI bots, there would still be a way for people / publishers to allow access to their content by using the same methods currently available to modify ‘robots.txt’ in WordPress: plugins, custom code etc.
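To make the idea a bit more concrete, here is a rough sketch of what “blocked by default, with an opt-in” could look like as robots.txt rules. It is written in Python purely for illustration and uses the standard library’s robots.txt parser to check the result. The list of bot names, the helper names `build_robots_txt` and `can_crawl`, and the simplified default rules are all my own assumptions, not anything WordPress ships.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI crawler user-agent tokens; illustrative, not exhaustive.
AI_BOTS = ("GPTBot", "PerplexityBot", "Google-Extended", "Applebot-Extended")

# Simplified stand-in for the rules a stock WordPress install serves.
DEFAULT_RULES = "User-agent: *\nDisallow: /wp-admin/"


def build_robots_txt(blocked=AI_BOTS, allowed=()):
    """Block every agent in `blocked` unless the publisher opted it back in."""
    still_blocked = [bot for bot in blocked if bot not in allowed]
    group = [f"User-agent: {bot}" for bot in still_blocked]
    if group:
        # One shared rule for the whole group; the blank line ends the group.
        group += ["Disallow: /", ""]
    return "\n".join(group + DEFAULT_RULES.splitlines()) + "\n"


def can_crawl(robots_txt, agent, url="https://example.com/a-post/"):
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    default_rules = build_robots_txt()
    print(default_rules)
    print("GPTBot allowed by default:", can_crawl(default_rules, "GPTBot"))
    print("Googlebot allowed by default:", can_crawl(default_rules, "Googlebot"))

    # A publisher who is happy for one bot to use their content opts it in.
    opted_in = build_robots_txt(allowed=("GPTBot",))
    print("GPTBot allowed after opt-in:", can_crawl(opted_in, "GPTBot"))
```

In WordPress itself this logic would presumably live in PHP and hook into the existing `robots_txt` filter, so the sketch only illustrates the rules, not the implementation. Ordinary search crawlers fall through to the usual default rules and are unaffected.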

That’s my perspective / thought process anyway. I’m curious to see what others’ thoughts are.


  1. The potential irony of using partially AI generated imagery as the main feature image in this particular post is not lost on me. The mass-scraping of images and video is possibly an even bigger issue than content-scraping of websites in regard to mass-copyright violation. ↩︎

Outlook comes to Mac, will it make Microsoft ‘Fix Outlook’?

Microsoft’s Mac Office team recently announced that the next version of Mac Office 2010 will replace the Entourage email client with a purpose-built Mac version of Outlook. This is interesting as it suggests that Microsoft see the forthcoming Exchange support built into Mac OSX 10.6 as a bit of a threat.

The current email client in Mac Office, Entourage, is a poor citizen of OSX 10.5 because its single database is not very compatible with OSX 10.5’s Time Machine backup function. So Entourage was at least due for an update. However, bringing Outlook to Mac also makes things a bit more consistent between the Windows and Mac Office suites. I dare say a number of Windows to Mac switchers will be quite happy to see the addition of Outlook.

Will Outlook for Mac’s HTML email support suck like it does on Windows?

With the announcement I think many web designers / developers will likely have the above question in mind! If you’re not in the habit of creating HTML format emails then you may not understand what the problem is. Basically, since the release of Office 2007 for Windows the rendering of HTML emails in Outlook has taken a turn for the worse. Outlook 2007 uses Word’s HTML rendering engine to display HTML emails, effectively taking several steps backward with regard to rendering support in modern email clients.

Due to the poor HTML support in Outlook on Windows, developing HTML email newsletters requires using HTML formatting that hasn’t seen the light of day since the heady days of the late 90s: table-based markup and very little support for CSS. If you want to read more about this then head over to the Email Standards Project for lots of good information, and in particular this post: “Microsoft to ignore web standards in Outlook 2010 – enough is enough”.

The Mac Office team have indicated they will make Outlook for Mac from the ground up as a true Mac application:

Outlook for Mac is being built from the ground up as a Mac OS X application using Cocoa. It will have a new database that delivers a reliable, high performance, and integrated experience with Mac OS X. Users will be able to back-up with Time Machine and search email, calendar and contacts with Spotlight.

So I’m reading from this that, as it’s using Apple’s Cocoa frameworks, it will make use of the WebKit rendering that Cocoa provides to render HTML emails. As such it should have excellent HTML rendering capabilities, which incidentally the current Entourage application already has. So Mac users should have a good experience regardless of whether they use Apple’s Mail or Outlook email clients.

However, there’s a potential problem. The Windows Outlook team have so far stubbornly denied any need to fix the problem with HTML rendering in Outlook on Windows and indeed seem intent on releasing Outlook 2010 with the exact same rendering support. As the Outlook team seem to think that the Word HTML rendering engine is appropriate, will they mandate that Mac Outlook should render emails exactly the same way that Outlook for Windows does? Will they make Word the rendering engine for HTML emails in Outlook for Mac?

If they don’t then basically they’ll have a potential lack of interoperability between Outlook on the different platforms: Outlook for Mac will offer good support for HTML email (as Entourage already does), and Outlook for Windows will suck. I’m not sure that Microsoft will be happy with that, seeing as Office is one of their flagship products and they will want to make the experience the same. The question is, if that is the case then which side will yield?

Email Standards are Web Standards

Regardless of your opinion on HTML emails* it’s a big issue. Whether you like or dislike HTML format emails, the reality is that they’re here to stay. Support for web standards and good layout practices should be encouraged, regardless of whether that HTML is rendered in an email client or a web browser. The recent Fix Outlook campaign hopefully sent a strong message to Microsoft about how the development community feel about it. Let’s hope Outlook for Mac doesn’t come with the same support as its Windows counterpart and that the discrepancy between their rendering engines forces Microsoft to step up and make Outlook 2010 for Windows include improved support instead of dumbing down Outlook for Mac!

Update:
There’s a new site launched by a couple of employees from Microsoft, “Make Office Better”. It introduces its purpose, stating:

Hi! We’re two Microsoft employees looking to collect customer ideas on how to improve Microsoft Office. If you’ve got a new feature idea or an idea on how to improve Microsoft Office, please share it here…and vote on other ideas you agree with. Through the magic of crowd-sourcing the best ideas should rise to the top.

What’s great about the ‘magic of crowd-sourcing’ in this case is that the number 1 Office issue that people want to see fixed is “Improve the HTML support in Outlook”. Well, they asked for feedback!

~

* And don’t say “HTML emails suck, everyone should just use plain text emails”, HTML emails will stop being used around the same time that everyone adopts XHTML 2. If you don’t like them then you can always use a mail client that can force the display in plain text!

Evidence of the hidden “features” of Vista and Windows Media Centre?

Way back in February 2007 I wrote a blog post called "Windows Vista: Beneath Aero’s transparency hides some future ‘surprises’" where I pondered some of the features of Vista designed to appease Hollywood’s desire to control how people use media on their computers.

More evidence of these features was revealed recently when users of Windows Media Centre in the US who intended to record an episode of Gladiators found that the recording was blocked because of a broadcast flag known as CGMS-A in the TV signal, which WMC understood to indicate it should not be recorded. NBC, who aired the show, said it was mistakenly added and that it wouldn’t happen again. Microsoft have claimed they will work to make sure this doesn’t happen again. However, many people are skeptical about this. In an article on Ars Technica, Eric Bangeman wrote:

There is technically no reason why Microsoft should support CGMS-A in Windows Vista and Windows XP MCE, and the screwup is evidence the software giant has decided to align itself with the interests of broadcasters and movie studios rather than those of its customers. Yes, this was a mistake by NBC, but the technology is there for such mistakes to be turned into policy.

The Electronic Frontier Foundation also have a few interesting things to say about why Microsoft have support for broadcast flags that were rejected by the courts:

To be perfectly clear: Microsoft is under no legal obligation to look for and respond in any particular way when it sees the broadcast flag being sent by NBC’s digital stations. Any DTV-receiving software technology or device – like MythTV – is free to take the same stream from HDHomeRun and ignore a broadcast flag transmitted with it. In other words Microsoft did not have to build its PC to look for and refuse to record a program which has its flag turned on.

Had consumers not stood up against the FCC’s mandatory flag rule three years ago, alternatives like MythTV would no longer be available. Back then, the FCC tried to force tech companies (and open source developers) to obey the entertainment industry’s remote TV control. A coalition of librarians, public interest organizations, and consumer groups successfully challenged the FCC’s jurisdiction to impose such a broad regulation in Federal court. After the rightsholders lost in court, they spent millions lobbying Congress to pass a law forcing receivers to obey their command. Your letters and calls stopped that bill.

Interesting.

~Rick

AIR Vs Silverlight? Adobe Vs Microsoft? Open Source Vs Proprietary?

TechCrunch ran an article on 26th February titled ‘Adobe AIR Vs Microsoft Silverlight: It’s All About Numbers’ which kind of compared them both as being quite similar, but it struck me that it’s really not a fair comparison.

There’s a big difference between AIR and Silverlight at the moment. It’s fair to say Microsoft will push Silverlight forwards quickly, but there’s no fair comparison between them just now. For the time being Flash and Silverlight make a far closer comparison, as AIR offers a lot more than Silverlight.

Flash and PDF have huge market share, and AIR brings those plus regular HTML/CSS/JS web development into one runtime, as well as making cross-platform offline / online application development easy.

Competition = good

It’s certainly good that there’s some competition in the market, but AIR’s incorporation of various open source projects such as Webkit, as well as the fact that Adobe have open sourced a lot of their own code such as Flex and Flash Player code, will win it a lot of mindshare among developers. While AIR is not 100% open source it’s certainly a lot more attractive on several levels, not least being able to create apps whether you’re used to HTML/JS, Flash or Flex.

Competition is good, and the fact that MS are developing web development apps to challenge Dreamweaver is a good thing. Dreamweaver is a great program but it needs to keep progressing to provide the tools that developers need.

One aspect that Dreamweaver (Adobe / Macromedia) has done a good job with is support for multiple server platforms such as their own ColdFusion but also PHP and JSP development. I’m not sure we’ll see any of Microsoft’s ‘Expression’ development apps support PHP and JSP any time soon! This multiple server platform support is something Adobe need to keep supporting as it’s definitely one thing that will separate their tools from Microsoft’s offerings.

~Rick

Interesting articles about DRM on Engadget.com

There are a couple of interesting articles about DRM on Engadget.com. The first, ‘DRM: The state of disrepair’, mentions a few of the various views voiced since Steve Jobs’ ‘Thoughts on Music’ post on Apple.com. A really interesting part is a table showing the state of DRM on various physical and digital media.

Chart displaying 'the state of disrepair' of Digital Rights Management schemes

The second article, ‘Microsoft announces another new DRM: PlayReady’, is about a new DRM scheme introduced by Microsoft which is intended for the mobile device market. It’s intended to bring DRM capabilities not just for their own formats but for other codecs as well, such as H.264 or AAC.

So much for Bill Gates’ perspective that DRM has “huge problems”!

~Rick

Windows Vista: Beneath Aero’s transparency hides some future ‘surprises’

It’s well known that Microsoft were way behind schedule with the launch of Windows Vista. The problems with security in Windows XP required Service Pack 2 to be developed which took a huge development effort for Microsoft and slowed development of ‘Longhorn’ (Windows Vista’s development codename). These delays meant that features were dropped by the wayside in order to get it launched. Two features in particular that were apparently dropped were:

  • WinFS – Windows Future Storage
  • NGSCB – Next-Generation Secure Computing Base (formerly Palladium)

WinFS – Windows Future Storage

WinFS is described on Wikipedia as:

…a data storage and management system based on relational databases, developed by Microsoft and first demonstrated in 2003 as an advanced storage subsystem for the Microsoft Windows operating system.
When introduced at the 2003 Professional Developers Conference, WinFS was billed as a pillar of the “Longhorn” wave of technologies.

So, an interesting feature dropped for the time being perhaps and one that may be added in a future version of Windows or an interim update.

NGSCB – Next-Generation Secure Computing Base

Wikipedia describes NGSCB as:

…a software architecture designed by Microsoft which is expected to implement parts of the controversial “Trusted Computing” concept on future versions of the Microsoft Windows operating system. NGSCB is part of Microsoft’s Trustworthy Computing initiative. Microsoft’s stated aim for NGSCB is to increase the security and privacy of computer users…

Interestingly if you read through the Wikipedia entry for Windows Vista it talks about how both of these features were dropped:

Faced with ongoing delays and concerns about feature creep, Microsoft announced on August 27, 2004 that it was making significant changes. “Longhorn” development basically started afresh, building on the Windows Server 2003 codebase, and re-incorporating only the features that would be intended for an actual operating system release. Some previously announced features, such as WinFS and NGSCB, were dropped or postponed, and a new software development methodology called the “Security Development Lifecycle” was incorporated in an effort to address concerns with the security of the Windows codebase.

So, that page does talk about NGSCB being dropped from Vista. However, back on the NGSCB page under the heading ‘Availability’ it states:

When originally announced, NGSCB was expected to be part of the then next major version of the Windows Operating System, Windows Vista (then known as Longhorn). However, in May 2004, Microsoft was reported to have shelved the NGSCB project [12]. This was quickly denied by Microsoft who released a press release stating that they were instead “revisiting” their plans.

The interesting point is that Microsoft denied it, and for good reason. An important part of the NGSCB, or ‘Palladium’, initiative is alive and well and active in Windows Vista. Known as Protected Media Path or Protected Video Path, it is a technology present in Vista that is intended to provide a protected environment for viewing content on PCs. The technology basically provides encryption throughout the hardware components of the system, which prevents any other software or hardware outputs on the system being used to copy the content being viewed, played or read. It determines whether the components in a PC can be trusted to play back the content without risking it being copied, hence the other term used in relation to the NGSCB initiative: Trustworthy or Trusted Computing.

WinFS and XP Service Pack 2 were not the only things delaying Vista’s launch

Vista’s original WinFS feature and the development of Windows XP Service Pack 2 might have contributed to delays in the development of Vista, but the inclusion of the Trusted Computing technology surely contributed too, as it touches a major part of the operating system’s entire codebase. It really has been built from the ground up to provide a Trusted Platform: a protected, or Digital Rights Managed, environment that neatly fits the demands of Hollywood and future digital content such as the Blu-ray and HD-DVD disk formats.

What’s ‘Hollywood’ got to do with it?

Everything. Trusted Computing is all about preventing content from being copied when the originators or copyright owners don’t want you to copy it. Hollywood, used here as a generic term to represent the movie, TV and large media industries, are driving the whole initiative.

The music industry was caught completely unaware by the digital revolution. The unprotected CD audio format meant it was very easy for people to copy CDs onto their computer’s hard drive. Couple this with a complete lack of forward thinking by the music industry, or of any provision of easy ways to buy audio tracks online, and the end result was a huge surge in file sharing. The music industry have tried hard to patch up the leaking dam but it has been largely fruitless. The advent of Apple’s iTunes Store brought a great legal alternative, but this still didn’t stop overall music sales declining. However, the music industry is still by and large convinced that piracy is the root cause of this decline.

Hollywood, on the other hand, weren’t quite so unaware. VHS movies and DVD disks have come with copy protection methods for quite some time. The problem was that they could be easily circumvented, and it’s not a difficult task to copy a DVD onto your hard disk with any number of freely available pieces of software. So, despite these attempts to prevent copying, they have been unsuccessful. What Hollywood were worried about was the possibility that the new Hi-Definition formats such as Blu-ray and HD-DVD would be as easily copied. So in order to prevent this copying we are now entering the era of Trusted Computing, and Hollywood have their hopes pinned on it.

How does Trusted Computing affect me?

Blu-ray and HD-DVD disks will only play at full quality if the equipment they are being played on is guaranteed as a trusted platform. If not, you are either going to get a lower-quality version of the content on the disk or perhaps find it can’t be played at all. Reduced quality could be caused by any number of factors in your system: your graphics card, your monitor or your soundcard could be considered ‘untrustworthy’ and therefore limit the experience of content that you have paid for the privilege of watching.

The only way of being sure that you can see the content at full quality is by making sure the components are running software drivers that are certified as trusted by Microsoft, so upgrading components may be necessary to achieve this. Upgrading to Vista may suddenly seem an even more costly move. Additionally, the extra checks on the various sub-system components add to the overhead placed on the system, so it’s not really surprising that Vista requires new hardware in order to run well.
Also, requiring people to upgrade older computers to new ones containing hardware that "plays for sure" with Vista is a great way to make sure all of the pieces of the DRM puzzle fall into place for Microsoft and content producers such as the movie industry.

Your PC may be Vista compatible, but is it Trustworthy?

Your current PC or even your brand new PC may be Vista compatible now, but once the use of Blu-ray, HD-DVD and other forms of Hi-Definition digital content replaces DVDs and becomes the norm will it meet the requirements necessary to be viewed as trusted?

You might just find that you’re suddenly locked out of what you’ve legally purchased until you go and buy the necessary upgrades!

Not just your PC either…

It’s worth noting too that it’s not just PCs that are affected by this notion of Trustworthiness. All of the new wave of HD TVs and Blu-ray or HD-DVD players support a similar system of copy protection that is built into the very hardware itself. If your TV is not considered trustworthy you may find the content does not play back at full quality.

That shiny new HD-ready TV you just bought probably provides the same hidden surprise ‘features’ that are lurking behind the transparent clouds of Windows Vista.

~Rick

Flash: Can it be a viable alternative to Windows Media DRM for the BBC? [updated]

This post continues my ongoing theme of the last few posts which is in response to the issues raised by the BBC choosing to use Digital Rights Managed Microsoft Windows Media format for their new iPlayer initiative. Please read the previous post ‘Dear BBC…‘ for more background information about it.

Where are the alternative formats?

I wrote in my last post about trying to find alternative formats to Windows Media DRM that could be used to deliver the BBC iPlayer initiative, and I didn’t find any real solutions that could compete. I’ve been looking into it a bit more and I still don’t think I’ve found anything. One thing that comes to mind when thinking about video on the web these days is of course Flash video. Sites like YouTube and Google Video have meant a huge upturn in the amount of Flash based video content available. What’s more, it’s also incredibly easy for people to get video into Flash format thanks to these sites.

Why not use Flash Video for the iPlayer?

If Flash video is so popular then why doesn’t the BBC use Flash for the iPlayer initiative? A good question, and I’ve found a few answers that go some way to explaining the reason. I found a post on the FlashComGuru.com website entitled “‘Why we don’t use Flash (video)’ – The BBC speaks up”. The article and the comments left by various people make for interesting reading. This article in turn references a response from the Editor of the BBC News website, Steve Herrmann, regarding changes to the way audio / video is used in the BBC News website. FlashComGuru highlights that the overwhelming reason not to use Flash for video is simply the cost implications of shifting over to a whole new format and delivery method, particularly given the existing investment in RealPlayer format content.

The response from Steve Herrmann titled “In response to site changes” contains a technical explanation as to why they don’t use Flash. One reason is:

“The BBC is trying to make its video available to the widest possible audience. This means that when we choose the formats in which to stream our audio and video clips and live programmes, we have to take account of: All the operating systems in use, and the number of people who use them (this is not just desktop operating systems – we need to take account of mobiles too); whether a player is available for that format on a particular operating system; and whether it is easy to play that video on an operating system.”

These are all good intentions obviously. The BBC has a remit where it has to be available to the widest possible audience and this is clearly stated in the first sentence. However, taken in the context of the iPlayer initiative, which locks you into Windows Media DRM format and excludes Mac OSX and Linux OS users, this is obviously not the widest possible audience being catered for! Admittedly the article this quote is from is specifically talking about the use of RealPlayer and Windows Media format on the BBC News website, but the remit there is the same as for the rest of the BBC.

So ultimately it is an issue of it being too costly to replace all of the existing infrastructure with a Flash based system due to the previous investment in RealPlayer over the last 10 years. Now I can appreciate that. Obviously the BBC don’t want to go wasting the investment previously made, plus they could perhaps be criticised for wasting licence payers’ money too. However, why get into a relationship with Microsoft on this? There’s really no likelihood that they will ever do much to help the fact that DRM’ed Windows Media content can’t be played on Mac OSX or Linux. I can’t see how this represents any kind of good investment of my Licence fee.

Surely there must be an alternative?

I keep coming back to that question, but I can’t really find any viable alternatives. However, that is not a reason to let the BBC off the hook here. The best thing I can possibly think of is that this is the time for Adobe to step up and take on Microsoft in this area. There’s a long game afoot here which Microsoft are pushing with the BBC. If the BBC can’t just dump the investment into RealPlayer technology overnight then how is it going to be any easier to dump investment into the Windows Media format and its DRM?

Calling Adobe! Time to get ‘mobile’…

There are obviously big issues going on here, and advocating one commercial company’s format over another isn’t necessarily the answer. For some this definitely isn’t the answer, especially with the use of DRM within the files. However, despite the prospect of perhaps seeing music available for purchase DRM free, I don’t think we’ll be seeing this happening as easily or so quickly where video is concerned. With that said, I think the best option here is for Adobe to get the Flash video format positioned much better as a viable format to compete with Windows Media DRM’ed content.

The previous quote from the BBC above mentions mobile devices as a target for the BBC’s content. Yet again Windows Media is no solution here at all, either with or without DRM. The point is interesting though because Adobe have just made an announcement at the annual 3GSM conference that version 3 of their Flash Lite mobile platform will support Flash video. This provides a vital piece of the puzzle that the BBC is trying to piece together, and a much more platform-friendly method at that.

I think the only technical challenge left to fill in is the provision of a decent DRM scheme to use within Flash video. If Adobe can provide that piece of the puzzle then there’s absolutely no reason for the BBC to use Windows Media DRM and cause thousands of licence payers to be locked out of using a service they are entitled to use.

So, Dear BBC, time to think again…

If…

  • Flash format can work for other TV channels such as ABC around the world,
  • a growing number of people use non-Microsoft operating systems on their computers,
  • more and more people are looking to access content online via mobile devices

then how can you consider Windows Media DRM a viable solution that is compatible with the remit of the BBC?

Update:

Bruce Schneier has written a great article about the DRM restrictions in Windows Vista, more reasons why a lock-in to Microsoft DRM is a bad choice for the BBC: [Via DaringFireball]

This isn’t even about Microsoft satisfying its Hollywood customers at the expense of those of us paying for the privilege of using Vista. This is about the overwhelming majority of honest users and who owns the distribution channels to them. And while it may have started as a partnership, in the end Microsoft is going to end up locking the movie companies into selling content in its proprietary formats.

I think you can replace the words movie companies at the end there with BBC instead. Microsoft desperately wants to have control of the DRM used in TV / Movie / Video distribution, the control they never managed to gain in the Music industry.

Vista: the longest suicide note in history

There’s an interesting article by Peter Gutmann I just saw a link to: A Cost Analysis of Windows Vista Content Protection. It gives a lot of in-depth information about Vista’s Content Protection. It’s quite scary reading, and the heading above kind of sums it up.

~Rick