Rumors about Automattic, the company that owns WordPress.com and Tumblr, making deals with AI (Artificial Intelligence) companies to provide training data scraped from users’ posts were in the news yesterday as 404Media and The Verge reported possible deals.
As you might imagine, many WordPress users were worried about what the rumors might mean for their content and websites.
Soon after the news stories were posted, WordPress.com published More Control Over the Content You Share (Wayback Machine archive) to describe how they’re engaging with AI companies and to introduce a new setting for controlling content sharing.
Here are my current thoughts about the stories and WordPress.com.
-
First off, some people are confused about WordPress and WordPress.com.
This has been an ongoing issue for years; learn about the difference between WordPress.com and self-hosted WordPress.
The news stories this week about AI and the potential deals are about WordPress.com, the hosted version of WordPress owned by Automattic.
Not self-hosted WordPress.
-
I have concerns with WordPress.com’s announcement post and their approach to the new settings option to “control your content.”
With the new setting, users can opt out of having their content gathered by third-party platforms, which means users are automatically opted in.
If WordPress.com was truly concerned about users having control over their content, the opt-in setting would not be the default.
-
And then there’s the exception for AI crawlers WordPress.com partners with.
From their blog post:
“We already discourage AI crawlers from gathering content from WordPress.com and will continue to do so, save for those with which we partner.”
That disagrees with the explanation in the WordPress.com settings option for preventing third-party sharing of your site:
This option will prevent this site’s content from being shared with our licensed network of content and research partners, including those that train AI models.
I’m confused. Which is correct?
The blog post says “save for those with which we partner” and the toggle setting says “including those that train AI models.”
-
In the Public section of the WordPress.com Privacy Settings page (Wayback Machine archive), note that preventing third-party sharing will also prevent your content from displaying in the WordPress.com Reader.
Using this option also means your blog posts will not appear in the WordPress.com Reader.
That’s not good.
While it is not as popular as it was in the past, many people still use the WordPress.com Reader to read blog posts.
February 28, 2024 2:03pm update: the online documentation was incorrect. The reference to content not displaying in WordPress.com Reader has been deleted as of early this afternoon.
Thanks to Jen T. and Donncha Ó Caoimh for asking and getting a resolution to this issue.
-
Did you notice comments are turned off for the More Control Over the Content You Share post?
Yes, I noticed it, too.
That’s a deliberate choice.
WordPress.com doesn’t want to hear what people have to say about their decision to automatically opt users in to having their data scraped by AI.
How the News Affected Me
Personally, the news has me thinking it may be time to consider a move away from WordPress.
I use both Jetpack and Akismet on this site, two plugins owned and developed by Automattic.
While I can’t imagine how AI scraping could impact spam filtering by Akismet, Jetpack is a different story.
Its free Enhanced Distribution feature is activated by default and feeds your content automatically into the WordPress.com firehose.
You can turn Enhanced Distribution off by going to Jetpack > Settings and selecting the Modules link at the bottom of the page, but I wonder how many Jetpack users realize that.
It’s possible I’m misunderstanding the WordPress.com Firehose.
February 28, 2024 2:15pm update: And I was correct: I was misunderstanding the Firehose.
Earlier this afternoon, I posted my question about the Enhanced Distribution module and the Firehose on Mastodon and received a quick reply and explanation from Brandon Kraft, a director of engineering who works on the Jetpack plugin.
Kraft explained:
- The Enhanced Distribution module doesn’t include content in the AI data sharing.
- The Firehose is a different data stream. Firehose is more of RSS on steroids and the sharing is based on stored data (from non-Jetpack sites).
- Jetpack sites are specifically excluded from the queries for the AI setup.
My first thoughts: I’m taking a look at Textpattern, the first open-source content management system I used.
It was released in 2003.
I believe I started using Textpattern in 2004. Yes, that’s 20 years ago. And it’s still maintained and developed.
Summary
For WordPress.com users, the implementation and announcement of the default opt-in setting for AI to scrape user data is a confusing mess.
The announcement post for the new setting option would be more accurate if it were titled “Losing Control Over the Content You Share.”
Whatever trust users had in their relationship with WordPress.com is broken.
An excellent summary and I’m glad that the Reader issue was cleared up with Donncha’s help. Still, so many questions!
Hi Jen,
I’m glad you asked the question and Donncha was able to contact someone so quickly. I suspect many people have been wondering about that Reader comment, but didn’t have an option (since the blog post had comments turned off) to contact someone who could follow up.
Yes, many questions!