Legal Concerns: Copyright and Artificial Intelligence

Gold-colored scales and a dark wooden gavel sit on a desk, blurred objects in the background.

My post last week, about Google’s large language model using your website content for training data, captured a lot of attention.

The post highlighted the Washington Post story revealing the millions of sites that had already been scraped for Google’s data set.

It was my most visited post in the past four months.

It also shared my frustrations and concerns about copyright.

I’m not alone.

Is Google’s Large Language Model Using Your Website Content As Training Data?

Results of a search of websites in Google's C4 dataset show lireo.com ranks 442,028 with 53k tokens, representing 0.00003% of all tokens.

Remember the post I published in late March 2023, with steps you can take to restrict ChatGPT from using content from your WordPress site?

It may not have worked, if your site’s content was already scraped.

Which is what happened with this site, lireo.com.

Not what I expected.

Quick Tip: Block OpenAI from Using Content From Your WordPress Site

ChatGPT-User documentation on how to block ChatGPT using a website's robots.txt file.

Over the past few months, ChatGPT and other artificial intelligence (AI) bots have captured the attention of many people on the web. Whatever your thoughts on AI bots, you may want to take action on your own website to block ChatGPT from crawling, indexing, and using your website content and data. Thanks to Barry Schwartz’s…
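
As a rough illustration of the robots.txt approach mentioned above, here is a minimal sketch that checks whether a rule set would turn away the ChatGPT-User agent. The robots.txt contents, the `is_blocked` helper, and the example.com URL are assumptions made for this example, not taken from the original post. The check uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for this sketch):
# disallow the whole site for the ChatGPT-User agent only.
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a real site from the post.
chatgpt_blocked = is_blocked(ROBOTS_TXT, "ChatGPT-User", "https://example.com/any-post/")
other_blocked = is_blocked(ROBOTS_TXT, "Googlebot", "https://example.com/any-post/")

print("ChatGPT-User blocked:", chatgpt_blocked)
print("Googlebot blocked:", other_blocked)
```

Keep in mind that robots.txt is only a request that well-behaved crawlers choose to honor, and it does nothing about content that was already scraped.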