Legal Concerns: Copyright and Artificial Intelligence

Gold-colored scales and a dark wooden gavel sit on a desk, with blurred objects in the background.

My post last week, about Google’s large language model using your website content for training data, captured a lot of attention.

The post highlighted a Washington Post story revealing the millions of websites already scraped into Google’s data set.

It was my most visited post in the past four months.

It also shared my frustrations and concerns about copyright.

I’m not alone.


Is Google’s Large Language Model Using Your Website Content As Training Data?

Results of a search of websites in Google's C4 dataset show that lireo.com ranks 442,028th, with 53k tokens, representing 0.00003% of all tokens.

Remember the post I published in late March 2023, with steps you can take to restrict ChatGPT from using content from your WordPress site?

Those steps may not have worked if your site’s content had already been scraped.

That was the case with this site, lireo.com.

Not what I expected.
