Is Google’s Large Language Model Using Your Website Content As Training Data?

A search of the websites in Google's C4 dataset shows that lireo.com ranks 442,028th, with 53k tokens, representing 0.00003% of all tokens.

Remember the post I published in late March 2023, with steps you can take to restrict ChatGPT from using content from your WordPress site?

Those steps may not have worked if your site’s content had already been scraped.

Which is exactly what happened with this site, lireo.com.

Not what I expected.
