Fighting bots is fighting humans.
One advantage to working on freely-licensed projects for over a decade is that I was forced to grapple with this decision far before mass scraping for AI training.
In my personal view, option 1 is almost strictly better. Option 2 is never as simple as "only allow actual human beings access" because determining who's a human is hard. In practice, it means putting a barrier in front of the website that makes it harder for EVERYONE to access it: gathering personal data, CAPTCHAs, paywalls, etc.
http://mollywhite.net/micro/entry/fighting-bots-is-fighting-humans
@molly0xfff I’ve got a blog post in the works about this but there’s a third option we should be considering:
Making it more expensive to download and parse by rejecting the corporate-friendly fascism of minimalism and plaintext and re-embracing the creative possibilities of the multimedia web.
If a blog post was roughly the same size as a YouTube video and required a scraper to understand a complex CSS layout and rich interactive context, the scraping difficulty and the possibilities for unique, luxurious creation would go through the roof in a way that cannot be replicated or co-opted by corps. And if every website was 1000x bigger, the scraping costs would go up at least 1000x.
I say it’s time to discard the Markdown web and welcome the dawn of the indie baroque.
@leon as with trying to block bots, there are tradeoffs. in your case, people with limited bandwidth or low-end devices might also suffer
@molly0xfff with YouTube and TikTok so massive, I don’t believe we need to be anywhere near as concerned with bandwidth as we were when Flash intros stalked the earth.
And if a blog post isn’t worth a 20 second load, is it worth a 30 minute read?
@molly0xfff @leon ...if your multimedia website is hard for bots to parse, it will also be hard for the accessibility tools that, for instance, vision-impaired users rely on.