Fighting bots is fighting humans.
One advantage of working on freely licensed projects for over a decade is that I was forced to grapple with this decision long before mass scraping for AI training.
In my personal view, option 1 is almost strictly better. Option 2 is never as simple as "only allow actual human beings access" because determining who's a human is hard. In practice, it means putting a barrier in front of the website that makes it harder for EVERYONE to access it: gathering personal data, CAPTCHAs, paywalls, etc.
http://mollywhite.net/micro/entry/fighting-bots-is-fighting-humans
@molly0xfff I like Jeremy Keith’s 1.5 option of acknowledging, not accepting, via poisoning https://adactio.com/journal/21210
@vonExplaino i would never 😉
@molly0xfff Isn't there a possibility of option 1.1? Keep things open but have SOMETHING in place to keep the abuse, at least moderately, in check?
@molly0xfff Every time I do a bad job completing a CAPTCHA nowadays I'm afraid that I just caused a future autonomous vehicle to mow down a cyclist because I forgot to identify one of the pictures with a bicycle.
@molly0xfff Ironically enough that link is also 503ing for me, like the other microblog link the other day.
@molly0xfff
I think it's time to return to printed and mailed newsletters.
@molly0xfff For the time being, I’ve set pretty strict throttling limits. If you’re trying to access 60 pages in a minute, you’re a bot. And a badly behaved one.
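(Illustrative aside: a minimal sketch of the "60 pages in a minute" throttle described above, assuming an in-memory sliding window keyed on client IP. The threshold is taken from the reply; the storage and keying choices are assumptions, not the poster's actual setup.)

```python
# Minimal sketch of a per-client rate limit: more than 60 page requests
# in a rolling 60-second window gets treated as a bot and refused.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 60  # "60 pages in a minute"

# Recent request timestamps per client (hypothetical in-memory store).
_history: dict[str, deque] = defaultdict(deque)

def allow_request(client_ip: str, now: float = None) -> bool:
    """Return True if this client is under the limit, False if throttled."""
    now = time.monotonic() if now is None else now
    window = _history[client_ip]
    # Drop timestamps that have aged out of the rolling window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # over the limit: serve a 429 instead of the page
    window.append(now)
    return True

# Example: the 61st request inside one minute is rejected.
if __name__ == "__main__":
    results = [allow_request("203.0.113.7", now=i * 0.5) for i in range(61)]
    print(results.count(True), "allowed,", results.count(False), "throttled")
```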
@molly0xfff Even if we generalize option 2 to "Make everything private and only allow X access to our content" the hard part is still X. Paying subscribers? People with an invite from an existing member? People I have met at an in person meetup? All of these are viable in their own context but none are simple and all of them restrict the author's reach.
@molly0xfff I’ve got a blog post in the works about this but there’s a third option we should be considering:
Making it more expensive to download and parse by rejecting the corporate-friendly fascism of minimalism and plaintext and re-embracing the creative possibilities of the multimedia web.
If a blog post were the same rough size as a YouTube video and required a scraper to understand a complex CSS layout and rich interactive content, the scraping difficulty and the possibilities for unique, luxurious creation go through the roof in a way that cannot be replicated or co-opted by corps. And if every website was 1000x bigger, the scraping costs go up at least 1000x.
I say it’s time to discard the markdown web and for the dawn of the indie baroque.
@leon as with trying to block bots, there are tradeoffs. in your case, people with limited bandwidth or low-end devices might also suffer
@molly0xfff with YouTube and TikTok so massive, I don’t believe we need to be anywhere near as concerned with bandwidth as we were when flash intros stalked the earth.
And if a blog post isn’t worth a 20 second load, is it worth a 30 minute read?
@molly0xfff @leon ...if your multimedia website is hard for bots to parse, it will also be hard for the accessibility tools that, for instance, vision-impaired users rely on.
@scottjenson sure. i'm not saying everyone should, say, drop DDoS protection.
but "only allow humans to access" is just not a feasible metric — you will ALWAYS let bots through and prevent humans, and you need to decide where you want to set the cutoff.
@molly0xfff Oh completely agree! Didn't mean to take away from your main point.