jack (@jackeric@beige.party)

27d

@laurenshof that sounds like weird behavior for AV software, in my opinion. I think it should not execute the statement, but rather scan the whole file.

27d

@th3jagi @laurenshof

Specific security tools need specific countermeasures to evade them, and can be tested and refined to close the attack surface.

Generalised tools can be derailed with more general countermeasures, and the measures used to prevent each kind of attack are just a surface for another attack.

There isn't a way to make the do-anything machine not do just the things you didn't want.

27d

@petealexharris @th3jagi @laurenshof how does it work? The AI tool sees the instructions for building weapons, says (ok, not "says", bear with me) "that's verboten, I'm not touching this" and stops reading, and _doesn't_ flag the content as dangerous?

27d

@jackeric @th3jagi @laurenshof
Apparently. They could try to patch it to ignore comments, but you could probably do it with variable names, because tokens are tokens to the no-semantics-only-token-frequency machine.
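
A minimal sketch of the point about variable names, assuming a hypothetical pre-filter that drops comments before the text reaches a model. The function name `visible_text` and the sample string are made up for illustration; only Python's standard `tokenize` and `keyword` modules are used. It shows that the same free text survives comment stripping when it is written as an identifier.

```python
import io
import keyword
import tokenize

# Hypothetical sample: the same free text appears once as a comment
# and once as a variable name.
SAMPLE = (
    "# stop_reading_here\n"
    "stop_reading_here = 1\n"
)

def visible_text(source: str, drop_comments: bool = True) -> list[str]:
    """Collect the author-chosen text in `source` that a reader would still see.

    Illustrative only: this is not how any particular scanner works.
    """
    seen = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Comments are kept only when the filter is switched off.
            if not drop_comments:
                seen.append(tok.string)
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Identifiers are free text too, and the filter leaves them alone.
            seen.append(tok.string)
    return seen

if __name__ == "__main__":
    print(visible_text(SAMPLE, drop_comments=False))
    print(visible_text(SAMPLE, drop_comments=True))
```

With comments dropped, the identifier still carries the text, so a filter of this kind would only move the problem from one token type to another.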

26d

@jackeric @petealexharris @th3jagi @laurenshof Most of them are coded for "Do not pass go, do not collect $200" full-stop when they run into something they don't want to be responsible for.

Given what we saw in the Claude leak where even variable names had embedded meta-prompts, I'm not sure these things make any distinction between text they just happened to read vs instructions directly given.
