abadidea (@0xabad1dea@infosec.exchange)

29d

I realize this is a little confusing, so to be clear:

a) measuring whether it cheats when given the opportunity is part of the point of the benchmark – preemptively removing all opportunities to cheat would be self-defeating

b) what counts is whether it ATTEMPTED to cheat, not whether it succeeded. Going digging in the filesystem for the source code is cheating whether it actually finds any source code or not

29d

@0xabad1dea this benchmarking thing has serious problems https://berryvilleiml.com/results/no-security-meter-ai.pdf

29d

@noplasticshower oh, of course, I have no doubt that the benchmarks are flawed in profound and amazing ways beyond, ahem, measure. I was just clarifying that, for the purposes of a safety test, giving it an environment where it conceivably could cheat was not a careless oversight but intentional bait.