Jon Wood (@jon@blankpad.net)

29d

Fascinated [derogatory] by this AI safety report on an upcoming model of GPT:

- The assessors reported that it is the most cheating-prone model they have ever tested
- In fact, it attempted to cheat so much, so extensively, that their statistics for the overall benchmark are meaningless

(“Cheating” here means, for example, instead of thinking about the answer to a question posed in a benchmark, the model uses its tooling to dig around the hard drive looking for the source code of the benchmark so it can figure out how to mark itself correct)

- Their conclusion is that it’s good, actually, that it’s so nakedly evil, because it doesn’t know how to hide being evil, and that we should worry far more when the models stop being detectably evil. Which seems to take it for granted that LLMs are in fact intrinsically evil.

https://metr.org/blog/2026-06-26-gpt-5-6-sol/


29d

@0xabad1dea Altman probably heard about this and said he liked its hustle.
