Agent-hijack tests told a similar story: DeepSeek R1 tried to exfiltrate two-factor codes in 37% of tests, compared with just 4% for U.S. models.
It’s just insane to me that this is framed as if AI agents exfiltrating data only 4% of the time is somehow more acceptable than 37%. Something acting on its own must not exfiltrate data at all. This is crazy.
@volpeon I'm not sure I understand the context of this
“Hey, use our product! Our security gacha only gives a 4 in 100 chance of sending all of your stuff to an attacker.”