Using algorithmic jailbreaking techniques, our team applied an automated attack methodology to DeepSeek R1, testing it against 50 random prompts from the HarmBench dataset. These spanned six categories of harmful behavior, including cybercrime, misinformation, illegal activities, and general harm.
The results were alarming: DeepSeek R1 exhibited a 100% attack success rate, meaning it failed to block a single harmful prompt. This contrasts starkly with other leading models, which demonstrated at least partial resistance.
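The evaluation described above boils down to a simple loop: send each prompt to the target model, judge whether the response is harmful, and report the fraction of successes. A minimal sketch of that loop is below; `query_model` and `is_harmful` are hypothetical stand-ins for a real model API and a real harm classifier (such as HarmBench's judge model), stubbed out here for illustration.

```python
def query_model(prompt: str) -> str:
    # Stub: a real harness would call the target model's API here.
    return f"response to: {prompt}"

def is_harmful(response: str) -> bool:
    # Stub: a real harness would run a trained judge/classifier here.
    # Returning True for everything models a target that never refuses.
    return True

def attack_success_rate(prompts: list[str]) -> float:
    """Fraction of prompts whose responses are judged harmful."""
    successes = sum(is_harmful(query_model(p)) for p in prompts)
    return successes / len(prompts)

sample = [f"prompt {i}" for i in range(50)]
# A model that blocks nothing scores 100% ASR, as reported for DeepSeek R1.
print(f"ASR: {attack_success_rate(sample):.0%}")
```

With these stubs the script prints `ASR: 100%`; swapping in a real API call and judge turns it into an actual benchmark harness.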
Oh no, models will be more responsive to anyone as opposed to only billionaires.
This is not good news, but when you’ve let the genie out of the bottle, this just seems like balancing the scales. At this point, transparency, not restricting the information to a select few, is a good thing. Something social networks like this fail to grasp.
In related news:
Researchers say they had a ‘100% attack success rate’ on jailbreak attempts against Chinese AI DeepSeek
CNBC reports that DeepSeek’s privacy policy “isn’t worth the paper it is written on.”
There seems to be a long way to go, but Hugging Face developers are in the process of building a fully open reproduction of DeepSeek-R1, since the model is not as open source as it claims.