Free GitHub Tool Strips Safety Guardrails From Open-Weight AI Models in Minutes, FT Investigation Finds
A free tool called Heretic, hosted on GitHub, can strip the safety guardrails from open-weight AI models in just a few minutes on a laptop costing about $400. That is the finding of a joint investigation by the Financial Times and the AI safety research group Alice, published May 25. Once stripped, models that previously refused such requests now return instructions for making explosives, producing methamphetamine, planning school shootings, and creating scam calls. (Source: NPR)
Heretic automates a process called "abliteration," which surgically removes a model's refusal behavior, and its popularity on GitHub has grown since February. It works on open-weight models from OpenAI, Alibaba, DeepSeek, and others. Hugging Face, which hosts open-source models, now lists more than 6,000 abliterated models, up from about 600 in 2024. (Source: NPR)
"Everybody can download and operate their own state-of-the-art model and use it for great things and terrible things," said Noam Schwartz, CEO of Alice.
House lawmakers attended an April demonstration run by the National Counterterrorism Innovation, Technology, and Education Center. Afterward, Representative Andy Ogles (R-Nashville) said the content "can be weaponized and used to manipulate people, destroy lives." A separate analysis also found that these guardrails could be removed in minutes using free, publicly available tools. (Source: Lexology)

