Heretic: Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka “safety alignment”) from transformer-based…