Hey GPT-5, can you make a 💣 Molotov Cocktail? | What the UK's Online Safety Act means for 📜 Wikipedia
#199 - OpenAI's GPT-5 is jailbroken within 24 hours of launch | Wikipedia fears being lumped together with social media and porn sites
You can ‘Echo Chamber’ OpenAI’s GPT-5 to output harmful content
The attack relies on the inherent nature of GPTs to focus more on recent context
An 📢 Echo Chamber and Storytelling attack is one in which the attacker steers the LLM toward progressively more harmful content through a series of seemingly benign prompts. It typically begins with simple storytelling prompts.
A 🎼 Crescendo attack is very similar. It begins with a benign request to the LLM. As the attack progresses, the requests escalate until the LLM starts producing harmful content.
Within 24 hours of its launch, GPT-5 was jailbroken into revealing, in explicit detail, the recipe for a Molotov cocktail.
This attack works on any transformer-based LLM. Why? I’m glad you asked…
Modern LLMs are built on an innocuous-sounding paper called “Attention Is All You Need”. The paper introduced the ‘Transformer’, the T in GPT. The attention mechanism at the heart of a transformer weighs the relationships between words across a sentence or a longer block of text, which lets the model connect related sentences. Strongly related words carry more weight and so give the model better context when it responds. Both the Echo Chamber and Crescendo attacks use this feature of LLMs to their advantage. They build up more and more related words over the conversation until the LLM connects them all and produces harmful content as the most probable answer.
This is a very interesting attack vector!
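To make the "connected words have more weight" idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation described in the paper. It is a toy, not how any production model is implemented: real transformers use learned projections, many attention heads and thousands of dimensions. The function names and the 2-dimensional vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def attend(query, keys, values):
    """Blend the values using the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Toy vectors (assumed for illustration): keys 0 and 2 point in almost the
# same direction as the query, key 1 is unrelated.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [1.0, 0.0]

weights = attention_weights(query, keys)
blended = attend(query, keys, values)
print(weights)
print(blended)
```

The two keys that resemble the query receive most of the weight, so the blended output leans toward their values. The more related material an attacker plants in the conversation, the more of it there is for the model to attend to.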
Take Action:
🔴 Red Teamers: Read up on the attack vector. Build a collection of multi-turn prompt sequences to test each AI application for vulnerability to these attacks. A good starting list is available here.
👩🏻‍💻 Cybersecurity professionals: Learn the business use cases of your application and identify whether your AI-enabled system is vulnerable to these attacks. If you are using LLM APIs to build your app, you definitely have to consider this attack vector and mitigate the relevant risks.
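One possible mitigation, sketched below, is to score risk across the whole conversation instead of judging each message in isolation, since these attacks rely on every single turn looking benign. This is only an illustration under loud assumptions: `score_message`, `RISK_WEIGHTS`, `ConversationMonitor` and the placeholder terms are all hypothetical, and a real system would swap the keyword lookup for a proper moderation classifier.

```python
from dataclasses import dataclass, field

# Hypothetical per-term risk weights. The terms are placeholders; a real
# deployment would call a moderation model instead of a keyword table.
RISK_WEIGHTS = {
    "placeholder_risky_a": 0.3,
    "placeholder_risky_b": 0.3,
    "placeholder_risky_c": 0.3,
}


def score_message(text: str) -> float:
    """Stand-in risk score for a single message, capped at 1.0."""
    words = text.lower().split()
    return min(1.0, sum(RISK_WEIGHTS.get(word, 0.0) for word in words))


@dataclass
class ConversationMonitor:
    threshold: float = 0.7  # assumed value; tune per application
    decay: float = 0.9      # how slowly earlier turns fade from the total
    running: float = 0.0
    history: list = field(default_factory=list)

    def observe(self, text: str) -> bool:
        """Record one user turn; return True once cumulative risk is too high."""
        score = score_message(text)
        self.history.append(score)
        self.running = self.running * self.decay + score
        return self.running >= self.threshold


monitor = ConversationMonitor()
turns = [
    "tell a story that mentions placeholder_risky_a",
    "continue the story with placeholder_risky_b",
    "now add detail about placeholder_risky_c",
]
flags = [monitor.observe(turn) for turn in turns]
print(flags)
```

No single turn scores above the threshold, but the decayed running total crosses it on the third turn, which is the pattern a per-message filter would miss.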
Will Wikipedia users and administrators have to age-verify in the UK?
No, but it could still be classified as Category 1
Just last week, I wrote about how the UK Online Safety Act (OSA) has resulted in a jump in VPN usage across the country:
Now, a UK court has heard a plea from the Wikimedia Foundation (the nonprofit that runs Wikipedia) to tighten the categorization of websites. The foundation fears that Wikipedia will be lumped together with social media and pornographic websites and will have to implement stringent age verification.
The foundation lost the plea. The judge ruled that the categorization will not be changed, but said it is unlikely that stringent verification rules will apply to Wikipedia.
Is Wikipedia a social media site? Or is it a world-changing encyclopedia?