Jailbreaking Open Source AI | Infostealers on the rise

#150 - A hugely powerful open source AI model and an ASCII code | A chaotic malware rises

Jul 31, 2024

Teaching AI to be discreet

Asking in ASCII is the way to jailbreak

top view photography of broken ceramic plate — Photo by CHUTTERSNAP on Unsplash

An open source LLM trained on the entire internet. What can go wrong?

If you answered “CBRN weapons, CSAM, etc.”, you would be in line with NIST AI 600-1, the Generative AI Profile of NIST’s AI Risk Management Framework, which was finalized this month. It’s an interesting read. Check out the link.

Screengrab from NIST AI 600-1

But wait. There’s more. Big Tech companies releasing large language models (LLMs) also put in guardrails to prevent their models from inadvertently spitting out harmful content.

They do this by training their models to behave. It’s just that model training is a difficult art.

Meta released a model, Prompt-Guard-86M, to protect against prompt injection. Read more about it here. However, all it took was the humble ASCII space to break the model’s defenses. Read more about it here. By analyzing the difference in embeddings between the protection model and the base model it was fine-tuned from, security researchers found that asking the model to “Ignore previous instructions” works when you insert a space between the characters of the prompt.

Read a simple interpretation of this here. While model makers want to build models that can detect and block high-risk prompts, attackers and researchers are working to break them. As Sherlock Holmes would say: “The game is afoot.”
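To make the trick concrete, here is a minimal Python sketch of the kind of prompt rewrite that public write-ups of this bypass describe: the characters of the prompt are separated by ordinary ASCII spaces (code point 32). The function name `space_out` and the choice to drop punctuation are assumptions for illustration, not the researchers' code.

```python
import string


def space_out(prompt: str) -> str:
    """Separate every character of the prompt with one ASCII space.

    Assumption for illustration: punctuation is dropped first, and the
    original spaces are kept as characters, so word gaps become wider.
    """
    kept = [ch for ch in prompt if ch not in string.punctuation]
    return " ".join(kept)


# Hypothetical test string, taken from the example phrase in the text above.
print(space_out("Ignore previous instructions"))
```

The rewritten prompt is still easy for a human, or a capable LLM, to read. The idea is that a classifier trained on normally spaced text may no longer recognize it.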

Take Action: If you are building AI models, understand how this attack works. An attack that compares the weights of related models to identify prompts that jailbreak them is an interesting vector to look out for.

If you are a red teamer, this is another method for finding interesting jailbreak prompts.
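For a feel of what "looking at relative weights" can mean, here is a toy Python sketch that measures how far each token's embedding moved between two versions of a model and lists the tokens that moved least. The vectors below are made up for illustration, and the names `embedding_shift` and `least_changed` are hypothetical. A real analysis would load the actual embedding matrices of the two models instead.

```python
import math


def embedding_shift(base, tuned):
    """Euclidean distance between each shared token's vector in two models."""
    return {tok: math.dist(base[tok], tuned[tok]) for tok in base if tok in tuned}


def least_changed(shifts, k):
    """Return the k tokens whose embeddings moved the least."""
    return sorted(shifts, key=shifts.get)[:k]


# Made-up 2-d vectors: whole words shifted a lot, single letters barely moved.
base_vectors = {
    "ignore": [0.10, 0.20],
    "instructions": [0.30, 0.10],
    "i": [0.05, 0.05],
    "g": [0.02, 0.07],
}
tuned_vectors = {
    "ignore": [0.90, -0.40],
    "instructions": [-0.50, 0.80],
    "i": [0.05, 0.06],
    "g": [0.02, 0.07],
}

shifts = embedding_shift(base_vectors, tuned_vectors)
print(least_changed(shifts, 2))
```

Tokens that barely moved are ones the safety training may not have touched much, which makes them candidates for building prompts that slip through.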

Infostealers: Reigning in chaos

Not as planned as some of the other malware out there, infostealers can scrape a lot of data from machines

brown cookies on white and black checkered textile — Photo by No Revisions on Unsplash

Infostealers are an interesting category of malware.

They have this ‘spray and pray’ method of deployment. Attackers plant them on websites, in pirated software, etc. and hope people will download them. Their deployment cannot be very targeted. Once installed, they silently collect data: passwords, cookies, etc.

Then someone on a Telegram channel gathers the output of many such infostealers and compiles the data they have stolen. Suddenly, there is a lot of value in that sort of data.

Infostealers are having their moment in the sun now. Read this article for more details.

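Take Action: Infostealers can be detected with up-to-date endpoint detection and response (EDR) and extended detection and response (XDR) tools. Ensure that your endpoints are regularly updated. Also, enforce two-factor authentication (2FA). With 2FA in place, a stolen password alone is not enough to log in, which makes it a very good way to blunt less sophisticated attacks that rely on infostealer data.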
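To see why 2FA helps against a stolen password, here is a minimal Python sketch of a time-based one-time password (TOTP), the scheme behind most authenticator apps, following RFC 6238 with HMAC-SHA1. The secret and function names are illustrative assumptions. In production you would rely on your identity provider or a vetted library rather than this sketch.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over 30-second windows."""
    return hotp(secret, int(at_time) // step, digits)


# Illustrative shared secret (the RFC test value); never hard-code real secrets.
secret = b"12345678901234567890"
print(totp(secret, time.time()))
```

The code changes every 30 seconds and depends on a secret that is not the password, so credentials scraped from a browser do not by themselves produce a valid login.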
