XZ Utils - the details | AI data privacy
CyberInsights #135 - A masterful software supply chain attack | How to tell AI to stop using your data for training
I am back after a three-week hiatus. I was hoping to maintain radio silence, but, well, you know how it is with infosec professionals. After a short write-up about XZ Utils, this week is again a full edition of CyberInsights. A lot has happened in these three weeks.
We (almost) had a backdoor in every Ubuntu and Red Hat server
Software Supply Chain attacks have the classic signature of nation state actors. If log4j did not jolt you, this should.
You’ve already read enough about XZ Utils and how to patch it, I presume. This post goes into a little more detail. Treat it as a “what to read to understand the XZ Utils backdoor” guide:
As usual, start by reading what Bruce Schneier has to say [LINK]. A simple but powerful post on the story behind the XZ Utils saga. If it reads like a spy thriller, it’s because it is one. After reading the story of how the attackers managed to change the code for XZ Utils, read this post [LINK].
And finally, to understand what it means to the open source community, listen to this:
Attacks like XZ are not a one-off. This article says that many open source maintainers are being targeted in a similar fashion [LINK].
Take Action:
Here, I am assuming that you have done your patching and have managed to avert the bad version of XZ Utils (a quick check is sketched below). The actions here are more structural.
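If you want a quick way to double-check, here is a minimal sketch in Python. It assumes the `xz` binary is on the PATH and only checks the known backdoored releases (5.6.0 and 5.6.1); adapt it to query your package manager or liblzma if that fits your environment better.

```python
# Minimal sketch: flag a host that is still running a backdoored xz release (5.6.0 / 5.6.1).
# Assumes the `xz` binary is on the PATH; adapt for package-manager or liblzma checks as needed.
import re
import subprocess
from typing import Optional

BACKDOORED = {"5.6.0", "5.6.1"}

def installed_xz_version() -> Optional[str]:
    try:
        out = subprocess.run(
            ["xz", "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None  # xz not installed or not callable
    match = re.search(r"xz \(XZ Utils\) (\d+\.\d+\.\d+)", out)
    return match.group(1) if match else None

if __name__ == "__main__":
    version = installed_xz_version()
    if version is None:
        print("xz not found - nothing to check on this host")
    elif version in BACKDOORED:
        print(f"WARNING: xz {version} is a known backdoored release - patch or downgrade immediately")
    else:
        print(f"xz {version} is not one of the known backdoored releases")
```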
“Software Bill of Materials” (SBOM). I have written about this six times in CyberInsights. One of them is a LongReads.
Having an SBOM is a no-brainer. In fact, I have a linked sheet that lets you get started with a (rather impractical) manual SBOM; a minimal scripted version of the same idea is sketched below.
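If the spreadsheet feels too manual, here is a minimal Python sketch of the same idea: it inventories the packages installed in the current Python environment and writes them to a CSV. This is only a starting point of my own devising, not a substitute for a proper SBOM in a standard format such as CycloneDX or SPDX, which should also cover OS packages, containers and transitive dependencies.

```python
# Minimal sketch of a "manual" SBOM: list the Python packages installed in the current
# environment and write them to CSV. A real SBOM would use CycloneDX/SPDX and cover
# OS packages, containers and transitive dependencies as well.
import csv
from importlib import metadata

def write_package_inventory(path: str = "sbom-python.csv") -> int:
    rows = []
    for dist in metadata.distributions():
        rows.append({
            "name": dist.metadata.get("Name", "unknown"),
            "version": dist.version,
            "license": dist.metadata.get("License", "unknown"),
        })
    rows.sort(key=lambda r: r["name"].lower())
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["name", "version", "license"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

if __name__ == "__main__":
    count = write_package_inventory()
    print(f"Wrote {count} components to sbom-python.csv")
```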
Open Source Security Foundation (OpenSSF) - an organisation that focuses on the security of open source software. There are multiple ways to contribute. See how you can be a part of it [LINK].
Gen AI uses your data for learning. How do you prevent it?
Managing the risks of AI requires users to take action.
Gen AI requires lots of data to keep learning, and it is mostly your data: your searches, your prompts, your chats, tweets, comments, and so on. If you have shared something on the Internet, chances are it has been fed into an LLM as training data.
Under privacy laws, you can ask to have your data removed. The question is, for an LLM that has already learnt from your data, what does ‘remove’ mean? Does it mean not using the data for training any more? Or removing what the LLM has learnt from it? Isn’t that like trying to wipe someone’s memory? Is it even possible?
Despite these open questions, there is an option to opt out. Read this article in Wired to learn how to opt out of your data being used for training by various GenAI leeches [LINK].
Take Action:
AI risks are different: you have to consider everything from the quality of training data to the fairness of results. Data privacy of AI data subjects (the individuals whose data is used for training and testing AI models) is one of them. ISO 42001 makes an attempt to deal with it. Read the standard for a clearer understanding of AI management.
The article mentioned above has specific action items on how to opt out of your data being used for training. Follow the steps if you want to opt out, and circulate them within your organisation so that others can opt out too, if they want to.