Questions on AI Threats in CyberSecurity

As occasionally happens, I was asked to attend a conference that involved a lot of listening and not much speaking, which is generally what happens when your speaker list consists of more than thirty individuals on a Zoom call.

This Medium post is where you can read more about all of the types of Zoom participants we all know so well…

Despite not being able to speak at said conference, I did think the questions posed were worthwhile. As I settle into my new role as Chief Data Ethicist at Anno.Ai, I realize now more than ever that data ethics is very closely tied to AI security.

I will present my musings on the questions below, and hopefully this article compensates for my lack of caffeine-filled participation in the aforementioned conference:

How are threat actors currently using AI in attacks?


The answer is a mixture of totally expected, deflating and not exciting at all: hardly any threat actors are using AI in cybersecurity attacks. Well, it depends. AI is used in a variety of ways to enhance attacks that are already very effective and that (unfortunately for those looking for some type of “the doomsday is coming” effect) are old-school, traditional techniques with a sprinkle of machine learning here and there.

What are some examples? The TaskRabbit and WordPress attacks of 2016 and 2019 used AI to help proliferate botnets used to perform, well, a bunch of the usual: DDoS attacks and phishing. We might see some GAN/CGAN creativity when it comes to propaganda production, phishing email generation, and even some social engineering attacks too. Contrary to our usual counter-terrorism mindset, the biggest threat here is not necessarily the Syrian Cyber Army, but in fact more mature state actors along the lines of Russia and China. No, ISIL’s self-driving IEDs do not count.

The most exciting use of AI that I can see in the near future is in malware creation and propagation, specifically SELF-propagation. I also think the area of cryptography and encryption will get more exciting as compute resources become more available. And what about some autonomous IDS/IDPS action, letting systems respond to attacks by using AI to select the most appropriate mitigation techniques?

It’s not as simple as reading signatures or performing post-mortems. Also… the other guy is probably not the biggest threat to the security of your AI or cybersecurity systems: your own employees and your own ML/AI code are probably the biggest security liability!

What are the attack capabilities being enhanced?

Hmmm… hard to tell, for all of the reasons given under the first question.

Capabilities to me = supply chain. What parts of the supply chain are being enhanced to make attack capabilities more robust?


So I immediately think of all of the unsexy things that are likely to be sought after by adversaries seeking to launch attacks: compute resources, quantum/near-quantum and high-performance computing resources. All of these will be used in the ways outlined in the last section, to make social engineering techniques higher in volume, more scalable and more convincing.

What might make for appetizing targets? Cloud resources and cloud networking are good candidates here, both for launching attacks and as the subject of attacks. We still somehow suck at securing the cloud computing resources that make up critical infrastructure. Cloud networking might also present a vulnerability, especially since resources living in the cloud leave very easily distinguishable signatures behind. Cloud resources also help anonymize or abstract the attacker from any type of known entity online. Just… pay for a few GPUs and off you go!

Will existing risk controls be sufficient?

Well, no. They barely keep us safe right now against traditional attacks.

I heard a lot of good pontification on this point… but to really get anywhere meaningful here, practitioners need to talk to one another. The same vulnerabilities that show up in any regular coded system are present in AI systems. A few examples: basic code security practices matter just as much when putting together AI/ML systems. The risks that come with copying and pasting code from GitHub or using open source libraries are the same in AI/ML systems as they are in any other coded system. This is compounded when your ML engineer or data scientist shrugs their shoulders when asked to explain how and why the code works the way it does.
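One plain-old-code habit that carries straight over to ML is pinning and checking what you pull in from outside, whether that is a library or a pre-trained model file. Here is a minimal sketch, assuming you recorded a SHA-256 digest when the artifact was reviewed. The file name `model.bin` and the helper names are made up for illustration.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """True only if the file still matches the digest recorded at review time."""
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())


# Demo with a throwaway file standing in for a downloaded model or package.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.bin"       # hypothetical artifact name
    artifact.write_bytes(b"weights-v1")
    pinned = sha256_of(artifact)             # digest recorded when reviewed

    ok_before = verify_artifact(artifact, pinned)
    artifact.write_bytes(b"weights-v1-tampered")
    ok_after = verify_artifact(artifact, pinned)

print(ok_before, ok_after)  # True False
```

A matching digest only tells you the file is the one you reviewed. It does not tell you the code or model was safe in the first place, so someone still has to be able to explain what it does.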

What are cybersecurity requirements to protect current AI/ML assets?

Let’s talk about the time a man consulted Gmail’s autocomplete to determine decisions he should make rather than making the decisions himself…!

Echoing the spirit of the section before this one… think about what techniques and approaches you would use to secure plain old code, and add that in here.

Think about ML model attack surfaces. Let’s start with inference or serving, the section most exposed to the outside world for attack. Let’s make sure we are securing our gRPC servers, our containers, and our API endpoints. Endpoint security is key. Then we can move back a bit to how the model is actually architected to perform serving. Think model feature robustness, feature resiliency. Can your model be manipulated by the end user to produce bad results? Would that then poison the re-training loop you have set on autopilot for your model pipelines? What if this manipulation happens not on purpose, but by an end user simply trying to figure out how to use your system (e.g. me trying to pass a reCAPTCHA and somehow always getting pegged as a robot)? What about adversarial-like examples that could produce radically wrong results, whether (again) done on purpose or by accident? Think… man with 99 phones walking down a road running Google Maps. Think the Waymo stop sign adversarial technique that is not even visible to the human eye.

Gotcha, Google!
Speed up or slow down?
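To make the adversarial-example idea concrete, here is a toy sketch of a sign-of-gradient perturbation (the idea behind the fast gradient sign method) against a hand-written logistic regression. The weights, bias and input are invented for illustration and do not come from any real system; real attacks target far larger models, but the mechanics are the same.

```python
import math


def predict_proba(weights, bias, x):
    """Logistic-regression probability of the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, x, epsilon, true_label):
    """Shift each feature by at most epsilon in the direction that raises the loss.

    For logistic regression the gradient of the log-loss with respect to x
    is (p - y) * w, so its sign is sign(w) when y == 0 and -sign(w) when y == 1.
    """
    direction = -1 if true_label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical toy model and input.
weights = [2.0, -1.5, 0.5]
bias = -0.2
x = [0.6, 0.3, 0.4]            # classified as positive before the attack

x_adv = fgsm_perturb(weights, x, epsilon=0.2, true_label=1)
p_clean = predict_proba(weights, bias, x)
p_adv = predict_proba(weights, bias, x_adv)
print(round(p_clean, 3), round(p_adv, 3))
```

With these toy numbers, nudging every feature by no more than 0.2 moves the prediction from above 0.5 to just below it. If outputs like that get fed back into an automated re-training loop without review, the manipulation does not stay in serving.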

One has to look at the ML pipeline with an end-to-end security mindset. Ask your practitioner leads questions around the following:

  • Is your training data secure? As in, is the S3 bucket secured? Or can data be manipulated? Made available? Exposed?
  • Was the dataset curated properly? Unbiased? Properly divided into train, test and prod?
  • Data bias? Is your team aware of any? Any weaknesses in the data from a features perspective?
  • Finally, has the team done due diligence to ensure every part of the ML pipeline system is properly and easily visualized? Is model monitoring optimized to make sure every aspect of the system is quickly and easily understood?
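Some of the questions above can be turned into cheap automated checks. The sketch below, with made-up rows and labels, looks for records that leaked between the training and test splits and reports how skewed the training labels are. It is a starting point under those assumptions, not a full data audit.

```python
from collections import Counter


def split_overlap(train_rows, test_rows):
    """Return rows that appear in both splits (a sign of leakage)."""
    return set(train_rows) & set(test_rows)


def label_shares(labels):
    """Fraction of examples per label, to spot a badly skewed dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical rows: (feature tuple, label). Real pipelines would likely
# compare hashes of records instead of the raw rows.
train = [((1, 0), "ok"), ((2, 1), "ok"), ((3, 1), "fraud"), ((4, 0), "ok")]
test = [((3, 1), "fraud"), ((5, 0), "ok")]

leaked = split_overlap(train, test)
shares = label_shares([label for _, label in train])
print(leaked)
print(shares)
```

Here one record shows up in both splits, which would inflate test scores, and three quarters of the training labels are "ok". Neither finding is automatically a problem, but both are things your team should be able to explain.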

What exists now and what capacity gaps do we need to address?

So… what exists now is basically all over the place. Each org is in a radically different place when it comes to ethical ML/AI and ethical practices — which mostly consist of absolutely nothing.

We are our own biggest threat here when it comes to gaps… and the majority of the threat comes from culture. A culture of not practicing security and ethics in our technology.

And now that I have rebuked everyone who is responsible for building, maintaining, selling and creating AI products, I will also acknowledge that there isn’t much in the way of technical literature or solutions in this space. Securing AI systems is probably somewhere around 5 years out in terms of scalable solutions that are profitable on the market. The reasons behind this assessment mostly point to the still very nascent nature of even the most basic applied ML use cases. ML/AI-based technologies and solutions are just still very new. And in proper back-assward-ness, we only start considering securing code and product… well, after a SolarWinds-like incident of course. We haven’t seen this type of large-scale, high-impact attack on AI systems yet.

So, aside from the lack of product/applied solutions to secure AI/ML pipelines and make them more ethical, we have the usual gaps that we see in organizations and enterprises that base their existence on code: rushing ML pipelines to production because the hype to do so is so high. The usual coding practices fill most gaps here, but a slew of new frameworks and habits must be formed at the applied/engineer level to get these ML pipelines truly ready for prod: pen-testing ML/AI pipelines, examining data bias in training datasets, visualising and testing feature robustness, etc…
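As one example of what "examining data bias in training datasets" can look like as an engineering habit, here is a small sketch that compares the share of positive labels across groups and reports the ratio between the lowest and highest rate. The groups "a" and "b" and the records are invented. A ratio below roughly 0.8 is a commonly cited rule of thumb for taking a closer look, not a verdict.

```python
from collections import defaultdict


def positive_rates(records):
    """Share of positive labels per group; records are (group, label) pairs."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in records:
        totals[group] += 1
        positives[group] += int(label == 1)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact(rates):
    """Ratio of the lowest to the highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical training labels for two made-up groups, "a" and "b".
records = [("a", 1), ("a", 1), ("a", 0), ("a", 1),
           ("b", 1), ("b", 0), ("b", 0), ("b", 0)]

rates = positive_rates(records)
ratio = disparate_impact(rates)
print(rates, round(ratio, 2))
```

The open source toolkits listed later in this post go much further than this. The point is only that a first pass at a bias check fits in a unit test and can run in the same pipeline as everything else.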

What of these are research challenges the science and technology base can attack?

I see two distinct roles to be played in this space by the usual culprits: academia/research and applied practitioners.

As has been mentioned in a litany of articles on the subject, the technology realm, and AI/ML specifically, is in desperate need of frameworks to articulate what areas of ethics and security are needed to make our AI/ML systems whole. There has been a little bit of conversation around these topics, resulting in a few frameworks. However, a few more rigorous rinse-and-repeats here, and socializing them in conferences not unlike the one I just attended, are important. We MUST keep the conversation going with haste and urgency, and stop pontificating about these aspects as distant threats to our AI/ML ecosystems. We could also use more research focused on deep diving into how adversarial states (China, Russia… China and Russia) might be using AI/ML techniques in attacks.

Our AI/ML engineers and practitioners have a role to play as well. Many solutions have sprung up in the open source space (see Deon, The Digital Impact Toolkit, Digital Civil Society Lab, The Ethics and Algorithms Toolkit, Aequitas, AI Fairness 360 Toolkit, the What-If Tool, LIME, Fairness Flow, Fairness Tool, Ethical OS, the UK Ethics Framework, The Ethics Canvas), along with a handful of vendors focused on delivering solutions ready to work in the space (Fiddler.ai, Immuta, Calypso.ai’s Vespr, Weights and Biases, Advai).
