Using ChatGPT for security, and an introduction to AI security


Many security services use ChatGPT.
Methods of attacking AI include, for example, adding noise to inputs, poisoning training data, and reverse engineering the model.


When you hear "AI and security", two things may come to mind: first, using AI for cyber security, and second, attacks against AI. In this article, I discuss both topics.

- Using AI for security: Introducing some security applications that use ChatGPT.
- Attack against AI: What it is.


Using AI (ChatGPT) for security purposes

Since the announcement of GPT-3 in 2020 and the release of many image-generating AIs in 2022, using AI has become commonplace. ChatGPT in particular became popular immediately after its release in November 2022, because of its ability to generate sentences that read naturally to humans.

ChatGPT is also used to generate code from natural-language descriptions, and it can explain the meaning of code, memory dumps, or logs in a way that is easy for humans to understand.
Finding unusual patterns in a large amount of data is what AI is good at. Hence, there is a service that uses AI for incident response:

- Microsoft Security Copilot: a security incident response adviser.

One study used ChatGPT to detect phishing sites and reported an accuracy of 98.3%.

- Detecting Phishing Sites Using ChatGPT

Of course, ChatGPT can also be used for penetration testing.

- PentestGPT 

However, many people are unwilling to share sensitive information with Microsoft or other vendors. In that case, it is possible to run a ChatGPT-like LLM offline on your own PC with an open-source LLM application, for example gpt4all. gpt4all needs a GPU and a large amount of memory (128 GB or more) to work.

- gpt4all

ChatGPT will continue to be used for both offensive and defensive security.

Attack against AI

Before we discuss attacks against AI, let's briefly review how AI works. Research on AI has a long history. Today, however, people generally use AI in the form of machine learning models or deep learning algorithms, some of which use neural networks. In this article, we discuss the Deep Neural Network (DNN).


DNNs work as follows. A DNN consists of many nodes, and a set of nodes is called a layer. Each node belongs to a layer, and the layers are connected to each other. (Please see the picture below.)

Data from the input layer propagates through multiple (hidden) layers and finally reaches the output layer, which performs classification or regression. For example, a DNN can be trained on many pictures of animals and then used to identify (categorize) which animal appears in a new picture.
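The propagation from input layer to output layer can be sketched in a few lines of Python. This is only a minimal illustration, not a real DNN: the layer sizes, the weights, and the "cat"/"dog" labels are hypothetical values picked by hand, whereas a real network learns its weights from training data.

```python
import math

def relu(values):
    # Activation for the hidden layer: negative values become zero.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Turn raw output scores into probabilities that sum to 1.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # One fully connected layer: each node sums its weighted inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical, hand-picked parameters (assumption: 2 inputs, 3 hidden nodes, 2 outputs).
HIDDEN_W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
HIDDEN_B = [0.0, 0.0, 0.0]
OUTPUT_W = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
OUTPUT_B = [0.0, 0.0]
LABELS = ["cat", "dog"]

def forward(inputs):
    # Input layer -> hidden layer -> output layer.
    hidden = relu(dense(inputs, HIDDEN_W, HIDDEN_B))
    return softmax(dense(hidden, OUTPUT_W, OUTPUT_B))

def classify(inputs):
    # Pick the label with the highest output probability.
    probs = forward(inputs)
    return LABELS[probs.index(max(probs))]

print(classify([0.9, 0.1]))
print(classify([0.1, 0.9]))
```

With these hand-picked weights, the first input is categorized as "cat" and the second as "dog". Training a real DNN means adjusting the weights automatically until the outputs match the labelled examples.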

What kind of attacks are possible against AI?

A cyber security threat is anything that compromises a system's CIA (Confidentiality, Integrity, Availability). An attack on AI forces wrong decisions (loss of integrity), makes the AI unavailable (loss of availability), or steals the decision model (loss of confidentiality). Among these attacks, the best-known method is to add noise to the data fed into the input layer and force a wrong decision. This is called an Adversarial Example attack.

Adversarial Example attack


The Adversarial Example is illustrated in this 2014 paper:

Explaining and Harnessing Adversarial Examples

The panda picture on the left is the original data that is input to the DNN. Normally, the DNN categorizes this picture as a panda. However, if the attacker adds noise (the middle picture), the DNN misjudges the result as a gibbon.
In other words, this attack makes the AI reach a wrong decision without humans noticing.
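The paper above introduces the fast gradient sign method (FGSM): move every input value by a small step epsilon in the direction that increases the model's loss. The following is a minimal sketch on a toy logistic classifier, not on a real image model. The four weights, the input values, and the epsilon are assumptions chosen for illustration. With only four inputs, a fairly large epsilon is needed; in a real image with many thousands of pixels, far smaller per-pixel changes add up to the same effect.

```python
import math

# Hypothetical 4-"pixel" linear classifier; the weights are made up for illustration.
WEIGHTS = [2.0, -1.0, 1.5, -2.5]
BIAS = 0.0

def predict_proba(x):
    # Probability that x belongs to class 1 ("panda" in this toy setup).
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, true_label, epsilon):
    # For logistic loss, d(loss)/d(x_i) = (p - y) * w_i.
    p = predict_proba(x)
    grad = [(p - true_label) * w for w in WEIGHTS]
    # Step each input by epsilon in the direction that increases the loss.
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

original = [0.6, 0.4, 0.5, 0.3]
adversarial = fgsm(original, true_label=1, epsilon=0.15)

print(predict_proba(original) > 0.5)     # original is classified as class 1
print(predict_proba(adversarial) > 0.5)  # perturbed input is no longer class 1
```

No input value changes by more than epsilon, yet the decision flips, which is the same effect as the panda being misjudged as a gibbon.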

The example above is an attack on an image classifier. Another example is ShapeShifter, which attacks an object detector. By making stop signs undetectable, it could cause an AI-driven self-driving car to have an accident without humans noticing the manipulation.

- ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector

Usually, a stop sign is captured by an optical sensor in a self-driving car, and the car's object detector recognises it as a stop sign and follows the instruction to stop. However, this attack causes the car to fail to recognise the stop sign.

You might think that if the DNN model in a self-driving car is kept confidential, the attacker cannot get the information needed to attack that specific model. However, the paper below discusses how an adversarial example designed for one model can transfer to other models as well (transferability).

- Transferability Ranking of Adversarial Examples

That means that even if attackers are unable to examine the target DNN model, they can still experiment on other DNN models and use the results to attack it.
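Transferability can be illustrated with two toy linear models. This is a simplified sketch, not the method of the paper above: the two weight vectors are hypothetical, and the only assumption is that the attacker's surrogate model is similar to, but not the same as, the hidden target model.

```python
def score(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def label(weights, x):
    # Class 1 if the score is positive, otherwise class 0.
    return 1 if score(weights, x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

# Hypothetical models: the attacker can inspect SURROGATE but not TARGET.
SURROGATE = [2.0, -1.0, 1.5, -2.5]
TARGET = [1.8, -1.2, 1.1, -2.0]

def craft_on_surrogate(x, epsilon):
    # Push each input against the surrogate's weights
    # (for a linear score, the gradient with respect to x is the weight vector).
    return [xi - epsilon * sign(w) for xi, w in zip(x, SURROGATE)]

original = [0.6, 0.4, 0.5, 0.3]
adversarial = craft_on_surrogate(original, epsilon=0.15)

print(label(TARGET, original))     # target's decision on the clean input
print(label(TARGET, adversarial))  # target's decision on the transferred example
```

The adversarial input is computed without ever reading TARGET, but because the two models learned similar decision boundaries, the target's decision changes from class 1 to class 0 as well.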

Data poisoning attack

In an adversarial example attack, the model and its training data are not changed; instead, noise is added to the input. Attacks that poison the training data also exist. In data poisoning, the attacker gains access to the data used to train the DNN model and inserts incorrect data, either to make the model produce results that benefit the attacker or to reduce the accuracy of the learning. Planting a backdoor is also possible.

- Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
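The effect of poisoned training data can be shown with a very small classifier. Note that this sketch uses plain label flipping, which is simpler than the clean-label technique in the paper listed above. The nearest-centroid "model", the one-dimensional feature, and all sample values are hypothetical.

```python
def centroid(points):
    return sum(points) / len(points)

def train(samples):
    # Nearest-centroid "model": the mean feature value for each label.
    by_label = {}
    for value, lab in samples:
        by_label.setdefault(lab, []).append(value)
    return {lab: centroid(vals) for lab, vals in by_label.items()}

def predict(model, value):
    # Choose the label whose centroid is closest to the value.
    return min(model, key=lambda lab: abs(model[lab] - value))

# Hypothetical 1-D feature, e.g. a "suspiciousness score" for an email.
clean = [(0.1, "benign"), (0.2, "benign"), (0.3, "benign"),
         (0.7, "malicious"), (0.8, "malicious"), (0.9, "malicious")]

# The attacker injects malicious-looking samples that are mislabeled as benign.
poison = [(0.9, "benign"), (0.95, "benign"), (1.0, "benign"), (1.0, "benign")]

clean_model = train(clean)
poisoned_model = train(clean + poison)

print(predict(clean_model, 0.7))     # decision of the model trained on clean data
print(predict(poisoned_model, 0.7))  # decision after the training data was poisoned
```

The four poisoned samples pull the "benign" centroid toward the malicious region, so an input that the clean model flags as malicious is accepted as benign by the poisoned model.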

Reverse engineering attack

Some cryptographic vulnerabilities allow an attacker to learn the encryption model by analyzing input/output strings, which are easy to obtain. Similarly, a DNN model can potentially be reverse engineered or copied by analysing its input (training data) and output (decision results). These papers discuss this:

- Reverse-Engineering Deep Neural Networks Using Floating-Point Timing Side-Channels

- Copycat CNN 
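The idea of copying a model from its inputs and outputs can be sketched with a linear model. This is not the technique of either paper above (one uses timing side-channels, the other trains a copy from query results); it is the simplest case I can think of, where the model is linear and returns a raw score. The secret weights, the `black_box` function, and the helper name `extract_linear_model` are all hypothetical.

```python
# Hypothetical secret parameters, hidden inside the victim's service.
_SECRET_WEIGHTS = [0.5, -1.25, 2.0]
_SECRET_BIAS = 0.75

def black_box(x):
    # Stand-in for a prediction API that returns a raw score for input x.
    return sum(w * xi for w, xi in zip(_SECRET_WEIGHTS, x)) + _SECRET_BIAS

def extract_linear_model(query, n_features):
    # Recover weights and bias of a linear model with n_features + 1 queries.
    bias = query([0.0] * n_features)          # all-zero input reveals the bias
    weights = []
    for i in range(n_features):
        probe = [0.0] * n_features
        probe[i] = 1.0                        # unit input isolates one weight
        weights.append(query(probe) - bias)
    return weights, bias

stolen_weights, stolen_bias = extract_linear_model(black_box, 3)
print(stolen_weights, stolen_bias)
```

Four queries are enough to recover this toy model exactly. A real DNN is nonlinear and usually returns less information, so copying it takes far more queries and only yields an approximation, but the principle of learning the model from its answers is the same.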


Finally, there's one last thing I'd like to say.  This article was not generated by ChatGPT.

Published Jun 30, 2023
Version 1.0



  • This is insightful! Please expand on ways to prevent such adversarial objects from causing malfunctions. Or is it an example of a shortcoming of this DNN model, one that a real human, even a child, would easily avoid?

  • AubreyKingF5 Do ChatGPT and these new AI tools fall under the new OWASP project mentioned above? Are they not part of any OWASP Web or API Top 10 project?

  • Comet
    Ret. Employee

    When AI learns from an attacking group, the values of its parameter weights shift, causing the AI to make other choices.
    Alternatively, you can have an AI that does not learn.

    "Garbage In, Garbage Out"

    You do not get something for nothing in generative AI -- it is no more than statistical correlations of the data fed in.  The behavior you observe is not an error -- it is the mathematically logical result of extrapolating beyond the realm of applicability.  The AI does not know reality, and language models, for example, only know language, just as image models do not know actual objects.  You cross your legs and you know that does not disconnect your underneath leg into two parts.  But an AI cannot know this, if trained on static images.  Thus, you get image "hallucinations."  An attacker who controls the data from which AI draws inference can thus bias the output.  Not just by explicitly computing a difference overlay as in the panda example, but by using techniques of prompt engineering to feed in data that make the AI "hallucinate" as if drugged.

  • AIs are built on LLMs. LLMs come from the web. Also, when AIs begin to be plugged into the net to feed themselves and communicate, long term, there are APIs. There are already APIs for access. Every method of interaction with AI is via HTTP(S).

    Give the beta a read. It's seriously exciting.