Security of AI systems – prompt injections

Prompt injection is the name given to a specific type of cyber attack. It involves manipulating generative AI systems built on large language models (LLMs) in such a way that they break out of their predefined roles and execute an attacker's instructions.

For example, documents containing hidden commands that are not recognisable at first glance are given to the AI via a prompt. These commands may be encoded as invisible Unicode characters that the machine still interprets as text. It is also possible to colour characters white, making them invisible to the human reader but not to the model.
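How such hidden characters are handled depends on the individual pipeline, but as a rough illustration, the following Python sketch scans incoming text for invisible Unicode characters (zero-width characters, bidirectional controls and the Unicode tag block) and strips them before the text is handed to a model. The character ranges and function names are assumptions made for this example, not a reference implementation.

```python
import unicodedata

# Characters often used to hide instructions from human readers:
# zero-width characters, bidirectional controls and the Unicode "tag"
# block (U+E0000 to U+E007F), which many renderers do not display but
# which a language model may still read as text.
SUSPICIOUS_RANGES = [
    (0x200B, 0x200F),    # zero-width space/joiner, directional marks
    (0x202A, 0x202E),    # bidirectional embedding/override controls
    (0x2060, 0x2064),    # word joiner and invisible operators
    (0xE0000, 0xE007F),  # Unicode tag characters
]

def find_hidden_characters(text: str) -> list[tuple[int, str]]:
    """Return the positions and names of invisible characters in the text."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES):
            hits.append((i, unicodedata.name(ch, f"U+{cp:04X}")))
    return hits

def sanitise(text: str) -> str:
    """Strip the suspicious characters before the text reaches the model."""
    return "".join(
        ch for ch in text
        if not any(lo <= ord(ch) <= hi for lo, hi in SUSPICIOUS_RANGES)
    )

if __name__ == "__main__":
    document = "Please summarise this report.\u200b\U000E0001do something else"
    print(find_hidden_characters(document))
    print(repr(sanitise(document)))
```

A check like this only catches the invisible-character variant; text coloured white survives plain-text extraction unchanged and has to be addressed when the document is converted, for example by ignoring formatting entirely.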

Such commands can give rise to the following threats:

  • An AI chatbot is instructed to disclose sensitive company data. This may also constitute a data breach under the GDPR.
  • The AI is instructed to develop malware that harms the company.
  • The AI is deliberately trained with false information in order to spread it.
  • The AI can be used to spy on the user.

This can happen in particular with AI agents that have access to company data.
The industry is working on solutions to these problems, but as autonomous AI agents are expected to take on increasingly complex tasks as independently as possible, solutions are difficult to find. The more external applications an agent can control, the more dangerous the consequences of a prompt injection become. In a vulnerable chatbot, an attacker can not only read chats but also call up connected tools, such as an internal company database containing employee information.
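One way to limit the blast radius of such an attack is to decide outside the model which tools a given task may call at all. The following Python sketch assumes a simple, hypothetical in-house tool registry; the tool names and the dispatch function are illustrative and not the API of any particular agent framework.

```python
from typing import Callable

# Hypothetical "docked" tools an agent might be allowed to call.
def search_public_docs(query: str) -> str:
    return f"results for {query!r}"

def query_employee_db(name: str) -> str:
    return f"personnel record for {name}"  # sensitive internal data

TOOLS: dict[str, Callable[[str], str]] = {
    "search_public_docs": search_public_docs,
    "query_employee_db": query_employee_db,
}

def dispatch(tool_name: str, argument: str, allowed: set[str]) -> str:
    """Execute a model-requested tool call only if it is on the allow-list
    for the current task; everything else is refused at the application
    layer, regardless of what the prompt asked for."""
    if tool_name not in allowed:
        return f"refused: '{tool_name}' is not permitted for this task"
    return TOOLS[tool_name](argument)

# A summarisation task never needs HR data, so the allow-list excludes it.
print(dispatch("search_public_docs", "quarterly report", allowed={"search_public_docs"}))
print(dispatch("query_employee_db", "Jane Doe", allowed={"search_public_docs"}))
```

Even if an injected prompt persuades the model to request the employee database, the call is refused because it is not on the allow-list for that task.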

Even today, AI agents in companies act more autonomously than many colleagues would like. Some AI-based meeting assistants can automatically join meetings and record conversations without the participants’ knowledge. Companies need to think carefully about how they use and monitor such AI systems. To do this, users must be given more transparency and decision-making power. For example, the AI could state which action it intends to take and ask whether it should really be carried out. However, it makes little sense to ask for permission in advance for every single action, as the autonomous AI agents would then no longer be able to do the work independently.
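In code, such a compromise could take the form of a risk-based confirmation gate: routine, read-only actions run automatically, while actions that send data outside the company or change something are announced and must be approved first. The following Python sketch is purely illustrative; the action names and risk levels are assumptions rather than part of any specific agent product.

```python
# Risk levels the application assigns to agent actions; only high-risk
# actions interrupt the user. The action names are illustrative assumptions.
LOW, HIGH = "low", "high"

ACTION_RISK = {
    "read_calendar": LOW,
    "draft_reply": LOW,
    "send_email": HIGH,
    "export_customer_data": HIGH,
}

def confirm(action: str, details: str) -> bool:
    """State the planned action and let the user approve or reject it."""
    answer = input(f"The assistant wants to {action} ({details}). Proceed? [y/N] ")
    return answer.strip().lower() == "y"

def execute(action: str, details: str) -> str:
    # Unknown actions are treated as high risk by default.
    risk = ACTION_RISK.get(action, HIGH)
    if risk == HIGH and not confirm(action, details):
        return f"{action} cancelled by the user"
    return f"{action} executed"

if __name__ == "__main__":
    print(execute("read_calendar", "today's meetings"))          # runs without asking
    print(execute("send_email", "figures to external address"))  # asks first
```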

The manipulation of AI systems is already more commonplace than many AI users suspect.

It is not always a matter of accessing personal or other sensitive data. Sometimes the goal is simply to manipulate the systems into generating and passing on certain information.

Prompt injections are only one aspect of AI system security.

Phishing attacks supported by AI are also becoming more common. At the same time, however, AI also helps to better protect systems.
