Prompt Injection | Ben Thomson

// Room Overview

This TryHackMe room covers how Large Language Models (LLMs) can be manipulated through prompt injection � both directly via user input and indirectly via external data sources. It covers real-world incidents, injection techniques, and a hands-on practical scenario.

Platform TryHackMe

Difficulty Easy

Category AI Security / LLM Attacks

OWASP Rank #1 for LLMs

// Task 2 � How LLMs Follow Instructions

Before understanding prompt injection, you need to understand how LLMs process input. Everything the model receives � system prompt, user message, history � is concatenated into a single sequence and processed together.

Context Window

The window that contains all information an LLM uses to generate a response. System prompts, conversation history, and user input all live here.

System Prompt

Hidden high-priority instructions that define the model's role and restrictions. Set by the developer � not visible to the user, but not truly separate from user input.

ChatML

A structured conversation format using tokens like <|im_start|> and <|im_end|> to separate roles. Defines boundaries between system, user, and assistant turns.

Next-token Prediction

The process where an LLM predicts the next token based on all prior input. The model has no real understanding � it is pattern-matching at scale.

Questions & Answers

What is the name of the window that contains all the information an LLM uses to generate a response?

context window

What type of prompt contains hidden high-priority instructions that define the model's role and restrictions?

system prompt

What structured conversation format uses tokens like <|im_start|> and <|im_end|> to separate roles?

ChatML

What is the process called where an LLM predicts the next token based on all prior input?

next-token prediction

// Task 3 � What is Prompt Injection?

Prompt injection occurs when untrusted user input is concatenated with a trusted developer prompt. Because the model processes everything as a single token sequence, it cannot reliably distinguish between legitimate instructions and injected ones.

OWASP ranks prompt injection as the #1 vulnerability in the Top 10 for LLMs. It is not a niche edge case � it is the primary attack surface for any LLM-powered application.

The root cause is architectural. LLMs process instructions and data in the same channel. There is no hardware-level separation between "trusted" and "untrusted" input the way a CPU separates kernel and user space.

Questions & Answers

What class of attack occurs when untrusted user input is concatenated with a trusted developer prompt?

Prompt Injection

What does an LLM ultimately process everything in its context as?

tokens

Which organisation ranks prompt injection as the number one vulnerability in the Top 10 for LLMs?

OWASP

// Task 4 � Prompt Injection in Action

Prompt injection is not theoretical. There are well-documented real-world incidents where production systems were manipulated by injected instructions.

Real-world Incidents

2023

Bing Chat "Sydney"

Stanford student Kevin Liu used prompt injection to reveal Bing Chat's internal system prompt and its secret codename: Sydney. Microsoft had intended these instructions to be hidden from users.

2022

Remoteli.io Twitter Bot

A public-facing Twitter bot powered by an LLM was manipulated by users into parroting offensive instructions. The bot had no mechanism to distinguish legitimate requests from injected ones.

2023

$1 Chevrolet Tahoe

A car dealership chatbot was manipulated via prompt injection into agreeing to sell a Chevrolet Tahoe for $1. The chatbot had no guardrails against instruction overrides from user input.

Injection Techniques

Synonymised / Paraphrased Overrides

Using synonyms or rephrasing to bypass rudimentary keyword blocklists. If a filter blocks "ignore previous instructions", an attacker might try "disregard earlier directives" instead.

Format-Based Injection

Hiding instructions inside code comments, HTML markup, or other structured text. The model processes the content regardless of formatting, so instructions embedded in a  can still be followed.

Simulated Dialogue Injection

Embedding fake conversation history to forge context. By injecting a fabricated assistant response, an attacker can make the model believe it has already agreed to something or already broken a rule.

Multi-turn Prompt Shaping

Gradually conditioning the model over multiple conversation turns rather than in a single message. Each turn nudges the model closer to the target behaviour without triggering single-turn defences.

Questions & Answers

What was the secret codename revealed during the Bing Chat system prompt leak in 2023?

Sydney

What prompt injection technique hides malicious instructions inside markup or structured text?

Format-Based Injection

Did you replicate the $1 Chevrolet Tahoe attack? What's the flag?

THM{duD3_wh3r3s_my_c4R}

// Task 5 � Indirect Prompt Injection

Indirect prompt injection is more dangerous than direct injection in many scenarios. Rather than the attacker typing malicious instructions themselves, those instructions are hidden inside external data that the AI retrieves and processes � emails, documents, web pages, calendar events.

Indirect Prompt Injection

Malicious instructions hidden in external sources � emails, documents, websites � that the AI pulls in and processes as trusted content.

Zero-click Exploit

Triggers when the AI processes data without any direct user interaction with the attacker. The victim does not need to click anything � simply having the AI read a document is enough.

EchoLeak � Microsoft 365 Copilot

EchoLeak was an incident where Microsoft 365 Copilot exfiltrated files due to hidden instructions embedded in an email. The attacker sent a specially crafted email � when Copilot processed it, the hidden instructions caused it to leak sensitive files from the victim's mailbox. No further interaction from the victim was required.

Questions & Answers

What type of prompt injection hides malicious instructions inside external sources like emails or web pages?

Indirect prompt injection

What kind of exploit requires no attacker interaction beyond planting the hidden prompt?

zero-click

What Microsoft Copilot indirect prompt injection incident was dubbed as a zero-click data leak?

EchoLeak

// Task 6 � Practical: CalBot

CalBot is an internal calendar assistant. The attack demonstrates indirect prompt injection � the malicious instructions are not typed directly to the bot, but are instead hidden inside a calendar event description that CalBot reads and processes.

Scenario: An attacker creates a calendar event with hidden prompt injection in the description. When CalBot processes the event, it follows the injected instructions rather than its original system prompt.

Attack Flow

01

Attacker plants the payload

A calendar event is created containing hidden prompt injection instructions inside the event description.

02

CalBot retrieves the event

The AI assistant fetches the calendar event as part of normal operation, treating the content as trusted data.

03

Injected instructions execute

CalBot processes the hidden instructions alongside the event content and follows them, leaking the CEO's email address.

Questions & Answers

Can you get the chatbot to give you the CEO's email? What is it?

adam.driver@llmborghini.com

// Summary

LLMs blur the boundary between instructions and data � everything is processed as one token stream.

Direct injection comes from the user prompt. Indirect injection comes from data the AI ingests from external sources.

Mitigation requires strong architectural boundaries and treating all external data as untrusted � regardless of its source.

As AI assistants gain more access to files, emails, and APIs, indirect injection becomes an increasingly critical attack surface.