// Room Overview
This TryHackMe room covers how Large Language Models (LLMs) can be manipulated through prompt injection � both directly via user input and indirectly via external data sources. It covers real-world incidents, injection techniques, and a hands-on practical scenario.
// Task 2 � How LLMs Follow Instructions
Before understanding prompt injection, you need to understand how LLMs process input. Everything the model receives � system prompt, user message, history � is concatenated into a single sequence and processed together.
Context Window
The window that contains all information an LLM uses to generate a response. System prompts, conversation history, and user input all live here.
System Prompt
Hidden high-priority instructions that define the model's role and restrictions. Set by the developer � not visible to the user, but not truly separate from user input.
ChatML
A structured conversation format using tokens like <|im_start|> and <|im_end|> to separate roles. Defines boundaries between system, user, and assistant turns.
Next-token Prediction
The process where an LLM predicts the next token based on all prior input. The model has no real understanding � it is pattern-matching at scale.
Questions & Answers
What is the name of the window that contains all the information an LLM uses to generate a response?
context window
What type of prompt contains hidden high-priority instructions that define the model's role and restrictions?
system prompt
What structured conversation format uses tokens like <|im_start|> and <|im_end|> to separate roles?
ChatML
What is the process called where an LLM predicts the next token based on all prior input?
next-token prediction
// Task 3 � What is Prompt Injection?
Prompt injection occurs when untrusted user input is concatenated with a trusted developer prompt. Because the model processes everything as a single token sequence, it cannot reliably distinguish between legitimate instructions and injected ones.
OWASP ranks prompt injection as the #1 vulnerability in the Top 10 for LLMs. It is not a niche edge case � it is the primary attack surface for any LLM-powered application.
The root cause is architectural. LLMs process instructions and data in the same channel. There is no hardware-level separation between "trusted" and "untrusted" input the way a CPU separates kernel and user space.
Questions & Answers
What class of attack occurs when untrusted user input is concatenated with a trusted developer prompt?
Prompt Injection
What does an LLM ultimately process everything in its context as?
tokens
Which organisation ranks prompt injection as the number one vulnerability in the Top 10 for LLMs?
OWASP
// Task 4 � Prompt Injection in Action
Prompt injection is not theoretical. There are well-documented real-world incidents where production systems were manipulated by injected instructions.
Real-world Incidents
Bing Chat "Sydney"
Stanford student Kevin Liu used prompt injection to reveal Bing Chat's internal system prompt and its secret codename: Sydney. Microsoft had intended these instructions to be hidden from users.
Remoteli.io Twitter Bot
A public-facing Twitter bot powered by an LLM was manipulated by users into parroting offensive instructions. The bot had no mechanism to distinguish legitimate requests from injected ones.
$1 Chevrolet Tahoe
A car dealership chatbot was manipulated via prompt injection into agreeing to sell a Chevrolet Tahoe for $1. The chatbot had no guardrails against instruction overrides from user input.
Injection Techniques
Synonymised / Paraphrased Overrides
Using synonyms or rephrasing to bypass rudimentary keyword blocklists. If a filter blocks "ignore previous instructions", an attacker might try "disregard earlier directives" instead.
Format-Based Injection
Hiding instructions inside code comments, HTML markup, or other structured text. The model processes the content regardless of formatting, so instructions embedded in a <!-- comment --> can still be followed.
Simulated Dialogue Injection
Embedding fake conversation history to forge context. By injecting a fabricated assistant response, an attacker can make the model believe it has already agreed to something or already broken a rule.
Multi-turn Prompt Shaping
Gradually conditioning the model over multiple conversation turns rather than in a single message. Each turn nudges the model closer to the target behaviour without triggering single-turn defences.
Questions & Answers
What was the secret codename revealed during the Bing Chat system prompt leak in 2023?
Sydney
What prompt injection technique hides malicious instructions inside markup or structured text?
Format-Based Injection
Did you replicate the $1 Chevrolet Tahoe attack? What's the flag?
THM{duD3_wh3r3s_my_c4R}
// Task 5 � Indirect Prompt Injection
Indirect prompt injection is more dangerous than direct injection in many scenarios. Rather than the attacker typing malicious instructions themselves, those instructions are hidden inside external data that the AI retrieves and processes � emails, documents, web pages, calendar events.
Indirect Prompt Injection
Malicious instructions hidden in external sources � emails, documents, websites � that the AI pulls in and processes as trusted content.
Zero-click Exploit
Triggers when the AI processes data without any direct user interaction with the attacker. The victim does not need to click anything � simply having the AI read a document is enough.
EchoLeak � Microsoft 365 Copilot
EchoLeak was an incident where Microsoft 365 Copilot exfiltrated files due to hidden instructions embedded in an email. The attacker sent a specially crafted email � when Copilot processed it, the hidden instructions caused it to leak sensitive files from the victim's mailbox. No further interaction from the victim was required.
Questions & Answers
What type of prompt injection hides malicious instructions inside external sources like emails or web pages?
Indirect prompt injection
What kind of exploit requires no attacker interaction beyond planting the hidden prompt?
zero-click
What Microsoft Copilot indirect prompt injection incident was dubbed as a zero-click data leak?
EchoLeak
// Task 6 � Practical: CalBot
CalBot is an internal calendar assistant. The attack demonstrates indirect prompt injection � the malicious instructions are not typed directly to the bot, but are instead hidden inside a calendar event description that CalBot reads and processes.
Scenario: An attacker creates a calendar event with hidden prompt injection in the description. When CalBot processes the event, it follows the injected instructions rather than its original system prompt.
Attack Flow
A calendar event is created containing hidden prompt injection instructions inside the event description.
The AI assistant fetches the calendar event as part of normal operation, treating the content as trusted data.
CalBot processes the hidden instructions alongside the event content and follows them, leaking the CEO's email address.
Questions & Answers
Can you get the chatbot to give you the CEO's email? What is it?
adam.driver@llmborghini.com
// Summary
LLMs blur the boundary between instructions and data � everything is processed as one token stream.
Direct injection comes from the user prompt. Indirect injection comes from data the AI ingests from external sources.
Mitigation requires strong architectural boundaries and treating all external data as untrusted � regardless of its source.
As AI assistants gain more access to files, emails, and APIs, indirect injection becomes an increasingly critical attack surface.