// What is Splunk?
Splunk is a data platform widely used as a Security Information and Event Management (SIEM) solution. It ingests, indexes, and correlates log data from across an environment: endpoints, servers, firewalls, identity providers, cloud services, and more. SOC analysts use it to search for threats, build detection rules, and investigate alerts.
Data is queried using SPL (Search Processing Language), a pipeline-based query language similar in concept to Unix pipes. Understanding SPL is a core skill for any blue team analyst working in a Splunk environment.
Splunk organises data into indexes, which are sets of events stored on disk. Incoming data is broken into events (typically one log entry each), and every event has a timestamp and a set of fields parsed from the raw text. The default index is main; many environments use custom indexes per data source (e.g. windows, firewall, aws).
// SPL Basics
Most SPL searches start with a search command (implicit or explicit) and pass results through a pipeline using the pipe character |. Each command in the pipeline transforms or filters the result set. Generating commands such as tstats are the exception: they begin with a leading pipe.
index=main sourcetype=WinEventLog EventCode=4625
| stats count by src_ip, user
| sort -count
| head 20
This search finds failed logins (EventCode 4625) → counts them grouped by source IP and user → sorts descending → shows the top 20.
Search terms
| Syntax | Meaning | Example |
|---|---|---|
| keyword | Any event containing this word | malware |
| field=value | Field equals exact value | user=admin |
| field=*value* | Wildcard match | process=*powershell* |
| NOT field=value | Exclude matches | NOT src_ip=10.0.0.1 |
| field=val1 OR field=val2 | OR logic | EventCode=4624 OR EventCode=4625 |
| field IN (a, b, c) | Matches any in list | EventCode IN (4624, 4625, 4648) |
| "exact phrase" | Exact phrase match | "lateral movement" |
// Search Commands
These are the commands you'll use in almost every search. They filter, transform, and reshape event data flowing through the pipeline.
| Command | What it does | Example |
|---|---|---|
| search | Filter events by keyword or field | \| search user="administrator" |
| where | Filter using eval expressions | \| where count > 100 |
| fields | Include or exclude specific fields | \| fields src_ip, user, EventCode |
| table | Display events as a formatted table | \| table _time, user, src_ip, action |
| rename | Rename a field | \| rename src_ip AS "Source IP" |
| sort | Sort results (- for descending) | \| sort -count |
| head / tail | Return first / last N results | \| head 10 |
| dedup | Remove duplicate events by field | \| dedup user |
| reverse | Reverse the order of results | \| reverse |
| rex | Extract fields using regex | \| rex field=_raw "User: (?P<username>\w+)" |
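To see several of these commands working together without touching a real index, the sketch below generates five synthetic events with makeresults. The field names (n, user, bytes) and their values are made up for illustration only.

```spl
| makeresults count=5
| streamstats count AS n
| eval user=case(n<=2, "alice", n<=4, "bob", true(), "carol"), bytes=n*100
| where bytes > 100
| dedup user
| sort -bytes
| rename bytes AS "Bytes Out"
| table user, "Bytes Out"
```

The where step drops the first event, dedup keeps the first remaining event per user, and sort orders what is left by bytes descending, so the result should be three rows (carol, bob, alice).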
// Statistical Commands
Statistical commands aggregate data. They are essential for summarising large volumes of log events and spotting anomalies.
stats
The workhorse aggregation command. Groups events and computes aggregate functions.
index=windows EventCode=4625
| stats count AS failures, dc(user) AS unique_users by src_ip
| where failures > 20
| sort -failures
| Function | What it returns |
|---|---|
| count | Number of events |
| dc(field) | Distinct count of unique values |
| values(field) | List of unique values |
| list(field) | List of all values (including duplicates) |
| sum(field) | Sum of numeric field |
| avg(field) | Average of numeric field |
| max(field) / min(field) | Maximum / minimum value |
| earliest(field) / latest(field) | Chronologically first / last value |
| range(field) | Difference between max and min |
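A small self-contained sketch of how count, dc and values behave when grouped with by. The six events, IP addresses and user names are synthetic and exist only to make the grouping visible.

```spl
| makeresults count=6
| streamstats count AS n
| eval src_ip=if(n<=4, "10.0.0.5", "10.0.0.9")
| eval user=case(n=1, "alice", n=2, "bob", n=3, "alice", n=4, "carol", true(), "dave")
| stats count AS failures, dc(user) AS unique_users, values(user) AS users by src_ip
```

This should return one row per src_ip: 10.0.0.5 with 4 failures across 3 distinct users, and 10.0.0.9 with 2 failures for a single user. Swapping values(user) for list(user) would keep the duplicate "alice" entry.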
timechart
Like stats, but bucketed by time. Ideal for visualising activity over time.
index=firewall action=blocked
| timechart span=1h count by src_ip limit=5
top / rare
top returns the most common values for a field; rare returns the least common, which is useful for anomaly detection.
index=windows EventCode=4688
| rare process_name limit=20
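For comparison, here is a sketch of top on synthetic data, so it can be run without a Windows index. The process names and their frequencies are invented for the example.

```spl
| makeresults count=10
| streamstats count AS n
| eval process_name=case(n<=6, "svchost.exe", n<=9, "chrome.exe", true(), "psexec.exe")
| top limit=2 process_name
```

top adds count and percent columns, so this should list svchost.exe (6 events) and chrome.exe (3 events). Replacing top with rare on the same data would surface psexec.exe first.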
// Fields & Extraction
Splunk auto-extracts fields from common log formats. For custom or unstructured logs, you extract fields yourself using rex or the field extractor UI.
rex: regex field extraction
| rex field=_raw "src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
| rex field=_raw "dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})"
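A quick way to check a rex pattern is to run it against a hand-written event. The sketch below assumes a made-up firewall log line; only the src= and dst= key names come from the patterns above.

```spl
| makeresults
| eval _raw="Oct 12 10:01:22 fw01 DROP src=203.0.113.7 dst=10.0.0.15 dpt=445"
| rex field=_raw "src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})\s+dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})"
| table src_ip, dst_ip
```

Both capture groups are extracted in a single rex call here, which works when the two keys always appear in the same order; two separate rex commands, as shown earlier, are more tolerant of reordered fields.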
spath: JSON / XML extraction
Extracts fields from structured data like JSON logs (common with cloud logs, API events).
index=cloudtrail
| spath input=_raw path=userIdentity.userName output=user
| spath input=_raw path=sourceIPAddress output=src_ip
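The same extraction can be tried on a synthetic event. The JSON below is a heavily trimmed, invented stand-in for a CloudTrail record and only contains the keys used in the paths.

```spl
| makeresults
| eval _raw="{\"eventName\": \"ConsoleLogin\", \"sourceIPAddress\": \"198.51.100.23\", \"userIdentity\": {\"userName\": \"alice\"}}"
| spath input=_raw path=userIdentity.userName output=user
| spath input=_raw path=sourceIPAddress output=src_ip
| table user, src_ip
```

Nested keys are addressed with dot notation, so userIdentity.userName reaches one level into the object.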
Useful default fields
| Field | Description |
|---|---|
| _time | Event timestamp |
| _raw | Full raw event text |
| host | Hostname the event was collected from |
| source | Log file or data source path |
| sourcetype | Format/type of the data (e.g. WinEventLog, syslog) |
| index | The index the event is stored in |
| linecount | Number of lines in the event |
// Eval & Conditional Logic
eval creates or modifies fields using expressions. It's one of the most flexible SPL commands, used for calculations, string manipulation, and conditional logic. In the example below, stats must run first so that the count field exists when eval reads it.
index=windows EventCode=4625
| stats count by src_ip
| eval risk = case(
    count > 100, "High",
    count > 30, "Medium",
    count > 10, "Low",
    true(), "Info"
  )
| Function | Example | Result |
|---|---|---|
| if(condition, a, b) | eval flag=if(count>10,"yes","no") | a if the condition is true, otherwise b |
| case(...) | eval sev=case(count>100,"High",true(),"Low") | Value paired with the first true condition |
| len(field) | eval cmdlen=len(CommandLine) | String length |
| lower(field) / upper(field) | eval u=lower(user) | Lower-case / upper-case copy |
| substr(field, start, len) | eval prefix=substr(hash,1,6) | Substring (start is 1-based) |
| split(field, delim) | eval parts=split(path,"/") | Multivalue field split on the delimiter |
| mvcount(field) | eval n=mvcount(values) | Number of values in a multivalue field |
| now() | eval ts=now() | Current epoch time |
| strftime(field, fmt) | eval dt=strftime(_time,"%Y-%m-%d") | Formatted timestamp string |
| md5(field) | eval h=md5(user) | MD5 hash of the value |
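The sketch below exercises several of these functions on one synthetic event. The CommandLine and path values are invented, and the output field names are arbitrary.

```spl
| makeresults
| eval CommandLine="powershell.exe -NoP -Enc SQBFAFgA", path="C:/Windows/System32/cmd.exe"
| eval cmdlen=len(CommandLine)
| eval suspicious=if(like(lower(CommandLine), "%-enc%"), "yes", "no")
| eval parts=split(path, "/"), depth=mvcount(parts)
| eval day=strftime(_time, "%Y-%m-%d")
| table cmdlen, suspicious, depth, day
```

Lower-casing before the like() comparison makes the match case-insensitive, and a single eval can chain assignments so that depth is computed from the parts field created just before it.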
// Time & Subsearches
Time modifiers
Time range can be set in the UI or in the search itself using earliest and latest.
| Modifier | Meaning |
|---|---|
| earliest=-24h | Last 24 hours |
| earliest=-7d latest=-1d | Between 7 days ago and 1 day ago |
| earliest=@d | Since the start of today (midnight) |
| earliest=-1w@w | Since the start of the previous week (go back one week, then snap to the week boundary) |
| earliest=1716768000 | Since an absolute epoch timestamp |
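To check what a relative modifier resolves to before using it in a search, the eval function relative_time accepts the same specifier strings. A minimal sketch; the output field names are arbitrary.

```spl
| makeresults
| eval last_24h=strftime(relative_time(now(), "-24h"), "%Y-%m-%d %H:%M:%S")
| eval start_of_today=strftime(relative_time(now(), "@d"), "%Y-%m-%d %H:%M:%S")
| eval start_of_prev_week=strftime(relative_time(now(), "-1w@w"), "%Y-%m-%d %H:%M:%S")
| table last_24h, start_of_today, start_of_prev_week
```

The values depend on when the search runs and on the time zone configured for the user, so no fixed output is shown.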
Subsearches
A subsearch runs first and feeds its results as a filter into the outer search. Useful for cross-referencing threat intel or joining datasets without a lookup.
index=firewall
[search index=threatintel | return 100 dst_ip=malicious_ip]
| table _time, src_ip, dst_ip, action
Subsearches return at most 10,000 results and time out after 60 seconds by default. For large lists, use a lookup table instead: it's faster and not subject to these limits.
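A lookup-based version of the threat intel match might look like the sketch below. It assumes a lookup definition named example_threat_intel with columns malicious_ip and threat_category; both the lookup name and its columns are hypothetical and need to be replaced with whatever exists in your deployment.

```spl
index=firewall
| lookup example_threat_intel malicious_ip AS dst_ip OUTPUT threat_category
| where isnotnull(threat_category)
| table _time, src_ip, dst_ip, action, threat_category
```

Events whose dst_ip has no match in the lookup get no threat_category value, so the where clause keeps only the hits.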
// Threat Hunting Queries
These are ready-to-use hunting queries for common attack techniques. Adjust index names and field names to match your environment.
Brute force / password spray
index=windows EventCode=4625 earliest=-1h
| stats count AS failures, dc(user) AS users_targeted by src_ip
| where failures > 50 OR users_targeted > 10
| sort -failures
Successful login alongside multiple failures (possible compromise)
index=windows EventCode IN (4625, 4624)
| stats count(eval(EventCode=4625)) AS failures,
count(eval(EventCode=4624)) AS successes by user, src_ip
| where failures > 10 AND successes > 0
| sort -failures
PowerShell encoded command execution
index=windows EventCode=4688 earliest=-24h
| where like(lower(CommandLine), "%-enc%") OR like(lower(CommandLine), "%-encodedcommand%")
| table _time, host, user, CommandLine
| sort -_time
Lateral movement: remote service creation
index=windows EventCode=7045 earliest=-24h
| table _time, host, ServiceName, ImagePath, AccountName
| sort -_time
DCSync attack detection
index=windows EventCode=4662 earliest=-1h
| where like(Properties, "%1131f6aa-9c07-11d1-f79f-00c04fc2dcd2%")
OR like(Properties, "%1131f6ad-9c07-11d1-f79f-00c04fc2dcd2%")
| where NOT like(SubjectUserName, "%$")
| table _time, SubjectUserName, SubjectDomainName, host
Beaconing detection: regular outbound connections
index=network earliest=-6h
| bin _time span=5m
| stats count by _time, dest_ip
| stats stdev(count) AS sd, avg(count) AS mean by dest_ip
| where sd < 2 AND mean > 1
| table dest_ip, mean, sd
Rare parent-child process relationships
index=windows EventCode=4688 earliest=-24h
| stats count by ParentProcessName, NewProcessName
| where count < 5
| sort count
Build a baseline first: run hunting queries over a 30-day window to establish what's normal before hunting anomalies over a 1-day window. Count thresholds in a query should reflect your environment, not arbitrary numbers.
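One possible way to derive a threshold from a 30-day baseline is sketched below for the failed-logon hunt. The "mean plus three standard deviations" rule is an assumption chosen for illustration, not a recommendation from this guide, and the index and field names follow the earlier examples.

```spl
index=windows EventCode=4625 earliest=-30d@d latest=@d
| bin _time span=1d
| stats count AS daily_failures by _time, src_ip
| stats avg(daily_failures) AS mean, stdev(daily_failures) AS sd, max(daily_failures) AS peak by src_ip
| eval suggested_threshold=round(mean + 3 * sd)
| sort -suggested_threshold
```

Days on which a source IP produced no failures do not appear in the first stats output, so the mean is computed over active days only and will run higher than a true daily average.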
// Key Indexes & Sources
Common indexes and sourcetypes you'll encounter in enterprise Splunk deployments.
| Sourcetype / Index | Data source | Key fields |
|---|---|---|
| WinEventLog:Security | Windows Security event log | EventCode, user, src_ip, LogonType |
| WinEventLog:System | Windows System event log | EventCode, host, ServiceName |
| XmlWinEventLog:Microsoft-Windows-Sysmon/Operational | Sysmon | EventCode, Image, CommandLine, ParentImage, Hashes, DestinationIp |
| access_combined | Apache-style (NCSA combined) web access logs | clientip, uri, status, method, bytes |
| pan:traffic | Palo Alto firewall | src_ip, dest_ip, app, action, bytes_out |
| cisco:asa | Cisco ASA firewall | src_ip, dest_ip, action, protocol |
| aws:cloudtrail | AWS CloudTrail API logs | userIdentity.userName, eventName, sourceIPAddress, errorCode |
| o365:management:activity | Microsoft 365 audit logs | Operation, UserId, ClientIP, ResultStatus |
Windows Event IDs: Quick Reference
| Event ID | Description | Relevance |
|---|---|---|
| 4624 | Successful logon | Baseline / lateral movement |
| 4625 | Failed logon | Brute force |
| 4648 | Logon using explicit credentials | Pass-the-Hash, runas |
| 4662 | Operation on AD object | DCSync |
| 4672 | Special privileges assigned | Privilege escalation |
| 4688 | Process created | Execution tracking |
| 4698 | Scheduled task created | Persistence |
| 4720 | User account created | Account creation |
| 4732 | Member added to security group | Privilege escalation |
| 4768 / 4769 | Kerberos TGT / service ticket requested | AS-REP roasting (4768) / Kerberoasting (4769) |
| 7045 | New service installed (System log) | Persistence / lateral movement |
// Tips & Best Practices
Filter early, aggregate late. Always put your most restrictive filters (index, sourcetype, EventCode) at the start of the search, before any | commands. Terms before the first pipe determine which events Splunk retrieves from the index, so the more you filter there, the less data flows into the pipeline and the faster the search runs.
Use fields early. Adding | fields src_ip, user, EventCode right after your base search strips unnecessary fields and speeds up subsequent pipeline steps significantly on large result sets.
Save searches as alerts or reports. Any SPL search can be scheduled. Set up alerts to trigger when hunting queries match above a threshold; this turns a one-off hunt into continuous detection.
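Use tstats for speed. The tstats command reads indexed fields from tsidx files and accelerated data model summaries rather than raw events, which is often orders of magnitude faster over large time ranges. Out of the box it can only filter and group on index-time fields (index, sourcetype, source, host, _time); to use it on other fields, accelerate a data model such as those in the Common Information Model (CIM).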
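As a sketch, the brute-force hunt could be expressed with tstats as follows. It assumes the CIM Authentication data model is installed, populated by your Windows data, and accelerated; the threshold of 20 is arbitrary.

```spl
| tstats summariesonly=true count from datamodel=Authentication where Authentication.action="failure" by Authentication.src, Authentication.user
| rename Authentication.* AS *
| where count > 20
| sort -count
```

With summariesonly=true only accelerated summaries are searched, so events outside the acceleration window are silently excluded; the rename step strips the data model prefix to make the fields easier to work with downstream.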
Wildcard searches are expensive. Leading wildcards (*value) require a full scan of the index and are very slow. Use rex or extract structured fields instead, and avoid *keyword* patterns on high-volume indexes.