Splunk SPL: SIEM & Threat Hunting

// What is Splunk?

Splunk is a Security Information and Event Management (SIEM) platform that ingests, indexes, and correlates log data from across an environment: endpoints, servers, firewalls, identity providers, cloud services, and more. SOC analysts use it to search for threats, build detection rules, and investigate alerts.

Data is queried using SPL (Search Processing Language), a pipeline-based query language similar in concept to Unix pipes. Understanding SPL is a core skill for any blue team analyst working in a Splunk environment.

Splunk organises data into indexes, which are sets of events stored on disk. Data is broken into events (typically individual log lines, though an event can span several lines), each with a timestamp and a set of fields parsed from the raw text. The most common default index is main; many environments use custom indexes per data source (e.g. windows, firewall, aws).

// SPL Basics

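An SPL search usually starts with a search command (implicit or explicit) and passes results through a pipeline using the pipe character |. Generating commands such as makeresults or tstats are the exception: they begin the search with a leading pipe. Each command in the pipeline transforms or filters the result set.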

SPL: Basic Structure

index=main sourcetype=WinEventLog EventCode=4625
| stats count by src_ip, user
| sort -count
| head 20

This search finds failed logins (EventCode 4625), counts them grouped by source IP and user, sorts the counts in descending order, and shows the top 20.

Search terms

Syntax	Meaning	Example
`keyword`	Any event containing this word	`malware`
`field=value`	Field equals exact value	`user=admin`
`field=value*`	Wildcard match	`process=powershell*`
`NOT field=value`	Exclude matches	`NOT src_ip=10.0.0.1`
`field=val1 OR field=val2`	OR logic	`EventCode=4624 OR EventCode=4625`
`field IN (a, b, c)`	Matches any in list	`EventCode IN (4624, 4625, 4648)`
`"exact phrase"`	Exact phrase match	`"lateral movement"`
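
As a rough illustration of how these terms combine, the sketch below builds four synthetic events with makeresults and then filters them with a wildcard, an IN list, and a NOT clause. The user names and event codes are made-up sample values, not taken from any real index.

```spl
| makeresults count=4
| streamstats count AS n
| eval user=case(n=1, "admin", n=2, "administrator", n=3, "admin_test", n=4, "guest")
| eval EventCode=case(n=1, 4625, n=2, 4648, n=3, 4624, n=4, 4624)
| search user=admin* EventCode IN (4624, 4625) NOT user=admin_test
| table user, EventCode
```

Each clause removes one sample row: the wildcard drops guest, the IN list drops administrator (4648), and NOT drops admin_test, so only the admin / 4625 row should remain.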

// Search Commands

These are the commands you'll use in almost every search. They filter, transform, and reshape event data flowing through the pipeline.

Command	What it does	Example
`search`	Filter events by keyword or field	`| search user="administrator"`
`where`	Filter using eval expressions	`| where count > 100`
`fields`	Include or exclude specific fields	`| fields src_ip, user, EventCode`
`table`	Display events as a formatted table	`| table _time, user, src_ip, action`
`rename`	Rename a field	`| rename src_ip AS "Source IP"`
`sort`	Sort results (- for descending)	`| sort -count`
`head` / `tail`	Return first / last N results	`| head 10`
`dedup`	Remove duplicate events by field	`| dedup user`
`reverse`	Reverse the order of results	`| reverse`
`rex`	Extract fields using regex	`| rex field=_raw "User: (?P<username>\w+)"`
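
To see several of these commands chained together, here is a small sketch that runs on synthetic events, so it needs no particular index. The raw text and the username field are invented for illustration.

```spl
| makeresults count=3
| streamstats count AS n
| eval _raw=case(n=1, "User: alice action=login", n=2, "User: bob action=login", n=3, "User: alice action=logout")
| rex field=_raw "User: (?<username>\w+)"
| dedup username
| rename username AS "User Name"
| table "User Name", _raw
```

rex pulls the name out of the raw text, dedup keeps the first event per user (so the second alice event is dropped), and rename plus table shape the final output.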

// Statistical Commands

Statistical commands aggregate data, which is essential for summarising large volumes of log events and spotting anomalies.

stats

The workhorse aggregation command. Groups events and computes aggregate functions.

SPL

index=windows EventCode=4625
| stats count AS failures, dc(user) AS unique_users by src_ip
| where failures > 20
| sort -failures

Function	What it returns
`count`	Number of events
`dc(field)`	Distinct count of unique values
`values(field)`	List of unique values
`list(field)`	List of all values (including duplicates)
`sum(field)`	Sum of numeric field
`avg(field)`	Average of numeric field
`max(field)` / `min(field)`	Maximum / minimum value
`earliest(field)` / `latest(field)`	Chronologically first / last value
`range(field)`	Difference between max and min
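
The difference between count, dc, values, and list is easiest to see side by side. The following sketch uses six synthetic events with made-up IPs and user names; it is only meant to show the shape of the output.

```spl
| makeresults count=6
| streamstats count AS n
| eval src_ip=if(n<=4, "10.0.0.5", "10.0.0.9")
| eval user=case(n=1, "alice", n=2, "bob", n=3, "alice", n=4, "carol", n=5, "dave", n=6, "dave")
| stats count AS failures, dc(user) AS unique_users, values(user) AS users, list(user) AS all_users by src_ip
```

For 10.0.0.5 this should give 4 failures across 3 unique users, with values listing alice, bob, and carol once each while list repeats alice. For 10.0.0.9 it should give 2 failures from 1 unique user.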

timechart

Like stats but bucketed by time, which makes it ideal for visualising activity over time.

SPL

index=firewall action=blocked
| timechart span=1h count by src_ip limit=5
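
To get a feel for the output shape without a firewall index, the sketch below spreads twelve synthetic events over the last three hours (one every 15 minutes) and alternates between two made-up source IPs.

```spl
| makeresults count=12
| streamstats count AS n
| eval _time=now() - (n * 900)
| eval src_ip=if(n % 2 == 0, "198.51.100.10", "198.51.100.20")
| timechart span=1h count by src_ip
```

The result is one row per hourly bucket and one column per src_ip value, which is the layout the line and column chart visualisations expect.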

top / rare

top returns the most common values for a field; rare returns the least common, which is useful for anomaly detection.

SPL

index=windows EventCode=4688
| rare process_name limit=20
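
A quick way to compare the two commands is to run them over the same synthetic data. The process names below are sample values chosen for illustration.

```spl
| makeresults count=10
| streamstats count AS n
| eval process_name=case(n<=6, "svchost.exe", n<=9, "chrome.exe", true(), "mimikatz.exe")
| top limit=2 process_name
```

top should list svchost.exe and chrome.exe with their count and percent columns. Swapping top for rare in the last line puts the single mimikatz.exe event first instead.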

// Fields & Extraction

Splunk auto-extracts fields from common log formats. For custom or unstructured logs, you extract fields yourself using rex or the field extractor UI.

rex: Regex field extraction

SPL

| rex field=_raw "src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
| rex field=_raw "dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})"

spath: JSON / XML extraction

Extracts fields from structured data like JSON logs (common with cloud logs, API events).

SPL

index=cloudtrail
| spath input=_raw path=userIdentity.userName output=user
| spath input=_raw path=sourceIPAddress output=src_ip
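
The same extraction can be tried against a hand-written JSON event. This is a sketch with an invented, heavily trimmed CloudTrail-style record; real events carry many more keys.

```spl
| makeresults
| eval _raw="{\"userIdentity\": {\"userName\": \"alice\"}, \"sourceIPAddress\": \"203.0.113.7\"}"
| spath input=_raw path=userIdentity.userName output=user
| spath input=_raw path=sourceIPAddress output=src_ip
| table user, src_ip
```

The dotted path walks into the nested userIdentity object, while the top-level key is addressed by name alone.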

Useful default fields

Field	Description
`_time`	Event timestamp
`_raw`	Full raw event text
`host`	Hostname the event was collected from
`source`	Log file or data source path
`sourcetype`	Format/type of the data (e.g. `WinEventLog`, `syslog`)
`index`	The index the event is stored in
`linecount`	Number of lines in the event

// Eval & Conditional Logic

eval creates or modifies fields using expressions. It's one of the most flexible SPL commands, used for calculations, string manipulation, and conditional logic.

SPL: Conditional classification

index=windows EventCode=4625
| stats count by src_ip
| eval risk = case(
    count > 100, "High",
    count > 30, "Medium",
    count > 10, "Low",
    true(), "Info"
)

Function	Example	Result
`if(condition, a, b)`	`eval flag=if(count>10,"yes","no")`	Ternary
`case(...)`	Multiple conditions	First matching value
`len(field)`	`eval cmdlen=len(CommandLine)`	String length
`lower(field)` / `upper(field)`	`eval u=lower(user)`	Case change
`substr(field, start, len)`	`eval prefix=substr(hash,1,6)`	Substring
`split(field, delim)`	`eval parts=split(path,"/")`	String split to MV
`mvcount(field)`	`eval n=mvcount(values)`	Multivalue count
`now()`	`eval ts=now()`	Current epoch time
`strftime(field, fmt)`	`eval dt=strftime(_time,"%Y-%m-%d")`	Format timestamp
`md5(field)`	`eval h=md5(user)`	MD5 hash
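
Several of these functions are often combined when triaging a suspicious command line. The sketch below works on a single synthetic event; the CommandLine value is a made-up sample.

```spl
| makeresults
| eval CommandLine="PowerShell.exe -NoP -Enc SQBFAFgA"
| eval cmdlen=len(CommandLine)
| eval lowered=lower(CommandLine)
| eval parts=split(CommandLine, " ")
| eval argc=mvcount(parts)
| eval flag=if(like(lowered, "% -enc %"), "encoded", "plain")
| eval day=strftime(_time, "%Y-%m-%d")
| table CommandLine, cmdlen, argc, flag, day
```

Lower-casing before the like() test keeps the match case-insensitive, and split plus mvcount gives a rough argument count (4 for this sample).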

// Time & Subsearches

Time modifiers

Time range can be set in the UI or in the search itself using earliest and latest.

Modifier	Meaning
`earliest=-24h`	Last 24 hours
`earliest=-7d latest=-1d`	Between 7 days ago and 1 day ago
`earliest=@d`	Start of today (midnight)
`earliest=-1w@w`	Start of last week
`earliest=1716768000`	Absolute epoch timestamp
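
To check what a modifier resolves to before using it in a search, the same relative-time strings can be passed to the relative_time() eval function. A minimal sketch:

```spl
| makeresults
| eval last_24h=strftime(relative_time(now(), "-24h"), "%Y-%m-%d %H:%M:%S")
| eval today_midnight=strftime(relative_time(now(), "@d"), "%Y-%m-%d %H:%M:%S")
| eval last_week_start=strftime(relative_time(now(), "-1w@w"), "%Y-%m-%d %H:%M:%S")
| table last_24h, today_midnight, last_week_start
```

The @ part snaps the time down to the start of the named unit, so "-1w@w" means "go back one week, then snap to the start of that week".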

Subsearches

A subsearch runs first and feeds its results as a filter into the outer search. Useful for cross-referencing threat intel or joining datasets without a lookup.

SPL: Subsearch

index=firewall
    [search index=threatintel | return 100 src_ip=malicious_ip]
| table _time, src_ip, dst_ip, action

Subsearches are limited to returning 10,000 results and have a 60-second timeout by default. For large lookups, use a lookup table instead. It's faster and not subject to these limits.
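
A lookup-based version of the same threat intel match might look like the sketch below. It assumes a lookup named threatintel_ips with columns malicious_ip and threat_category; both the lookup name and the threat_category column are hypothetical and would need to exist in your environment.

```spl
index=firewall
| lookup threatintel_ips malicious_ip AS src_ip OUTPUT threat_category
| where isnotnull(threat_category)
| table _time, src_ip, dst_ip, action, threat_category
```

The lookup adds threat_category only to events whose src_ip appears in the table, so the isnotnull() filter keeps just the matches.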

// Threat Hunting Queries

These are ready-to-use hunting queries for common attack techniques. Adjust index names and field names to match your environment.

Brute force / password spray

SPL

index=windows EventCode=4625 earliest=-1h
| stats count AS failures, dc(user) AS users_targeted by src_ip
| where failures > 50 OR users_targeted > 10
| sort -failures

Successful login after multiple failures (possible compromise)

SPL

index=windows EventCode IN (4625, 4624)
| stats count(eval(EventCode=4625)) AS failures,
        count(eval(EventCode=4624)) AS successes by user, src_ip
| where failures > 10 AND successes > 0
| sort -failures
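
The query above counts both event types but does not check their order. One possible refinement, sketched here under the same index and field-name assumptions, also requires the latest success to come after the first failure:

```spl
index=windows EventCode IN (4625, 4624)
| stats count(eval(EventCode=4625)) AS failures,
        min(eval(if(EventCode=4625, _time, null()))) AS first_failure,
        max(eval(if(EventCode=4624, _time, null()))) AS last_success
        by user, src_ip
| where failures > 10 AND last_success > first_failure
| eval first_failure=strftime(first_failure, "%Y-%m-%d %H:%M:%S")
| eval last_success=strftime(last_success, "%Y-%m-%d %H:%M:%S")
| sort -failures
```

This is still an approximation: it does not prove that all the failures preceded the success, only that at least one did.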

PowerShell encoded command execution

SPL

index=windows EventCode=4688 earliest=-24h
| where like(lower(CommandLine), "%-enc%") OR like(lower(CommandLine), "%-encodedcommand%")
| table _time, host, user, CommandLine
| sort -_time

Lateral movement: remote service creation

SPL

index=windows EventCode=7045 earliest=-24h
| table _time, host, ServiceName, ImagePath, AccountName
| sort -_time

DCSync attack detection

SPL

index=windows EventCode=4662 earliest=-1h
| where like(Properties, "%1131f6aa-9c07-11d1-f79f-00c04fc2dcd2%")
    OR like(Properties, "%1131f6ad-9c07-11d1-f79f-00c04fc2dcd2%")
| where NOT like(SubjectUserName, "%$")
| table _time, SubjectUserName, SubjectDomainName, host

Beaconing detection: regular outbound connections

SPL

index=network earliest=-6h
| bin _time span=5m
| stats count by _time, dest_ip
| stats stdev(count) AS sd, avg(count) AS mean by dest_ip
| where sd < 2 AND mean > 1
| table dest_ip, mean, sd

Rare parent-child process relationships

SPL

index=windows EventCode=4688 earliest=-24h
| stats count by ParentProcessName, NewProcessName
| where count < 5
| sort count

Build a baseline first: run hunting queries over a 30-day window to establish what's normal before hunting anomalies over a 1-day window. Count thresholds in a query should reflect your environment, not arbitrary numbers.
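
One way to sketch the baseline comparison in a single search is to label each event as baseline or recent and keep only the process pairs that never appeared in the baseline. The index and field names follow the query above and are assumptions about your environment.

```spl
index=windows EventCode=4688 earliest=-30d
| eval period=if(_time >= relative_time(now(), "-1d"), "recent", "baseline")
| stats count(eval(period="baseline")) AS baseline_count,
        count(eval(period="recent")) AS recent_count
        by ParentProcessName, NewProcessName
| where baseline_count=0 AND recent_count > 0
| sort -recent_count
```

A 30-day search over process creation events can be heavy, so in practice the baseline is often precomputed into a lookup or summary index on a schedule.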

// Key Indexes & Sources

Common indexes and sourcetypes you'll encounter in enterprise Splunk deployments.

Sourcetype / Index	Data source	Key fields
`WinEventLog:Security`	Windows Security event log	EventCode, user, src_ip, LogonType
`WinEventLog:System`	Windows System event log	EventCode, host, ServiceName
`XmlWinEventLog:Microsoft-Windows-Sysmon/Operational`	Sysmon	EventCode, Image, CommandLine, ParentImage, Hashes, DestinationIp
`access_combined`	Apache / NCSA combined-format web logs	clientip, uri, status, method, bytes
`pan:traffic`	Palo Alto firewall	src_ip, dest_ip, app, action, bytes_out
`cisco:asa`	Cisco ASA firewall	src_ip, dest_ip, action, protocol
`aws:cloudtrail`	AWS CloudTrail API logs	userIdentity.userName, eventName, sourceIPAddress, errorCode
`o365:management:activity`	Microsoft 365 audit logs	Operation, UserId, ClientIP, ResultStatus

Windows Event IDs: Quick Reference

Event ID	Description	Relevance
4624	Successful logon	Baseline / lateral movement
4625	Failed logon	Brute force
4648	Logon using explicit credentials	Pass-the-Hash, runas
4662	Operation on AD object	DCSync
4672	Special privileges assigned	Privilege escalation
4688	Process created	Execution tracking
4698	Scheduled task created	Persistence
4720	User account created	Account creation
4732	Member added to security group	Privilege escalation
4768 / 4769	Kerberos TGT / service ticket requested	AS-REP roasting (4768) / Kerberoasting (4769)
7045	New service installed (System log)	Persistence / lateral movement

// Tips & Best Practices

Filter early, aggregate late. Always put your most restrictive filters (index, sourcetype, EventCode) at the start of the search before any | commands. Splunk uses the terms before the first pipe to decide which events to retrieve from the index, so the more you filter there, the less data flows into the rest of the pipeline and the faster the search runs.

Use fields early. Adding | fields src_ip, user, EventCode right after your base search strips unnecessary fields and speeds up subsequent pipeline steps significantly on large result sets.

Save searches as alerts or reports. Any SPL search can be scheduled. Set up alerts to trigger when hunting queries match above a threshold; this turns a one-off hunt into continuous detection.

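Use tstats for speed. The tstats command reads indexed fields from the TSIDX files rather than raw events, which can be orders of magnitude faster over large time ranges. It works directly on index-time fields such as index, sourcetype, host, and source; fast queries on other fields generally rely on an accelerated data model, such as those from the Common Information Model (CIM).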
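
A minimal sketch that needs no data model, since it groups only on the index-time field sourcetype. It uses the _internal index, which exists on a standard Splunk instance, though your role needs permission to search it.

```spl
| tstats count where index=_internal by sourcetype
| sort -count
```

The raw-event equivalent would be index=_internal | stats count by sourcetype. Running both over the same time range is a simple way to compare their speed in your own deployment.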

Wildcard searches are expensive. Leading wildcards (*value) require a full scan of the index and are very slow. Use rex or extract structured fields instead, and avoid *keyword* patterns on high-volume indexes.