
Splunk SPL: SIEM & Threat Hunting

A practical SPL reference for SOC analysts, covering search syntax, essential commands, field extraction, statistical analysis, and real-world threat hunting queries for common attack patterns.


// What is Splunk?

Splunk is a data platform widely deployed as a Security Information and Event Management (SIEM) solution. It ingests, indexes, and correlates log data from across an environment: endpoints, servers, firewalls, identity providers, cloud services, and more. SOC analysts use it to search for threats, build detection rules, and investigate alerts.

Data is queried using SPL (Search Processing Language), a pipeline-based query language similar in concept to Unix pipes. Understanding SPL is a core skill for any blue team analyst working in a Splunk environment.

Splunk organises data into indexes, which are sets of events stored on disk. Incoming data is broken into events (typically one log entry each, which may span several lines), each with a timestamp and a set of fields parsed from the raw text. The default index is main; many environments use custom indexes per data source (e.g. windows, firewall, aws).

// SPL Basics

Most SPL searches start with a search command (implicit or explicit) and pass results through a pipeline using the pipe character |. Each command in the pipeline transforms or filters the result set. Searches that begin with a generating command such as tstats start with a leading | instead.

SPL: Basic Structure
index=main sourcetype=WinEventLog EventCode=4625
| stats count by src_ip, user
| sort -count
| head 20

This search finds failed logins (EventCode 4625), counts them grouped by source IP and user, sorts by count in descending order, and shows the top 20.

Search terms

Syntax                      Meaning                           Example
keyword                     Any event containing this word    malware
field=value                 Field equals exact value          user=admin
field=*value*               Wildcard match                    process=*powershell*
NOT field=value             Exclude matches                   NOT src_ip=10.0.0.1
field=val1 OR field=val2    OR logic                          EventCode=4624 OR EventCode=4625
field IN (a, b, c)          Matches any in list               EventCode IN (4624, 4625, 4648)
"exact phrase"              Exact phrase match                "lateral movement"

// Statistical Commands

Statistical commands aggregate data, which is essential for summarising large volumes of log events and spotting anomalies.

stats

The workhorse aggregation command. Groups events and computes aggregate functions.

SPL
index=windows EventCode=4625
| stats count AS failures, dc(user) AS unique_users by src_ip
| where failures > 20
| sort -failures

Function                           What it returns
count                              Number of events
dc(field)                          Count of distinct values
values(field)                      List of unique values
list(field)                        List of all values (including duplicates)
sum(field)                         Sum of numeric field
avg(field)                         Average of numeric field
max(field) / min(field)            Maximum / minimum value
earliest(field) / latest(field)    Chronologically first / last value
range(field)                       Difference between max and min
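
As an illustration of combining several of these functions in one pass, the following sketch profiles each source of failed logons. The index and field names (windows, user, src_ip) are assumptions carried over from the example above and may differ in your environment.

```spl
index=windows EventCode=4625
| stats count AS failures, dc(user) AS unique_users, values(user) AS users,
        earliest(_time) AS first_seen, latest(_time) AS last_seen by src_ip
| eval first_seen=strftime(first_seen, "%Y-%m-%d %H:%M:%S"),
       last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
| sort -failures
```

first_seen and last_seen come back as epoch values, so the eval step formats them for display.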

timechart

Like stats but bucketed by time, which makes it ideal for visualising activity over time.

SPL
index=firewall action=blocked
| timechart span=1h count by src_ip limit=5

top / rare

top returns the most common values for a field; rare returns the least common, which is useful for anomaly detection.

SPL
index=windows EventCode=4688
| rare process_name limit=20
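
For the opposite view, a minimal top sketch lists the accounts that fail logon most often. It assumes the same windows index and a user field; top adds count and percent columns to its output.

```spl
index=windows EventCode=4625
| top limit=10 user
```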

// Fields & Extraction

Splunk auto-extracts fields from common log formats. For custom or unstructured logs, you extract fields yourself using rex or the field extractor UI.

rex: Regex field extraction

SPL
| rex field=_raw "src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
| rex field=_raw "dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})"
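
One way to check an extraction before pointing it at live data is to run it against a synthetic event. This sketch uses makeresults to build a single made-up event (the log line and its addresses are illustrative, taken from documentation ranges) and applies the two rex calls above.

```spl
| makeresults
| eval _raw="2024-05-27T10:15:00Z action=blocked src=203.0.113.7 dst=198.51.100.20"
| rex field=_raw "src=(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
| rex field=_raw "dst=(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})"
| table src_ip, dst_ip
```

If the patterns match, the result row shows src_ip=203.0.113.7 and dst_ip=198.51.100.20.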

spath: JSON / XML extraction

Extracts fields from structured data like JSON logs (common with cloud logs, API events).

SPL
index=cloudtrail
| spath input=_raw path=userIdentity.userName output=user
| spath input=_raw path=sourceIPAddress output=src_ip
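
The same spath calls can be tried against a synthetic JSON event. The event below is a made-up, heavily trimmed stand-in for a CloudTrail record (real records carry many more keys), built with makeresults so the search runs without any indexed data.

```spl
| makeresults
| eval _raw="{\"userIdentity\": {\"userName\": \"alice\"}, \"sourceIPAddress\": \"203.0.113.7\"}"
| spath input=_raw path=userIdentity.userName output=user
| spath input=_raw path=sourceIPAddress output=src_ip
| table user, src_ip
```

The nested key is addressed with dot notation, and the result row shows user=alice and src_ip=203.0.113.7.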

Useful default fields

Field         Description
_time         Event timestamp
_raw          Full raw event text
host          Hostname the event was collected from
source        Log file or data source path
sourcetype    Format/type of the data (e.g. WinEventLog, syslog)
index         The index the event is stored in
linecount     Number of lines in the event

// Eval & Conditional Logic

eval creates or modifies fields using expressions. It's one of the most flexible SPL commands, used for calculations, string manipulation, and conditional logic.

SPL: Conditional classification
index=windows EventCode=4625
| stats count by src_ip
| eval risk = case(
    count > 100, "High",
    count > 30, "Medium",
    count > 10, "Low",
    true(), "Info"
)

Function                       Example                               Result
if(condition, a, b)            eval flag=if(count>10,"yes","no")     a if condition is true, else b
case(...)                      eval r=case(x>9,"hi",true(),"lo")     Value for the first true condition
len(field)                     eval cmdlen=len(CommandLine)          String length
lower(field) / upper(field)    eval u=lower(user)                    Case change
substr(field, start, len)      eval prefix=substr(hash,1,6)          Substring
split(field, delim)            eval parts=split(path,"/")            String split to multivalue
mvcount(field)                 eval n=mvcount(values)                Multivalue count
now()                          eval ts=now()                         Current epoch time
strftime(field, fmt)           eval dt=strftime(_time,"%Y-%m-%d")    Format timestamp
md5(field)                     eval h=md5(user)                      MD5 hash
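
A short sketch of several of these functions working together: flagging unusually long command lines in process-creation events. The CommandLine field name and the 500-character cut-off are assumptions for illustration; tune both to your data.

```spl
index=windows EventCode=4688
| eval cmdlen=len(CommandLine), user_lc=lower(user), day=strftime(_time, "%Y-%m-%d")
| where cmdlen > 500
| table _time, day, host, user_lc, cmdlen, CommandLine
```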

// Time & Subsearches

Time modifiers

Time range can be set in the UI or in the search itself using earliest and latest.

Modifier                   Meaning
earliest=-24h              Last 24 hours
earliest=-7d latest=-1d    Between 7 days ago and 1 day ago
earliest=@d                Start of today (midnight)
earliest=-1w@w             Start of last week
earliest=1716768000        Absolute epoch timestamp
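
Relative offsets and snap-to units can be combined on both ends of the range. As a small sketch, assuming the windows index used elsewhere in this guide, the search below covers the whole of yesterday: -1d@d goes back one day and snaps to midnight, and @d snaps to midnight today.

```spl
index=windows EventCode=4625 earliest=-1d@d latest=@d
| timechart span=1h count
```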

Subsearches

A subsearch runs first and feeds its results as a filter into the outer search. Useful for cross-referencing threat intel or joining datasets without a lookup.

SPL: Subsearch
index=firewall
    [search index=threatintel | return 100 dst_ip=malicious_ip]
| table _time, src_ip, dst_ip, action

Subsearches are limited to returning 10,000 results and have a 60-second timeout by default. For large lookups, use a lookup table instead: it's faster and not subject to these limits.
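
A minimal sketch of the lookup approach, under stated assumptions: threatintel_ips is a hypothetical lookup definition with a malicious_ip column, and the firewall events carry a dst_ip field. Neither name comes from a real deployment.

```spl
index=firewall
| lookup threatintel_ips malicious_ip AS dst_ip OUTPUT malicious_ip AS matched_ip
| where isnotnull(matched_ip)
| table _time, src_ip, dst_ip, action, matched_ip
```

Events whose dst_ip has no entry in the lookup get no matched_ip value, so the where clause keeps only the hits.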

// Threat Hunting Queries

These are ready-to-use hunting queries for common attack techniques. Adjust index names and field names to match your environment.

Brute force / password spray

SPL
index=windows EventCode=4625 earliest=-1h
| stats count AS failures, dc(user) AS users_targeted by src_ip
| where failures > 50 OR users_targeted > 10
| sort -failures

Successful login after multiple failures (possible compromise)

SPL
index=windows EventCode IN (4625, 4624)
| stats count(eval(EventCode=4625)) AS failures,
        count(eval(EventCode=4624)) AS successes by user, src_ip
| where failures > 10 AND successes > 0
| sort -failures
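
The query above counts failures and successes but does not check their order. A variant sketch, assuming the same field names, also records the most recent timestamp of each outcome and keeps only pairs where the latest success came after the latest failure.

```spl
index=windows EventCode IN (4625, 4624)
| stats count(eval(EventCode=4625)) AS failures,
        max(eval(if(EventCode=4625, _time, null()))) AS last_failure,
        max(eval(if(EventCode=4624, _time, null()))) AS last_success by user, src_ip
| where failures > 10 AND last_success > last_failure
| sort -failures
```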

PowerShell encoded command execution

SPL
index=windows EventCode=4688 earliest=-24h
| where like(lower(CommandLine), "%-enc%") OR like(lower(CommandLine), "%-encodedcommand%")
| table _time, host, user, CommandLine
| sort -_time

Lateral movement: remote service creation

SPL
index=windows EventCode=7045 earliest=-24h
| table _time, host, ServiceName, ImagePath, AccountName
| sort -_time

DCSync attack detection

SPL
index=windows EventCode=4662 earliest=-1h
| where like(Properties, "%1131f6aa-9c07-11d1-f79f-00c04fc2dcd2%")
    OR like(Properties, "%1131f6ad-9c07-11d1-f79f-00c04fc2dcd2%")
| where NOT like(SubjectUserName, "%$")
| table _time, SubjectUserName, SubjectDomainName, host

Beaconing detection: regular outbound connections

SPL
index=network earliest=-6h
| bin _time span=5m
| stats count by _time, dest_ip
| stats stdev(count) AS sd, avg(count) AS mean by dest_ip
| where sd < 2 AND mean > 1
| table dest_ip, mean, sd

Rare parent-child process relationships

SPL
index=windows EventCode=4688 earliest=-24h
| stats count by ParentProcessName, NewProcessName
| where count < 5
| sort count

Build a baseline first: run hunting queries over a 30-day window to establish what's normal before hunting anomalies over a 1-day window. Count thresholds in a query should reflect your environment, not arbitrary numbers.
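
One possible way to derive such a baseline, sketched for failed logons: count failures per source per day over the last 30 full days, then summarise each source's daily behaviour. The suggested_threshold field (mean plus three standard deviations) is a common rule of thumb used here purely for illustration, and the index and field names are assumptions.

```spl
index=windows EventCode=4625 earliest=-30d@d latest=@d
| bin _time span=1d
| stats count AS daily_failures by _time, src_ip
| stats avg(daily_failures) AS avg_daily, stdev(daily_failures) AS sd_daily,
        max(daily_failures) AS max_daily by src_ip
| eval suggested_threshold=round(avg_daily + 3 * sd_daily, 0)
```

Days on which a source produced no failures do not appear in the first stats output, so avg_daily reflects active days only.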

// Key Indexes & Sources

Common indexes and sourcetypes you'll encounter in enterprise Splunk deployments.

Sourcetype / Index                                     Data source                        Key fields
WinEventLog:Security                                   Windows Security event log         EventCode, user, src_ip, LogonType
WinEventLog:System                                     Windows System event log           EventCode, host, ServiceName
XmlWinEventLog:Microsoft-Windows-Sysmon/Operational    Sysmon                             EventCode, Image, CommandLine, ParentImage, Hashes, DestinationIp
access_combined                                        Apache web logs (NCSA combined)    clientip, uri, status, method, bytes
pan:traffic                                            Palo Alto firewall                 src_ip, dest_ip, app, action, bytes_out
cisco:asa                                              Cisco ASA firewall                 src_ip, dest_ip, action, protocol
aws:cloudtrail                                         AWS CloudTrail API logs            userIdentity.userName, eventName, sourceIPAddress, errorCode
o365:management:activity                               Microsoft 365 audit logs           Operation, UserId, ClientIP, ResultStatus

Windows Event IDs: Quick Reference

Event ID       Description                                Relevance
4624           Successful logon                           Baseline / lateral movement
4625           Failed logon                               Brute force
4648           Logon using explicit credentials           Pass-the-Hash, runas
4662           Operation on AD object                     DCSync
4672           Special privileges assigned                Privilege escalation
4688           Process created                            Execution tracking
4698           Scheduled task created                     Persistence
4720           User account created                       Account creation
4732           Member added to security group             Privilege escalation
4768 / 4769    Kerberos TGT / service ticket requested    Kerberoasting
7045           New service installed (System log)         Persistence / lateral movement

// Tips & Best Practices

Filter early, aggregate late. Always put your most restrictive filters (index, sourcetype, EventCode) at the start of the search before any | commands. Splunk retrieves events from disk before passing them to the pipeline, so the more you filter before the first pipe, the faster the search.

Use fields early. Adding | fields src_ip, user, EventCode right after your base search strips unnecessary fields and speeds up subsequent pipeline steps significantly on large result sets.
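
A minimal sketch of this pattern, assuming the windows index and field names used earlier in this guide: the fields command sits directly after the base search, so later steps only carry the columns they need.

```spl
index=windows EventCode=4625
| fields _time, src_ip, user, EventCode
| stats count AS failures by src_ip, user
| sort -failures
```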

Save searches as alerts or reports. Any SPL search can be scheduled. Set up alerts to trigger when hunting queries match above a threshold � this turns a one-off hunt into continuous detection.

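Use tstats for speed. The tstats command queries indexed fields in the TSIDX files rather than raw events, which is often orders of magnitude faster over large time ranges. It works out of the box on indexed fields such as index, sourcetype, host, and source; querying other fields requires an accelerated data model, such as those in the Common Information Model (CIM).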
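
Two hedged sketches of tstats usage. The first relies only on indexed fields and counts events per sourcetype per hour; the windows index name is an assumption.

```spl
| tstats count where index=windows by sourcetype, _time span=1h
```

The second assumes the CIM Authentication data model is installed and accelerated in your deployment. It counts failed authentications per source and user from the acceleration summaries; the threshold of 20 is illustrative.

```spl
| tstats summariesonly=true count from datamodel=Authentication
    where Authentication.action="failure"
    by Authentication.src, Authentication.user
| where count > 20
```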

Wildcard searches are expensive. Leading wildcards (*value) require a full scan of the index and are very slow. Use rex or extract structured fields instead, and avoid *keyword* patterns on high-volume indexes.