How to Detect Anomalies in Splunk Using Streamstats

Hurricane Labs

Managed Splunk and Security Services for your business's unique use case.

Published Sep 28, 2021

In his recent Splunk tutorial, Josh discusses several methods for anomaly detection, including standard deviation, MLTK, and streamstats. This post provides a basic overview of his talk; to learn more about this topic, you can find the unabridged post here.

What is standard deviation?

Standard deviation measures how spread out a dataset is, based on each value's distance from the mean. If you flag every value that falls more than a set number of standard deviations from the mean, a fixed percentage of the data will be treated as anomalous, and the size of that percentage depends on how the data is distributed. In security contexts, user behavior most often follows an exponential distribution rather than a normal one. Because a fixed share of the data is flagged, having more data means more outliers, and that means more alerts.
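
To make the fixed-percentage effect concrete, the sketch below models a common rule (flag anything more than k standard deviations above the mean) in Python; in SPL, one common way to express the same rule is eventstats with avg() and stdev() followed by a where clause. The function names and the choice of k = 3 are illustrative assumptions, and the exponential case simply follows the article's observation about user behavior. For an exponential distribution the mean and standard deviation are both 1/rate, so the cutoff mean + kσ equals (1 + k)/rate and the share of values above it is e^-(1+k), whatever the rate.

```python
import math


def normal_tail_fraction(k: float) -> float:
    """Share of a normal distribution lying more than k standard deviations above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def exponential_tail_fraction(k: float) -> float:
    """Share of an exponential distribution lying more than k standard deviations above the mean.

    For Exp(rate), mean = std = 1/rate, so the cutoff mean + k*std is (1 + k)/rate
    and P(X > (1 + k)/rate) = exp(-(1 + k)), independent of the rate.
    """
    return math.exp(-(1 + k))


def expected_alerts(n_values: int, tail_fraction: float) -> float:
    """Expected number of flagged values when a fixed share of the data exceeds the cutoff."""
    return n_values * tail_fraction


if __name__ == "__main__":
    k = 3.0  # illustrative threshold, not taken from the original talk
    for n in (1_000, 100_000):
        normal = expected_alerts(n, normal_tail_fraction(k))
        expo = expected_alerts(n, exponential_tail_fraction(k))
        print(f"{n} values: ~{normal:.1f} alerts if normal, ~{expo:.1f} if exponential")
```

At k = 3 this works out to roughly 0.13% of normally distributed values versus about 1.8% of exponentially distributed values, and in both cases the expected number of alerts grows linearly with the number of events.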

What about MLTK?

Splunk’s Machine Learning Toolkit (MLTK) adds machine learning capabilities to Splunk, including an algorithm for anomaly detection called DensityFunction. DensityFunction, however, has limitations with large datasets.
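
The general idea behind a density-based detector can be sketched in a few lines: fit a distribution to the data, then flag values that land where the fitted distribution says they are very unlikely. The Python below is a simplified illustration of that idea, not MLTK's DensityFunction implementation; the exponential fit, the function names, and the 0.01 threshold are all assumptions made for the example.

```python
import math
from typing import List


def fit_exponential_rate(values: List[float]) -> float:
    """Maximum-likelihood estimate of an exponential distribution's rate: 1 / sample mean."""
    if not values:
        raise ValueError("need at least one value to fit")
    mean = sum(values) / len(values)
    if mean <= 0:
        raise ValueError("an exponential fit needs a positive mean")
    return 1.0 / mean


def density_outliers(values: List[float], threshold: float = 0.01) -> List[float]:
    """Return values whose upper-tail probability under the fitted exponential is below threshold.

    Assumed rule for illustration: for Exp(rate), P(X > x) = exp(-rate * x).
    """
    rate = fit_exponential_rate(values)
    return [v for v in values if math.exp(-rate * v) < threshold]


if __name__ == "__main__":
    daily_logins = [1, 2, 1, 3, 2, 1, 2, 40]  # made-up counts for illustration
    print(density_outliers(daily_logins))
```

Note that the outlier itself raises the sample mean and therefore the fitted scale, so a large enough spike can partly mask itself; a real detector has to account for that when choosing the distribution and threshold.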

Using streamstats to get neighboring values

The streamstats command can mimic how an analyst investigates an alert by calculating each value's distance from its nearest neighbors. For example, if a count is significantly higher than the other counts observed over the past 30 days, consider it anomalous. For a correlation search, we need to make sure we're pulling in the data we want and that it's normalized. It can also be useful to add more metrics to filter on.
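
A minimal sketch of the nearest-neighbor idea follows, assuming the input is something like one user's daily event counts over the past 30 days. Sorting the counts and pairing each one with the value just below it roughly mirrors an SPL pipeline such as sort count | streamstats current=f window=1 last(count) as prev_count, followed by an eval that subtracts prev_count from count; current=f excludes the current event and window=1 keeps only the preceding one. The Python function names and the ratio-based rule for calling the top value anomalous are hypothetical choices for illustration, not the method from the original talk.

```python
from typing import List, Optional, Tuple


def neighbor_gaps(counts: List[int]) -> List[Tuple[int, Optional[int]]]:
    """Sort counts ascending and pair each with its gap to the next-lower value.

    The lowest value has no lower neighbor, so its gap is None (like the first row
    produced by streamstats current=f window=1 last(count)).
    """
    rows: List[Tuple[int, Optional[int]]] = []
    prev: Optional[int] = None
    for c in sorted(counts):
        rows.append((c, None if prev is None else c - prev))
        prev = c
    return rows


def top_value_is_anomalous(counts: List[int], ratio: float = 3.0) -> bool:
    """Hypothetical rule: flag the largest count when its gap to the next-lower count
    is more than `ratio` times the largest gap between any other neighboring counts."""
    gaps = [gap for _, gap in neighbor_gaps(counts) if gap is not None]
    if len(gaps) < 2:
        return False  # too few values to compare neighbors
    top_gap, other_gaps = gaps[-1], gaps[:-1]
    baseline = max(max(other_gaps), 1)  # avoid a zero baseline when values repeat
    return top_gap > ratio * baseline


if __name__ == "__main__":
    quiet = [10, 12, 11, 13, 12, 11, 10, 14]
    spike = [10, 12, 11, 13, 12, 11, 10, 95]
    print(top_value_is_anomalous(quiet), top_value_is_anomalous(spike))
```

Comparing against the largest of the other gaps is a deliberately conservative baseline; a median gap, or a gap expressed as a percentage of the neighboring value, would be equally reasonable variants to try.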

Conclusion

Base your detection method on what an outlier actually looks like in your data. If standard deviation surfaces those outliers, stick with it. In my experience, however, standard deviation produces more noise than actionable results for our use cases.

Calculating the distance from each value to its nearest neighbors works well and reliably surfaces anomalous results. Applying this method lets analysts focus on abnormal behavior, which reduces their workload.

Looking for more details? See the extended content here!

More articles by this author

  • 7 Steps to a Proactive Vulnerability Management Plan Sep 15, 2022
  • First Look: Splunk 9.0 Configuration Change Logging Jun 15, 2022
  • Splunk Indexer Clustering: Your Hero in the Fight Against Data Loss Jun 8, 2022
  • How to Reduce Your Organization’s Vulnerability to Social Engineering May 11, 2022
  • Getting Started with Automation Before You’re Ready to SOAR Apr 27, 2022
  • The Russia-Ukraine War: Malware Risks and Mitigations Apr 5, 2022
  • 6 Tips for Wireless Security Jan 20, 2022
  • Console Wars Part 1: Hacks for Hackers Dec 20, 2021
  • Ingesting a CSV file into Splunk Dec 9, 2021
  • Malware Analysis Part 3: The phases and roles of incident response Nov 23, 2021

Others also viewed

  • Splunk Discovery Day Moscow 2018 Alexander Leonov 5y
  • Data Analysis made easy! Hema Mohan 7y
  • Splunk ITSI Favorites Day #2 Emily Duncan 6y
  • A curious case of field formatting in Splunk and Datadog Alex Gerulaitis 3y
  • PowerProtect Data Manager: Automating Virtual Machine Whitespace Reporting Cliff Rodriguez 2mo
  • Rocana Vs. Splunk: IT Operations Management Battle Of Words Jason Bloomberg 8y
  • Fire the Detective: Transparency in Data Ken Weston 6y
  • Simplify Threat Management while Minimizing Risk and Safeguarding your Business Rajan Sharma 7y
  • G2X GovCon Market Research: Weekly Roundup G2Xchange 5mo
  • BIG DATA ! Dirk Reinders 7y
