Best book to be a Splunk ninja?
August 3, 2020 1:52 PM

Hi all, back for a book recommendation, this time on Splunk.

Looking for something to assist with building apps and dashboards, optimizing queries, and troubleshooting performance issues. I've got ad hoc query syntax down cold, but I'm now in a position where my queries need to become reports or dashboards and run quickly. Any recommendations? I'm also happy to take links to blogs by engineers worth reading on these topics.
posted by bfranklin to Computers & Internet (2 answers total) 3 users marked this as a favorite
 
I don't know of a good book recommendation; I've always worked from the online references on Splunk's web site.

My core model is that we're finding structure in unstructured data, then building a reliable pipeline to ingest, transform, and present that data. I like that the search syntax builds pipelines which transform data: the '|' model looks to me like working on the unix command line (see the sketch below). Splunk extends the paradigm across search heads and indexers, where you break the pipeline into saved, high-water-mark summary indexes that the indexers populate on a schedule. But you're still working with ingest -> processing -> index -> more processing -> visualisation.
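For instance, a minimal SPL pipeline (the index, sourcetype, and field names here are made up for illustration) reads events, transforms them with stats, and presents a sorted table, each '|' handing results to the next stage just like a unix pipe:

    index=web sourcetype=access_combined status>=500
    | stats count AS errors BY host
    | sort - errors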

The key win of moving work from a search head to an indexer is that the effort of preparing a search isn't spent at search time but in periodic chunks as new data arrives for ingest. You also need a policy for 'ageing off' data that's indexed but that you'll never look at again, or you'll be swamped by old full-content data rather than the summary statistics and dashboards you actually use; Splunk has retention settings and deletion tasks for that. A sketch of both ideas follows.
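As a sketch of that split (index names, the schedule, and the retention period are my assumptions, not a recipe from the docs): a scheduled search pre-computes the expensive part over the latest chunk of data and writes the results to a summary index with collect,

    index=web status>=500 earliest=-1h@h latest=@h
    | stats count AS errors BY host
    | collect index=summary_web_errors

so the dashboard then reads the small summary instead of re-scanning raw events:

    index=summary_web_errors
    | timechart span=1h sum(errors) AS errors

Ageing off is set per index in indexes.conf; buckets older than the frozen period are deleted by default:

    [summary_web_errors]
    # roll buckets to frozen (deleted unless archived) after ~90 days
    frozenTimePeriodInSecs = 7776000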

* Use the Jobs dashboard to monitor how much data and processing time your searches consume, and use those numbers to confirm that your indexing really is making searches more efficient (see the sketches after this list).
* A year or so ago, sub-searches ran over the whole available time frame rather than the restricted time frame of your outer search (I haven't checked whether that's still the case, or whether it matters when your indexers aren't search heads), and that will wreck attempts to write efficient searches.
* Dropping data early, by keeping only the fields you actually need, helps you write efficient searches.
* Extending that 'dropping data' thought to indexes lets you keep a low number of records (IIRC below 20,000) moving between stages in the pipeline. If searches or index jobs are timing out, shrink the time frame so you're guaranteed to process fewer records rather than failing outright.
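A couple of sketches for those bullets. Job statistics can also be pulled with the REST search command; the endpoint exists, though the exact field names here are from memory and worth checking against your version:

    | rest /services/search/jobs
    | table title runDuration eventCount resultCount

For field-dropping, the fields command keeps only what you list (index and field names below are made up for illustration):

    index=web earliest=-15m
    | fields host, status, uri
    | stats count BY status

And if you do lean on a sub-search, pin its window explicitly inside the brackets rather than trusting it to inherit the outer search's time frame:

    index=web [ search index=assets earliest=-24h latest=now | dedup host | fields host ]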

Dashboards need things that will help you make decisions (sketches of the threshold and extremeness questions follow the list):
  - is everything still OK?
  - is [item] present or missing?
  - has it crossed a threshold?
  - how much headroom is there?
  - when was the last time it was this high/low a value?
  - how extreme is this metric in terms of variance, or frequency (one-in-ten-thousand, one-in-a-million, one-in-a-billion)?
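A couple of those questions translate directly into panel queries. For a threshold check, a single-value panel can turn a count into a decision (the index name and the threshold of 100 are illustrative):

    index=summary_web_errors earliest=-1h
    | stats sum(errors) AS errors
    | eval state=if(errors > 100, "ALERT", "OK")

And for "how extreme is this metric", a z-score against the window's own mean and standard deviation (assuming a numeric response_time field) gives a quick read:

    index=web response_time=*
    | eventstats avg(response_time) AS mean, stdev(response_time) AS sd
    | eval zscore=round((response_time - mean) / sd, 2)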
posted by k3ninho at 1:53 AM on August 5, 2020


Update: A quick search of the recommendation blogs suggests the GoSplunk.com community might have better answers than we do here. That said, Splunk Operational Intelligence Cookbook caught my eye as a set of recipes you can work through (learn by doing), with praise for being accessible to people who aren't IT experts.
posted by k3ninho at 4:53 AM on August 6, 2020


