====== [hemmerling] Logfile Data Processing / Logfile Analysis / IT Operations Management ======
Related page:
*[[mathengineering.html|Mathematical Engineering]].
===== Events =====
*Online Hackathon [[http://splunkapptitude2.devpost.com/|Devpost "Splunk Apptitude App Challenge"]], 2015.
===== Logstash =====
*The open-source [[http://www.elastic.co/products/logstash|Elasticsearch BV "Logstash"]], written in Ruby - "Collect, Enrich, and Transport Data".
*[[http://www.elastic.co/downloads/logstash|Elasticsearch BV - Downloads "Logstash"]].
*[[http://www.elastic.co/community|Elasticsearch BV "Community"]].
*Blog [[http://www.elastic.co/blog|Elastic Blog]].
*[[http://www.github.com/elastic/logstash|GitHub "elastic/logstash"]] - "logstash - transport and process your logs, events, or other data".
*[[http://www.twitter.com/elastic|Twitter "elastic, @elastic"]].
===== Rocana =====
*[[http://www.rocana.com/|Rocana]].
*Here you may order the free PDF e-book [[http://info.rocana.com/ebook-real-big-data-for-it-ops|Rocana "e-Book: Using R.E.A.L. Big Data for IT Operations"]].
*The free PDF e-book [[http://info.rocana.com/hubfs/Rocana_REALBigData_eBook.pdf|Using R.E.A.L. Big Data for IT Operations]] ( PDF ).
*Experts consider Rocana a more advanced alternative to Splunk.
===== Splunk =====
==== The Tool ====
*The Java application [[http://www.splunk.com/|Splunk]] - "Operational Intelligence, Log Management ...".
*[[http://www.splunk.com/de_de/solutions/solution-areas/log-management.html|Splunk "Log management solutions: evaluating log data for insights into what is going on in the enterprise"]] - "Splunk provides the industry-leading software for consolidating and indexing log and machine data, including structured, unstructured and complex, multi-line application log data".
*The free [[http://www.splunk.com/en_us/download/universal-forwarder.html|Splunk "Splunk Universal Forwarder"]] for Windows, Linux, MacOSX,...
*[[http://www.github.com/splunk|GitHub "Splunk"]] - as Splunk is not open source, there are just a few additional tools published here...
*[[http://www.splunk.com/en_us/community.html|Splunk Community]].
*[[http://dev.splunk.com/|Splunk Developer Portal]].
*[[http://splunkbase.splunk.com/|Splunkbase]] - Splunk's app store.
*Wiki [[http://wiki.splunk.com/|Splunk Wiki]].
*Blogs [[http://blogs.splunk.com/|Splunk Blogs]].
*Blog [[http://blogs.splunk.com/dev/|Splunk Blog - Category "Dev"]].
*[[http://www.twitter.com/splunkdev|Twitter "Splunk Dev, @splunkdev"]].
*Download the [[http://docs.splunk.com/images/Tutorial/tutorialdata.zip|tutorial data file]], but do not uncompress it!
==== The free local Splunk Platform ====
*The Splunk Web interface is at [[http://localhost:8000|http://localhost:8000]].
*username: "admin".
*password: "changeme".
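*Besides the web interface, searches can also be run with the Splunk command-line client. The following is a minimal sketch only, assuming a default Windows installation under "C:\Program Files\Splunk" and the stock "admin"/"changeme" credentials listed above; the query against the internal index "_internal" is just an illustrative assumption.
<code>
REM Minimal sketch: query the local Splunk instance via the Splunk CLI.
REM Assumes the default installation path and the credentials shown above.
cd "C:\Program Files\Splunk\bin"
splunk status
splunk search "index=_internal | head 5" -auth admin:changeme
</code>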
==== Literature ====
*Book [[http://www.amazon.de/exec/obidos/ASIN/1849693285/hemmerling-21|Vincent Bumgarner "Implementing Splunk: Big Data Reporting and Development for Operational Intelligence"]], 2013.
*Book [[http://www.amazon.de/exec/obidos/ASIN/0982550677/hemmerling-21|David Carasso "Exploring Splunk"]], 2012.
*Book [[http://www.amazon.de/exec/obidos/ASIN/1849697841/hemmerling-21|Josh Diakun, Paul R Johnson, Derek Mock "Splunk Operational Intelligence Cookbook"]], 2014.
*Book [[http://www.amazon.de/exec/obidos/ASIN/1514615746/hemmerling-21|Grigori Melnik, Dominic Betts "Building Splunk Solutions (Second edition): Splunk Developer Guide"]], 2015.
*Accompanying website [[http://dev.splunk.com/view/dev-guide/SP-CAAAE2R|Splunk Developer Guidance]].
*Book [[http://www.amazon.de/exec/obidos/ASIN/1782173838/hemmerling-21|James Miller "Mastering Splunk"]], 2014.
*"Splunk Developer's Guide.
*Book [[http://www.amazon.de/exec/obidos/ASIN/1784398381/hemmerling-21|Betsy Page Sigman "Splunk Essentials"]], 2015.
*Book [[http://www.amazon.de/exec/obidos/ASIN/B01AJST0TY/hemmerling-21|Erickson Delgado, Betsy Page Sigman "Splunk Essentials - Second Edition Kindle Edition"]], 2016.
*Kindle e-book [[http://www.amazon.de/exec/obidos/ASIN/1785882376/hemmerling-21|Kyle Smith "Splunk Developer's Guide - Second Edition"]], 2016 ( no paper edition yet available or announced ).
*Book [[http://www.amazon.de/exec/obidos/ASIN/143025761X/hemmerling-21|Peter Zadrozny, Raghu Kodali "Big Data Analytics Using Splunk: Deriving Operational Intelligence from Social Media, Machine Data, Existing Data Warehouses, and Other Real-Time Streaming Sources"]], 2013.
=== Training & Tutorial & Tips & Tricks ===
*[[http://www.splunk.com/en_us/download/universal-forwarder/thank-you.html|Splunk "Let's get started"]].
*[[http://docs.splunk.com/Documentation/SplunkLight|Splunk Documentation, Manuals "Splunk Light"]] - "Splunk Light delivers full-featured log search and analysis for small businesses and workgroups".
*Documentation as online HTML website & PDF.
*[[http://docs.splunk.com/Documentation/Splunk/latest/SearchTutorial|Splunk Knowledgebase "Search Tutorial"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Search/Identifyeventpatterns|Splunk Knowledgebase "Search Tutorial" / "Identify event patterns with the Patterns tab"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Search/Usethesearchcommand|Splunk Knowledgebase "Search Tutorial" / "Search command primer"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data|Splunk Knowledgebase "Getting Data In"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/Whysourcetypesmatter|Splunk Knowledgebase "Getting Data In" / "Why source types matter"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/Listofpretrainedsourcetypes|Splunk Knowledgebase "Getting Data In" / "List of pretrained source types"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/Createsourcetypes|Splunk Knowledgebase "Getting Data In" / "Create source types"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/Managesourcetypes|Splunk Knowledgebase "Getting Data In" / "Manage source types"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/Configuretimestamprecognition|Splunk Knowledgebase "Getting Data In" / "Configure timestamp recognition"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps|Splunk Knowledgebase "Getting Data In" / "How timestamp assignment works"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/ConfigurePositionalTimestampExtraction|Splunk Knowledgebase "Getting Data In" / "Configure timestamp assignment for events with multiple timestamps"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Data/Extractfieldsfromfileheadersatindextime|Splunk Knowledgebase "Getting Data In" / "Extract data from files with headers"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Forwarding|Splunk Knowledgebase "Forwarding Data"]] - "Install the universal forwarder software".
*[[http://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Theuniversalforwarder|Splunk Knowledgebase "Forwarding Data" / "The universal forwarder"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Knowledge|Splunk Knowledgebase "Knowledge Manager Manual"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Aboutfields|Splunk Knowledgebase "Knowledge Manager Manual" / "About fields"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Indexer/|Splunk Knowledgebase "Managing Indexers and Clusters of Indexers"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/AboutSplunkregularexpressions|Splunk Knowledgebase "About Splunk Enterprise regular expressions"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/SearchReference|Splunk Knowledgebase "Search Reference"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/CommonEvalFunctions|Splunk Knowledgebase "Search Reference" / "Evaluation functions"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Rex|Splunk Knowledgebase "Search Reference" / "rex"]].
*[[http://wiki.splunk.com/Community:RegexTestingTools|Splunk Wiki "Community:RegexTestingTools"]] - "Some helpful tools for writing regular expressions".
*[[http://docs.splunk.com/Documentation/Splunk/latest/admin|Splunk Knowledgebase "Admin Manual"]].
*[[http://docs.splunk.com/Documentation/Splunk/latest/admin/Propsconf|Splunk Knowledgebase "Admin Manual" / "props.conf"]].
*[[http://answers.splunk.com/|Splunk Answers]].
*[[http://answers.splunk.com/topics/max_days_ago.html|Splunk Answers - Search for "max_days_ago"]].
*[[http://answers.splunk.com/answers/833/how-does-splunk-determine-the-date-when-there-is-no-date-stamp-in-the-event.html|Splunk Answers "How does Splunk determine the date, when there is no date stamp in the event?"]].
*[[http://answers.splunk.com/answers/22968/exclude-events-with-specific-field-value-from-results.html|Splunk Answers "Exclude events with specific field value from results"]].
*[[http://answers.splunk.com/answers/41266/dateparserverbose-timestamp-match-is-outside-of-the-acceptable-time-window.html|Splunk Answers "DateParserVerbose - timestamp match is outside of the acceptable time window"]].
*[[http://answers.splunk.com/answers/52257/how-to-exclude-some-result.html|Splunk Answers "How to exclude some result"]].
*[[http://answers.splunk.com/answers/82116/ignoring-a-specific-portion-of-the-log-file-header-footer.html|Splunk Answers "Ignoring a specific portion of the log file (header/footer)"]].
*[[http://answers.splunk.com/answers/133285/whats-the-earliest-date-i-can-have-in-splunk.html|Splunk Answers "What's the earliest date I can have in Splunk?"]].
*Error message: "The TIME_FORMAT specified is matching timestamps (Sun Dec 31 00:00:00 2006) outside of the acceptable time window. If this timestamp is correct, consider adjusting MAX_DAYS_AGO and MAX_DAYS_HENCE. Failed to parse timestamps. Default to timestamps of previous event (Wed Dec 31 00:00:00 2014)".
*Answer: 1971-01-01 ( and not 1970-01-01, as you might expect from reading [[http://en.wikipedia.org/wiki/Unix_time|EN.Wikipedia "Unix time"]], nor 1970-12-31! ).
*[[http://answers.splunk.com/answers/134553/how-to-delete-data-index-reset-start-from-scratch.html|Splunk Answers "How to delete data / index (reset start from scratch)"]].
==== Configuration Tips ====
=== Where to configure? ===
*Configuration by:
**File "C:\Program Files\Splunk\etc\system\local\props.conf".
**At "Add Data / Set Sourcetype".
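*As an illustration of the file-based option, here is a minimal, untested sketch of a "props.conf" stanza for a custom CSV source type. The stanza name "vornamen" and the field name "Datum" are only taken from the examples further below; the combination of attributes is an assumption, not a configuration from these notes.
<code>
# Hypothetical stanza in C:\Program Files\Splunk\etc\system\local\props.conf
# for a CSV source type named "vornamen" with an ISO date in the field "Datum".
[vornamen]
INDEXED_EXTRACTIONS = csv
FIELD_DELIMITER = ,
TIMESTAMP_FIELDS = Datum
TIME_FORMAT = %Y-%m-%d
# MAX_DAYS_AGO = -1 follows the Splunk Answers hint quoted below, for data older than the default window.
MAX_DAYS_AGO = -1
</code>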
=== How to configure? ===
*The ISO date format is easily recognized by the pattern - section "Timestamp", field "Timestamp format", value "%Y-%m-%d".
*Question: "The largest setting I can make for MAX_DAYS_AGO according to the props.conf.spec is 10951 days. Is there anything I can do if I have data prior to 1984?".
*Answer #1: "You can just set it to -1".
*Example configurations of "props.conf" & "Add Data / Set Sourcetype / Timestamp" settings:
  Timestamp format = %Y-%m-%d
  Timestamp fields = Datum, Uhrzeit
*Example configurations of "props.conf" & "Add Data / Set Sourcetype / Delimiter" settings:
  Field preamble = ^#.*
  Field names = Auto
  Field names on line number = (empty)
*Another example configuration of "props.conf" & "Add Data / Set Sourcetype / Delimiter" settings:
  Field names on line number = 10
  Field names on line number = 2
*Example configurations of "props.conf" & "Add Data / Set Sourcetype / Advanced" settings:
  MAX_DAYS_AGO = -1
==== Sample Search Quests ====
<code>
sourcetype=vornamen Anzahl_Knaben!=Anzahl
sourcetype="vornamen" Datum="2006-12-31" | top 4 Vornamenstatistik Haeufigkeiten
sourcetype="vornamen" Datum="2007-12-31" | top 4 Vornamenstatistik Haeufigkeiten
sourcetype = "vornamen" Datum != "#" | top 4 Vornamenstatistik Haeufigkeiten by Datum
sourcetype = "vornamen" Datum != "#" | top 4 Haeufigkeiten by Vornamenstatistik, Datum
</code>
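Building on the sample searches above, here is a hedged sketch of a stats-based variant. The source type and field names are taken from the examples above, but the query itself is an illustrative assumption, not one of the original sample searches; it also illustrates the difference between "Count" and "Distinct Count" mentioned in the resources notes below.
<code>
sourcetype="vornamen" Datum="2006-12-31"
| stats count(Haeufigkeiten) AS Count dc(Haeufigkeiten) AS Distinct_Count
</code>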
==== Resources ====
*Limit of the free Splunk edition: "The maximum file upload size is 500 MB".
*Splunk expects CSV data with "," as the delimiter. By default, OpenOffice & LibreOffice save CSV files with a space " " as the delimiter.
*Check the "Edit Filter Settings" box in the save dialog, else the default delimiter is a space instead of ","!
*[[http://answers.launchpad.net/ubuntu/+source/openoffice.org/+question/194976|Ubuntu Answers "can't change delimiter when a file is already saved as csv"]].
**If the delimiter is the default "," , ...
*At "Add Data", Splunk accepts data with field names containing spaces ( e.g. "Number of Persons" ).
*At "Search & Reporting / Pivot", "Fields - Which fields would you like to use as a Data Model", such data fields with field names containing spaces may be selected for use in a Data Model.
*But at "New Pivot", "Add X-Axis" and "Add Y-Axis", such data fields with field names containing spaces are not available as fields :-(.
*So far, I didn't test whether data values may also not contain spaces ( e.g. "One Person" ).
*Statistical software such as Splunk expects the high data quality that is available with computer-generated data.
*If your data values are created manually, and even have non-numeric data fields ( e.g. for the x-axis "One Person", "Two Persons", "Three Persons", "More than three Persons" - the last one can't even be transformed into a numeric value ), typing mistakes result in unintended extra values.
*So you must check such fields in Splunk's "Data Summary", to verify that there is just the expected number of values for this field ( e.g. in our example 4 values, and not 5 caused by a typing mistake like "One Persons" ).
*On the other hand, numeric fields suitable for the y-axis might have repeated values, which might cause a misleading count in "Data Summary". Remember the difference between "Count" and "Distinct Count" in Splunk ( see the stats sketch above ).
*Splunk expects:
**Logfiles with a date stamp in each data row. The minimum date stamp is the full date of a day, e.g. 2015-11-01.
**So data which is stored in a separate file for each date ( e.g. for each day ) is not suitable for Splunk.
**So data which is not organized by a date stamp ( but e.g. by the zip code of a country ) is not suitable for Splunk.
**Additionally, Splunk offers the date of the file as the field "_time". This doesn't help much for building a timestamp for data processing, if the file was generated, downloaded or modified manually, or if the date of the file is irrelevant for other reasons.
**Descriptions of each data table column ( "Field" ) at the top of the file.
**You may specify in the "Source Type" configuration that the field names are on a certain line number.
**However, there is no option to define a final line. So data garbage at the end of the file might disturb the data processing. In particular, it would be hard or impossible to put 2 different data sources into one physical file, defined by a single "Source Type".
*[[http://msdn.microsoft.com/en-us/library/ms235560%28v=vs.90%29.aspx|Microsoft Developers Network "C Run-Time Error R6034"]].
*Error message when installing Splunk on Windows 8.1, 32-bit, where "Python(x,y)" is already installed:
<code>
Microsoft Visual C++ Runtime Library
Runtime Error!
Program: C:\Program Files\Splunk\bin\Python.EXE
R6034
An application has made an attempt to load the C runtime library incorrectly.
Please contact the application's support team for more information.
</code>
*[[http://answers.splunk.com/answers/60706/splunk-services-wont-start-after-install-error.html|Splunk Answers "Splunk services won't start after install error"]].
*[[http://answers.splunk.com/answers/312901/why-am-i-getting-error-r6034-an-application-has-ma.html|Splunk Answers "Why am I getting error "R6034 An application has made an attempt to load the C runtime library incorrectly." during a Splunk 6.3 installation?"]].
*[[http://answers.splunk.com/answers/4444/splunk-error-runtime-error-r6034.html|Splunk Answers "Splunk error - runtime error R6034"]] - "I uninstalled ActiveState's Python, and now splunkd starts right up".
*[[http://en.wikipedia.org/wiki/Unix_time|EN.Wikipedia "Unix time"]], [[http://de.wikipedia.org/wiki/Unixzeit|DE.Wikipedia "Unixzeit"]].
*[[http://en.wikipedia.org/wiki/Splunk|EN.Wikipedia "Splunk"]], [[http://de.wikipedia.org/wiki/Splunk|DE.Wikipedia "Splunk"]] - "The freeware version is limited to 500 MB of data a day, and lacks some features of the Enterprise license edition".
===== Resources =====
*[[http://www.forbes.com/sites/jasonbloomberg/2015/11/25/rocana-vs-splunk-it-operations-management-battle-of-words/|Forbes Tech "Rocana Vs. Splunk: IT Operations Management Battle Of Words"]], 2015.
{{tag>"mathematical engineering" "logfile data processing" logfile data processing "logfile analysis" logfile analysis "it operations management" it operations management}}