[hemmerling] Logfile Data Processing / Logfile Analysis / IT Operations Management

The Tool

The free local Splunk Platform


Training & Tutorial & Tips & Tricks

Configuration Tips

Where to configure?

  • Configuration is done either:
    • in the file “C:\Program Files\Splunk\etc\system\local\props.conf”, or
    • in the UI at “Add Data / Set Sourcetype”.

How to configure?

  • The ISO date format is easily recognized by Splunk with the following setting:
    Section “Timestamp”
    Field: Timestamp format
    Value: %Y-%m-%d
  • Question: “The largest setting I can make for MAX_DAYS_AGO according to the props.conf.spec is 10951 days. Is there anything I can do if I have data prior to 1984?”
  • Answer #1: “You can just set it to -1”.
  • Example configurations of “props.conf” & “Add Data / Set Sourcetype / Timestamp” settings:
    Timestamp format = %Y-%m-%d
    Timestamp fields = Datum, Uhrzeit
  • Example configurations of “props.conf”, “Add Data / Set Sourcetype / Delimiter settings”:
    Field preamble = ^#.*
    Field names = Auto
    Field names on line number = (empty)
  • Example configurations of “props.conf”, “Add Data / Set Sourcetype / Delimiter settings”:
    Field names on line number = 10
    Field names on line number = 2
  • Example configurations of “props.conf”, “Add Data / Set Sourcetype / Advanced settings”:
    MAX_DAYS_AGO = -1
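
The “Timestamp format” values above are strptime-style patterns. A quick way to verify that a pattern such as %Y-%m-%d really matches your data is Python's datetime module ( the sample date values below are invented, not taken from a real logfile ):

```python
from datetime import datetime

# "%Y-%m-%d" matches ISO dates such as the values of the "Datum" field
d = datetime.strptime("2006-12-31", "%Y-%m-%d")
print(d.year, d.month, d.day)  # 2006 12 31

# A non-matching value raises ValueError - roughly what happens
# when Splunk cannot extract a timestamp with the configured format.
try:
    datetime.strptime("31.12.2006", "%Y-%m-%d")
except ValueError:
    print("no match")
```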

Sample Search Queries

sourcetype="vornamen" Anzahl_Knaben!=Anzahl
sourcetype="vornamen" Datum="2006-12-31" | top 4 Vornamenstatistik Haeufigkeiten
sourcetype="vornamen" Datum="2007-12-31" | top 4 Vornamenstatistik Haeufigkeiten
sourcetype="vornamen" Datum!="#" | top 4 Vornamenstatistik Haeufigkeiten by Datum
sourcetype="vornamen" Datum!="#" | top 4 Haeufigkeiten by Vornamenstatistik, Datum
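
In these searches, “top 4 <field>” returns the 4 most frequent values of a field, optionally split per group with “by”. As a rough, hypothetical analogue in Python ( the sample events below are invented, not real “vornamen” data ):

```python
from collections import Counter

# Invented sample events, mimicking rows of the "vornamen" sourcetype
events = [
    {"Datum": "2006-12-31", "Vornamenstatistik": "Anna"},
    {"Datum": "2006-12-31", "Vornamenstatistik": "Anna"},
    {"Datum": "2006-12-31", "Vornamenstatistik": "Ben"},
    {"Datum": "2007-12-31", "Vornamenstatistik": "Clara"},
]

# Roughly: sourcetype="vornamen" Datum="2006-12-31" | top 2 Vornamenstatistik
counts = Counter(e["Vornamenstatistik"] for e in events
                 if e["Datum"] == "2006-12-31")
print(counts.most_common(2))  # [('Anna', 2), ('Ben', 1)]
```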


  • Limit of the free Splunk edition: “The maximum file upload size is 500 MB”.
  • Splunk expects CSV data with “,” as delimiter. By default, OpenOffice & LibreOffice save with a space “ ” as delimiter.
    • Check the “Edit Filter Settings” box in the save dialog, else the default delimiter is a space instead of “,”!
    • If the delimiter is the default “,”:
      • At “Add Data”, Splunk accepts data with field names containing spaces ( e.g. “Number of Persons” ).
      • At “Search & Reporting / Pivot”, “Fields - Which fields would you like to use as a Data Model”, such data fields with field names containing spaces may be selected for use as a Data Model.
      • But at “New Pivot”, “Add X-Axis” and “Add Y-Axis”, such data fields with field names containing spaces are not available as fields :-(.
      • So far, I didn't test whether data values containing spaces ( e.g. “One Person” ) cause similar problems.
      • Statistical software such as Splunk expects high data quality, as is typical of computer-generated data.
        • If your data values are created manually, and even include non-numeric data fields ( e.g. for the x-axis “One Person”, “Two Persons”, “Three Persons”, “More than three Persons” - the last one can't even be transformed into a numeric value ), typing mistakes create spurious distinct values.
        • So you must check such fields in Splunk's “Data Summary”, to verify that only the expected number of values occurs for this field ( e.g. in our example 4 values, and not 5 caused by a typo like “One Persons” ).
        • On the other hand, numeric data of numeric fields suitable for the y-axis might have repeated values, which can make the counts in “Data Summary” misleading. Remember the difference between “Count” and “Distinct Count” in Splunk.
  • Splunk expects:
    • Logfiles with a date stamp in each data row. The minimum date stamp is a full calendar date, e.g. 2015-11-01.
      • So data which is stored in a separate file per date ( e.g. one file per day ) is not suitable for Splunk.
      • So data which is not organized by a date stamp ( but e.g. by the zip codes of a country ) is not suitable for Splunk.
      • Additionally, Splunk offers the date of a file as the field “_time”. This doesn't help much for building a timestamp for data processing if the file was generated, downloaded or modified manually, or if the date of the file is irrelevant for other reasons.
    • Descriptions of each data table column ( “Field” ) at the top of the file.
      • You may specify in the “Source Type” configuration that the field names are on a certain line number.
      • However, there is no option to define a final line, so data garbage at the end of the file might disturb the data processing. In particular, it would be hard or impossible to put 2 different data sources into one physical file defined by a single “Source Type”.
  • EN.Wikipedia "Splunk", DE.Wikipedia "Splunk" - “The freeware version is limited to 500 MB of data a day, and lacks some features of the Enterprise license edition”.
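
The data-quality check described above ( counting the distinct values of a manually typed categorical field to catch typos ) can also be done before uploading to Splunk. A minimal sketch in Python, with invented values:

```python
# Pre-upload sanity check for a categorical field (values invented):
# count the distinct values and flag any that are not expected.
expected = {"One Person", "Two Persons", "Three Persons",
            "More than three Persons"}

values = ["One Person", "Two Persons", "One Persons",  # "One Persons" is a typo
          "Three Persons", "More than three Persons"]

distinct = set(values)
print(len(distinct))          # 5 instead of the expected 4
print(distinct - expected)    # {'One Persons'}
```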


en/logfileprocessing.html.txt · Last modified: 2023/09/29 22:23 (external edit)