LogStatistics


logstatsd and monitoring

Monitoring an application frequently involves monitoring its log file(s). Log files may contain hundreds or thousands of events per minute. Parsing the entire log file can be a very CPU-intensive task, which makes near-real-time reporting or monitoring difficult.

logstatsd was designed to solve this problem. logstatsd runs as a daemon on the server where the log file resides, parsing new entries as they are written to the log and storing statistics. The daemon can be signaled to export current XML reports or to update an RRD.
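The idea of parsing only new entries, rather than re-reading the whole file, can be sketched as follows. This is a minimal illustration in Python, not logstatsd's actual implementation; the function name read_new_lines and the byte-offset bookkeeping are assumptions made for the example.

```python
def read_new_lines(path, offset):
    """Return (complete lines appended since byte `offset`, new offset).

    Hypothetical helper: the caller remembers the returned offset and
    passes it back in on the next poll.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Consume only up to the last newline, so an entry that is still
    # being written is left for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A caller would start with an offset of 0 (or the current file size, to skip old entries) and call the function periodically, feeding each returned line to the statistics collector. Log rotation is not handled in this sketch.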

In addition to collecting near-real-time data for monitoring, logstatsd also makes an excellent tool for generating offline reports from a single log file, multiple log files, or even multiple log files scattered across multiple servers.

Log::Statistics

Log::Statistics parses log entries into fields and collects statistics about fields that you find interesting (e.g. date/time, duration, transaction name, status, end user locations, etc.). For example, if a transaction field is specified, the number of hits for each unique transaction will be counted. If a duration field is available in the log, then information about average response times of each transaction will also be recorded.
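The per-transaction hit counts and average response times described above can be sketched like this. The example is in Python and is not the Log::Statistics API; the function name summarize and the (transaction, duration) input shape are assumptions chosen for illustration.

```python
from collections import defaultdict


def summarize(entries):
    """Count hits and average duration per transaction.

    `entries` is an iterable of (transaction, duration_seconds) pairs;
    this input shape is an assumption of the sketch.
    """
    stats = defaultdict(lambda: {"count": 0, "total": 0.0})
    for transaction, duration in entries:
        stats[transaction]["count"] += 1
        stats[transaction]["total"] += duration
    return {
        name: {"count": s["count"], "average": s["total"] / s["count"]}
        for name, s in stats.items()
    }


report = summarize([("login", 0.5), ("search", 2.0), ("login", 1.5)])
# report["login"] is {"count": 2, "average": 1.0}
```

Keeping only a running count and total per transaction means memory use grows with the number of distinct transactions, not with the number of log entries.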

Additionally, statistics may be collected for multi-dimensional data. For example, if you specify collection of data about transaction name grouped with status, you can generate a report about the numbers of each status for each transaction. If you collect summary statistics about status grouped with time, then you might see statistics about the successful and unsuccessful transactions per minute. You can even group three or more fields, e.g. grouping status, transaction, and minute will show the statuses for each transaction broken down by minute. See the "Example XML output" section in the included documentation.
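One way to picture grouped statistics is to count entries per unique combination of field values. The following Python sketch illustrates that idea only; it is not how Log::Statistics stores its data, and the function name count_groups and the dictionary-based entries are assumptions.

```python
from collections import Counter


def count_groups(entries, fields):
    """Count entries per unique combination of the named fields."""
    counts = Counter()
    for entry in entries:
        counts[tuple(entry[field] for field in fields)] += 1
    return counts


# Hypothetical parsed entries, one dictionary per log line.
entries = [
    {"transaction": "login", "status": "200", "minute": "12:01"},
    {"transaction": "login", "status": "500", "minute": "12:01"},
    {"transaction": "search", "status": "200", "minute": "12:02"},
    {"transaction": "login", "status": "200", "minute": "12:02"},
]

by_tx_status = count_groups(entries, ["transaction", "status"])
by_status_minute = count_groups(entries, ["status", "minute"])
# by_tx_status[("login", "200")] is 2
# by_status_minute[("200", "12:02")] is 2
```

Grouping three fields works the same way: pass ["status", "transaction", "minute"] and each key becomes a three-element tuple.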

Thresholds may be defined to categorize response times. For example, by specifying two thresholds of 1 second and 5 seconds, you would gather data on the number of transactions that were less than 1 second, the number between 1 and 5 seconds, and the number over 5 seconds. See the "THRESHOLDS" section in the POD for more details.
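The threshold example above can be sketched as a bucket count. This Python sketch is illustrative and not the module's implementation; the function name bucket_counts is hypothetical, and placing a duration that exactly equals a threshold in the higher bucket is an assumption, since the text does not say how boundaries are handled.

```python
from bisect import bisect_right


def bucket_counts(durations, thresholds):
    """Count durations per threshold bucket.

    Returns len(thresholds) + 1 counts: below the first threshold,
    between each adjacent pair, and at or above the last one.
    """
    bounds = sorted(thresholds)
    counts = [0] * (len(bounds) + 1)
    for duration in durations:
        # bisect_right gives the number of thresholds <= duration,
        # which is the index of the bucket the duration falls into.
        counts[bisect_right(bounds, duration)] += 1
    return counts


counts = bucket_counts([0.2, 0.9, 1.0, 3.5, 7.0], [1, 5])
# counts is [2, 2, 1]: two under 1s, two from 1s up to 5s, one over 5s
```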

Log::Statistics parses formatted data in log files. Unlike some log processing tools which run a series of regexps on each log entry and count each match, Log::Statistics splits each entry into a series of fields using a single regexp. This makes it useful for files like an Apache access log or CSV files, but less useful for files with less predictable contents like an Apache error log.
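Splitting an entry into fields with a single regexp might look like the following Python sketch. The pattern targets the Apache Common Log Format and is an assumption for illustration; it is not the regexp that Log::Statistics ships with, and the field names are hypothetical.

```python
import re

# One pattern captures every field of a Common Log Format entry.
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
fields = parse_line(sample)
# fields["status"] is "200" and fields["path"] is "/apache_pb.gif"
```

A line that does not fit the pattern, such as a free-form error log message, returns None, which is why this approach suits predictable formats better than unstructured ones.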

Log files

In order to monitor useful data, you must start with a log file that contains the information you want to track. For web applications, the Apache access log can provide data such as end user information, status codes, transaction names, date/time, and durations. In some cases it may make sense to create a custom log for your application that contains entries for other events you want to monitor, e.g. database transaction times, back end system response times, etc.

Download

Similar Free / Open Source Projects

Log::Statistics was influenced by other modules including:

- Algorithm::Accounting - Algorithm::Accounting performs some of the same operations as Log::Statistics. The Log::Statistics algorithm adds handling for durations and response time thresholds, and is much more efficient.

- Logfile - Logfile is designed for offline file reporting. Log::Statistics adds handling for response time thresholds.

Pointers

CPAN

Planned features

- tutorial: real-time Apache monitoring
- field-specific filters, e.g. track transactions for given URLs with params
- read in a generated XML file and resume processing the log file where it last left off
- read in multiple generated XML files and generate a summary report
- handle multiple files on multiple servers in daemon mode

Updated Sun Jul 23, 2006 12:12 PM