Data Analysis Tools
From TracingWiki
Analysis tools produce views that help users understand traces. Views can be text-based, tables, or graphical displays, among other forms. A programmable data provider such as SystemTap may also double as a data analysis tool by filtering, analyzing, and rendering the results itself.
Common views included in analysis tools:
- Pretty-printed list of raw events
- Histogram showing the density of events through time
- State of the system through time (scheduling, state of processes, state of hardware or other logical resources)
- Dependency analysis
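The density histogram is essentially a count of events per time slice. The following is a minimal sketch, assuming timestamps are integers in a common unit (for example nanoseconds) and that the user has chosen the visible range `[start, end)` and a number of bins. The names `event_density` and `render` are hypothetical and do not correspond to any particular tool's API.

```python
from typing import Iterable


def event_density(timestamps: Iterable[int], start: int, end: int, num_bins: int) -> list[int]:
    """Count how many events fall into each of num_bins equal slices of [start, end)."""
    if end <= start or num_bins <= 0:
        raise ValueError("require end > start and num_bins > 0")
    counts = [0] * num_bins
    span = end - start
    for ts in timestamps:
        if start <= ts < end:
            # Integer arithmetic keeps bin assignment exact; the index is in [0, num_bins).
            counts[(ts - start) * num_bins // span] += 1
    return counts


def render(counts: list[int]) -> str:
    """Text rendering of the histogram: one row per bin, one '#' per event."""
    return "\n".join(f"{i:3d} | {'#' * c}" for i, c in enumerate(counts))
```

For example, `event_density([0, 5, 10, 15, 99], 0, 100, 4)` puts the first four events in bin 0 and the last in bin 3, giving `[4, 0, 0, 1]`. Zooming in simply means calling it again with a narrower range.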
Other useful features found in some tools:
- Filtering, searching for patterns
- Bookmarking
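Filtering, searching, and bookmarking combine naturally: a bookmark records a point in time, and a search returns the next event after it that matches a user-supplied predicate. This is a minimal sketch under the assumption that events are available as an in-memory sequence sorted by timestamp. The `Event` fields and event names such as `"irq_entry"` are hypothetical placeholders, not a real trace format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    timestamp: int  # assumed: integer time units, events sorted by timestamp
    cpu: int
    name: str       # hypothetical event names, e.g. "irq_entry"


Predicate = Callable[[Event], bool]


def filter_events(events: Iterable[Event], pred: Predicate) -> Iterator[Event]:
    """Lazily yield only the events that satisfy pred."""
    return (e for e in events if pred(e))


def find_next(events: Iterable[Event], pred: Predicate, after: int) -> Optional[Event]:
    """Return the first event strictly after time `after` that satisfies pred, or None."""
    for e in events:
        if e.timestamp > after and pred(e):
            return e
    return None


# Bookmarks (hypothetical): user-chosen labels mapped to timestamps.
bookmarks: dict[str, int] = {}
```

A search can then start from a bookmark, for instance `find_next(events, lambda e: e.name == "irq_entry", bookmarks["suspicious"])`. Because `filter_events` returns a generator, it can also be applied to events streamed from disk rather than held in memory.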
Data analysis tools face the challenge of remaining interactive while handling huge amounts of data. Even in compact binary form, traces can reach 10 gigabytes or more, which generally exceeds the amount of available RAM, so they cannot be loaded entirely into memory. When a user zooms in on a particular region of such a trace, the I/O and computation needed to extract state information can take a long time. One solution is to read the whole trace once, when it is first opened, and precompute a number of useful values and properties.
For instance, the state information can be computed incrementally as each event is processed, and saved at regular intervals. Later, when a user seeks to a specific time in the trace, the state at that point can be reconstructed by starting from the nearest preceding saved state and processing only the events in between, rather than starting from the beginning of the trace. A similar approach is to generate synthetic state events at regular points in the trace. These events directly provide the state information to display, which would otherwise require reading a large portion of the trace to compute.
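The checkpointing scheme can be sketched as follows. This minimal example assumes that the state of interest is "which process runs on each CPU", updated by hypothetical `SwitchEvent` records, and that events are sorted by timestamp. A Python list stands in for the trace file, so an event index plays the role of a file offset. All names here are illustrative, not taken from any real tool.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchEvent:
    timestamp: int  # assumed monotonically non-decreasing
    cpu: int
    next_pid: int   # hypothetical: process scheduled in on this CPU


State = dict[int, int]  # cpu -> pid currently running


def apply(state: State, ev: SwitchEvent) -> None:
    """Update the state with one event (the incremental computation)."""
    state[ev.cpu] = ev.next_pid


class CheckpointedTrace:
    def __init__(self, events: list[SwitchEvent], interval: int):
        self.events = events
        # Checkpoint k: timestamp of the last event covered, index of the next
        # event to process, and a snapshot of the state after events[:index].
        self.ckpt_ts: list[float] = [float("-inf")]
        self.ckpt_idx: list[int] = [0]
        self.ckpt_state: list[State] = [{}]
        state: State = {}
        # Single initial pass over the whole trace.
        for i, ev in enumerate(events, start=1):
            apply(state, ev)
            if i % interval == 0:
                self.ckpt_ts.append(ev.timestamp)
                self.ckpt_idx.append(i)
                self.ckpt_state.append(dict(state))

    def state_at(self, t: int) -> State:
        """State after all events with timestamp <= t."""
        # Last checkpoint whose covered events all lie at or before t.
        k = bisect.bisect_right(self.ckpt_ts, t) - 1
        state = dict(self.ckpt_state[k])  # copy so the checkpoint stays intact
        i = self.ckpt_idx[k]
        # Replay only the events between the checkpoint and t.
        while i < len(self.events) and self.events[i].timestamp <= t:
            apply(state, self.events[i])
            i += 1
        return state
```

The `interval` parameter trades memory for seek latency: a smaller interval stores more snapshots but replays fewer events per seek. The synthetic-state-event variant corresponds to writing each snapshot back into the trace instead of keeping it in a side table.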
Many different analyses can be performed on a trace. Examples include computing the causality links between events (and thus the critical path in time between two events), or searching for specific patterns such as excessive swapping, spurious timeouts, or overloaded disk subsystems.
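Many pattern searches reduce to "too many events of a given kind within a short time". The sketch below uses a sliding time window to flag such bursts, for example when fed the timestamps of swap-out events (a hypothetical event type). The window length and threshold are assumed, user-chosen parameters, not values recommended by any tool.

```python
from collections import deque
from typing import Iterable


def find_bursts(timestamps: Iterable[int], window: int, threshold: int) -> list[tuple[int, int]]:
    """Return (start, end) spans where at least `threshold` events occur within `window` time units.

    Timestamps must be sorted. Overlapping bursts are merged into one span.
    """
    recent: deque[int] = deque()
    bursts: list[tuple[int, int]] = []
    for ts in timestamps:
        recent.append(ts)
        # Keep only events inside [ts - window, ts].
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            start = recent[0]
            if bursts and start <= bursts[-1][1]:
                bursts[-1] = (bursts[-1][0], ts)  # extend the current burst
            else:
                bursts.append((start, ts))
    return bursts
```

With `find_bursts([0, 1, 2, 3, 100, 200, 201, 202], window=5, threshold=3)`, the two dense clusters are reported as `(0, 3)` and `(200, 202)`, while the isolated event at 100 is ignored.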
