Application areas
From TracingWiki
This page describes application areas, usage categories and typical use cases for monitoring and tracing.
Tracing and monitoring may be used for different reasons:
- traceability, documentation or control purposes;
- system understanding, debugging or tuning;
- system health monitoring;
- security and intrusion detection.
A trace can be generated and recorded on demand when the need arises, either as a random check or because a problem is suspected. It can also be generated continuously in flight recorder mode and written to permanent storage only when needed, for example when a problem such as a crash is detected, or for sampling purposes. It is also possible to generate and record a trace continuously.
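The flight recorder mode described above can be illustrated with a bounded in-memory buffer. This is only a minimal sketch: the `FlightRecorder` class, its method names and the event names are hypothetical and do not correspond to any particular tracer's API. The point is that events are generated continuously, but only the most recent ones are kept until something triggers a snapshot.

```python
from collections import deque


class FlightRecorder:
    """Hypothetical in-memory recorder: keeps only the most recent events."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def record(self, timestamp, name):
        """Append one event; nothing is written to permanent storage here."""
        self.buffer.append((timestamp, name))

    def snapshot(self):
        """Return the buffered events, oldest first, ready to be saved."""
        return list(self.buffer)


# Illustrative event names; a real tracer defines its own.
recorder = FlightRecorder(capacity=3)
names = ["sched_switch", "syscall_entry", "syscall_exit", "irq_entry", "crash"]
for ts, name in enumerate(names):
    recorder.record(ts, name)

# A detected problem (here, the "crash" event) would trigger the save.
saved = recorder.snapshot()
```

With a capacity of three, `saved` holds only the last three events leading up to the crash; the older ones were overwritten, which is the trade-off this mode makes to avoid continuous writes to storage.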
A posteriori trace analysis is used to minimize the impact on the system under study. It can be performed on the same system, after trace collection has finished, or on a different system. It is used to determine after the fact why the system was failing or was not performing as well as expected. Online trace analysis is usually performed on a separate computer, networked with the computers under study. This reduces the impact on the system under study, because the data analysis framework does not run on it. Online trace analysis provides a real-time or near real-time picture of the system behavior, which is useful to detect security problems and intrusions as they happen, or to correlate the trace content with the perceived system state.
System tracing is ideally suited for the understanding, debugging and performance tuning of applications where several processes heavily interact with the operating system and with each other, or where the real-time behavior is important. This includes:
- online servers with heavy interactions between several cooperating processes, the operating system, the network and the disks;
- embedded real-time systems for machine and vehicle control, cellular phones, or multimedia pocket computers;
- desktop computers running several cooperating processes for the graphical user interface, the window manager, the sound server, and the wireless network manager.
Systems running CPU-intensive tasks, whether a single system or a cluster with relatively infrequent interactions, may be analyzed with simpler profiling tools.
In the following sections, typical use cases for tracing and monitoring are described.
Cluster of online servers
When a cluster of online servers (file servers, web servers, telecom servers) is not performing well or encounters random failures, detailed execution traces are generated on each computer. The traces are either stored on disk continuously, or the trace buffers are filled in flight recorder mode and saved to disk only when encountering a problem or performing a specific performance test. The traces are then collected on a single computer and analyzed to understand the problem.
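Once the per-server traces are gathered on one computer, a common first step is to interleave them into a single timeline. The sketch below assumes that each trace is already sorted by timestamp and that the server clocks are synchronized well enough for the timestamps to be comparable; both are assumptions, and the host names, event names and `merge_traces` function are hypothetical.

```python
import heapq


def tag(host, events):
    """Attach the host name to each (timestamp, event) pair."""
    return [(ts, host, event) for ts, event in events]


def merge_traces(traces_by_host):
    """Merge per-host traces, each sorted by timestamp, into one timeline."""
    streams = [tag(host, events) for host, events in traces_by_host.items()]
    # heapq.merge performs a k-way merge of already sorted inputs.
    return list(heapq.merge(*streams))


# Illustrative data: one web server and one database server.
traces = {
    "web1": [(1.0, "accept"), (4.0, "reply")],
    "db1": [(2.0, "query"), (3.0, "result")],
}
merged = merge_traces(traces)
```

In the merged timeline, the request accepted on `web1` is followed by the query and result on `db1` and then by the reply, which makes the interaction between the servers visible in a way the separate traces do not.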
System monitoring
A critical system (for example, a battlefield support server) is instrumented to continuously generate a system trace at the lowest level. The system trace is sent on the fly to a monitoring system. The monitoring system checks the trace for evidence of abnormal behavior, cyber attack or security intrusion. When something is detected, a human operator is alerted or corrective measures are activated.
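As a rough illustration of the checking step, the sketch below consumes a stream of events and raises an alert when one event type occurs too often within a sliding time window. The rule itself, the `monitor` function, its parameters and the event names are all hypothetical; a real monitoring system would apply far richer detection logic.

```python
from collections import deque


def monitor(events, watched, max_count, window):
    """Yield the timestamp of each event that pushes the number of
    `watched` events within the last `window` seconds above `max_count`."""
    recent = deque()  # timestamps of watched events still inside the window
    for ts, name in events:
        if name != watched:
            continue
        recent.append(ts)
        # Discard timestamps that have fallen out of the sliding window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if len(recent) > max_count:
            yield ts


# Illustrative stream; in practice the events arrive over the network.
stream = [
    (0.0, "login_failed"),
    (0.5, "open"),
    (1.0, "login_failed"),
    (1.5, "login_failed"),
    (10.0, "login_failed"),
]
alerts = list(monitor(stream, watched="login_failed", max_count=2, window=5.0))
```

Because `monitor` is a generator, it can process events as they arrive rather than waiting for a complete trace, and each yielded timestamp is a point at which an operator could be alerted or a corrective measure triggered.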
Embedded system
An embedded system (for example, a high-end cellular phone handset) is being debugged. An execution trace can be generated for the processes of interest and for a short duration (making a phone call, viewing a short video). The trace buffers are limited in size and there is no local disk. The tracing data is stored in flash memory or sent on the fly to a development host through a communication link, with no additional hardware required.
