SystemTap

From TracingWiki

SystemTap is a debugging and troubleshooting tool. It is an open source project with contributors from IBM, Red Hat, Intel, Hitachi, Oracle and others. It mainly uses the kprobes API to dynamically instrument the Linux kernel, and the experimental utrace/uprobes APIs to dynamically instrument user-space programs and libraries. Other data providers, such as kernel markers and /proc file system controls, can be used seamlessly with SystemTap.

SystemTap works by processing script files in which a user specifies probe points and associates handlers with them. Supported probe points currently include function entry, function exit, and almost any machine instruction. Probes can also fire asynchronously, at regular time intervals [1].

Here are some probe point declarations that can be specified in a script file:

  • Probing a kernel function entry:

probe kernel.function("sys_read") { ... }

  • Probing a kernel function exit:

probe kernel.function("sys_read").return { ... }

  • Probing all function entries in one source file of a module:

probe module("ext3").function("*@fs/ext3/inode.c") { ... }

  • Asynchronously probing using a timer:

probe timer.ms(10000){ ... }

  • Attaching to Linux Kernel Markers:

probe kernel.mark("context_switch") { residency[$arg2, cpu()]++ } [2]

  • Probing a user space program or shared library:

probe process("/lib/libc-2.7.so").function("malloc") { if ($size > 1024*1024) log("hog!") }

  • Probing a dtrace-style static instrumentation marker in a userspace program:

probe process("postgres").mark("transaction*") {log ($$vars)}
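
As a minimal sketch of how such declarations combine into a complete script, the following counts read system calls per process name and prints the ten busiest processes after ten seconds. The global array name reads and the report format are illustrative choices rather than anything prescribed by SystemTap. The probe on sys_read assumes a kernel that has a function with that name; on kernels where it does not, the tapset alias syscall.read may be a more portable choice.

```systemtap
# Illustrative sketch; the array name "reads" is hypothetical.
global reads

# Function-entry probe: count each call under the calling process name.
probe kernel.function("sys_read") {
    reads[execname()]++
}

# Timer probe: after 10 seconds, print the ten largest counters and stop.
probe timer.ms(10000) {
    foreach (name in reads- limit 10)
        printf("%-20s %d\n", name, reads[name])
    exit()
}
```

Saved under a hypothetical name such as reads.stp, a script like this would typically be run with root privileges as: stap reads.stp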

The contextual values that can be retrieved by default at a probe point include: CPU number, egid, euid, execname, gid, pexecname, pid, ppid, tid, uid, stack_size and target pid. One limitation is that these functions may not return correct values when the probe is hit in interrupt context. It is also possible to access (read or even modify) variables from within the context of the traced program. These target variables can be referenced on demand in the handlers of the script file, and their names must be prefixed with a $ sign. They are resolved to memory addresses during the script elaboration phase. Note that such information may not always be available when the kernel is built with compiler optimizations.
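
The difference between contextual functions and target variables can be sketched as follows. This is an illustration under assumptions: it presumes that the probed kernel defines sys_read with parameters named fd and count, and that it was built with debugging information. If either assumption fails, the script is expected to be rejected during elaboration rather than at run time.

```systemtap
probe kernel.function("sys_read") {
    # Contextual functions provided by SystemTap itself.
    printf("%s pid=%d tid=%d cpu=%d\n", execname(), pid(), tid(), cpu())

    # Target variables: parameters of the probed function, prefixed with $
    # and resolved through the debugging information.
    printf("  fd=%d count=%d\n", $fd, $count)

    # $$parms expands to a string listing all parameters of the function.
    printf("  parameters: %s\n", $$parms)
}

# Stop after five seconds so the output stays bounded.
probe timer.s(5) {
    exit()
}
```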

Probe handler scripts first pass through the elaboration phase, in which symbolic references (function parameters, local and global variables, and so on) are resolved to actual run-time addresses. This is done by consulting the DWARF debugging information generated by the compiler when the program or kernel was built. The next phase translates the processed script into C code and compiles it as a kernel module. The module first inserts the probes, then waits for a probe to be hit [3]. Because the generated code runs in kernel mode, it is left open for inspection for safety reasons.

The impact of SystemTap dynamic instrumentation was evaluated by timing the compilation of gcc-3.4.6 with and without dynamic tracing, to estimate the cost per event. The traced event is the function entry of the read system call. The handler does nothing except increment a counter each time the event is triggered. The test machine has an Intel Pentium 4 3.00 GHz processor and 2 GB of RAM. The table below summarizes the results obtained.

                        Read syscall probe enabled   No probes
Avg. execution time     32m 44.105s                  32m 41.673s
Avg. number of events   1,872,394                    -
Cost per event          1.30 µs                      -
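
The cost per event follows from the table as the difference between the two average run times divided by the average number of events. The symbols below are notation introduced here for illustration, and the estimate assumes that the whole time difference is attributable to the probe:

```latex
\[
\text{cost per event}
  = \frac{T_{\text{probe}} - T_{\text{no probe}}}{N_{\text{events}}}
  = \frac{1964.105\,\text{s} - 1961.673\,\text{s}}{1\,872\,394}
  = \frac{2.432\,\text{s}}{1\,872\,394}
  \approx 1.30\,\mu\text{s}
\]
```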

References

http://sourceware.org/systemtap/tutorial.pdf
http://sourceware.org/systemtap/langref.pdf
http://sourceware.org/systemtap/documentation.html
http://sourceware.org/systemtap/systemtap-ols.pdf
http://ltt.polymtl.ca/tracingwiki/images/0/03/SystemTap_FrankEigler_MontrealJan2008.pdf
