Finding the Root Cause of a Web Request Latency


Photo: Copyright © Alexandre Claude, used with permission

When trying to solve complex performance issues, a kernel trace can be the last resort: capture everything and try to understand what is going on. With LTTng, that would go like that:

lttng create
lttng enable-event -k -a
lttng start
...wait for the problem to appear...
lttng stop
lttng destroy

Once this is done, depending on how long the capture is, there are probably a lot of events in the resulting trace (~50.000 events per second on my mostly idle laptop), now it is time to make sense of it !

This post is a first in a serie to present the LTTng analyses scripts. In this one, we will try to solve an unusual I/O latency issue.

Tracing Bare-Metal Systems: a Multi-Core Story


Photo: Copyright © Michael Kirschner, used with permission

Some systems do not have the luxury of running Linux. In fact, some systems have no operating system at all. In the embedded computing world, they are called bare-metal systems.

Bare-metal systems usually run a single application; think of microcontrollers, digital signal processors, or real-time dedicated units of any kind. It would be wrong, however, to assume that those application-specific applications are simple: they often are sophisticated little beasts. Hence the need to debug them, and of course, to trace them in order to highlight latency issues, especially since they're almost always required to meet strict real-time constraints.

Since Linux is not available on bare-metal systems, LTTng is unfortunately out of reach. LTTng's trace format, the Common Trace Format (CTF), is, however, still very relevant. Because CTF was designed with flexibility and write performance in mind, it's actually a well suited trace format for bare-metal environments.