Blog

Full Stack System Call Latency Profiling

François Doray, Julien Desfossez on 18 March 2015

Finding the exact origin of a kernel-related latency can be very difficult: we usually end up recording too much or not enough information, and it becomes even more complex if the problem is sporadic or hard to reproduce.

In this blog post, we present a demo of a new feature to extract as much relevant information as possible around an unusual system call latency. Depending on the configuration, during a long system call, we can extract:

multiple kernel-space stacks of the process to identify where we are waiting,
the user-space stack to identify the origin of the system call,
an LTTng snapshot of the user and/or kernel-space trace to give some background information about what led us here and what was happening on the system at that exact moment.

All of this information is tracked at run-time with low overhead and recorded only when some pre-defined conditions are matched (PID, current state and latency threshold).
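As a rough illustration of the matching step, here is a minimal Python sketch of tracking in-flight system calls and flagging those that exceed a latency threshold. It is not the actual implementation: the `LatencyTracker` class, its method names, and the nanosecond timestamps are all hypothetical, and the "current state" condition is left out for brevity.

```python
from typing import Dict, List, Optional, Set, Tuple


class LatencyTracker:
    """Hypothetical tracker of in-flight system calls (not an LTTng API)."""

    def __init__(self, threshold_ns: int,
                 watched_pids: Optional[Set[int]] = None) -> None:
        self.threshold_ns = threshold_ns
        self.watched_pids = watched_pids  # None means "watch every PID"
        # pid -> (syscall name, entry timestamp in ns)
        self.pending: Dict[int, Tuple[str, int]] = {}

    def on_entry(self, pid: int, name: str, ts_ns: int) -> None:
        """Remember when a watched process entered a system call."""
        if self.watched_pids is None or pid in self.watched_pids:
            self.pending[pid] = (name, ts_ns)

    def on_exit(self, pid: int) -> None:
        """Forget the system call once it returns."""
        self.pending.pop(pid, None)

    def overdue(self, now_ns: int) -> List[Tuple[int, str, int]]:
        """Return (pid, name, elapsed_ns) for calls still running past the threshold."""
        return [
            (pid, name, now_ns - start)
            for pid, (name, start) in self.pending.items()
            if now_ns - start >= self.threshold_ns
        ]


# Usage: watch PID 42 with a 1 ms threshold.
tracker = LatencyTracker(threshold_ns=1_000_000, watched_pids={42})
tracker.on_entry(42, "read", 0)
for pid, name, elapsed in tracker.overdue(now_ns=2_000_000):
    print(f"pid {pid} has been in {name}() for {elapsed} ns")
```

In a sketch like this, each entry returned by `overdue()` is the point at which the stacks and the snapshot would be captured; nothing is recorded for system calls that return before the threshold.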

Combining all of this data gives us enough information to accurately determine where the system call was launched in an application, what it was waiting for in the kernel, and what system-wide events (kernel and/or user-space) could explain this unexpected latency.

LTTng 2.6 Has Been Released!

Jérémie Galarneau on 27 February 2015

At long last, LTTng 2.6 Gaïa has been released!

On top of several bug fixes and improvements, this new release introduces three major new features: Java log4j support, per-syscall tracing, and a client machine interface (MI).

Finding the Root Cause of a Web Request Latency

Julien Desfossez on 04 February 2015

When trying to solve complex performance issues, a kernel trace can be the last resort: capture everything and try to understand what is going on. With LTTng, that would go like this:

lttng create
lttng enable-event -k -a
lttng start
# ...wait for the problem to appear...
lttng stop
lttng destroy

Once this is done, depending on how long the capture is, there are probably a lot of events in the resulting trace (~50,000 events per second on my mostly idle laptop). Now it is time to make sense of it!

This post is the first in a series presenting the LTTng analyses scripts. In this one, we will try to solve an unusual I/O latency issue.

Tracing Bare-Metal Systems: a Multi-Core Story

Philippe Proulx on 25 November 2014

Some systems do not have the luxury of running Linux. In fact, some systems have no operating system at all. In the embedded computing world, they are called bare-metal systems.

Bare-metal systems usually run a single application; think of microcontrollers, digital signal processors, or real-time dedicated units of any kind. It would be wrong, however, to assume that those application-specific programs are simple: they are often sophisticated little beasts. Hence the need to debug them and, of course, to trace them in order to highlight latency issues, especially since they're almost always required to meet strict real-time constraints.

Since Linux is not available on bare-metal systems, LTTng is unfortunately out of reach. LTTng's trace format, the Common Trace Format (CTF), is, however, still very relevant. Because CTF was designed with flexibility and write performance in mind, it's actually a well-suited trace format for bare-metal environments.

LTTng Toolchain 2.5.0 is out!

Christian Babeux on 04 August 2014

Hi all!

The LTTng 2.5 stable release, codenamed Fumisterie, is finally ready!
