Debugging Python applications on Linux with LTTng-UST

The EfficiOS team on 16 July 2019

Some of the world's largest and most critical applications are written in Python, a language used in everything from high-performance scientific programs to large-scale websites. Keeping that code running when issues arise often requires tracing on production systems where reliability and low-intrusiveness are vital.

LTTng is an open source tracing framework for Linux designed to be safe for production use. Developers and admins who need to examine low-level events, and who need to trace their applications on-the-fly without restarting them, will find this tool invaluable due to how it unifies several information sources, supporting languages including Java, C++, and Python.

This article offers a brief introduction on how to use LTTng-UST to trace Python events, how to trace the Linux kernel with LTTng, and how to process LTTng traces. It then looks at some examples of tracing Python applications using the LTTng-UST (User-Space Tracer) as well as the kernel tracer.

LTTng-UST

LTTng-UST is a port of the low-overhead LTTng kernel tracer to user-space which enables tracing of applications and libraries. Included with LTTng-UST is a Python agent that hooks into the standard logging mechanisms of Python, funneling everything Python already logs to the LTTng buffers, and allowing you to seamlessly trace Python applications.

Adding LTTng-UST instrumentation to an application provides anchor points in kernel traces, allowing you to see what the application was doing when something happened in the kernel. You can make use of this feature by adding the lttngust package to the source code of any Python application that uses the standard logging package. Doing so automatically emits LTTng events for each log statement, which are then written to a trace log.

However, sorting through “all events” is overwhelming. Targeting specific events is more useful when investigating a particular issue, and produces smaller, more manageable traces. LTTng's event rules enable you accomplish this task. Even when targeting specific events, however, working with traces can be a daunting task. Worse, such logs are typically harder to work with than text log files. They're larger, difficult to explore, and often require deep knowledge of the application or of the kernel itself to make sense of what was captured.

To simplify analyzing your trace logs, LTTng and Babeltrace can be combined to create purpose-built tools, allowing you to choose which events to trace, how to present them in your logs, and how to filter the trace data. The increased ability to understand kernel behaviour, compare logs from multiple sources side by side, and gain a holistic view of your system reduces the time to find and fix application issues.

To demonstrate how easy it is to trace Python applications with LTTng and Babeltrace, the instructions below were presented as part of our talk at PyCon Canada 2017: Using Python to make sense of system traces.

Prerequisites

First you have to install the software. Follow the installation instructions for LTTng, including installing LTTng-UST. You should also ensure you have Babeltrace installed so you're ready to analyse the trace results.

Recording traces

Here is a sample app called httpserver.py. This app simply redirects the Python http handler logging to funnel through LTTng's logger module.

import logging
import http.server
import socketserver
import os
import lttngust

class LoggingHTTPRequestHandler(http.server.SimpleHTTPRequestHandler):
    def __init__(self, *args, **kwargs):
        self._logger = logging.getLogger('HTTP-logger')
        super().__init__(*args, **kwargs)

    def do_GET(self):
        self._logger.debug('begin GET')
        super().do_GET()
        self._logger.debug('end GET')

    def log_request(self, code='-', size='-'):
        self._logger.debug('{} request: {}, code: {}, size: {}'.format(
            self.address_string(), self.requestline, code, size))

    def log_error(self, format, *args):
        self._logger.error('%s - %s' %
                           (self.address_string(),
                            format%args))

    def log_message(self, format, *args):
        self._logger.info('%s - %s' %
                          (self.address_string(),
                           format%args))


logging.basicConfig()

PORT = 8000
print('Server PID: {}'.format(os.getpid()))

httpd = socketserver.TCPServer(("", PORT), LoggingHTTPRequestHandler)
httpd.allow_reuse_address = True
httpd.serve_forever()

LTTng-UST is included in the same way as any other Python package:

import lttngust

Launch `httpserver.py` by itself in a terminal:

python3 httpserver.py

Server PID: <pid>

The basic commands to start tracing are:

Create a session named "py_tracing":
```
$
```
```
lttng create py_tracing
```
Enable all Python agent (logger) events:
```
$
```
```
lttng enable-event --python --all
```
Start tracing:
```
$
```
```
lttng start
```
Visit localhost:8000 with a web browser.
Stop tracing:
```
$
```
```
lttng stop
```
View the trace:
```
$
```
```
lttng view
```

Alternatively, you can start a tracing session in live mode which lets you view the trace while the session is still running. For example, to trace user-space events for the above httpserver.py application:

Create a live session named "py_live_tracing":
```
$
```
```
lttng create py_live_tracing --live
```
Enable all Python agent (logger) events:
```
$
```
```
lttng enable-event --python --all
```
Start tracing:
```
$
```
```
lttng start
```
View the live trace:
```
$
```
```
lttng view
```
Visit localhost:8000 with a web browser.

You can also trace kernel system calls invoked by the application. But beware that your user account needs to be a member of the tracing group:

Create a live session named "py_tracing_syscalls:
```
$
```
```
lttng create py_tracing_syscalls --live
```
Enable all Python agent (logger) events:
```
$
```
```
lttng enable-event --python --all
```

Enable tracing of all syscalls:

lttng enable-event --kernel --syscall --all

Tell LTTng to only track the provided PID:
```
$
```
```
lttng track --kernel --pid <pid>
```
Start tracing:
```
$
```
```
lttng start
```
View Python logging and syscalls performed on the same timeline:
```
$
```
```
lttng view
```
Visit localhost:8000 with a web browser.

Working with trace data

The output above is for one simple application, but even this relatively small amount of information is hard to read. The LTTng analyses makes the results easier to read by enabling you to extract and analyze trace data offline with a collection of command-line tools. In addition, Babeltrace supports creating Python scripts, customized for your own application needs, with Python bindings.

To use Python bindings with the current version of Babeltrace (1.5.5 at the time of writing), take a look at the GitHub repo for the project and follow the instructions in the README for building Babeltrace with Python bindings enabled.

Conclusion

Tracing Python applications with LTTng is as simple as importing the lttngust package and running a few commands—and that means you can understand the behaviour of your code in no time at all. When sifting through trace data becomes overwhelming, LTTng analyses and Babeltrace offer a way to make sense of the data.

If you'd like to learn more about LTTng and Babeltrace, check out the LTTng and Babeltrace documentation.

← Back to LTTng's blog