Finding the Root Cause of a Web Request Latency
When trying to solve complex performance issues, a kernel trace can be the last resort: capture everything and try to understand what is going on. With LTTng, it goes like this:
lttng create
lttng enable-event -k -a
lttng start
...wait for the problem to appear...
lttng stop
lttng destroy
Once this is done, depending on how long the capture ran, the resulting trace probably contains a lot of events (~50,000 events per second on my mostly idle laptop). Now it is time to make sense of it!
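To get a rough idea of the volume before diving in, one option (a quick sketch, assuming the babeltrace text viewer is installed and with the path adjusted to your own session) is to count the events in the trace:

$ babeltrace analysis-20150115-120942/ | wc -l    # babeltrace prints one event per line, so wc -l gives the event count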
This post is the first in a series presenting the LTTng analyses scripts. In this one, we will try to solve an unusual I/O latency issue.
Our test case is a PHP image board (tinyib-mappy), and we want to understand why some client requests are significantly slower than others. To keep the example simple, we manually call sync to force the kernel to flush the dirty buffers to disk, but the methodology is the same for more complex use cases.
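If you want to recreate a similar stall yourself, a minimal sketch (the file path and size below are arbitrary) is to dirty some pages in the page cache and then force the flush:

$ dd if=/dev/zero of=/tmp/dirty-data bs=1M count=200    # write ~200 MB through the page cache (arbitrary path and size)
$ time sync                                             # blocks until the dirty buffers have been written to disk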
On the server, we are recording a kernel LTTng session as explained earlier. On the client, we test and measure with this command:
$ while true; do time wget -q -p http://10.2.0.1/; sleep 0.5; done
...
real    0m0.017s
real    0m0.016s
real    0m0.454s
real    0m0.016s
real    0m0.015s
...
The output here shows that one request took much longer than the usual delay (around 430ms more). Now we will try to understand the source of this latency.
I/O usage
The resulting trace covers around 16 seconds. As a first step, to get an idea of the I/O activity during the trace, we look at the lttng-iousagetop script:
$ ./lttng-iousagetop analysis-20150115-120942/
Timerange: [2015-01-15 12:18:37.216484041, 2015-01-15 12:18:53.821580313]
Per-process I/O Read
###############################################################################
██████████████████████████████████████████████████  528.0 KB lttng-consumerd (19964)  0 B file  0 B net  528.0 KB unknown
███████  80.0 KB lttng-consumerd (19964)  0 B file  0 B net  80.0 KB unknown
███  39.45 KB apache2 (31544)  33.7 KB file  5.33 KB net  424.0 B unknown
██  29.58 KB apache2 (31591)  25.28 KB file  4.0 KB net  318.0 B unknown
██  29.58 KB apache2 (31592)  25.28 KB file  4.0 KB net  318.0 B unknown
██  29.58 KB apache2 (31545)  25.28 KB file  4.0 KB net  318.0 B unknown
██  29.58 KB apache2 (31517)  25.28 KB file  4.0 KB net  318.0 B unknown
██  29.58 KB apache2 (31520)  25.28 KB file  4.0 KB net  318.0 B unknown
██  29.58 KB apache2 (31584)  25.28 KB file  4.0 KB net  318.0 B unknown
██  29.58 KB apache2 (31589)  25.28 KB file  4.0 KB net  318.0 B unknown
Per-process I/O Write
###############################################################################
██████████████████████████████████████████████████  2.24 MB find (22785)  0 B file  0 B net  2.24 MB unknown
███████████  528.0 KB lttng-consumerd (19964)  0 B file  0 B net  528.0 KB unknown
████  225.24 KB apache2 (31544)  0 B file  221.3 KB net  3.94 KB unknown
███  168.93 KB apache2 (31591)  0 B file  165.98 KB net  2.95 KB unknown
███  168.93 KB apache2 (31592)  0 B file  165.98 KB net  2.95 KB unknown
███  168.93 KB apache2 (31545)  0 B file  165.98 KB net  2.95 KB unknown
███  168.93 KB apache2 (31517)  0 B file  165.98 KB net  2.95 KB unknown
███  168.93 KB apache2 (31520)  0 B file  165.98 KB net  2.95 KB unknown
███  168.93 KB apache2 (31584)  0 B file  165.98 KB net  2.95 KB unknown
███  168.93 KB apache2 (31589)  0 B file  165.98 KB net  2.95 KB unknown
In this first section, we see the top 10 processes performing I/O system calls. From the information in the trace, we try to identify whether the requests are disk- or network-related, but sometimes the trace does not have all the information we need (especially with FD passing between processes). Of course, LTTng itself does some I/O to store the trace, but besides that, we see the apache2 threads receiving our HTTP GET requests, reading some files from disk and answering on the network. We also see a find process that was launched on the system during the capture, but so far, nothing alarming.
Note: you can see that the apache2 processes read less than they write. This is not because Apache magically creates new data; it is a current limitation in the syscall instrumentation. When Apache sends a complete file, it opens it, mmaps it, and passes the address of the memory map in a vector to writev, but the current instrumentation does not expose the content of the vectors, so we cannot attribute the write to the last opened file. This can be fixed in the instrumentation and should be addressed soon.
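If you want to observe this open/mmap/writev pattern outside of the trace, a possible sketch (the <apache2-pid> placeholder stands for one of the worker PIDs) is to attach strace to an Apache worker:

$ sudo strace -f -e trace=open,openat,mmap,writev -p <apache2-pid>    # shows the file being opened, mapped, then sent with writev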
Next, we see the same information aggregated per file, so we can see the files shared by multiple processes (and the corresponding FD number in each process):
Files Read
###############################################################################
██████████████████████████████████████████████████  304.0 KB anon_inode:[lttng_stream] (lttng-consumerd) 'fd 28 in lttng-consumerd (19964)'
██████████████████████████████████████████  259.87 KB /var/www/html/tinyib.db 'fd 14 in apache2 (31591)', 'fd 14 in apache2 (31592)', 'fd 14 in apache2 (31545)', 'fd 14 in apache2 (31517)', 'fd 14 in apache2 (31520)', 'fd 14 in apache2 (31544)', 'fd 14 in apache2 (31584)', 'fd 14 in apache2 (31589)', 'fd 14 in apache2 (31588)', 'fd 14 in apache2 (31590)'
██  12.85 KB socket (lttng-sessiond) 'fd 44 in lttng-sessiond (19839)'
6.0 KB /proc/interrupts 'fd 3 in irqbalance (1337)'
4.88 KB /lib/x86_64-linux-gnu/libc.so.6 'fd 3 in find (22785)', 'fd 3 in bash (22784)', 'fd 3 in bash (22790)', 'fd 3 in sync (22791)', 'fd 3 in bash (22796)', 'fd 3 in lttng (22797)'
4.82 KB socket:[168432] (ssh) 'fd 3 in ssh (10368)'
4.45 KB /proc/stat 'fd 3 in irqbalance (1337)'
3.25 KB /lib/x86_64-linux-gnu/libz.so.1 'fd 3 in bash (22784)', 'fd 3 in bash (22790)', 'fd 3 in bash (22796)', 'fd 3 in lttng (22797)'
3.25 KB /lib/x86_64-linux-gnu/libpthread.so.0 'fd 3 in bash (22784)', 'fd 3 in bash (22790)', 'fd 3 in bash (22796)', 'fd 3 in lttng (22797)'
2.51 KB /usr/share/locale/locale.alias 'fd 3 in find (22785)'
Files Write
###############################################################################
██████████████████████████████████████████████████  2.24 MB /dev/null 'fd 1 in find (22785)', 'fd 2 in bash (22784)', 'fd 2 in bash (22790)', 'fd 2 in bash (22796)'
██████  304.0 KB /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata 'fd 32 in lttng-consumerd (19964)'
██████  304.0 KB pipe:[180336] (lttng-consumerd) 'fd 18 in lttng-consumerd (19964)'
█  55.33 KB unknown (origin not found) (apache2) 'fd 12 in apache2 (31590)'
30.52 KB /var/log/apache2/access.log 'fd 8 in apache2 (31591)', 'fd 8 in apache2 (31592)', 'fd 8 in apache2 (31545)', 'fd 8 in apache2 (31517)', 'fd 8 in apache2 (31520)', 'fd 8 in apache2 (31544)', 'fd 8 in apache2 (31584)', 'fd 8 in apache2 (31589)', 'fd 8 in apache2 (31588)', 'fd 8 in apache2 (31590)'
12.85 KB socket (lttng) 'fd 3 in lttng (22797)'
2.54 KB socket:[164921] (sshd) 'fd 3 in sshd (10339)'
1.69 KB socket:[164888] (sshd) 'fd 3 in sshd (10236)'
1.39 KB /dev/pts/4 'fd 5 in ssh (10368)'
604.0 B /dev/pts/1 'fd 0 in bash (10237)', 'fd 2 in bash (10237)', 'fd 1 in lttng (22752)'
Once again, we see some LTTng-related I/O, our SQLite database accessed by Apache, and the Apache log files; we also now understand where find is writing (find / > /dev/null).
Next, we see the actual block I/O performed, that is, how much data really touched the disk during the trace period (as opposed to staying cached or buffered):
Block I/O Read
###############################################################################
██████████████████████████████████████████████████  38.47 MB find (pid=22785)
128.0 KB (pid=-1)
28.0 KB sync (pid=22791)
Block I/O Write
###############################################################################
██████████████████████████████████████████████████  7.57 MB jbd2/sda1-8 (pid=245)
█████████████████████████████████████████  6.24 MB (pid=-1)
███  572.0 KB kworker/u32:1 (pid=21668)
█  264.0 KB lttng-consumerd (pid=19964)
56.0 KB sync (pid=22791)
40.0 KB lttng-consumerd (pid=19964)
Here, things start to get interesting: even though we didn't do much I/O at the system-call level, some kernel threads wrote around 14 MB of data directly to disk. So there was definitely some I/O activity, but we don't yet know which process triggered it.
Finally, the same lttng-iousagetop output gives us the breakdown of network usage per network interface:
Network recv_bytes
###############################################################################
████████████████████████████████████████████████████████  103.9 KB eth1
███  6.65 KB eth0
Network sent_bytes
###############################################################################
████████████████████████████████████████████████████████  1.72 MB eth1
8.03 KB eth0
OK, so in this phase we saw the overall I/O usage in the trace: no obvious process was saturating our resources, but we know actual data was being written to disk. Now let's have a look at the I/O latencies.
I/O latency
The I/O latency is the delay an I/O system call takes to complete (the time elapsed between the syscall entry and exit events), or the delay for a block request to complete (between block_rq_issue and block_rq_complete). My bash process is always waiting to read one character from my keyboard, so in all the examples I will filter out the syscalls that handle less than 2 bytes with the --minsize 2 parameter.
Latency statistics
The first step is to get some statistics:
$ ./lttng-iolatencystats analysis-20150115-120942/ --minsize 2
Syscalls latency statistics (usec):
Type        Count         Min      Average           Max        Stdev
----------------------------------------------------------------------
Open        14724       0.250       32.173    437697.753     3607.107
Read         1034       0.360        2.383       482.860       15.337
Write        1036       0.298        7.778      2667.346       82.972
Sync          153      13.393     6601.007    994745.622    80413.186

Disk latency statistics (usec):
Name        Count         Min      Average           Max        Stdev
-----------------------------------------------------------------------------------------
sda1        10211       0.002        0.507        24.260     1485.700
Now we are getting somewhere! If we look at the Max column, the Open and Sync operations have particularly huge numbers compared to the other statistics.
Latency frequency distribution
To illustrate this, let's look at the frequency distribution of just the open operations, which include most of the syscalls that return a new file descriptor (open, openat, connect, accept, etc.).
$ ./lttng-iolatencyfreq prototyping/analysis-20150115-120942 --minsize 2
Open latency distribution (usec)
###############################################################################
     0.250  ████████████████████████████████████████████████████████████  14723
 21885.125  0
 43770.000  0
 65654.875  0
 87539.751  0
109424.626  0
131309.501  0
153194.376  0
175079.251  0
196964.126  0
218849.002  0
240733.877  0
262618.752  0
284503.627  0
306388.502  0
328273.377  0
350158.252  0
372043.128  0
393928.003  0
415812.878  1
As we can see, most of the requests (14723) have a latency between 0.250 and 21885.125 usec; only one is way above. If we want to "zoom in" on this graph, we can use the --min and --max parameters to filter the data, as shown below.
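For example, to zoom in on the open latencies between 50 and 500 usec, something like the following should work (the bounds here are arbitrary and are assumed to be expressed in usec, like the rest of the output):

$ ./lttng-iolatencyfreq analysis-20150115-120942/ --minsize 2 --min 50 --max 500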
Top latencies
Now that we know we have some outliers in the Open and Sync operations, let's look at the worst I/O latencies found in this trace (limited here to the top 4 per category):
$ ./lttng-iolatencytop analysis-20150115-120942/ --minsize 2 --limit 4
Top open syscall latencies (usec)
Begin               End                  Name              Duration (usec)  Size      Proc             PID    Filename
[12:18:50.432950815,12:18:50.870648568]  open                   437697.753  N/A       apache2          31517  /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
[12:18:52.946080165,12:18:52.946132278]  open                       52.113  N/A       apache2          31588  /var/lib/php5/sess_mr9045p1k55vin1h0vg7rhgd63 (fd=13)
[12:18:46.800846035,12:18:46.800874916]  open                       28.881  N/A       apache2          31591  /var/lib/php5/sess_r7c12pccfvjtas15g3j69u14h0 (fd=13)
[12:18:51.389797604,12:18:51.389824426]  open                       26.822  N/A       apache2          31520  /var/lib/php5/sess_4sdb1rtjkhb78sabnoj8gpbl00 (fd=13)
Top read syscall latencies (usec)
Begin               End                  Name              Duration (usec)  Size      Proc             PID    Filename
[12:18:37.256073107,12:18:37.256555967]  read                      482.860  7.0 B     bash             10237  unknown (origin not found) (fd=3)
[12:18:52.000209798,12:18:52.000252304]  read                       42.506  1.0 KB    irqbalance        1337  /proc/interrupts (fd=3)
[12:18:37.256559439,12:18:37.256601615]  read                       42.176  5.0 B     bash             10237  unknown (origin not found) (fd=3)
[12:18:42.000281918,12:18:42.000320016]  read                       38.098  1.0 KB    irqbalance        1337  /proc/interrupts (fd=3)
Top write syscall latencies (usec)
Begin               End                  Name              Duration (usec)  Size      Proc             PID    Filename
[12:18:49.913241516,12:18:49.915908862]  write                    2667.346  95.0 B    apache2          31584  /var/log/apache2/access.log (fd=8)
[12:18:37.472823631,12:18:37.472859836]  writev                     36.205  21.97 KB  apache2          31544  unknown (origin not found) (fd=12)
[12:18:37.991578372,12:18:37.991612724]  writev                     34.352  21.97 KB  apache2          31589  unknown (origin not found) (fd=12)
[12:18:39.547778549,12:18:39.547812515]  writev                     33.966  21.97 KB  apache2          31584  unknown (origin not found) (fd=12)
Top sync syscall latencies (usec)
Begin               End                  Name              Duration (usec)  Size      Proc             PID    Filename
[12:18:50.162776739,12:18:51.157522361]  sync                   994745.622  N/A       sync             22791  None (fd=None)
[12:18:37.227867532,12:18:37.232289687]  sync_file_range          4422.155  N/A       lttng-consumerd  19964  /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
[12:18:37.238076585,12:18:37.239012027]  sync_file_range           935.442  N/A       lttng-consumerd  19964  /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
[12:18:37.220974711,12:18:37.221647124]  sync_file_range           672.413  N/A       lttng-consumerd  19964  /home/julien/lttng-traces/analysis-20150115-120942/kernel/metadata (fd=32)
If we look at the first line in the Open category, we clearly see that a PHP session file took 437 ms to open, whereas the other session files open in a few tens of microseconds. If we remember the beginning of this example, we had a request around 430 ms longer than usual, so we are on the right track! Now we need to understand why this latency occurred. To narrow down the investigation, we can center the analysis around the interesting timestamp range:
$ ./lttng-iolatencytop analysis-20150115-120942/ --minsize 2 --timerange [12:18:50.432950815,12:18:50.870648568]
Top open syscall latencies (usec)
Begin               End                  Name              Duration (usec)  Size      Proc             PID    Filename
[12:18:50.432950815,12:18:50.870648568]  open                   437697.753  N/A       apache2          31517  /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
Top sync syscall latencies (usec)
Begin               End                  Name              Duration (usec)  Size      Proc             PID    Filename
[12:18:50.162776739,12:18:51.157522361]* sync                   994745.622* N/A       sync             22791  None (fd=None)
*: Syscalls started and/or completed outside of the range specified
Here is a very interesting line: we now know that while our open was running, a sync was running at the same time. It started before and completed after the range we specified. We could now widen the range to match the sync and, for example, look at the actual disk usage recorded while that syscall was active, or at the list of other syscalls affected by it (see the sketch below).
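For instance, assuming lttng-iousagetop accepts the same --timerange option as the latency scripts, re-running it over the sync window (the timestamps are taken from the output above) would show the I/O usage during that exact period:

$ ./lttng-iousagetop analysis-20150115-120942/ --timerange [12:18:50.162776739,12:18:51.157522361]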
Note: instead of using the --timerange option, we can control the beginning and the end of the range individually with --begin and/or --end (see the example after these notes).
Note 2: the timestamps are interpreted in the timezone of the machine running the analysis script, so if you are exchanging results with people in a different timezone, add the --gmt option to have a common time reference.
Note 3: if you are working on a trace that spans multiple days, you will have to specify the date as well.
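As a quick illustration of the first note, the slow open could also be isolated with individual bounds (a sketch; the timestamp format is assumed to be the same one --timerange accepts):

$ ./lttng-iolatencytop analysis-20150115-120942/ --minsize 2 --begin 12:18:50.432950815 --end 12:18:50.870648568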
I/O requests log
Finally, we can also look at the events as a timeline (instead of a top or frequency view). For that, we have the lttng-iolog script (or the --log option in the other scripts), which gives us an understanding of the sequence that led to our problem. Here is an example centered around the sync period:
$ ./lttng-iolog analysis-20150115-120942/ --timerange [12:18:50.162776739,12:18:51.157522361]
Log of all I/O system calls
Begin               End                  Name              Duration (usec)  Size      Proc             PID    Filename
[12:18:50.162776739,12:18:51.157522361]  sync                   994745.622  N/A       sync             22791  None (fd=None)
[12:18:50.432904513,12:18:50.432912248]  open                        7.735  N/A       apache2          31517  /dev/urandom (fd=13)
[12:18:50.432912958,12:18:50.432923604]  read                       10.646  32.0 B    apache2          31517  /dev/urandom (fd=13)
[12:18:50.432950815,12:18:50.870648568]  open                   437697.753  N/A       apache2          31517  /var/lib/php5/sess_0ifir2hangm8ggaljdphl9o5b5 (fd=13)
[12:18:50.870685082,12:18:50.870689642]  open                        4.560  N/A       apache2          31517  /var/www/html/inc/header.html (fd=14)
[12:18:50.870696308,12:18:50.870698097]  read                        1.789  6.0 B     apache2          31517  /var/www/html/inc/header.html (fd=14)
[12:18:50.870785224,12:18:50.870788336]  open                        3.112  N/A       apache2          31517  /var/www/html/tinyib.db (fd=14)
[12:18:50.870795212,12:18:50.870796527]  read                        1.315  100.0 B   apache2          31517  /var/www/html/tinyib.db (fd=14)
[12:18:50.870866830,12:18:50.870867961]  read                        1.131  1.0 KB    apache2          31517  /var/www/html/tinyib.db (fd=14)
...
Conclusion
Through this example, we identified an unusual latency and tracked it down to the root cause: someone (or a script) launched the sync command, which committed all the dirty buffers in the page cache to disk. This caused a high I/O load (for our old desktop hard drive). At the same time, our client tried to access the index page. When the client connected, PHP tried to open a temporary session file on disk, which stalled for 437 ms because of the ongoing disk activity.
One particularity of this workflow compared to traditional debugging practices is that we recorded the trace only once and then investigated it through various tools and options. So we understood this exact latency, not a look-alike. For recurring problems it does not make much difference, but if you have ever investigated a weird case that happens only once a month, you know the value of this approach. Moreover, reproducing this exact example would have been really difficult, since it depends on the state of the page cache when the sync is started.
If you want to reproduce this example, the trace is available here. Feel free to experiment with the other LTTng analyses scripts; all the setup instructions are there.
Finally, we appreciate any feedback, suggestions, contributions, and ideas for new use cases and analyses, so experiment with the scripts and help us build great low-level performance debugging and monitoring tools.
Next time: an IRQ analysis.