
Changelog:
commit e198fb6a2ebc22ceac8b10d953103b59452f24d4
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Sat Mar 1 11:33:25 2014 -0500

    Fix: high cpu usage in synchronize_rcu with long RCU read-side C.S.

    We noticed that this kind of scenario:
    - application using urcu-mb, urcu-membarrier, urcu-signal, or urcu-bp,
    - long RCU read-side critical sections, caused by e.g. long network I/O
      system calls,
    - other short-lived RCU critical sections running in other threads,
    - very frequent invocation of call_rcu to enqueue callbacks,
    leads to abnormally high CPU usage within synchronize_rcu() in the
    call_rcu worker threads.

    Inspection of the code gives us the answer: in urcu.c, if we need to
    wait on a futex (wait_gp()), we expect to be able to end the grace
    period within the next loop, having been notified by a
    rcu_read_unlock(). However, this is not always the case: we can very
    well be awakened by a rcu_read_unlock() executed on a thread running
    short-lived RCU read-side critical sections, while the long-running RCU
    read-side C.S. is still active. We end up in a situation where we
    busy-wait for a very long time, because the counter is !=
    RCU_QS_ACTIVE_ATTEMPTS until a 32-bit overflow happens (or more likely,
    until we complete the grace period). We need to change the wait_loops ==
    RCU_QS_ACTIVE_ATTEMPTS check into an inequality to use wait_gp() for
    every attempt beyond RCU_QS_ACTIVE_ATTEMPTS loops.
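
As a rough illustration of why the equality check spins, here is a simplified model of the wait loop, not the actual urcu.c code. The threshold value of 100, the helper names, and the assumption that a long-running reader stays active for every iteration are all made up for this sketch.

```c
#include <stdbool.h>

/* Illustrative threshold only; the real constant is defined in liburcu. */
#define DEMO_QS_ACTIVE_ATTEMPTS 100

/* Pre-fix decision: block only on the one iteration that hits the threshold. */
static bool should_block_eq(unsigned int wait_loops)
{
	return wait_loops == DEMO_QS_ACTIVE_ATTEMPTS;
}

/* Post-fix decision: block on every iteration at or beyond the threshold. */
static bool should_block_ge(unsigned int wait_loops)
{
	return wait_loops >= DEMO_QS_ACTIVE_ATTEMPTS;
}

/*
 * Model a grace period that stays blocked by a long-running reader for
 * `iterations` loop passes, and count how many of those passes busy-wait
 * instead of sleeping on the futex.
 */
static unsigned int count_spins(unsigned int iterations,
				bool (*should_block)(unsigned int))
{
	unsigned int wait_loops = 0, spins = 0;

	for (unsigned int i = 0; i < iterations; i++) {
		wait_loops++;
		if (!should_block(wait_loops))
			spins++;	/* would call a cpu-relax primitive */
		/* else: would sleep in wait_gp() */
	}
	return spins;
}
```

With 1000 passes in this model, the equality check busy-waits on 999 of them, while the inequality busy-waits only on the first 99.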

    urcu-bp.c also has this issue. Moreover, it uses usleep() rather than
    poll() when dealing with long-running RCU read-side critical sections.
    Turn the usleep 1000us (1ms) into a poll of 10ms. One of the advantages
    of using poll() rather than usleep() is that it does not interact with
    SIGALRM.

    urcu-qsbr.c already checks for wait_loops >= RCU_QS_ACTIVE_ATTEMPTS, so
    it is not affected by this issue.

    Looking into these loops, however, shows that overflow of the loop
    counter, although unlikely, would bring us back to a situation of high
    cpu usage (a negative value well below RCU_QS_ACTIVE_ATTEMPTS).
    Therefore, change the counter behavior so it stops incrementing when it
    reaches RCU_QS_ACTIVE_ATTEMPTS, to eliminate overflow.
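
A saturating counter of the kind described here might look like the sketch below. The function name and the threshold value are assumptions for illustration, not the code from the commit.

```c
/* Illustrative threshold only; the real constant is defined in liburcu. */
#define DEMO_QS_ACTIVE_ATTEMPTS 100

/*
 * Advance the loop counter, but stop once it reaches the threshold so
 * that it can never wrap around to a value below the threshold.
 */
static unsigned int bump_wait_loops(unsigned int wait_loops)
{
	if (wait_loops < DEMO_QS_ACTIVE_ATTEMPTS)
		wait_loops++;
	return wait_loops;
}
```

Once the counter saturates, a `>=` comparison against the threshold stays true for the rest of the grace period, however long it lasts.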

    Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>


Userspace RCU 0.8.3 and 0.7.11