From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: paulmck@kernel.org
Cc: bpf@vger.kernel.org, Magnus Karlsson <magnus.karlsson@intel.com>
Subject: Re: BPF trampolines break because of hang in synchronize_rcu_tasks() on PREEMPT kernels
Date: Tue, 23 Mar 2021 18:29:35 +0100 [thread overview]
Message-ID: <871rc57p8g.fsf@toke.dk> (raw)
In-Reply-To: <20210323164315.GY2696@paulmck-ThinkPad-P72>
"Paul E. McKenney" <paulmck@kernel.org> writes:
> On Tue, Mar 23, 2021 at 01:26:36PM +0100, Toke Høiland-Jørgensen wrote:
>> Hi Paul
>>
>> Magnus and I have been debugging an issue where close() on a bpf_link
>> file descriptor would hang indefinitely when the system was under load
>> on a kernel compiled with CONFIG_PREEMPT=y, and it seems to be related
>> to synchronize_rcu_tasks(), so I'm hoping you can help us with it.
>>
>> The issue is triggered reliably by loading up a system with network
>> traffic (causing 100% softirq CPU load on one or more cores), and then
>> attaching an freplace bpf_link and closing it again. The close() will
>> hang until the network traffic load is lowered.
>>
>> Digging further, it appears that the hang happens in
>> synchronize_rcu_tasks(), as seen by running a bpftrace script like:
>>
>> bpftrace -e 'kprobe:synchronize_rcu_tasks { @start = nsecs; printf("enter\n"); } kretprobe:synchronize_rcu_tasks { printf("exit after %d ms\n", (nsecs - @start) / 1000000); }'
>> Attaching 2 probes...
>> enter
>> exit after 54 ms
>> enter
>> exit after 3249 ms
>>
>> (the two enter/exit pairs are, respectively, from an unloaded system,
>> and from a loaded system where I stopped the network traffic after a
>> couple of seconds).
>>
>> The call to synchronize_rcu_tasks() happens in bpf_trampoline_put():
>>
>> https://elixir.bootlin.com/linux/latest/source/kernel/bpf/trampoline.c#L376
>>
>> And because it does this while holding trampoline_mutex, even deferring
>> the put to a worker (as a previously applied-then-reverted patch did[0])
>> doesn't help: that'll fix the initial hang on close(), but any
>> subsequent use of BPF trampolines will then be blocked because of the
>> mutex.
>>
>> Also, if I just keep the network traffic running I will eventually get a
>> kernel panic with:
>>
>> kernel:[44348.426312] Kernel panic - not syncing: hung_task: blocked tasks
>>
>> I've created a reproducer for the issue here:
>> https://github.com/xdp-project/bpf-examples/tree/master/bpf-link-hang
>>
>> To compile simply do this (needs a recent llvm/clang for compiling the BPF program):
>>
>> $ git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
>> $ cd bpf-examples/bpf-link-hang
>> $ make
>> $ ./sudo bpf-link-hang
>>
>> you'll need to load up the system to trigger the hang; I'm using pktgen
>> from a separate machine to do this.
>>
>> My question is, of course, as ever, What Is To Be Done? Is it expected
>> that synchronize_rcu_tasks() can hang indefinitely on a PREEMPT system,
>> or can this be fixed? And if it is expected, how can the BPF code be
>> fixed so it doesn't deadlock because of this?
>>
>> Hoping you can help us with this - many thanks in advance! :)
>
> Let me start with the usual question... Is the network traffic intense
> enough that one of the CPUs might remain in a loop handling softirqs
> indefinitely?
Yup, I'm pegging all CPUs in softirq:
$ mpstat -P ALL 1
[...]
18:26:52 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
18:26:53 all 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
18:26:53 0 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
18:26:53 1 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
18:26:53 2 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
18:26:53 3 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
18:26:53 4 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
18:26:53 5 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
> If so, does the (untested, probably does not build) patch below help?
Doesn't appear to, no. It builds fine, but I still get:
Attaching 2 probes...
enter
exit after 8480 ms
(that was me interrupting the network traffic again)
-Toke
next prev parent reply other threads:[~2021-03-23 17:30 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <877dly6ooz.fsf@toke.dk>
2021-03-23 16:43 ` BPF trampolines break because of hang in synchronize_rcu_tasks() on PREEMPT kernels Paul E. McKenney
2021-03-23 17:29 ` Toke Høiland-Jørgensen [this message]
2021-03-23 17:57 ` Paul E. McKenney
2021-03-23 19:50 ` Toke Høiland-Jørgensen
2021-03-23 19:59 ` Andrii Nakryiko
2021-03-23 21:04 ` Toke Høiland-Jørgensen
2021-03-23 21:52 ` Paul E. McKenney
2021-03-23 22:06 ` Toke Høiland-Jørgensen
2021-03-24 2:41 ` Paul E. McKenney
2021-03-24 11:33 ` Toke Høiland-Jørgensen
2021-03-24 16:11 ` Paul E. McKenney
2021-03-24 19:17 ` Toke Høiland-Jørgensen
2021-03-25 16:28 ` Paul E. McKenney
2021-03-25 21:13 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871rc57p8g.fsf@toke.dk \
--to=toke@redhat.com \
--cc=bpf@vger.kernel.org \
--cc=magnus.karlsson@intel.com \
--cc=paulmck@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.