All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: paulmck@kernel.org
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	bpf <bpf@vger.kernel.org>,
	Magnus Karlsson <magnus.karlsson@intel.com>
Subject: Re: BPF trampolines break because of hang in synchronize_rcu_tasks() on PREEMPT kernels
Date: Tue, 23 Mar 2021 23:06:04 +0100	[thread overview]
Message-ID: <87lfad5xv7.fsf@toke.dk> (raw)
In-Reply-To: <20210323215247.GD2696@paulmck-ThinkPad-P72>

"Paul E. McKenney" <paulmck@kernel.org> writes:

> On Tue, Mar 23, 2021 at 10:04:50PM +0100, Toke Høiland-Jørgensen wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> 
>> > On Tue, Mar 23, 2021 at 12:52 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> "Paul E. McKenney" <paulmck@kernel.org> writes:
>> >>
>> >> > On Tue, Mar 23, 2021 at 06:29:35PM +0100, Toke Høiland-Jørgensen wrote:
>> >> >> "Paul E. McKenney" <paulmck@kernel.org> writes:
>> >> >>
>> >> >> > On Tue, Mar 23, 2021 at 01:26:36PM +0100, Toke Høiland-Jørgensen wrote:
>> >> >> >> Hi Paul
>> >> >> >>
>> >> >> >> Magnus and I have been debugging an issue where close() on a bpf_link
>> >> >> >> file descriptor would hang indefinitely when the system was under load
>> >> >> >> on a kernel compiled with CONFIG_PREEMPT=y, and it seems to be related
>> >> >> >> to synchronize_rcu_tasks(), so I'm hoping you can help us with it.
>> >> >> >>
>> >> >> >> The issue is triggered reliably by loading up a system with network
>> >> >> >> traffic (causing 100% softirq CPU load on one or more cores), and then
>> >> >> >> attaching an freplace bpf_link and closing it again. The close() will
>> >> >> >> hang until the network traffic load is lowered.
>> >> >> >>
>> >> >> >> Digging further, it appears that the hang happens in
>> >> >> >> synchronize_rcu_tasks(), as seen by running a bpftrace script like:
>> >> >> >>
>> >> >> >> bpftrace -e 'kprobe:synchronize_rcu_tasks { @start = nsecs; printf("enter\n"); } kretprobe:synchronize_rcu_tasks { printf("exit after %d ms\n", (nsecs - @start) / 1000000); }'
>> >> >> >> Attaching 2 probes...
>> >> >> >> enter
>> >> >> >> exit after 54 ms
>> >> >> >> enter
>> >> >> >> exit after 3249 ms
>> >> >> >>
>> >> >> >> (the two enter/exit pairs are, respectively, from an unloaded system,
>> >> >> >> and from a loaded system where I stopped the network traffic after a
>> >> >> >> couple of seconds).
>> >> >> >>
>> >> >> >> The call to synchronize_rcu_tasks() happens in bpf_trampoline_put():
>> >> >> >>
>> >> >> >> https://elixir.bootlin.com/linux/latest/source/kernel/bpf/trampoline.c#L376
>> >> >> >>
>> >> >> >> And because it does this while holding trampoline_mutex, even deferring
>> >> >> >> the put to a worker (as a previously applied-then-reverted patch did[0])
>> >> >> >> doesn't help: that'll fix the initial hang on close(), but any
>> >> >> >> subsequent use of BPF trampolines will then be blocked because of the
>> >> >> >> mutex.
>> >> >> >>
>> >> >> >> Also, if I just keep the network traffic running I will eventually get a
>> >> >> >> kernel panic with:
>> >> >> >>
>> >> >> >> kernel:[44348.426312] Kernel panic - not syncing: hung_task: blocked tasks
>> >> >> >>
>> >> >> >> I've created a reproducer for the issue here:
>> >> >> >> https://github.com/xdp-project/bpf-examples/tree/master/bpf-link-hang
>> >> >> >>
>> >> >> >> To compile simply do this (needs a recent llvm/clang for compiling the BPF program):
>> >> >> >>
>> >> >> >> $ git clone --recurse-submodules https://github.com/xdp-project/bpf-examples
>> >> >> >> $ cd bpf-examples/bpf-link-hang
>> >> >> >> $ make
>> >> >> >> $ ./sudo bpf-link-hang
>> >> >> >>
>> >> >> >> you'll need to load up the system to trigger the hang; I'm using pktgen
>> >> >> >> from a separate machine to do this.
>> >> >> >>
>> >> >> >> My question is, of course, as ever, What Is To Be Done? Is it expected
>> >> >> >> that synchronize_rcu_tasks() can hang indefinitely on a PREEMPT system,
>> >> >> >> or can this be fixed? And if it is expected, how can the BPF code be
>> >> >> >> fixed so it doesn't deadlock because of this?
>> >> >> >>
>> >> >> >> Hoping you can help us with this - many thanks in advance! :)
>> >> >> >
>> >> >> > Let me start with the usual question...  Is the network traffic intense
>> >> >> > enough that one of the CPUs might remain in a loop handling softirqs
>> >> >> > indefinitely?
>> >> >>
>> >> >> Yup, I'm pegging all CPUs in softirq:
>> >> >>
>> >> >> $ mpstat -P ALL 1
>> >> >> [...]
>> >> >> 18:26:52     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>> >> >> 18:26:53     all    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
>> >> >> 18:26:53       0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
>> >> >> 18:26:53       1    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
>> >> >> 18:26:53       2    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
>> >> >> 18:26:53       3    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
>> >> >> 18:26:53       4    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
>> >> >> 18:26:53       5    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
>> >> >>
>> >> >> > If so, does the (untested, probably does not build) patch below help?
>> >> >>
>> >> >> Doesn't appear to, no. It builds fine, but I still get:
>> >> >>
>> >> >> Attaching 2 probes...
>> >> >> enter
>> >> >> exit after 8480 ms
>> >> >>
>> >> >> (that was me interrupting the network traffic again)
>> >> >
>> >> > Is your kernel properly shifting from back-of-interrupt softirq processing
>> >> > to ksoftirqd under heavy load?  If not, my patch will not have any
>> >> > effect.
>> >>
>> >> Seems to be - this is from top:
>> >>
>> >>      12 root      20   0       0      0      0 R  99.3   0.0   0:43.64 ksoftirqd/0
>> >>      24 root      20   0       0      0      0 R  99.3   0.0   0:43.62 ksoftirqd/2
>> >>      34 root      20   0       0      0      0 R  99.3   0.0   0:43.64 ksoftirqd/4
>> >>      39 root      20   0       0      0      0 R  99.3   0.0   0:43.65 ksoftirqd/5
>> >>      19 root      20   0       0      0      0 R  99.0   0.0   0:43.63 ksoftirqd/1
>> >>      29 root      20   0       0      0      0 R  99.0   0.0   0:43.63 ksoftirqd/3
>> >>
>> >> Any other ideas? :)
>> >
>> > bpf_trampoline_put() got significantly changed by e21aa341785c ("bpf:
>> > Fix fexit trampoline. "), it doesn't do synchronize_rcu_tasks()
>> > anymore. Please give it a try. It's in bpf tree.
>> 
>> Ah! I had missed that patch, and only tested this on bpf-next. Yes, that
>> indeed works better; awesome!
>> 
>> And sorry for bothering you with this, Paul; guess I should have looked
>> harder for fixes first... :/
>
> Glad it is now working!
>
> And in any case, my patch needed an s/true/false/.  :-/
>
> Hey, I did say "untested"!  ;-)

Haha, right, well at least you run afoul of the 'truth in advertising'
committee ;)

-Toke


  reply	other threads:[~2021-03-23 22:07 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <877dly6ooz.fsf@toke.dk>
2021-03-23 16:43 ` BPF trampolines break because of hang in synchronize_rcu_tasks() on PREEMPT kernels Paul E. McKenney
2021-03-23 17:29   ` Toke Høiland-Jørgensen
2021-03-23 17:57     ` Paul E. McKenney
2021-03-23 19:50       ` Toke Høiland-Jørgensen
2021-03-23 19:59         ` Andrii Nakryiko
2021-03-23 21:04           ` Toke Høiland-Jørgensen
2021-03-23 21:52             ` Paul E. McKenney
2021-03-23 22:06               ` Toke Høiland-Jørgensen [this message]
2021-03-24  2:41                 ` Paul E. McKenney
2021-03-24 11:33                   ` Toke Høiland-Jørgensen
2021-03-24 16:11                     ` Paul E. McKenney
2021-03-24 19:17                       ` Toke Høiland-Jørgensen
2021-03-25 16:28                         ` Paul E. McKenney
2021-03-25 21:13                           ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lfad5xv7.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=paulmck@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.