From: Jakub Sitnicki <jakub@cloudflare.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	 Eric Dumazet <edumazet@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	 bpf <bpf@vger.kernel.org>,  LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] bpf, sockmap: defer sk_psock_free_link() using RCU
Date: Wed, 22 May 2024 14:12:26 +0200
Message-ID: <87o78yvydx.fsf@cloudflare.com>
In-Reply-To: <20240522113349.2202-1-hdanton@sina.com> (Hillf Danton's message of "Wed, 22 May 2024 19:33:49 +0800")

On Wed, May 22, 2024 at 07:33 PM +08, Hillf Danton wrote:
> On Wed, 22 May 2024 11:50:49 +0200 Jakub Sitnicki <jakub@cloudflare.com>
> On Wed, May 22, 2024 at 06:59 AM +08, Hillf Danton wrote:
>> > On Tue, 21 May 2024 08:38:52 -0700 Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> >> On Sun, May 12, 2024 at 12:22 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> >> > --- a/net/core/sock_map.c
>> >> > +++ b/net/core/sock_map.c
>> >> > @@ -142,6 +142,7 @@ static void sock_map_del_link(struct sock *sk,
>> >> >         bool strp_stop = false, verdict_stop = false;
>> >> >         struct sk_psock_link *link, *tmp;
>> >> >
>> >> > +       rcu_read_lock();
>> >> >         spin_lock_bh(&psock->link_lock);
>> >> 
>> >> I think this is incorrect.
>> >> spin_lock_bh may sleep in RT and it won't be safe to do in rcu cs.
>> >
>> > Could you specify why it won't be safe in rcu cs if you are right?
>> > What does rcu look like in RT if not nothing?
>> 
>> RCU readers can't block, while spinlock RT doesn't disable preemption.
>> 
>> https://docs.kernel.org/RCU/rcu.html
>> https://docs.kernel.org/locking/locktypes.html#spinlock-t-and-preempt-rt
>> 
>> I've finally gotten around to testing the proposed fix that just disallows
>> map_delete_elem on sockmap/sockhash from BPF tracing progs
>> completely. This should put an end to this saga of syzkaller reports.
>> 
>> https://lore.kernel.org/all/87jzjnxaqf.fsf@cloudflare.com/
>> 
> The locking info syzbot reported [2] suggests a known issue that like Alexei
> you hit the send button earlier than expected.
>
> 4 locks held by syz-executor361/5090:
>  #0: ffffffff8e334d20 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
>  #0: ffffffff8e334d20 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
>  #0: ffffffff8e334d20 (rcu_read_lock){....}-{1:2}, at: map_delete_elem+0x388/0x5e0 kernel/bpf/syscall.c:1695
>  #1: ffff88807b2af8f8 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
>  #1: ffff88807b2af8f8 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0x17c/0x400 net/core/sock_map.c:945
>  #2: ffff88801c2a4290 (&psock->link_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
>  #2: ffff88801c2a4290 (&psock->link_lock){+...}-{2:2}, at: sock_map_del_link net/core/sock_map.c:145 [inline]
>  #2: ffff88801c2a4290 (&psock->link_lock){+...}-{2:2}, at: sock_map_unref+0xcc/0x5e0 net/core/sock_map.c:180
>  #3: ffffffff8e334d20 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
>  #3: ffffffff8e334d20 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
>  #3: ffffffff8e334d20 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2380 [inline]
>  #3: ffffffff8e334d20 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x114/0x420 kernel/trace/bpf_trace.c:2420
>
> [2] https://lore.kernel.org/all/000000000000d0b87206170dd88f@google.com/
>
>
> If CONFIG_PREEMPT_RCU=y rcu_read_lock() does not disable
> preemption. This is even true for !RT kernels with CONFIG_PREEMPT=y
>
> [3] Subject: Re: [patch 30/63] locking/spinlock: Provide RT variant
> https://lore.kernel.org/all/874kc6rizr.ffs@tglx/

That locking issue is related to my earlier fix, which turned out to be
incomplete:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff91059932401894e6c86341915615c5eb0eca48

We don't expect map_delete_elem to be called from map_update_elem for
sockmap/sockhash, but that is what syzkaller started doing by attaching
BPF tracing progs which call map_delete_elem.
