From: John Fastabend <john.fastabend@gmail.com>
To: Jakub Sitnicki <jakub@cloudflare.com>,
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Cc: netdev@vger.kernel.org, Cong Wang <cong.wang@bytedance.com>,
Eric Dumazet <edumazet@google.com>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
bpf@vger.kernel.org, kernel-dev@igalia.com,
syzbot+07a2e4a1a57118ef7355@syzkaller.appspotmail.com,
stable@vger.kernel.org
Subject: Re: [PATCH net v2] sock_map: avoid race between sock_map_close and sk_psock_put
Date: Tue, 28 May 2024 13:20:19 -0700
Message-ID: <66563c8385546_2f7f208bf@john.notmuch>
In-Reply-To: <875xuzwpjb.fsf@cloudflare.com>

Jakub Sitnicki wrote:
> On Fri, May 24, 2024 at 11:47 AM -03, Thadeu Lima de Souza Cascardo wrote:
> > sk_psock_get will return NULL if the refcount of psock has gone to 0, which
> > will happen when the last call of sk_psock_put is done. However,
> > sk_psock_drop may not have finished yet, so the close callback will still
> > point to sock_map_close despite psock being NULL.
> >
> > This can be reproduced with a thread deleting an element from the sock map,
> > while the second one creates a socket, adds it to the map and closes it.
> >
> > That will trigger the WARN_ON_ONCE:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
> > Modules linked in:
> > CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
> > RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
> > Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02
> > RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293
> > RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000
> > RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0
> > RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3
> > R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840
> > R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870
> > FS: 000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0
> > Call Trace:
> > <TASK>
> > unix_release+0x87/0xc0 net/unix/af_unix.c:1048
> > __sock_release net/socket.c:659 [inline]
> > sock_close+0xbe/0x240 net/socket.c:1421
> > __fput+0x42b/0x8a0 fs/file_table.c:422
> > __do_sys_close fs/open.c:1556 [inline]
> > __se_sys_close fs/open.c:1541 [inline]
> > __x64_sys_close+0x7f/0x110 fs/open.c:1541
> > do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7fb37d618070
> > Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
> > RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
> > RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070
> > RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004
> > RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000
> > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > </TASK>
> >
> > Use sk_psock, which will only check that the pointer has not been set to
> > NULL yet, which should only happen after the callbacks are restored. If,
> > then, a reference can still be gotten, we may call sk_psock_stop and cancel
> > psock->work.
> >
> > As suggested by Paolo Abeni, reorder the condition so the control flow is
> > less convoluted.
> >
> > After that change, the reproducer does not trigger the WARN_ON_ONCE
> > anymore.
> >
> > Suggested-by: Paolo Abeni <pabeni@redhat.com>
> > Reported-by: syzbot+07a2e4a1a57118ef7355@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=07a2e4a1a57118ef7355
> > Fixes: aadb2bb83ff7 ("sock_map: Fix a potential use-after-free in sock_map_close()")
> > Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
> > ---
> >
> > v2: change control flow as suggested by Paolo Abeni
> >
> > v1: https://lore.kernel.org/netdev/20240520214153.847619-1-cascardo@igalia.com/
> >
> > ---
> > net/core/sock_map.c | 16 ++++++++++------
> > 1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 9402889840bf..c3179567a99a 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -1680,19 +1680,23 @@ void sock_map_close(struct sock *sk, long timeout)
> >
> >  	lock_sock(sk);
> >  	rcu_read_lock();
> > -	psock = sk_psock_get(sk);
> > -	if (unlikely(!psock)) {
> > -		rcu_read_unlock();
> > -		release_sock(sk);
> > -		saved_close = READ_ONCE(sk->sk_prot)->close;
> > -	} else {
> > +	psock = sk_psock(sk);
> > +	if (likely(psock)) {
> >  		saved_close = psock->saved_close;
> >  		sock_map_remove_links(sk, psock);
> > +		psock = sk_psock_get(sk);
> > +		if (unlikely(!psock))
> > +			goto no_psock;
> >  		rcu_read_unlock();
> >  		sk_psock_stop(psock);
> >  		release_sock(sk);
> >  		cancel_delayed_work_sync(&psock->work);
> >  		sk_psock_put(sk, psock);
> > +	} else {
> > +		saved_close = READ_ONCE(sk->sk_prot)->close;
> > +no_psock:
> > +		rcu_read_unlock();
> > +		release_sock(sk);
> >  	}
> >
> >  	/* Make sure we do not recurse. This is a bug.
>
> Thanks.
>
> Acked-by: Jakub Sitnicki <jakub@cloudflare.com>

LGTM as well. Thanks.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>
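
For readers following the thread, the window described in the commit message can be sketched as a small single-threaded userspace model. This is only an illustration under stated assumptions, not kernel code: every model_* name, the CLOSE_* constants, and the two pick_close helpers are hypothetical stand-ins, and the model freezes the state at the instant when the last sk_psock_put has dropped the refcount to 0 but sk_psock_drop has not yet restored the callbacks or cleared the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two close callbacks involved. */
enum { CLOSE_SOCK_MAP = 1, CLOSE_ORIGINAL = 2 };

struct model_psock {
	int refcnt;          /* models psock->refcnt */
	int saved_close;     /* models psock->saved_close */
};

struct model_sock {
	struct model_psock *user_data;  /* models the psock pointer on the socket */
	int prot_close;                 /* models sk->sk_prot->close */
};

/* Models sk_psock(): only checks whether the pointer is still set. */
static struct model_psock *model_sk_psock(struct model_sock *sk)
{
	return sk->user_data;
}

/* Models sk_psock_get(): fails once the refcount has reached 0. */
static struct model_psock *model_sk_psock_get(struct model_sock *sk)
{
	struct model_psock *psock = sk->user_data;

	if (!psock || psock->refcnt == 0)
		return NULL;
	psock->refcnt++;
	return psock;
}

/* Old flow: a failed get falls back to sk_prot->close. */
static int old_pick_close(struct model_sock *sk)
{
	struct model_psock *psock = model_sk_psock_get(sk);
	int saved_close;

	if (!psock)
		return sk->prot_close;
	saved_close = psock->saved_close;
	psock->refcnt--;   /* models sk_psock_put() */
	return saved_close;
}

/* New flow: read saved_close via the pointer, then try to take a ref. */
static int new_pick_close(struct model_sock *sk)
{
	struct model_psock *psock = model_sk_psock(sk);
	int saved_close;

	if (!psock)
		return sk->prot_close;
	saved_close = psock->saved_close;
	psock = model_sk_psock_get(sk);
	if (psock)
		psock->refcnt--;   /* the real code also stops the psock first */
	return saved_close;
}

/* Builds the race-window state and checks what each flow would call. */
static void model_race_window(void)
{
	struct model_psock psock = { .refcnt = 0, .saved_close = CLOSE_ORIGINAL };
	struct model_sock sk = { .user_data = &psock, .prot_close = CLOSE_SOCK_MAP };

	/* Old flow picks sock_map_close itself: the recursion the WARN catches. */
	assert(old_pick_close(&sk) == CLOSE_SOCK_MAP);
	/* New flow still finds the original close callback. */
	assert(new_pick_close(&sk) == CLOSE_ORIGINAL);
}
```

In this model the old flow ends up with the sock_map close callback as its own saved_close, which corresponds to the WARN_ON_ONCE in the report, while the reordered flow reads saved_close from the still-set pointer. The model does not capture RCU, lock_sock, or real concurrency.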
Thread overview: 4+ messages
2024-05-24 14:47 [PATCH net v2] sock_map: avoid race between sock_map_close and sk_psock_put Thadeu Lima de Souza Cascardo
2024-05-27 9:51 ` Jakub Sitnicki
2024-05-28 20:20 ` John Fastabend [this message]
2024-05-28 10:20 ` patchwork-bot+netdevbpf