* [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
@ 2024-09-10 11:43 Dmitry Antipov
2024-09-19 8:51 ` Paolo Abeni
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Antipov @ 2024-09-10 11:43 UTC (permalink / raw)
To: John Fastabend, Jakub Sitnicki
Cc: Cong Wang, Jakub Kicinski, Paolo Abeni, netdev, lvc-project,
Dmitry Antipov, syzbot+f363afac6b0ace576f45
Syzbot has triggered the following race condition:
On CPU0, 'sk_psock_drop()' (most likely scheduled from 'sock_map_unref()'
called by 'sock_map_update_common()') is running at [1]:
void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
{
write_lock_bh(&sk->sk_callback_lock);
sk_psock_restore_proto(sk, psock); [1]
rcu_assign_sk_user_data(sk, NULL); [2]
...
}
If 'sock_map_destroy()' is scheduled on CPU1 at the same time, psock is
always NULL at [3]. But, since [1] may be is in progress during [4], the
value of 'saved_destroy' at this point is undefined:
void sock_map_destroy(struct sock *sk)
{
void (*saved_destroy)(struct sock *sk);
struct sk_psock *psock;
rcu_read_lock();
psock = sk_psock_get(sk); [3]
if (unlikely(!psock)) {
rcu_read_unlock();
saved_destroy = READ_ONCE(sk->sk_prot)->destroy; [4]
} else {
saved_destroy = psock->saved_destroy; [5]
sock_map_remove_links(sk, psock);
rcu_read_unlock();
sk_psock_stop(psock);
sk_psock_put(sk, psock);
}
if (WARN_ON_ONCE(saved_destroy == sock_map_destroy))
return;
if (saved_destroy)
saved_destroy(sk);
}
Fix this issue in 3 steps:
1. Prefer 'sk_psock()' over 'sk_psock_get()' at [3]. Since zero
refcount is ignored, 'psock' is non-NULL until [2] is completed.
2. Add read lock around [5], to make sure that [1] is not in progress
when the former is executed.
3. Since 'sk_psock()' does not adjust reference counting, drop
'sk_psock_put()' and redundant 'sk_psock_stop()' (which is
executed by 'sk_psock_drop()' anyway).
Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself")
Reported-by: syzbot+f363afac6b0ace576f45@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f363afac6b0ace576f45
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
---
net/core/sock_map.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d3dbb92153f2..1eeb1d3a6b71 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1649,16 +1649,16 @@ void sock_map_destroy(struct sock *sk)
struct sk_psock *psock;
rcu_read_lock();
- psock = sk_psock_get(sk);
+ psock = sk_psock(sk);
if (unlikely(!psock)) {
rcu_read_unlock();
saved_destroy = READ_ONCE(sk->sk_prot)->destroy;
} else {
+ read_lock_bh(&sk->sk_callback_lock);
saved_destroy = psock->saved_destroy;
+ read_unlock_bh(&sk->sk_callback_lock);
sock_map_remove_links(sk, psock);
rcu_read_unlock();
- sk_psock_stop(psock);
- sk_psock_put(sk, psock);
}
if (WARN_ON_ONCE(saved_destroy == sock_map_destroy))
return;
--
2.46.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() 2024-09-10 11:43 [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() Dmitry Antipov @ 2024-09-19 8:51 ` Paolo Abeni 2024-09-19 9:26 ` Dmitry Antipov 0 siblings, 1 reply; 6+ messages in thread From: Paolo Abeni @ 2024-09-19 8:51 UTC (permalink / raw) To: Dmitry Antipov, John Fastabend, Jakub Sitnicki Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project, syzbot+f363afac6b0ace576f45 On 9/10/24 13:43, Dmitry Antipov wrote: > Syzbot has triggered the following race condition: > > On CPU0, 'sk_psock_drop()' (most likely scheduled from 'sock_map_unref()' > called by 'sock_map_update_common()') is running at [1]: > > void sk_psock_drop(struct sock *sk, struct sk_psock *psock) > { > write_lock_bh(&sk->sk_callback_lock); > sk_psock_restore_proto(sk, psock); [1] > rcu_assign_sk_user_data(sk, NULL); [2] > ... > } > > If 'sock_map_destroy()' is scheduled on CPU1 at the same time, psock is > always NULL at [3]. But, since [1] may be is in progress during [4], the > value of 'saved_destroy' at this point is undefined: > > void sock_map_destroy(struct sock *sk) > { > void (*saved_destroy)(struct sock *sk); > struct sk_psock *psock; > > rcu_read_lock(); > psock = sk_psock_get(sk); [3] > if (unlikely(!psock)) { > rcu_read_unlock(); > saved_destroy = READ_ONCE(sk->sk_prot)->destroy; [4] > } else { > saved_destroy = psock->saved_destroy; [5] > sock_map_remove_links(sk, psock); > rcu_read_unlock(); > sk_psock_stop(psock); > sk_psock_put(sk, psock); > } > if (WARN_ON_ONCE(saved_destroy == sock_map_destroy)) > return; > if (saved_destroy) > saved_destroy(sk); > } > > Fix this issue in 3 steps: > > 1. Prefer 'sk_psock()' over 'sk_psock_get()' at [3]. Since zero > refcount is ignored, 'psock' is non-NULL until [2] is completed. > > 2. Add read lock around [5], to make sure that [1] is not in progress > when the former is executed. > > 3. Since 'sk_psock()' does not adjust reference counting, drop > 'sk_psock_put()' and redundant 'sk_psock_stop()' (which is > executed by 'sk_psock_drop()' anyway). > > Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself") > Reported-by: syzbot+f363afac6b0ace576f45@syzkaller.appspotmail.com > Closes: https://syzkaller.appspot.com/bug?extid=f363afac6b0ace576f45 > Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> I think there is still Cong question pending: why this could not/ should not be addressed instead in the RDS code. Thanks, Paolo ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() 2024-09-19 8:51 ` Paolo Abeni @ 2024-09-19 9:26 ` Dmitry Antipov 2024-09-24 8:23 ` Paolo Abeni 0 siblings, 1 reply; 6+ messages in thread From: Dmitry Antipov @ 2024-09-19 9:26 UTC (permalink / raw) To: Paolo Abeni, John Fastabend, Jakub Sitnicki Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project, syzbot+f363afac6b0ace576f45 On 9/19/24 11:51 AM, Paolo Abeni wrote: > I think there is still Cong question pending: why this could not/ should not be addressed instead in the RDS code. Hm. Except 'rds_tcp_accept_xxx()' functions, the rest of the backtrace is contributed by generic TCP and IP things. I've tried to debug this issue starting from the closest innards and seems found an issue within sockmap code. Anyway, I'm strongly disagree with "unless otherwise proven, there are a lot of bugs everywhere except the subsystem I'm responsible to" bias, assuming that a bit more friendly and cooperative efforts should give us the better results. Dmitry ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() 2024-09-19 9:26 ` Dmitry Antipov @ 2024-09-24 8:23 ` Paolo Abeni 2024-09-25 0:22 ` Michal Luczaj 2024-09-27 13:04 ` Jakub Sitnicki 0 siblings, 2 replies; 6+ messages in thread From: Paolo Abeni @ 2024-09-24 8:23 UTC (permalink / raw) To: Dmitry Antipov, John Fastabend, Jakub Sitnicki Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project, syzbot+f363afac6b0ace576f45 On 9/19/24 11:26, Dmitry Antipov wrote: > On 9/19/24 11:51 AM, Paolo Abeni wrote: > >> I think there is still Cong question pending: why this could not/ should not be addressed instead in the RDS code. > > Hm. Except 'rds_tcp_accept_xxx()' functions, the rest of the backtrace > is contributed by generic TCP and IP things. AFAICS most of RDS is build on top of TCP, most RDS-specific bugs will show a lot tcp/ip in their backtrace. i.e. with mptcp we had quite a few bugs with a 'tcp mostily' or 'tcp only' backtrace and root caused in the mptcp code. > I've tried to debug this > issue starting from the closest innards and seems found an issue within > sockmap code. > Anyway, I'm strongly disagree with "unless otherwise proven, > there are a lot of bugs everywhere except the subsystem I'm responsible > to" bias, assuming that a bit more friendly and cooperative efforts > should give us the better results. I guess that the main point in Cong's feedback is that a sockmap update is not supposed to race with sock_map_destroy() (???) @Cong, @John, @JakubS: any comments on that? Cheers, Paolo ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() 2024-09-24 8:23 ` Paolo Abeni @ 2024-09-25 0:22 ` Michal Luczaj 2024-09-27 13:04 ` Jakub Sitnicki 1 sibling, 0 replies; 6+ messages in thread From: Michal Luczaj @ 2024-09-25 0:22 UTC (permalink / raw) To: Paolo Abeni, Dmitry Antipov, John Fastabend, Jakub Sitnicki Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project, syzbot+f363afac6b0ace576f45, Kuniyuki Iwashima On 9/24/24 10:23, Paolo Abeni wrote: > ... > I guess that the main point in Cong's feedback is that a sockmap update > is not supposed to race with sock_map_destroy() (???) @Cong, @John, > @JakubS: any comments on that? In somewhat related news: sock_map_unhash() races with the update, hitting WARN_ON_ONCE(saved_unhash == sock_map_unhash). CPU0 CPU1 ==== ==== BPF_MAP_DELETE_ELEM sk_psock_drop() sk_psock_restore_proto rcu_assign_sk_user_data(NULL) shutdown() sock_map_unhash() psock = sk_psock(sk) if (unlikely(!psock)) { BPF_MAP_UPDATE_ELEM sock_map_init_proto() sock_replace_proto saved_unhash = READ_ONCE(sk->sk_prot)->unhash; } if (WARN_ON_ONCE(saved_unhash == sock_map_unhash)) return; [ 20.860668] WARNING: CPU: 1 PID: 1238 at net/core/sock_map.c:1641 sock_map_unhash+0xa1/0x220 [ 20.860686] CPU: 1 UID: 0 PID: 1238 Comm: a.out Not tainted 6.11.0+ #59 [ 20.860688] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 20.860705] Call Trace: [ 20.860706] <TASK> [ 20.860725] unix_shutdown+0xb0/0x470 [ 20.860728] __sys_shutdown+0x7a/0xa0 [ 20.860731] __x64_sys_shutdown+0x10/0x20 [ 20.860733] do_syscall_64+0x93/0x180 [ 20.860788] entry_SYSCALL_64_after_hwframe+0x76/0x7e Under VM it takes about 10 minutes to trigger the splat: #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <unistd.h> #include <pthread.h> #include <sys/un.h> #include <sys/syscall.h> #include <sys/socket.h> #include <linux/bpf.h> int s[2], sockmap; static void die(char *msg) { perror(msg); exit(-1); } static int create_sockmap(int key_size, int value_size, int max_entries) { union bpf_attr attr = { .map_type = BPF_MAP_TYPE_SOCKMAP, .key_size = key_size, .value_size = value_size, .max_entries = max_entries }; int map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr)); if (map < 0) die("bpf_create_map"); return map; } static void map_update_elem(int map_fd, int key, void *value, uint64_t flags) { union bpf_attr attr = { .map_fd = map_fd, .key = (uint64_t)&key, .value = (uint64_t)value, .flags = flags }; syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)); } static void map_del_elem(int map_fd, int key) { union bpf_attr attr = { .map_fd = map_fd, .key = (uint64_t)&key }; syscall(SYS_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr)); } static void *racer_del(void *unused) { for (;;) map_del_elem(sockmap, 0); return NULL; } static void *racer_update(void *unused) { for (;;) map_update_elem(sockmap, 0, &s[0], BPF_ANY); return NULL; } int main(void) { pthread_t t0, t1; if (pthread_create(&t0, NULL, racer_del, NULL)) die("pthread_create"); if (pthread_create(&t1, NULL, racer_update, NULL)) die("pthread_create"); sockmap = create_sockmap(sizeof(int), sizeof(int), 1); for (;;) { if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0) die("socketpair"); map_update_elem(sockmap, 0, &s[0], BPF_ANY); shutdown(s[1], 0); close(s[0]); close(s[1]); } } With mdelay(1) it's a matter of seconds: diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 724b6856fcc3..98a964399813 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1631,6 +1631,7 @@ void sock_map_unhash(struct sock *sk) psock = sk_psock(sk); if (unlikely(!psock)) { rcu_read_unlock(); + mdelay(1); saved_unhash = READ_ONCE(sk->sk_prot)->unhash; } else { saved_unhash = psock->saved_unhash; I've tried the patch below and it seems to do the trick diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 724b6856fcc3..a384771a66e8 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1627,6 +1627,7 @@ void sock_map_unhash(struct sock *sk) void (*saved_unhash)(struct sock *sk); struct sk_psock *psock; + lock_sock(sk); rcu_read_lock(); psock = sk_psock(sk); if (unlikely(!psock)) { @@ -1637,6 +1638,7 @@ void sock_map_unhash(struct sock *sk) sock_map_remove_links(sk, psock); rcu_read_unlock(); } + release_sock(sk); if (WARN_ON_ONCE(saved_unhash == sock_map_unhash)) return; if (saved_unhash) but perhaps what needs to be fixed instead is af_unix shutdown()? CCing Kuniyuki. thanks, Michal ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() 2024-09-24 8:23 ` Paolo Abeni 2024-09-25 0:22 ` Michal Luczaj @ 2024-09-27 13:04 ` Jakub Sitnicki 1 sibling, 0 replies; 6+ messages in thread From: Jakub Sitnicki @ 2024-09-27 13:04 UTC (permalink / raw) To: Paolo Abeni, Dmitry Antipov Cc: John Fastabend, Cong Wang, Jakub Kicinski, netdev, lvc-project, syzbot+f363afac6b0ace576f45 On Tue, Sep 24, 2024 at 10:23 AM +02, Paolo Abeni wrote: > I guess that the main point in Cong's feedback is that a sockmap update is not > supposed to race with sock_map_destroy() (???) @Cong, @John, @JakubS: any > comments on that? Looking into it, but will need a bit more time because I'm working through a backlog and OoO next week. @Dmitry, To start off, could you ask syzbot to give your patch a spin by replying to its report? See instructions following the report [1]. Thanks, -jkbs [1] https://lore.kernel.org/netdev/000000000000abe6b50620a7f370@google.com/ ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-09-27 13:04 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-09-10 11:43 [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() Dmitry Antipov 2024-09-19 8:51 ` Paolo Abeni 2024-09-19 9:26 ` Dmitry Antipov 2024-09-24 8:23 ` Paolo Abeni 2024-09-25 0:22 ` Michal Luczaj 2024-09-27 13:04 ` Jakub Sitnicki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).