netdev.vger.kernel.org archive mirror
* [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
@ 2024-09-10 11:43 Dmitry Antipov
  2024-09-19  8:51 ` Paolo Abeni
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Antipov @ 2024-09-10 11:43 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: Cong Wang, Jakub Kicinski, Paolo Abeni, netdev, lvc-project,
	Dmitry Antipov, syzbot+f363afac6b0ace576f45

Syzbot has triggered the following race condition:

On CPU0, 'sk_psock_drop()' (most likely scheduled from 'sock_map_unref()'
called by 'sock_map_update_common()') is running at [1]:

void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
{
        write_lock_bh(&sk->sk_callback_lock);
        sk_psock_restore_proto(sk, psock);                              [1]
        rcu_assign_sk_user_data(sk, NULL);                              [2]
        ...
}

If 'sock_map_destroy()' runs on CPU1 at the same time, 'psock' is always
NULL at [3]. But since [1] may still be in progress when [4] executes,
the value of 'saved_destroy' at that point is undefined:

void sock_map_destroy(struct sock *sk)
{
        void (*saved_destroy)(struct sock *sk);
        struct sk_psock *psock;

        rcu_read_lock();
        psock = sk_psock_get(sk);                                       [3]
        if (unlikely(!psock)) {
                rcu_read_unlock();
                saved_destroy = READ_ONCE(sk->sk_prot)->destroy;        [4]
        } else {
                saved_destroy = psock->saved_destroy;                   [5]
                sock_map_remove_links(sk, psock);
                rcu_read_unlock();
                sk_psock_stop(psock);
                sk_psock_put(sk, psock);
        }
        if (WARN_ON_ONCE(saved_destroy == sock_map_destroy))
                return;
        if (saved_destroy)
                saved_destroy(sk);
}

Fix this issue in 3 steps:

1. Prefer 'sk_psock()' over 'sk_psock_get()' at [3]. Since it ignores
   a zero refcount, 'psock' remains non-NULL until [2] has completed.

2. Take the read lock around [5] to make sure that [1] is not in
   progress while [5] executes.

3. Since 'sk_psock()' does not adjust the reference count, drop
   'sk_psock_put()' and the redundant 'sk_psock_stop()' (which is
   executed by 'sk_psock_drop()' anyway).

Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself")
Reported-by: syzbot+f363afac6b0ace576f45@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f363afac6b0ace576f45
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
---
 net/core/sock_map.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d3dbb92153f2..1eeb1d3a6b71 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1649,16 +1649,16 @@ void sock_map_destroy(struct sock *sk)
 	struct sk_psock *psock;
 
 	rcu_read_lock();
-	psock = sk_psock_get(sk);
+	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
 		rcu_read_unlock();
 		saved_destroy = READ_ONCE(sk->sk_prot)->destroy;
 	} else {
+		read_lock_bh(&sk->sk_callback_lock);
 		saved_destroy = psock->saved_destroy;
+		read_unlock_bh(&sk->sk_callback_lock);
 		sock_map_remove_links(sk, psock);
 		rcu_read_unlock();
-		sk_psock_stop(psock);
-		sk_psock_put(sk, psock);
 	}
 	if (WARN_ON_ONCE(saved_destroy == sock_map_destroy))
 		return;
-- 
2.46.0



* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
  2024-09-10 11:43 [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put() Dmitry Antipov
@ 2024-09-19  8:51 ` Paolo Abeni
  2024-09-19  9:26   ` Dmitry Antipov
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2024-09-19  8:51 UTC (permalink / raw)
  To: Dmitry Antipov, John Fastabend, Jakub Sitnicki
  Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project,
	syzbot+f363afac6b0ace576f45

On 9/10/24 13:43, Dmitry Antipov wrote:
> Syzbot has triggered the following race condition:
> 
> On CPU0, 'sk_psock_drop()' (most likely scheduled from 'sock_map_unref()'
> called by 'sock_map_update_common()') is running at [1]:
> 
> void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
> {
>          write_lock_bh(&sk->sk_callback_lock);
>          sk_psock_restore_proto(sk, psock);                              [1]
>          rcu_assign_sk_user_data(sk, NULL);                              [2]
>          ...
> }
> 
> If 'sock_map_destroy()' runs on CPU1 at the same time, 'psock' is
> always NULL at [3]. But since [1] may still be in progress when [4]
> executes, the value of 'saved_destroy' at that point is undefined:
> 
> void sock_map_destroy(struct sock *sk)
> {
>          void (*saved_destroy)(struct sock *sk);
>          struct sk_psock *psock;
> 
>          rcu_read_lock();
>          psock = sk_psock_get(sk);                                       [3]
>          if (unlikely(!psock)) {
>                  rcu_read_unlock();
>                  saved_destroy = READ_ONCE(sk->sk_prot)->destroy;        [4]
>          } else {
>                  saved_destroy = psock->saved_destroy;                   [5]
>                  sock_map_remove_links(sk, psock);
>                  rcu_read_unlock();
>                  sk_psock_stop(psock);
>                  sk_psock_put(sk, psock);
>          }
>          if (WARN_ON_ONCE(saved_destroy == sock_map_destroy))
>                  return;
>          if (saved_destroy)
>                  saved_destroy(sk);
> }
> 
> Fix this issue in 3 steps:
> 
> 1. Prefer 'sk_psock()' over 'sk_psock_get()' at [3]. Since it ignores
>     a zero refcount, 'psock' remains non-NULL until [2] has completed.
> 
> 2. Take the read lock around [5] to make sure that [1] is not in
>     progress while [5] executes.
> 
> 3. Since 'sk_psock()' does not adjust the reference count, drop
>     'sk_psock_put()' and the redundant 'sk_psock_stop()' (which is
>     executed by 'sk_psock_drop()' anyway).
> 
> Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself")
> Reported-by: syzbot+f363afac6b0ace576f45@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f363afac6b0ace576f45
> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>

I think Cong's question is still pending: why could this not / should
this not be addressed in the RDS code instead?

Thanks,

Paolo



* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
  2024-09-19  8:51 ` Paolo Abeni
@ 2024-09-19  9:26   ` Dmitry Antipov
  2024-09-24  8:23     ` Paolo Abeni
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Antipov @ 2024-09-19  9:26 UTC (permalink / raw)
  To: Paolo Abeni, John Fastabend, Jakub Sitnicki
  Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project,
	syzbot+f363afac6b0ace576f45

On 9/19/24 11:51 AM, Paolo Abeni wrote:

> I think Cong's question is still pending: why could this not / should this not be addressed in the RDS code instead?

Hm. Except for the 'rds_tcp_accept_xxx()' functions, the rest of the
backtrace is contributed by generic TCP and IP code. I tried to debug
this issue starting from the closest internals, and I seem to have found
an issue within the sockmap code. Anyway, I strongly disagree with the
"unless otherwise proven, there are a lot of bugs everywhere except the
subsystem I'm responsible for" bias; I'd assume that a bit more friendly
and cooperative effort would give us better results.

Dmitry



* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
  2024-09-19  9:26   ` Dmitry Antipov
@ 2024-09-24  8:23     ` Paolo Abeni
  2024-09-25  0:22       ` Michal Luczaj
  2024-09-27 13:04       ` Jakub Sitnicki
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Abeni @ 2024-09-24  8:23 UTC (permalink / raw)
  To: Dmitry Antipov, John Fastabend, Jakub Sitnicki
  Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project,
	syzbot+f363afac6b0ace576f45

On 9/19/24 11:26, Dmitry Antipov wrote:
> On 9/19/24 11:51 AM, Paolo Abeni wrote:
> 
>> I think Cong's question is still pending: why could this not / should this not be addressed in the RDS code instead?
> 
> Hm. Except for the 'rds_tcp_accept_xxx()' functions, the rest of the
> backtrace is contributed by generic TCP and IP code.

AFAICS, since most of RDS is built on top of TCP, most RDS-specific bugs
will show a lot of tcp/ip in their backtraces.

E.g., with mptcp we had quite a few bugs with a 'tcp mostly' or 'tcp
only' backtrace that were root-caused in the mptcp code.

> I tried to debug this issue starting from the closest internals, and I
> seem to have found an issue within the sockmap code.
> Anyway, I strongly disagree with the "unless otherwise proven, there
> are a lot of bugs everywhere except the subsystem I'm responsible for"
> bias; I'd assume that a bit more friendly and cooperative effort would
> give us better results.

I guess that the main point in Cong's feedback is that a sockmap update 
is not supposed to race with sock_map_destroy() (???) @Cong, @John, 
@JakubS: any comments on that?

Cheers,

Paolo



* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
  2024-09-24  8:23     ` Paolo Abeni
@ 2024-09-25  0:22       ` Michal Luczaj
  2024-09-27 13:04       ` Jakub Sitnicki
  1 sibling, 0 replies; 6+ messages in thread
From: Michal Luczaj @ 2024-09-25  0:22 UTC (permalink / raw)
  To: Paolo Abeni, Dmitry Antipov, John Fastabend, Jakub Sitnicki
  Cc: Cong Wang, Jakub Kicinski, netdev, lvc-project,
	syzbot+f363afac6b0ace576f45, Kuniyuki Iwashima

On 9/24/24 10:23, Paolo Abeni wrote:
> ...
> I guess that the main point in Cong's feedback is that a sockmap update 
> is not supposed to race with sock_map_destroy() (???) @Cong, @John, 
> @JakubS: any comments on that?

In somewhat related news: sock_map_unhash() races with the update, hitting
WARN_ON_ONCE(saved_unhash == sock_map_unhash).

CPU0					CPU1
====					====

BPF_MAP_DELETE_ELEM
  sk_psock_drop()
    sk_psock_restore_proto
    rcu_assign_sk_user_data(NULL)
    					shutdown()
					  sock_map_unhash()
					    psock = sk_psock(sk)
					    if (unlikely(!psock)) {
BPF_MAP_UPDATE_ELEM
  sock_map_init_proto()
    sock_replace_proto
					      saved_unhash = READ_ONCE(sk->sk_prot)->unhash;
					    }
					    if (WARN_ON_ONCE(saved_unhash == sock_map_unhash))
					      return;

[   20.860668] WARNING: CPU: 1 PID: 1238 at net/core/sock_map.c:1641 sock_map_unhash+0xa1/0x220
[   20.860686] CPU: 1 UID: 0 PID: 1238 Comm: a.out Not tainted 6.11.0+ #59
[   20.860688] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[   20.860705] Call Trace:
[   20.860706]  <TASK>
[   20.860725]  unix_shutdown+0xb0/0x470
[   20.860728]  __sys_shutdown+0x7a/0xa0
[   20.860731]  __x64_sys_shutdown+0x10/0x20
[   20.860733]  do_syscall_64+0x93/0x180
[   20.860788]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Under a VM it takes about 10 minutes to trigger the splat:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/un.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <linux/bpf.h>

int s[2], sockmap;

static void die(char *msg)
{
	perror(msg);
	exit(-1);
}

static int create_sockmap(int key_size, int value_size, int max_entries)
{
	union bpf_attr attr = {
		.map_type = BPF_MAP_TYPE_SOCKMAP,
		.key_size = key_size,
		.value_size = value_size,
		.max_entries = max_entries
	};

	int map = syscall(SYS_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
	if (map < 0)
		die("bpf_create_map");

	return map;
}

static void map_update_elem(int map_fd, int key, void *value, uint64_t flags)
{
	union bpf_attr attr = {
		.map_fd = map_fd,
		.key = (uint64_t)&key,
		.value = (uint64_t)value,
		.flags = flags
	};

	syscall(SYS_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

static void map_del_elem(int map_fd, int key)
{
	union bpf_attr attr = {
		.map_fd = map_fd,
		.key = (uint64_t)&key
	};

	syscall(SYS_bpf, BPF_MAP_DELETE_ELEM, &attr, sizeof(attr));
}

static void *racer_del(void *unused)
{
	for (;;)
		map_del_elem(sockmap, 0);

	return NULL;
}

static void *racer_update(void *unused)
{
	for (;;)
		map_update_elem(sockmap, 0, &s[0], BPF_ANY);

	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	if (pthread_create(&t0, NULL, racer_del, NULL))
		die("pthread_create");

	if (pthread_create(&t1, NULL, racer_update, NULL))
		die("pthread_create");

	sockmap = create_sockmap(sizeof(int), sizeof(int), 1);

	for (;;) {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0)
			die("socketpair");

		map_update_elem(sockmap, 0, &s[0], BPF_ANY);
		shutdown(s[1], 0);

		close(s[0]);
		close(s[1]);
	}
}

With mdelay(1) it's a matter of seconds:

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 724b6856fcc3..98a964399813 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1631,6 +1631,7 @@ void sock_map_unhash(struct sock *sk)
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
 		rcu_read_unlock();
+		mdelay(1);
 		saved_unhash = READ_ONCE(sk->sk_prot)->unhash;
 	} else {
 		saved_unhash = psock->saved_unhash;

I've tried the patch below and it seems to do the trick

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 724b6856fcc3..a384771a66e8 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1627,6 +1627,7 @@ void sock_map_unhash(struct sock *sk)
 	void (*saved_unhash)(struct sock *sk);
 	struct sk_psock *psock;
 
+	lock_sock(sk);
 	rcu_read_lock();
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
@@ -1637,6 +1638,7 @@ void sock_map_unhash(struct sock *sk)
 		sock_map_remove_links(sk, psock);
 		rcu_read_unlock();
 	}
+	release_sock(sk);
 	if (WARN_ON_ONCE(saved_unhash == sock_map_unhash))
 		return;
 	if (saved_unhash)

but perhaps what needs to be fixed instead is af_unix shutdown()?
CCing Kuniyuki.

thanks,
Michal



* Re: [PATCH net v2] net: sockmap: avoid race between sock_map_destroy() and sk_psock_put()
  2024-09-24  8:23     ` Paolo Abeni
  2024-09-25  0:22       ` Michal Luczaj
@ 2024-09-27 13:04       ` Jakub Sitnicki
  1 sibling, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2024-09-27 13:04 UTC (permalink / raw)
  To: Paolo Abeni, Dmitry Antipov
  Cc: John Fastabend, Cong Wang, Jakub Kicinski, netdev, lvc-project,
	syzbot+f363afac6b0ace576f45

On Tue, Sep 24, 2024 at 10:23 AM +02, Paolo Abeni wrote:
> I guess that the main point in Cong's feedback is that a sockmap update is not
> supposed to race with sock_map_destroy() (???) @Cong, @John, @JakubS: any
> comments on that?

Looking into it, but I will need a bit more time because I'm working
through a backlog and will be OoO next week.

@Dmitry,

To start off, could you ask syzbot to give your patch a spin by replying
to its report? See instructions following the report [1].

Thanks,
-jkbs

[1] https://lore.kernel.org/netdev/000000000000abe6b50620a7f370@google.com/
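For reference, syzbot patch testing works by replying to the report email
with a `#syz test` command, with the candidate patch inline or attached;
a minimal form (the tree and branch shown here are only an example, use
whatever base the patch applies to) looks like:

```
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git main
```

syzbot then builds that tree with the patch applied, re-runs the
reproducer, and replies with the result.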

