netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
@ 2024-07-08 18:20 Kuniyuki Iwashima
  2024-07-08 18:38 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-08 18:20 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, David Ahern
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev, syzkaller

syzkaller triggered the warning [0] in udp_v4_early_demux().

In udp_v4_early_demux(), we do not touch the refcount of the looked-up
sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE
to ensure that the sk is safe to access during the RCU grace period.

Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
into the hash table.  Moreover, the SOCK_RCU_FREE check is done too
early in udp_v4_early_demux(), so there could be a small race window:

  CPU1                                 CPU2
  ----                                 ----
  udp_v4_early_demux()                 udp_lib_get_port()
  |                                    |- hlist_add_head_rcu()
  |- sk = __udp4_lib_demux_lookup()    |
  |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
                                       `- sock_set_flag(sk, SOCK_RCU_FREE)

In practice, sock_pfree() is called much later, when SOCK_RCU_FREE
is most likely propagated for other CPUs; otherwise, we will see
another warning of sk refcount underflow, but at least I didn't.

Technically, moving sock_set_flag(sk, SOCK_RCU_FREE) before
hlist_add_{head,tail}_rcu() does not guarantee the order, and we
must put smp_mb() between them, or smp_wmb() there and smp_rmb()
in udp_v4_early_demux().

But it's overkill in the real scenario, so I just put smp_mb() only under
CONFIG_DEBUG_NET to silence the splat.  When we see the refcount underflow
warning, we can remove the config guard.

Another option would be to remove DEBUG_NET_WARN_ON_ONCE(), but this could
make future debugging harder without the hints in udp_v4_early_demux() and
udp_lib_get_port().

[0]:
WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Modules linked in:
CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599
Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52
RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c
RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001
RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680
R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e
FS:  00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
 <TASK>
 ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349
 ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447
 NF_HOOK include/linux/netfilter.h:314 [inline]
 NF_HOOK include/linux/netfilter.h:308 [inline]
 ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569
 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624
 __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738
 netif_receive_skb_internal net/core/dev.c:5824 [inline]
 netif_receive_skb+0x271/0x300 net/core/dev.c:5884
 tun_rx_batched drivers/net/tun.c:1549 [inline]
 tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002
 tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048
 new_sync_write fs/read_write.c:497 [inline]
 vfs_write+0x76f/0x8d0 fs/read_write.c:590
 ksys_write+0xbf/0x190 fs/read_write.c:643
 __do_sys_write fs/read_write.c:655 [inline]
 __se_sys_write fs/read_write.c:652 [inline]
 __x64_sys_write+0x41/0x50 fs/read_write.c:652
 x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fc44a68bc1f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48
RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f
RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8
RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000
R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000
 </TASK>

Fixes: 08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/ipv4/udp.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 189c9113fe9a..1a05cc3d2b4f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -326,6 +326,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 			goto fail_unlock;
 		}
 
+		sock_set_flag(sk, SOCK_RCU_FREE);
+
+		if (IS_ENABLED(CONFIG_DEBUG_NET))
+			/* for DEBUG_NET_WARN_ON_ONCE() in udp_v4_early_demux(). */
+			smp_mb();
+
 		sk_add_node_rcu(sk, &hslot->head);
 		hslot->count++;
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
@@ -342,7 +348,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		hslot2->count++;
 		spin_unlock(&hslot2->lock);
 	}
-	sock_set_flag(sk, SOCK_RCU_FREE);
+
 	error = 0;
 fail_unlock:
 	spin_unlock_bh(&hslot->lock);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
  2024-07-08 18:20 [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port() Kuniyuki Iwashima
@ 2024-07-08 18:38 ` Eric Dumazet
  2024-07-08 18:55   ` Kuniyuki Iwashima
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2024-07-08 18:38 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
	David Ahern, Kuniyuki Iwashima, netdev, syzkaller

On Mon, Jul 8, 2024 at 11:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> syzkaller triggered the warning [0] in udp_v4_early_demux().
>
> In udp_v4_early_demux(), we do not touch the refcount of the looked-up
> sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE
> to ensure that the sk is safe to access during the RCU grace period.
>
> Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
> into the hash table.  Moreover, the SOCK_RCU_FREE check is done too
> early in udp_v4_early_demux(), so there could be a small race window:
>
>   CPU1                                 CPU2
>   ----                                 ----
>   udp_v4_early_demux()                 udp_lib_get_port()
>   |                                    |- hlist_add_head_rcu()
>   |- sk = __udp4_lib_demux_lookup()    |
>   |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
>                                        `- sock_set_flag(sk, SOCK_RCU_FREE)
>
> In practice, sock_pfree() is called much later, when SOCK_RCU_FREE
> is most likely propagated for other CPUs; otherwise, we will see
> another warning of sk refcount underflow, but at least I didn't.
>
> Technically, moving sock_set_flag(sk, SOCK_RCU_FREE) before
> hlist_add_{head,tail}_rcu() does not guarantee the order, and we
> must put smp_mb() between them, or smp_wmb() there and smp_rmb()
> in udp_v4_early_demux().
>
> But it's overkill in the real scenario, so I just put smp_mb() only under
> CONFIG_DEBUG_NET to silence the splat.  When we see the refcount underflow
> warning, we can remove the config guard.
>
> Another option would be to remove DEBUG_NET_WARN_ON_ONCE(), but this could
> make future debugging harder without the hints in udp_v4_early_demux() and
> udp_lib_get_port().
>
> [0]:
>
> Fixes: 08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  net/ipv4/udp.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 189c9113fe9a..1a05cc3d2b4f 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -326,6 +326,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>                         goto fail_unlock;
>                 }
>
> +               sock_set_flag(sk, SOCK_RCU_FREE);

Nice catch.

> +
> +               if (IS_ENABLED(CONFIG_DEBUG_NET))
> +                       /* for DEBUG_NET_WARN_ON_ONCE() in udp_v4_early_demux(). */
> +                       smp_mb();
> +

I do not think this smp_mb() is needed. If this was, many other RCU
operations would need it,

RCU rules mandate that all memory writes must be committed before the
object can be seen
by other cpus in the hash table.

This includes the setting of the SOCK_RCU_FREE flag.

For instance, hlist_add_head_rcu() does a
rcu_assign_pointer(hlist_first_rcu(h), n);


>                 sk_add_node_rcu(sk, &hslot->head);
>                 hslot->count++;
>                 sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
> @@ -342,7 +348,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>                 hslot2->count++;
>                 spin_unlock(&hslot2->lock);
>         }
> -       sock_set_flag(sk, SOCK_RCU_FREE);
> +
>         error = 0;
>  fail_unlock:
>         spin_unlock_bh(&hslot->lock);
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
  2024-07-08 18:38 ` Eric Dumazet
@ 2024-07-08 18:55   ` Kuniyuki Iwashima
  2024-07-08 19:07     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-08 18:55 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, kuba, kuni1840, kuniyu, netdev, pabeni, syzkaller,
	willemdebruijn.kernel

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 8 Jul 2024 11:38:41 -0700
> On Mon, Jul 8, 2024 at 11:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > syzkaller triggered the warning [0] in udp_v4_early_demux().
> >
> > In udp_v4_early_demux(), we do not touch the refcount of the looked-up
> > sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE
> > to ensure that the sk is safe to access during the RCU grace period.
> >
> > Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
> > into the hash table.  Moreover, the SOCK_RCU_FREE check is done too
> > early in udp_v4_early_demux(), so there could be a small race window:
> >
> >   CPU1                                 CPU2
> >   ----                                 ----
> >   udp_v4_early_demux()                 udp_lib_get_port()
> >   |                                    |- hlist_add_head_rcu()
> >   |- sk = __udp4_lib_demux_lookup()    |
> >   |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
> >                                        `- sock_set_flag(sk, SOCK_RCU_FREE)
> >
> > In practice, sock_pfree() is called much later, when SOCK_RCU_FREE
> > is most likely propagated for other CPUs; otherwise, we will see
> > another warning of sk refcount underflow, but at least I didn't.
> >
> > Technically, moving sock_set_flag(sk, SOCK_RCU_FREE) before
> > hlist_add_{head,tail}_rcu() does not guarantee the order, and we
> > must put smp_mb() between them, or smp_wmb() there and smp_rmb()
> > in udp_v4_early_demux().
> >
> > But it's overkill in the real scenario, so I just put smp_mb() only under
> > CONFIG_DEBUG_NET to silence the splat.  When we see the refcount underflow
> > warning, we can remove the config guard.
> >
> > Another option would be to remove DEBUG_NET_WARN_ON_ONCE(), but this could
> > make future debugging harder without the hints in udp_v4_early_demux() and
> > udp_lib_get_port().
> >
> > [0]:
> >
> > Fixes: 08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
> > Reported-by: syzkaller <syzkaller@googlegroups.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> >  net/ipv4/udp.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 189c9113fe9a..1a05cc3d2b4f 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -326,6 +326,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
> >                         goto fail_unlock;
> >                 }
> >
> > +               sock_set_flag(sk, SOCK_RCU_FREE);
> 
> Nice catch.
> 
> > +
> > +               if (IS_ENABLED(CONFIG_DEBUG_NET))
> > +                       /* for DEBUG_NET_WARN_ON_ONCE() in udp_v4_early_demux(). */
> > +                       smp_mb();
> > +
> 
> I do not think this smp_mb() is needed. If this was, many other RCU
> operations would need it,
> 
> RCU rules mandate that all memory writes must be committed before the
> object can be seen
> by other cpus in the hash table.
> 
> This includes the setting of the SOCK_RCU_FREE flag.
> 
> For instance, hlist_add_head_rcu() does a
> rcu_assign_pointer(hlist_first_rcu(h), n);

Ah, I was thinking spinlock will not prevent reordering, but
now I see, rcu_assign_pointer() had necessary barrier. :)

  /**
   * rcu_assign_pointer() - assign to RCU-protected pointer
   ...
   * Assigns the specified value to the specified RCU-protected
   * pointer, ensuring that any concurrent RCU readers will see
   * any prior initialization.

will remove smp_mb() and update the changelog in v2.

Thanks !

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
  2024-07-08 18:55   ` Kuniyuki Iwashima
@ 2024-07-08 19:07     ` Eric Dumazet
  2024-07-08 19:20       ` Kuniyuki Iwashima
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2024-07-08 19:07 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: davem, dsahern, kuba, kuni1840, netdev, pabeni, syzkaller,
	willemdebruijn.kernel

On Mon, Jul 8, 2024 at 11:55 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Mon, 8 Jul 2024 11:38:41 -0700
> > On Mon, Jul 8, 2024 at 11:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > syzkaller triggered the warning [0] in udp_v4_early_demux().
> > >
> > > In udp_v4_early_demux(), we do not touch the refcount of the looked-up
> > > sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE
> > > to ensure that the sk is safe to access during the RCU grace period.
> > >
> > > Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
> > > into the hash table.  Moreover, the SOCK_RCU_FREE check is done too
> > > early in udp_v4_early_demux(), so there could be a small race window:
> > >
> > >   CPU1                                 CPU2
> > >   ----                                 ----
> > >   udp_v4_early_demux()                 udp_lib_get_port()
> > >   |                                    |- hlist_add_head_rcu()
> > >   |- sk = __udp4_lib_demux_lookup()    |
> > >   |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
> > >                                        `- sock_set_flag(sk, SOCK_RCU_FREE)
> > >
> > > In practice, sock_pfree() is called much later, when SOCK_RCU_FREE
> > > is most likely propagated for other CPUs; otherwise, we will see
> > > another warning of sk refcount underflow, but at least I didn't.
> > >
> > > Technically, moving sock_set_flag(sk, SOCK_RCU_FREE) before
> > > hlist_add_{head,tail}_rcu() does not guarantee the order, and we
> > > must put smp_mb() between them, or smp_wmb() there and smp_rmb()
> > > in udp_v4_early_demux().
> > >
> > > But it's overkill in the real scenario, so I just put smp_mb() only under
> > > CONFIG_DEBUG_NET to silence the splat.  When we see the refcount underflow
> > > warning, we can remove the config guard.
> > >
> > > Another option would be to remove DEBUG_NET_WARN_ON_ONCE(), but this could
> > > make future debugging harder without the hints in udp_v4_early_demux() and
> > > udp_lib_get_port().
> > >
> > > [0]:
> > >
> > > Fixes: 08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
> > > Reported-by: syzkaller <syzkaller@googlegroups.com>
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > ---
> > >  net/ipv4/udp.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 189c9113fe9a..1a05cc3d2b4f 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -326,6 +326,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > >                         goto fail_unlock;
> > >                 }
> > >
> > > +               sock_set_flag(sk, SOCK_RCU_FREE);
> >
> > Nice catch.
> >
> > > +
> > > +               if (IS_ENABLED(CONFIG_DEBUG_NET))
> > > +                       /* for DEBUG_NET_WARN_ON_ONCE() in udp_v4_early_demux(). */
> > > +                       smp_mb();
> > > +
> >
> > I do not think this smp_mb() is needed. If this was, many other RCU
> > operations would need it,
> >
> > RCU rules mandate that all memory writes must be committed before the
> > object can be seen
> > by other cpus in the hash table.
> >
> > This includes the setting of the SOCK_RCU_FREE flag.
> >
> > For instance, hlist_add_head_rcu() does a
> > rcu_assign_pointer(hlist_first_rcu(h), n);
>
> Ah, I was thinking spinlock will not prevent reordering, but
> now I see, rcu_assign_pointer() had necessary barrier. :)
>
>   /**
>    * rcu_assign_pointer() - assign to RCU-protected pointer
>    ...
>    * Assigns the specified value to the specified RCU-protected
>    * pointer, ensuring that any concurrent RCU readers will see
>    * any prior initialization.
>
> will remove smp_mb() and update the changelog in v2.
>

A similar commit was

commit 871019b22d1bcc9fab2d1feba1b9a564acbb6e99
Author: Stanislav Fomichev <sdf@fomichev.me>
Date:   Wed Nov 8 13:13:25 2023 -0800

    net: set SOCK_RCU_FREE before inserting socket into hashtable

So I wonder if the bug could be older...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
  2024-07-08 19:07     ` Eric Dumazet
@ 2024-07-08 19:20       ` Kuniyuki Iwashima
  2024-07-08 20:53         ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-08 19:20 UTC (permalink / raw)
  To: edumazet
  Cc: davem, dsahern, kuba, kuni1840, kuniyu, netdev, pabeni, syzkaller,
	willemdebruijn.kernel

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 8 Jul 2024 12:07:56 -0700
> On Mon, Jul 8, 2024 at 11:55 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Eric Dumazet <edumazet@google.com>
> > Date: Mon, 8 Jul 2024 11:38:41 -0700
> > > On Mon, Jul 8, 2024 at 11:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > syzkaller triggered the warning [0] in udp_v4_early_demux().
> > > >
> > > > In udp_v4_early_demux(), we do not touch the refcount of the looked-up
> > > > sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE
> > > > to ensure that the sk is safe to access during the RCU grace period.
> > > >
> > > > Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
> > > > into the hash table.  Moreover, the SOCK_RCU_FREE check is done too
> > > > early in udp_v4_early_demux(), so there could be a small race window:
> > > >
> > > >   CPU1                                 CPU2
> > > >   ----                                 ----
> > > >   udp_v4_early_demux()                 udp_lib_get_port()
> > > >   |                                    |- hlist_add_head_rcu()
> > > >   |- sk = __udp4_lib_demux_lookup()    |
> > > >   |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
> > > >                                        `- sock_set_flag(sk, SOCK_RCU_FREE)
> > > >
> > > > In practice, sock_pfree() is called much later, when SOCK_RCU_FREE
> > > > is most likely propagated for other CPUs; otherwise, we will see
> > > > another warning of sk refcount underflow, but at least I didn't.
> > > >
> > > > Technically, moving sock_set_flag(sk, SOCK_RCU_FREE) before
> > > > hlist_add_{head,tail}_rcu() does not guarantee the order, and we
> > > > must put smp_mb() between them, or smp_wmb() there and smp_rmb()
> > > > in udp_v4_early_demux().
> > > >
> > > > But it's overkill in the real scenario, so I just put smp_mb() only under
> > > > CONFIG_DEBUG_NET to silence the splat.  When we see the refcount underflow
> > > > warning, we can remove the config guard.
> > > >
> > > > Another option would be to remove DEBUG_NET_WARN_ON_ONCE(), but this could
> > > > make future debugging harder without the hints in udp_v4_early_demux() and
> > > > udp_lib_get_port().
> > > >
> > > > [0]:
> > > >
> > > > Fixes: 08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
> > > > Reported-by: syzkaller <syzkaller@googlegroups.com>
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > ---
> > > >  net/ipv4/udp.c | 8 +++++++-
> > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > > index 189c9113fe9a..1a05cc3d2b4f 100644
> > > > --- a/net/ipv4/udp.c
> > > > +++ b/net/ipv4/udp.c
> > > > @@ -326,6 +326,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > > >                         goto fail_unlock;
> > > >                 }
> > > >
> > > > +               sock_set_flag(sk, SOCK_RCU_FREE);
> > >
> > > Nice catch.
> > >
> > > > +
> > > > +               if (IS_ENABLED(CONFIG_DEBUG_NET))
> > > > +                       /* for DEBUG_NET_WARN_ON_ONCE() in udp_v4_early_demux(). */
> > > > +                       smp_mb();
> > > > +
> > >
> > > I do not think this smp_mb() is needed. If this was, many other RCU
> > > operations would need it,
> > >
> > > RCU rules mandate that all memory writes must be committed before the
> > > object can be seen
> > > by other cpus in the hash table.
> > >
> > > This includes the setting of the SOCK_RCU_FREE flag.
> > >
> > > For instance, hlist_add_head_rcu() does a
> > > rcu_assign_pointer(hlist_first_rcu(h), n);
> >
> > Ah, I was thinking spinlock will not prevent reordering, but
> > now I see, rcu_assign_pointer() had necessary barrier. :)
> >
> >   /**
> >    * rcu_assign_pointer() - assign to RCU-protected pointer
> >    ...
> >    * Assigns the specified value to the specified RCU-protected
> >    * pointer, ensuring that any concurrent RCU readers will see
> >    * any prior initialization.
> >
> > will remove smp_mb() and update the changelog in v2.
> >
> 
> A similar commit was
> 
> commit 871019b22d1bcc9fab2d1feba1b9a564acbb6e99
> Author: Stanislav Fomichev <sdf@fomichev.me>
> Date:   Wed Nov 8 13:13:25 2023 -0800
> 
>     net: set SOCK_RCU_FREE before inserting socket into hashtable
> 
> So I wonder if the bug could be older...

If we focus on the ordering, the Fixes tag would be

Fixes: ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")

But, at that time, we had atomic_inc_not_zero_hint() and used
sock_efree(), which were removed later in 08842c43d016.

Which one should I use as Fixes: ?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
  2024-07-08 19:20       ` Kuniyuki Iwashima
@ 2024-07-08 20:53         ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2024-07-08 20:53 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: davem, dsahern, kuba, kuni1840, netdev, pabeni, syzkaller,
	willemdebruijn.kernel

On Mon, Jul 8, 2024 at 12:20 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Mon, 8 Jul 2024 12:07:56 -0700
> > On Mon, Jul 8, 2024 at 11:55 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From: Eric Dumazet <edumazet@google.com>
> > > Date: Mon, 8 Jul 2024 11:38:41 -0700
> > > > On Mon, Jul 8, 2024 at 11:20 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > syzkaller triggered the warning [0] in udp_v4_early_demux().
> > > > >
> > > > > In udp_v4_early_demux(), we do not touch the refcount of the looked-up
> > > > > sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE
> > > > > to ensure that the sk is safe to access during the RCU grace period.
> > > > >
> > > > > Currently, SOCK_RCU_FREE is flagged for a bound socket after being put
> > > > > into the hash table.  Moreover, the SOCK_RCU_FREE check is done too
> > > > > early in udp_v4_early_demux(), so there could be a small race window:
> > > > >
> > > > >   CPU1                                 CPU2
> > > > >   ----                                 ----
> > > > >   udp_v4_early_demux()                 udp_lib_get_port()
> > > > >   |                                    |- hlist_add_head_rcu()
> > > > >   |- sk = __udp4_lib_demux_lookup()    |
> > > > >   |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk));
> > > > >                                        `- sock_set_flag(sk, SOCK_RCU_FREE)
> > > > >
> > > > > In practice, sock_pfree() is called much later, when SOCK_RCU_FREE
> > > > > is most likely propagated for other CPUs; otherwise, we will see
> > > > > another warning of sk refcount underflow, but at least I didn't.
> > > > >
> > > > > Technically, moving sock_set_flag(sk, SOCK_RCU_FREE) before
> > > > > hlist_add_{head,tail}_rcu() does not guarantee the order, and we
> > > > > must put smp_mb() between them, or smp_wmb() there and smp_rmb()
> > > > > in udp_v4_early_demux().
> > > > >
> > > > > But it's overkill in the real scenario, so I just put smp_mb() only under
> > > > > CONFIG_DEBUG_NET to silence the splat.  When we see the refcount underflow
> > > > > warning, we can remove the config guard.
> > > > >
> > > > > Another option would be to remove DEBUG_NET_WARN_ON_ONCE(), but this could
> > > > > make future debugging harder without the hints in udp_v4_early_demux() and
> > > > > udp_lib_get_port().
> > > > >
> > > > > [0]:
> > > > >
> > > > > Fixes: 08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
> > > > > Reported-by: syzkaller <syzkaller@googlegroups.com>
> > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > > ---
> > > > >  net/ipv4/udp.c | 8 +++++++-
> > > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > > > index 189c9113fe9a..1a05cc3d2b4f 100644
> > > > > --- a/net/ipv4/udp.c
> > > > > +++ b/net/ipv4/udp.c
> > > > > @@ -326,6 +326,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > > > >                         goto fail_unlock;
> > > > >                 }
> > > > >
> > > > > +               sock_set_flag(sk, SOCK_RCU_FREE);
> > > >
> > > > Nice catch.
> > > >
> > > > > +
> > > > > +               if (IS_ENABLED(CONFIG_DEBUG_NET))
> > > > > +                       /* for DEBUG_NET_WARN_ON_ONCE() in udp_v4_early_demux(). */
> > > > > +                       smp_mb();
> > > > > +
> > > >
> > > > I do not think this smp_mb() is needed. If this was, many other RCU
> > > > operations would need it,
> > > >
> > > > RCU rules mandate that all memory writes must be committed before the
> > > > object can be seen
> > > > by other cpus in the hash table.
> > > >
> > > > This includes the setting of the SOCK_RCU_FREE flag.
> > > >
> > > > For instance, hlist_add_head_rcu() does a
> > > > rcu_assign_pointer(hlist_first_rcu(h), n);
> > >
> > > Ah, I was thinking spinlock will not prevent reordering, but
> > > now I see, rcu_assign_pointer() had necessary barrier. :)
> > >
> > >   /**
> > >    * rcu_assign_pointer() - assign to RCU-protected pointer
> > >    ...
> > >    * Assigns the specified value to the specified RCU-protected
> > >    * pointer, ensuring that any concurrent RCU readers will see
> > >    * any prior initialization.
> > >
> > > will remove smp_mb() and update the changelog in v2.
> > >
> >
> > A similar commit was
> >
> > commit 871019b22d1bcc9fab2d1feba1b9a564acbb6e99
> > Author: Stanislav Fomichev <sdf@fomichev.me>
> > Date:   Wed Nov 8 13:13:25 2023 -0800
> >
> >     net: set SOCK_RCU_FREE before inserting socket into hashtable
> >
> > So I wonder if the bug could be older...
>
> If we focus on the ordering, the Fixes tag would be
>
> Fixes: ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
>
> But, at that time, we had atomic_inc_not_zero_hint() and used
> sock_efree(), which were removed later in 08842c43d016.
>
> Which one should I use as Fixes: ?

I think the older issue might only surface with eBPF users.

commit 6acc9b432e6714d72d7d77ec7c27f6f8358d0c71
Author: Joe Stringer <joe@wand.net.nz>
Date:   Tue Oct 2 13:35:36 2018 -0700

    bpf: Add helper to retrieve socket in BPF

The effect of the bug would be an UDP socket leak (because an
atomic_inc_not_zero_hint() could be used
before SOCK_RCU_FREE has been set. Then the refcount decrement would
be avoided if the SOCK_RCU_FREE was set before it)

08842c43d016 ("udp: no longer touch sk->sk_refcnt in early demux")
added a DEBUG_NET_WARN_ON_ONCE()
which made the bug visible.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-07-08 20:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-08 18:20 [PATCH v1 net] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port() Kuniyuki Iwashima
2024-07-08 18:38 ` Eric Dumazet
2024-07-08 18:55   ` Kuniyuki Iwashima
2024-07-08 19:07     ` Eric Dumazet
2024-07-08 19:20       ` Kuniyuki Iwashima
2024-07-08 20:53         ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).