* [PATCH net] net: set SOCK_RCU_FREE before inserting socket into hashtable
From: Stanislav Fomichev @ 2023-11-08 20:28 UTC
To: netdev; +Cc: davem, edumazet, kuba, pabeni, Stanislav Fomichev
We've started to see the following kernel traces:

  WARNING: CPU: 83 PID: 0 at net/core/filter.c:6641 sk_lookup+0x1bd/0x1d0

  Call Trace:
  <IRQ>
  __bpf_skc_lookup+0x10d/0x120
  bpf_sk_lookup+0x48/0xd0
  bpf_sk_lookup_tcp+0x19/0x20
  bpf_prog_<redacted>+0x37c/0x16a3
  cls_bpf_classify+0x205/0x2e0
  tcf_classify+0x92/0x160
  __netif_receive_skb_core+0xe52/0xf10
  __netif_receive_skb_list_core+0x96/0x2b0
  napi_complete_done+0x7b5/0xb70
  <redacted>_poll+0x94/0xb0
  net_rx_action+0x163/0x1d70
  __do_softirq+0xdc/0x32e
  asm_call_irq_on_stack+0x12/0x20
  </IRQ>
  do_softirq_own_stack+0x36/0x50
  do_softirq+0x44/0x70

I'm not 100% sure what is causing them. It might be some kernel change or
a new code path in the bpf program. But looking at the code,
I'm assuming the issue has been there for a while.

__inet_hash can race with lockless (RCU) readers on the other CPUs:

  __inet_hash
    __sk_nulls_add_node_rcu
    <- (bpf triggers here)
    sock_set_flag(SOCK_RCU_FREE)

Let's move the SOCK_RCU_FREE part up a bit, before we insert
the socket into the hashtables. Note that the race is really harmless;
the bpf callers handle this situation (where the listener socket
doesn't have SOCK_RCU_FREE set) correctly, so the only
annoyance is a WARN_ONCE (so I'm not 100% sure whether this should
wait until net-next instead).

For the Fixes tag, I'm using the original commit which added the flag.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 net/ipv4/inet_hashtables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 598c1b114d2c..a532f749e477 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -751,12 +751,12 @@ int __inet_hash(struct sock *sk, struct sock *osk)
 		if (err)
 			goto unlock;
 	}
+	sock_set_flag(sk, SOCK_RCU_FREE);
 	if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
 		sk->sk_family == AF_INET6)
 		__sk_nulls_add_node_tail_rcu(sk, &ilb2->nulls_head);
 	else
 		__sk_nulls_add_node_rcu(sk, &ilb2->nulls_head);
-	sock_set_flag(sk, SOCK_RCU_FREE);
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
 unlock:
 	spin_unlock(&ilb2->lock);
--
2.42.0.869.gea05f2083d-goog
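
The ordering argument in this patch can be illustrated with a small userspace model. This is a hedged sketch in C11 atomics, not kernel code: struct listener, bucket, publish(), mark_rcu_free(), lookup() and the other helpers are hypothetical stand-ins for struct sock, the ilb2 nulls list, __sk_nulls_add_node_rcu(), sock_set_flag(sk, SOCK_RCU_FREE) and the lockless BPF lookup. It assumes the hash insertion behaves like a release store (as rcu_assign_pointer() does) and that the reader's load behaves like an acquire load. A single thread cannot hit the race by chance, so replay_old_order_window() replays the window between the two steps explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; rcu_free models SOCK_RCU_FREE. */
struct listener {
	atomic_bool rcu_free;
};

/* Single-slot "hash bucket" that lockless readers load from. */
static _Atomic(struct listener *) bucket;

static void listener_init(struct listener *l)
{
	atomic_init(&l->rcu_free, false);
}

/* Models __sk_nulls_add_node_rcu(): a release store makes l visible. */
static void publish(struct listener *l)
{
	/* Single-slot model: only one listener can be hashed at a time. */
	assert(atomic_load_explicit(&bucket, memory_order_relaxed) == NULL);
	atomic_store_explicit(&bucket, l, memory_order_release);
}

static void unhash(void)
{
	atomic_store_explicit(&bucket, NULL, memory_order_release);
}

/* Models sock_set_flag(sk, SOCK_RCU_FREE). */
static void mark_rcu_free(struct listener *l)
{
	atomic_store_explicit(&l->rcu_free, true, memory_order_relaxed);
}

/* Old order: publish first, set the flag second. */
static void hash_old_order(struct listener *l)
{
	publish(l);
	/* <- window: a concurrent lookup() can see rcu_free == false here */
	mark_rcu_free(l);
}

/* Patched order: set the flag first, publish second. The release store in
 * publish() orders the flag store before the pointer store, so a reader
 * whose acquire load returns l also sees rcu_free == true. */
static void hash_new_order(struct listener *l)
{
	mark_rcu_free(l);
	publish(l);
}

/* Lockless reader: an unreferenced listener is only usable if it is
 * RCU-freed; otherwise report "not found" (the case that WARNs). */
static struct listener *lookup(void)
{
	struct listener *l = atomic_load_explicit(&bucket, memory_order_acquire);

	if (l && !atomic_load_explicit(&l->rcu_free, memory_order_relaxed))
		return NULL;
	return l;
}

/* Single-threaded replay of the old-order window: returns what a reader
 * running between the two steps would have observed. */
static struct listener *replay_old_order_window(struct listener *l)
{
	struct listener *seen;

	publish(l);
	seen = lookup();
	mark_rcu_free(l);
	return seen;
}
```

With the old order, a reader that runs inside the window finds the listener without the flag, which is the situation the WARN_ONCE reports; once the flag is set before the insertion, the release/acquire pairing rules that observation out.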
* Re: [PATCH net] net: set SOCK_RCU_FREE before inserting socket into hashtable
From: Eric Dumazet @ 2023-11-08 20:58 UTC
To: Stanislav Fomichev; +Cc: netdev, davem, kuba, pabeni
On Wed, Nov 8, 2023 at 9:28 PM Stanislav Fomichev <sdf@google.com> wrote:
> For the fixes tag, I'm using the original commit which added the flag.

When this commit added the flag, the precise location of the
sock_set_flag(sk, SOCK_RCU_FREE) call did not matter, because the thread
calling __inet_hash() owns a reference on sk.

SOCK_RCU_FREE was tested only at dismantle time.

Back then, BPF was not yet able to perform lookups and double-check
whether SOCK_RCU_FREE was set.

Checking SOCK_RCU_FREE _after_ the lookup to infer whether a refcount has
been taken came with commit 6acc9b432e67 ("bpf: Add helper to retrieve
socket in BPF").

I think we can be more precise and help future debugging, in case more
problems need investigation.

Can you augment the changelog and use a different Fixes: tag?

With that,

Reviewed-by: Eric Dumazet <edumazet@google.com>
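
The two uses of the flag described above, the dismantle-time choice between freeing a socket right away or after an RCU grace period, and the post-lookup check that came with the BPF socket helpers, can be sketched as follows. This is a hypothetical model, not kernel code: struct sock_model, enum free_path, dismantle_path() and bpf_lookup_result() are illustrative names, and the reader-side check is our approximation of the test in sk_lookup() in net/core/filter.c that emitted the WARN_ONCE in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a socket; rcu_free stands in for SOCK_RCU_FREE. */
struct sock_model {
	bool rcu_free;
};

enum free_path {
	FREE_NOW,                /* free the socket synchronously */
	FREE_AFTER_GRACE_PERIOD, /* defer the free past an RCU grace period */
};

/* Dismantle time: the only place the flag was tested when it was added.
 * The hashing thread still holds a reference until then, so the exact
 * position of the flag update inside __inet_hash() did not matter. */
static enum free_path dismantle_path(const struct sock_model *sk)
{
	assert(sk != NULL);
	return sk->rcu_free ? FREE_AFTER_GRACE_PERIOD : FREE_NOW;
}

/* Post-lookup check in the BPF helper path: a socket returned without a
 * reference is only safe to hand out if it is RCU-freed. *warned models
 * the WARN_ONCE seen in the trace. */
static const struct sock_model *
bpf_lookup_result(const struct sock_model *sk, bool refcounted, bool *warned)
{
	*warned = false;
	if (sk && !refcounted && !sk->rcu_free) {
		*warned = true;
		return NULL;
	}
	return sk;
}
```

A socket returned with a reference passes regardless of the flag, while an unreferenced listener passes only once the flag is visible. That second case is the one that made the position of sock_set_flag() in __inet_hash() start to matter.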
* Re: [PATCH net] net: set SOCK_RCU_FREE before inserting socket into hashtable
From: Stanislav Fomichev @ 2023-11-08 21:01 UTC
To: Eric Dumazet; +Cc: netdev, davem, kuba, pabeni
On Wed, Nov 8, 2023 at 12:58 PM Eric Dumazet <edumazet@google.com> wrote:
> When this commit added the flag, precise location of the
> sock_set_flag(sk, SOCK_RCU_FREE)
> did not matter, because the thread calling __inet_hash() owns a reference on sk.
>
> SOCK_RCU_FREE was tested only at dismantle time.
>
> Back then BPF was not able yet to perform lookups, and double check if
> SOCK_RCU_FREE
> was set or not.
>
> Checking SOCK_RCU_FREE _after_ the lookup to infer if a refcount has
> been taken came
> with commit 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
>
> I think we can be more precise and help future debugging, in case more problems
> need investigations.
>
> Can you augment the changelog and use a different Fixes: tag ?
>
> With that,
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
Sure, thank you for the timeline! Will resend shortly with the updated
changelog.