All of lore.kernel.org
 help / color / mirror / Atom feed
From: luoxuanqiang <xuanqiang.luo@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: edumazet@google.com, davem@davemloft.net, kuba@kernel.org,
	kernelxing@tencent.com, netdev@vger.kernel.org,
	Xuanqiang Luo <luoxuanqiang@kylinos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk
Date: Wed, 3 Sep 2025 15:54:41 +0800	[thread overview]
Message-ID: <3ca7ad52-bfd6-4619-abef-93bf5be3969c@linux.dev> (raw)
In-Reply-To: <CAAVpQUDZFCYNMQ08uRLu388cmggckzPeP=N7WncFyxN-_hgaMw@mail.gmail.com>


在 2025/9/3 13:48, Kuniyuki Iwashima 写道:
> On Tue, Sep 2, 2025 at 10:16 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>> On Tue, Sep 2, 2025 at 7:45 PM Xuanqiang Luo <xuanqiang.luo@linux.dev> wrote:
>>> From: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
>>>
>>> Since the lookup of sk in ehash is lockless, when one CPU is performing a
>>> lookup while another CPU is executing delete and insert operations
>>> (deleting reqsk and inserting sk), the lookup CPU may miss either of
>>> them, if sk cannot be found, an RST may be sent.
>>>
>>> The call trace map is drawn as follows:
>>>     CPU 0                           CPU 1
>>>     -----                           -----
>>>                                  spin_lock()
>>>                                  sk_nulls_del_node_init_rcu(osk)
>>> __inet_lookup_established()
>>>                                  __sk_nulls_add_node_rcu(sk, list)
>>>                                  spin_unlock()
>> This usually does not happen except for local communication, and
>> retrying on the client side is much better than penalising all lookups
>> for SYN.
>>
>>> We can try using spin_lock()/spin_unlock() to wait for ehash updates
>>> (ensuring all deletions and insertions are completed) after a failed
>>> lookup in ehash, then lookup sk again after the update. Since the sk
>>> expected to be found is unlikely to encounter the aforementioned scenario
>>> multiple times consecutively, we only need one update.
>>>
>>> Similarly, an issue occurs in tw hashdance. Try adjusting the order in
>>> which it operates on ehash: remove sk first, then add tw. If sk is missed
>>> during lookup, it will likewise wait for the update to find tw, without
>>> worrying about the skc_refcnt issue that would arise if tw were found
>>> first.
>>>
>>> Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls")
>>> Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
>>> ---
>>>   net/ipv4/inet_hashtables.c    | 12 ++++++++++++
>>>   net/ipv4/inet_timewait_sock.c |  9 ++++-----
>>>   2 files changed, 16 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>>> index ceeeec9b7290..4eb3a55b855b 100644
>>> --- a/net/ipv4/inet_hashtables.c
>>> +++ b/net/ipv4/inet_hashtables.c
>>> @@ -505,6 +505,7 @@ struct sock *__inet_lookup_established(const struct net *net,
>>>          unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
>>>          unsigned int slot = hash & hashinfo->ehash_mask;
>>>          struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
>>> +       bool try_lock = true;
>>>
>>>   begin:
>>>          sk_nulls_for_each_rcu(sk, node, &head->chain) {
>>> @@ -528,6 +529,17 @@ struct sock *__inet_lookup_established(const struct net *net,
>>>           */
>>>          if (get_nulls_value(node) != slot)
>>>                  goto begin;
>>> +
>>> +       if (try_lock) {
>>> +               spinlock_t *lock = inet_ehash_lockp(hashinfo, hash);
>>> +
>>> +               try_lock = false;
>>> +               spin_lock(lock);
>>> +               /* Ensure ehash ops under spinlock complete. */
>>> +               spin_unlock(lock);
>>> +               goto begin;
>>> +       }
>>> +
>>>   out:
>>>          sk = NULL;
>>>   found:
>>> diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
>>> index 875ff923a8ed..a91e02e19c53 100644
>>> --- a/net/ipv4/inet_timewait_sock.c
>>> +++ b/net/ipv4/inet_timewait_sock.c
>>> @@ -139,14 +139,10 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
>>>
>>>          spin_lock(lock);
>>>
>>> -       /* Step 2: Hash TW into tcp ehash chain */
>>> -       inet_twsk_add_node_rcu(tw, &ehead->chain);
>> You are adding a new RST scenario where the corresponding
>> socket is not found and a listener or no socket is found.
>>
>> The try_lock part is not guaranteed to happen after twsk
>> insertion below.

I'm sorry, I didn't get it. If try_lock can search again, other places 
should have left the spinlock critical section. That means twsk should 
have been inserted already. Or maybe I missed some details. Could you 
please explain more clearly? Your guidance would be really helpful. :)

> Oh no, spin_lock() dance sychronises the threads but I still
> think this is rather harmful for normal cases; now sending
> an unmatched packet can trigger lock dance, which is easily
> abused for DDoS.
>
I agree this scenario is possible. It does make DDoS attacks more 
impactful. Thanks for pointing that out. However, this is the best 
approach I can think of right now. Thanks Xuanqiang

>>
>>> -
>>> -       /* Step 3: Remove SK from hash chain */
>>> +       /* Step 2: Remove SK from hash chain */
>>>          if (__sk_nulls_del_node_init_rcu(sk))
>>>                  sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>>>
>>> -
>>>          /* Ensure above writes are committed into memory before updating the
>>>           * refcount.
>>>           * Provides ordering vs later refcount_inc().
>>> @@ -161,6 +157,9 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
>>>           */
>>>          refcount_set(&tw->tw_refcnt, 3);
>>>
>>> +       /* Step 3: Hash TW into tcp ehash chain */
>>> +       inet_twsk_add_node_rcu(tw, &ehead->chain);
>>> +
>>>          inet_twsk_schedule(tw, timeo);
>>>
>>>          spin_unlock(lock);
>>> --
>>> 2.25.1
>>>

  reply	other threads:[~2025-09-03  7:55 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-03  2:44 [PATCH net] inet: Avoid established lookup missing active sk Xuanqiang Luo
2025-09-03  5:16 ` Kuniyuki Iwashima
2025-09-03  5:48   ` Kuniyuki Iwashima
2025-09-03  7:54     ` luoxuanqiang [this message]
2025-09-03  6:40 ` Eric Dumazet
2025-09-03  6:52   ` Jason Xing
2025-09-03  8:03     ` luoxuanqiang
2025-09-03  8:35     ` Eric Dumazet
2025-09-03  9:05       ` Jason Xing
2025-09-03 11:51   ` luoxuanqiang
2025-09-03 21:53 ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ca7ad52-bfd6-4619-abef-93bf5be3969c@linux.dev \
    --to=xuanqiang.luo@linux.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernelxing@tencent.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=luoxuanqiang@kylinos.cn \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.