All of lore.kernel.org
 help / color / mirror / Atom feed
From: luoxuanqiang <xuanqiang.luo@linux.dev>
To: Eric Dumazet <edumazet@google.com>
Cc: kuniyu@google.com, davem@davemloft.net, kuba@kernel.org,
	kernelxing@tencent.com, netdev@vger.kernel.org,
	Xuanqiang Luo <luoxuanqiang@kylinos.cn>
Subject: Re: [PATCH net] inet: Avoid established lookup missing active sk
Date: Wed, 3 Sep 2025 19:51:07 +0800	[thread overview]
Message-ID: <b6a40fe0-09da-4b84-aa8d-0dcebf15d822@linux.dev> (raw)
In-Reply-To: <CANn89i+XH95h4UANWpR-39LSRkvM3LL=_pRL0+6fp6dwTZxn_g@mail.gmail.com>


在 2025/9/3 14:40, Eric Dumazet 写道:
> On Tue, Sep 2, 2025 at 7:46 PM Xuanqiang Luo <xuanqiang.luo@linux.dev> wrote:
>> From: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
>>
>> Since the lookup of sk in ehash is lockless, when one CPU is performing a
>> lookup while another CPU is executing delete and insert operations
>> (deleting reqsk and inserting sk), the lookup CPU may miss either of
>> them, if sk cannot be found, an RST may be sent.
>>
>> The call trace map is drawn as follows:
>>     CPU 0                           CPU 1
>>     -----                           -----
>>                                  spin_lock()
>>                                  sk_nulls_del_node_init_rcu(osk)
>> __inet_lookup_established()
>>                                  __sk_nulls_add_node_rcu(sk, list)
>>                                  spin_unlock()
>>
>> We can try using spin_lock()/spin_unlock() to wait for ehash updates
>> (ensuring all deletions and insertions are completed) after a failed
>> lookup in ehash, then lookup sk again after the update. Since the sk
>> expected to be found is unlikely to encounter the aforementioned scenario
>> multiple times consecutively, we only need one update.
> No need for a lock really...
> - add the new node (with a temporary 'wrong' nulls value),
> - delete the old node
> - replace the nulls value by the expected one.
>
Hi Eric, May I ask if your main purpose is to add a temporary 'wrong' 
nulls to trigger a re-lookup in the lookup, until the real new sk is 
successfully replaced, like the following code (rough code)?

Thanks
Xuanqiang

--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -526,6 +526,8 @@ struct sock *__inet_lookup_established(const struct net *net,
       * not the expected one, we must restart lookup.
       * We probably met an item that was moved to another chain.
       */
+
+    /* Here, temporary NULL values cause a re-lookup. */
      if (get_nulls_value(node) != slot)
          goto begin;
  out:
@@ -684,7 +686,34 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
  
      spin_lock(lock);
      if (osk) {
+        struct hlist_nulls_node *i, *last = NULL, *n = &sk->sk_nulls_node;
          WARN_ON_ONCE(sk->sk_hash != osk->sk_hash);
+
+        /* find the last node in the list */
+        for (i = list->first; !is_a_nulls(i); i = i->next)
+            last = i;
+
+        if (last) {
+            /* save the original nulls values  */
+            i = last->next;
+
+            /* 1. add the temporary 'wrong' nulls value */
+            rcu_assign_pointer(hlist_nulls_next_rcu(last),
+                       (struct hlist_nulls_node *)NULLS_MARKER(NULL));
+
+            /* 2. delete the old node */
+            ret = sk_nulls_del_node_init_rcu(osk);
+
+            /* 3. add the new node */
+            if (ret) {
+                WRITE_ONCE(n->next, i);
+                n->pprev = &last->next;
+                rcu_assign_pointer(hlist_nulls_next_rcu(last), n);
+            } else {
+                rcu_assign_pointer(hlist_nulls_next_rcu(last), i);
+            }
+            goto unlock;
+        }
          ret = sk_nulls_del_node_init_rcu(osk);
      } else if (found_dup_sk) {
          *found_dup_sk = inet_ehash_lookup_by_sk(sk, list);
@@ -695,6 +724,7 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk)
      if (ret)
          __sk_nulls_add_node_rcu(sk, list);
  
+unlock:
      spin_unlock(lock);
  
      return ret;


  parent reply	other threads:[~2025-09-03 11:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-03  2:44 [PATCH net] inet: Avoid established lookup missing active sk Xuanqiang Luo
2025-09-03  5:16 ` Kuniyuki Iwashima
2025-09-03  5:48   ` Kuniyuki Iwashima
2025-09-03  7:54     ` luoxuanqiang
2025-09-03  6:40 ` Eric Dumazet
2025-09-03  6:52   ` Jason Xing
2025-09-03  8:03     ` luoxuanqiang
2025-09-03  8:35     ` Eric Dumazet
2025-09-03  9:05       ` Jason Xing
2025-09-03 11:51   ` luoxuanqiang [this message]
2025-09-03 21:53 ` [syzbot ci] " syzbot ci

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b6a40fe0-09da-4b84-aa8d-0dcebf15d822@linux.dev \
    --to=xuanqiang.luo@linux.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernelxing@tencent.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=luoxuanqiang@kylinos.cn \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.