From: luoxuanqiang <xuanqiang.luo@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: edumazet@google.com, kerneljasonxing@gmail.com,
	davem@davemloft.net, kuba@kernel.org, netdev@vger.kernel.org,
	Xuanqiang Luo <luoxuanqiang@kylinos.cn>
Subject: Re: [PATCH net-next v3 1/3] rculist: Add __hlist_nulls_replace_rcu() and hlist_nulls_replace_init_rcu()
Date: Thu, 18 Sep 2025 14:09:11 +0800
Message-ID: <7ece5d34-aa1c-4251-9650-756de3b3dc18@linux.dev>
In-Reply-To: <CAAVpQUCoCizxTm6wRs0+n6_kPK+kgxwszsYKNds3YvuBfBvrhg@mail.gmail.com>


On 2025/9/17 12:27, Kuniyuki Iwashima wrote:
> On Tue, Sep 16, 2025 at 8:27 PM luoxuanqiang <xuanqiang.luo@linux.dev> wrote:
>>
>> On 2025/9/17 02:58, Kuniyuki Iwashima wrote:
>>> On Tue, Sep 16, 2025 at 3:31 AM <xuanqiang.luo@linux.dev> wrote:
>>>> From: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
>>>>
>>>> Add two functions to atomically replace RCU-protected hlist_nulls entries.
>>>>
>>>> Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn>
>>>> ---
>>>>    include/linux/rculist_nulls.h | 61 +++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 61 insertions(+)
>>>>
>>>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>>>> index 89186c499dd4..8ed604f65a3e 100644
>>>> --- a/include/linux/rculist_nulls.h
>>>> +++ b/include/linux/rculist_nulls.h
>>>> @@ -152,6 +152,67 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
>>>>           n->next = (struct hlist_nulls_node *)NULLS_MARKER(NULL);
>>>>    }
>>>>
>>>> +/**
>>>> + * __hlist_nulls_replace_rcu - replace an old entry by a new one
>>>> + * @old: the element to be replaced
>>>> + * @new: the new element to insert
>>>> + *
>>>> + * Description:
>>>> + * Replace the old entry with the new one in a RCU-protected hlist_nulls, while
>>>> + * permitting racing traversals.
>>>> + *
>>>> + * The caller must take whatever precautions are necessary (such as holding
>>>> + * appropriate locks) to avoid racing with another list-mutation primitive, such
>>>> + * as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same
>>>> + * list.  However, it is perfectly legal to run concurrently with the _rcu
>>>> + * list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
>>>> + */
>>>> +static inline void __hlist_nulls_replace_rcu(struct hlist_nulls_node *old,
>>>> +                                            struct hlist_nulls_node *new)
>>>> +{
>>>> +       struct hlist_nulls_node *next = old->next;
>>>> +
>>>> +       new->next = next;
>> Do we need to use WRITE_ONCE() here, as mentioned in efd04f8a8b45
>> ("rcu: Use WRITE_ONCE() for assignments to ->next for rculist_nulls")?
>> I am more inclined to think that it is necessary.
> Good point, then WRITE_ONCE() makes sense.
>
>>>> +       WRITE_ONCE(new->pprev, old->pprev);
>>> As you don't use WRITE_ONCE() for ->next, the new node must
>>> not be published yet, so WRITE_ONCE() is unnecessary for ->pprev
>>> too.
>> I noticed that point. My understanding is that using WRITE_ONCE()
>> for new->pprev follows the approach in hlist_replace_rcu() to
>> match the READ_ONCE() in hlist_nulls_unhashed_lockless() and
>> hlist_unhashed_lockless().
> Using WRITE_ONCE() or READ_ONCE() implies lockless readers
> or writers elsewhere.
>
> sk_hashed() does not use the lockless version, and I think it's
> always called under lock_sock() or bh_.  Perhaps run kernel
> w/ KCSAN and see if it complains.
>
> [ It seems hlist_nulls_unhashed_lockless is not used at all and
>    hlist_unhashed_lockless() is only used by bpf and timer code. ]
>
> That said, it might be fair to use WRITE_ONCE() here to make
> future users less error-prone.
>
>
>>>> +       rcu_assign_pointer(*(struct hlist_nulls_node __rcu **)new->pprev, new);
>>>> +       if (!is_a_nulls(next))
>>>> +               WRITE_ONCE(new->next->pprev, &new->next);
>>>> +}
>>>> +
>>>> +/**
>>>> + * hlist_nulls_replace_init_rcu - replace an old entry by a new one and
>>>> + * initialize the old
>>>> + * @old: the element to be replaced
>>>> + * @new: the new element to insert
>>>> + *
>>>> + * Description:
>>>> + * Replace the old entry with the new one in a RCU-protected hlist_nulls, while
>>>> + * permitting racing traversals, and reinitialize the old entry.
>>>> + *
>>>> + * Return: true if the old entry was hashed and was replaced successfully, false
>>>> + * otherwise.
>>>> + *
>>>> + * Note: hlist_nulls_unhashed() on the old node returns true after this.
>>>> + * It is useful for RCU based read lockfree traversal if the writer side must
>>>> + * know if the list entry is still hashed or already unhashed.
>>>> + *
>>>> + * The caller must take whatever precautions are necessary (such as holding
>>>> + * appropriate locks) to avoid racing with another list-mutation primitive, such
>>>> + * as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same
>>>> + * list. However, it is perfectly legal to run concurrently with the _rcu
>>>> + * list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
>>>> + */
>>>> +static inline bool hlist_nulls_replace_init_rcu(struct hlist_nulls_node *old,
>>>> +                                               struct hlist_nulls_node *new)
>>>> +{
>>>> +       if (!hlist_nulls_unhashed(old)) {
>>> As mentioned in v1, this check is redundant.
>> Apologies for bringing this up again. My understanding is that
>> replacing a node requires checking if the old node is unhashed.
> Only if the caller does not check it.
>
> __sk_nulls_replace_node_init_rcu() has already checked
> sk_hashed(old), which is !hlist_nulls_unhashed(old), no ?
>
> __sk_nulls_replace_node_init_rcu(struct sock *old, ...)
>    if (sk_hashed(old))
>      hlist_nulls_replace_init_rcu(&old->sk_nulls_node, ...)
>        if (!hlist_nulls_unhashed(old))
>
I understand that sk_hashed(old) is equivalent to
!hlist_nulls_unhashed(old). However,
hlist_nulls_replace_init_rcu() is also used in
inet_twsk_hashdance_schedule().

If it's confirmed that the unhashed check is
unnecessary in inet_twsk_hashdance_schedule()
(as discussed in https://lore.kernel.org/all/CAAVpQUBY=h3gDfaX=J9vbSuhYTn8cfCsBGhPLqoer0OSYdihDg@mail.gmail.com/),
then for this specific patchset, this redundant check
can indeed be removed.

But I'm concerned that others might later call
hlist_nulls_replace_init_rcu() standalone, the same way
hlist_nulls_del_init_rcu() is used. That could cause
confusion, since the replacement does not always
succeed (for example, when old is already unhashed).
Given this, wouldn't it be safer to keep the
hlist_nulls_unhashed(old) check?

I really appreciate your patient review and suggestions!

Thanks
Xuanqiang.

>> If so, we need a return value to inform the caller that the
>> replace operation would fail.
>>
>>>> +               __hlist_nulls_replace_rcu(old, new);
>>>> +               WRITE_ONCE(old->pprev, NULL);
>>>> +               return true;
>>>> +       }
>>>> +       return false;
>>>> +}
>>>> +
>>>>    /**
>>>>     * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type
>>>>     * @tpos:      the type * to use as a loop cursor.
>>>> --
>>>> 2.25.1
>>>>


Thread overview: 15+ messages
2025-09-16 10:30 [PATCH net-next v3 0/3] net: Avoid ehash lookup races xuanqiang.luo
2025-09-16 10:30 ` [PATCH net-next v3 1/3] rculist: Add __hlist_nulls_replace_rcu() and hlist_nulls_replace_init_rcu() xuanqiang.luo
2025-09-16 18:58   ` Kuniyuki Iwashima
2025-09-17  3:26     ` luoxuanqiang
2025-09-17  4:27       ` Kuniyuki Iwashima
2025-09-17  4:43         ` Kuniyuki Iwashima
2025-09-18  6:09           ` luoxuanqiang
2025-09-18  6:09         ` luoxuanqiang [this message]
2025-09-16 10:30 ` [PATCH net-next v3 2/3] inet: Avoid ehash lookup race in inet_ehash_insert() xuanqiang.luo
2025-09-16 10:30 ` [PATCH net-next v3 3/3] inet: Avoid ehash lookup race in inet_twsk_hashdance_schedule() xuanqiang.luo
2025-09-16 19:48   ` Kuniyuki Iwashima
2025-09-17  3:26     ` luoxuanqiang
2025-09-17  4:36       ` Kuniyuki Iwashima
2025-09-18  8:32         ` luoxuanqiang
2025-09-19  8:38           ` Kuniyuki Iwashima
