From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/3] llist: add ability to remove a particular entry from the list
Date: Thu, 16 Jan 2025 11:00:42 -0500 [thread overview]
Message-ID: <b4b38fc9-3a0a-4324-b7c9-5e080ef492a6@oracle.com> (raw)
In-Reply-To: <9757da07ce21d1c1275c637ae49cbe69a9c83a71.camel@kernel.org>
On 1/16/25 10:42 AM, Jeff Layton wrote:
> On Thu, 2025-01-16 at 10:33 -0500, Chuck Lever wrote:
>> On 1/16/25 9:54 AM, Olga Kornievskaia wrote:
>>> On Thu, Jan 16, 2025 at 9:27 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>>>>
>>>> On 1/15/25 6:24 PM, Olga Kornievskaia wrote:
>>>>> nfsd stores its network transports in a lwq (which is a lockless list)
>>>>> llist has no ability to remove a particular entry which nfsd needs
>>>>> to remove a listener thread.
>>>>
>>>> Note that scripts/get_maintainer.pl says that the maintainer of this
>>>> file is:
>>>>
>>>> linux-kernel@vger.kernel.org
>>>>
>>>> so that address should appear on the cc: list of this series. So
>>>> should the list of reviewers for NFSD that appear in MAINTAINERS.
>>>
>>> I will resend and include the mentioned list.
>>>
>>>> ISTR Neil found the same gap in the llist API. I don't think it's
>>>> possible to safely remove an item from the middle of an llist. Much
>>>> of the complexity of the current svc thread scheduler is due to this
>>>> complication.
>>>>
>>>> I think you will need to mark the listener as dead and find some
>>>> way of safely dealing with it later.
>>>
>>> A listener can only be removed if there are no active threads. This
>>> code in nfsd_nl_listener_set_doit() won't allow it which happens
>>> before we are manipulating the listener:
>>> /* For now, no removing old sockets while server is running */
>>> if (serv->sv_nrthreads && !list_empty(&permsocks)) {
>>> list_splice_init(&permsocks, &serv->sv_permsocks);
>>> spin_unlock_bh(&serv->sv_lock);
>>> err = -EBUSY;
>>> goto out_unlock_mtx;
>>> }
>>
>> Got it.
>>
>> I'm splitting hairs, but this seems like a special case that might be
>> true only for NFSD and could be abused by other API consumers.
>>
>> At the least, the opening block comment in llist.h should be updated
>> to include del_entry in the locking table.
>>
>> I would be more comfortable with a solution that works in alignment with
>> the current llist API, but if others are fine with this solution, then I
>> won't object strenuously.
>>
>> (And to be clear, I agree that there is a bug in set_doit() that needs
>> to be addressed quickly -- it's the specific fix that I'm queasy about).
>>
>
> Yeah, it would be good to address it quickly since you can crash the
> box with this right now.
I'm asking myself why this isn't a problem during server shutdown or
when removing just one listener -- with rpc.nfsd. Have we never done
this before we had netlink?
> I'm not thrilled with adding the llist_del_entry() footgun either, but
> it should work.
I would like to see one or two alternatives before sticking with this
llist_del_entry() idea.
> Another approach we could consider instead would be to delay queueing
> all of these sockets to the lwq until after the sv_permsocks list is
> settled. You could even just queue up the whole sv_permsocks list at
> the end of this. If there's no work there, then there's no real harm.
> That is a bit more surgery, however, since you'd have to lift
> svc_xprt_received() handling out of svc_xprt_create_from_sa().
Handling the list once instead of adding and/or removing one at a time
seems like a better plan to me (IIUC).
Also, nit: the use of the term "sockets" throughout this code is
confusing. We're dealing with RDMA endpoints as well in here. We can't
easily rename the structure fields, true, but the comments could be more
helpful.
>>>>> Suggested-by: Jeff Layton <jlayton@kernel.org>
>>>>> Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
>>>>> ---
>>>>> include/linux/llist.h | 36 ++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 36 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/llist.h b/include/linux/llist.h
>>>>> index 2c982ff7475a..fe6be21897d9 100644
>>>>> --- a/include/linux/llist.h
>>>>> +++ b/include/linux/llist.h
>>>>> @@ -253,6 +253,42 @@ static inline bool __llist_add(struct llist_node *new, struct llist_head *head)
>>>>> return __llist_add_batch(new, new, head);
>>>>> }
>>>>>
>>>>> +/**
>>>>> + * llist_del_entry - remove a particular entry from a lock-less list
>>>>> + * @head: head of the list to remove the entry from
>>>>> + * @entry: entry to be removed from the list
>>>>> + *
>>>>> + * Walk the list, find the given entry and remove it from the list.
>>
>> The above sentence repeats the comments in the code and the code itself.
>> It visually obscures the next, much more important, sentence.
>>
>>
>>>>> + * The caller must ensure that nothing can race in and change the
>>>>> + * list while this is running.
>>>>> + *
>>>>> + * Returns true if the entry was found and removed.
>>>>> + */
>>>>> +static inline bool llist_del_entry(struct llist_head *head, struct llist_node *entry)
>>>>> +{
>>>>> + struct llist_node *pos;
>>>>> +
>>>>> + if (!head->first)
>>>>> + return false;
>>
>> if (llist_empty()) ?
>>
>>
>>>>> +
>>>>> + /* Is it the first entry? */
>>>>> + if (head->first == entry) {
>>>>> + head->first = entry->next;
>>>>> + entry->next = entry;
>>>>> + return true;
>>
>> llist_del_first() or even llist_del_first_this()
>>
>> Basically I would avoid open-coding this logic and use the existing
>> helpers where possible. The new code doesn't provide memory release
>> semantics that would ensure the next access of the llist sees these
>> updates.
>>
>>
>>>>> + }
>>>>> +
>>>>> + /* Find it in the list */
>>>>> + llist_for_each(head->first, pos) {
>>>>> + if (pos->next == entry) {
>>>>> + pos->next = entry->next;
>>>>> + entry->next = entry;
>>>>> + return true;
>>>>> + }
>>>>> + }
>>>>> + return false;
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * llist_del_all - delete all entries from lock-less list
>>>>> * @head: the head of lock-less list to delete all entries
>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>>
>>
>>
>
--
Chuck Lever
next prev parent reply other threads:[~2025-01-16 16:00 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-15 23:24 [PATCH 0/3] fix removal of nfsd listeners Olga Kornievskaia
2025-01-15 23:24 ` [PATCH 1/3] llist: add ability to remove a particular entry from the list Olga Kornievskaia
2025-01-16 14:26 ` Chuck Lever
2025-01-16 14:54 ` Olga Kornievskaia
2025-01-16 15:33 ` Chuck Lever
2025-01-16 15:42 ` Jeff Layton
2025-01-16 16:00 ` Chuck Lever [this message]
2025-01-16 16:31 ` Olga Kornievskaia
2025-01-16 16:44 ` Chuck Lever
2025-01-17 4:08 ` kernel test robot
2025-01-17 4:08 ` kernel test robot
2025-01-24 0:57 ` kernel test robot
2025-01-15 23:24 ` [PATCH 2/3] SUNRPC: add ability to remove specific server transport Olga Kornievskaia
2025-01-15 23:24 ` [PATCH 3/3] nfsd: fix management of listener transports Olga Kornievskaia
2025-01-16 14:27 ` Chuck Lever
2025-01-16 14:46 ` Jeff Layton
2025-01-16 14:55 ` Chuck Lever
2025-01-16 15:34 ` Olga Kornievskaia
2025-01-16 15:42 ` Chuck Lever
2025-01-16 11:53 ` [PATCH 0/3] fix removal of nfsd listeners Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b4b38fc9-3a0a-4324-b7c9-5e080ef492a6@oracle.com \
--to=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=okorniev@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox