From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>
Cc: linux-nfs@vger.kernel.org, neil@brown.name, Dai.Ngo@oracle.com,
tom@talpey.com
Subject: Re: [RFC PATCH 1/1] nfsd: rework how a listener is removed
Date: Thu, 28 Aug 2025 14:16:16 -0400 [thread overview]
Message-ID: <adddae47-63de-4e93-a53a-352ad0e5c772@oracle.com> (raw)
In-Reply-To: <e56f9194f0e65e85d92da4129f636fe40b34e54c.camel@kernel.org>
On 8/28/25 1:29 PM, Jeff Layton wrote:
> On Thu, 2025-08-28 at 11:21 -0400, Chuck Lever wrote:
>> On 8/27/25 10:21 AM, Chuck Lever wrote:
>>> On 8/26/25 6:00 PM, Olga Kornievskaia wrote:
>>>> This patch tries to address the following failure:
>>>> nfsdctl threads 0
>>>> nfsdctl listener +rdma::20049
>>>> nfsdctl listener +tcp::2049
>>>> nfsdctl listener -tcp::2049
>>>> nfsdctl: Error: Cannot assign requested address
>>>>
>>>> The reason for the failure is due to the fact that socket cleanup only
>>>> happens in __svc_rdma_free() which is a deferred work triggers when an
>>>> rdma transport is destroyed. To remove a listener nfsdctl is forced to
>>>> first remove all transports via svc_xprt_destroy_all() and then re-add
>>>> the ones that are left. Due to the fact that there isn't a way to
>>>> delete a particular entry from server's lwq sp_xprts that stores
>>>> transports. Going back to the deferred work done in __svc_rdma_free(),
>>>> the work might not get to run before nfsd_nl_listener_set_doit() creates
>>>> the new transports. As a result, it finds that something is still
>>>> listening of the rdma port and rdma_bind_addr() fails.
>>>>
>>>> Instead of using svc_xprt_destroy_all() to manipulate the sp_xprt,
>>>> instead introduce a function that just dequeues all transports. Then,
>>>> we add non-removed transports back to the list.
>>>>
>>>> Still not allowing to remove a listener while the server is active.
>>>>
>>>> We need to make several passes over the list of existing/new list
>>>> entries. On the first pass we determined if any of the entries need
>>>> to be removed. If so, we then check if the server has no active
>>>> threads. Then we dequeue all the transports and then go over the
>>>> list and recreate both permsocks list and sp_xprts lists. Then,
>>>> for the deleted transports, the transport is closed.
>>>
>>>> --- Comments:
>>>> (1) There is still a restriction on removing an active listener as
>>>> I dont know how to handle if the transport to be remove is currently
>>>> serving a request (it won't be on the sp_xprt list I believe?).
>>>
>>> This is a good reason why just setting a bit in the xprt and waiting for
>>> the close to complete is probably a better strategy than draining and
>>> refilling the permsock list.
>>>
>>> The idea of setting XPT_CLOSE and enqueuing the transport ... you know,
>>> like this:
>>>
>>> 151 /**
>>>
>>> 152 * svc_xprt_deferred_close - Close a transport
>>>
>>> 153 * @xprt: transport instance
>>>
>>> 154 *
>>>
>>> 155 * Used in contexts that need to defer the work of shutting down
>>>
>>> 156 * the transport to an nfsd thread.
>>>
>>> 157 */
>>>
>>> 158 void svc_xprt_deferred_close(struct svc_xprt *xprt)
>>>
>>> 159 {
>>>
>>> 160 trace_svc_xprt_close(xprt);
>>>
>>> 161 if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
>>>
>>> 162 svc_xprt_enqueue(xprt);
>>>
>>> 163 }
>>>
>>> 164 EXPORT_SYMBOL_GPL(svc_xprt_deferred_close);
>>>
>>> I expect that eventually the xprt will show up to svc_handle_xprt() and
>>> get deleted there. But you might still need some serialization with
>>> ->xpo_accept ?
>>
>> It occurred to me why the deferred close mechanism doesn't work: it
>> relies on having an nfsd thread to pick up the deferred work.
>>
>> If listener removal requires all nfsd threads to be terminated, there
>> is no thread to pick up the xprt and close it.
>>
>
> Interesting. I guess that the old nfsdfs file just didn't allow you to
> get into this situation, since you couldn't remove a listener at all.
>
> It really sounds like we just need a more selective version of
> svc_clean_up_xprts(). Something that can dequeue everything, close the
> ones that need to be closed (synchronously) and then requeues the rest.
That is roughly what Olga has done here. Now that I understand the
problem space a little better, maybe we can iterate on this.
>>>> In general, I'm unsure if there are other things I'm not considering.
>>>> (2) I'm questioning if in svc_xprt_dequeue_all() it is correct. I
>>>> used svc_cleanup_up_xprts() as the example.
>>>>> Fixes: d093c90892607 ("nfsd: fix management of listener transports")
>>>> Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
>>>> ---
>>>> fs/nfsd/nfsctl.c | 123 +++++++++++++++++++-------------
>>>> include/linux/sunrpc/svc_xprt.h | 1 +
>>>> include/linux/sunrpc/svcsock.h | 1 -
>>>> net/sunrpc/svc_xprt.c | 12 ++++
>>>> 4 files changed, 88 insertions(+), 49 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>> index dd3267b4c203..38aaaef4734e 100644
>>>> --- a/fs/nfsd/nfsctl.c
>>>> +++ b/fs/nfsd/nfsctl.c
>>>> @@ -1902,44 +1902,17 @@ int nfsd_nl_version_get_doit(struct sk_buff *skb, struct genl_info *info)
>>>> return err;
>>>> }
>>>>
>>>> -/**
>>>> - * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> - * @skb: reply buffer
>>>> - * @info: netlink metadata and command arguments
>>>> - *
>>>> - * Return 0 on success or a negative errno.
>>>> - */
>>>> -int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +static void _nfsd_walk_listeners(struct genl_info *info, struct svc_serv *serv,
>>>> + struct list_head *permsocks, int modify_xprt)
>>>
>>> So this function looks for the one listener we need to remove.
>>>
>>> Should removing a listener also close down all active temporary sockets
>>> for the service, or should it kill only the ones that were established
>>> via the listener being removed, or should it leave all active temporary
>>> sockets in place?
>>>
>>> Perhaps this is why /all/ permanent and temporary sockets are currently
>>> being removed. Once the target listener is gone, clients can't
>>> re-establish new connections, and the service is effectively ready to
>>> be shut down cleanly.
>>>
>>>
>>>> {
>>>> struct net *net = genl_info_net(info);
>>>> - struct svc_xprt *xprt, *tmp;
>>>> const struct nlattr *attr;
>>>> - struct svc_serv *serv;
>>>> - LIST_HEAD(permsocks);
>>>> - struct nfsd_net *nn;
>>>> - bool delete = false;
>>>> - int err, rem;
>>>> -
>>>> - mutex_lock(&nfsd_mutex);
>>>> -
>>>> - err = nfsd_create_serv(net);
>>>> - if (err) {
>>>> - mutex_unlock(&nfsd_mutex);
>>>> - return err;
>>>> - }
>>>> -
>>>> - nn = net_generic(net, nfsd_net_id);
>>>> - serv = nn->nfsd_serv;
>>>> -
>>>> - spin_lock_bh(&serv->sv_lock);
>>>> + struct svc_xprt *xprt, *tmp;
>>>> + int rem;
>>>>
>>>> - /* Move all of the old listener sockets to a temp list */
>>>> - list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> + if (modify_xprt)
>>>> + svc_xprt_dequeue_all(serv);
>>>>
>>>> - /*
>>>> - * Walk the list of server_socks from userland and move any that match
>>>> - * back to sv_permsocks
>>>> - */
>>>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>> struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
>>>> const char *xcl_name;
>>>> @@ -1962,7 +1935,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
>>>>
>>>> /* Put back any matching sockets */
>>>> - list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
>>>> + list_for_each_entry_safe(xprt, tmp, permsocks, xpt_list) {
>>>> /* This shouldn't be possible */
>>>> if (WARN_ON_ONCE(xprt->xpt_net != net)) {
>>>> list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> @@ -1971,35 +1944,89 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>
>>>> /* If everything matches, put it back */
>>>> if (!strcmp(xprt->xpt_class->xcl_name, xcl_name) &&
>>>> - rpc_cmp_addr_port(sa, (struct sockaddr *)&xprt->xpt_local)) {
>>>> + rpc_cmp_addr_port(sa,
>>>> + (struct sockaddr *)&xprt->xpt_local)) {
>>>> list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> + if (modify_xprt)
>>>> + svc_xprt_enqueue(xprt);
>>>> break;
>>>> }
>>>> }
>>>> }
>>>> +}
>>>> +
>>>> +/**
>>>> + * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> + * @skb: reply buffer
>>>> + * @info: netlink metadata and command arguments
>>>> + *
>>>> + * Return 0 on success or a negative errno.
>>>> + */
>>>> +int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +{
>>>> + struct net *net = genl_info_net(info);
>>>> + struct svc_xprt *xprt;
>>>> + const struct nlattr *attr;
>>>> + struct svc_serv *serv;
>>>> + LIST_HEAD(permsocks);
>>>> + struct nfsd_net *nn;
>>>> + bool delete = false;
>>>> + int err, rem;
>>>> +
>>>> + mutex_lock(&nfsd_mutex);
>>>> +
>>>> + err = nfsd_create_serv(net);
>>>> + if (err) {
>>>> + mutex_unlock(&nfsd_mutex);
>>>> + return err;
>>>> + }
>>>> +
>>>> + nn = net_generic(net, nfsd_net_id);
>>>> + serv = nn->nfsd_serv;
>>>> +
>>>> + spin_lock_bh(&serv->sv_lock);
>>>> +
>>>> + /* Move all of the old listener sockets to a temp list */
>>>> + list_splice_init(&serv->sv_permsocks, &permsocks);
>>>>
>>>> /*
>>>> - * If there are listener transports remaining on the permsocks list,
>>>> - * it means we were asked to remove a listener.
>>>> + * Walk the list of server_socks from userland and move any that match
>>>> + * back to sv_permsocks. Determine if anything needs to be removed so
>>>> + * don't manipulate sp_xprts list.
>>>> */
>>>> - if (!list_empty(&permsocks)) {
>>>> - list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> - delete = true;
>>>> - }
>>>> - spin_unlock_bh(&serv->sv_lock);
>>>> + _nfsd_walk_listeners(info, serv, &permsocks, false);
>>>>
>>>> - /* Do not remove listeners while there are active threads. */
>>>> - if (serv->sv_nrthreads) {
>>>> + /* For now, no removing old sockets while server is running */
>>>> + if (serv->sv_nrthreads && !list_empty(&permsocks)) {
>>>> + list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> + spin_unlock_bh(&serv->sv_lock);
>>>> err = -EBUSY;
>>>> goto out_unlock_mtx;
>>>> }
>>>>
>>>> /*
>>>> - * Since we can't delete an arbitrary llist entry, destroy the
>>>> - * remaining listeners and recreate the list.
>>>> + * If there are listener transports remaining on the permsocks list,
>>>> + * it means we were asked to remove a listener. Walk the list again,
>>>> + * but this time also manage the sp_xprts but first removing all of
>>>> + * them and only adding back the ones not being deleted. Then close
>>>> + * the ones left on the list.
>>>> */
>>>> - if (delete)
>>>> - svc_xprt_destroy_all(serv, net, false);
>>>> + if (!list_empty(&permsocks)) {
>>>> + list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> + list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> + _nfsd_walk_listeners(info, serv, &permsocks, true);
>>>> + while (!list_empty(&permsocks)) {
>>>> + xprt = list_first_entry(&permsocks, struct svc_xprt, xpt_list);
>>>> + clear_bit(XPT_BUSY, &xprt->xpt_flags);
>>>> + set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>>> + spin_unlock_bh(&serv->sv_lock);
>>>> + svc_xprt_close(xprt);
>>>> + spin_lock_bh(&serv->sv_lock);
>>>> + }
>>>> + spin_unlock_bh(&serv->sv_lock);
>>>> + goto out_unlock_mtx;
>>>> + }
>>>> + spin_unlock_bh(&serv->sv_lock);
>>>>
>>>> /* walk list of addrs again, open any that still don't exist */
>>>> nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>>>> index da2a2531e110..7038fd8ef20a 100644
>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>> @@ -186,6 +186,7 @@ int svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
>>>> void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
>>>> void svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
>>>> void svc_xprt_deferred_close(struct svc_xprt *xprt);
>>>> +void svc_xprt_dequeue_all(struct svc_serv *serv);
>>>>
>>>> static inline void svc_xprt_get(struct svc_xprt *xprt)
>>>> {
>>>> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
>>>> index 963bbe251e52..4c1be01afdb7 100644
>>>> --- a/include/linux/sunrpc/svcsock.h
>>>> +++ b/include/linux/sunrpc/svcsock.h
>>>> @@ -65,7 +65,6 @@ int svc_addsock(struct svc_serv *serv, struct net *net,
>>>> const struct cred *cred);
>>>> void svc_init_xprt_sock(void);
>>>> void svc_cleanup_xprt_sock(void);
>>>> -
>>>> /*
>>>> * svc_makesock socket characteristics
>>>> */
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index 6973184ff667..2aa46b9468d4 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -890,6 +890,18 @@ void svc_recv(struct svc_rqst *rqstp)
>>>> }
>>>> EXPORT_SYMBOL_GPL(svc_recv);
>>>>
>>>> +void svc_xprt_dequeue_all(struct svc_serv *serv)
>>>> +{
>>>> + int i;
>>>> +
>>>> + for (i = 0; i < serv->sv_nrpools; i++) {
>>>> + struct svc_pool *pool = &serv->sv_pools[i];
>>>> +
>>>> + lwq_dequeue_all(&pool->sp_xprts);
>>>> + }
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(svc_xprt_dequeue_all);
>>>> +
>>>> /**
>>>> * svc_send - Return reply to client
>>>> * @rqstp: RPC transaction context
>>>
>>>
>>
>
--
Chuck Lever
prev parent reply other threads:[~2025-08-28 18:16 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-26 22:00 [RFC PATCH 1/1] nfsd: rework how a listener is removed Olga Kornievskaia
2025-08-27 14:21 ` Chuck Lever
2025-08-28 15:21 ` Chuck Lever
2025-08-28 17:29 ` Jeff Layton
2025-08-28 18:16 ` Chuck Lever [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=adddae47-63de-4e93-a53a-352ad0e5c772@oracle.com \
--to=chuck.lever@oracle.com \
--cc=Dai.Ngo@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox