Linux NFS development
From: Chuck Lever <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>
Cc: linux-nfs@vger.kernel.org, neil@brown.name, Dai.Ngo@oracle.com,
	tom@talpey.com
Subject: Re: [RFC PATCH 1/1] nfsd: rework how a listener is removed
Date: Thu, 28 Aug 2025 14:16:16 -0400	[thread overview]
Message-ID: <adddae47-63de-4e93-a53a-352ad0e5c772@oracle.com> (raw)
In-Reply-To: <e56f9194f0e65e85d92da4129f636fe40b34e54c.camel@kernel.org>

On 8/28/25 1:29 PM, Jeff Layton wrote:
> On Thu, 2025-08-28 at 11:21 -0400, Chuck Lever wrote:
>> On 8/27/25 10:21 AM, Chuck Lever wrote:
>>> On 8/26/25 6:00 PM, Olga Kornievskaia wrote:
>>>> This patch tries to address the following failure:
>>>> nfsdctl threads 0
>>>> nfsdctl listener +rdma::20049
>>>> nfsdctl listener +tcp::2049
>>>> nfsdctl listener -tcp::2049
>>>> nfsdctl: Error: Cannot assign requested address
>>>>
>>>> The failure occurs because socket cleanup only happens in
>>>> __svc_rdma_free(), a deferred work item that is triggered when an
>>>> rdma transport is destroyed. Because there is no way to delete a
>>>> particular entry from the server's lwq sp_xprts that stores the
>>>> transports, removing a listener forces nfsd to first remove all
>>>> transports via svc_xprt_destroy_all() and then re-add the ones that
>>>> should remain. The deferred work done in __svc_rdma_free() might not
>>>> get to run before nfsd_nl_listener_set_doit() creates the new
>>>> transports. As a result, rdma_bind_addr() finds that something is
>>>> still listening on the rdma port and fails.
>>>>
>>>> Instead of using svc_xprt_destroy_all() to manipulate sp_xprts,
>>>> introduce a function that simply dequeues all transports, then add
>>>> the non-removed transports back to the list.
>>>>
>>>> Removing a listener while the server is active is still not allowed.
>>>>
>>>> We need to make several passes over the list of existing/new list
>>>> entries. On the first pass we determine whether any of the entries
>>>> need to be removed. If so, we check that the server has no active
>>>> threads, dequeue all the transports, and walk the list again to
>>>> recreate both the permsocks and sp_xprts lists. Finally, the deleted
>>>> transports are closed.
>>>
>>>> --- Comments:
>>>> (1) There is still a restriction on removing an active listener, as
>>>> I don't know how to handle the case where the transport to be removed
>>>> is currently serving a request (it won't be on the sp_xprts list, I
>>>> believe?).
>>>
>>> This is a good reason why just setting a bit in the xprt and waiting for
>>> the close to complete is probably a better strategy than draining and
>>> refilling the permsock list.
>>>
>>> The idea of setting XPT_CLOSE and enqueuing the transport ... you know,
>>> like this:
>>>
>>> /**
>>>  * svc_xprt_deferred_close - Close a transport
>>>  * @xprt: transport instance
>>>  *
>>>  * Used in contexts that need to defer the work of shutting down
>>>  * the transport to an nfsd thread.
>>>  */
>>> void svc_xprt_deferred_close(struct svc_xprt *xprt)
>>> {
>>>         trace_svc_xprt_close(xprt);
>>>         if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
>>>                 svc_xprt_enqueue(xprt);
>>> }
>>> EXPORT_SYMBOL_GPL(svc_xprt_deferred_close);
>>> I expect that eventually the xprt will show up to svc_handle_xprt() and
>>> get deleted there. But you might still need some serialization with
>>>   ->xpo_accept ?
>>
>> It occurred to me why the deferred close mechanism doesn't work: it
>> relies on having an nfsd thread to pick up the deferred work.
>>
>> If listener removal requires all nfsd threads to be terminated, there
>> is no thread to pick up the xprt and close it.
>>
> 
> Interesting. I guess that the old nfsdfs file just didn't allow you to
> get into this situation, since you couldn't remove a listener at all.
> 
> It really sounds like we just need a more selective version of
> svc_clean_up_xprts(): something that can dequeue everything, close the
> ones that need to be closed (synchronously), and then requeue the rest.

That is roughly what Olga has done here. Now that I understand the
problem space a little better, maybe we can iterate on this.


>>>> In general, I'm unsure if there are other things I'm not considering.
>>>> (2) I'm questioning whether svc_xprt_dequeue_all() is correct. I
>>>> used svc_clean_up_xprts() as the example.
>>>> Fixes: d093c90892607 ("nfsd: fix management of listener transports")
>>>> Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
>>>> ---
>>>>  fs/nfsd/nfsctl.c                | 123 +++++++++++++++++++-------------
>>>>  include/linux/sunrpc/svc_xprt.h |   1 +
>>>>  include/linux/sunrpc/svcsock.h  |   1 -
>>>>  net/sunrpc/svc_xprt.c           |  12 ++++
>>>>  4 files changed, 88 insertions(+), 49 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>> index dd3267b4c203..38aaaef4734e 100644
>>>> --- a/fs/nfsd/nfsctl.c
>>>> +++ b/fs/nfsd/nfsctl.c
>>>> @@ -1902,44 +1902,17 @@ int nfsd_nl_version_get_doit(struct sk_buff *skb, struct genl_info *info)
>>>>  	return err;
>>>>  }
>>>>  
>>>> -/**
>>>> - * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> - * @skb: reply buffer
>>>> - * @info: netlink metadata and command arguments
>>>> - *
>>>> - * Return 0 on success or a negative errno.
>>>> - */
>>>> -int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +static void _nfsd_walk_listeners(struct genl_info *info, struct svc_serv *serv,
>>>> +				 struct list_head *permsocks, int modify_xprt)
>>>
>>> So this function looks for the one listener we need to remove.
>>>
>>> Should removing a listener also close down all active temporary sockets
>>> for the service, or should it kill only the ones that were established
>>> via the listener being removed, or should it leave all active temporary
>>> sockets in place?
>>>
>>> Perhaps this is why /all/ permanent and temporary sockets are currently
>>> being removed. Once the target listener is gone, clients can't
>>> re-establish new connections, and the service is effectively ready to
>>> be shut down cleanly.
>>>
>>>
>>>>  {
>>>>  	struct net *net = genl_info_net(info);
>>>> -	struct svc_xprt *xprt, *tmp;
>>>>  	const struct nlattr *attr;
>>>> -	struct svc_serv *serv;
>>>> -	LIST_HEAD(permsocks);
>>>> -	struct nfsd_net *nn;
>>>> -	bool delete = false;
>>>> -	int err, rem;
>>>> -
>>>> -	mutex_lock(&nfsd_mutex);
>>>> -
>>>> -	err = nfsd_create_serv(net);
>>>> -	if (err) {
>>>> -		mutex_unlock(&nfsd_mutex);
>>>> -		return err;
>>>> -	}
>>>> -
>>>> -	nn = net_generic(net, nfsd_net_id);
>>>> -	serv = nn->nfsd_serv;
>>>> -
>>>> -	spin_lock_bh(&serv->sv_lock);
>>>> +	struct svc_xprt *xprt, *tmp;
>>>> +	int rem;
>>>>  
>>>> -	/* Move all of the old listener sockets to a temp list */
>>>> -	list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> +	if (modify_xprt)
>>>> +		svc_xprt_dequeue_all(serv);
>>>>  
>>>> -	/*
>>>> -	 * Walk the list of server_socks from userland and move any that match
>>>> -	 * back to sv_permsocks
>>>> -	 */
>>>>  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>>  		struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
>>>>  		const char *xcl_name;
>>>> @@ -1962,7 +1935,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>  		sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
>>>>  
>>>>  		/* Put back any matching sockets */
>>>> -		list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
>>>> +		list_for_each_entry_safe(xprt, tmp, permsocks, xpt_list) {
>>>>  			/* This shouldn't be possible */
>>>>  			if (WARN_ON_ONCE(xprt->xpt_net != net)) {
>>>>  				list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> @@ -1971,35 +1944,89 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>  
>>>>  			/* If everything matches, put it back */
>>>>  			if (!strcmp(xprt->xpt_class->xcl_name, xcl_name) &&
>>>> -			    rpc_cmp_addr_port(sa, (struct sockaddr *)&xprt->xpt_local)) {
>>>> +			    rpc_cmp_addr_port(sa,
>>>> +				    (struct sockaddr *)&xprt->xpt_local)) {
>>>>  				list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> +				if (modify_xprt)
>>>> +					svc_xprt_enqueue(xprt);
>>>>  				break;
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +}
>>>> +
>>>> +/**
>>>> + * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> + * @skb: reply buffer
>>>> + * @info: netlink metadata and command arguments
>>>> + *
>>>> + * Return 0 on success or a negative errno.
>>>> + */
>>>> +int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +{
>>>> +	struct net *net = genl_info_net(info);
>>>> +	struct svc_xprt *xprt;
>>>> +	const struct nlattr *attr;
>>>> +	struct svc_serv *serv;
>>>> +	LIST_HEAD(permsocks);
>>>> +	struct nfsd_net *nn;
>>>> +	bool delete = false;
>>>> +	int err, rem;
>>>> +
>>>> +	mutex_lock(&nfsd_mutex);
>>>> +
>>>> +	err = nfsd_create_serv(net);
>>>> +	if (err) {
>>>> +		mutex_unlock(&nfsd_mutex);
>>>> +		return err;
>>>> +	}
>>>> +
>>>> +	nn = net_generic(net, nfsd_net_id);
>>>> +	serv = nn->nfsd_serv;
>>>> +
>>>> +	spin_lock_bh(&serv->sv_lock);
>>>> +
>>>> +	/* Move all of the old listener sockets to a temp list */
>>>> +	list_splice_init(&serv->sv_permsocks, &permsocks);
>>>>  
>>>>  	/*
>>>> -	 * If there are listener transports remaining on the permsocks list,
>>>> -	 * it means we were asked to remove a listener.
>>>> +	 * Walk the list of server_socks from userland and move any that match
>>>> +	 * back to sv_permsocks. Determine if anything needs to be removed so
>>>> +	 * don't manipulate sp_xprts list.
>>>>  	 */
>>>> -	if (!list_empty(&permsocks)) {
>>>> -		list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> -		delete = true;
>>>> -	}
>>>> -	spin_unlock_bh(&serv->sv_lock);
>>>> +	_nfsd_walk_listeners(info, serv, &permsocks, false);
>>>>  
>>>> -	/* Do not remove listeners while there are active threads. */
>>>> -	if (serv->sv_nrthreads) {
>>>> +	/* For now, no removing old sockets while server is running */
>>>> +	if (serv->sv_nrthreads && !list_empty(&permsocks)) {
>>>> +		list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> +		spin_unlock_bh(&serv->sv_lock);
>>>>  		err = -EBUSY;
>>>>  		goto out_unlock_mtx;
>>>>  	}
>>>>  
>>>>  	/*
>>>> -	 * Since we can't delete an arbitrary llist entry, destroy the
>>>> -	 * remaining listeners and recreate the list.
>>>> +	 * If there are listener transports remaining on the permsocks list,
>>>> +	 * it means we were asked to remove a listener. Walk the list again,
>>>> +	 * but this time also manage the sp_xprts but first removing all of
>>>> +	 * them and only adding back the ones not being deleted. Then close
>>>> +	 * the ones left on the list.
>>>>  	 */
>>>> -	if (delete)
>>>> -		svc_xprt_destroy_all(serv, net, false);
>>>> +	if (!list_empty(&permsocks)) {
>>>> +		list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> +		list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> +		_nfsd_walk_listeners(info, serv, &permsocks, true);
>>>> +		while (!list_empty(&permsocks)) {
>>>> +			xprt = list_first_entry(&permsocks, struct svc_xprt, xpt_list);
>>>> +			clear_bit(XPT_BUSY, &xprt->xpt_flags);
>>>> +			set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>>> +			spin_unlock_bh(&serv->sv_lock);
>>>> +			svc_xprt_close(xprt);
>>>> +			spin_lock_bh(&serv->sv_lock);
>>>> +		}
>>>> +		spin_unlock_bh(&serv->sv_lock);
>>>> +		goto out_unlock_mtx;
>>>> +	}
>>>> +	spin_unlock_bh(&serv->sv_lock);
>>>>  
>>>>  	/* walk list of addrs again, open any that still don't exist */
>>>>  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>>>> index da2a2531e110..7038fd8ef20a 100644
>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>> @@ -186,6 +186,7 @@ int	svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
>>>>  void	svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
>>>>  void	svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
>>>>  void	svc_xprt_deferred_close(struct svc_xprt *xprt);
>>>> +void	svc_xprt_dequeue_all(struct svc_serv *serv);
>>>>  
>>>>  static inline void svc_xprt_get(struct svc_xprt *xprt)
>>>>  {
>>>> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
>>>> index 963bbe251e52..4c1be01afdb7 100644
>>>> --- a/include/linux/sunrpc/svcsock.h
>>>> +++ b/include/linux/sunrpc/svcsock.h
>>>> @@ -65,7 +65,6 @@ int		svc_addsock(struct svc_serv *serv, struct net *net,
>>>>  			    const struct cred *cred);
>>>>  void		svc_init_xprt_sock(void);
>>>>  void		svc_cleanup_xprt_sock(void);
>>>> -
>>>>  /*
>>>>   * svc_makesock socket characteristics
>>>>   */
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index 6973184ff667..2aa46b9468d4 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -890,6 +890,18 @@ void svc_recv(struct svc_rqst *rqstp)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(svc_recv);
>>>>  
>>>> +void svc_xprt_dequeue_all(struct svc_serv *serv)
>>>> +{
>>>> +	int i;
>>>> +
>>>> +	for (i = 0; i < serv->sv_nrpools; i++) {
>>>> +		struct svc_pool *pool = &serv->sv_pools[i];
>>>> +
>>>> +		lwq_dequeue_all(&pool->sp_xprts);
>>>> +	}
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(svc_xprt_dequeue_all);
>>>> +
>>>>  /**
>>>>   * svc_send - Return reply to client
>>>>   * @rqstp: RPC transaction context
>>>
>>>
>>
> 


-- 
Chuck Lever

Thread overview: 5+ messages
2025-08-26 22:00 [RFC PATCH 1/1] nfsd: rework how a listener is removed Olga Kornievskaia
2025-08-27 14:21 ` Chuck Lever
2025-08-28 15:21   ` Chuck Lever
2025-08-28 17:29     ` Jeff Layton
2025-08-28 18:16       ` Chuck Lever [this message]
