From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>,
Olga Kornievskaia <okorniev@redhat.com>
Cc: linux-nfs@vger.kernel.org, neil@brown.name, Dai.Ngo@oracle.com,
tom@talpey.com
Subject: Re: [RFC PATCH 1/1] nfsd: rework how a listener is removed
Date: Thu, 28 Aug 2025 13:29:38 -0400 [thread overview]
Message-ID: <e56f9194f0e65e85d92da4129f636fe40b34e54c.camel@kernel.org> (raw)
In-Reply-To: <a641de95-07d3-479d-be64-11d99e56e08b@oracle.com>
On Thu, 2025-08-28 at 11:21 -0400, Chuck Lever wrote:
> On 8/27/25 10:21 AM, Chuck Lever wrote:
> > On 8/26/25 6:00 PM, Olga Kornievskaia wrote:
> > > This patch tries to address the following failure:
> > > nfsdctl threads 0
> > > nfsdctl listener +rdma::20049
> > > nfsdctl listener +tcp::2049
> > > nfsdctl listener -tcp::2049
> > > nfsdctl: Error: Cannot assign requested address
> > >
> > > The failure occurs because socket cleanup only happens in
> > > __svc_rdma_free(), deferred work that is triggered when an rdma
> > > transport is destroyed. To remove a listener, nfsdctl is forced to
> > > first remove all transports via svc_xprt_destroy_all() and then
> > > re-add the ones that are left, because there is no way to delete a
> > > particular entry from the server's lwq sp_xprts that stores the
> > > transports. The deferred work done in __svc_rdma_free() might not
> > > get to run before nfsd_nl_listener_set_doit() creates the new
> > > transports. As a result, it finds that something is still listening
> > > on the rdma port and rdma_bind_addr() fails.
> > >
> > > Instead of using svc_xprt_destroy_all() to manipulate sp_xprts,
> > > introduce a function that just dequeues all transports. Then the
> > > non-removed transports are added back to the list.
> > >
> > > Removing a listener while the server is active is still not allowed.
> > >
> > > We need to make several passes over the lists of existing and new
> > > entries. On the first pass we determine whether any of the entries
> > > need to be removed. If so, we then check that the server has no
> > > active threads. We then dequeue all the transports, walk the list,
> > > and recreate both the permsocks and sp_xprts lists. Finally, the
> > > deleted transports are closed.
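The "first pass" described above boils down to a set-difference check between the listeners userland requested and the ones the server already has. A rough userspace sketch, purely for illustration (struct listener_spec, removal_requested(), and the fixed arrays are invented names, not the kernel code, which compares xcl_name plus the bound sockaddr instead):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Invented stand-in for a (transport class, port) listener identity. */
struct listener_spec {
	const char *xcl_name;	/* e.g. "tcp", "rdma" */
	int port;
};

static bool spec_equal(const struct listener_spec *a,
		       const struct listener_spec *b)
{
	return a->port == b->port && strcmp(a->xcl_name, b->xcl_name) == 0;
}

/*
 * Return true if any existing listener is absent from the requested set,
 * i.e. the caller was asked to remove something.
 */
static bool removal_requested(const struct listener_spec *existing, int n_exist,
			      const struct listener_spec *wanted, int n_want)
{
	for (int i = 0; i < n_exist; i++) {
		bool found = false;

		for (int j = 0; j < n_want; j++) {
			if (spec_equal(&existing[i], &wanted[j])) {
				found = true;
				break;
			}
		}
		if (!found)
			return true;
	}
	return false;
}
```

Only when this pass reports a removal does the expensive dequeue/requeue work need to happen at all.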
> >
> > > --- Comments:
> > > (1) There is still a restriction on removing an active listener, as
> > > I don't know how to handle the case where the transport to be removed
> > > is currently serving a request (it won't be on the sp_xprts list, I
> > > believe?).
> >
> > This is a good reason why just setting a bit in the xprt and waiting for
> > the close to complete is probably a better strategy than draining and
> > refilling the permsock list.
> >
> > The idea of setting XPT_CLOSE and enqueuing the transport ... you know,
> > like this:
> >
> > /**
> >  * svc_xprt_deferred_close - Close a transport
> >  * @xprt: transport instance
> >  *
> >  * Used in contexts that need to defer the work of shutting down
> >  * the transport to an nfsd thread.
> >  */
> > void svc_xprt_deferred_close(struct svc_xprt *xprt)
> > {
> > 	trace_svc_xprt_close(xprt);
> > 	if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
> > 		svc_xprt_enqueue(xprt);
> > }
> > EXPORT_SYMBOL_GPL(svc_xprt_deferred_close);
> >
> > I expect that eventually the xprt will show up to svc_handle_xprt() and
> > get deleted there. But you might still need some serialization with
> > ->xpo_accept ?
>
> It occurred to me why the deferred close mechanism doesn't work: it
> relies on having an nfsd thread to pick up the deferred work.
>
> If listener removal requires all nfsd threads to be terminated, there
> is no thread to pick up the xprt and close it.
>
Interesting. I guess that the old nfsdfs file just didn't allow you to
get into this situation, since you couldn't remove a listener at all.
It really sounds like we just need a more selective version of
svc_clean_up_xprts(): something that can dequeue everything, close the
ones that need to be closed (synchronously), and then requeue the rest.
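In userspace terms, that selective pass would look something like the
toy model below. This is only a sketch of the shape of the algorithm;
model_xprt, model_pool, and close_pending are invented names, and the
lwq semantics and svc_xprt locking/refcounting are deliberately elided:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct svc_xprt; close_pending plays the XPT_CLOSE role. */
struct model_xprt {
	const char *name;
	bool close_pending;
	struct model_xprt *next;
};

/* Toy stand-in for a pool's lwq sp_xprts. */
struct model_pool {
	struct model_xprt *head;
};

/* Detach the whole queue in one step, like lwq_dequeue_all() would. */
static struct model_xprt *model_dequeue_all(struct model_pool *pool)
{
	struct model_xprt *list = pool->head;

	pool->head = NULL;
	return list;
}

static void model_enqueue(struct model_pool *pool, struct model_xprt *x)
{
	x->next = pool->head;
	pool->head = x;
}

/*
 * Selective cleanup: dequeue everything, synchronously "close" the
 * transports flagged for removal, and requeue the rest. Returns the
 * number of transports closed.
 */
static int model_clean_up_selected(struct model_pool *pool)
{
	struct model_xprt *x = model_dequeue_all(pool);
	int closed = 0;

	while (x) {
		struct model_xprt *next = x->next;

		if (x->close_pending)
			closed++;	/* kernel: svc_xprt_close() here */
		else
			model_enqueue(pool, x);
		x = next;
	}
	return closed;
}
```

The interesting kernel-side questions this glosses over are exactly the
ones raised elsewhere in the thread: what to do when a to-be-removed
transport is off the queue servicing a request, and how to serialize
against ->xpo_accept.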
>
> > > In general, I'm unsure if there are other things I'm not considering.
> > > (2) I'm not sure whether svc_xprt_dequeue_all() is correct; I
> > > used svc_clean_up_xprts() as the example.
> > >
> > > Fixes: d093c90892607 ("nfsd: fix management of listener transports")
> > > Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
> > > ---
> > > fs/nfsd/nfsctl.c | 123 +++++++++++++++++++-------------
> > > include/linux/sunrpc/svc_xprt.h | 1 +
> > > include/linux/sunrpc/svcsock.h | 1 -
> > > net/sunrpc/svc_xprt.c | 12 ++++
> > > 4 files changed, 88 insertions(+), 49 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index dd3267b4c203..38aaaef4734e 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -1902,44 +1902,17 @@ int nfsd_nl_version_get_doit(struct sk_buff *skb, struct genl_info *info)
> > > return err;
> > > }
> > >
> > > -/**
> > > - * nfsd_nl_listener_set_doit - set the nfs running sockets
> > > - * @skb: reply buffer
> > > - * @info: netlink metadata and command arguments
> > > - *
> > > - * Return 0 on success or a negative errno.
> > > - */
> > > -int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > +static void _nfsd_walk_listeners(struct genl_info *info, struct svc_serv *serv,
> > > + struct list_head *permsocks, int modify_xprt)
> >
> > So this function looks for the one listener we need to remove.
> >
> > Should removing a listener also close down all active temporary sockets
> > for the service, or should it kill only the ones that were established
> > via the listener being removed, or should it leave all active temporary
> > sockets in place?
> >
> > Perhaps this is why /all/ permanent and temporary sockets are currently
> > being removed. Once the target listener is gone, clients can't
> > re-establish new connections, and the service is effectively ready to
> > be shut down cleanly.
> >
> >
> > > {
> > > struct net *net = genl_info_net(info);
> > > - struct svc_xprt *xprt, *tmp;
> > > const struct nlattr *attr;
> > > - struct svc_serv *serv;
> > > - LIST_HEAD(permsocks);
> > > - struct nfsd_net *nn;
> > > - bool delete = false;
> > > - int err, rem;
> > > -
> > > - mutex_lock(&nfsd_mutex);
> > > -
> > > - err = nfsd_create_serv(net);
> > > - if (err) {
> > > - mutex_unlock(&nfsd_mutex);
> > > - return err;
> > > - }
> > > -
> > > - nn = net_generic(net, nfsd_net_id);
> > > - serv = nn->nfsd_serv;
> > > -
> > > - spin_lock_bh(&serv->sv_lock);
> > > + struct svc_xprt *xprt, *tmp;
> > > + int rem;
> > >
> > > - /* Move all of the old listener sockets to a temp list */
> > > - list_splice_init(&serv->sv_permsocks, &permsocks);
> > > + if (modify_xprt)
> > > + svc_xprt_dequeue_all(serv);
> > >
> > > - /*
> > > - * Walk the list of server_socks from userland and move any that match
> > > - * back to sv_permsocks
> > > - */
> > > nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
> > > const char *xcl_name;
> > > @@ -1962,7 +1935,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
> > >
> > > /* Put back any matching sockets */
> > > - list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
> > > + list_for_each_entry_safe(xprt, tmp, permsocks, xpt_list) {
> > > /* This shouldn't be possible */
> > > if (WARN_ON_ONCE(xprt->xpt_net != net)) {
> > > list_move(&xprt->xpt_list, &serv->sv_permsocks);
> > > @@ -1971,35 +1944,89 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > >
> > > /* If everything matches, put it back */
> > > if (!strcmp(xprt->xpt_class->xcl_name, xcl_name) &&
> > > - rpc_cmp_addr_port(sa, (struct sockaddr *)&xprt->xpt_local)) {
> > > + rpc_cmp_addr_port(sa,
> > > + (struct sockaddr *)&xprt->xpt_local)) {
> > > list_move(&xprt->xpt_list, &serv->sv_permsocks);
> > > + if (modify_xprt)
> > > + svc_xprt_enqueue(xprt);
> > > break;
> > > }
> > > }
> > > }
> > > +}
> > > +
> > > +/**
> > > + * nfsd_nl_listener_set_doit - set the nfs running sockets
> > > + * @skb: reply buffer
> > > + * @info: netlink metadata and command arguments
> > > + *
> > > + * Return 0 on success or a negative errno.
> > > + */
> > > +int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > +{
> > > + struct net *net = genl_info_net(info);
> > > + struct svc_xprt *xprt;
> > > + const struct nlattr *attr;
> > > + struct svc_serv *serv;
> > > + LIST_HEAD(permsocks);
> > > + struct nfsd_net *nn;
> > > + bool delete = false;
> > > + int err, rem;
> > > +
> > > + mutex_lock(&nfsd_mutex);
> > > +
> > > + err = nfsd_create_serv(net);
> > > + if (err) {
> > > + mutex_unlock(&nfsd_mutex);
> > > + return err;
> > > + }
> > > +
> > > + nn = net_generic(net, nfsd_net_id);
> > > + serv = nn->nfsd_serv;
> > > +
> > > + spin_lock_bh(&serv->sv_lock);
> > > +
> > > + /* Move all of the old listener sockets to a temp list */
> > > + list_splice_init(&serv->sv_permsocks, &permsocks);
> > >
> > > /*
> > > - * If there are listener transports remaining on the permsocks list,
> > > - * it means we were asked to remove a listener.
> > > + * Walk the list of server_socks from userland and move any that match
> > > + * back to sv_permsocks. This pass only determines whether anything
> > > + * needs to be removed, so don't manipulate the sp_xprts list.
> > > */
> > > - if (!list_empty(&permsocks)) {
> > > - list_splice_init(&permsocks, &serv->sv_permsocks);
> > > - delete = true;
> > > - }
> > > - spin_unlock_bh(&serv->sv_lock);
> > > + _nfsd_walk_listeners(info, serv, &permsocks, false);
> > >
> > > - /* Do not remove listeners while there are active threads. */
> > > - if (serv->sv_nrthreads) {
> > > + /* For now, no removing old sockets while server is running */
> > > + if (serv->sv_nrthreads && !list_empty(&permsocks)) {
> > > + list_splice_init(&permsocks, &serv->sv_permsocks);
> > > + spin_unlock_bh(&serv->sv_lock);
> > > err = -EBUSY;
> > > goto out_unlock_mtx;
> > > }
> > >
> > > /*
> > > - * Since we can't delete an arbitrary llist entry, destroy the
> > > - * remaining listeners and recreate the list.
> > > + * If there are listener transports remaining on the permsocks list,
> > > + * it means we were asked to remove a listener. Walk the list again,
> > > + * but this time also manage the sp_xprts lists by first removing all
> > > + * of the transports and only adding back the ones not being deleted.
> > > + * Then close the ones left on the list.
> > > */
> > > - if (delete)
> > > - svc_xprt_destroy_all(serv, net, false);
> > > + if (!list_empty(&permsocks)) {
> > > + list_splice_init(&permsocks, &serv->sv_permsocks);
> > > + list_splice_init(&serv->sv_permsocks, &permsocks);
> > > + _nfsd_walk_listeners(info, serv, &permsocks, true);
> > > + while (!list_empty(&permsocks)) {
> > > + xprt = list_first_entry(&permsocks, struct svc_xprt, xpt_list);
> > > + clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > > + set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > > + spin_unlock_bh(&serv->sv_lock);
> > > + svc_xprt_close(xprt);
> > > + spin_lock_bh(&serv->sv_lock);
> > > + }
> > > + spin_unlock_bh(&serv->sv_lock);
> > > + goto out_unlock_mtx;
> > > + }
> > > + spin_unlock_bh(&serv->sv_lock);
> > >
> > > /* walk list of addrs again, open any that still don't exist */
> > > nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> > > index da2a2531e110..7038fd8ef20a 100644
> > > --- a/include/linux/sunrpc/svc_xprt.h
> > > +++ b/include/linux/sunrpc/svc_xprt.h
> > > @@ -186,6 +186,7 @@ int svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
> > > void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
> > > void svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
> > > void svc_xprt_deferred_close(struct svc_xprt *xprt);
> > > +void svc_xprt_dequeue_all(struct svc_serv *serv);
> > >
> > > static inline void svc_xprt_get(struct svc_xprt *xprt)
> > > {
> > > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> > > index 963bbe251e52..4c1be01afdb7 100644
> > > --- a/include/linux/sunrpc/svcsock.h
> > > +++ b/include/linux/sunrpc/svcsock.h
> > > @@ -65,7 +65,6 @@ int svc_addsock(struct svc_serv *serv, struct net *net,
> > > const struct cred *cred);
> > > void svc_init_xprt_sock(void);
> > > void svc_cleanup_xprt_sock(void);
> > > -
> > > /*
> > > * svc_makesock socket characteristics
> > > */
> > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > index 6973184ff667..2aa46b9468d4 100644
> > > --- a/net/sunrpc/svc_xprt.c
> > > +++ b/net/sunrpc/svc_xprt.c
> > > @@ -890,6 +890,18 @@ void svc_recv(struct svc_rqst *rqstp)
> > > }
> > > EXPORT_SYMBOL_GPL(svc_recv);
> > >
> > > +void svc_xprt_dequeue_all(struct svc_serv *serv)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; i < serv->sv_nrpools; i++) {
> > > + struct svc_pool *pool = &serv->sv_pools[i];
> > > +
> > > + lwq_dequeue_all(&pool->sp_xprts);
> > > + }
> > > +}
> > > +EXPORT_SYMBOL_GPL(svc_xprt_dequeue_all);
> > > +
> > > /**
> > > * svc_send - Return reply to client
> > > * @rqstp: RPC transaction context
> >
> >
>
--
Jeff Layton <jlayton@kernel.org>
Thread overview:
2025-08-26 22:00 [RFC PATCH 1/1] nfsd: rework how a listener is removed Olga Kornievskaia
2025-08-27 14:21 ` Chuck Lever
2025-08-28 15:21 ` Chuck Lever
2025-08-28 17:29 ` Jeff Layton [this message]
2025-08-28 18:16 ` Chuck Lever