From: Scott Mayhew <smayhew@redhat.com>
To: NeilBrown <neilb@ownmail.net>
Cc: steved@redhat.com, bcodding@redhat.com, yoyang@redhat.com,
linux-nfs@vger.kernel.org
Subject: Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket
Date: Tue, 9 Sep 2025 08:49:39 -0400 [thread overview]
Message-ID: <aMAiY8n7KtMnoKAz@aion> (raw)
In-Reply-To: <175738314848.2850467.15762759585599929390@noble.neil.brown.name>
On Tue, 09 Sep 2025, NeilBrown wrote:
> On Tue, 09 Sep 2025, Scott Mayhew wrote:
> > On Sat, 06 Sep 2025, NeilBrown wrote:
> >
> > > On Sat, 06 Sep 2025, Scott Mayhew wrote:
> > > > In 91da135f ("systemd unit files: fix up dependencies on rpcbind"),
> > > > Neil laid out the rationale for how the nfs services should define their
> > > > dependencies on rpcbind. In a nutshell:
> > > >
> > > > 1. Dependencies should only be defined using rpcbind.socket
> > > > 2. Ordering for dependencies should only be defined usint "After="
> > > > 3. nfs-server.service should use "Wants=rpcbind.socket", to allow
> > > > rpcbind.socket to be masked in NFSv4-only setups.
> > > > 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd
> > > > is useless if it can't register with rpcbind.
> > > >
> > > > Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted
> > > > that due to the way the dependencies are ordered, when 'systemctl stop
> > > > rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then
> > > > SIGTERM to rpc.statd. On SIGTERM, rpcbind tears down /var/run/rpcbind.sock.
> > > > However, rpc-statd on SIGTERM attempts to unregister from rpcbind. This
> > > > results in a long delay:
> > > >
> > > > [root@rawhide ~]# time systemctl restart rpcbind.socket
> > > >
> > > > real 1m0.147s
> > > > user 0m0.004s
> > > > sys 0m0.003s
> > > >
> > > > 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race")
> > > > fixed this by changing the dependency in rpc-statd.service to use
> > > > "After=rpcbind.service", bending rule #1 from above.
> > >
> > > Thanks for the thorough and detailed explanation.
> > >
> > > I'd like to suggest a different fix. Change rpc-statd.service to
> > > declare:
> > >
> > > After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service
> > >
> > > i.e. it is declared to be After both the socket and the service.
> > >
> > > "After" declarations only have effect if the units are in the same
> > > transaction. If the Unit is not being started or stopped, the After
> > > declaration has no effect.
> > >
> > > So on startup, this will ensure rpcbind.socket is started before
> > > rpc-statd.service.
> > > On shutdown in a transaction that stops both rpc-statd.service and
> > > rpcbind.service, rpcbind.service won't be stopped until after
> > > rpc-statd.service is stopped.
> >
> > That works too.
> >
> > >
> > > I agree that it isn't necessary to restart rpc-statd when rpcbind is
> > > restarted.
> > > Maybe that is a justification to use Wants instead of Requires.
> > > Or maybe Upholds would be even better.
> >
> > I think Upholds is confusing.... especially since there aren't any
> > existing unit files using it, at least on a stock Fedora Rawhide
> > system. I don't see it being used on OpenSUSE Tumbleweed or Debian
> > Trixie either. I think it's going to confuse users if they try to stop
> > rpcbind.socket and then find that it's still running. Finally, when I tested
> > it, it prevented me from stopping rpc-statd. Eventually the shutdown
> > timer hit and systemd sent rpc-statd a SIGABRT, which in turned
> > triggered the systemd-coredump handler. That's a whole mess of syslog
> > entries that's going to more bug reports. I'd rather stick with Wants.
>
> Thanks for testing. I've never used Upholds myself but I was reading
> the man page any wondered if it might be a good fit. Apparently it
> isn't.
>
>
> >
> > >
> > > I wonder if putting
> > >
> > > ConditionPathIsSymbolisLink !/etc/systemd/system/rpcbind.socket
> >
> > I'm lost. What what cause the rpcbind.socket symlink to be created
> > directly in /etc/systemd/system? I've seen it get created in
> > /etc/systemd/system/sockets.target.wants or
> > /etc/systemd/system/multi-user.target.wants, but never directly in
> > /etc/systemd/system.
>
> $ sudo systemctl mask rpcbind.socket
> Created symlink '/etc/systemd/system/rpcbind.socket' → '/dev/null'.
>
> I'm not sure it's a good idea. I was mostly thinking aloud.
Doh! Apparently I wasn't thinking at all.
>
> I think the After line should be changed to include both .service and
> .socket and that cleanly fixed just the observed problem about shutdown.
>
> And I don't think any other change should be made without a clear
> demonstrated need.
Sounds good. Thanks for the feedback.
-Scott
>
> Thanks,
> NeilBrown
>
>
> >
> > -Scott
> > >
> > > in rpc-statd.service would be a suitable way to stop rpc-statd from
> > > starting if rpcbind.socket is masked.
> > >
> > > In any case I think there are two separate issues here which deserve two
> > > separate patch.
> > > 1/ shutdown ordering isn't handled correctly. Adding the extra After
> > > directive should fix that
> > > 2/ rpc.statd is restarted unnecessarily. Wants or Upholds might be part
> > > of the solution.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > >
> > > >
> > > > Yongcheng recently noted that when runnnig the following test:
> > > >
> > > > [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \
> > > > systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \
> > > > systemctl status rpc-statd; done
> > > >
> > > > rpc-statd.service would often fail to start:
> > > >
> > > > × rpc-statd.service - NFS status monitor for NFSv2/3 locking.
> > > > Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled)
> > > > Drop-In: /usr/lib/systemd/system/service.d
> > > > └─10-timeout-abort.conf
> > > > Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago
> > > > Duration: 228ms
> > > > Invocation: bafb2bb00761439ebc348000704e8fbb
> > > > Docs: man:rpc.statd(8)
> > > > Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE)
> > > > Mem peak: 1.5M
> > > > CPU: 7ms
> > > >
> > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting
> > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC
> > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused
> > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success
> > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success
> > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success
> > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting
> > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE
> > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'.
> > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking..
> > > >
> > > > I propose we revert the change from 8a835ceb and instead turn the
> > > > dependency into a weak dependency by using "Wants=rpcbind.socket"
> > > > instead of "Requires=rpcbind.socket". This bends rule #4 above and will
> > > > make it so that systemd will try to start rpcbind.socket if it isn't
> > > > already running when rpc-statd.service starts, but it won't restart
> > > > rpc-statd.service whenever rpcbind is restarted. Frankly, we shouldn't
> > > > need to restart services whenever rpcbind is restarted (thats why
> > > > rpcbind has the warmstart feature). The only drawback is that now if an
> > > > admin wants to set up an NFSv4-only server by masking rpcbind.socket,
> > > > they'll need to mask rpc-statd.service as well. I don't think that's
> > > > too much to ask, so the nfs.systemd man page has been updated
> > > > accordingly.
> > > >
> > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > > ---
> > > > systemd/nfs.systemd.man | 10 +++++++---
> > > > systemd/rpc-statd.service | 5 +++--
> > > > 2 files changed, 10 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man
> > > > index a8476038..93fb87cd 100644
> > > > --- a/systemd/nfs.systemd.man
> > > > +++ b/systemd/nfs.systemd.man
> > > > @@ -137,7 +137,9 @@ NFSv2) and does not want
> > > > .I rpcbind
> > > > to be running, the correct approach is to run
> > > > .RS
> > > > -.B systemctl mask rpcbind
> > > > +.B systemctl mask rpcbind.socket
> > > > +.br
> > > > +.B systemctl mask rpc-statd.service
> > > > .RE
> > > > This will disable
> > > > .IR rpcbind ,
> > > > @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed
> > > > for NFSv3) will refuse to start, without interfering with the
> > > > operation of NFSv4 services. In particular,
> > > > .I rpc.statd
> > > > -will not run when
> > > > +will fail to start when
> > > > .I rpcbind
> > > > -is masked.
> > > > +is masked, so
> > > > +.I rpc-statd.service
> > > > +should be masked as well.
> > > > .PP
> > > > .I idmapd
> > > > is only needed for NFSv4, and even then is not needed when the client
> > > > diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service
> > > > index 660ed861..4e138f69 100644
> > > > --- a/systemd/rpc-statd.service
> > > > +++ b/systemd/rpc-statd.service
> > > > @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking.
> > > > Documentation=man:rpc.statd(8)
> > > > DefaultDependencies=no
> > > > Conflicts=umount.target
> > > > -Requires=nss-lookup.target rpcbind.socket
> > > > +Requires=nss-lookup.target
> > > > +Wants=rpcbind.socket
> > > > Wants=network-online.target
> > > > Wants=rpc-statd-notify.service
> > > > -After=network-online.target nss-lookup.target rpcbind.service
> > > > +After=network-online.target nss-lookup.target rpcbind.socket
> > > >
> > > > PartOf=nfs-utils.service
> > > > IgnoreOnIsolate=yes
> > > > --
> > > > 2.50.1
> > > >
> > > >
> > >
> > >
> >
> >
>
prev parent reply other threads:[~2025-09-09 12:49 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-05 22:35 [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket Scott Mayhew
2025-09-05 23:40 ` NeilBrown
2025-09-08 19:07 ` Scott Mayhew
2025-09-09 1:59 ` NeilBrown
2025-09-09 12:49 ` Scott Mayhew [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aMAiY8n7KtMnoKAz@aion \
--to=smayhew@redhat.com \
--cc=bcodding@redhat.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@ownmail.net \
--cc=steved@redhat.com \
--cc=yoyang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.