* [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket
@ 2025-09-05 22:35 Scott Mayhew
2025-09-05 23:40 ` NeilBrown
0 siblings, 1 reply; 5+ messages in thread
From: Scott Mayhew @ 2025-09-05 22:35 UTC (permalink / raw)
To: steved; +Cc: neil, bcodding, yoyang, linux-nfs
In 91da135f ("systemd unit files: fix up dependencies on rpcbind"),
Neil laid out the rationale for how the nfs services should define their
dependencies on rpcbind. In a nutshell:
1. Dependencies should only be defined using rpcbind.socket
2. Ordering for dependencies should only be defined usint "After="
3. nfs-server.service should use "Wants=rpcbind.socket", to allow
rpcbind.socket to be masked in NFSv4-only setups.
4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd
is useless if it can't register with rpcbind.
Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted
that due to the way the dependencies are ordered, when 'systemctl stop
rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then
SIGTERM to rpc.statd. On SIGTERM, rpcbind tears down /var/run/rpcbind.sock.
However, rpc-statd on SIGTERM attempts to unregister from rpcbind. This
results in a long delay:
[root@rawhide ~]# time systemctl restart rpcbind.socket
real 1m0.147s
user 0m0.004s
sys 0m0.003s
8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race")
fixed this by changing the dependency in rpc-statd.service to use
"After=rpcbind.service", bending rule #1 from above.
Yongcheng recently noted that when runnnig the following test:
[root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \
systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \
systemctl status rpc-statd; done
rpc-statd.service would often fail to start:
× rpc-statd.service - NFS status monitor for NFSv2/3 locking.
Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled)
Drop-In: /usr/lib/systemd/system/service.d
└─10-timeout-abort.conf
Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago
Duration: 228ms
Invocation: bafb2bb00761439ebc348000704e8fbb
Docs: man:rpc.statd(8)
Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE)
Mem peak: 1.5M
CPU: 7ms
Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting
Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC
Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused
Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success
Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success
Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success
Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting
Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE
Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'.
Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking..
I propose we revert the change from 8a835ceb and instead turn the
dependency into a weak dependency by using "Wants=rpcbind.socket"
instead of "Requires=rpcbind.socket". This bends rule #4 above and will
make it so that systemd will try to start rpcbind.socket if it isn't
already running when rpc-statd.service starts, but it won't restart
rpc-statd.service whenever rpcbind is restarted. Frankly, we shouldn't
need to restart services whenever rpcbind is restarted (thats why
rpcbind has the warmstart feature). The only drawback is that now if an
admin wants to set up an NFSv4-only server by masking rpcbind.socket,
they'll need to mask rpc-statd.service as well. I don't think that's
too much to ask, so the nfs.systemd man page has been updated
accordingly.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
systemd/nfs.systemd.man | 10 +++++++---
systemd/rpc-statd.service | 5 +++--
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man
index a8476038..93fb87cd 100644
--- a/systemd/nfs.systemd.man
+++ b/systemd/nfs.systemd.man
@@ -137,7 +137,9 @@ NFSv2) and does not want
.I rpcbind
to be running, the correct approach is to run
.RS
-.B systemctl mask rpcbind
+.B systemctl mask rpcbind.socket
+.br
+.B systemctl mask rpc-statd.service
.RE
This will disable
.IR rpcbind ,
@@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed
for NFSv3) will refuse to start, without interfering with the
operation of NFSv4 services. In particular,
.I rpc.statd
-will not run when
+will fail to start when
.I rpcbind
-is masked.
+is masked, so
+.I rpc-statd.service
+should be masked as well.
.PP
.I idmapd
is only needed for NFSv4, and even then is not needed when the client
diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service
index 660ed861..4e138f69 100644
--- a/systemd/rpc-statd.service
+++ b/systemd/rpc-statd.service
@@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking.
Documentation=man:rpc.statd(8)
DefaultDependencies=no
Conflicts=umount.target
-Requires=nss-lookup.target rpcbind.socket
+Requires=nss-lookup.target
+Wants=rpcbind.socket
Wants=network-online.target
Wants=rpc-statd-notify.service
-After=network-online.target nss-lookup.target rpcbind.service
+After=network-online.target nss-lookup.target rpcbind.socket
PartOf=nfs-utils.service
IgnoreOnIsolate=yes
--
2.50.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket 2025-09-05 22:35 [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket Scott Mayhew @ 2025-09-05 23:40 ` NeilBrown 2025-09-08 19:07 ` Scott Mayhew 0 siblings, 1 reply; 5+ messages in thread From: NeilBrown @ 2025-09-05 23:40 UTC (permalink / raw) To: Scott Mayhew; +Cc: steved, bcodding, yoyang, linux-nfs On Sat, 06 Sep 2025, Scott Mayhew wrote: > In 91da135f ("systemd unit files: fix up dependencies on rpcbind"), > Neil laid out the rationale for how the nfs services should define their > dependencies on rpcbind. In a nutshell: > > 1. Dependencies should only be defined using rpcbind.socket > 2. Ordering for dependencies should only be defined usint "After=" > 3. nfs-server.service should use "Wants=rpcbind.socket", to allow > rpcbind.socket to be masked in NFSv4-only setups. > 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd > is useless if it can't register with rpcbind. > > Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted > that due to the way the dependencies are ordered, when 'systemctl stop > rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then > SIGTERM to rpc.statd. On SIGTERM, rpcbind tears down /var/run/rpcbind.sock. > However, rpc-statd on SIGTERM attempts to unregister from rpcbind. This > results in a long delay: > > [root@rawhide ~]# time systemctl restart rpcbind.socket > > real 1m0.147s > user 0m0.004s > sys 0m0.003s > > 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race") > fixed this by changing the dependency in rpc-statd.service to use > "After=rpcbind.service", bending rule #1 from above. Thanks for the thorough and detailed explanation. I'd like to suggest a different fix. Change rpc-statd.service to declare: After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service i.e. it is declared to be After both the socket and the service. "After" declarations only have effect if the units are in the same transaction. If the Unit is not being started or stopped, the After declaration has no effect. So on startup, this will ensure rpcbind.socket is started before rpc-statd.service. On shutdown in a transaction that stops both rpc-statd.service and rpcbind.service, rpcbind.service won't be stopped until after rpc-statd.service is stopped. I agree that it isn't necessary to restart rpc-statd when rpcbind is restarted. Maybe that is a justification to use Wants instead of Requires. Or maybe Upholds would be even better. I wonder if putting ConditionPathIsSymbolisLink !/etc/systemd/system/rpcbind.socket in rpc-statd.service would be a suitable way to stop rpc-statd from starting if rpcbind.socket is masked. In any case I think there are two separate issues here which deserve two separate patch. 1/ shutdown ordering isn't handled correctly. Adding the extra After directive should fix that 2/ rpc.statd is restarted unnecessarily. Wants or Upholds might be part of the solution. Thanks, NeilBrown > > Yongcheng recently noted that when runnnig the following test: > > [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \ > systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \ > systemctl status rpc-statd; done > > rpc-statd.service would often fail to start: > > × rpc-statd.service - NFS status monitor for NFSv2/3 locking. > Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled) > Drop-In: /usr/lib/systemd/system/service.d > └─10-timeout-abort.conf > Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago > Duration: 228ms > Invocation: bafb2bb00761439ebc348000704e8fbb > Docs: man:rpc.statd(8) > Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE) > Mem peak: 1.5M > CPU: 7ms > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'. > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking.. > > I propose we revert the change from 8a835ceb and instead turn the > dependency into a weak dependency by using "Wants=rpcbind.socket" > instead of "Requires=rpcbind.socket". This bends rule #4 above and will > make it so that systemd will try to start rpcbind.socket if it isn't > already running when rpc-statd.service starts, but it won't restart > rpc-statd.service whenever rpcbind is restarted. Frankly, we shouldn't > need to restart services whenever rpcbind is restarted (thats why > rpcbind has the warmstart feature). The only drawback is that now if an > admin wants to set up an NFSv4-only server by masking rpcbind.socket, > they'll need to mask rpc-statd.service as well. I don't think that's > too much to ask, so the nfs.systemd man page has been updated > accordingly. > > Signed-off-by: Scott Mayhew <smayhew@redhat.com> > --- > systemd/nfs.systemd.man | 10 +++++++--- > systemd/rpc-statd.service | 5 +++-- > 2 files changed, 10 insertions(+), 5 deletions(-) > > diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man > index a8476038..93fb87cd 100644 > --- a/systemd/nfs.systemd.man > +++ b/systemd/nfs.systemd.man > @@ -137,7 +137,9 @@ NFSv2) and does not want > .I rpcbind > to be running, the correct approach is to run > .RS > -.B systemctl mask rpcbind > +.B systemctl mask rpcbind.socket > +.br > +.B systemctl mask rpc-statd.service > .RE > This will disable > .IR rpcbind , > @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed > for NFSv3) will refuse to start, without interfering with the > operation of NFSv4 services. In particular, > .I rpc.statd > -will not run when > +will fail to start when > .I rpcbind > -is masked. > +is masked, so > +.I rpc-statd.service > +should be masked as well. > .PP > .I idmapd > is only needed for NFSv4, and even then is not needed when the client > diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service > index 660ed861..4e138f69 100644 > --- a/systemd/rpc-statd.service > +++ b/systemd/rpc-statd.service > @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking. > Documentation=man:rpc.statd(8) > DefaultDependencies=no > Conflicts=umount.target > -Requires=nss-lookup.target rpcbind.socket > +Requires=nss-lookup.target > +Wants=rpcbind.socket > Wants=network-online.target > Wants=rpc-statd-notify.service > -After=network-online.target nss-lookup.target rpcbind.service > +After=network-online.target nss-lookup.target rpcbind.socket > > PartOf=nfs-utils.service > IgnoreOnIsolate=yes > -- > 2.50.1 > > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket 2025-09-05 23:40 ` NeilBrown @ 2025-09-08 19:07 ` Scott Mayhew 2025-09-09 1:59 ` NeilBrown 0 siblings, 1 reply; 5+ messages in thread From: Scott Mayhew @ 2025-09-08 19:07 UTC (permalink / raw) To: NeilBrown; +Cc: steved, bcodding, yoyang, linux-nfs On Sat, 06 Sep 2025, NeilBrown wrote: > On Sat, 06 Sep 2025, Scott Mayhew wrote: > > In 91da135f ("systemd unit files: fix up dependencies on rpcbind"), > > Neil laid out the rationale for how the nfs services should define their > > dependencies on rpcbind. In a nutshell: > > > > 1. Dependencies should only be defined using rpcbind.socket > > 2. Ordering for dependencies should only be defined usint "After=" > > 3. nfs-server.service should use "Wants=rpcbind.socket", to allow > > rpcbind.socket to be masked in NFSv4-only setups. > > 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd > > is useless if it can't register with rpcbind. > > > > Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted > > that due to the way the dependencies are ordered, when 'systemctl stop > > rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then > > SIGTERM to rpc.statd. On SIGTERM, rpcbind tears down /var/run/rpcbind.sock. > > However, rpc-statd on SIGTERM attempts to unregister from rpcbind. This > > results in a long delay: > > > > [root@rawhide ~]# time systemctl restart rpcbind.socket > > > > real 1m0.147s > > user 0m0.004s > > sys 0m0.003s > > > > 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race") > > fixed this by changing the dependency in rpc-statd.service to use > > "After=rpcbind.service", bending rule #1 from above. > > Thanks for the thorough and detailed explanation. > > I'd like to suggest a different fix. Change rpc-statd.service to > declare: > > After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service > > i.e. it is declared to be After both the socket and the service. > > "After" declarations only have effect if the units are in the same > transaction. If the Unit is not being started or stopped, the After > declaration has no effect. > > So on startup, this will ensure rpcbind.socket is started before > rpc-statd.service. > On shutdown in a transaction that stops both rpc-statd.service and > rpcbind.service, rpcbind.service won't be stopped until after > rpc-statd.service is stopped. That works too. > > I agree that it isn't necessary to restart rpc-statd when rpcbind is > restarted. > Maybe that is a justification to use Wants instead of Requires. > Or maybe Upholds would be even better. I think Upholds is confusing.... especially since there aren't any existing unit files using it, at least on a stock Fedora Rawhide system. I don't see it being used on OpenSUSE Tumbleweed or Debian Trixie either. I think it's going to confuse users if they try to stop rpcbind.socket and then find that it's still running. Finally, when I tested it, it prevented me from stopping rpc-statd. Eventually the shutdown timer hit and systemd sent rpc-statd a SIGABRT, which in turned triggered the systemd-coredump handler. That's a whole mess of syslog entries that's going to more bug reports. I'd rather stick with Wants. > > I wonder if putting > > ConditionPathIsSymbolisLink !/etc/systemd/system/rpcbind.socket I'm lost. What what cause the rpcbind.socket symlink to be created directly in /etc/systemd/system? I've seen it get created in /etc/systemd/system/sockets.target.wants or /etc/systemd/system/multi-user.target.wants, but never directly in /etc/systemd/system. -Scott > > in rpc-statd.service would be a suitable way to stop rpc-statd from > starting if rpcbind.socket is masked. > > In any case I think there are two separate issues here which deserve two > separate patch. > 1/ shutdown ordering isn't handled correctly. Adding the extra After > directive should fix that > 2/ rpc.statd is restarted unnecessarily. Wants or Upholds might be part > of the solution. > > Thanks, > NeilBrown > > > > > > > Yongcheng recently noted that when runnnig the following test: > > > > [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \ > > systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \ > > systemctl status rpc-statd; done > > > > rpc-statd.service would often fail to start: > > > > × rpc-statd.service - NFS status monitor for NFSv2/3 locking. > > Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled) > > Drop-In: /usr/lib/systemd/system/service.d > > └─10-timeout-abort.conf > > Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago > > Duration: 228ms > > Invocation: bafb2bb00761439ebc348000704e8fbb > > Docs: man:rpc.statd(8) > > Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE) > > Mem peak: 1.5M > > CPU: 7ms > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'. > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking.. > > > > I propose we revert the change from 8a835ceb and instead turn the > > dependency into a weak dependency by using "Wants=rpcbind.socket" > > instead of "Requires=rpcbind.socket". This bends rule #4 above and will > > make it so that systemd will try to start rpcbind.socket if it isn't > > already running when rpc-statd.service starts, but it won't restart > > rpc-statd.service whenever rpcbind is restarted. Frankly, we shouldn't > > need to restart services whenever rpcbind is restarted (thats why > > rpcbind has the warmstart feature). The only drawback is that now if an > > admin wants to set up an NFSv4-only server by masking rpcbind.socket, > > they'll need to mask rpc-statd.service as well. I don't think that's > > too much to ask, so the nfs.systemd man page has been updated > > accordingly. > > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com> > > --- > > systemd/nfs.systemd.man | 10 +++++++--- > > systemd/rpc-statd.service | 5 +++-- > > 2 files changed, 10 insertions(+), 5 deletions(-) > > > > diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man > > index a8476038..93fb87cd 100644 > > --- a/systemd/nfs.systemd.man > > +++ b/systemd/nfs.systemd.man > > @@ -137,7 +137,9 @@ NFSv2) and does not want > > .I rpcbind > > to be running, the correct approach is to run > > .RS > > -.B systemctl mask rpcbind > > +.B systemctl mask rpcbind.socket > > +.br > > +.B systemctl mask rpc-statd.service > > .RE > > This will disable > > .IR rpcbind , > > @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed > > for NFSv3) will refuse to start, without interfering with the > > operation of NFSv4 services. In particular, > > .I rpc.statd > > -will not run when > > +will fail to start when > > .I rpcbind > > -is masked. > > +is masked, so > > +.I rpc-statd.service > > +should be masked as well. > > .PP > > .I idmapd > > is only needed for NFSv4, and even then is not needed when the client > > diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service > > index 660ed861..4e138f69 100644 > > --- a/systemd/rpc-statd.service > > +++ b/systemd/rpc-statd.service > > @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking. > > Documentation=man:rpc.statd(8) > > DefaultDependencies=no > > Conflicts=umount.target > > -Requires=nss-lookup.target rpcbind.socket > > +Requires=nss-lookup.target > > +Wants=rpcbind.socket > > Wants=network-online.target > > Wants=rpc-statd-notify.service > > -After=network-online.target nss-lookup.target rpcbind.service > > +After=network-online.target nss-lookup.target rpcbind.socket > > > > PartOf=nfs-utils.service > > IgnoreOnIsolate=yes > > -- > > 2.50.1 > > > > > > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket 2025-09-08 19:07 ` Scott Mayhew @ 2025-09-09 1:59 ` NeilBrown 2025-09-09 12:49 ` Scott Mayhew 0 siblings, 1 reply; 5+ messages in thread From: NeilBrown @ 2025-09-09 1:59 UTC (permalink / raw) To: Scott Mayhew; +Cc: steved, bcodding, yoyang, linux-nfs On Tue, 09 Sep 2025, Scott Mayhew wrote: > On Sat, 06 Sep 2025, NeilBrown wrote: > > > On Sat, 06 Sep 2025, Scott Mayhew wrote: > > > In 91da135f ("systemd unit files: fix up dependencies on rpcbind"), > > > Neil laid out the rationale for how the nfs services should define their > > > dependencies on rpcbind. In a nutshell: > > > > > > 1. Dependencies should only be defined using rpcbind.socket > > > 2. Ordering for dependencies should only be defined usint "After=" > > > 3. nfs-server.service should use "Wants=rpcbind.socket", to allow > > > rpcbind.socket to be masked in NFSv4-only setups. > > > 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd > > > is useless if it can't register with rpcbind. > > > > > > Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted > > > that due to the way the dependencies are ordered, when 'systemctl stop > > > rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then > > > SIGTERM to rpc.statd. On SIGTERM, rpcbind tears down /var/run/rpcbind.sock. > > > However, rpc-statd on SIGTERM attempts to unregister from rpcbind. This > > > results in a long delay: > > > > > > [root@rawhide ~]# time systemctl restart rpcbind.socket > > > > > > real 1m0.147s > > > user 0m0.004s > > > sys 0m0.003s > > > > > > 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race") > > > fixed this by changing the dependency in rpc-statd.service to use > > > "After=rpcbind.service", bending rule #1 from above. > > > > Thanks for the thorough and detailed explanation. > > > > I'd like to suggest a different fix. Change rpc-statd.service to > > declare: > > > > After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service > > > > i.e. it is declared to be After both the socket and the service. > > > > "After" declarations only have effect if the units are in the same > > transaction. If the Unit is not being started or stopped, the After > > declaration has no effect. > > > > So on startup, this will ensure rpcbind.socket is started before > > rpc-statd.service. > > On shutdown in a transaction that stops both rpc-statd.service and > > rpcbind.service, rpcbind.service won't be stopped until after > > rpc-statd.service is stopped. > > That works too. > > > > > I agree that it isn't necessary to restart rpc-statd when rpcbind is > > restarted. > > Maybe that is a justification to use Wants instead of Requires. > > Or maybe Upholds would be even better. > > I think Upholds is confusing.... especially since there aren't any > existing unit files using it, at least on a stock Fedora Rawhide > system. I don't see it being used on OpenSUSE Tumbleweed or Debian > Trixie either. I think it's going to confuse users if they try to stop > rpcbind.socket and then find that it's still running. Finally, when I tested > it, it prevented me from stopping rpc-statd. Eventually the shutdown > timer hit and systemd sent rpc-statd a SIGABRT, which in turned > triggered the systemd-coredump handler. That's a whole mess of syslog > entries that's going to more bug reports. I'd rather stick with Wants. Thanks for testing. I've never used Upholds myself but I was reading the man page any wondered if it might be a good fit. Apparently it isn't. > > > > > I wonder if putting > > > > ConditionPathIsSymbolisLink !/etc/systemd/system/rpcbind.socket > > I'm lost. What what cause the rpcbind.socket symlink to be created > directly in /etc/systemd/system? I've seen it get created in > /etc/systemd/system/sockets.target.wants or > /etc/systemd/system/multi-user.target.wants, but never directly in > /etc/systemd/system. $ sudo systemctl mask rpcbind.socket Created symlink '/etc/systemd/system/rpcbind.socket' → '/dev/null'. I'm not sure it's a good idea. I was mostly thinking aloud. I think the After line should be changed to include both .service and .socket and that cleanly fixed just the observed problem about shutdown. And I don't think any other change should be made without a clear demonstrated need. Thanks, NeilBrown > > -Scott > > > > in rpc-statd.service would be a suitable way to stop rpc-statd from > > starting if rpcbind.socket is masked. > > > > In any case I think there are two separate issues here which deserve two > > separate patch. > > 1/ shutdown ordering isn't handled correctly. Adding the extra After > > directive should fix that > > 2/ rpc.statd is restarted unnecessarily. Wants or Upholds might be part > > of the solution. > > > > Thanks, > > NeilBrown > > > > > > > > > > > > Yongcheng recently noted that when runnnig the following test: > > > > > > [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \ > > > systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \ > > > systemctl status rpc-statd; done > > > > > > rpc-statd.service would often fail to start: > > > > > > × rpc-statd.service - NFS status monitor for NFSv2/3 locking. > > > Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled) > > > Drop-In: /usr/lib/systemd/system/service.d > > > └─10-timeout-abort.conf > > > Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago > > > Duration: 228ms > > > Invocation: bafb2bb00761439ebc348000704e8fbb > > > Docs: man:rpc.statd(8) > > > Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE) > > > Mem peak: 1.5M > > > CPU: 7ms > > > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'. > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking.. > > > > > > I propose we revert the change from 8a835ceb and instead turn the > > > dependency into a weak dependency by using "Wants=rpcbind.socket" > > > instead of "Requires=rpcbind.socket". This bends rule #4 above and will > > > make it so that systemd will try to start rpcbind.socket if it isn't > > > already running when rpc-statd.service starts, but it won't restart > > > rpc-statd.service whenever rpcbind is restarted. Frankly, we shouldn't > > > need to restart services whenever rpcbind is restarted (thats why > > > rpcbind has the warmstart feature). The only drawback is that now if an > > > admin wants to set up an NFSv4-only server by masking rpcbind.socket, > > > they'll need to mask rpc-statd.service as well. I don't think that's > > > too much to ask, so the nfs.systemd man page has been updated > > > accordingly. > > > > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com> > > > --- > > > systemd/nfs.systemd.man | 10 +++++++--- > > > systemd/rpc-statd.service | 5 +++-- > > > 2 files changed, 10 insertions(+), 5 deletions(-) > > > > > > diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man > > > index a8476038..93fb87cd 100644 > > > --- a/systemd/nfs.systemd.man > > > +++ b/systemd/nfs.systemd.man > > > @@ -137,7 +137,9 @@ NFSv2) and does not want > > > .I rpcbind > > > to be running, the correct approach is to run > > > .RS > > > -.B systemctl mask rpcbind > > > +.B systemctl mask rpcbind.socket > > > +.br > > > +.B systemctl mask rpc-statd.service > > > .RE > > > This will disable > > > .IR rpcbind , > > > @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed > > > for NFSv3) will refuse to start, without interfering with the > > > operation of NFSv4 services. In particular, > > > .I rpc.statd > > > -will not run when > > > +will fail to start when > > > .I rpcbind > > > -is masked. > > > +is masked, so > > > +.I rpc-statd.service > > > +should be masked as well. > > > .PP > > > .I idmapd > > > is only needed for NFSv4, and even then is not needed when the client > > > diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service > > > index 660ed861..4e138f69 100644 > > > --- a/systemd/rpc-statd.service > > > +++ b/systemd/rpc-statd.service > > > @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking. > > > Documentation=man:rpc.statd(8) > > > DefaultDependencies=no > > > Conflicts=umount.target > > > -Requires=nss-lookup.target rpcbind.socket > > > +Requires=nss-lookup.target > > > +Wants=rpcbind.socket > > > Wants=network-online.target > > > Wants=rpc-statd-notify.service > > > -After=network-online.target nss-lookup.target rpcbind.service > > > +After=network-online.target nss-lookup.target rpcbind.socket > > > > > > PartOf=nfs-utils.service > > > IgnoreOnIsolate=yes > > > -- > > > 2.50.1 > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket 2025-09-09 1:59 ` NeilBrown @ 2025-09-09 12:49 ` Scott Mayhew 0 siblings, 0 replies; 5+ messages in thread From: Scott Mayhew @ 2025-09-09 12:49 UTC (permalink / raw) To: NeilBrown; +Cc: steved, bcodding, yoyang, linux-nfs On Tue, 09 Sep 2025, NeilBrown wrote: > On Tue, 09 Sep 2025, Scott Mayhew wrote: > > On Sat, 06 Sep 2025, NeilBrown wrote: > > > > > On Sat, 06 Sep 2025, Scott Mayhew wrote: > > > > In 91da135f ("systemd unit files: fix up dependencies on rpcbind"), > > > > Neil laid out the rationale for how the nfs services should define their > > > > dependencies on rpcbind. In a nutshell: > > > > > > > > 1. Dependencies should only be defined using rpcbind.socket > > > > 2. Ordering for dependencies should only be defined usint "After=" > > > > 3. nfs-server.service should use "Wants=rpcbind.socket", to allow > > > > rpcbind.socket to be masked in NFSv4-only setups. > > > > 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd > > > > is useless if it can't register with rpcbind. > > > > > > > > Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted > > > > that due to the way the dependencies are ordered, when 'systemctl stop > > > > rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then > > > > SIGTERM to rpc.statd. On SIGTERM, rpcbind tears down /var/run/rpcbind.sock. > > > > However, rpc-statd on SIGTERM attempts to unregister from rpcbind. This > > > > results in a long delay: > > > > > > > > [root@rawhide ~]# time systemctl restart rpcbind.socket > > > > > > > > real 1m0.147s > > > > user 0m0.004s > > > > sys 0m0.003s > > > > > > > > 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race") > > > > fixed this by changing the dependency in rpc-statd.service to use > > > > "After=rpcbind.service", bending rule #1 from above. > > > > > > Thanks for the thorough and detailed explanation. > > > > > > I'd like to suggest a different fix. Change rpc-statd.service to > > > declare: > > > > > > After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service > > > > > > i.e. it is declared to be After both the socket and the service. > > > > > > "After" declarations only have effect if the units are in the same > > > transaction. If the Unit is not being started or stopped, the After > > > declaration has no effect. > > > > > > So on startup, this will ensure rpcbind.socket is started before > > > rpc-statd.service. > > > On shutdown in a transaction that stops both rpc-statd.service and > > > rpcbind.service, rpcbind.service won't be stopped until after > > > rpc-statd.service is stopped. > > > > That works too. > > > > > > > > I agree that it isn't necessary to restart rpc-statd when rpcbind is > > > restarted. > > > Maybe that is a justification to use Wants instead of Requires. > > > Or maybe Upholds would be even better. > > > > I think Upholds is confusing.... especially since there aren't any > > existing unit files using it, at least on a stock Fedora Rawhide > > system. I don't see it being used on OpenSUSE Tumbleweed or Debian > > Trixie either. I think it's going to confuse users if they try to stop > > rpcbind.socket and then find that it's still running. Finally, when I tested > > it, it prevented me from stopping rpc-statd. Eventually the shutdown > > timer hit and systemd sent rpc-statd a SIGABRT, which in turned > > triggered the systemd-coredump handler. That's a whole mess of syslog > > entries that's going to more bug reports. I'd rather stick with Wants. > > Thanks for testing. I've never used Upholds myself but I was reading > the man page any wondered if it might be a good fit. Apparently it > isn't. > > > > > > > > > > I wonder if putting > > > > > > ConditionPathIsSymbolisLink !/etc/systemd/system/rpcbind.socket > > > > I'm lost. What what cause the rpcbind.socket symlink to be created > > directly in /etc/systemd/system? I've seen it get created in > > /etc/systemd/system/sockets.target.wants or > > /etc/systemd/system/multi-user.target.wants, but never directly in > > /etc/systemd/system. > > $ sudo systemctl mask rpcbind.socket > Created symlink '/etc/systemd/system/rpcbind.socket' → '/dev/null'. > > I'm not sure it's a good idea. I was mostly thinking aloud. Doh! Apparently I wasn't thinking at all. > > I think the After line should be changed to include both .service and > .socket and that cleanly fixed just the observed problem about shutdown. > > And I don't think any other change should be made without a clear > demonstrated need. Sounds good. Thanks for the feedback. -Scott > > Thanks, > NeilBrown > > > > > > -Scott > > > > > > in rpc-statd.service would be a suitable way to stop rpc-statd from > > > starting if rpcbind.socket is masked. > > > > > > In any case I think there are two separate issues here which deserve two > > > separate patch. > > > 1/ shutdown ordering isn't handled correctly. Adding the extra After > > > directive should fix that > > > 2/ rpc.statd is restarted unnecessarily. Wants or Upholds might be part > > > of the solution. > > > > > > Thanks, > > > NeilBrown > > > > > > > > > > > > > > > > > Yongcheng recently noted that when runnnig the following test: > > > > > > > > [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \ > > > > systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \ > > > > systemctl status rpc-statd; done > > > > > > > > rpc-statd.service would often fail to start: > > > > > > > > × rpc-statd.service - NFS status monitor for NFSv2/3 locking. > > > > Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled) > > > > Drop-In: /usr/lib/systemd/system/service.d > > > > └─10-timeout-abort.conf > > > > Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago > > > > Duration: 228ms > > > > Invocation: bafb2bb00761439ebc348000704e8fbb > > > > Docs: man:rpc.statd(8) > > > > Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE) > > > > Mem peak: 1.5M > > > > CPU: 7ms > > > > > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success > > > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting > > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE > > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'. > > > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking.. > > > > > > > > I propose we revert the change from 8a835ceb and instead turn the > > > > dependency into a weak dependency by using "Wants=rpcbind.socket" > > > > instead of "Requires=rpcbind.socket". This bends rule #4 above and will > > > > make it so that systemd will try to start rpcbind.socket if it isn't > > > > already running when rpc-statd.service starts, but it won't restart > > > > rpc-statd.service whenever rpcbind is restarted. Frankly, we shouldn't > > > > need to restart services whenever rpcbind is restarted (thats why > > > > rpcbind has the warmstart feature). The only drawback is that now if an > > > > admin wants to set up an NFSv4-only server by masking rpcbind.socket, > > > > they'll need to mask rpc-statd.service as well. I don't think that's > > > > too much to ask, so the nfs.systemd man page has been updated > > > > accordingly. > > > > > > > > Signed-off-by: Scott Mayhew <smayhew@redhat.com> > > > > --- > > > > systemd/nfs.systemd.man | 10 +++++++--- > > > > systemd/rpc-statd.service | 5 +++-- > > > > 2 files changed, 10 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man > > > > index a8476038..93fb87cd 100644 > > > > --- a/systemd/nfs.systemd.man > > > > +++ b/systemd/nfs.systemd.man > > > > @@ -137,7 +137,9 @@ NFSv2) and does not want > > > > .I rpcbind > > > > to be running, the correct approach is to run > > > > .RS > > > > -.B systemctl mask rpcbind > > > > +.B systemctl mask rpcbind.socket > > > > +.br > > > > +.B systemctl mask rpc-statd.service > > > > .RE > > > > This will disable > > > > .IR rpcbind , > > > > @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed > > > > for NFSv3) will refuse to start, without interfering with the > > > > operation of NFSv4 services. In particular, > > > > .I rpc.statd > > > > -will not run when > > > > +will fail to start when > > > > .I rpcbind > > > > -is masked. > > > > +is masked, so > > > > +.I rpc-statd.service > > > > +should be masked as well. > > > > .PP > > > > .I idmapd > > > > is only needed for NFSv4, and even then is not needed when the client > > > > diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service > > > > index 660ed861..4e138f69 100644 > > > > --- a/systemd/rpc-statd.service > > > > +++ b/systemd/rpc-statd.service > > > > @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking. > > > > Documentation=man:rpc.statd(8) > > > > DefaultDependencies=no > > > > Conflicts=umount.target > > > > -Requires=nss-lookup.target rpcbind.socket > > > > +Requires=nss-lookup.target > > > > +Wants=rpcbind.socket > > > > Wants=network-online.target > > > > Wants=rpc-statd-notify.service > > > > -After=network-online.target nss-lookup.target rpcbind.service > > > > +After=network-online.target nss-lookup.target rpcbind.socket > > > > > > > > PartOf=nfs-utils.service > > > > IgnoreOnIsolate=yes > > > > -- > > > > 2.50.1 > > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-09-09 12:49 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-05 22:35 [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket Scott Mayhew 2025-09-05 23:40 ` NeilBrown 2025-09-08 19:07 ` Scott Mayhew 2025-09-09 1:59 ` NeilBrown 2025-09-09 12:49 ` Scott Mayhew
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox