* help with understanding match_fsid() errors in nfs-utils
@ 2023-09-27 2:10 NeilBrown
2023-09-27 23:00 ` Trond Myklebust
0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2023-09-27 2:10 UTC (permalink / raw)
To: Trond Myklebust, Steve Dickson; +Cc: linux-nfs
hi Trond,
I'm trying to understand
Commit 76c21e3f70a8 ("mountd: Check the stat() return values in match_fsid()")
in nfs-utils.
The effect of this patch is that if a 'stat' of any path in
/etc/exports or any mountpoint below any path marked crossmnt fails
with an error other than one of a small set, then the fsid to path
lookup aborts without reporting anything to the kernel, so the kernel
doesn't reply to the client and the mount attempt blocks indefinitely.
I have seen this happen when "/" is exported crossmnt, and when a stat
of /run/user/1000/doc returns EACCES. This is a "fuse" mount for user
1000, and presumably it rejects any access from any other user.
Could you please help me understand what this patch was trying to
achieve? What sorts of errors were you expecting this to catch?
Would it make sense to silently ignore the stat failure for paths that
were found when scanning the mount table, and only take the more
drastic action for paths explicitly listed in /etc/exports ??
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: help with understanding match_fsid() errors in nfs-utils 2023-09-27 2:10 help with understanding match_fsid() errors in nfs-utils NeilBrown @ 2023-09-27 23:00 ` Trond Myklebust 2023-10-10 23:38 ` NeilBrown 0 siblings, 1 reply; 4+ messages in thread From: Trond Myklebust @ 2023-09-27 23:00 UTC (permalink / raw) To: neilb@suse.de, steved@redhat.com; +Cc: linux-nfs@vger.kernel.org On Wed, 2023-09-27 at 12:10 +1000, NeilBrown wrote: > > hi Trond, > I'm trying to understand > > Commit 76c21e3f70a8 ("mountd: Check the stat() return values in > match_fsid()") > > in nfs-utils. > > The effect of this patch is that if a 'stat' of any path in > /etc/exports or any mountpoint below any path marked crossmnt fails > with an error other than one of a small set, then the fsid to path > lookup aborts without reporting anything to the kernel, so the > kernel > doesn't reply to the client and the mount attempt blocks > indefinitely. > > I have seen this happen when "/" is exported crossmnt, and when a > stat > of /run/user/1000/doc returns EACCES. This is a "fuse" mount for > user > 1000, and presumably it rejects any access from any other user. > > Could you please help me understand what this patch was trying to > achieve? What sorts of errors were you expecting this to catch? > Would it make sense to silently ignore the stat failure for paths > that > were found when scanning the mount table, and only take the more > drastic action for paths explicitly listed in /etc/exports ?? > > The point is that if the filesystem is re-exported NFS, and you mount as 'softerr' in order to avoid hanging your knfsd server on ephemeral errors, then you have to be more careful about which errors are fatal, and hence need to be propagated to the kernel export/path cache, and which ones are just temporary due to a network partition or server reboot. Hence the creation of path_lookup_error() which differentiates between the two cases. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: help with understanding match_fsid() errors in nfs-utils 2023-09-27 23:00 ` Trond Myklebust @ 2023-10-10 23:38 ` NeilBrown 2023-10-11 14:33 ` Trond Myklebust 0 siblings, 1 reply; 4+ messages in thread From: NeilBrown @ 2023-10-10 23:38 UTC (permalink / raw) To: Trond Myklebust; +Cc: steved@redhat.com, linux-nfs@vger.kernel.org On Thu, 28 Sep 2023, Trond Myklebust wrote: > On Wed, 2023-09-27 at 12:10 +1000, NeilBrown wrote: > > > > hi Trond, > > I'm trying to understand > > > > Commit 76c21e3f70a8 ("mountd: Check the stat() return values in > > match_fsid()") > > > > in nfs-utils. > > > > The effect of this patch is that if a 'stat' of any path in > > /etc/exports or any mountpoint below any path marked crossmnt fails > > with an error other than one of a small set, then the fsid to path > > lookup aborts without reporting anything to the kernel, so the > > kernel > > doesn't reply to the client and the mount attempt blocks > > indefinitely. > > > > I have seen this happen when "/" is exported crossmnt, and when a > > stat > > of /run/user/1000/doc returns EACCES. This is a "fuse" mount for > > user > > 1000, and presumably it rejects any access from any other user. > > > > Could you please help me understand what this patch was trying to > > achieve? What sorts of errors were you expecting this to catch? > > Would it make sense to silently ignore the stat failure for paths > > that > > were found when scanning the mount table, and only take the more > > drastic action for paths explicitly listed in /etc/exports ?? > > > > > > The point is that if the filesystem is re-exported NFS, and you mount > as 'softerr' in order to avoid hanging your knfsd server on ephemeral > errors, then you have to be more careful about which errors are fatal, > and hence need to be propagated to the kernel export/path cache, and > which ones are just temporary due to a network partition or server > reboot. > > Hence the creation of path_lookup_error() which differentiates between > the two cases. Thanks for the explanation - and sorry about the delay in following up - leave and stuff.... If I understand the 'softerr' option correctly, the particular error code you need to catch in this case is ETIMEDOUT. That is the error the mountd would get when stating an NFS mount that wasn't responding. So it would be safe for me to add EACCES to the set of errors that path_lookup_error() checks - just as long as I don't add ETIMEDOUT. Do you agree? But I think there is another issue here. If the 'stat' of the mountpoint returns ETIMEDOUT we want a transient failure so that the kernel will ask again - presumably when the client asks again. That doesn't happen with the current code. When the kernel queues a request to mountd, it assumes that the request WILL be answered. It never resends the same request. So mountd must reply with something. I think that returning a negative result with an expiry that has already passed might have the desired result, but we would need to test that. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: help with understanding match_fsid() errors in nfs-utils 2023-10-10 23:38 ` NeilBrown @ 2023-10-11 14:33 ` Trond Myklebust 0 siblings, 0 replies; 4+ messages in thread From: Trond Myklebust @ 2023-10-11 14:33 UTC (permalink / raw) To: neilb@suse.de; +Cc: linux-nfs@vger.kernel.org, steved@redhat.com On Wed, 2023-10-11 at 10:38 +1100, NeilBrown wrote: > On Thu, 28 Sep 2023, Trond Myklebust wrote: > > On Wed, 2023-09-27 at 12:10 +1000, NeilBrown wrote: > > > > > > hi Trond, > > > I'm trying to understand > > > > > > Commit 76c21e3f70a8 ("mountd: Check the stat() return values in > > > match_fsid()") > > > > > > in nfs-utils. > > > > > > The effect of this patch is that if a 'stat' of any path in > > > /etc/exports or any mountpoint below any path marked crossmnt > > > fails > > > with an error other than one of a small set, then the fsid to > > > path > > > lookup aborts without reporting anything to the kernel, so the > > > kernel > > > doesn't reply to the client and the mount attempt blocks > > > indefinitely. > > > > > > I have seen this happen when "/" is exported crossmnt, and when > > > a > > > stat > > > of /run/user/1000/doc returns EACCES. This is a "fuse" mount > > > for > > > user > > > 1000, and presumably it rejects any access from any other user. > > > > > > Could you please help me understand what this patch was trying > > > to > > > achieve? What sorts of errors were you expecting this to catch? > > > Would it make sense to silently ignore the stat failure for > > > paths > > > that > > > were found when scanning the mount table, and only take the more > > > drastic action for paths explicitly listed in /etc/exports ?? > > > > > > > > > > The point is that if the filesystem is re-exported NFS, and you > > mount > > as 'softerr' in order to avoid hanging your knfsd server on > > ephemeral > > errors, then you have to be more careful about which errors are > > fatal, > > and hence need to be propagated to the kernel export/path cache, > > and > > which ones are just temporary due to a network partition or server > > reboot. > > > > Hence the creation of path_lookup_error() which differentiates > > between > > the two cases. > > Thanks for the explanation - and sorry about the delay in following > up - > leave and stuff.... > > If I understand the 'softerr' option correctly, the particular error > code you need to catch in this case is ETIMEDOUT. That is the error > the > mountd would get when stating an NFS mount that wasn't responding. > So it would be safe for me to add EACCES to the set of errors that > path_lookup_error() checks - just as long as I don't add ETIMEDOUT. > Do you agree? > I think that should be safe, but it is worth testing. I don't remember exactly why I excluded EACCES from the list. > But I think there is another issue here. If the 'stat' of the > mountpoint returns ETIMEDOUT we want a transient failure so that the > kernel will ask again - presumably when the client asks again. > That doesn't happen with the current code. When the kernel queues a > request to mountd, it assumes that the request WILL be answered. It > never resends the same request. > > So mountd must reply with something. I think that returning a > negative > result with an expiry that has already passed might have the desired > result, but we would need to test that. > Ideally, we probably normally want to pass the filesystem error down into knfsd so that it can do the right thing. For ETIMEDOUT, that should result in an NFS4ERR_DELAY/NFS3ERR_JUKEBOX being sent to the client, which is what we want here. I'm not sure about caching these errors. It seems to me that while ETIMEDOUT might allow a time-bounded cache, the EACCES is a per-user error, so can't really be cached unless you are able to cache it as a per-user thing in knfsd. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-10-11 14:33 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-27 2:10 help with understanding match_fsid() errors in nfs-utils NeilBrown 2023-09-27 23:00 ` Trond Myklebust 2023-10-10 23:38 ` NeilBrown 2023-10-11 14:33 ` Trond Myklebust
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).