* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation [not found] <20231228201510.985235-1-trondmy@kernel.org> @ 2023-12-29 5:46 ` Amir Goldstein 2023-12-29 14:34 ` Chuck Lever 2023-12-29 15:21 ` Trond Myklebust 0 siblings, 2 replies; 10+ messages in thread From: Amir Goldstein @ 2023-12-29 5:46 UTC (permalink / raw) To: trondmy; +Cc: Chuck Lever, Linux NFS Mailing List, linux-fsdevel, Al Viro [CC: fsdevel, viro] On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > The fallback implementation for the get_name export operation uses > readdir() to try to match the inode number to a filename. That filename > is then used together with lookup_one() to produce a dentry. > A problem arises when we match the '.' or '..' entries, since that > causes lookup_one() to fail. This has sometimes been seen to occur for > filesystems that violate POSIX requirements around uniqueness of inode > numbers, something that is common for snapshot directories. Ouch. Nasty. Looks to me like the root cause is "filesystems that violate POSIX requirements around uniqueness of inode numbers". This violation can cause any of the parent's children to wrongly match get_name() not only '.' and '..' and fail the d_inode sanity check after lookup_one(). I understand why this would be common with parent of snapshot dir, but the only fs that support snapshots that I know of (btrfs, bcachefs) do implement ->get_name(), so which filesystem did you encounter this behavior with? can it be fixed by implementing a snapshot aware ->get_name()? > > This patch just ensures that we skip '.' and '..' rather than allowing a > match. I agree that skipping '.' and '..' makes sense, but... > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") ...This Fixes is a bit odd to me. Does the problem go away if the Fixes patch is reverted? I don't think so, I think you would just hit the d_inode sanity check after lookup_one() succeeds. Maybe I did not understand the problem then. > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > fs/exportfs/expfs.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > index 3ae0154c5680..84af58eaf2ca 100644 > --- a/fs/exportfs/expfs.c > +++ b/fs/exportfs/expfs.c > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len, > container_of(ctx, struct getdents_callback, ctx); > > buf->sequence++; > - if (buf->ino == ino && len <= NAME_MAX) { > + /* Ignore the '.' and '..' entries */ > + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) && I wish I did not have to review that this condition is correct. I wish there was a common helper is_dot_or_dotdot() that would be used here as !is_dot_dotdot(name, len). I found 3 copies of is_dot_dotdot(). I didn't even try to find how many places have open coded this. Thanks, Amir. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-29 5:46 ` [PATCH] knfsd: fix the fallback implementation of the get_name export operation Amir Goldstein @ 2023-12-29 14:34 ` Chuck Lever 2023-12-29 17:44 ` Amir Goldstein 2023-12-29 15:21 ` Trond Myklebust 1 sibling, 1 reply; 10+ messages in thread From: Chuck Lever @ 2023-12-29 14:34 UTC (permalink / raw) To: Amir Goldstein; +Cc: trondmy, Linux NFS Mailing List, linux-fsdevel, Al Viro On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > [CC: fsdevel, viro] Thanks for picking this up, Amir, and for copying viro/fsdevel. I was planning to repost this next week when more folks are back, but this works too. Trond, if you'd like, I can handle review changes if you don't have time to follow up. > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > The fallback implementation for the get_name export operation uses > > readdir() to try to match the inode number to a filename. That filename > > is then used together with lookup_one() to produce a dentry. > > A problem arises when we match the '.' or '..' entries, since that > > causes lookup_one() to fail. This has sometimes been seen to occur for > > filesystems that violate POSIX requirements around uniqueness of inode > > numbers, something that is common for snapshot directories. > > Ouch. Nasty. > > Looks to me like the root cause is "filesystems that violate POSIX > requirements around uniqueness of inode numbers". > This violation can cause any of the parent's children to wrongly match > get_name() not only '.' and '..' and fail the d_inode sanity check after > lookup_one(). > > I understand why this would be common with parent of snapshot dir, > but the only fs that support snapshots that I know of (btrfs, bcachefs) > do implement ->get_name(), so which filesystem did you encounter > this behavior with? can it be fixed by implementing a snapshot > aware ->get_name()? > > > This patch just ensures that we skip '.' and '..' rather than allowing a > > match. > > I agree that skipping '.' and '..' makes sense, but... Does skipping '.' and '..' make sense for file systems that do indeed guarantee inode number uniqueness? Given your explanation here, I'm wondering whether the generic get_name() function is the right place to address the issue. > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") > > ...This Fixes is a bit odd to me. Me too, but I didn't see a more obvious choice. Maybe drop the specific Fixes: tag in favor of just Cc: stable. > Does the problem go away if the Fixes patch is reverted? > I don't think so, I think you would just hit the d_inode sanity check > after lookup_one() succeeds. > Maybe I did not understand the problem then. > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/exportfs/expfs.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > > index 3ae0154c5680..84af58eaf2ca 100644 > > --- a/fs/exportfs/expfs.c > > +++ b/fs/exportfs/expfs.c > > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len, > > container_of(ctx, struct getdents_callback, ctx); > > > > buf->sequence++; > > - if (buf->ino == ino && len <= NAME_MAX) { > > + /* Ignore the '.' and '..' entries */ > > + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) && > > I wish I did not have to review that this condition is correct. > I wish there was a common helper is_dot_or_dotdot() that would be > used here as !is_dot_dotdot(name, len). > I found 3 copies of is_dot_dotdot(). > I didn't even try to find how many places have open coded this. -- Chuck Lever ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-29 14:34 ` Chuck Lever @ 2023-12-29 17:44 ` Amir Goldstein 2023-12-29 23:29 ` Chuck Lever 0 siblings, 1 reply; 10+ messages in thread From: Amir Goldstein @ 2023-12-29 17:44 UTC (permalink / raw) To: Chuck Lever; +Cc: trondmy, Linux NFS Mailing List, linux-fsdevel, Al Viro On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever <chuck.lever@oracle.com> wrote: > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > [CC: fsdevel, viro] > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > was planning to repost this next week when more folks are back, but > this works too. > > Trond, if you'd like, I can handle review changes if you don't have > time to follow up. > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > The fallback implementation for the get_name export operation uses > > > readdir() to try to match the inode number to a filename. That filename > > > is then used together with lookup_one() to produce a dentry. > > > A problem arises when we match the '.' or '..' entries, since that > > > causes lookup_one() to fail. This has sometimes been seen to occur for > > > filesystems that violate POSIX requirements around uniqueness of inode > > > numbers, something that is common for snapshot directories. > > > > Ouch. Nasty. > > > > Looks to me like the root cause is "filesystems that violate POSIX > > requirements around uniqueness of inode numbers". > > This violation can cause any of the parent's children to wrongly match > > get_name() not only '.' and '..' and fail the d_inode sanity check after > > lookup_one(). > > > > I understand why this would be common with parent of snapshot dir, > > but the only fs that support snapshots that I know of (btrfs, bcachefs) > > do implement ->get_name(), so which filesystem did you encounter > > this behavior with? can it be fixed by implementing a snapshot > > aware ->get_name()? > > > > > This patch just ensures that we skip '.' and '..' rather than allowing a > > > match. > > > > I agree that skipping '.' and '..' makes sense, but... > > Does skipping '.' and '..' make sense for file systems that do It makes sense because if the child's name in its parent would have been "." or ".." it would have been its own parent or its own grandparent (ELOOP situation). IOW, we can safely skip "." and "..", regardless of anything else. > indeed guarantee inode number uniqueness? Given your explanation > here, I'm wondering whether the generic get_name() function is the > right place to address the issue. > > > > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") > > > > ...This Fixes is a bit odd to me. > > Me too, but I didn't see a more obvious choice. Maybe drop the > specific Fixes: tag in favor of just Cc: stable. > Yeh, that'd be fine. Thanks, Amir. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-29 17:44 ` Amir Goldstein @ 2023-12-29 23:29 ` Chuck Lever 2023-12-29 23:49 ` Trond Myklebust 0 siblings, 1 reply; 10+ messages in thread From: Chuck Lever @ 2023-12-29 23:29 UTC (permalink / raw) To: Amir Goldstein; +Cc: trondmy, Linux NFS Mailing List, linux-fsdevel, Al Viro On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever <chuck.lever@oracle.com> wrote: > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > > [CC: fsdevel, viro] > > > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > > was planning to repost this next week when more folks are back, but > > this works too. > > > > Trond, if you'd like, I can handle review changes if you don't have > > time to follow up. > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > The fallback implementation for the get_name export operation uses > > > > readdir() to try to match the inode number to a filename. That filename > > > > is then used together with lookup_one() to produce a dentry. > > > > A problem arises when we match the '.' or '..' entries, since that > > > > causes lookup_one() to fail. This has sometimes been seen to occur for > > > > filesystems that violate POSIX requirements around uniqueness of inode > > > > numbers, something that is common for snapshot directories. > > > > > > Ouch. Nasty. > > > > > > Looks to me like the root cause is "filesystems that violate POSIX > > > requirements around uniqueness of inode numbers". > > > This violation can cause any of the parent's children to wrongly match > > > get_name() not only '.' and '..' and fail the d_inode sanity check after > > > lookup_one(). > > > > > > I understand why this would be common with parent of snapshot dir, > > > but the only fs that support snapshots that I know of (btrfs, bcachefs) > > > do implement ->get_name(), so which filesystem did you encounter > > > this behavior with? can it be fixed by implementing a snapshot > > > aware ->get_name()? > > > > > > > This patch just ensures that we skip '.' and '..' rather than allowing a > > > > match. > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > Does skipping '.' and '..' make sense for file systems that do > > It makes sense because if the child's name in its parent would > have been "." or ".." it would have been its own parent or its own > grandparent (ELOOP situation). > IOW, we can safely skip "." and "..", regardless of anything else. This new comment: + /* Ignore the '.' and '..' entries */ then seems inadequate to explain why dot and dot-dot are now never matched. Perhaps the function's documenting comment could expand on this a little. I'll give it some thought. > > indeed guarantee inode number uniqueness? Given your explanation > > here, I'm wondering whether the generic get_name() function is the > > right place to address the issue. -- Chuck Lever ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-29 23:29 ` Chuck Lever @ 2023-12-29 23:49 ` Trond Myklebust 2023-12-30 6:23 ` Amir Goldstein 0 siblings, 1 reply; 10+ messages in thread From: Trond Myklebust @ 2023-12-29 23:49 UTC (permalink / raw) To: amir73il@gmail.com, chuck.lever@oracle.com Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > <chuck.lever@oracle.com> wrote: > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > > > [CC: fsdevel, viro] > > > > > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > > > was planning to repost this next week when more folks are back, > > > but > > > this works too. > > > > > > Trond, if you'd like, I can handle review changes if you don't > > > have > > > time to follow up. > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > The fallback implementation for the get_name export operation > > > > > uses > > > > > readdir() to try to match the inode number to a filename. > > > > > That filename > > > > > is then used together with lookup_one() to produce a dentry. > > > > > A problem arises when we match the '.' or '..' entries, since > > > > > that > > > > > causes lookup_one() to fail. This has sometimes been seen to > > > > > occur for > > > > > filesystems that violate POSIX requirements around uniqueness > > > > > of inode > > > > > numbers, something that is common for snapshot directories. > > > > > > > > Ouch. Nasty. > > > > > > > > Looks to me like the root cause is "filesystems that violate > > > > POSIX > > > > requirements around uniqueness of inode numbers". > > > > This violation can cause any of the parent's children to > > > > wrongly match > > > > get_name() not only '.' and '..' and fail the d_inode sanity > > > > check after > > > > lookup_one(). > > > > > > > > I understand why this would be common with parent of snapshot > > > > dir, > > > > but the only fs that support snapshots that I know of (btrfs, > > > > bcachefs) > > > > do implement ->get_name(), so which filesystem did you > > > > encounter > > > > this behavior with? can it be fixed by implementing a snapshot > > > > aware ->get_name()? > > > > > > > > > This patch just ensures that we skip '.' and '..' rather than > > > > > allowing a > > > > > match. > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > Does skipping '.' and '..' make sense for file systems that do > > > > It makes sense because if the child's name in its parent would > > have been "." or ".." it would have been its own parent or its own > > grandparent (ELOOP situation). > > IOW, we can safely skip "." and "..", regardless of anything else. > > This new comment: > > + /* Ignore the '.' and '..' entries */ > > then seems inadequate to explain why dot and dot-dot are now never > matched. Perhaps the function's documenting comment could expand on > this a little. I'll give it some thought. The point of this code is to attempt to create a valid path that connects the inode found by the filehandle to the export point. The readdir() must determine a valid name for a dentry that is a component of that path, which is why '.' and '..' can never be acceptable. This is why I think we should keep the 'Fixes:' line. The commit it points to explains quite concisely why this patch is needed. > > > > > indeed guarantee inode number uniqueness? Given your explanation > > > here, I'm wondering whether the generic get_name() function is > > > the > > > right place to address the issue. > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-29 23:49 ` Trond Myklebust @ 2023-12-30 6:23 ` Amir Goldstein 2023-12-30 19:36 ` Trond Myklebust 0 siblings, 1 reply; 10+ messages in thread From: Amir Goldstein @ 2023-12-30 6:23 UTC (permalink / raw) To: Trond Myklebust Cc: chuck.lever@oracle.com, linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > > <chuck.lever@oracle.com> wrote: > > > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > > > > [CC: fsdevel, viro] > > > > > > > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > > > > was planning to repost this next week when more folks are back, > > > > but > > > > this works too. > > > > > > > > Trond, if you'd like, I can handle review changes if you don't > > > > have > > > > time to follow up. > > > > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > > > The fallback implementation for the get_name export operation > > > > > > uses > > > > > > readdir() to try to match the inode number to a filename. > > > > > > That filename > > > > > > is then used together with lookup_one() to produce a dentry. > > > > > > A problem arises when we match the '.' or '..' entries, since > > > > > > that > > > > > > causes lookup_one() to fail. This has sometimes been seen to > > > > > > occur for > > > > > > filesystems that violate POSIX requirements around uniqueness > > > > > > of inode > > > > > > numbers, something that is common for snapshot directories. > > > > > > > > > > Ouch. Nasty. > > > > > > > > > > Looks to me like the root cause is "filesystems that violate > > > > > POSIX > > > > > requirements around uniqueness of inode numbers". > > > > > This violation can cause any of the parent's children to > > > > > wrongly match > > > > > get_name() not only '.' and '..' and fail the d_inode sanity > > > > > check after > > > > > lookup_one(). > > > > > > > > > > I understand why this would be common with parent of snapshot > > > > > dir, > > > > > but the only fs that support snapshots that I know of (btrfs, > > > > > bcachefs) > > > > > do implement ->get_name(), so which filesystem did you > > > > > encounter > > > > > this behavior with? can it be fixed by implementing a snapshot > > > > > aware ->get_name()? > > > > > > > > > > > This patch just ensures that we skip '.' and '..' rather than > > > > > > allowing a > > > > > > match. > > > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > > > Does skipping '.' and '..' make sense for file systems that do > > > > > > It makes sense because if the child's name in its parent would > > > have been "." or ".." it would have been its own parent or its own > > > grandparent (ELOOP situation). > > > IOW, we can safely skip "." and "..", regardless of anything else. > > > > This new comment: > > > > + /* Ignore the '.' and '..' entries */ > > > > then seems inadequate to explain why dot and dot-dot are now never > > matched. Perhaps the function's documenting comment could expand on > > this a little. I'll give it some thought. > > The point of this code is to attempt to create a valid path that > connects the inode found by the filehandle to the export point. The > readdir() must determine a valid name for a dentry that is a component > of that path, which is why '.' and '..' can never be acceptable. > > This is why I think we should keep the 'Fixes:' line. The commit it > points to explains quite concisely why this patch is needed. > By all means, mention this commit, just not with a fixed tag please. IIUC, commit 21d8a15ac333 did not introduce a regression that this patch fixes. Right? So why insist on abusing Fixes: tag instead of a mention? Thanks, Amir. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-30 6:23 ` Amir Goldstein @ 2023-12-30 19:36 ` Trond Myklebust 2023-12-31 10:44 ` Amir Goldstein 0 siblings, 1 reply; 10+ messages in thread From: Trond Myklebust @ 2023-12-30 19:36 UTC (permalink / raw) To: amir73il@gmail.com Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, chuck.lever@oracle.com On Sat, 2023-12-30 at 08:23 +0200, Amir Goldstein wrote: > On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust > <trondmy@hammerspace.com> wrote: > > > > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > > > <chuck.lever@oracle.com> wrote: > > > > > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein > > > > > wrote: > > > > > > [CC: fsdevel, viro] > > > > > > > > > > Thanks for picking this up, Amir, and for copying > > > > > viro/fsdevel. I > > > > > was planning to repost this next week when more folks are > > > > > back, > > > > > but > > > > > this works too. > > > > > > > > > > Trond, if you'd like, I can handle review changes if you > > > > > don't > > > > > have > > > > > time to follow up. > > > > > > > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> > > > > > > wrote: > > > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > > > > > The fallback implementation for the get_name export > > > > > > > operation > > > > > > > uses > > > > > > > readdir() to try to match the inode number to a filename. > > > > > > > That filename > > > > > > > is then used together with lookup_one() to produce a > > > > > > > dentry. > > > > > > > A problem arises when we match the '.' or '..' entries, > > > > > > > since > > > > > > > that > > > > > > > causes lookup_one() to fail. This has sometimes been seen > > > > > > > to > > > > > > > occur for > > > > > > > filesystems that violate POSIX requirements around > > > > > > > uniqueness > > > > > > > of inode > > > > > > > numbers, something that is common for snapshot > > > > > > > directories. > > > > > > > > > > > > Ouch. Nasty. > > > > > > > > > > > > Looks to me like the root cause is "filesystems that > > > > > > violate > > > > > > POSIX > > > > > > requirements around uniqueness of inode numbers". > > > > > > This violation can cause any of the parent's children to > > > > > > wrongly match > > > > > > get_name() not only '.' and '..' and fail the d_inode > > > > > > sanity > > > > > > check after > > > > > > lookup_one(). > > > > > > > > > > > > I understand why this would be common with parent of > > > > > > snapshot > > > > > > dir, > > > > > > but the only fs that support snapshots that I know of > > > > > > (btrfs, > > > > > > bcachefs) > > > > > > do implement ->get_name(), so which filesystem did you > > > > > > encounter > > > > > > this behavior with? can it be fixed by implementing a > > > > > > snapshot > > > > > > aware ->get_name()? > > > > > > > > > > > > > This patch just ensures that we skip '.' and '..' rather > > > > > > > than > > > > > > > allowing a > > > > > > > match. > > > > > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > > > > > Does skipping '.' and '..' make sense for file systems that > > > > > do > > > > > > > > It makes sense because if the child's name in its parent would > > > > have been "." or ".." it would have been its own parent or its > > > > own > > > > grandparent (ELOOP situation). > > > > IOW, we can safely skip "." and "..", regardless of anything > > > > else. > > > > > > This new comment: > > > > > > + /* Ignore the '.' and '..' entries */ > > > > > > then seems inadequate to explain why dot and dot-dot are now > > > never > > > matched. Perhaps the function's documenting comment could expand > > > on > > > this a little. I'll give it some thought. > > > > The point of this code is to attempt to create a valid path that > > connects the inode found by the filehandle to the export point. The > > readdir() must determine a valid name for a dentry that is a > > component > > of that path, which is why '.' and '..' can never be acceptable. > > > > This is why I think we should keep the 'Fixes:' line. The commit it > > points to explains quite concisely why this patch is needed. > > > > By all means, mention this commit, just not with a fixed tag please. > IIUC, commit 21d8a15ac333 did not introduce a regression that this > patch fixes. Right? > So why insist on abusing Fixes: tag instead of a mention? I don't see it as being that straightforward. Prior to commit 21d8a15ac333, the call to lookup_one_len() could return a dentry (albeit one with an invalid name) depending on whether or not the filesystem lookup succeeds. Note that knfsd does support a lookup of "." and "..", as do several other NFS servers. With commit 21d8a15ac333 applied, however, lookup_one_len() automatically returns an EACCES error. So while I agree that there are good reasons for introducing commit 21d8a15ac333, it does change the behaviour in this code path. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-30 19:36 ` Trond Myklebust @ 2023-12-31 10:44 ` Amir Goldstein 0 siblings, 0 replies; 10+ messages in thread From: Amir Goldstein @ 2023-12-31 10:44 UTC (permalink / raw) To: Trond Myklebust Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, chuck.lever@oracle.com On Sat, Dec 30, 2023 at 9:36 PM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Sat, 2023-12-30 at 08:23 +0200, Amir Goldstein wrote: > > On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust > > <trondmy@hammerspace.com> wrote: > > > > > > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > > > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > > > > <chuck.lever@oracle.com> wrote: > > > > > > > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein > > > > > > wrote: > > > > > > > [CC: fsdevel, viro] > > > > > > > > > > > > Thanks for picking this up, Amir, and for copying > > > > > > viro/fsdevel. I > > > > > > was planning to repost this next week when more folks are > > > > > > back, > > > > > > but > > > > > > this works too. > > > > > > > > > > > > Trond, if you'd like, I can handle review changes if you > > > > > > don't > > > > > > have > > > > > > time to follow up. > > > > > > > > > > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> > > > > > > > wrote: > > > > > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > > > > > > > The fallback implementation for the get_name export > > > > > > > > operation > > > > > > > > uses > > > > > > > > readdir() to try to match the inode number to a filename. > > > > > > > > That filename > > > > > > > > is then used together with lookup_one() to produce a > > > > > > > > dentry. > > > > > > > > A problem arises when we match the '.' or '..' entries, > > > > > > > > since > > > > > > > > that > > > > > > > > causes lookup_one() to fail. This has sometimes been seen > > > > > > > > to > > > > > > > > occur for > > > > > > > > filesystems that violate POSIX requirements around > > > > > > > > uniqueness > > > > > > > > of inode > > > > > > > > numbers, something that is common for snapshot > > > > > > > > directories. > > > > > > > > > > > > > > Ouch. Nasty. > > > > > > > > > > > > > > Looks to me like the root cause is "filesystems that > > > > > > > violate > > > > > > > POSIX > > > > > > > requirements around uniqueness of inode numbers". > > > > > > > This violation can cause any of the parent's children to > > > > > > > wrongly match > > > > > > > get_name() not only '.' and '..' and fail the d_inode > > > > > > > sanity > > > > > > > check after > > > > > > > lookup_one(). > > > > > > > > > > > > > > I understand why this would be common with parent of > > > > > > > snapshot > > > > > > > dir, > > > > > > > but the only fs that support snapshots that I know of > > > > > > > (btrfs, > > > > > > > bcachefs) > > > > > > > do implement ->get_name(), so which filesystem did you > > > > > > > encounter > > > > > > > this behavior with? can it be fixed by implementing a > > > > > > > snapshot > > > > > > > aware ->get_name()? > > > > > > > > > > > > > > > This patch just ensures that we skip '.' and '..' rather > > > > > > > > than > > > > > > > > allowing a > > > > > > > > match. > > > > > > > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > > > > > > > Does skipping '.' and '..' make sense for file systems that > > > > > > do > > > > > > > > > > It makes sense because if the child's name in its parent would > > > > > have been "." or ".." it would have been its own parent or its > > > > > own > > > > > grandparent (ELOOP situation). > > > > > IOW, we can safely skip "." and "..", regardless of anything > > > > > else. > > > > > > > > This new comment: > > > > > > > > + /* Ignore the '.' and '..' entries */ > > > > > > > > then seems inadequate to explain why dot and dot-dot are now > > > > never > > > > matched. Perhaps the function's documenting comment could expand > > > > on > > > > this a little. I'll give it some thought. > > > > > > The point of this code is to attempt to create a valid path that > > > connects the inode found by the filehandle to the export point. The > > > readdir() must determine a valid name for a dentry that is a > > > component > > > of that path, which is why '.' and '..' can never be acceptable. > > > > > > This is why I think we should keep the 'Fixes:' line. The commit it > > > points to explains quite concisely why this patch is needed. > > > > > > > By all means, mention this commit, just not with a fixed tag please. > > IIUC, commit 21d8a15ac333 did not introduce a regression that this > > patch fixes. Right? > > So why insist on abusing Fixes: tag instead of a mention? > > I don't see it as being that straightforward. > > Prior to commit 21d8a15ac333, the call to lookup_one_len() could return > a dentry (albeit one with an invalid name) depending on whether or not > the filesystem lookup succeeds. Note that knfsd does support a lookup > of "." and "..", as do several other NFS servers. > > With commit 21d8a15ac333 applied, however, lookup_one_len() > automatically returns an EACCES error. > > So while I agree that there are good reasons for introducing commit > 21d8a15ac333, it does change the behaviour in this code path. > I feel that we are miscommunicating. Let me explain how I understand the code and please tell me where I am wrong. The way I see it, before 21d8a15ac333, exportfs_decode_fh_raw() would call lookup_one() and may get a dentry (with invalid name), but then the sanity check following lookup_one() would surely fail, because no fs should allow a directory to be its own parent/grandparent: if (unlikely(nresult->d_inode != result->d_inode)) { dput(nresult); nresult = ERR_PTR(-ESTALE); } The way I see it, the only thing that commit 21d8a15ac333 changed in this code is the return value of exportfs_decode_fh_raw() from -ESTALE to -EACCES. exportfs_decode_fh() converts both these errors to -ESTALE and so does nfsd_set_fh_dentry(). Bottom line, if I am reading the code correctly, commit 21d8a15ac333 did not change the behaviour for knfsd nor any user visible behavior for open_by_handle_at() for userspace nfsd. Your fix is good because: 1. It saves an unneeded call to lookup_one() 2. skipping "." and ".." increases the chance of finding the correct child name in the case of non-unique ino So I have no objection to your fix in generic code, but I do not see it being a regression fix. Where are we miscommunicating? What am I missing? Thanks, Amir. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-29 5:46 ` [PATCH] knfsd: fix the fallback implementation of the get_name export operation Amir Goldstein 2023-12-29 14:34 ` Chuck Lever @ 2023-12-29 15:21 ` Trond Myklebust 2023-12-29 17:54 ` Amir Goldstein 1 sibling, 1 reply; 10+ messages in thread From: Trond Myklebust @ 2023-12-29 15:21 UTC (permalink / raw) To: amir73il@gmail.com Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, chuck.lever@oracle.com On Fri, 2023-12-29 at 07:46 +0200, Amir Goldstein wrote: > [CC: fsdevel, viro] > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > The fallback implementation for the get_name export operation uses > > readdir() to try to match the inode number to a filename. That > > filename > > is then used together with lookup_one() to produce a dentry. > > A problem arises when we match the '.' or '..' entries, since that > > causes lookup_one() to fail. This has sometimes been seen to occur > > for > > filesystems that violate POSIX requirements around uniqueness of > > inode > > numbers, something that is common for snapshot directories. > > Ouch. Nasty. > > Looks to me like the root cause is "filesystems that violate POSIX > requirements around uniqueness of inode numbers". > This violation can cause any of the parent's children to wrongly > match > get_name() not only '.' and '..' and fail the d_inode sanity check > after > lookup_one(). > > I understand why this would be common with parent of snapshot dir, > but the only fs that support snapshots that I know of (btrfs, > bcachefs) > do implement ->get_name(), so which filesystem did you encounter > this behavior with? can it be fixed by implementing a snapshot > aware ->get_name()? NFS (i.e. re-exporting NFS). Why do you not want a fix in the generic code? > > > > > This patch just ensures that we skip '.' and '..' rather than > > allowing a > > match. > > I agree that skipping '.' and '..' makes sense, but... > > > > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") > > ...This Fixes is a bit odd to me. > Does the problem go away if the Fixes patch is reverted? > I don't think so, I think you would just hit the d_inode sanity check > after lookup_one() succeeds. > Maybe I did not understand the problem then. > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/exportfs/expfs.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > > index 3ae0154c5680..84af58eaf2ca 100644 > > --- a/fs/exportfs/expfs.c > > +++ b/fs/exportfs/expfs.c > > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context > > *ctx, const char *name, int len, > > container_of(ctx, struct getdents_callback, ctx); > > > > buf->sequence++; > > - if (buf->ino == ino && len <= NAME_MAX) { > > + /* Ignore the '.' and '..' entries */ > > + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != > > '.')) && > > I wish I did not have to review that this condition is correct. > I wish there was a common helper is_dot_or_dotdot() that would be > used here as !is_dot_dotdot(name, len). > I found 3 copies of is_dot_dotdot(). > I didn't even try to find how many places have open coded this. > > Thanks, > Amir. > -- Trond Myklebust CTO, Hammerspace Inc 1900 S Norfolk St, Suite 350 - #45 San Mateo, CA 94403 www.hammerspace.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] knfsd: fix the fallback implementation of the get_name export operation 2023-12-29 15:21 ` Trond Myklebust @ 2023-12-29 17:54 ` Amir Goldstein 0 siblings, 0 replies; 10+ messages in thread From: Amir Goldstein @ 2023-12-29 17:54 UTC (permalink / raw) To: Trond Myklebust Cc: linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk, chuck.lever@oracle.com On Fri, Dec 29, 2023 at 5:22 PM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Fri, 2023-12-29 at 07:46 +0200, Amir Goldstein wrote: > > [CC: fsdevel, viro] > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > The fallback implementation for the get_name export operation uses > > > readdir() to try to match the inode number to a filename. That > > > filename > > > is then used together with lookup_one() to produce a dentry. > > > A problem arises when we match the '.' or '..' entries, since that > > > causes lookup_one() to fail. This has sometimes been seen to occur > > > for > > > filesystems that violate POSIX requirements around uniqueness of > > > inode > > > numbers, something that is common for snapshot directories. > > > > Ouch. Nasty. > > > > Looks to me like the root cause is "filesystems that violate POSIX > > requirements around uniqueness of inode numbers". > > This violation can cause any of the parent's children to wrongly > > match > > get_name() not only '.' and '..' and fail the d_inode sanity check > > after > > lookup_one(). > > > > I understand why this would be common with parent of snapshot dir, > > but the only fs that support snapshots that I know of (btrfs, > > bcachefs) > > do implement ->get_name(), so which filesystem did you encounter > > this behavior with? can it be fixed by implementing a snapshot > > aware ->get_name()? > > NFS (i.e. re-exporting NFS). > Ah. > Why do you not want a fix in the generic code? > I do not object to your fix at all. I only objected to the Fixes tag. I was just pointing out that this is not a complete solution. A decode of an NFS (re-exported) file handle could fail if get_name() iterates the parent of a snapshot root dir and finds a false match (which is not "." nor "..") before it finds the snapshot subdir name. It may be solved by nfs_get_name() which does not stop when if finds an ino match but checks further, but I don't know nfs re-export to know what else could be checked. Anyway, for this patch, without the Fixes tag, feel free to add: Acked-by: Amir Goldstein <amir73il@gmail.com> I'd prefer the use of is_dot_dotdot(), but I do not insist. Thanks and a happy new year! Amir. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-12-31 10:45 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20231228201510.985235-1-trondmy@kernel.org>
2023-12-29 5:46 ` [PATCH] knfsd: fix the fallback implementation of the get_name export operation Amir Goldstein
2023-12-29 14:34 ` Chuck Lever
2023-12-29 17:44 ` Amir Goldstein
2023-12-29 23:29 ` Chuck Lever
2023-12-29 23:49 ` Trond Myklebust
2023-12-30 6:23 ` Amir Goldstein
2023-12-30 19:36 ` Trond Myklebust
2023-12-31 10:44 ` Amir Goldstein
2023-12-29 15:21 ` Trond Myklebust
2023-12-29 17:54 ` Amir Goldstein
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).