* [PATCH 0/2 v2] VFS: minor improvements to a couple of interfaces @ 2025-02-17 0:27 NeilBrown 2025-02-17 0:27 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown 2025-02-17 0:27 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown 0 siblings, 2 replies; 21+ messages in thread From: NeilBrown @ 2025-02-17 0:27 UTC (permalink / raw) To: Christian Brauner, Alexander Viro, Jan Kara; +Cc: linux-fsdevel, linux-nfs Hi, please replace the top two patches in vfs/vfs-6.15.async.dir with these two revised versions. Both add text to filesytems/porting.rst The first moves an 'err = 0' as was requested if any refresh is needed. The second adds a check for errors from ->lookup. This omission is triggering various error reports. Thanks, NeilBrown [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-17 0:27 [PATCH 0/2 v2] VFS: minor improvements to a couple of interfaces NeilBrown @ 2025-02-17 0:27 ` NeilBrown 2025-02-17 8:26 ` Christian Brauner ` (2 more replies) 2025-02-17 0:27 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown 1 sibling, 3 replies; 21+ messages in thread From: NeilBrown @ 2025-02-17 0:27 UTC (permalink / raw) To: Christian Brauner, Alexander Viro, Jan Kara; +Cc: linux-fsdevel, linux-nfs No callers of kern_path_locked() or user_path_locked_at() want a negative dentry. So change them to return -ENOENT instead. This simplifies callers. This results in a subtle change to bcachefs in that an ioctl will now return -ENOENT in preference to -EXDEV. I believe this restores the behaviour to what it was prior to Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking") Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250207034040.3402438-2-neilb@suse.de Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner <brauner@kernel.org> --- Documentation/filesystems/porting.rst | 8 ++++ drivers/base/devtmpfs.c | 65 +++++++++++++-------------- fs/bcachefs/fs-ioctl.c | 4 -- fs/namei.c | 4 ++ kernel/audit_watch.c | 12 ++--- 5 files changed, 48 insertions(+), 45 deletions(-) diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index 1639e78e3146..2ead47e20677 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -1157,3 +1157,11 @@ in normal case it points into the pathname being looked up. NOTE: if you need something like full path from the root of filesystem, you are still on your own - this assists with simple cases, but it's not magic. + +--- + +** recommended** + +kern_path_locked() and user_path_locked() no longer return a negative +dentry so this doesn't need to be checked. If the name cannot be found, +ERR_PTR(-ENOENT) is returned. diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c index b848764ef018..c9e34842139f 100644 --- a/drivers/base/devtmpfs.c +++ b/drivers/base/devtmpfs.c @@ -245,15 +245,12 @@ static int dev_rmdir(const char *name) dentry = kern_path_locked(name, &parent); if (IS_ERR(dentry)) return PTR_ERR(dentry); - if (d_really_is_positive(dentry)) { - if (d_inode(dentry)->i_private == &thread) - err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry), - dentry); - else - err = -EPERM; - } else { - err = -ENOENT; - } + if (d_inode(dentry)->i_private == &thread) + err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry), + dentry); + else + err = -EPERM; + dput(dentry); inode_unlock(d_inode(parent.dentry)); path_put(&parent); @@ -310,6 +307,8 @@ static int handle_remove(const char *nodename, struct device *dev) { struct path parent; struct dentry *dentry; + struct kstat stat; + struct path p; int deleted = 0; int err; @@ -317,32 +316,28 @@ static int handle_remove(const char *nodename, struct device *dev) if (IS_ERR(dentry)) return PTR_ERR(dentry); - if (d_really_is_positive(dentry)) { - struct kstat stat; - struct path p = {.mnt = parent.mnt, .dentry = dentry}; - err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE, - AT_STATX_SYNC_AS_STAT); - if (!err && dev_mynode(dev, d_inode(dentry), &stat)) { - struct iattr newattrs; - /* - * before unlinking this node, reset permissions - * of possible references like hardlinks - */ - newattrs.ia_uid = GLOBAL_ROOT_UID; - newattrs.ia_gid = GLOBAL_ROOT_GID; - newattrs.ia_mode = stat.mode & ~0777; - newattrs.ia_valid = - ATTR_UID|ATTR_GID|ATTR_MODE; - inode_lock(d_inode(dentry)); - notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL); - inode_unlock(d_inode(dentry)); - err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry), - dentry, NULL); - if (!err || err == -ENOENT) - deleted = 1; - } - } else { - err = -ENOENT; + p.mnt = parent.mnt; + p.dentry = dentry; + err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE, + AT_STATX_SYNC_AS_STAT); + if (!err && dev_mynode(dev, d_inode(dentry), &stat)) { + struct iattr newattrs; + /* + * before unlinking this node, reset permissions + * of possible references like hardlinks + */ + newattrs.ia_uid = GLOBAL_ROOT_UID; + newattrs.ia_gid = GLOBAL_ROOT_GID; + newattrs.ia_mode = stat.mode & ~0777; + newattrs.ia_valid = + ATTR_UID|ATTR_GID|ATTR_MODE; + inode_lock(d_inode(dentry)); + notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL); + inode_unlock(d_inode(dentry)); + err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry), + dentry, NULL); + if (!err || err == -ENOENT) + deleted = 1; } dput(dentry); inode_unlock(d_inode(parent.dentry)); diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c index 15725b4ce393..595b57fabc9a 100644 --- a/fs/bcachefs/fs-ioctl.c +++ b/fs/bcachefs/fs-ioctl.c @@ -511,10 +511,6 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp, ret = -EXDEV; goto err; } - if (!d_is_positive(victim)) { - ret = -ENOENT; - goto err; - } ret = __bch2_unlink(dir, victim, true); if (!ret) { fsnotify_rmdir(dir, victim); diff --git a/fs/namei.c b/fs/namei.c index 3ab9440c5b93..fb6da3ca0ca5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2741,6 +2741,10 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct } inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); d = lookup_one_qstr_excl(&last, path->dentry, 0); + if (!IS_ERR(d) && d_is_negative(d)) { + dput(d); + d = ERR_PTR(-ENOENT); + } if (IS_ERR(d)) { inode_unlock(path->dentry->d_inode); path_put(path); diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index 7f358740e958..367eaf2c78b7 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent) struct dentry *d = kern_path_locked(watch->path, parent); if (IS_ERR(d)) return PTR_ERR(d); - if (d_is_positive(d)) { - /* update watch filter fields */ - watch->dev = d->d_sb->s_dev; - watch->ino = d_backing_inode(d)->i_ino; - } + /* update watch filter fields */ + watch->dev = d->d_sb->s_dev; + watch->ino = d_backing_inode(d)->i_ino; + inode_unlock(d_backing_inode(parent->dentry)); dput(d); return 0; @@ -419,10 +418,11 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) /* caller expects mutex locked */ mutex_lock(&audit_filter_mutex); - if (ret) { + if (ret && ret != -ENOENT) { audit_put_watch(watch); return ret; } + ret = 0; /* either find an old parent or attach a new one */ parent = audit_find_parent(d_backing_inode(parent_path.dentry)); -- 2.47.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-17 0:27 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown @ 2025-02-17 8:26 ` Christian Brauner 2025-02-17 13:41 ` Jeff Layton 2025-04-14 11:01 ` Christian Brauner 2 siblings, 0 replies; 21+ messages in thread From: Christian Brauner @ 2025-02-17 8:26 UTC (permalink / raw) To: NeilBrown Cc: Christian Brauner, linux-fsdevel, linux-nfs, Alexander Viro, Jan Kara On Mon, 17 Feb 2025 11:27:20 +1100, NeilBrown wrote: > No callers of kern_path_locked() or user_path_locked_at() want a > negative dentry. So change them to return -ENOENT instead. This > simplifies callers. > > This results in a subtle change to bcachefs in that an ioctl will now > return -ENOENT in preference to -EXDEV. I believe this restores the > behaviour to what it was prior to > Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking") > > [...] Applied to the vfs-6.15.async.dir branch of the vfs/vfs.git tree. Patches in the vfs-6.15.async.dir branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated. Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.15.async.dir [1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry https://git.kernel.org/vfs/vfs/c/a97b8bfbb9f1 [2/2] VFS: add common error checks to lookup_one_qstr_excl() https://git.kernel.org/vfs/vfs/c/20c2c1baa9ab ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-17 0:27 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown 2025-02-17 8:26 ` Christian Brauner @ 2025-02-17 13:41 ` Jeff Layton 2025-04-14 11:01 ` Christian Brauner 2 siblings, 0 replies; 21+ messages in thread From: Jeff Layton @ 2025-02-17 13:41 UTC (permalink / raw) To: NeilBrown, Christian Brauner, Alexander Viro, Jan Kara Cc: linux-fsdevel, linux-nfs On Mon, 2025-02-17 at 11:27 +1100, NeilBrown wrote: > No callers of kern_path_locked() or user_path_locked_at() want a > negative dentry. So change them to return -ENOENT instead. This > simplifies callers. > > This results in a subtle change to bcachefs in that an ioctl will now > return -ENOENT in preference to -EXDEV. I believe this restores the > behaviour to what it was prior to > Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking") > > Signed-off-by: NeilBrown <neilb@suse.de> > Link: https://lore.kernel.org/r/20250207034040.3402438-2-neilb@suse.de > Acked-by: Paul Moore <paul@paul-moore.com> > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- > Documentation/filesystems/porting.rst | 8 ++++ > drivers/base/devtmpfs.c | 65 +++++++++++++-------------- > fs/bcachefs/fs-ioctl.c | 4 -- > fs/namei.c | 4 ++ > kernel/audit_watch.c | 12 ++--- > 5 files changed, 48 insertions(+), 45 deletions(-) > > diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst > index 1639e78e3146..2ead47e20677 100644 > --- a/Documentation/filesystems/porting.rst > +++ b/Documentation/filesystems/porting.rst > @@ -1157,3 +1157,11 @@ in normal case it points into the pathname being looked up. > NOTE: if you need something like full path from the root of filesystem, > you are still on your own - this assists with simple cases, but it's not > magic. > + > +--- > + > +** recommended** > + > +kern_path_locked() and user_path_locked() no longer return a negative > +dentry so this doesn't need to be checked. If the name cannot be found, > +ERR_PTR(-ENOENT) is returned. > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c > index b848764ef018..c9e34842139f 100644 > --- a/drivers/base/devtmpfs.c > +++ b/drivers/base/devtmpfs.c > @@ -245,15 +245,12 @@ static int dev_rmdir(const char *name) > dentry = kern_path_locked(name, &parent); > if (IS_ERR(dentry)) > return PTR_ERR(dentry); > - if (d_really_is_positive(dentry)) { > - if (d_inode(dentry)->i_private == &thread) > - err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry), > - dentry); > - else > - err = -EPERM; > - } else { > - err = -ENOENT; > - } > + if (d_inode(dentry)->i_private == &thread) > + err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry), > + dentry); > + else > + err = -EPERM; > + > dput(dentry); > inode_unlock(d_inode(parent.dentry)); > path_put(&parent); > @@ -310,6 +307,8 @@ static int handle_remove(const char *nodename, struct device *dev) > { > struct path parent; > struct dentry *dentry; > + struct kstat stat; > + struct path p; > int deleted = 0; > int err; > > @@ -317,32 +316,28 @@ static int handle_remove(const char *nodename, struct device *dev) > if (IS_ERR(dentry)) > return PTR_ERR(dentry); > > - if (d_really_is_positive(dentry)) { > - struct kstat stat; > - struct path p = {.mnt = parent.mnt, .dentry = dentry}; > - err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE, > - AT_STATX_SYNC_AS_STAT); > - if (!err && dev_mynode(dev, d_inode(dentry), &stat)) { > - struct iattr newattrs; > - /* > - * before unlinking this node, reset permissions > - * of possible references like hardlinks > - */ > - newattrs.ia_uid = GLOBAL_ROOT_UID; > - newattrs.ia_gid = GLOBAL_ROOT_GID; > - newattrs.ia_mode = stat.mode & ~0777; > - newattrs.ia_valid = > - ATTR_UID|ATTR_GID|ATTR_MODE; > - inode_lock(d_inode(dentry)); > - notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL); > - inode_unlock(d_inode(dentry)); > - err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry), > - dentry, NULL); > - if (!err || err == -ENOENT) > - deleted = 1; > - } > - } else { > - err = -ENOENT; > + p.mnt = parent.mnt; > + p.dentry = dentry; > + err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE, > + AT_STATX_SYNC_AS_STAT); > + if (!err && dev_mynode(dev, d_inode(dentry), &stat)) { > + struct iattr newattrs; > + /* > + * before unlinking this node, reset permissions > + * of possible references like hardlinks > + */ > + newattrs.ia_uid = GLOBAL_ROOT_UID; > + newattrs.ia_gid = GLOBAL_ROOT_GID; > + newattrs.ia_mode = stat.mode & ~0777; > + newattrs.ia_valid = > + ATTR_UID|ATTR_GID|ATTR_MODE; > + inode_lock(d_inode(dentry)); > + notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL); > + inode_unlock(d_inode(dentry)); > + err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry), > + dentry, NULL); > + if (!err || err == -ENOENT) > + deleted = 1; > } > dput(dentry); > inode_unlock(d_inode(parent.dentry)); > diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c > index 15725b4ce393..595b57fabc9a 100644 > --- a/fs/bcachefs/fs-ioctl.c > +++ b/fs/bcachefs/fs-ioctl.c > @@ -511,10 +511,6 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp, > ret = -EXDEV; > goto err; > } > - if (!d_is_positive(victim)) { > - ret = -ENOENT; > - goto err; > - } > ret = __bch2_unlink(dir, victim, true); > if (!ret) { > fsnotify_rmdir(dir, victim); > diff --git a/fs/namei.c b/fs/namei.c > index 3ab9440c5b93..fb6da3ca0ca5 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -2741,6 +2741,10 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct > } > inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); > d = lookup_one_qstr_excl(&last, path->dentry, 0); > + if (!IS_ERR(d) && d_is_negative(d)) { > + dput(d); > + d = ERR_PTR(-ENOENT); > + } > if (IS_ERR(d)) { > inode_unlock(path->dentry->d_inode); > path_put(path); > diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c > index 7f358740e958..367eaf2c78b7 100644 > --- a/kernel/audit_watch.c > +++ b/kernel/audit_watch.c > @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent) > struct dentry *d = kern_path_locked(watch->path, parent); > if (IS_ERR(d)) > return PTR_ERR(d); > - if (d_is_positive(d)) { > - /* update watch filter fields */ > - watch->dev = d->d_sb->s_dev; > - watch->ino = d_backing_inode(d)->i_ino; > - } > + /* update watch filter fields */ > + watch->dev = d->d_sb->s_dev; > + watch->ino = d_backing_inode(d)->i_ino; > + > inode_unlock(d_backing_inode(parent->dentry)); > dput(d); > return 0; > @@ -419,10 +418,11 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) > /* caller expects mutex locked */ > mutex_lock(&audit_filter_mutex); > > - if (ret) { > + if (ret && ret != -ENOENT) { > audit_put_watch(watch); > return ret; > } > + ret = 0; > > /* either find an old parent or attach a new one */ > parent = audit_find_parent(d_backing_inode(parent_path.dentry)); Reviewed-by: Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-17 0:27 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown 2025-02-17 8:26 ` Christian Brauner 2025-02-17 13:41 ` Jeff Layton @ 2025-04-14 11:01 ` Christian Brauner 2025-04-14 15:54 ` Christian Brauner 2 siblings, 1 reply; 21+ messages in thread From: Christian Brauner @ 2025-04-14 11:01 UTC (permalink / raw) To: NeilBrown Cc: Vlastimil Babka, Alexander Viro, Jan Kara, linux-fsdevel, linux-nfs > diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c > index 7f358740e958..367eaf2c78b7 100644 > --- a/kernel/audit_watch.c > +++ b/kernel/audit_watch.c > @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent) > struct dentry *d = kern_path_locked(watch->path, parent); > if (IS_ERR(d)) > return PTR_ERR(d); > - if (d_is_positive(d)) { > - /* update watch filter fields */ > - watch->dev = d->d_sb->s_dev; > - watch->ino = d_backing_inode(d)->i_ino; > - } > + /* update watch filter fields */ > + watch->dev = d->d_sb->s_dev; > + watch->ino = d_backing_inode(d)->i_ino; > + > inode_unlock(d_backing_inode(parent->dentry)); > dput(d); > return 0; > @@ -419,10 +418,11 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) > /* caller expects mutex locked */ > mutex_lock(&audit_filter_mutex); > > - if (ret) { > + if (ret && ret != -ENOENT) { > audit_put_watch(watch); > return ret; > } > + ret = 0; So this is broken. If kern_path_locked() fails due to a negative dentry and returns ENOENT it will have already called path_put() and @parent_path is invalid. But right after this audit does: > > /* either find an old parent or attach a new one */ > parent = audit_find_parent(d_backing_inode(parent_path.dentry)); and then later on calls path_put() again. So this is a UAF. We need to fix this. This used to work before because kern_path_locked() return a path with a negative dentry. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-04-14 11:01 ` Christian Brauner @ 2025-04-14 15:54 ` Christian Brauner 0 siblings, 0 replies; 21+ messages in thread From: Christian Brauner @ 2025-04-14 15:54 UTC (permalink / raw) To: NeilBrown Cc: Vlastimil Babka, Alexander Viro, Jan Kara, linux-fsdevel, linux-nfs On Mon, Apr 14, 2025 at 01:01:53PM +0200, Christian Brauner wrote: > > diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c > > index 7f358740e958..367eaf2c78b7 100644 > > --- a/kernel/audit_watch.c > > +++ b/kernel/audit_watch.c > > @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent) > > struct dentry *d = kern_path_locked(watch->path, parent); > > if (IS_ERR(d)) > > return PTR_ERR(d); > > - if (d_is_positive(d)) { > > - /* update watch filter fields */ > > - watch->dev = d->d_sb->s_dev; > > - watch->ino = d_backing_inode(d)->i_ino; > > - } > > + /* update watch filter fields */ > > + watch->dev = d->d_sb->s_dev; > > + watch->ino = d_backing_inode(d)->i_ino; > > + > > inode_unlock(d_backing_inode(parent->dentry)); > > dput(d); > > return 0; > > @@ -419,10 +418,11 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) > > /* caller expects mutex locked */ > > mutex_lock(&audit_filter_mutex); > > > > - if (ret) { > > + if (ret && ret != -ENOENT) { > > audit_put_watch(watch); > > return ret; > > } > > + ret = 0; > > So this is broken. > > If kern_path_locked() fails due to a negative dentry and returns ENOENT > it will have already called path_put() and @parent_path is invalid. > > But right after this audit does: > > > > > /* either find an old parent or attach a new one */ > > parent = audit_find_parent(d_backing_inode(parent_path.dentry)); > > and then later on calls path_put() again. So this is a UAF. We need to > fix this. > > This used to work before because kern_path_locked() return a path with a > negative dentry. *returned the parent path even if the looked up dentry was negative ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() 2025-02-17 0:27 [PATCH 0/2 v2] VFS: minor improvements to a couple of interfaces NeilBrown 2025-02-17 0:27 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown @ 2025-02-17 0:27 ` NeilBrown 2025-02-17 13:46 ` Jeff Layton 1 sibling, 1 reply; 21+ messages in thread From: NeilBrown @ 2025-02-17 0:27 UTC (permalink / raw) To: Christian Brauner, Alexander Viro, Jan Kara; +Cc: linux-fsdevel, linux-nfs Callers of lookup_one_qstr_excl() often check if the result is negative or positive. These changes can easily be moved into lookup_one_qstr_excl() by checking the lookup flags: LOOKUP_CREATE means it is NOT an error if the name doesn't exist. LOOKUP_EXCL means it IS an error if the name DOES exist. This patch adds these checks, then removes error checks from callers, and ensures that appropriate flags are passed. This subtly changes the meaning of LOOKUP_EXCL. Previously it could only accompany LOOKUP_CREATE. Now it can accompany LOOKUP_RENAME_TARGET as well. A couple of small changes are needed to accommodate this. The NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does exactly what the name says. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250207034040.3402438-3-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org> --- Documentation/filesystems/porting.rst | 12 ++++++ fs/namei.c | 61 +++++++++------------------ fs/nfs/dir.c | 3 +- fs/smb/server/vfs.c | 26 +++++------- 4 files changed, 45 insertions(+), 57 deletions(-) diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index 2ead47e20677..3b6622fbd66b 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -1165,3 +1165,15 @@ magic. kern_path_locked() and user_path_locked() no longer return a negative dentry so this doesn't need to be checked. If the name cannot be found, ERR_PTR(-ENOENT) is returned. + +** recommend** + +lookup_one_qstr_excl() is changed to return errors in more cases, so +these conditions don't require explicit checks. + - if LOOKUP_CREATE is NOT given, then the dentry won't be negative, + ERR_PTR(-ENOENT) is returned instead + - if LOOKUP_EXCL IS given, then the dentry won't be positive, + ERR_PTR(-EEXIST) is rreturned instread + +LOOKUP_EXCL now means "target must not exist". It can be combined with +LOOK_CREATE or LOOKUP_RENAME_TARGET. diff --git a/fs/namei.c b/fs/namei.c index fb6da3ca0ca5..b7cdca902803 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1670,6 +1670,8 @@ static struct dentry *lookup_dcache(const struct qstr *name, * dentries - as the matter of fact, this only gets called * when directory is guaranteed to have no in-lookup children * at all. + * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed. + * Will return -EEXIST if name is found and LOOKUP_EXCL was passed. */ struct dentry *lookup_one_qstr_excl(const struct qstr *name, struct dentry *base, @@ -1680,7 +1682,7 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name, struct inode *dir = base->d_inode; if (dentry) - return dentry; + goto found; /* Don't create child dentry for a dead directory. */ if (unlikely(IS_DEADDIR(dir))) @@ -1695,6 +1697,17 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name, dput(dentry); dentry = old; } +found: + if (IS_ERR(dentry)) + return dentry; + if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) { + dput(dentry); + return ERR_PTR(-ENOENT); + } + if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) { + dput(dentry); + return ERR_PTR(-EEXIST); + } return dentry; } EXPORT_SYMBOL(lookup_one_qstr_excl); @@ -2741,10 +2754,6 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct } inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); d = lookup_one_qstr_excl(&last, path->dentry, 0); - if (!IS_ERR(d) && d_is_negative(d)) { - dput(d); - d = ERR_PTR(-ENOENT); - } if (IS_ERR(d)) { inode_unlock(path->dentry->d_inode); path_put(path); @@ -4082,27 +4091,13 @@ static struct dentry *filename_create(int dfd, struct filename *name, * '/', and a directory wasn't requested. */ if (last.name[last.len] && !want_dir) - create_flags = 0; + create_flags &= ~LOOKUP_CREATE; inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); dentry = lookup_one_qstr_excl(&last, path->dentry, reval_flag | create_flags); if (IS_ERR(dentry)) goto unlock; - error = -EEXIST; - if (d_is_positive(dentry)) - goto fail; - - /* - * Special case - lookup gave negative, but... we had foo/bar/ - * From the vfs_mknod() POV we just have a negative dentry - - * all is fine. Let's be bastards - you had / on the end, you've - * been asking for (non-existent) directory. -ENOENT for you. - */ - if (unlikely(!create_flags)) { - error = -ENOENT; - goto fail; - } if (unlikely(err2)) { error = err2; goto fail; @@ -4449,10 +4444,6 @@ int do_rmdir(int dfd, struct filename *name) error = PTR_ERR(dentry); if (IS_ERR(dentry)) goto exit3; - if (!dentry->d_inode) { - error = -ENOENT; - goto exit4; - } error = security_path_rmdir(&path, dentry); if (error) goto exit4; @@ -4583,7 +4574,7 @@ int do_unlinkat(int dfd, struct filename *name) if (!IS_ERR(dentry)) { /* Why not before? Because we want correct error value */ - if (last.name[last.len] || d_is_negative(dentry)) + if (last.name[last.len]) goto slashes; inode = dentry->d_inode; ihold(inode); @@ -4617,9 +4608,7 @@ int do_unlinkat(int dfd, struct filename *name) return error; slashes: - if (d_is_negative(dentry)) - error = -ENOENT; - else if (d_is_dir(dentry)) + if (d_is_dir(dentry)) error = -EISDIR; else error = -ENOTDIR; @@ -5119,7 +5108,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, struct qstr old_last, new_last; int old_type, new_type; struct inode *delegated_inode = NULL; - unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET; + unsigned int lookup_flags = 0, target_flags = + LOOKUP_RENAME_TARGET | LOOKUP_CREATE; bool should_retry = false; int error = -EINVAL; @@ -5132,6 +5122,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, if (flags & RENAME_EXCHANGE) target_flags = 0; + if (flags & RENAME_NOREPLACE) + target_flags |= LOOKUP_EXCL; retry: error = filename_parentat(olddfd, from, lookup_flags, &old_path, @@ -5173,23 +5165,12 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, error = PTR_ERR(old_dentry); if (IS_ERR(old_dentry)) goto exit3; - /* source must exist */ - error = -ENOENT; - if (d_is_negative(old_dentry)) - goto exit4; new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry, lookup_flags | target_flags); error = PTR_ERR(new_dentry); if (IS_ERR(new_dentry)) goto exit4; - error = -EEXIST; - if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) - goto exit5; if (flags & RENAME_EXCHANGE) { - error = -ENOENT; - if (d_is_negative(new_dentry)) - goto exit5; - if (!d_is_dir(new_dentry)) { error = -ENOTDIR; if (new_last.name[new_last.len]) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 2b04038b0e40..56cf16a72334 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1532,7 +1532,8 @@ static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags) { if (NFS_PROTO(dir)->version == 2) return 0; - return flags & LOOKUP_EXCL; + return (flags & (LOOKUP_CREATE | LOOKUP_EXCL)) == + (LOOKUP_CREATE | LOOKUP_EXCL); } /* diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c index 6890016e1923..fe29acef5872 100644 --- a/fs/smb/server/vfs.c +++ b/fs/smb/server/vfs.c @@ -113,11 +113,6 @@ static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf, if (IS_ERR(d)) goto err_out; - if (d_is_negative(d)) { - dput(d); - goto err_out; - } - path->dentry = d; path->mnt = mntget(parent_path->mnt); @@ -693,6 +688,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, struct ksmbd_file *parent_fp; int new_type; int err, lookup_flags = LOOKUP_NO_SYMLINKS; + int target_lookup_flags = LOOKUP_RENAME_TARGET; if (ksmbd_override_fsids(work)) return -ENOMEM; @@ -703,6 +699,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, goto revert_fsids; } + /* + * explicitly handle file overwrite case, for compatibility with + * filesystems that may not support rename flags (e.g: fuse) + */ + if (flags & RENAME_NOREPLACE) + target_lookup_flags |= LOOKUP_EXCL; + flags &= ~(RENAME_NOREPLACE); + retry: err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH, &new_path, &new_last, &new_type, @@ -743,7 +747,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, } new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry, - lookup_flags | LOOKUP_RENAME_TARGET); + lookup_flags | target_lookup_flags); if (IS_ERR(new_dentry)) { err = PTR_ERR(new_dentry); goto out3; @@ -754,16 +758,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, goto out4; } - /* - * explicitly handle file overwrite case, for compatibility with - * filesystems that may not support rename flags (e.g: fuse) - */ - if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) { - err = -EEXIST; - goto out4; - } - flags &= ~(RENAME_NOREPLACE); - if (old_child == trap) { err = -EINVAL; goto out4; -- 2.47.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() 2025-02-17 0:27 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown @ 2025-02-17 13:46 ` Jeff Layton 0 siblings, 0 replies; 21+ messages in thread From: Jeff Layton @ 2025-02-17 13:46 UTC (permalink / raw) To: NeilBrown, Christian Brauner, Alexander Viro, Jan Kara Cc: linux-fsdevel, linux-nfs On Mon, 2025-02-17 at 11:27 +1100, NeilBrown wrote: > Callers of lookup_one_qstr_excl() often check if the result is negative or > positive. > These changes can easily be moved into lookup_one_qstr_excl() by checking the > lookup flags: > LOOKUP_CREATE means it is NOT an error if the name doesn't exist. > LOOKUP_EXCL means it IS an error if the name DOES exist. > > This patch adds these checks, then removes error checks from callers, > and ensures that appropriate flags are passed. > > This subtly changes the meaning of LOOKUP_EXCL. Previously it could > only accompany LOOKUP_CREATE. Now it can accompany LOOKUP_RENAME_TARGET > as well. A couple of small changes are needed to accommodate this. The > NFS change is functionally a no-op but ensures nfs_is_exclusive_create() does > exactly what the name says. > > Signed-off-by: NeilBrown <neilb@suse.de> > Link: https://lore.kernel.org/r/20250207034040.3402438-3-neilb@suse.de > Signed-off-by: Christian Brauner <brauner@kernel.org> > --- > Documentation/filesystems/porting.rst | 12 ++++++ > fs/namei.c | 61 +++++++++------------------ > fs/nfs/dir.c | 3 +- > fs/smb/server/vfs.c | 26 +++++------- > 4 files changed, 45 insertions(+), 57 deletions(-) > > diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst > index 2ead47e20677..3b6622fbd66b 100644 > --- a/Documentation/filesystems/porting.rst > +++ b/Documentation/filesystems/porting.rst > @@ -1165,3 +1165,15 @@ magic. > kern_path_locked() and user_path_locked() no longer return a negative > dentry so this doesn't need to be checked. If the name cannot be found, > ERR_PTR(-ENOENT) is returned. > + > +** recommend** > + > +lookup_one_qstr_excl() is changed to return errors in more cases, so > +these conditions don't require explicit checks. > + - if LOOKUP_CREATE is NOT given, then the dentry won't be negative, > + ERR_PTR(-ENOENT) is returned instead > + - if LOOKUP_EXCL IS given, then the dentry won't be positive, > + ERR_PTR(-EEXIST) is rreturned instread > + > +LOOKUP_EXCL now means "target must not exist". It can be combined with > +LOOK_CREATE or LOOKUP_RENAME_TARGET. > diff --git a/fs/namei.c b/fs/namei.c > index fb6da3ca0ca5..b7cdca902803 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1670,6 +1670,8 @@ static struct dentry *lookup_dcache(const struct qstr *name, > * dentries - as the matter of fact, this only gets called > * when directory is guaranteed to have no in-lookup children > * at all. > + * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed. > + * Will return -EEXIST if name is found and LOOKUP_EXCL was passed. > */ > struct dentry *lookup_one_qstr_excl(const struct qstr *name, > struct dentry *base, > @@ -1680,7 +1682,7 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name, > struct inode *dir = base->d_inode; > > if (dentry) > - return dentry; > + goto found; > > /* Don't create child dentry for a dead directory. */ > if (unlikely(IS_DEADDIR(dir))) > @@ -1695,6 +1697,17 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name, > dput(dentry); > dentry = old; > } > +found: > + if (IS_ERR(dentry)) > + return dentry; > + if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) { > + dput(dentry); > + return ERR_PTR(-ENOENT); > + } > + if (d_is_positive(dentry) && (flags & LOOKUP_EXCL)) { > + dput(dentry); > + return ERR_PTR(-EEXIST); > + } > return dentry; > } > EXPORT_SYMBOL(lookup_one_qstr_excl); > @@ -2741,10 +2754,6 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct > } > inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); > d = lookup_one_qstr_excl(&last, path->dentry, 0); > - if (!IS_ERR(d) && d_is_negative(d)) { > - dput(d); > - d = ERR_PTR(-ENOENT); > - } > if (IS_ERR(d)) { > inode_unlock(path->dentry->d_inode); > path_put(path); > @@ -4082,27 +4091,13 @@ static struct dentry *filename_create(int dfd, struct filename *name, > * '/', and a directory wasn't requested. > */ > if (last.name[last.len] && !want_dir) > - create_flags = 0; > + create_flags &= ~LOOKUP_CREATE; > inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); > dentry = lookup_one_qstr_excl(&last, path->dentry, > reval_flag | create_flags); > if (IS_ERR(dentry)) > goto unlock; > > - error = -EEXIST; > - if (d_is_positive(dentry)) > - goto fail; > - > - /* > - * Special case - lookup gave negative, but... we had foo/bar/ > - * From the vfs_mknod() POV we just have a negative dentry - > - * all is fine. Let's be bastards - you had / on the end, you've > - * been asking for (non-existent) directory. -ENOENT for you. > - */ > - if (unlikely(!create_flags)) { > - error = -ENOENT; > - goto fail; > - } > if (unlikely(err2)) { > error = err2; > goto fail; > @@ -4449,10 +4444,6 @@ int do_rmdir(int dfd, struct filename *name) > error = PTR_ERR(dentry); > if (IS_ERR(dentry)) > goto exit3; > - if (!dentry->d_inode) { > - error = -ENOENT; > - goto exit4; > - } > error = security_path_rmdir(&path, dentry); > if (error) > goto exit4; > @@ -4583,7 +4574,7 @@ int do_unlinkat(int dfd, struct filename *name) > if (!IS_ERR(dentry)) { > > /* Why not before? Because we want correct error value */ > - if (last.name[last.len] || d_is_negative(dentry)) > + if (last.name[last.len]) > goto slashes; > inode = dentry->d_inode; > ihold(inode); > @@ -4617,9 +4608,7 @@ int do_unlinkat(int dfd, struct filename *name) > return error; > > slashes: > - if (d_is_negative(dentry)) > - error = -ENOENT; > - else if (d_is_dir(dentry)) > + if (d_is_dir(dentry)) > error = -EISDIR; > else > error = -ENOTDIR; > @@ -5119,7 +5108,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, > struct qstr old_last, new_last; > int old_type, new_type; > struct inode *delegated_inode = NULL; > - unsigned int lookup_flags = 0, target_flags = LOOKUP_RENAME_TARGET; > + unsigned int lookup_flags = 0, target_flags = > + LOOKUP_RENAME_TARGET | LOOKUP_CREATE; > bool should_retry = false; > int error = -EINVAL; > > @@ -5132,6 +5122,8 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, > > if (flags & RENAME_EXCHANGE) > target_flags = 0; > + if (flags & RENAME_NOREPLACE) > + target_flags |= LOOKUP_EXCL; > > retry: > error = filename_parentat(olddfd, from, lookup_flags, &old_path, > @@ -5173,23 +5165,12 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, > error = PTR_ERR(old_dentry); > if (IS_ERR(old_dentry)) > goto exit3; > - /* source must exist */ > - error = -ENOENT; > - if (d_is_negative(old_dentry)) > - goto exit4; > new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry, > lookup_flags | target_flags); > error = PTR_ERR(new_dentry); > if (IS_ERR(new_dentry)) > goto exit4; > - error = -EEXIST; > - if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) > - goto exit5; > if (flags & RENAME_EXCHANGE) { > - error = -ENOENT; > - if (d_is_negative(new_dentry)) > - goto exit5; > - > if (!d_is_dir(new_dentry)) { > error = -ENOTDIR; > if (new_last.name[new_last.len]) > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index 2b04038b0e40..56cf16a72334 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -1532,7 +1532,8 @@ static int nfs_is_exclusive_create(struct inode *dir, unsigned int flags) > { > if (NFS_PROTO(dir)->version == 2) > return 0; > - return flags & LOOKUP_EXCL; > + return (flags & (LOOKUP_CREATE | LOOKUP_EXCL)) == > + (LOOKUP_CREATE | LOOKUP_EXCL); > } > > /* > diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c > index 6890016e1923..fe29acef5872 100644 > --- a/fs/smb/server/vfs.c > +++ b/fs/smb/server/vfs.c > @@ -113,11 +113,6 @@ static int ksmbd_vfs_path_lookup_locked(struct ksmbd_share_config *share_conf, > if (IS_ERR(d)) > goto err_out; > > - if (d_is_negative(d)) { > - dput(d); > - goto err_out; > - } > - > path->dentry = d; > path->mnt = mntget(parent_path->mnt); > > @@ -693,6 +688,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, > struct ksmbd_file *parent_fp; > int new_type; > int err, lookup_flags = LOOKUP_NO_SYMLINKS; > + int target_lookup_flags = LOOKUP_RENAME_TARGET; > > if (ksmbd_override_fsids(work)) > return -ENOMEM; > @@ -703,6 +699,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, > goto revert_fsids; > } > > + /* > + * explicitly handle file overwrite case, for compatibility with > + * filesystems that may not support rename flags (e.g: fuse) > + */ > + if (flags & RENAME_NOREPLACE) > + target_lookup_flags |= LOOKUP_EXCL; > + flags &= ~(RENAME_NOREPLACE); > + > retry: > err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH, > &new_path, &new_last, &new_type, > @@ -743,7 +747,7 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, > } > > new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry, > - lookup_flags | LOOKUP_RENAME_TARGET); > + lookup_flags | target_lookup_flags); > if (IS_ERR(new_dentry)) { > err = PTR_ERR(new_dentry); > goto out3; > @@ -754,16 +758,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path, > goto out4; > } > > - /* > - * explicitly handle file overwrite case, for compatibility with > - * filesystems that may not support rename flags (e.g: fuse) > - */ > - if ((flags & RENAME_NOREPLACE) && d_is_positive(new_dentry)) { > - err = -EEXIST; > - goto out4; > - } > - flags &= ~(RENAME_NOREPLACE); > - > if (old_child == trap) { > err = -EINVAL; > goto out4; Nice simplification. Reviewed-by: Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 0/2] VFS: minor improvements to a couple of interfaces @ 2025-02-07 3:36 NeilBrown 2025-02-07 3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown 0 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2025-02-07 3:36 UTC (permalink / raw) To: Christian Brauner, Alexander Viro, Jan Kara Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit Hi, I found these opportunities for simplification as part of my work to enhance filesystem directory operations to not require an exclusive lock on the directory. There are quite a collection of users of these interfaces incluing NFS, smb/server, bcachefs, devtmpfs, and audit. Hence the long Cc line. NeilBrown [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 3:36 [PATCH 0/2] VFS: minor improvements to a couple of interfaces NeilBrown @ 2025-02-07 3:36 ` NeilBrown 2025-02-07 3:46 ` Kent Overstreet 2025-02-07 19:09 ` Paul Moore 0 siblings, 2 replies; 21+ messages in thread From: NeilBrown @ 2025-02-07 3:36 UTC (permalink / raw) To: Christian Brauner, Alexander Viro, Jan Kara Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit No callers of kern_path_locked() or user_path_locked_at() want a negative dentry. So change them to return -ENOENT instead. This simplifies callers. This results in a subtle change to bcachefs in that an ioctl will now return -ENOENT in preference to -EXDEV. I believe this restores the behaviour to what it was prior to Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking") Signed-off-by: NeilBrown <neilb@suse.de> --- drivers/base/devtmpfs.c | 65 +++++++++++++++++++---------------------- fs/bcachefs/fs-ioctl.c | 4 --- fs/namei.c | 4 +++ kernel/audit_watch.c | 12 ++++---- 4 files changed, 40 insertions(+), 45 deletions(-) diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c index b848764ef018..c9e34842139f 100644 --- a/drivers/base/devtmpfs.c +++ b/drivers/base/devtmpfs.c @@ -245,15 +245,12 @@ static int dev_rmdir(const char *name) dentry = kern_path_locked(name, &parent); if (IS_ERR(dentry)) return PTR_ERR(dentry); - if (d_really_is_positive(dentry)) { - if (d_inode(dentry)->i_private == &thread) - err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry), - dentry); - else - err = -EPERM; - } else { - err = -ENOENT; - } + if (d_inode(dentry)->i_private == &thread) + err = vfs_rmdir(&nop_mnt_idmap, d_inode(parent.dentry), + dentry); + else + err = -EPERM; + dput(dentry); inode_unlock(d_inode(parent.dentry)); path_put(&parent); @@ -310,6 +307,8 @@ static int handle_remove(const char *nodename, struct device *dev) { struct path parent; struct dentry *dentry; + struct kstat stat; + struct path p; int deleted = 0; int err; @@ -317,32 +316,28 @@ static int handle_remove(const char *nodename, struct device *dev) if (IS_ERR(dentry)) return PTR_ERR(dentry); - if (d_really_is_positive(dentry)) { - struct kstat stat; - struct path p = {.mnt = parent.mnt, .dentry = dentry}; - err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE, - AT_STATX_SYNC_AS_STAT); - if (!err && dev_mynode(dev, d_inode(dentry), &stat)) { - struct iattr newattrs; - /* - * before unlinking this node, reset permissions - * of possible references like hardlinks - */ - newattrs.ia_uid = GLOBAL_ROOT_UID; - newattrs.ia_gid = GLOBAL_ROOT_GID; - newattrs.ia_mode = stat.mode & ~0777; - newattrs.ia_valid = - ATTR_UID|ATTR_GID|ATTR_MODE; - inode_lock(d_inode(dentry)); - notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL); - inode_unlock(d_inode(dentry)); - err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry), - dentry, NULL); - if (!err || err == -ENOENT) - deleted = 1; - } - } else { - err = -ENOENT; + p.mnt = parent.mnt; + p.dentry = dentry; + err = vfs_getattr(&p, &stat, STATX_TYPE | STATX_MODE, + AT_STATX_SYNC_AS_STAT); + if (!err && dev_mynode(dev, d_inode(dentry), &stat)) { + struct iattr newattrs; + /* + * before unlinking this node, reset permissions + * of possible references like hardlinks + */ + newattrs.ia_uid = GLOBAL_ROOT_UID; + newattrs.ia_gid = GLOBAL_ROOT_GID; + newattrs.ia_mode = stat.mode & ~0777; + newattrs.ia_valid = + ATTR_UID|ATTR_GID|ATTR_MODE; + inode_lock(d_inode(dentry)); + notify_change(&nop_mnt_idmap, dentry, &newattrs, NULL); + inode_unlock(d_inode(dentry)); + err = vfs_unlink(&nop_mnt_idmap, d_inode(parent.dentry), + dentry, NULL); + if (!err || err == -ENOENT) + deleted = 1; } dput(dentry); inode_unlock(d_inode(parent.dentry)); diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c index 15725b4ce393..595b57fabc9a 100644 --- a/fs/bcachefs/fs-ioctl.c +++ b/fs/bcachefs/fs-ioctl.c @@ -511,10 +511,6 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp, ret = -EXDEV; goto err; } - if (!d_is_positive(victim)) { - ret = -ENOENT; - goto err; - } ret = __bch2_unlink(dir, victim, true); if (!ret) { fsnotify_rmdir(dir, victim); diff --git a/fs/namei.c b/fs/namei.c index 3a4039acdb3f..e3047db7b2b4 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2736,6 +2736,10 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct } inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); d = lookup_one_qstr_excl(&last, path->dentry, 0); + if (!IS_ERR(d) && d_is_negative(d)) { + dput(d); + d = ERR_PTR(-ENOENT); + } if (IS_ERR(d)) { inode_unlock(path->dentry->d_inode); path_put(path); diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index 7f358740e958..e3130675ee6b 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent) struct dentry *d = kern_path_locked(watch->path, parent); if (IS_ERR(d)) return PTR_ERR(d); - if (d_is_positive(d)) { - /* update watch filter fields */ - watch->dev = d->d_sb->s_dev; - watch->ino = d_backing_inode(d)->i_ino; - } + /* update watch filter fields */ + watch->dev = d->d_sb->s_dev; + watch->ino = d_backing_inode(d)->i_ino; + inode_unlock(d_backing_inode(parent->dentry)); dput(d); return 0; @@ -419,7 +418,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) /* caller expects mutex locked */ mutex_lock(&audit_filter_mutex); - if (ret) { + if (ret && ret != -ENOENT) { audit_put_watch(watch); return ret; } @@ -438,6 +437,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) h = audit_hash_ino((u32)watch->ino); *list = &audit_inode_hash[h]; + ret = 0; error: path_put(&parent_path); audit_put_watch(watch); -- 2.47.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown @ 2025-02-07 3:46 ` Kent Overstreet 2025-02-07 4:53 ` NeilBrown 2025-02-07 19:09 ` Paul Moore 1 sibling, 1 reply; 21+ messages in thread From: Kent Overstreet @ 2025-02-07 3:46 UTC (permalink / raw) To: NeilBrown Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote: > No callers of kern_path_locked() or user_path_locked_at() want a > negative dentry. So change them to return -ENOENT instead. This > simplifies callers. > > This results in a subtle change to bcachefs in that an ioctl will now > return -ENOENT in preference to -EXDEV. I believe this restores the > behaviour to what it was prior to I'm not following how the code change matches the commit message? > Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking") > > Signed-off-by: NeilBrown <neilb@suse.de> > --- > diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c > index 15725b4ce393..595b57fabc9a 100644 > --- a/fs/bcachefs/fs-ioctl.c > +++ b/fs/bcachefs/fs-ioctl.c > @@ -511,10 +511,6 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp, > ret = -EXDEV; > goto err; > } > - if (!d_is_positive(victim)) { > - ret = -ENOENT; > - goto err; > - } > ret = __bch2_unlink(dir, victim, true); > if (!ret) { > fsnotify_rmdir(dir, victim); ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 3:46 ` Kent Overstreet @ 2025-02-07 4:53 ` NeilBrown 2025-02-07 6:13 ` Kent Overstreet 2025-02-07 6:53 ` NeilBrown 0 siblings, 2 replies; 21+ messages in thread From: NeilBrown @ 2025-02-07 4:53 UTC (permalink / raw) To: Kent Overstreet Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, 07 Feb 2025, Kent Overstreet wrote: > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote: > > No callers of kern_path_locked() or user_path_locked_at() want a > > negative dentry. So change them to return -ENOENT instead. This > > simplifies callers. > > > > This results in a subtle change to bcachefs in that an ioctl will now > > return -ENOENT in preference to -EXDEV. I believe this restores the > > behaviour to what it was prior to > > I'm not following how the code change matches the commit message? Maybe it doesn't. Let me checked. Two of the possible error returns from bch2_ioctl_subvolume_destroy(), which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and -EXDEV. -ENOENT is returned if the path named in arg.dst_ptr cannot be found. -EXDEV is returned if the filesystem on which that path exists is not the one that the ioctl is called on. If the target filesystem is "/foo" and the path given is "/bar/baz" and /bar exists but /bar/baz does not, then user_path_locked_at or user_path_at will return a negative dentry corresponding to the (non-existent) name "baz" in /bar. In this case the dentry exists so the filesystem on which it was found can be tested, but the dentry is negative. So both -ENOENT and -EXDEV are credible return values. - before bbe6a7c899e7 the -EXDEV is tested immediately after the call to user_path_att() so there is no chance that ENOENT will be returned. I cannot actually find where ENOENT could be returned ... but that doesn't really matter now. - after that patch .... again the -EXDEV test comes first. That isn't what I remember. I must have misread it. - after my patch user_path_locked_at() will return -ENOENT if the whole name cannot be found. So now you get -ENOENT instead of -EXDEV. So with my patch, ENOENT always wins, and it was never like that before. Thanks for challenging me! Do you think there could be a problem with changing the error returned in this circumstance? i.e. if you try to destroy a subvolume with a non-existant name on a different filesystem could getting -ENOENT instead of -EXDEV be noticed? Thanks, NeilBrown ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 4:53 ` NeilBrown @ 2025-02-07 6:13 ` Kent Overstreet 2025-02-07 6:34 ` NeilBrown 2025-02-07 6:53 ` NeilBrown 1 sibling, 1 reply; 21+ messages in thread From: Kent Overstreet @ 2025-02-07 6:13 UTC (permalink / raw) To: NeilBrown Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote: > > > No callers of kern_path_locked() or user_path_locked_at() want a > > > negative dentry. So change them to return -ENOENT instead. This > > > simplifies callers. > > > > > > This results in a subtle change to bcachefs in that an ioctl will now > > > return -ENOENT in preference to -EXDEV. I believe this restores the > > > behaviour to what it was prior to > > > > I'm not following how the code change matches the commit message? > > Maybe it doesn't. Let me checked. > > Two of the possible error returns from bch2_ioctl_subvolume_destroy(), > which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and > -EXDEV. > > -ENOENT is returned if the path named in arg.dst_ptr cannot be found. > -EXDEV is returned if the filesystem on which that path exists is not > the one that the ioctl is called on. > > If the target filesystem is "/foo" and the path given is "/bar/baz" and > /bar exists but /bar/baz does not, then user_path_locked_at or > user_path_at will return a negative dentry corresponding to the > (non-existent) name "baz" in /bar. > > In this case the dentry exists so the filesystem on which it was found > can be tested, but the dentry is negative. So both -ENOENT and -EXDEV > are credible return values. > > > - before bbe6a7c899e7 the -EXDEV is tested immediately after the call > to user_path_att() so there is no chance that ENOENT will be returned. > I cannot actually find where ENOENT could be returned ... but that > doesn't really matter now. > > - after that patch .... again the -EXDEV test comes first. That isn't > what I remember. I must have misread it. > > - after my patch user_path_locked_at() will return -ENOENT if the whole > name cannot be found. So now you get -ENOENT instead of -EXDEV. > > So with my patch, ENOENT always wins, and it was never like that before. > Thanks for challenging me! How do you always manage to be unfailingly polite? :) > > Do you think there could be a problem with changing the error returned > in this circumstance? i.e. if you try to destroy a subvolume with a > non-existant name on a different filesystem could getting -ENOENT > instead of -EXDEV be noticed? -EXDEV is the standard error code for "we're crossing a filesystem boundary and we can't or aren't supposed to be", so no, let's not change that. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 6:13 ` Kent Overstreet @ 2025-02-07 6:34 ` NeilBrown 2025-02-07 6:51 ` Kent Overstreet 0 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2025-02-07 6:34 UTC (permalink / raw) To: Kent Overstreet Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, 07 Feb 2025, Kent Overstreet wrote: > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote: > > > > No callers of kern_path_locked() or user_path_locked_at() want a > > > > negative dentry. So change them to return -ENOENT instead. This > > > > simplifies callers. > > > > > > > > This results in a subtle change to bcachefs in that an ioctl will now > > > > return -ENOENT in preference to -EXDEV. I believe this restores the > > > > behaviour to what it was prior to > > > > > > I'm not following how the code change matches the commit message? > > > > Maybe it doesn't. Let me checked. > > > > Two of the possible error returns from bch2_ioctl_subvolume_destroy(), > > which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and > > -EXDEV. > > > > -ENOENT is returned if the path named in arg.dst_ptr cannot be found. > > -EXDEV is returned if the filesystem on which that path exists is not > > the one that the ioctl is called on. > > > > If the target filesystem is "/foo" and the path given is "/bar/baz" and > > /bar exists but /bar/baz does not, then user_path_locked_at or > > user_path_at will return a negative dentry corresponding to the > > (non-existent) name "baz" in /bar. > > > > In this case the dentry exists so the filesystem on which it was found > > can be tested, but the dentry is negative. So both -ENOENT and -EXDEV > > are credible return values. > > > > > > - before bbe6a7c899e7 the -EXDEV is tested immediately after the call > > to user_path_att() so there is no chance that ENOENT will be returned. > > I cannot actually find where ENOENT could be returned ... but that > > doesn't really matter now. > > > > - after that patch .... again the -EXDEV test comes first. That isn't > > what I remember. I must have misread it. > > > > - after my patch user_path_locked_at() will return -ENOENT if the whole > > name cannot be found. So now you get -ENOENT instead of -EXDEV. > > > > So with my patch, ENOENT always wins, and it was never like that before. > > Thanks for challenging me! > > How do you always manage to be unfailingly polite? :) My dad impressed the value of politeness on me at an early age. It is an effective way of manipulating people! > > > > > Do you think there could be a problem with changing the error returned > > in this circumstance? i.e. if you try to destroy a subvolume with a > > non-existant name on a different filesystem could getting -ENOENT > > instead of -EXDEV be noticed? > > -EXDEV is the standard error code for "we're crossing a filesystem > boundary and we can't or aren't supposed to be", so no, let's not change > that. > OK. As bcachefs is the only user of user_path_locked_at() it shouldn't be too hard. Thanks, NeilBrown ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 6:34 ` NeilBrown @ 2025-02-07 6:51 ` Kent Overstreet 2025-02-07 7:30 ` NeilBrown 0 siblings, 1 reply; 21+ messages in thread From: Kent Overstreet @ 2025-02-07 6:51 UTC (permalink / raw) To: NeilBrown Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote: > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > > > Do you think there could be a problem with changing the error returned > > > in this circumstance? i.e. if you try to destroy a subvolume with a > > > non-existant name on a different filesystem could getting -ENOENT > > > instead of -EXDEV be noticed? > > > > -EXDEV is the standard error code for "we're crossing a filesystem > > boundary and we can't or aren't supposed to be", so no, let's not change > > that. > > > > OK. As bcachefs is the only user of user_path_locked_at() it shouldn't > be too hard. Hang on, why does that require keeping user_path_locked_at()? Just compare i_sb... ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 6:51 ` Kent Overstreet @ 2025-02-07 7:30 ` NeilBrown 2025-02-07 13:35 ` Kent Overstreet 0 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2025-02-07 7:30 UTC (permalink / raw) To: Kent Overstreet Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, 07 Feb 2025, Kent Overstreet wrote: > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote: > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > > > > Do you think there could be a problem with changing the error returned > > > > in this circumstance? i.e. if you try to destroy a subvolume with a > > > > non-existant name on a different filesystem could getting -ENOENT > > > > instead of -EXDEV be noticed? > > > > > > -EXDEV is the standard error code for "we're crossing a filesystem > > > boundary and we can't or aren't supposed to be", so no, let's not change > > > that. > > > > > > > OK. As bcachefs is the only user of user_path_locked_at() it shouldn't > > be too hard. > > Hang on, why does that require keeping user_path_locked_at()? Just > compare i_sb... > I changed user_path_locked_at() to not return a dentry at all when the full path couldn't be found. If there is no dentry, then there is no ->d_sb. (if there was an ->i_sb, there would be an inode and this all wouldn't be an issue). To recap: the difference happens if the path DOESN'T exist but the parent DOES exist on a DIFFERENT filesystem. It is very much a corner case and the error code shouldn't matter. But I had to ask... NeilBrown ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 7:30 ` NeilBrown @ 2025-02-07 13:35 ` Kent Overstreet 2025-02-10 1:20 ` NeilBrown 0 siblings, 1 reply; 21+ messages in thread From: Kent Overstreet @ 2025-02-07 13:35 UTC (permalink / raw) To: NeilBrown Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote: > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote: > > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > > > > > Do you think there could be a problem with changing the error returned > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a > > > > > non-existant name on a different filesystem could getting -ENOENT > > > > > instead of -EXDEV be noticed? > > > > > > > > -EXDEV is the standard error code for "we're crossing a filesystem > > > > boundary and we can't or aren't supposed to be", so no, let's not change > > > > that. > > > > > > > > > > OK. As bcachefs is the only user of user_path_locked_at() it shouldn't > > > be too hard. > > > > Hang on, why does that require keeping user_path_locked_at()? Just > > compare i_sb... > > > > I changed user_path_locked_at() to not return a dentry at all when the > full path couldn't be found. If there is no dentry, then there is no > ->d_sb. > (if there was an ->i_sb, there would be an inode and this all wouldn't > be an issue). > > To recap: the difference happens if the path DOESN'T exist but the > parent DOES exist on a DIFFERENT filesystem. It is very much a corner > case and the error code shouldn't matter. But I had to ask... Ahh... Well, if I've scanned the series correctly (sorry, we're on different timezones and I haven't had much caffeine yet) I hope you don't have to keep that function just for bcachefs - but I do think the error code is important. Userspace getting -ENOENT and reporting -ENOENT to the user will inevitably lead to head banging frustration by someone, somewhere, when they're trying to delete something and the system is tell them it doesn't exist when they can see it very much does exist, right there :) the more precise error code is a very helpful cue... ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 13:35 ` Kent Overstreet @ 2025-02-10 1:20 ` NeilBrown 2025-02-10 16:33 ` Kent Overstreet 0 siblings, 1 reply; 21+ messages in thread From: NeilBrown @ 2025-02-10 1:20 UTC (permalink / raw) To: Kent Overstreet Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Sat, 08 Feb 2025, Kent Overstreet wrote: > On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote: > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote: > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > > > > > > Do you think there could be a problem with changing the error returned > > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a > > > > > > non-existant name on a different filesystem could getting -ENOENT > > > > > > instead of -EXDEV be noticed? > > > > > > > > > > -EXDEV is the standard error code for "we're crossing a filesystem > > > > > boundary and we can't or aren't supposed to be", so no, let's not change > > > > > that. > > > > > > > > > > > > > OK. As bcachefs is the only user of user_path_locked_at() it shouldn't > > > > be too hard. > > > > > > Hang on, why does that require keeping user_path_locked_at()? Just > > > compare i_sb... > > > > > > > I changed user_path_locked_at() to not return a dentry at all when the > > full path couldn't be found. If there is no dentry, then there is no > > ->d_sb. > > (if there was an ->i_sb, there would be an inode and this all wouldn't > > be an issue). > > > > To recap: the difference happens if the path DOESN'T exist but the > > parent DOES exist on a DIFFERENT filesystem. It is very much a corner > > case and the error code shouldn't matter. But I had to ask... > > Ahh... > > Well, if I've scanned the series correctly (sorry, we're on different > timezones and I haven't had much caffeine yet) I hope you don't have to > keep that function just for bcachefs - but I do think the error code is > important. > > Userspace getting -ENOENT and reporting -ENOENT to the user will > inevitably lead to head banging frustration by someone, somewhere, when > they're trying to delete something and the system is tell them it > doesn't exist when they can see it very much does exist, right there :) > the more precise error code is a very helpful cue... > ??? You will only get -ENOENT if there is no ent. There is no question of a confusing error message. If you ask for a non-exist name on the correct filesystem, you get -ENOENT If you ask for an existing name of the wrong filesystem, you get -EXDEV That all works as expected and always has. But what if you ask for a non-existing name in a directory on the wrong filesystem? The code you originally wrote in 42d237320e9817a9 would return -ENOENT because that it what user_path_at() would return. But using user_path_at() is "wrong" because it doesn't lock the directory so ->d_parent is not guaranteed to be stable. Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but that doesn't check for a negative dentry so Al added a check to return -ENOENT, but that was added *after* the test that returns -EXDEV. So now if you call subvolume_destroy on a non-existing name in a directory on the wrong filesystem, you get -EXDEV. I think that is a bit weird but not a lot weird. My patch will change it back to -ENOENT - the way you originally wrote it. I hope you are ok with that. NeilBrown ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-10 1:20 ` NeilBrown @ 2025-02-10 16:33 ` Kent Overstreet 2025-02-12 3:24 ` NeilBrown 0 siblings, 1 reply; 21+ messages in thread From: Kent Overstreet @ 2025-02-10 16:33 UTC (permalink / raw) To: NeilBrown Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Mon, Feb 10, 2025 at 12:20:15PM +1100, NeilBrown wrote: > On Sat, 08 Feb 2025, Kent Overstreet wrote: > > On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote: > > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote: > > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > > > > > > > Do you think there could be a problem with changing the error returned > > > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a > > > > > > > non-existant name on a different filesystem could getting -ENOENT > > > > > > > instead of -EXDEV be noticed? > > > > > > > > > > > > -EXDEV is the standard error code for "we're crossing a filesystem > > > > > > boundary and we can't or aren't supposed to be", so no, let's not change > > > > > > that. > > > > > > > > > > > > > > > > OK. As bcachefs is the only user of user_path_locked_at() it shouldn't > > > > > be too hard. > > > > > > > > Hang on, why does that require keeping user_path_locked_at()? Just > > > > compare i_sb... > > > > > > > > > > I changed user_path_locked_at() to not return a dentry at all when the > > > full path couldn't be found. If there is no dentry, then there is no > > > ->d_sb. > > > (if there was an ->i_sb, there would be an inode and this all wouldn't > > > be an issue). > > > > > > To recap: the difference happens if the path DOESN'T exist but the > > > parent DOES exist on a DIFFERENT filesystem. It is very much a corner > > > case and the error code shouldn't matter. But I had to ask... > > > > Ahh... > > > > Well, if I've scanned the series correctly (sorry, we're on different > > timezones and I haven't had much caffeine yet) I hope you don't have to > > keep that function just for bcachefs - but I do think the error code is > > important. > > > > Userspace getting -ENOENT and reporting -ENOENT to the user will > > inevitably lead to head banging frustration by someone, somewhere, when > > they're trying to delete something and the system is tell them it > > doesn't exist when they can see it very much does exist, right there :) > > the more precise error code is a very helpful cue... > > > > ??? > You will only get -ENOENT if there is no ent. There is no question of a > confusing error message. > If you ask for a non-exist name on the correct filesystem, you get -ENOENT > If you ask for an existing name of the wrong filesystem, you get -EXDEV > That all works as expected and always has. > > But what if you ask for a non-existing name in a directory on the > wrong filesystem? > The code you originally wrote in 42d237320e9817a9 would return > -ENOENT because that it what user_path_at() would return. Ahh - ok, I think I see where I misread before > But using user_path_at() is "wrong" because it doesn't lock the directory > so ->d_parent is not guaranteed to be stable. > Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but > that doesn't check for a negative dentry so Al added a check to return > -ENOENT, but that was added *after* the test that returns -EXDEV. > > So now if you call subvolume_destroy on a non-existing name in a > directory on the wrong filesystem, you get -EXDEV. I think that is > a bit weird but not a lot weird. Yeah, we don't need to preserve that. As long as calling it on a name that _does_ exist on a different filesystem returns -EXDEV, that's all I care about. So assuming that's the case you can go ahead and add my acked-by... Nit: I would go back and stare at the patch some more, but threading got completely fubar so I can't find anything. Doh. > My patch will change it back to -ENOENT - the way you originally wrote > it. > > I hope you are ok with that. Yes, sounds good. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-10 16:33 ` Kent Overstreet @ 2025-02-12 3:24 ` NeilBrown 0 siblings, 0 replies; 21+ messages in thread From: NeilBrown @ 2025-02-12 3:24 UTC (permalink / raw) To: Kent Overstreet Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Tue, 11 Feb 2025, Kent Overstreet wrote: > On Mon, Feb 10, 2025 at 12:20:15PM +1100, NeilBrown wrote: > > On Sat, 08 Feb 2025, Kent Overstreet wrote: > > > On Fri, Feb 07, 2025 at 06:30:00PM +1100, NeilBrown wrote: > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > > > On Fri, Feb 07, 2025 at 05:34:23PM +1100, NeilBrown wrote: > > > > > > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > > > > > > On Fri, Feb 07, 2025 at 03:53:52PM +1100, NeilBrown wrote: > > > > > > > > Do you think there could be a problem with changing the error returned > > > > > > > > in this circumstance? i.e. if you try to destroy a subvolume with a > > > > > > > > non-existant name on a different filesystem could getting -ENOENT > > > > > > > > instead of -EXDEV be noticed? > > > > > > > > > > > > > > -EXDEV is the standard error code for "we're crossing a filesystem > > > > > > > boundary and we can't or aren't supposed to be", so no, let's not change > > > > > > > that. > > > > > > > > > > > > > > > > > > > OK. As bcachefs is the only user of user_path_locked_at() it shouldn't > > > > > > be too hard. > > > > > > > > > > Hang on, why does that require keeping user_path_locked_at()? Just > > > > > compare i_sb... > > > > > > > > > > > > > I changed user_path_locked_at() to not return a dentry at all when the > > > > full path couldn't be found. If there is no dentry, then there is no > > > > ->d_sb. > > > > (if there was an ->i_sb, there would be an inode and this all wouldn't > > > > be an issue). > > > > > > > > To recap: the difference happens if the path DOESN'T exist but the > > > > parent DOES exist on a DIFFERENT filesystem. It is very much a corner > > > > case and the error code shouldn't matter. But I had to ask... > > > > > > Ahh... > > > > > > Well, if I've scanned the series correctly (sorry, we're on different > > > timezones and I haven't had much caffeine yet) I hope you don't have to > > > keep that function just for bcachefs - but I do think the error code is > > > important. > > > > > > Userspace getting -ENOENT and reporting -ENOENT to the user will > > > inevitably lead to head banging frustration by someone, somewhere, when > > > they're trying to delete something and the system is tell them it > > > doesn't exist when they can see it very much does exist, right there :) > > > the more precise error code is a very helpful cue... > > > > > > > ??? > > You will only get -ENOENT if there is no ent. There is no question of a > > confusing error message. > > If you ask for a non-exist name on the correct filesystem, you get -ENOENT > > If you ask for an existing name of the wrong filesystem, you get -EXDEV > > That all works as expected and always has. > > > > But what if you ask for a non-existing name in a directory on the > > wrong filesystem? > > The code you originally wrote in 42d237320e9817a9 would return > > -ENOENT because that it what user_path_at() would return. > > Ahh - ok, I think I see where I misread before > > > But using user_path_at() is "wrong" because it doesn't lock the directory > > so ->d_parent is not guaranteed to be stable. > > Al fixed that in bbe6a7c899e7f265c using user_path_locked_at(), but > > that doesn't check for a negative dentry so Al added a check to return > > -ENOENT, but that was added *after* the test that returns -EXDEV. > > > > So now if you call subvolume_destroy on a non-existing name in a > > directory on the wrong filesystem, you get -EXDEV. I think that is > > a bit weird but not a lot weird. > > Yeah, we don't need to preserve that. As long as calling it on a name > that _does_ exist on a different filesystem returns -EXDEV, that's all I > care about. > > So assuming that's the case you can go ahead and add my acked-by... Cool - thanks. > > Nit: I would go back and stare at the patch some more, but threading got > completely fubar so I can't find anything. Doh. > :-) NeilBrown > > My patch will change it back to -ENOENT - the way you originally wrote > > it. > > > > I hope you are ok with that. > > Yes, sounds good. > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 4:53 ` NeilBrown 2025-02-07 6:13 ` Kent Overstreet @ 2025-02-07 6:53 ` NeilBrown 1 sibling, 0 replies; 21+ messages in thread From: NeilBrown @ 2025-02-07 6:53 UTC (permalink / raw) To: Kent Overstreet Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Paul Moore, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Fri, 07 Feb 2025, NeilBrown wrote: > On Fri, 07 Feb 2025, Kent Overstreet wrote: > > On Fri, Feb 07, 2025 at 02:36:47PM +1100, NeilBrown wrote: > > > No callers of kern_path_locked() or user_path_locked_at() want a > > > negative dentry. So change them to return -ENOENT instead. This > > > simplifies callers. > > > > > > This results in a subtle change to bcachefs in that an ioctl will now > > > return -ENOENT in preference to -EXDEV. I believe this restores the > > > behaviour to what it was prior to > > > > I'm not following how the code change matches the commit message? > > Maybe it doesn't. Let me checked. > > Two of the possible error returns from bch2_ioctl_subvolume_destroy(), > which implements the BCH_IOCTL_SUBVOLUME_DESTROY ioctl, are -ENOENT and > -EXDEV. > > -ENOENT is returned if the path named in arg.dst_ptr cannot be found. > -EXDEV is returned if the filesystem on which that path exists is not > the one that the ioctl is called on. > > If the target filesystem is "/foo" and the path given is "/bar/baz" and > /bar exists but /bar/baz does not, then user_path_locked_at or > user_path_at will return a negative dentry corresponding to the > (non-existent) name "baz" in /bar. > > In this case the dentry exists so the filesystem on which it was found > can be tested, but the dentry is negative. So both -ENOENT and -EXDEV > are credible return values. > > > - before bbe6a7c899e7 the -EXDEV is tested immediately after the call > to user_path_at() so there is no chance that ENOENT will be returned. > I cannot actually find where ENOENT could be returned ... but that > doesn't really matter now. No, that's not right. user_path_at() never returns a negative dentry. It isn't documented and I tried reading the code instead of looking at callers. So before that commit it DID return ENOENT rather than -EXDEV where as after that commit it returns -EXDEV rather than -ENOENT. So my patch IS restoring the old behaviour. Are you *sure* you want -EXDEV when given a non-existent path? Thanks, NeilBrown > > - after that patch .... again the -EXDEV test comes first. That isn't > what I remember. I must have misread it. > > - after my patch user_path_locked_at() will return -ENOENT if the whole > name cannot be found. So now you get -ENOENT instead of -EXDEV. > > So with my patch, ENOENT always wins, and it was never like that before. > Thanks for challenging me! > > Do you think there could be a problem with changing the error returned > in this circumstance? i.e. if you try to destroy a subvolume with a > non-existant name on a different filesystem could getting -ENOENT > instead of -EXDEV be noticed? > > Thanks, > NeilBrown > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry 2025-02-07 3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown 2025-02-07 3:46 ` Kent Overstreet @ 2025-02-07 19:09 ` Paul Moore 1 sibling, 0 replies; 21+ messages in thread From: Paul Moore @ 2025-02-07 19:09 UTC (permalink / raw) To: NeilBrown Cc: Christian Brauner, Alexander Viro, Jan Kara, Greg Kroah-Hartman, Rafael J. Wysocki, Danilo Krummrich, Kent Overstreet, Trond Myklebust, Anna Schumaker, Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey, Eric Paris, linux-kernel, linux-bcachefs, linux-fsdevel, linux-nfs, linux-cifs, audit On Thu, Feb 6, 2025 at 10:41 PM NeilBrown <neilb@suse.de> wrote: > > No callers of kern_path_locked() or user_path_locked_at() want a > negative dentry. So change them to return -ENOENT instead. This > simplifies callers. > > This results in a subtle change to bcachefs in that an ioctl will now > return -ENOENT in preference to -EXDEV. I believe this restores the > behaviour to what it was prior to > Commit bbe6a7c899e7 ("bch2_ioctl_subvolume_destroy(): fix locking") > > Signed-off-by: NeilBrown <neilb@suse.de> > --- > drivers/base/devtmpfs.c | 65 +++++++++++++++++++---------------------- > fs/bcachefs/fs-ioctl.c | 4 --- > fs/namei.c | 4 +++ > kernel/audit_watch.c | 12 ++++---- > 4 files changed, 40 insertions(+), 45 deletions(-) ... > diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c > index 7f358740e958..e3130675ee6b 100644 > --- a/kernel/audit_watch.c > +++ b/kernel/audit_watch.c > @@ -350,11 +350,10 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent) > struct dentry *d = kern_path_locked(watch->path, parent); > if (IS_ERR(d)) > return PTR_ERR(d); > - if (d_is_positive(d)) { > - /* update watch filter fields */ > - watch->dev = d->d_sb->s_dev; > - watch->ino = d_backing_inode(d)->i_ino; > - } > + /* update watch filter fields */ > + watch->dev = d->d_sb->s_dev; > + watch->ino = d_backing_inode(d)->i_ino; > + > inode_unlock(d_backing_inode(parent->dentry)); > dput(d); > return 0; > @@ -419,7 +418,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) > /* caller expects mutex locked */ > mutex_lock(&audit_filter_mutex); > > - if (ret) { > + if (ret && ret != -ENOENT) { > audit_put_watch(watch); > return ret; > } > @@ -438,6 +437,7 @@ int audit_add_watch(struct audit_krule *krule, struct list_head **list) > > h = audit_hash_ino((u32)watch->ino); > *list = &audit_inode_hash[h]; > + ret = 0; If you have to respin this patch for any reason I'd prefer to move the 'ret = 0' up to just after the if block you're modifying in the chunk above, but that's a trivial nitpick so please don't respin only for that. Otherwise it looks fine to me from an audit perspective. Acked-by: Paul Moore <paul@paul-moore.com> (Audit) > error: > path_put(&parent_path); > audit_put_watch(watch); > -- > 2.47.1 -- paul-moore.com ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-04-14 15:54 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-17 0:27 [PATCH 0/2 v2] VFS: minor improvements to a couple of interfaces NeilBrown 2025-02-17 0:27 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown 2025-02-17 8:26 ` Christian Brauner 2025-02-17 13:41 ` Jeff Layton 2025-04-14 11:01 ` Christian Brauner 2025-04-14 15:54 ` Christian Brauner 2025-02-17 0:27 ` [PATCH 2/2] VFS: add common error checks to lookup_one_qstr_excl() NeilBrown 2025-02-17 13:46 ` Jeff Layton -- strict thread matches above, loose matches on Subject: below -- 2025-02-07 3:36 [PATCH 0/2] VFS: minor improvements to a couple of interfaces NeilBrown 2025-02-07 3:36 ` [PATCH 1/2] VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry NeilBrown 2025-02-07 3:46 ` Kent Overstreet 2025-02-07 4:53 ` NeilBrown 2025-02-07 6:13 ` Kent Overstreet 2025-02-07 6:34 ` NeilBrown 2025-02-07 6:51 ` Kent Overstreet 2025-02-07 7:30 ` NeilBrown 2025-02-07 13:35 ` Kent Overstreet 2025-02-10 1:20 ` NeilBrown 2025-02-10 16:33 ` Kent Overstreet 2025-02-12 3:24 ` NeilBrown 2025-02-07 6:53 ` NeilBrown 2025-02-07 19:09 ` Paul Moore
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).