* [PATCH 0/6] VFS: more prep for change to directory locking
@ 2025-09-06 4:57 NeilBrown
2025-09-06 4:57 ` [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self" NeilBrown
` (5 more replies)
0 siblings, 6 replies; 22+ messages in thread
From: NeilBrown @ 2025-09-06 4:57 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein; +Cc: Jan Kara, linux-fsdevel
This is a selection of cleanups, renaming of some APIs, and the
addition of one simple API (patch 2) which follows an existing
pattern. It seemed useful to separate this set from others which
will be primarily focussed on adding and using some new APIs.
Thanks,
NeilBrown
[PATCH 1/6] fs/proc: Don't look root inode when creating "self" and
[PATCH 2/6] VFS/ovl: add lookup_one_positive_killable()
[PATCH 3/6] VFS: discard err2 in filename_create()
[PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
[PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit
[PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing()
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self"
2025-09-06 4:57 [PATCH 0/6] VFS: more prep for change to directory locking NeilBrown
@ 2025-09-06 4:57 ` NeilBrown
2025-09-06 5:50 ` Al Viro
2025-09-06 4:57 ` [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable() NeilBrown
` (4 subsequent siblings)
5 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-06 4:57 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein; +Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
proc_setup_self() and proc_setup_thread_self() are only called from
proc_fill_super() which is before the filesystem is "live". So there is
no need to lock the root directory when adding "self" and "thread-self".
The locking rules are expected to change, so this locking will become
anachronistic if we don't remove it.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/proc/self.c | 3 ---
fs/proc/thread_self.c | 3 ---
2 files changed, 6 deletions(-)
diff --git a/fs/proc/self.c b/fs/proc/self.c
index b46fbfd22681..a2867b009269 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -35,12 +35,10 @@ static unsigned self_inum __ro_after_init;
int proc_setup_self(struct super_block *s)
{
- struct inode *root_inode = d_inode(s->s_root);
struct proc_fs_info *fs_info = proc_sb_info(s);
struct dentry *self;
int ret = -ENOMEM;
- inode_lock(root_inode);
self = d_alloc_name(s->s_root, "self");
if (self) {
struct inode *inode = new_inode(s);
@@ -57,7 +55,6 @@ int proc_setup_self(struct super_block *s)
dput(self);
}
}
- inode_unlock(root_inode);
if (ret)
pr_err("proc_fill_super: can't allocate /proc/self\n");
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 0e5050d6ab64..9c4ff2131046 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -35,12 +35,10 @@ static unsigned thread_self_inum __ro_after_init;
int proc_setup_thread_self(struct super_block *s)
{
- struct inode *root_inode = d_inode(s->s_root);
struct proc_fs_info *fs_info = proc_sb_info(s);
struct dentry *thread_self;
int ret = -ENOMEM;
- inode_lock(root_inode);
thread_self = d_alloc_name(s->s_root, "thread-self");
if (thread_self) {
struct inode *inode = new_inode(s);
@@ -57,7 +55,6 @@ int proc_setup_thread_self(struct super_block *s)
dput(thread_self);
}
}
- inode_unlock(root_inode);
if (ret)
pr_err("proc_fill_super: can't allocate /proc/thread-self\n");
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable()
2025-09-06 4:57 [PATCH 0/6] VFS: more prep for change to directory locking NeilBrown
2025-09-06 4:57 ` [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self" NeilBrown
@ 2025-09-06 4:57 ` NeilBrown
2025-09-06 9:44 ` Amir Goldstein
2025-09-06 4:57 ` [PATCH 3/6] VFS: discard err2 in filename_create() NeilBrown
` (3 subsequent siblings)
5 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-06 4:57 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein; +Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
ovl wants a lookup which won't block on a fatal signal.
It currently uses down_write_killable() and then repeated
calls to lookup_one()
The lock may not be needed if the name is already in the dcache and it
aid proposed future changes if the locking is kept internal to namei.c
So this patch adds lookup_one_positive_killable() which is like
lookup_one_positive() but will abort in the face of a fatal signal.
overlayfs is changed to use this.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 54 ++++++++++++++++++++++++++++++++++++++++++
fs/overlayfs/readdir.c | 28 +++++++++++-----------
include/linux/namei.h | 3 +++
3 files changed, 71 insertions(+), 14 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index cd43ff89fbaa..b1bc298b9d7c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1827,6 +1827,19 @@ static struct dentry *lookup_slow(const struct qstr *name,
return res;
}
+static struct dentry *lookup_slow_killable(const struct qstr *name,
+ struct dentry *dir,
+ unsigned int flags)
+{
+ struct inode *inode = dir->d_inode;
+ struct dentry *res;
+ if (inode_lock_shared_killable(inode))
+ return ERR_PTR(-EINTR);
+ res = __lookup_slow(name, dir, flags);
+ inode_unlock_shared(inode);
+ return res;
+}
+
static inline int may_lookup(struct mnt_idmap *idmap,
struct nameidata *restrict nd)
{
@@ -3010,6 +3023,47 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap, struct qstr *name,
}
EXPORT_SYMBOL(lookup_one_unlocked);
+/**
+ * lookup_one_positive_killable - lookup single pathname component
+ * @idmap: idmap of the mount the lookup is performed from
+ * @name: qstr olding pathname component to lookup
+ * @base: base directory to lookup from
+ *
+ * This helper will yield ERR_PTR(-ENOENT) on negatives. The helper returns
+ * known positive or ERR_PTR(). This is what most of the users want.
+ *
+ * Note that pinned negative with unlocked parent _can_ become positive at any
+ * time, so callers of lookup_one_unlocked() need to be very careful; pinned
+ * positives have >d_inode stable, so this one avoids such problems.
+ *
+ * This can be used for in-kernel filesystem clients such as file servers.
+ *
+ * Ut should be called without the parent i_rwsem held, and will take
+ * the i_rwsem itself if necessary. If a fatal signal is pending or
+ * delivered, it will return %-EINTR.
+ */
+struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
+ struct qstr *name,
+ struct dentry *base)
+{
+ int err;
+ struct dentry *ret;
+
+ err = lookup_one_common(idmap, name, base);
+ if (err)
+ return ERR_PTR(err);
+
+ ret = lookup_dcache(name, base, 0);
+ if (!ret)
+ ret = lookup_slow_killable(name, base, 0);
+ if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
+ dput(ret);
+ ret = ERR_PTR(-ENOENT);
+ }
+ return ret;
+}
+EXPORT_SYMBOL(lookup_one_positive_killable);
+
/**
* lookup_one_positive_unlocked - lookup single pathname component
* @idmap: idmap of the mount the lookup is performed from
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index b65cdfce31ce..15cb06fa0c9a 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -270,26 +270,26 @@ static bool ovl_fill_merge(struct dir_context *ctx, const char *name,
static int ovl_check_whiteouts(const struct path *path, struct ovl_readdir_data *rdd)
{
- int err;
+ int err = 0;
struct dentry *dentry, *dir = path->dentry;
const struct cred *old_cred;
old_cred = ovl_override_creds(rdd->dentry->d_sb);
- err = down_write_killable(&dir->d_inode->i_rwsem);
- if (!err) {
- while (rdd->first_maybe_whiteout) {
- struct ovl_cache_entry *p =
- rdd->first_maybe_whiteout;
- rdd->first_maybe_whiteout = p->next_maybe_whiteout;
- dentry = lookup_one(mnt_idmap(path->mnt),
- &QSTR_LEN(p->name, p->len), dir);
- if (!IS_ERR(dentry)) {
- p->is_whiteout = ovl_is_whiteout(dentry);
- dput(dentry);
- }
+ while (rdd->first_maybe_whiteout) {
+ struct ovl_cache_entry *p =
+ rdd->first_maybe_whiteout;
+ rdd->first_maybe_whiteout = p->next_maybe_whiteout;
+ dentry = lookup_one_positive_killable(mnt_idmap(path->mnt),
+ &QSTR_LEN(p->name, p->len),
+ dir);
+ if (!IS_ERR(dentry)) {
+ p->is_whiteout = ovl_is_whiteout(dentry);
+ dput(dentry);
+ } else if (PTR_ERR(dentry) == -EINTR) {
+ err = -EINTR;
+ break;
}
- inode_unlock(dir->d_inode);
}
ovl_revert_creds(old_cred);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 5d085428e471..551a1a01e5e7 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -80,6 +80,9 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
struct qstr *name,
struct dentry *base);
+struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
+ struct qstr *name,
+ struct dentry *base);
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/6] VFS: discard err2 in filename_create()
2025-09-06 4:57 [PATCH 0/6] VFS: more prep for change to directory locking NeilBrown
2025-09-06 4:57 ` [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self" NeilBrown
2025-09-06 4:57 ` [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable() NeilBrown
@ 2025-09-06 4:57 ` NeilBrown
2025-09-06 4:57 ` [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
` (2 subsequent siblings)
5 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2025-09-06 4:57 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein; +Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
Since 204a575e91f3 "VFS: add common error checks to lookup_one_qstr_excl()"
filename_create() does not need to stash the error value from mnt_want_write()
into a separate variable - the logic that used to clobber 'error' after the
call of mnt_want_write() has migrated into lookup_one_qstr_excl().
So there is no need for two different err variables.
This patch discards "err2" and uses "error' throughout.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index b1bc298b9d7c..a8d9fa44f2bf 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4168,7 +4168,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
int type;
- int err2;
int error;
error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
@@ -4183,7 +4182,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
goto out;
/* don't fail immediately if it's r/o, at least try to report other errors */
- err2 = mnt_want_write(path->mnt);
+ error = mnt_want_write(path->mnt);
/*
* Do the final lookup. Suppress 'create' if there is a trailing
* '/', and a directory wasn't requested.
@@ -4196,17 +4195,16 @@ static struct dentry *filename_create(int dfd, struct filename *name,
if (IS_ERR(dentry))
goto unlock;
- if (unlikely(err2)) {
- error = err2;
+ if (unlikely(error))
goto fail;
- }
+
return dentry;
fail:
dput(dentry);
dentry = ERR_PTR(error);
unlock:
inode_unlock(path->dentry->d_inode);
- if (!err2)
+ if (!error)
mnt_drop_write(path->mnt);
out:
path_put(path);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
2025-09-06 4:57 [PATCH 0/6] VFS: more prep for change to directory locking NeilBrown
` (2 preceding siblings ...)
2025-09-06 4:57 ` [PATCH 3/6] VFS: discard err2 in filename_create() NeilBrown
@ 2025-09-06 4:57 ` NeilBrown
2025-09-15 12:10 ` Christian Brauner
2025-09-06 4:57 ` [PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit NeilBrown
2025-09-06 4:57 ` [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing() NeilBrown
5 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-06 4:57 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein; +Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
A rename can only rename within a single mount. Callers of vfs_rename()
must and do ensure this is the case.
So there is no point in having two mnt_idmaps in renamedata as they are
always the same. Only one of them is passed to ->rename in any case.
This patch replaces both with a single "mnt_idmap" and changes all
callers.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/namei.c | 3 +--
fs/ecryptfs/inode.c | 3 +--
fs/namei.c | 17 ++++++++---------
fs/nfsd/vfs.c | 3 +--
fs/overlayfs/overlayfs.h | 3 +--
fs/smb/server/vfs.c | 3 +--
include/linux/fs.h | 6 ++----
7 files changed, 15 insertions(+), 23 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 91dfd0231877..d1edb2ac3837 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -387,10 +387,9 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
cachefiles_io_error(cache, "Rename security error %d", ret);
} else {
struct renamedata rd = {
- .old_mnt_idmap = &nop_mnt_idmap,
+ .mnt_idmap = &nop_mnt_idmap,
.old_parent = dir,
.old_dentry = rep,
- .new_mnt_idmap = &nop_mnt_idmap,
.new_parent = cache->graveyard,
.new_dentry = grave,
};
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 72fbe1316ab8..abd954c6a14e 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -634,10 +634,9 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
goto out_lock;
}
- rd.old_mnt_idmap = &nop_mnt_idmap;
+ rd.mnt_idmap = &nop_mnt_idmap;
rd.old_parent = lower_old_dir_dentry;
rd.old_dentry = lower_old_dentry;
- rd.new_mnt_idmap = &nop_mnt_idmap;
rd.new_parent = lower_new_dir_dentry;
rd.new_dentry = lower_new_dentry;
rc = vfs_rename(&rd);
diff --git a/fs/namei.c b/fs/namei.c
index a8d9fa44f2bf..8b065dbca2ab 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5076,20 +5076,20 @@ int vfs_rename(struct renamedata *rd)
if (source == target)
return 0;
- error = may_delete(rd->old_mnt_idmap, old_dir, old_dentry, is_dir);
+ error = may_delete(rd->mnt_idmap, old_dir, old_dentry, is_dir);
if (error)
return error;
if (!target) {
- error = may_create(rd->new_mnt_idmap, new_dir, new_dentry);
+ error = may_create(rd->mnt_idmap, new_dir, new_dentry);
} else {
new_is_dir = d_is_dir(new_dentry);
if (!(flags & RENAME_EXCHANGE))
- error = may_delete(rd->new_mnt_idmap, new_dir,
+ error = may_delete(rd->mnt_idmap, new_dir,
new_dentry, is_dir);
else
- error = may_delete(rd->new_mnt_idmap, new_dir,
+ error = may_delete(rd->mnt_idmap, new_dir,
new_dentry, new_is_dir);
}
if (error)
@@ -5104,13 +5104,13 @@ int vfs_rename(struct renamedata *rd)
*/
if (new_dir != old_dir) {
if (is_dir) {
- error = inode_permission(rd->old_mnt_idmap, source,
+ error = inode_permission(rd->mnt_idmap, source,
MAY_WRITE);
if (error)
return error;
}
if ((flags & RENAME_EXCHANGE) && new_is_dir) {
- error = inode_permission(rd->new_mnt_idmap, target,
+ error = inode_permission(rd->mnt_idmap, target,
MAY_WRITE);
if (error)
return error;
@@ -5178,7 +5178,7 @@ int vfs_rename(struct renamedata *rd)
if (error)
goto out;
}
- error = old_dir->i_op->rename(rd->new_mnt_idmap, old_dir, old_dentry,
+ error = old_dir->i_op->rename(rd->mnt_idmap, old_dir, old_dentry,
new_dir, new_dentry, flags);
if (error)
goto out;
@@ -5321,10 +5321,9 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
rd.old_parent = old_path.dentry;
rd.old_dentry = old_dentry;
- rd.old_mnt_idmap = mnt_idmap(old_path.mnt);
+ rd.mnt_idmap = mnt_idmap(old_path.mnt);
rd.new_parent = new_path.dentry;
rd.new_dentry = new_dentry;
- rd.new_mnt_idmap = mnt_idmap(new_path.mnt);
rd.delegated_inode = &delegated_inode;
rd.flags = flags;
error = vfs_rename(&rd);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index edf050766e57..aa4a95713a48 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1951,10 +1951,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
goto out_dput_old;
} else {
struct renamedata rd = {
- .old_mnt_idmap = &nop_mnt_idmap,
+ .mnt_idmap = &nop_mnt_idmap,
.old_parent = fdentry,
.old_dentry = odentry,
- .new_mnt_idmap = &nop_mnt_idmap,
.new_parent = tdentry,
.new_dentry = ndentry,
};
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index bb0d7ded8e76..4f84abaa0d68 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -361,10 +361,9 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
{
int err;
struct renamedata rd = {
- .old_mnt_idmap = ovl_upper_mnt_idmap(ofs),
+ .mnt_idmap = ovl_upper_mnt_idmap(ofs),
.old_parent = olddir,
.old_dentry = olddentry,
- .new_mnt_idmap = ovl_upper_mnt_idmap(ofs),
.new_parent = newdir,
.new_dentry = newdentry,
.flags = flags,
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 04539037108c..07739055ac9f 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -770,10 +770,9 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
goto out4;
}
- rd.old_mnt_idmap = mnt_idmap(old_path->mnt),
+ rd.mnt_idmap = mnt_idmap(old_path->mnt),
rd.old_parent = old_parent,
rd.old_dentry = old_child,
- rd.new_mnt_idmap = mnt_idmap(new_path.mnt),
rd.new_parent = new_path.dentry,
rd.new_dentry = new_dentry,
rd.flags = flags,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..73b39e5bb9e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2008,20 +2008,18 @@ int vfs_unlink(struct mnt_idmap *, struct inode *, struct dentry *,
/**
* struct renamedata - contains all information required for renaming
- * @old_mnt_idmap: idmap of the old mount the inode was found from
+ * @mnt_idmap: idmap of the mount in which the rename is happening.
* @old_parent: parent of source
* @old_dentry: source
- * @new_mnt_idmap: idmap of the new mount the inode was found from
* @new_parent: parent of destination
* @new_dentry: destination
* @delegated_inode: returns an inode needing a delegation break
* @flags: rename flags
*/
struct renamedata {
- struct mnt_idmap *old_mnt_idmap;
+ struct mnt_idmap *mnt_idmap;
struct dentry *old_parent;
struct dentry *old_dentry;
- struct mnt_idmap *new_mnt_idmap;
struct dentry *new_parent;
struct dentry *new_dentry;
struct inode **delegated_inode;
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit
2025-09-06 4:57 [PATCH 0/6] VFS: more prep for change to directory locking NeilBrown
` (3 preceding siblings ...)
2025-09-06 4:57 ` [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
@ 2025-09-06 4:57 ` NeilBrown
2025-09-07 2:17 ` kernel test robot
2025-09-06 4:57 ` [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing() NeilBrown
5 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-06 4:57 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein; +Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
audit_alloc_mark() and audit_get_nd() both need to perform a path
lookup getting the parent dentry (which must exist) and the final
target (following a LAST_NORM name) which sometimes doesn't need to
exist.
They don't need the parent to be locked, but use kern_path_locked() or
kern_path_locked_negative() anyway. This is somewhat misleading to the
casual reader.
This patch introduces a more targeted function, kern_path_parent(),
which returns not holding locks. On success the "path" will
be set to the parent, which must be found, and the return value is the
dentry of the target, which might be negative.
This will clear the way to rename kern_path_locked() which is
otherwise only used to prepare for removing something.
It also allows us to remove kern_path_locked_negative(), which is
transformed into the new kern_path_parent().
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 23 +++++++++++++++++------
include/linux/namei.h | 2 +-
kernel/audit_fsnotify.c | 11 ++++++-----
kernel/audit_watch.c | 3 +--
4 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 8b065dbca2ab..104015f302a7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2780,7 +2780,20 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
return d;
}
-struct dentry *kern_path_locked_negative(const char *name, struct path *path)
+/**
+ * kern_path_parent: lookup path returning parent and target
+ * @name: path name
+ * @parent: path to store parent in
+ *
+ * The path @name should end with a normal component, not "." or ".." or "/".
+ * A lookup is performed and if successful the parent information
+ * is store in @parent and the dentry is returned.
+ *
+ * The dentry maybe negative, the parent will be positive.
+ *
+ * Returns: dentry or error.
+ */
+struct dentry *kern_path_parent(const char *name, struct path *path)
{
struct path parent_path __free(path_put) = {};
struct filename *filename __free(putname) = getname_kernel(name);
@@ -2793,12 +2806,10 @@ struct dentry *kern_path_locked_negative(const char *name, struct path *path)
return ERR_PTR(error);
if (unlikely(type != LAST_NORM))
return ERR_PTR(-EINVAL);
- inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
- d = lookup_one_qstr_excl(&last, parent_path.dentry, LOOKUP_CREATE);
- if (IS_ERR(d)) {
- inode_unlock(parent_path.dentry->d_inode);
+
+ d = lookup_noperm_unlocked(&last, parent_path.dentry);
+ if (IS_ERR(d))
return d;
- }
path->dentry = no_free_ptr(parent_path.dentry);
path->mnt = no_free_ptr(parent_path.mnt);
return d;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 551a1a01e5e7..1d5038c21c20 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -57,12 +57,12 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
struct dentry *base,
unsigned int flags);
extern int kern_path(const char *, unsigned, struct path *);
+struct dentry *kern_path_parent(const char *name, struct path *parent);
extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
extern void done_path_create(struct path *, struct dentry *);
extern struct dentry *kern_path_locked(const char *, struct path *);
-extern struct dentry *kern_path_locked_negative(const char *, struct path *);
extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
struct path *parent, struct qstr *last, int *type,
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index c565fbf66ac8..a58c72ae0bb4 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -76,17 +76,18 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
struct audit_fsnotify_mark *audit_mark;
struct path path;
struct dentry *dentry;
- struct inode *inode;
int ret;
if (pathname[0] != '/' || pathname[len-1] == '/')
return ERR_PTR(-EINVAL);
- dentry = kern_path_locked(pathname, &path);
+ dentry = kern_path_parent(pathname, &path);
if (IS_ERR(dentry))
return ERR_CAST(dentry); /* returning an error */
- inode = path.dentry->d_inode;
- inode_unlock(inode);
+ if (d_is_negative(dentry)) {
+ audit_mark = ERR_PTR(-ENOENT);
+ goto out;
+ }
audit_mark = kzalloc(sizeof(*audit_mark), GFP_KERNEL);
if (unlikely(!audit_mark)) {
@@ -100,7 +101,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
audit_update_mark(audit_mark, dentry->d_inode);
audit_mark->rule = krule;
- ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0);
+ ret = fsnotify_add_inode_mark(&audit_mark->mark, path.dentry->d_inode, 0);
if (ret < 0) {
audit_mark->path = NULL;
fsnotify_put_mark(&audit_mark->mark);
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 0ebbbe37a60f..a700e3c8925f 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -349,7 +349,7 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
{
struct dentry *d;
- d = kern_path_locked_negative(watch->path, parent);
+ d = kern_path_parent(watch->path, parent);
if (IS_ERR(d))
return PTR_ERR(d);
@@ -359,7 +359,6 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
watch->ino = d_backing_inode(d)->i_ino;
}
- inode_unlock(d_backing_inode(parent->dentry));
dput(d);
return 0;
}
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing()
2025-09-06 4:57 [PATCH 0/6] VFS: more prep for change to directory locking NeilBrown
` (4 preceding siblings ...)
2025-09-06 4:57 ` [PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit NeilBrown
@ 2025-09-06 4:57 ` NeilBrown
2025-09-06 9:09 ` Amir Goldstein
5 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-06 4:57 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein; +Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
Also rename user_path_locked_at() to user_path_removing_at()
Add done_path_removing() to clean up after these calls.
The only credible need for a locked positive dentry is to remove it, so
make that explicit in the name.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 10 ++++++++++
drivers/base/devtmpfs.c | 12 ++++--------
fs/bcachefs/fs-ioctl.c | 6 ++----
fs/namei.c | 23 +++++++++++++++++------
include/linux/namei.h | 5 +++--
5 files changed, 36 insertions(+), 20 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 85f590254f07..defbae457310 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1285,3 +1285,13 @@ rather than a VMA, as the VMA at this stage is not yet valid.
The vm_area_desc provides the minimum required information for a filesystem
to initialise state upon memory mapping of a file-backed region, and output
parameters for the file system to set this state.
+
+---
+
+**mandatory**
+
+kern_path_locked and user_path_locked_at() are renamed to
+kern_path_removing() and user_path_removing_at() and should only
+be used when removing a name. done_path_removing() should be called
+after removal.
+
diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 31bfb3194b4c..26d0beead1f0 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -256,7 +256,7 @@ static int dev_rmdir(const char *name)
struct dentry *dentry;
int err;
- dentry = kern_path_locked(name, &parent);
+ dentry = kern_path_removing(name, &parent);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
if (d_inode(dentry)->i_private == &thread)
@@ -265,9 +265,7 @@ static int dev_rmdir(const char *name)
else
err = -EPERM;
- dput(dentry);
- inode_unlock(d_inode(parent.dentry));
- path_put(&parent);
+ done_path_removing(dentry, &parent);
return err;
}
@@ -325,7 +323,7 @@ static int handle_remove(const char *nodename, struct device *dev)
int deleted = 0;
int err = 0;
- dentry = kern_path_locked(nodename, &parent);
+ dentry = kern_path_removing(nodename, &parent);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
@@ -349,10 +347,8 @@ static int handle_remove(const char *nodename, struct device *dev)
if (!err || err == -ENOENT)
deleted = 1;
}
- dput(dentry);
- inode_unlock(d_inode(parent.dentry));
+ done_path_removing(dentry, &parent);
- path_put(&parent);
if (deleted && strchr(nodename, '/'))
delete_path(nodename);
return err;
diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
index 4e72e654da96..9446cefbe249 100644
--- a/fs/bcachefs/fs-ioctl.c
+++ b/fs/bcachefs/fs-ioctl.c
@@ -334,7 +334,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
if (arg.flags)
return -EINVAL;
- victim = user_path_locked_at(arg.dirfd, name, &path);
+ victim = user_path_removing_at(arg.dirfd, name, &path);
if (IS_ERR(victim))
return PTR_ERR(victim);
@@ -351,9 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
d_invalidate(victim);
}
err:
- inode_unlock(dir);
- dput(victim);
- path_put(&path);
+ done_path_removing(victim, &path);
return ret;
}
diff --git a/fs/namei.c b/fs/namei.c
index 104015f302a7..c750820b27b9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2757,7 +2757,8 @@ static int filename_parentat(int dfd, struct filename *name,
}
/* does lookup, returns the object with parent locked */
-static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct path *path)
+static struct dentry *__kern_path_removing(int dfd, struct filename *name,
+ struct path *path)
{
struct path parent_path __free(path_put) = {};
struct dentry *d;
@@ -2815,24 +2816,34 @@ struct dentry *kern_path_parent(const char *name, struct path *path)
return d;
}
-struct dentry *kern_path_locked(const char *name, struct path *path)
+struct dentry *kern_path_removing(const char *name, struct path *path)
{
struct filename *filename = getname_kernel(name);
- struct dentry *res = __kern_path_locked(AT_FDCWD, filename, path);
+ struct dentry *res = __kern_path_removing(AT_FDCWD, filename, path);
putname(filename);
return res;
}
-struct dentry *user_path_locked_at(int dfd, const char __user *name, struct path *path)
+void done_path_removing(struct dentry *dentry, struct path *path)
+{
+ if (!IS_ERR(dentry)) {
+ inode_unlock(path->dentry->d_inode);
+ dput(dentry);
+ path_put(path);
+ }
+}
+EXPORT_SYMBOL(done_path_removing);
+
+struct dentry *user_path_removing_at(int dfd, const char __user *name, struct path *path)
{
struct filename *filename = getname(name);
- struct dentry *res = __kern_path_locked(dfd, filename, path);
+ struct dentry *res = __kern_path_removing(dfd, filename, path);
putname(filename);
return res;
}
-EXPORT_SYMBOL(user_path_locked_at);
+EXPORT_SYMBOL(user_path_removing_at);
int kern_path(const char *name, unsigned int flags, struct path *path)
{
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 1d5038c21c20..37568f8055f9 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -62,8 +62,9 @@ struct dentry *kern_path_parent(const char *name, struct path *parent);
extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
extern void done_path_create(struct path *, struct dentry *);
-extern struct dentry *kern_path_locked(const char *, struct path *);
-extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
+extern struct dentry *kern_path_removing(const char *, struct path *);
+extern struct dentry *user_path_removing_at(int , const char __user *, struct path *);
+void done_path_removing(struct dentry *dentry, struct path *path);
int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
struct path *parent, struct qstr *last, int *type,
const struct path *root);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self"
2025-09-06 4:57 ` [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self" NeilBrown
@ 2025-09-06 5:50 ` Al Viro
2025-09-06 5:52 ` Al Viro
0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-09-06 5:50 UTC (permalink / raw)
To: NeilBrown; +Cc: Christian Brauner, Amir Goldstein, Jan Kara, linux-fsdevel
On Sat, Sep 06, 2025 at 02:57:05PM +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> proc_setup_self() and proc_setup_thread_self() are only called from
> proc_fill_super() which is before the filesystem is "live". So there is
> no need to lock the root directory when adding "self" and "thread-self".
>
> The locking rules are expected to change, so this locking will become
> anachronistic if we don't remove it.
Please, leave that one alone. FWIW, in tree-in-dcache branch (will push
tomorrow or on Sunday, once I sort the fucking #work.f_path out) there's
this:
commit fcac614cd72f0dfc45168817a139653877649507
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Mon Feb 26 01:55:36 2024 -0500
procfs: make /self and /thread_self dentries persistent
... and there's no need to remember those pointers anywhere - ->kill_sb()
no longer needs to bother since kill_anon_super() will take care of
them anyway and proc_pid_readdir() only wants the inumbers, which
we had in a couple of static variables all along.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/fs/proc/base.c b/fs/proc/base.c
index c667702dc69b..2d6a541ede27 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3585,14 +3585,12 @@ int proc_pid_readdir(struct file *file, struct dir_context *ctx)
return 0;
if (pos == TGID_OFFSET - 2) {
- struct inode *inode = d_inode(fs_info->proc_self);
- if (!dir_emit(ctx, "self", 4, inode->i_ino, DT_LNK))
+ if (!dir_emit(ctx, "self", 4, self_inum, DT_LNK))
return 0;
ctx->pos = pos = pos + 1;
}
if (pos == TGID_OFFSET - 1) {
- struct inode *inode = d_inode(fs_info->proc_thread_self);
- if (!dir_emit(ctx, "thread-self", 11, inode->i_ino, DT_LNK))
+ if (!dir_emit(ctx, "thread-self", 11, thread_self_inum, DT_LNK))
return 0;
ctx->pos = pos = pos + 1;
}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 96122e91c645..dadc621556d9 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -369,6 +369,7 @@ static inline void proc_tty_init(void) {}
extern struct proc_dir_entry proc_root;
extern void proc_self_init(void);
+extern unsigned self_inum, thread_self_inum;
/*
* task_[no]mmu.c
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 06a297a27ba3..923ae40f19b9 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -261,17 +261,11 @@ static void proc_kill_sb(struct super_block *sb)
{
struct proc_fs_info *fs_info = proc_sb_info(sb);
- if (!fs_info) {
- kill_anon_super(sb);
- return;
- }
-
- dput(fs_info->proc_self);
- dput(fs_info->proc_thread_self);
-
kill_anon_super(sb);
- put_pid_ns(fs_info->pid_ns);
- kfree_rcu(fs_info, rcu);
+ if (fs_info) {
+ put_pid_ns(fs_info->pid_ns);
+ kfree(fs_info);
+ }
}
static struct file_system_type proc_fs_type = {
diff --git a/fs/proc/self.c b/fs/proc/self.c
index b46fbfd22681..62d2c0cfe35c 100644
--- a/fs/proc/self.c
+++ b/fs/proc/self.c
@@ -31,12 +31,11 @@ static const struct inode_operations proc_self_inode_operations = {
.get_link = proc_self_get_link,
};
-static unsigned self_inum __ro_after_init;
+unsigned self_inum __ro_after_init;
int proc_setup_self(struct super_block *s)
{
struct inode *root_inode = d_inode(s->s_root);
- struct proc_fs_info *fs_info = proc_sb_info(s);
struct dentry *self;
int ret = -ENOMEM;
@@ -51,18 +50,15 @@ int proc_setup_self(struct super_block *s)
inode->i_uid = GLOBAL_ROOT_UID;
inode->i_gid = GLOBAL_ROOT_GID;
inode->i_op = &proc_self_inode_operations;
- d_add(self, inode);
+ d_make_persistent(self, inode);
ret = 0;
- } else {
- dput(self);
}
+ dput(self);
}
inode_unlock(root_inode);
if (ret)
pr_err("proc_fill_super: can't allocate /proc/self\n");
- else
- fs_info->proc_self = self;
return ret;
}
diff --git a/fs/proc/thread_self.c b/fs/proc/thread_self.c
index 0e5050d6ab64..d6113dbe58e0 100644
--- a/fs/proc/thread_self.c
+++ b/fs/proc/thread_self.c
@@ -31,12 +31,11 @@ static const struct inode_operations proc_thread_self_inode_operations = {
.get_link = proc_thread_self_get_link,
};
-static unsigned thread_self_inum __ro_after_init;
+unsigned thread_self_inum __ro_after_init;
int proc_setup_thread_self(struct super_block *s)
{
struct inode *root_inode = d_inode(s->s_root);
- struct proc_fs_info *fs_info = proc_sb_info(s);
struct dentry *thread_self;
int ret = -ENOMEM;
@@ -51,19 +50,15 @@ int proc_setup_thread_self(struct super_block *s)
inode->i_uid = GLOBAL_ROOT_UID;
inode->i_gid = GLOBAL_ROOT_GID;
inode->i_op = &proc_thread_self_inode_operations;
- d_add(thread_self, inode);
+ d_make_persistent(thread_self, inode);
ret = 0;
- } else {
- dput(thread_self);
}
+ dput(thread_self);
}
inode_unlock(root_inode);
if (ret)
pr_err("proc_fill_super: can't allocate /proc/thread-self\n");
- else
- fs_info->proc_thread_self = thread_self;
-
return ret;
}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index ea62201c74c4..de0edb431eac 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -63,8 +63,6 @@ enum proc_pidonly {
struct proc_fs_info {
struct pid_namespace *pid_ns;
- struct dentry *proc_self; /* For /proc/self */
- struct dentry *proc_thread_self; /* For /proc/thread-self */
kgid_t pid_gid;
enum proc_hidepid hide_pid;
enum proc_pidonly pidonly;
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self"
2025-09-06 5:50 ` Al Viro
@ 2025-09-06 5:52 ` Al Viro
2025-09-06 6:08 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-09-06 5:52 UTC (permalink / raw)
To: NeilBrown; +Cc: Christian Brauner, Amir Goldstein, Jan Kara, linux-fsdevel
On Sat, Sep 06, 2025 at 06:50:05AM +0100, Al Viro wrote:
> On Sat, Sep 06, 2025 at 02:57:05PM +1000, NeilBrown wrote:
> > From: NeilBrown <neil@brown.name>
> >
> > proc_setup_self() and proc_setup_thread_self() are only called from
> > proc_fill_super() which is before the filesystem is "live". So there is
> > no need to lock the root directory when adding "self" and "thread-self".
> >
> > The locking rules are expected to change, so this locking will become
> > anachronistic if we don't remove it.
>
> Please, leave that one alone. FWIW, in tree-in-dcache branch (will push
> tomorrow or on Sunday, once I sort the fucking #work.f_path out) there's
> this:
PS: you do realize that we have similar things in devpts, binder, functionfs,
etc., right? What's special about procfs?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self"
2025-09-06 5:52 ` Al Viro
@ 2025-09-06 6:08 ` NeilBrown
0 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2025-09-06 6:08 UTC (permalink / raw)
To: Al Viro; +Cc: Christian Brauner, Amir Goldstein, Jan Kara, linux-fsdevel
On Sat, 06 Sep 2025, Al Viro wrote:
> On Sat, Sep 06, 2025 at 06:50:05AM +0100, Al Viro wrote:
> > On Sat, Sep 06, 2025 at 02:57:05PM +1000, NeilBrown wrote:
> > > From: NeilBrown <neil@brown.name>
> > >
> > > proc_setup_self() and proc_setup_thread_self() are only called from
> > > proc_fill_super() which is before the filesystem is "live". So there is
> > > no need to lock the root directory when adding "self" and "thread-self".
> > >
> > > The locking rules are expected to change, so this locking will become
> > > anachronistic if we don't remove it.
> >
> > Please, leave that one alone. FWIW, in tree-in-dcache branch (will push
> > tomorrow or on Sunday, once I sort the fucking #work.f_path out) there's
> > this:
Sure, I'm happy to drop this patch.
>
> PS: you do realize that we have similar things in devpts, binder, functionfs,
> etc., right? What's special about procfs?
>
devpts uses simple_start_creating().
The ->lookup function for for procfs will return -ENOENT for a
non-existing name rather than returning NULL and leaving the dentry
uninstantiated, so simple_start_creating() will fail.
binder uses d_alloc_name() and d_add() without any locking, which is
exactly what I was proposing for proc.
functionfs seems to do the same.
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing()
2025-09-06 4:57 ` [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing() NeilBrown
@ 2025-09-06 9:09 ` Amir Goldstein
2025-09-06 9:29 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2025-09-06 9:09 UTC (permalink / raw)
To: NeilBrown; +Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel
On Sat, Sep 6, 2025 at 7:01 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> Also rename user_path_locked_at() to user_path_removing_at()
>
> Add done_path_removing() to clean up after these calls.
>
> The only credible need for a locked positive dentry is to remove it, so
> make that explicit in the name.
That's a pretty bold statement...
I generally like the done_ abstraction that could be also used as a guard
cleanup helper.
The problem I have with this is that {kern,done}_path_removing rhymes with
{kern,done}_path_create, while in fact they are very different.
What is the motivation for the function rename (you did not specify it)?
Is it just because done_path_locked() sounds weird or something else?
I wonder if using guard semantics could be the better choice if
looking to clarify the code.
Thanks,
Amir.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> Documentation/filesystems/porting.rst | 10 ++++++++++
> drivers/base/devtmpfs.c | 12 ++++--------
> fs/bcachefs/fs-ioctl.c | 6 ++----
> fs/namei.c | 23 +++++++++++++++++------
> include/linux/namei.h | 5 +++--
> 5 files changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index 85f590254f07..defbae457310 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -1285,3 +1285,13 @@ rather than a VMA, as the VMA at this stage is not yet valid.
> The vm_area_desc provides the minimum required information for a filesystem
> to initialise state upon memory mapping of a file-backed region, and output
> parameters for the file system to set this state.
> +
> +---
> +
> +**mandatory**
> +
> +kern_path_locked and user_path_locked_at() are renamed to
> +kern_path_removing() and user_path_removing_at() and should only
> +be used when removing a name. done_path_removing() should be called
> +after removal.
> +
> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 31bfb3194b4c..26d0beead1f0 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -256,7 +256,7 @@ static int dev_rmdir(const char *name)
> struct dentry *dentry;
> int err;
>
> - dentry = kern_path_locked(name, &parent);
> + dentry = kern_path_removing(name, &parent);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
> if (d_inode(dentry)->i_private == &thread)
> @@ -265,9 +265,7 @@ static int dev_rmdir(const char *name)
> else
> err = -EPERM;
>
> - dput(dentry);
> - inode_unlock(d_inode(parent.dentry));
> - path_put(&parent);
> + done_path_removing(dentry, &parent);
> return err;
> }
>
> @@ -325,7 +323,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> int deleted = 0;
> int err = 0;
>
> - dentry = kern_path_locked(nodename, &parent);
> + dentry = kern_path_removing(nodename, &parent);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
>
> @@ -349,10 +347,8 @@ static int handle_remove(const char *nodename, struct device *dev)
> if (!err || err == -ENOENT)
> deleted = 1;
> }
> - dput(dentry);
> - inode_unlock(d_inode(parent.dentry));
> + done_path_removing(dentry, &parent);
>
> - path_put(&parent);
> if (deleted && strchr(nodename, '/'))
> delete_path(nodename);
> return err;
> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> index 4e72e654da96..9446cefbe249 100644
> --- a/fs/bcachefs/fs-ioctl.c
> +++ b/fs/bcachefs/fs-ioctl.c
> @@ -334,7 +334,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
> if (arg.flags)
> return -EINVAL;
>
> - victim = user_path_locked_at(arg.dirfd, name, &path);
> + victim = user_path_removing_at(arg.dirfd, name, &path);
> if (IS_ERR(victim))
> return PTR_ERR(victim);
>
> @@ -351,9 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
> d_invalidate(victim);
> }
> err:
> - inode_unlock(dir);
> - dput(victim);
> - path_put(&path);
> + done_path_removing(victim, &path);
> return ret;
> }
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 104015f302a7..c750820b27b9 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2757,7 +2757,8 @@ static int filename_parentat(int dfd, struct filename *name,
> }
>
> /* does lookup, returns the object with parent locked */
> -static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct path *path)
> +static struct dentry *__kern_path_removing(int dfd, struct filename *name,
> + struct path *path)
> {
> struct path parent_path __free(path_put) = {};
> struct dentry *d;
> @@ -2815,24 +2816,34 @@ struct dentry *kern_path_parent(const char *name, struct path *path)
> return d;
> }
>
> -struct dentry *kern_path_locked(const char *name, struct path *path)
> +struct dentry *kern_path_removing(const char *name, struct path *path)
> {
> struct filename *filename = getname_kernel(name);
> - struct dentry *res = __kern_path_locked(AT_FDCWD, filename, path);
> + struct dentry *res = __kern_path_removing(AT_FDCWD, filename, path);
>
> putname(filename);
> return res;
> }
>
> -struct dentry *user_path_locked_at(int dfd, const char __user *name, struct path *path)
> +void done_path_removing(struct dentry *dentry, struct path *path)
> +{
> + if (!IS_ERR(dentry)) {
> + inode_unlock(path->dentry->d_inode);
> + dput(dentry);
> + path_put(path);
> + }
> +}
> +EXPORT_SYMBOL(done_path_removing);
> +
> +struct dentry *user_path_removing_at(int dfd, const char __user *name, struct path *path)
> {
> struct filename *filename = getname(name);
> - struct dentry *res = __kern_path_locked(dfd, filename, path);
> + struct dentry *res = __kern_path_removing(dfd, filename, path);
>
> putname(filename);
> return res;
> }
> -EXPORT_SYMBOL(user_path_locked_at);
> +EXPORT_SYMBOL(user_path_removing_at);
>
> int kern_path(const char *name, unsigned int flags, struct path *path)
> {
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 1d5038c21c20..37568f8055f9 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -62,8 +62,9 @@ struct dentry *kern_path_parent(const char *name, struct path *parent);
> extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
> extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
> extern void done_path_create(struct path *, struct dentry *);
> -extern struct dentry *kern_path_locked(const char *, struct path *);
> -extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
> +extern struct dentry *kern_path_removing(const char *, struct path *);
> +extern struct dentry *user_path_removing_at(int , const char __user *, struct path *);
> +void done_path_removing(struct dentry *dentry, struct path *path);
> int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
> struct path *parent, struct qstr *last, int *type,
> const struct path *root);
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing()
2025-09-06 9:09 ` Amir Goldstein
@ 2025-09-06 9:29 ` NeilBrown
2025-09-06 10:35 ` Amir Goldstein
0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-06 9:29 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel
On Sat, 06 Sep 2025, Amir Goldstein wrote:
> On Sat, Sep 6, 2025 at 7:01 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > Also rename user_path_locked_at() to user_path_removing_at()
> >
> > Add done_path_removing() to clean up after these calls.
> >
> > The only credible need for a locked positive dentry is to remove it, so
> > make that explicit in the name.
>
> That's a pretty bold statement...
True - but it appears to be the case.
>
> I generally like the done_ abstraction that could be also used as a guard
> cleanup helper.
>
> The problem I have with this is that {kern,done}_path_removing rhymes with
> {kern,done}_path_create, while in fact they are very different.
As far as I can see the only difference is that one prepares to remove
an object and the other prepares to create an object - as reflected in
the names.
What other difference do you see?
>
> What is the motivation for the function rename (you did not specify it)?
> Is it just because done_path_locked() sounds weird or something else?
Making the name more specific discourages misuse. It also highlights
the similarity to kern_path_create (which obviously works because you
noticed it).
This also prepares readers for when they see my next series which adds
start_creating and start_removing (with _noperm and _killable options
and end_foo()) so that they will think "oh, this fits an established
pattern - good".
Note that I chose "end_" for symmetry with end_creating() in debugfs and
tracefs_end_creating() which already exist.
Maybe they should be changed to "done_" but I'm not so keen on done_ as
if there was an error then it wasn't "done". And end_ matches start_
better than done_. They mark the start and end of a code fragment which
tries to create or remove (or rename - I have a selection of
start_renaming options too).
(simple_start_creating() was introduced a few months ago and I'm basing
some of my naming choices on that).
Maybe kern_path_create() should be renamed to start_path_creating().
We don't really need the "kern" - that is the default.
>
> I wonder if using guard semantics could be the better choice if
> looking to clarify the code.
Al suggested in reply to a previous patch that RAII (aka guard etc)
should be added separately so it can be reviewed separately.
Thanks,
NeilBrown
>
> Thanks,
> Amir.
>
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > Documentation/filesystems/porting.rst | 10 ++++++++++
> > drivers/base/devtmpfs.c | 12 ++++--------
> > fs/bcachefs/fs-ioctl.c | 6 ++----
> > fs/namei.c | 23 +++++++++++++++++------
> > include/linux/namei.h | 5 +++--
> > 5 files changed, 36 insertions(+), 20 deletions(-)
> >
> > diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> > index 85f590254f07..defbae457310 100644
> > --- a/Documentation/filesystems/porting.rst
> > +++ b/Documentation/filesystems/porting.rst
> > @@ -1285,3 +1285,13 @@ rather than a VMA, as the VMA at this stage is not yet valid.
> > The vm_area_desc provides the minimum required information for a filesystem
> > to initialise state upon memory mapping of a file-backed region, and output
> > parameters for the file system to set this state.
> > +
> > +---
> > +
> > +**mandatory**
> > +
> > +kern_path_locked and user_path_locked_at() are renamed to
> > +kern_path_removing() and user_path_removing_at() and should only
> > +be used when removing a name. done_path_removing() should be called
> > +after removal.
> > +
> > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > index 31bfb3194b4c..26d0beead1f0 100644
> > --- a/drivers/base/devtmpfs.c
> > +++ b/drivers/base/devtmpfs.c
> > @@ -256,7 +256,7 @@ static int dev_rmdir(const char *name)
> > struct dentry *dentry;
> > int err;
> >
> > - dentry = kern_path_locked(name, &parent);
> > + dentry = kern_path_removing(name, &parent);
> > if (IS_ERR(dentry))
> > return PTR_ERR(dentry);
> > if (d_inode(dentry)->i_private == &thread)
> > @@ -265,9 +265,7 @@ static int dev_rmdir(const char *name)
> > else
> > err = -EPERM;
> >
> > - dput(dentry);
> > - inode_unlock(d_inode(parent.dentry));
> > - path_put(&parent);
> > + done_path_removing(dentry, &parent);
> > return err;
> > }
> >
> > @@ -325,7 +323,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> > int deleted = 0;
> > int err = 0;
> >
> > - dentry = kern_path_locked(nodename, &parent);
> > + dentry = kern_path_removing(nodename, &parent);
> > if (IS_ERR(dentry))
> > return PTR_ERR(dentry);
> >
> > @@ -349,10 +347,8 @@ static int handle_remove(const char *nodename, struct device *dev)
> > if (!err || err == -ENOENT)
> > deleted = 1;
> > }
> > - dput(dentry);
> > - inode_unlock(d_inode(parent.dentry));
> > + done_path_removing(dentry, &parent);
> >
> > - path_put(&parent);
> > if (deleted && strchr(nodename, '/'))
> > delete_path(nodename);
> > return err;
> > diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> > index 4e72e654da96..9446cefbe249 100644
> > --- a/fs/bcachefs/fs-ioctl.c
> > +++ b/fs/bcachefs/fs-ioctl.c
> > @@ -334,7 +334,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
> > if (arg.flags)
> > return -EINVAL;
> >
> > - victim = user_path_locked_at(arg.dirfd, name, &path);
> > + victim = user_path_removing_at(arg.dirfd, name, &path);
> > if (IS_ERR(victim))
> > return PTR_ERR(victim);
> >
> > @@ -351,9 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
> > d_invalidate(victim);
> > }
> > err:
> > - inode_unlock(dir);
> > - dput(victim);
> > - path_put(&path);
> > + done_path_removing(victim, &path);
> > return ret;
> > }
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 104015f302a7..c750820b27b9 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2757,7 +2757,8 @@ static int filename_parentat(int dfd, struct filename *name,
> > }
> >
> > /* does lookup, returns the object with parent locked */
> > -static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct path *path)
> > +static struct dentry *__kern_path_removing(int dfd, struct filename *name,
> > + struct path *path)
> > {
> > struct path parent_path __free(path_put) = {};
> > struct dentry *d;
> > @@ -2815,24 +2816,34 @@ struct dentry *kern_path_parent(const char *name, struct path *path)
> > return d;
> > }
> >
> > -struct dentry *kern_path_locked(const char *name, struct path *path)
> > +struct dentry *kern_path_removing(const char *name, struct path *path)
> > {
> > struct filename *filename = getname_kernel(name);
> > - struct dentry *res = __kern_path_locked(AT_FDCWD, filename, path);
> > + struct dentry *res = __kern_path_removing(AT_FDCWD, filename, path);
> >
> > putname(filename);
> > return res;
> > }
> >
> > -struct dentry *user_path_locked_at(int dfd, const char __user *name, struct path *path)
> > +void done_path_removing(struct dentry *dentry, struct path *path)
> > +{
> > + if (!IS_ERR(dentry)) {
> > + inode_unlock(path->dentry->d_inode);
> > + dput(dentry);
> > + path_put(path);
> > + }
> > +}
> > +EXPORT_SYMBOL(done_path_removing);
> > +
> > +struct dentry *user_path_removing_at(int dfd, const char __user *name, struct path *path)
> > {
> > struct filename *filename = getname(name);
> > - struct dentry *res = __kern_path_locked(dfd, filename, path);
> > + struct dentry *res = __kern_path_removing(dfd, filename, path);
> >
> > putname(filename);
> > return res;
> > }
> > -EXPORT_SYMBOL(user_path_locked_at);
> > +EXPORT_SYMBOL(user_path_removing_at);
> >
> > int kern_path(const char *name, unsigned int flags, struct path *path)
> > {
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 1d5038c21c20..37568f8055f9 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -62,8 +62,9 @@ struct dentry *kern_path_parent(const char *name, struct path *parent);
> > extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
> > extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
> > extern void done_path_create(struct path *, struct dentry *);
> > -extern struct dentry *kern_path_locked(const char *, struct path *);
> > -extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
> > +extern struct dentry *kern_path_removing(const char *, struct path *);
> > +extern struct dentry *user_path_removing_at(int , const char __user *, struct path *);
> > +void done_path_removing(struct dentry *dentry, struct path *path);
> > int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
> > struct path *parent, struct qstr *last, int *type,
> > const struct path *root);
> > --
> > 2.50.0.107.gf914562f5916.dirty
> >
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable()
2025-09-06 4:57 ` [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable() NeilBrown
@ 2025-09-06 9:44 ` Amir Goldstein
2025-09-08 2:07 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2025-09-06 9:44 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
Miklos Szeredi
On Sat, Sep 6, 2025 at 7:00 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> ovl wants a lookup which won't block on a fatal signal.
> It currently uses down_write_killable() and then repeated
> calls to lookup_one()
>
> The lock may not be needed if the name is already in the dcache and it
> aid proposed future changes if the locking is kept internal to namei.c
>
> So this patch adds lookup_one_positive_killable() which is like
> lookup_one_positive() but will abort in the face of a fatal signal.
> overlayfs is changed to use this.
>
> Signed-off-by: NeilBrown <neil@brown.name>
I think the commit should mention that this changes from
inode_lock_killable() to inode_lock_shared_killable() on the
underlying dir inode which is a good thing for this scope.
BTW I was reading the git history that led to down_write_killable()
in this code and I had noticed that commit 3e32715496707
("vfs: get rid of old '->iterate' directory operation") has made
the ovl directory iteration non-killable when promoting the read
lock on the ovl directory to write lock.
In any case, you may add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/namei.c | 54 ++++++++++++++++++++++++++++++++++++++++++
> fs/overlayfs/readdir.c | 28 +++++++++++-----------
> include/linux/namei.h | 3 +++
> 3 files changed, 71 insertions(+), 14 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index cd43ff89fbaa..b1bc298b9d7c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1827,6 +1827,19 @@ static struct dentry *lookup_slow(const struct qstr *name,
> return res;
> }
>
> +static struct dentry *lookup_slow_killable(const struct qstr *name,
> + struct dentry *dir,
> + unsigned int flags)
> +{
> + struct inode *inode = dir->d_inode;
> + struct dentry *res;
> + if (inode_lock_shared_killable(inode))
> + return ERR_PTR(-EINTR);
> + res = __lookup_slow(name, dir, flags);
> + inode_unlock_shared(inode);
> + return res;
> +}
> +
> static inline int may_lookup(struct mnt_idmap *idmap,
> struct nameidata *restrict nd)
> {
> @@ -3010,6 +3023,47 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap, struct qstr *name,
> }
> EXPORT_SYMBOL(lookup_one_unlocked);
>
> +/**
> + * lookup_one_positive_killable - lookup single pathname component
> + * @idmap: idmap of the mount the lookup is performed from
> + * @name: qstr olding pathname component to lookup
> + * @base: base directory to lookup from
> + *
> + * This helper will yield ERR_PTR(-ENOENT) on negatives. The helper returns
> + * known positive or ERR_PTR(). This is what most of the users want.
> + *
> + * Note that pinned negative with unlocked parent _can_ become positive at any
> + * time, so callers of lookup_one_unlocked() need to be very careful; pinned
> + * positives have >d_inode stable, so this one avoids such problems.
> + *
> + * This can be used for in-kernel filesystem clients such as file servers.
> + *
> + * Ut should be called without the parent i_rwsem held, and will take
Typo: ^^ It
Thanks,
Amir.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing()
2025-09-06 9:29 ` NeilBrown
@ 2025-09-06 10:35 ` Amir Goldstein
0 siblings, 0 replies; 22+ messages in thread
From: Amir Goldstein @ 2025-09-06 10:35 UTC (permalink / raw)
To: NeilBrown; +Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel
On Sat, Sep 6, 2025 at 11:30 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Sat, 06 Sep 2025, Amir Goldstein wrote:
> > On Sat, Sep 6, 2025 at 7:01 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > Also rename user_path_locked_at() to user_path_removing_at()
> > >
> > > Add done_path_removing() to clean up after these calls.
> > >
> > > The only credible need for a locked positive dentry is to remove it, so
> > > make that explicit in the name.
> >
> > That's a pretty bold statement...
>
> True - but it appears to be the case.
>
> >
> > I generally like the done_ abstraction that could be also used as a guard
> > cleanup helper.
> >
> > The problem I have with this is that {kern,done}_path_removing rhymes with
> > {kern,done}_path_create, while in fact they are very different.
>
> As far as I can see the only difference is that one prepares to remove
> an object and the other prepares to create an object - as reflected in
> the names.
>
> What other difference do you see?
>
void done_path_create(struct path *path, struct dentry *dentry)
{
if (!IS_ERR(dentry))
dput(dentry);
inode_unlock(path->dentry->d_inode);
mnt_drop_write(path->mnt);
path_put(path);
}
void done_path_removing(struct dentry *dentry, struct path *path)
{
if (!IS_ERR(dentry)) {
inode_unlock(path->dentry->d_inode);
dput(dentry);
path_put(path);
}
}
They look pretty different and the difference does not look like
it is related to differences between creation and removal.
For example, I do not see mnt_want_write() in handle_remove()
so I wonder why that is.
IOW, maybe xxx_path_removing() is like xxx_path_create()
but it does not act this way.
If you could use xxx_path_removing() in do_unlink() do_rmdir()
it would certainly prove your point, but as it is, I don't think you can.
> >
> > What is the motivation for the function rename (you did not specify it)?
> > Is it just because done_path_locked() sounds weird or something else?
>
> Making the name more specific discourages misuse. It also highlights
> the similarity to kern_path_create (which obviously works because you
> noticed it).
>
> This also prepares readers for when they see my next series which adds
> start_creating and start_removing (with _noperm and _killable options
> and end_foo()) so that they will think "oh, this fits an established
> pattern - good".
>
> Note that I chose "end_" for symmetry with end_creating() in debugfs and
> tracefs_end_creating() which already exist.
> Maybe they should be changed to "done_" but I'm not so keen on done_ as
> if there was an error then it wasn't "done". And end_ matches start_
> better than done_. They mark the start and end of a code fragment which
> tries to create or remove (or rename - I have a selection of
> start_renaming options too).
> (simple_start_creating() was introduced a few months ago and I'm basing
> some of my naming choices on that).
>
> Maybe kern_path_create() should be renamed to start_path_creating().
> We don't really need the "kern" - that is the default.
>
>
That makes sense.
> >
> > I wonder if using guard semantics could be the better choice if
> > looking to clarify the code.
>
> Al suggested in reply to a previous patch that RAII (aka guard etc)
> should be added separately so it can be reviewed separately.
>
Makes sense.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit
2025-09-06 4:57 ` [PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit NeilBrown
@ 2025-09-07 2:17 ` kernel test robot
0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-09-07 2:17 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein
Cc: oe-kbuild-all, Jan Kara, linux-fsdevel
Hi NeilBrown,
kernel test robot noticed the following build warnings:
[auto build test WARNING on brauner-vfs/vfs.all]
[also build test WARNING on pcmoore-audit/next viro-vfs/for-next linus/master v6.17-rc4 next-20250905]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/fs-proc-Don-t-look-root-inode-when-creating-self-and-thread-self/20250906-130248
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20250906050015.3158851-6-neilb%40ownmail.net
patch subject: [PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit
config: x86_64-buildonly-randconfig-001-20250907 (https://download.01.org/0day-ci/archive/20250907/202509070916.8uBFTDCH-lkp@intel.com/config)
compiler: gcc-13 (Debian 13.3.0-16) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250907/202509070916.8uBFTDCH-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509070916.8uBFTDCH-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> Warning: fs/namei.c:2804 function parameter 'path' not described in 'kern_path_parent'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable()
2025-09-06 9:44 ` Amir Goldstein
@ 2025-09-08 2:07 ` NeilBrown
2025-09-08 7:09 ` Amir Goldstein
0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-08 2:07 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
Miklos Szeredi
On Sat, 06 Sep 2025, Amir Goldstein wrote:
> On Sat, Sep 6, 2025 at 7:00 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > ovl wants a lookup which won't block on a fatal signal.
> > It currently uses down_write_killable() and then repeated
> > calls to lookup_one()
> >
> > The lock may not be needed if the name is already in the dcache and it
> > aid proposed future changes if the locking is kept internal to namei.c
> >
> > So this patch adds lookup_one_positive_killable() which is like
> > lookup_one_positive() but will abort in the face of a fatal signal.
> > overlayfs is changed to use this.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
>
> I think the commit should mention that this changes from
> inode_lock_killable() to inode_lock_shared_killable() on the
> underlying dir inode which is a good thing for this scope.
>
> BTW I was reading the git history that led to down_write_killable()
> in this code and I had noticed that commit 3e32715496707
> ("vfs: get rid of old '->iterate' directory operation") has made
> the ovl directory iteration non-killable when promoting the read
> lock on the ovl directory to write lock.
hmmmm....
So the reason that this uses a killable lock is simply because it used
to happen under readdir and readdir uses a killable lock. Is that
right?
So there is no particularly reason that "killable" is important here?
So I could simply change it to use lookup_one_positive() and you
wouldn't mind?
I'd actually like to make all directory/dentry locking killable - I
don't think there is any downside. But I don't want to try pushing that
until my current exercise is finished.
>
> In any case, you may add:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks!
NeilBrown
>
> > ---
> > fs/namei.c | 54 ++++++++++++++++++++++++++++++++++++++++++
> > fs/overlayfs/readdir.c | 28 +++++++++++-----------
> > include/linux/namei.h | 3 +++
> > 3 files changed, 71 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index cd43ff89fbaa..b1bc298b9d7c 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -1827,6 +1827,19 @@ static struct dentry *lookup_slow(const struct qstr *name,
> > return res;
> > }
> >
> > +static struct dentry *lookup_slow_killable(const struct qstr *name,
> > + struct dentry *dir,
> > + unsigned int flags)
> > +{
> > + struct inode *inode = dir->d_inode;
> > + struct dentry *res;
> > + if (inode_lock_shared_killable(inode))
> > + return ERR_PTR(-EINTR);
> > + res = __lookup_slow(name, dir, flags);
> > + inode_unlock_shared(inode);
> > + return res;
> > +}
> > +
> > static inline int may_lookup(struct mnt_idmap *idmap,
> > struct nameidata *restrict nd)
> > {
> > @@ -3010,6 +3023,47 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap, struct qstr *name,
> > }
> > EXPORT_SYMBOL(lookup_one_unlocked);
> >
> > +/**
> > + * lookup_one_positive_killable - lookup single pathname component
> > + * @idmap: idmap of the mount the lookup is performed from
> > + * @name: qstr olding pathname component to lookup
> > + * @base: base directory to lookup from
> > + *
> > + * This helper will yield ERR_PTR(-ENOENT) on negatives. The helper returns
> > + * known positive or ERR_PTR(). This is what most of the users want.
> > + *
> > + * Note that pinned negative with unlocked parent _can_ become positive at any
> > + * time, so callers of lookup_one_unlocked() need to be very careful; pinned
> > + * positives have >d_inode stable, so this one avoids such problems.
> > + *
> > + * This can be used for in-kernel filesystem clients such as file servers.
> > + *
> > + * Ut should be called without the parent i_rwsem held, and will take
>
> Typo: ^^ It
>
> Thanks,
> Amir.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable()
2025-09-08 2:07 ` NeilBrown
@ 2025-09-08 7:09 ` Amir Goldstein
2025-09-09 2:14 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Amir Goldstein @ 2025-09-08 7:09 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
Miklos Szeredi
On Mon, Sep 8, 2025 at 4:07 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Sat, 06 Sep 2025, Amir Goldstein wrote:
> > On Sat, Sep 6, 2025 at 7:00 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > ovl wants a lookup which won't block on a fatal signal.
> > > It currently uses down_write_killable() and then repeated
> > > calls to lookup_one()
> > >
> > > The lock may not be needed if the name is already in the dcache and it
> > > aid proposed future changes if the locking is kept internal to namei.c
> > >
> > > So this patch adds lookup_one_positive_killable() which is like
> > > lookup_one_positive() but will abort in the face of a fatal signal.
> > > overlayfs is changed to use this.
> > >
> > > Signed-off-by: NeilBrown <neil@brown.name>
> >
> > I think the commit should mention that this changes from
> > inode_lock_killable() to inode_lock_shared_killable() on the
> > underlying dir inode which is a good thing for this scope.
> >
> > BTW I was reading the git history that led to down_write_killable()
> > in this code and I had noticed that commit 3e32715496707
> > ("vfs: get rid of old '->iterate' directory operation") has made
> > the ovl directory iteration non-killable when promoting the read
> > lock on the ovl directory to write lock.
>
> hmmmm....
>
> So the reason that this uses a killable lock is simply because it used
> to happen under readdir and readdir uses a killable lock. Is that
> right?
I think the semantics were copied from readdir of that moment yes.
>
> So there is no particularly reason that "killable" is important here?
I can think of some reasons -
Maybe overlayfs (ever user mounted overlayfs) has just one process
accessing it but underlying lower layer is remote fs with many processes
accessing it so chances of lower layer dir lock being held by another thread
are much higher than chances of overlayfs dir lock being held.
> So I could simply change it to use lookup_one_positive() and you
> wouldn't mind?
>
I do mind and prefer that you keep this killable as you patch does.
The more important reason to keep this killable IMO is that we can and
should make overlayfs readdir shared lock one day.
> I'd actually like to make all directory/dentry locking killable - I
> don't think there is any downside. But I don't want to try pushing that
> until my current exercise is finished.
>
The path to making overlayfs readdir shared and killable is
to move the synchronization of ovl readdir cache and
OVL_I(inode)->version from the implicit vfs inode_lock() to
explicit ovl_inode_lock().
The mechanical change is easy.
My concern is from hidden assumptions in the code that
I am not aware of, ones which are not annotated with
inode_is_locked() like ovl_inode_version_get() and
ovl_dir_version_inc() are.
And the fact that noone has yet to complain about overlayfs readdir
scalability makes this conversion non urgent.
If you have other reasons to want to make ovl readdir killable
or shared, we can look into that.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable()
2025-09-08 7:09 ` Amir Goldstein
@ 2025-09-09 2:14 ` NeilBrown
0 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2025-09-09 2:14 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
Miklos Szeredi
On Mon, 08 Sep 2025, Amir Goldstein wrote:
> On Mon, Sep 8, 2025 at 4:07 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > On Sat, 06 Sep 2025, Amir Goldstein wrote:
> > > On Sat, Sep 6, 2025 at 7:00 AM NeilBrown <neilb@ownmail.net> wrote:
> > > >
> > > > From: NeilBrown <neil@brown.name>
> > > >
> > > > ovl wants a lookup which won't block on a fatal signal.
> > > > It currently uses down_write_killable() and then repeated
> > > > calls to lookup_one()
> > > >
> > > > The lock may not be needed if the name is already in the dcache and it
> > > > aid proposed future changes if the locking is kept internal to namei.c
> > > >
> > > > So this patch adds lookup_one_positive_killable() which is like
> > > > lookup_one_positive() but will abort in the face of a fatal signal.
> > > > overlayfs is changed to use this.
> > > >
> > > > Signed-off-by: NeilBrown <neil@brown.name>
> > >
> > > I think the commit should mention that this changes from
> > > inode_lock_killable() to inode_lock_shared_killable() on the
> > > underlying dir inode which is a good thing for this scope.
> > >
> > > BTW I was reading the git history that led to down_write_killable()
> > > in this code and I had noticed that commit 3e32715496707
> > > ("vfs: get rid of old '->iterate' directory operation") has made
> > > the ovl directory iteration non-killable when promoting the read
> > > lock on the ovl directory to write lock.
> >
> > hmmmm....
> >
> > So the reason that this uses a killable lock is simply because it used
> > to happen under readdir and readdir uses a killable lock. Is that
> > right?
>
> I think the semantics were copied from readdir of that moment yes.
>
> >
> > So there is no particularly reason that "killable" is important here?
>
> I can think of some reasons -
> Maybe overlayfs (ever user mounted overlayfs) has just one process
> accessing it but underlying lower layer is remote fs with many processes
> accessing it so chances of lower layer dir lock being held by another thread
> are much higher than chances of overlayfs dir lock being held.
>
> > So I could simply change it to use lookup_one_positive() and you
> > wouldn't mind?
> >
>
> I do mind and prefer that you keep this killable as you patch does.
> The more important reason to keep this killable IMO is that we can and
> should make overlayfs readdir shared lock one day.
Fair enough - I will persist with my current patch. I just wanted to be
sure it was wanted.
>
> > I'd actually like to make all directory/dentry locking killable - I
> > don't think there is any downside. But I don't want to try pushing that
> > until my current exercise is finished.
> >
>
> The path to making overlayfs readdir shared and killable is
> to move the synchronization of ovl readdir cache and
> OVL_I(inode)->version from the implicit vfs inode_lock() to
> explicit ovl_inode_lock().
>
> The mechanical change is easy.
> My concern is from hidden assumptions in the code that
> I am not aware of, ones which are not annotated with
> inode_is_locked() like ovl_inode_version_get() and
> ovl_dir_version_inc() are.
>
> And the fact that noone has yet to complain about overlayfs readdir
> scalability makes this conversion non urgent.
>
> If you have other reasons to want to make ovl readdir killable
> or shared, we can look into that.
readdir locking is not on my radar. We need to keep it to ensure
exclusion with rmdir (as we need a counter of current readdir threads
and there is nowhere else suitable to store a counter even if I wanted
to).
And readdir is already non-exclusive and would not benefit from
asynchrony (as far as I can see), so there is no need to change it.
I might find improvements to ovl readdir locking interesting, but not at
all a priority.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
2025-09-06 4:57 ` [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
@ 2025-09-15 12:10 ` Christian Brauner
2025-09-19 22:54 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Christian Brauner @ 2025-09-15 12:10 UTC (permalink / raw)
To: NeilBrown; +Cc: Alexander Viro, Amir Goldstein, Jan Kara, linux-fsdevel
On Sat, Sep 06, 2025 at 02:57:08PM +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> A rename can only rename within a single mount. Callers of vfs_rename()
> must and do ensure this is the case.
>
> So there is no point in having two mnt_idmaps in renamedata as they are
> always the same. Only one of them is passed to ->rename in any case.
>
> This patch replaces both with a single "mnt_idmap" and changes all
> callers.
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
Hah, thanks. I'm stealing this now.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
2025-09-15 12:10 ` Christian Brauner
@ 2025-09-19 22:54 ` NeilBrown
2025-09-19 23:17 ` Al Viro
0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2025-09-19 22:54 UTC (permalink / raw)
To: Christian Brauner; +Cc: Alexander Viro, Amir Goldstein, Jan Kara, linux-fsdevel
On Mon, 15 Sep 2025, Christian Brauner wrote:
> On Sat, Sep 06, 2025 at 02:57:08PM +1000, NeilBrown wrote:
> > From: NeilBrown <neil@brown.name>
> >
> > A rename can only rename within a single mount. Callers of vfs_rename()
> > must and do ensure this is the case.
> >
> > So there is no point in having two mnt_idmaps in renamedata as they are
> > always the same. Only one of them is passed to ->rename in any case.
> >
> > This patch replaces both with a single "mnt_idmap" and changes all
> > callers.
> >
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
>
> Hah, thanks. I'm stealing this now.
>
I was hoping you would steal the whole series - v3 of it.
https://lore.kernel.org/all/20250915021504.2632889-1-neilb@ownmail.net/
Is there anything preventing that going into vfs.all now?
But maybe I misunderstood what you meant by "stealing" - I don't even
see this one patch in vfs.all.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
2025-09-19 22:54 ` NeilBrown
@ 2025-09-19 23:17 ` Al Viro
2025-09-19 23:44 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-09-19 23:17 UTC (permalink / raw)
To: NeilBrown; +Cc: Christian Brauner, Amir Goldstein, Jan Kara, linux-fsdevel
On Sat, Sep 20, 2025 at 08:54:36AM +1000, NeilBrown wrote:
> On Mon, 15 Sep 2025, Christian Brauner wrote:
> > On Sat, Sep 06, 2025 at 02:57:08PM +1000, NeilBrown wrote:
> > > From: NeilBrown <neil@brown.name>
> > >
> > > A rename can only rename within a single mount. Callers of vfs_rename()
> > > must and do ensure this is the case.
> > >
> > > So there is no point in having two mnt_idmaps in renamedata as they are
> > > always the same. Only one of them is passed to ->rename in any case.
> > >
> > > This patch replaces both with a single "mnt_idmap" and changes all
> > > callers.
> > >
> > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > Signed-off-by: NeilBrown <neil@brown.name>
> > > ---
> >
> > Hah, thanks. I'm stealing this now.
> >
>
> I was hoping you would steal the whole series - v3 of it.
>
> https://lore.kernel.org/all/20250915021504.2632889-1-neilb@ownmail.net/
>
> Is there anything preventing that going into vfs.all now?
1/6, perhaps?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
2025-09-19 23:17 ` Al Viro
@ 2025-09-19 23:44 ` NeilBrown
0 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2025-09-19 23:44 UTC (permalink / raw)
To: Al Viro; +Cc: Christian Brauner, Amir Goldstein, Jan Kara, linux-fsdevel
On Sat, 20 Sep 2025, Al Viro wrote:
> On Sat, Sep 20, 2025 at 08:54:36AM +1000, NeilBrown wrote:
> > On Mon, 15 Sep 2025, Christian Brauner wrote:
> > > On Sat, Sep 06, 2025 at 02:57:08PM +1000, NeilBrown wrote:
> > > > From: NeilBrown <neil@brown.name>
> > > >
> > > > A rename can only rename within a single mount. Callers of vfs_rename()
> > > > must and do ensure this is the case.
> > > >
> > > > So there is no point in having two mnt_idmaps in renamedata as they are
> > > > always the same. Only one of them is passed to ->rename in any case.
> > > >
> > > > This patch replaces both with a single "mnt_idmap" and changes all
> > > > callers.
> > > >
> > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > > Signed-off-by: NeilBrown <neil@brown.name>
> > > > ---
> > >
> > > Hah, thanks. I'm stealing this now.
> > >
> >
> > I was hoping you would steal the whole series - v3 of it.
> >
> > https://lore.kernel.org/all/20250915021504.2632889-1-neilb@ownmail.net/
> >
> > Is there anything preventing that going into vfs.all now?
>
> 1/6, perhaps?
>
Why might that patch be a barrier?
The only concern that has been raise is your question about the locking
environment for the list walk and I answered that - the list is
thread-local and doesn't need locking.
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-09-19 23:44 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-06 4:57 [PATCH 0/6] VFS: more prep for change to directory locking NeilBrown
2025-09-06 4:57 ` [PATCH 1/6] fs/proc: Don't look root inode when creating "self" and "thread-self" NeilBrown
2025-09-06 5:50 ` Al Viro
2025-09-06 5:52 ` Al Viro
2025-09-06 6:08 ` NeilBrown
2025-09-06 4:57 ` [PATCH 2/6] VFS/ovl: add lookup_one_positive_killable() NeilBrown
2025-09-06 9:44 ` Amir Goldstein
2025-09-08 2:07 ` NeilBrown
2025-09-08 7:09 ` Amir Goldstein
2025-09-09 2:14 ` NeilBrown
2025-09-06 4:57 ` [PATCH 3/6] VFS: discard err2 in filename_create() NeilBrown
2025-09-06 4:57 ` [PATCH 4/6] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
2025-09-15 12:10 ` Christian Brauner
2025-09-19 22:54 ` NeilBrown
2025-09-19 23:17 ` Al Viro
2025-09-19 23:44 ` NeilBrown
2025-09-06 4:57 ` [PATCH 5/6] VFS/audit: introduce kern_path_parent() for audit NeilBrown
2025-09-07 2:17 ` kernel test robot
2025-09-06 4:57 ` [PATCH 6/6] VFS: rename kern_path_locked() to kern_path_removing() NeilBrown
2025-09-06 9:09 ` Amir Goldstein
2025-09-06 9:29 ` NeilBrown
2025-09-06 10:35 ` Amir Goldstein
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).