* [PATCH v2 0/8] VFS: more prep for change to directory locking
@ 2025-09-09 4:43 NeilBrown
2025-09-09 4:43 ` [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable() NeilBrown
` (6 more replies)
0 siblings, 7 replies; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
This is a revised selection of cleanups and API renaming which continues
my work to centralise locking of create/remove/rename operations.
I've removed a procfs cleanup which conflicts slightly with some of Al's
work, and added more renaming, and have added some more places that use
simple_start_creating().
Thanks,
NeilBrown
[PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable()
[PATCH v2 2/7] VFS: discard err2 in filename_create()
[PATCH v2 3/7] VFS: unify old_mnt_idmap and new_mnt_idmap in
[PATCH v2 4/7] VFS/audit: introduce kern_path_parent() for audit
[PATCH v2 5/7] VFS: rename kern_path_locked() and related functions.
[PATCH v2 6/7] VFS: introduce simple_end_creating() and
[PATCH v2 7/7] Use simple_start_creating() in various places.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable()
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
@ 2025-09-09 4:43 ` NeilBrown
2025-09-11 20:05 ` Al Viro
2025-09-09 4:43 ` [PATCH v2 2/7] VFS: discard err2 in filename_create() NeilBrown
` (5 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
ovl wants a lookup which won't block on a fatal signal. It currently
uses down_write_killable() and then repeatedly calls to lookup_one()
The lock may not be needed if the name is already in the dcache and it
aids proposed future changes if the locking is kept internal to namei.c
So this patch adds lookup_one_positive_killable() which is like
lookup_one_positive() but will abort in the face of a fatal signal.
overlayfs is changed to use this.
Note that instead of always getting an exclusive lock, ovl now only gets
a shared lock, and only sometimes. The exclusive lock was never needed.
However down_read_killable() was only added in v4.15 but overlayfs started
using down_write_killable() here in v4.7.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 55 ++++++++++++++++++++++++++++++++++++++++++
fs/overlayfs/readdir.c | 28 ++++++++++-----------
include/linux/namei.h | 3 +++
3 files changed, 72 insertions(+), 14 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index cd43ff89fbaa..c7c6b255db2c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1827,6 +1827,20 @@ static struct dentry *lookup_slow(const struct qstr *name,
return res;
}
+static struct dentry *lookup_slow_killable(const struct qstr *name,
+ struct dentry *dir,
+ unsigned int flags)
+{
+ struct inode *inode = dir->d_inode;
+ struct dentry *res;
+
+ if (inode_lock_shared_killable(inode))
+ return ERR_PTR(-EINTR);
+ res = __lookup_slow(name, dir, flags);
+ inode_unlock_shared(inode);
+ return res;
+}
+
static inline int may_lookup(struct mnt_idmap *idmap,
struct nameidata *restrict nd)
{
@@ -3010,6 +3024,47 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap, struct qstr *name,
}
EXPORT_SYMBOL(lookup_one_unlocked);
+/**
+ * lookup_one_positive_killable - lookup single pathname component
+ * @idmap: idmap of the mount the lookup is performed from
+ * @name: qstr olding pathname component to lookup
+ * @base: base directory to lookup from
+ *
+ * This helper will yield ERR_PTR(-ENOENT) on negatives. The helper returns
+ * known positive or ERR_PTR(). This is what most of the users want.
+ *
+ * Note that pinned negative with unlocked parent _can_ become positive at any
+ * time, so callers of lookup_one_unlocked() need to be very careful; pinned
+ * positives have >d_inode stable, so this one avoids such problems.
+ *
+ * This can be used for in-kernel filesystem clients such as file servers.
+ *
+ * It should be called without the parent i_rwsem held, and will take
+ * the i_rwsem itself if necessary. If a fatal signal is pending or
+ * delivered, it will return %-EINTR if the lock is needed.
+ */
+struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
+ struct qstr *name,
+ struct dentry *base)
+{
+ int err;
+ struct dentry *ret;
+
+ err = lookup_one_common(idmap, name, base);
+ if (err)
+ return ERR_PTR(err);
+
+ ret = lookup_dcache(name, base, 0);
+ if (!ret)
+ ret = lookup_slow_killable(name, base, 0);
+ if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
+ dput(ret);
+ ret = ERR_PTR(-ENOENT);
+ }
+ return ret;
+}
+EXPORT_SYMBOL(lookup_one_positive_killable);
+
/**
* lookup_one_positive_unlocked - lookup single pathname component
* @idmap: idmap of the mount the lookup is performed from
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index b65cdfce31ce..15cb06fa0c9a 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -270,26 +270,26 @@ static bool ovl_fill_merge(struct dir_context *ctx, const char *name,
static int ovl_check_whiteouts(const struct path *path, struct ovl_readdir_data *rdd)
{
- int err;
+ int err = 0;
struct dentry *dentry, *dir = path->dentry;
const struct cred *old_cred;
old_cred = ovl_override_creds(rdd->dentry->d_sb);
- err = down_write_killable(&dir->d_inode->i_rwsem);
- if (!err) {
- while (rdd->first_maybe_whiteout) {
- struct ovl_cache_entry *p =
- rdd->first_maybe_whiteout;
- rdd->first_maybe_whiteout = p->next_maybe_whiteout;
- dentry = lookup_one(mnt_idmap(path->mnt),
- &QSTR_LEN(p->name, p->len), dir);
- if (!IS_ERR(dentry)) {
- p->is_whiteout = ovl_is_whiteout(dentry);
- dput(dentry);
- }
+ while (rdd->first_maybe_whiteout) {
+ struct ovl_cache_entry *p =
+ rdd->first_maybe_whiteout;
+ rdd->first_maybe_whiteout = p->next_maybe_whiteout;
+ dentry = lookup_one_positive_killable(mnt_idmap(path->mnt),
+ &QSTR_LEN(p->name, p->len),
+ dir);
+ if (!IS_ERR(dentry)) {
+ p->is_whiteout = ovl_is_whiteout(dentry);
+ dput(dentry);
+ } else if (PTR_ERR(dentry) == -EINTR) {
+ err = -EINTR;
+ break;
}
- inode_unlock(dir->d_inode);
}
ovl_revert_creds(old_cred);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 5d085428e471..551a1a01e5e7 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -80,6 +80,9 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
struct qstr *name,
struct dentry *base);
+struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
+ struct qstr *name,
+ struct dentry *base);
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 2/7] VFS: discard err2 in filename_create()
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
2025-09-09 4:43 ` [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable() NeilBrown
@ 2025-09-09 4:43 ` NeilBrown
2025-09-09 11:24 ` Jeff Layton
2025-09-09 4:43 ` [PATCH v2 3/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
` (4 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
Since 204a575e91f3 "VFS: add common error checks to lookup_one_qstr_excl()"
filename_create() does not need to stash the error value from mnt_want_write()
into a separate variable - the logic that used to clobber 'error' after the
call of mnt_want_write() has migrated into lookup_one_qstr_excl().
So there is no need for two different err variables.
This patch discards "err2" and uses "error' throughout.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index c7c6b255db2c..e2c2ab286bc0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4169,7 +4169,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
int type;
- int err2;
int error;
error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
@@ -4184,7 +4183,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
goto out;
/* don't fail immediately if it's r/o, at least try to report other errors */
- err2 = mnt_want_write(path->mnt);
+ error = mnt_want_write(path->mnt);
/*
* Do the final lookup. Suppress 'create' if there is a trailing
* '/', and a directory wasn't requested.
@@ -4197,17 +4196,16 @@ static struct dentry *filename_create(int dfd, struct filename *name,
if (IS_ERR(dentry))
goto unlock;
- if (unlikely(err2)) {
- error = err2;
+ if (unlikely(error))
goto fail;
- }
+
return dentry;
fail:
dput(dentry);
dentry = ERR_PTR(error);
unlock:
inode_unlock(path->dentry->d_inode);
- if (!err2)
+ if (!error)
mnt_drop_write(path->mnt);
out:
path_put(path);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 3/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
2025-09-09 4:43 ` [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable() NeilBrown
2025-09-09 4:43 ` [PATCH v2 2/7] VFS: discard err2 in filename_create() NeilBrown
@ 2025-09-09 4:43 ` NeilBrown
2025-09-09 4:43 ` [PATCH v2 4/7] VFS/audit: introduce kern_path_parent() for audit NeilBrown
` (3 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
A rename operation can only rename within a single mount. Callers of
vfs_rename() must and do ensure this is the case.
So there is no point in having two mnt_idmaps in renamedata as they are
always the same. Only one of them is passed to ->rename in any case.
This patch replaces both with a single "mnt_idmap" and changes all
callers.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/namei.c | 3 +--
fs/ecryptfs/inode.c | 3 +--
fs/namei.c | 17 ++++++++---------
fs/nfsd/vfs.c | 3 +--
fs/overlayfs/overlayfs.h | 3 +--
fs/smb/server/vfs.c | 3 +--
include/linux/fs.h | 6 ++----
7 files changed, 15 insertions(+), 23 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 91dfd0231877..d1edb2ac3837 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -387,10 +387,9 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
cachefiles_io_error(cache, "Rename security error %d", ret);
} else {
struct renamedata rd = {
- .old_mnt_idmap = &nop_mnt_idmap,
+ .mnt_idmap = &nop_mnt_idmap,
.old_parent = dir,
.old_dentry = rep,
- .new_mnt_idmap = &nop_mnt_idmap,
.new_parent = cache->graveyard,
.new_dentry = grave,
};
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 72fbe1316ab8..abd954c6a14e 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -634,10 +634,9 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
goto out_lock;
}
- rd.old_mnt_idmap = &nop_mnt_idmap;
+ rd.mnt_idmap = &nop_mnt_idmap;
rd.old_parent = lower_old_dir_dentry;
rd.old_dentry = lower_old_dentry;
- rd.new_mnt_idmap = &nop_mnt_idmap;
rd.new_parent = lower_new_dir_dentry;
rd.new_dentry = lower_new_dentry;
rc = vfs_rename(&rd);
diff --git a/fs/namei.c b/fs/namei.c
index e2c2ab286bc0..180037b96956 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5077,20 +5077,20 @@ int vfs_rename(struct renamedata *rd)
if (source == target)
return 0;
- error = may_delete(rd->old_mnt_idmap, old_dir, old_dentry, is_dir);
+ error = may_delete(rd->mnt_idmap, old_dir, old_dentry, is_dir);
if (error)
return error;
if (!target) {
- error = may_create(rd->new_mnt_idmap, new_dir, new_dentry);
+ error = may_create(rd->mnt_idmap, new_dir, new_dentry);
} else {
new_is_dir = d_is_dir(new_dentry);
if (!(flags & RENAME_EXCHANGE))
- error = may_delete(rd->new_mnt_idmap, new_dir,
+ error = may_delete(rd->mnt_idmap, new_dir,
new_dentry, is_dir);
else
- error = may_delete(rd->new_mnt_idmap, new_dir,
+ error = may_delete(rd->mnt_idmap, new_dir,
new_dentry, new_is_dir);
}
if (error)
@@ -5105,13 +5105,13 @@ int vfs_rename(struct renamedata *rd)
*/
if (new_dir != old_dir) {
if (is_dir) {
- error = inode_permission(rd->old_mnt_idmap, source,
+ error = inode_permission(rd->mnt_idmap, source,
MAY_WRITE);
if (error)
return error;
}
if ((flags & RENAME_EXCHANGE) && new_is_dir) {
- error = inode_permission(rd->new_mnt_idmap, target,
+ error = inode_permission(rd->mnt_idmap, target,
MAY_WRITE);
if (error)
return error;
@@ -5179,7 +5179,7 @@ int vfs_rename(struct renamedata *rd)
if (error)
goto out;
}
- error = old_dir->i_op->rename(rd->new_mnt_idmap, old_dir, old_dentry,
+ error = old_dir->i_op->rename(rd->mnt_idmap, old_dir, old_dentry,
new_dir, new_dentry, flags);
if (error)
goto out;
@@ -5322,10 +5322,9 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
rd.old_parent = old_path.dentry;
rd.old_dentry = old_dentry;
- rd.old_mnt_idmap = mnt_idmap(old_path.mnt);
+ rd.mnt_idmap = mnt_idmap(old_path.mnt);
rd.new_parent = new_path.dentry;
rd.new_dentry = new_dentry;
- rd.new_mnt_idmap = mnt_idmap(new_path.mnt);
rd.delegated_inode = &delegated_inode;
rd.flags = flags;
error = vfs_rename(&rd);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index edf050766e57..aa4a95713a48 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1951,10 +1951,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
goto out_dput_old;
} else {
struct renamedata rd = {
- .old_mnt_idmap = &nop_mnt_idmap,
+ .mnt_idmap = &nop_mnt_idmap,
.old_parent = fdentry,
.old_dentry = odentry,
- .new_mnt_idmap = &nop_mnt_idmap,
.new_parent = tdentry,
.new_dentry = ndentry,
};
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index bb0d7ded8e76..4f84abaa0d68 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -361,10 +361,9 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
{
int err;
struct renamedata rd = {
- .old_mnt_idmap = ovl_upper_mnt_idmap(ofs),
+ .mnt_idmap = ovl_upper_mnt_idmap(ofs),
.old_parent = olddir,
.old_dentry = olddentry,
- .new_mnt_idmap = ovl_upper_mnt_idmap(ofs),
.new_parent = newdir,
.new_dentry = newdentry,
.flags = flags,
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 04539037108c..07739055ac9f 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -770,10 +770,9 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
goto out4;
}
- rd.old_mnt_idmap = mnt_idmap(old_path->mnt),
+ rd.mnt_idmap = mnt_idmap(old_path->mnt),
rd.old_parent = old_parent,
rd.old_dentry = old_child,
- rd.new_mnt_idmap = mnt_idmap(new_path.mnt),
rd.new_parent = new_path.dentry,
rd.new_dentry = new_dentry,
rd.flags = flags,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..73b39e5bb9e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2008,20 +2008,18 @@ int vfs_unlink(struct mnt_idmap *, struct inode *, struct dentry *,
/**
* struct renamedata - contains all information required for renaming
- * @old_mnt_idmap: idmap of the old mount the inode was found from
+ * @mnt_idmap: idmap of the mount in which the rename is happening.
* @old_parent: parent of source
* @old_dentry: source
- * @new_mnt_idmap: idmap of the new mount the inode was found from
* @new_parent: parent of destination
* @new_dentry: destination
* @delegated_inode: returns an inode needing a delegation break
* @flags: rename flags
*/
struct renamedata {
- struct mnt_idmap *old_mnt_idmap;
+ struct mnt_idmap *mnt_idmap;
struct dentry *old_parent;
struct dentry *old_dentry;
- struct mnt_idmap *new_mnt_idmap;
struct dentry *new_parent;
struct dentry *new_dentry;
struct inode **delegated_inode;
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 4/7] VFS/audit: introduce kern_path_parent() for audit
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
` (2 preceding siblings ...)
2025-09-09 4:43 ` [PATCH v2 3/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
@ 2025-09-09 4:43 ` NeilBrown
2025-09-09 4:43 ` [PATCH v2 5/7] VFS: rename kern_path_locked() and related functions NeilBrown
` (2 subsequent siblings)
6 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
audit_alloc_mark() and audit_get_nd() both need to perform a path
lookup getting the parent dentry (which must exist) and the final
target (following a LAST_NORM name) which sometimes doesn't need to
exist.
They don't need the parent to be locked, but use kern_path_locked() or
kern_path_locked_negative() anyway. This is somewhat misleading to the
casual reader.
This patch introduces a more targeted function, kern_path_parent(),
which returns not holding locks. On success the "path" will
be set to the parent, which must be found, and the return value is the
dentry of the target, which might be negative.
This will clear the way to rename kern_path_locked() which is
otherwise only used to prepare for removing something.
It also allows us to remove kern_path_locked_negative(), which is
transformed into the new kern_path_parent().
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 23 +++++++++++++++++------
include/linux/namei.h | 2 +-
kernel/audit_fsnotify.c | 11 ++++++-----
kernel/audit_watch.c | 3 +--
4 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 180037b96956..4017bc8641d3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2781,7 +2781,20 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
return d;
}
-struct dentry *kern_path_locked_negative(const char *name, struct path *path)
+/**
+ * kern_path_parent: lookup path returning parent and target
+ * @name: path name
+ * @path: path to store parent in
+ *
+ * The path @name should end with a normal component, not "." or ".." or "/".
+ * A lookup is performed and if successful the parent information
+ * is store in @parent and the dentry is returned.
+ *
+ * The dentry maybe negative, the parent will be positive.
+ *
+ * Returns: dentry or error.
+ */
+struct dentry *kern_path_parent(const char *name, struct path *path)
{
struct path parent_path __free(path_put) = {};
struct filename *filename __free(putname) = getname_kernel(name);
@@ -2794,12 +2807,10 @@ struct dentry *kern_path_locked_negative(const char *name, struct path *path)
return ERR_PTR(error);
if (unlikely(type != LAST_NORM))
return ERR_PTR(-EINVAL);
- inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
- d = lookup_one_qstr_excl(&last, parent_path.dentry, LOOKUP_CREATE);
- if (IS_ERR(d)) {
- inode_unlock(parent_path.dentry->d_inode);
+
+ d = lookup_noperm_unlocked(&last, parent_path.dentry);
+ if (IS_ERR(d))
return d;
- }
path->dentry = no_free_ptr(parent_path.dentry);
path->mnt = no_free_ptr(parent_path.mnt);
return d;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 551a1a01e5e7..1d5038c21c20 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -57,12 +57,12 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
struct dentry *base,
unsigned int flags);
extern int kern_path(const char *, unsigned, struct path *);
+struct dentry *kern_path_parent(const char *name, struct path *parent);
extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
extern void done_path_create(struct path *, struct dentry *);
extern struct dentry *kern_path_locked(const char *, struct path *);
-extern struct dentry *kern_path_locked_negative(const char *, struct path *);
extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
struct path *parent, struct qstr *last, int *type,
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index c565fbf66ac8..b92805b317a2 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -76,17 +76,18 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
struct audit_fsnotify_mark *audit_mark;
struct path path;
struct dentry *dentry;
- struct inode *inode;
int ret;
if (pathname[0] != '/' || pathname[len-1] == '/')
return ERR_PTR(-EINVAL);
- dentry = kern_path_locked(pathname, &path);
+ dentry = kern_path_parent(pathname, &path);
if (IS_ERR(dentry))
return ERR_CAST(dentry); /* returning an error */
- inode = path.dentry->d_inode;
- inode_unlock(inode);
+ if (d_really_is_negative(dentry)) {
+ audit_mark = ERR_PTR(-ENOENT);
+ goto out;
+ }
audit_mark = kzalloc(sizeof(*audit_mark), GFP_KERNEL);
if (unlikely(!audit_mark)) {
@@ -100,7 +101,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
audit_update_mark(audit_mark, dentry->d_inode);
audit_mark->rule = krule;
- ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0);
+ ret = fsnotify_add_inode_mark(&audit_mark->mark, path.dentry->d_inode, 0);
if (ret < 0) {
audit_mark->path = NULL;
fsnotify_put_mark(&audit_mark->mark);
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 0ebbbe37a60f..a700e3c8925f 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -349,7 +349,7 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
{
struct dentry *d;
- d = kern_path_locked_negative(watch->path, parent);
+ d = kern_path_parent(watch->path, parent);
if (IS_ERR(d))
return PTR_ERR(d);
@@ -359,7 +359,6 @@ static int audit_get_nd(struct audit_watch *watch, struct path *parent)
watch->ino = d_backing_inode(d)->i_ino;
}
- inode_unlock(d_backing_inode(parent->dentry));
dput(d);
return 0;
}
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 5/7] VFS: rename kern_path_locked() and related functions.
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
` (3 preceding siblings ...)
2025-09-09 4:43 ` [PATCH v2 4/7] VFS/audit: introduce kern_path_parent() for audit NeilBrown
@ 2025-09-09 4:43 ` NeilBrown
2025-09-09 14:07 ` Amir Goldstein
2025-09-09 4:43 ` [PATCH v2 6/7] VFS: introduce simple_end_creating() and simple_failed_creating() NeilBrown
2025-09-09 4:43 ` [PATCH v2 7/7] Use simple_start_creating() in various places NeilBrown
6 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
kern_path_locked() is now only used to prepare for removing an object
from the filesystem (and that is the only credible reason for wanting a
positive locked dentry). Thus it corresponds to kern_path_create() and
so should have a corresponding name.
Unfortunately the name "kern_path_create" is somewhat misleading as it
doesn't actually create anything. The recently added
simple_start_creating() provides a better pattern I believe. The
"start" can be matched with "end" to bracket the creating or removing.
So this patch changes names:
kern_path_locked -> start_removing_path
kern_path_create -> start_creating_path
user_path_create -> start_creating_user_path
user_path_locked_at -> start_removing_user_path_at
done_path_create -> end_creating_path
and also introduces end_removing_path() which is identical to
end_creating_path().
__start_removing_path (which was __kern_path_locked) is enhanced to
call mnt_want_write() for consistency with the start_creating_path().
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 12 ++++
arch/powerpc/platforms/cell/spufs/syscalls.c | 4 +-
drivers/base/devtmpfs.c | 22 +++-----
fs/bcachefs/fs-ioctl.c | 10 ++--
fs/init.c | 17 +++---
fs/namei.c | 58 ++++++++++++--------
fs/ocfs2/refcounttree.c | 4 +-
fs/smb/server/vfs.c | 8 +--
include/linux/namei.h | 14 +++--
kernel/bpf/inode.c | 4 +-
net/unix/af_unix.c | 6 +-
11 files changed, 92 insertions(+), 67 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 85f590254f07..e0494860be6b 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1285,3 +1285,15 @@ rather than a VMA, as the VMA at this stage is not yet valid.
The vm_area_desc provides the minimum required information for a filesystem
to initialise state upon memory mapping of a file-backed region, and output
parameters for the file system to set this state.
+
+---
+
+**mandatory**
+
+Several functions are renamed:
+
+- kern_path_locked -> start_removing_path
+- kern_path_create -> start_creating_path
+- user_path_create -> start_creating_user_path
+- user_path_locked_at -> start_removing_user_path_at
+- done_path_create -> end_creating_path
diff --git a/arch/powerpc/platforms/cell/spufs/syscalls.c b/arch/powerpc/platforms/cell/spufs/syscalls.c
index 157e046e6e93..ea4ba1b6ce6a 100644
--- a/arch/powerpc/platforms/cell/spufs/syscalls.c
+++ b/arch/powerpc/platforms/cell/spufs/syscalls.c
@@ -67,11 +67,11 @@ static long do_spu_create(const char __user *pathname, unsigned int flags,
struct dentry *dentry;
int ret;
- dentry = user_path_create(AT_FDCWD, pathname, &path, LOOKUP_DIRECTORY);
+ dentry = start_creating_user_path(AT_FDCWD, pathname, &path, LOOKUP_DIRECTORY);
ret = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
ret = spufs_create(&path, dentry, flags, mode, neighbor);
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
}
return ret;
diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 31bfb3194b4c..9d4e46ad8352 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -176,7 +176,7 @@ static int dev_mkdir(const char *name, umode_t mode)
struct dentry *dentry;
struct path path;
- dentry = kern_path_create(AT_FDCWD, name, &path, LOOKUP_DIRECTORY);
+ dentry = start_creating_path(AT_FDCWD, name, &path, LOOKUP_DIRECTORY);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
@@ -184,7 +184,7 @@ static int dev_mkdir(const char *name, umode_t mode)
if (!IS_ERR(dentry))
/* mark as kernel-created inode */
d_inode(dentry)->i_private = &thread;
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
return PTR_ERR_OR_ZERO(dentry);
}
@@ -222,10 +222,10 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
struct path path;
int err;
- dentry = kern_path_create(AT_FDCWD, nodename, &path, 0);
+ dentry = start_creating_path(AT_FDCWD, nodename, &path, 0);
if (dentry == ERR_PTR(-ENOENT)) {
create_path(nodename);
- dentry = kern_path_create(AT_FDCWD, nodename, &path, 0);
+ dentry = start_creating_path(AT_FDCWD, nodename, &path, 0);
}
if (IS_ERR(dentry))
return PTR_ERR(dentry);
@@ -246,7 +246,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
/* mark as kernel-created inode */
d_inode(dentry)->i_private = &thread;
}
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
return err;
}
@@ -256,7 +256,7 @@ static int dev_rmdir(const char *name)
struct dentry *dentry;
int err;
- dentry = kern_path_locked(name, &parent);
+ dentry = start_removing_path(name, &parent);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
if (d_inode(dentry)->i_private == &thread)
@@ -265,9 +265,7 @@ static int dev_rmdir(const char *name)
else
err = -EPERM;
- dput(dentry);
- inode_unlock(d_inode(parent.dentry));
- path_put(&parent);
+ end_removing_path(&parent, dentry);
return err;
}
@@ -325,7 +323,7 @@ static int handle_remove(const char *nodename, struct device *dev)
int deleted = 0;
int err = 0;
- dentry = kern_path_locked(nodename, &parent);
+ dentry = start_removing_path(nodename, &parent);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
@@ -349,10 +347,8 @@ static int handle_remove(const char *nodename, struct device *dev)
if (!err || err == -ENOENT)
deleted = 1;
}
- dput(dentry);
- inode_unlock(d_inode(parent.dentry));
+ end_removing_path(&parent, dentry);
- path_put(&parent);
if (deleted && strchr(nodename, '/'))
delete_path(nodename);
return err;
diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
index 4e72e654da96..43510da5e734 100644
--- a/fs/bcachefs/fs-ioctl.c
+++ b/fs/bcachefs/fs-ioctl.c
@@ -255,7 +255,7 @@ static long bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
snapshot_src = inode_inum(to_bch_ei(src_path.dentry->d_inode));
}
- dst_dentry = user_path_create(arg.dirfd,
+ dst_dentry = start_creating_user_path(arg.dirfd,
(const char __user *)(unsigned long)arg.dst_ptr,
&dst_path, lookup_flags);
error = PTR_ERR_OR_ZERO(dst_dentry);
@@ -314,7 +314,7 @@ static long bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
d_instantiate(dst_dentry, &inode->v);
fsnotify_mkdir(dir, dst_dentry);
err3:
- done_path_create(&dst_path, dst_dentry);
+ end_creating_path(&dst_path, dst_dentry);
err2:
if (arg.src_ptr)
path_put(&src_path);
@@ -334,7 +334,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
if (arg.flags)
return -EINVAL;
- victim = user_path_locked_at(arg.dirfd, name, &path);
+ victim = start_removing_user_path_at(arg.dirfd, name, &path);
if (IS_ERR(victim))
return PTR_ERR(victim);
@@ -351,9 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
d_invalidate(victim);
}
err:
- inode_unlock(dir);
- dput(victim);
- path_put(&path);
+ end_removing_path(&path, victim);
return ret;
}
diff --git a/fs/init.c b/fs/init.c
index eef5124885e3..07f592ccdba8 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -149,7 +149,7 @@ int __init init_mknod(const char *filename, umode_t mode, unsigned int dev)
else if (!(S_ISBLK(mode) || S_ISCHR(mode)))
return -EINVAL;
- dentry = kern_path_create(AT_FDCWD, filename, &path, 0);
+ dentry = start_creating_path(AT_FDCWD, filename, &path, 0);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
@@ -158,7 +158,7 @@ int __init init_mknod(const char *filename, umode_t mode, unsigned int dev)
if (!error)
error = vfs_mknod(mnt_idmap(path.mnt), path.dentry->d_inode,
dentry, mode, new_decode_dev(dev));
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
return error;
}
@@ -173,7 +173,7 @@ int __init init_link(const char *oldname, const char *newname)
if (error)
return error;
- new_dentry = kern_path_create(AT_FDCWD, newname, &new_path, 0);
+ new_dentry = start_creating_path(AT_FDCWD, newname, &new_path, 0);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto out;
@@ -191,7 +191,7 @@ int __init init_link(const char *oldname, const char *newname)
error = vfs_link(old_path.dentry, idmap, new_path.dentry->d_inode,
new_dentry, NULL);
out_dput:
- done_path_create(&new_path, new_dentry);
+ end_creating_path(&new_path, new_dentry);
out:
path_put(&old_path);
return error;
@@ -203,14 +203,14 @@ int __init init_symlink(const char *oldname, const char *newname)
struct path path;
int error;
- dentry = kern_path_create(AT_FDCWD, newname, &path, 0);
+ dentry = start_creating_path(AT_FDCWD, newname, &path, 0);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
error = security_path_symlink(&path, dentry, oldname);
if (!error)
error = vfs_symlink(mnt_idmap(path.mnt), path.dentry->d_inode,
dentry, oldname);
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
return error;
}
@@ -225,7 +225,8 @@ int __init init_mkdir(const char *pathname, umode_t mode)
struct path path;
int error;
- dentry = kern_path_create(AT_FDCWD, pathname, &path, LOOKUP_DIRECTORY);
+ dentry = start_creating_path(AT_FDCWD, pathname, &path,
+ LOOKUP_DIRECTORY);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
mode = mode_strip_umask(d_inode(path.dentry), mode);
@@ -236,7 +237,7 @@ int __init init_mkdir(const char *pathname, umode_t mode)
if (IS_ERR(dentry))
error = PTR_ERR(dentry);
}
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
return error;
}
diff --git a/fs/namei.c b/fs/namei.c
index 4017bc8641d3..ee693d16086e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2758,7 +2758,8 @@ static int filename_parentat(int dfd, struct filename *name,
}
/* does lookup, returns the object with parent locked */
-static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct path *path)
+static struct dentry *__start_removing_path(int dfd, struct filename *name,
+ struct path *path)
{
struct path parent_path __free(path_put) = {};
struct dentry *d;
@@ -2770,15 +2771,25 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
return ERR_PTR(error);
if (unlikely(type != LAST_NORM))
return ERR_PTR(-EINVAL);
+ error = mnt_want_write(path->mnt);
inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
- if (IS_ERR(d)) {
- inode_unlock(parent_path.dentry->d_inode);
- return d;
- }
+ if (IS_ERR(d))
+ goto unlock;
+ if (error)
+ goto fail;
path->dentry = no_free_ptr(parent_path.dentry);
path->mnt = no_free_ptr(parent_path.mnt);
return d;
+
+fail:
+ dput(d);
+ d = ERR_PTR(error);
+unlock:
+ inode_unlock(parent_path.dentry->d_inode);
+ if (!error)
+ mnt_drop_write(path->mnt);
+ return d;
}
/**
@@ -2816,24 +2827,26 @@ struct dentry *kern_path_parent(const char *name, struct path *path)
return d;
}
-struct dentry *kern_path_locked(const char *name, struct path *path)
+struct dentry *start_removing_path(const char *name, struct path *path)
{
struct filename *filename = getname_kernel(name);
- struct dentry *res = __kern_path_locked(AT_FDCWD, filename, path);
+ struct dentry *res = __start_removing_path(AT_FDCWD, filename, path);
putname(filename);
return res;
}
-struct dentry *user_path_locked_at(int dfd, const char __user *name, struct path *path)
+struct dentry *start_removing_user_path_at(int dfd,
+ const char __user *name,
+ struct path *path)
{
struct filename *filename = getname(name);
- struct dentry *res = __kern_path_locked(dfd, filename, path);
+ struct dentry *res = __start_removing_path(dfd, filename, path);
putname(filename);
return res;
}
-EXPORT_SYMBOL(user_path_locked_at);
+EXPORT_SYMBOL(start_removing_user_path_at);
int kern_path(const char *name, unsigned int flags, struct path *path)
{
@@ -4223,8 +4236,8 @@ static struct dentry *filename_create(int dfd, struct filename *name,
return dentry;
}
-struct dentry *kern_path_create(int dfd, const char *pathname,
- struct path *path, unsigned int lookup_flags)
+struct dentry *start_creating_path(int dfd, const char *pathname,
+ struct path *path, unsigned int lookup_flags)
{
struct filename *filename = getname_kernel(pathname);
struct dentry *res = filename_create(dfd, filename, path, lookup_flags);
@@ -4232,9 +4245,9 @@ struct dentry *kern_path_create(int dfd, const char *pathname,
putname(filename);
return res;
}
-EXPORT_SYMBOL(kern_path_create);
+EXPORT_SYMBOL(start_creating_path);
-void done_path_create(struct path *path, struct dentry *dentry)
+void end_creating_path(struct path *path, struct dentry *dentry)
{
if (!IS_ERR(dentry))
dput(dentry);
@@ -4242,10 +4255,11 @@ void done_path_create(struct path *path, struct dentry *dentry)
mnt_drop_write(path->mnt);
path_put(path);
}
-EXPORT_SYMBOL(done_path_create);
+EXPORT_SYMBOL(end_creating_path);
-inline struct dentry *user_path_create(int dfd, const char __user *pathname,
- struct path *path, unsigned int lookup_flags)
+inline struct dentry *start_creating_user_path(
+ int dfd, const char __user *pathname,
+ struct path *path, unsigned int lookup_flags)
{
struct filename *filename = getname(pathname);
struct dentry *res = filename_create(dfd, filename, path, lookup_flags);
@@ -4253,7 +4267,7 @@ inline struct dentry *user_path_create(int dfd, const char __user *pathname,
putname(filename);
return res;
}
-EXPORT_SYMBOL(user_path_create);
+EXPORT_SYMBOL(start_creating_user_path);
/**
* vfs_mknod - create device node or file
@@ -4361,7 +4375,7 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
break;
}
out2:
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
@@ -4465,7 +4479,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
if (IS_ERR(dentry))
error = PTR_ERR(dentry);
}
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
@@ -4819,7 +4833,7 @@ int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
if (!error)
error = vfs_symlink(mnt_idmap(path.mnt), path.dentry->d_inode,
dentry, from->name);
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
goto retry;
@@ -4988,7 +5002,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
error = vfs_link(old_path.dentry, idmap, new_path.dentry->d_inode,
new_dentry, &delegated_inode);
out_dput:
- done_path_create(&new_path, new_dentry);
+ end_creating_path(&new_path, new_dentry);
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
if (!error) {
diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 8f732742b26e..267b50e8e42e 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -4418,7 +4418,7 @@ int ocfs2_reflink_ioctl(struct inode *inode,
return error;
}
- new_dentry = user_path_create(AT_FDCWD, newname, &new_path, 0);
+ new_dentry = start_creating_user_path(AT_FDCWD, newname, &new_path, 0);
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry)) {
mlog_errno(error);
@@ -4435,7 +4435,7 @@ int ocfs2_reflink_ioctl(struct inode *inode,
d_inode(new_path.dentry),
new_dentry, preserve);
out_dput:
- done_path_create(&new_path, new_dentry);
+ end_creating_path(&new_path, new_dentry);
out:
path_put(&old_path);
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 07739055ac9f..1cfa688904b2 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -196,7 +196,7 @@ int ksmbd_vfs_create(struct ksmbd_work *work, const char *name, umode_t mode)
pr_err("File(%s): creation failed (err:%d)\n", name, err);
}
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
return err;
}
@@ -237,7 +237,7 @@ int ksmbd_vfs_mkdir(struct ksmbd_work *work, const char *name, umode_t mode)
if (!err && dentry != d)
ksmbd_vfs_inherit_owner(work, d_inode(path.dentry), d_inode(dentry));
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
if (err)
pr_err("mkdir(%s): creation failed (err:%d)\n", name, err);
return err;
@@ -669,7 +669,7 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname,
ksmbd_debug(VFS, "vfs_link failed err %d\n", err);
out3:
- done_path_create(&newpath, dentry);
+ end_creating_path(&newpath, dentry);
out2:
path_put(&oldpath);
out1:
@@ -1325,7 +1325,7 @@ struct dentry *ksmbd_vfs_kern_path_create(struct ksmbd_work *work,
if (!abs_name)
return ERR_PTR(-ENOMEM);
- dent = kern_path_create(AT_FDCWD, abs_name, path, flags);
+ dent = start_creating_path(AT_FDCWD, abs_name, path, flags);
kfree(abs_name);
return dent;
}
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 1d5038c21c20..a7800ef04e76 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -59,11 +59,15 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
extern int kern_path(const char *, unsigned, struct path *);
struct dentry *kern_path_parent(const char *name, struct path *parent);
-extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
-extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
-extern void done_path_create(struct path *, struct dentry *);
-extern struct dentry *kern_path_locked(const char *, struct path *);
-extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
+extern struct dentry *start_creating_path(int, const char *, struct path *, unsigned int);
+extern struct dentry *start_creating_user_path(int, const char __user *, struct path *, unsigned int);
+extern void end_creating_path(struct path *, struct dentry *);
+extern struct dentry *start_removing_path(const char *, struct path *);
+extern struct dentry *start_removing_user_path_at(int , const char __user *, struct path *);
+static inline void end_removing_path(struct path *path , struct dentry *dentry)
+{
+ end_creating_path(path, dentry);
+}
int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
struct path *parent, struct qstr *last, int *type,
const struct path *root);
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 5c2e96b19392..fadf3817a9c5 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -442,7 +442,7 @@ static int bpf_obj_do_pin(int path_fd, const char __user *pathname, void *raw,
umode_t mode;
int ret;
- dentry = user_path_create(path_fd, pathname, &path, 0);
+ dentry = start_creating_user_path(path_fd, pathname, &path, 0);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
@@ -471,7 +471,7 @@ static int bpf_obj_do_pin(int path_fd, const char __user *pathname, void *raw,
ret = -EPERM;
}
out:
- done_path_create(&path, dentry);
+ end_creating_path(&path, dentry);
return ret;
}
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 6d7c110814ff..768098dec231 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1387,7 +1387,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
* Get the parent directory, calculate the hash for last
* component.
*/
- dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0);
+ dentry = start_creating_path(AT_FDCWD, addr->name->sun_path, &parent, 0);
if (IS_ERR(dentry)) {
err = PTR_ERR(dentry);
goto out;
@@ -1417,7 +1417,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
unix_table_double_unlock(net, old_hash, new_hash);
unix_insert_bsd_socket(sk);
mutex_unlock(&u->bindlock);
- done_path_create(&parent, dentry);
+ end_creating_path(&parent, dentry);
return 0;
out_unlock:
@@ -1427,7 +1427,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
/* failed after successful mknod? unlink what we'd created... */
vfs_unlink(idmap, d_inode(parent.dentry), dentry, NULL);
out_path:
- done_path_create(&parent, dentry);
+ end_creating_path(&parent, dentry);
out:
unix_release_addr(addr);
return err == -EEXIST ? -EADDRINUSE : err;
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 6/7] VFS: introduce simple_end_creating() and simple_failed_creating()
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
` (4 preceding siblings ...)
2025-09-09 4:43 ` [PATCH v2 5/7] VFS: rename kern_path_locked() and related functions NeilBrown
@ 2025-09-09 4:43 ` NeilBrown
2025-09-09 8:20 ` Al Viro
2025-09-09 4:43 ` [PATCH v2 7/7] Use simple_start_creating() in various places NeilBrown
6 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
These are partners of simple_start_creating().
On failure we don't keep a reference. On success we do.
We now Use these where simple_start_creating() is used, in debugfs,
tracefs, and rpcpipefs.
This is part of centralising all locking for create/remote/rename
operations.
Also rename start_creating, end_creating, failed_creating in debugfs to
free up these generic names for more generic use.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/debugfs/inode.c | 43 +++++++++++++++++++++----------------------
fs/tracefs/inode.c | 6 ++----
include/linux/fs.h | 19 +++++++++++++++++++
net/sunrpc/rpc_pipe.c | 11 ++++-------
4 files changed, 46 insertions(+), 33 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index c12d649df6a5..1a13739b2cef 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -362,7 +362,8 @@ struct dentry *debugfs_lookup(const char *name, struct dentry *parent)
}
EXPORT_SYMBOL_GPL(debugfs_lookup);
-static struct dentry *start_creating(const char *name, struct dentry *parent)
+static struct dentry *debugfs_start_creating(const char *name,
+ struct dentry *parent)
{
struct dentry *dentry;
int error;
@@ -402,18 +403,16 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
return dentry;
}
-static struct dentry *failed_creating(struct dentry *dentry)
+static struct dentry *debugfs_failed_creating(struct dentry *dentry)
{
- inode_unlock(d_inode(dentry->d_parent));
- dput(dentry);
+ simple_failed_creating(dentry);
simple_release_fs(&debugfs_mount, &debugfs_mount_count);
return ERR_PTR(-ENOMEM);
}
-static struct dentry *end_creating(struct dentry *dentry)
+static struct dentry *debugfs_end_creating(struct dentry *dentry)
{
- inode_unlock(d_inode(dentry->d_parent));
- return dentry;
+ return simple_end_creating(dentry);
}
static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
@@ -428,13 +427,13 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
if (!(mode & S_IFMT))
mode |= S_IFREG;
BUG_ON(!S_ISREG(mode));
- dentry = start_creating(name, parent);
+ dentry = debugfs_start_creating(name, parent);
if (IS_ERR(dentry))
return dentry;
if (!(debugfs_allow & DEBUGFS_ALLOW_API)) {
- failed_creating(dentry);
+ debugfs_failed_creating(dentry);
return ERR_PTR(-EPERM);
}
@@ -442,7 +441,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
if (unlikely(!inode)) {
pr_err("out of free dentries, can not create file '%s'\n",
name);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
inode->i_mode = mode;
@@ -457,7 +456,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
d_instantiate(dentry, inode);
fsnotify_create(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
struct dentry *debugfs_create_file_full(const char *name, umode_t mode,
@@ -577,14 +576,14 @@ EXPORT_SYMBOL_GPL(debugfs_create_file_size);
*/
struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
{
- struct dentry *dentry = start_creating(name, parent);
+ struct dentry *dentry = debugfs_start_creating(name, parent);
struct inode *inode;
if (IS_ERR(dentry))
return dentry;
if (!(debugfs_allow & DEBUGFS_ALLOW_API)) {
- failed_creating(dentry);
+ debugfs_failed_creating(dentry);
return ERR_PTR(-EPERM);
}
@@ -592,7 +591,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
if (unlikely(!inode)) {
pr_err("out of free dentries, can not create directory '%s'\n",
name);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
@@ -604,7 +603,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
d_instantiate(dentry, inode);
inc_nlink(d_inode(dentry->d_parent));
fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL_GPL(debugfs_create_dir);
@@ -624,14 +623,14 @@ struct dentry *debugfs_create_automount(const char *name,
debugfs_automount_t f,
void *data)
{
- struct dentry *dentry = start_creating(name, parent);
+ struct dentry *dentry = debugfs_start_creating(name, parent);
struct inode *inode;
if (IS_ERR(dentry))
return dentry;
if (!(debugfs_allow & DEBUGFS_ALLOW_API)) {
- failed_creating(dentry);
+ debugfs_failed_creating(dentry);
return ERR_PTR(-EPERM);
}
@@ -639,7 +638,7 @@ struct dentry *debugfs_create_automount(const char *name,
if (unlikely(!inode)) {
pr_err("out of free dentries, can not create automount '%s'\n",
name);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
make_empty_dir_inode(inode);
@@ -651,7 +650,7 @@ struct dentry *debugfs_create_automount(const char *name,
d_instantiate(dentry, inode);
inc_nlink(d_inode(dentry->d_parent));
fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL(debugfs_create_automount);
@@ -687,7 +686,7 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
if (!link)
return ERR_PTR(-ENOMEM);
- dentry = start_creating(name, parent);
+ dentry = debugfs_start_creating(name, parent);
if (IS_ERR(dentry)) {
kfree(link);
return dentry;
@@ -698,13 +697,13 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
pr_err("out of free dentries, can not create symlink '%s'\n",
name);
kfree(link);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
inode->i_mode = S_IFLNK | S_IRWXUGO;
inode->i_op = &debugfs_symlink_inode_operations;
inode->i_link = link;
d_instantiate(dentry, inode);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL_GPL(debugfs_create_symlink);
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 0c023941a316..320d7f25024b 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -571,16 +571,14 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent)
struct dentry *tracefs_failed_creating(struct dentry *dentry)
{
- inode_unlock(d_inode(dentry->d_parent));
- dput(dentry);
+ simple_failed_creating(dentry);
simple_release_fs(&tracefs_mount, &tracefs_mount_count);
return NULL;
}
struct dentry *tracefs_end_creating(struct dentry *dentry)
{
- inode_unlock(d_inode(dentry->d_parent));
- return dentry;
+ return simple_end_creating(dentry);
}
/* Find the inode that this will use for default */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 73b39e5bb9e4..f2b5456305e2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3656,6 +3656,25 @@ extern int simple_pin_fs(struct file_system_type *, struct vfsmount **mount, int
extern void simple_release_fs(struct vfsmount **mount, int *count);
struct dentry *simple_start_creating(struct dentry *, const char *);
+/* filesystems which use the dcache as backing store don't
+ * drop the reference after creating an object - they keep it
+ * to be dropped by kill_litter_super().
+ */
+static inline struct dentry *simple_end_creating(struct dentry *dentry)
+{
+ inode_unlock(dentry->d_parent->d_inode);
+ return dentry;
+}
+
+/* On failure, we DO drop the reference because there is no need
+ * keep it.
+ */
+static inline void simple_failed_creating(struct dentry *dentry)
+{
+ inode_unlock(dentry->d_parent->d_inode);
+ dput(dentry);
+}
+
extern ssize_t simple_read_from_buffer(void __user *to, size_t count,
loff_t *ppos, const void *from, size_t available);
extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 0bd1df2ebb47..38c26909235d 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -536,8 +536,7 @@ static int rpc_new_file(struct dentry *parent,
inode = rpc_get_inode(dir->i_sb, S_IFREG | mode);
if (unlikely(!inode)) {
- dput(dentry);
- inode_unlock(dir);
+ simple_failed_creating(dentry);
return -ENOMEM;
}
inode->i_ino = iunique(dir->i_sb, 100);
@@ -546,7 +545,7 @@ static int rpc_new_file(struct dentry *parent,
rpc_inode_setowner(inode, private);
d_instantiate(dentry, inode);
fsnotify_create(dir, dentry);
- inode_unlock(dir);
+ simple_end_creating(dentry);
return 0;
}
@@ -572,9 +571,8 @@ static struct dentry *rpc_new_dir(struct dentry *parent,
inc_nlink(dir);
d_instantiate(dentry, inode);
fsnotify_mkdir(dir, dentry);
- inode_unlock(dir);
- return dentry;
+ return simple_end_creating(dentry);
}
static int rpc_populate(struct dentry *parent,
@@ -669,9 +667,8 @@ int rpc_mkpipe_dentry(struct dentry *parent, const char *name,
rpci->pipe = pipe;
rpc_inode_setowner(inode, private);
d_instantiate(dentry, inode);
- pipe->dentry = dentry;
fsnotify_create(dir, dentry);
- inode_unlock(dir);
+ pipe->dentry = simple_end_creating(dentry);
return 0;
failed:
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
` (5 preceding siblings ...)
2025-09-09 4:43 ` [PATCH v2 6/7] VFS: introduce simple_end_creating() and simple_failed_creating() NeilBrown
@ 2025-09-09 4:43 ` NeilBrown
2025-09-09 8:19 ` Al Viro
6 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2025-09-09 4:43 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
Each of
s390/hypfs, android/binder, binfmt_misc,
devpts, nfsd, bpf, apparmour, selinux, security
create FS objects following the pattern of simple_start_creating().
This patch changes them all to use simple_start_creating() and to clean
up with simple_failed_creating() or simple_end_creating().
Signed-off-by: NeilBrown <neil@brown.name>
---
arch/s390/hypfs/inode.c | 20 +++++-------
drivers/android/binderfs.c | 53 +++++++-------------------------
fs/binfmt_misc.c | 30 +++++++-----------
fs/devpts/inode.c | 29 +++++++-----------
fs/nfsd/nfsctl.c | 56 ++++++++++++++--------------------
kernel/bpf/inode.c | 14 ++++-----
security/apparmor/apparmorfs.c | 37 ++++++----------------
security/inode.c | 19 +++---------
security/selinux/selinuxfs.c | 8 ++---
9 files changed, 88 insertions(+), 178 deletions(-)
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 96409573c75d..00639f458068 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -341,17 +341,14 @@ static struct dentry *hypfs_create_file(struct dentry *parent, const char *name,
struct dentry *dentry;
struct inode *inode;
- inode_lock(d_inode(parent));
- dentry = lookup_noperm(&QSTR(name), parent);
- if (IS_ERR(dentry)) {
- dentry = ERR_PTR(-ENOMEM);
- goto fail;
- }
+ dentry = simple_start_creating(parent, name);
+ if (IS_ERR(dentry))
+ return ERR_PTR(-ENOMEM);
+
inode = hypfs_make_inode(parent->d_sb, mode);
if (!inode) {
- dput(dentry);
- dentry = ERR_PTR(-ENOMEM);
- goto fail;
+ simple_failed_creating(dentry);
+ return ERR_PTR(-ENOMEM);
}
if (S_ISREG(mode)) {
inode->i_fop = &hypfs_file_ops;
@@ -367,10 +364,7 @@ static struct dentry *hypfs_create_file(struct dentry *parent, const char *name,
BUG();
inode->i_private = data;
d_instantiate(dentry, inode);
- dget(dentry);
-fail:
- inode_unlock(d_inode(parent));
- return dentry;
+ return simple_end_creating(dentry);
}
struct dentry *hypfs_mkdir(struct dentry *parent, const char *name)
diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c
index 0d9d95a7fb60..466e8c0007c3 100644
--- a/drivers/android/binderfs.c
+++ b/drivers/android/binderfs.c
@@ -181,28 +181,17 @@ static int binderfs_binder_device_create(struct inode *ref_inode,
}
root = sb->s_root;
- inode_lock(d_inode(root));
- /* look it up */
- dentry = lookup_noperm(&QSTR(name), root);
+ dentry = simple_start_creating(root, name);
if (IS_ERR(dentry)) {
- inode_unlock(d_inode(root));
ret = PTR_ERR(dentry);
goto err;
}
- if (d_really_is_positive(dentry)) {
- /* already exists */
- dput(dentry);
- inode_unlock(d_inode(root));
- ret = -EEXIST;
- goto err;
- }
-
inode->i_private = device;
d_instantiate(dentry, inode);
fsnotify_create(root->d_inode, dentry);
- inode_unlock(d_inode(root));
+ simple_end_creating(dentry);
binder_add_device(device);
@@ -482,19 +471,7 @@ static struct inode *binderfs_make_inode(struct super_block *sb, int mode)
static struct dentry *binderfs_create_dentry(struct dentry *parent,
const char *name)
{
- struct dentry *dentry;
-
- dentry = lookup_noperm(&QSTR(name), parent);
- if (IS_ERR(dentry))
- return dentry;
-
- /* Return error if the file/dir already exists. */
- if (d_really_is_positive(dentry)) {
- dput(dentry);
- return ERR_PTR(-EEXIST);
- }
-
- return dentry;
+ return simple_start_creating(parent, name);
}
struct dentry *binderfs_create_file(struct dentry *parent, const char *name,
@@ -506,18 +483,16 @@ struct dentry *binderfs_create_file(struct dentry *parent, const char *name,
struct super_block *sb;
parent_inode = d_inode(parent);
- inode_lock(parent_inode);
dentry = binderfs_create_dentry(parent, name);
if (IS_ERR(dentry))
- goto out;
+ return dentry;
sb = parent_inode->i_sb;
new_inode = binderfs_make_inode(sb, S_IFREG | 0444);
if (!new_inode) {
- dput(dentry);
- dentry = ERR_PTR(-ENOMEM);
- goto out;
+ simple_failed_creating(dentry);
+ return ERR_PTR(-ENOMEM);
}
new_inode->i_fop = fops;
@@ -525,9 +500,7 @@ struct dentry *binderfs_create_file(struct dentry *parent, const char *name,
d_instantiate(dentry, new_inode);
fsnotify_create(parent_inode, dentry);
-out:
- inode_unlock(parent_inode);
- return dentry;
+ return simple_end_creating(dentry);
}
static struct dentry *binderfs_create_dir(struct dentry *parent,
@@ -538,18 +511,16 @@ static struct dentry *binderfs_create_dir(struct dentry *parent,
struct super_block *sb;
parent_inode = d_inode(parent);
- inode_lock(parent_inode);
dentry = binderfs_create_dentry(parent, name);
if (IS_ERR(dentry))
- goto out;
+ return dentry;
sb = parent_inode->i_sb;
new_inode = binderfs_make_inode(sb, S_IFDIR | 0755);
if (!new_inode) {
- dput(dentry);
- dentry = ERR_PTR(-ENOMEM);
- goto out;
+ simple_failed_creating(dentry);
+ return ERR_PTR(-ENOMEM);
}
new_inode->i_fop = &simple_dir_operations;
@@ -560,9 +531,7 @@ static struct dentry *binderfs_create_dir(struct dentry *parent,
inc_nlink(parent_inode);
fsnotify_mkdir(parent_inode, dentry);
-out:
- inode_unlock(parent_inode);
- return dentry;
+ return simple_end_creating(dentry);
}
static int binder_features_show(struct seq_file *m, void *unused)
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index a839f960cd4a..dbe56f243174 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -796,28 +796,23 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
revert_creds(old_cred);
if (IS_ERR(f)) {
pr_notice("register: failed to install interpreter file %s\n",
- e->interpreter);
+ e->interpreter);
kfree(e);
return PTR_ERR(f);
}
e->interp_file = f;
}
- inode_lock(d_inode(root));
- dentry = lookup_noperm(&QSTR(e->name), root);
+ dentry = simple_start_creating(root, e->name);
err = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto out;
- err = -EEXIST;
- if (d_really_is_positive(dentry))
- goto out2;
-
inode = bm_get_inode(sb, S_IFREG | 0644);
err = -ENOMEM;
if (!inode)
- goto out2;
+ goto out;
refcount_set(&e->users, 1);
e->dentry = dget(dentry);
@@ -830,19 +825,16 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
list_add(&e->list, &misc->entries);
write_unlock(&misc->entries_lock);
- err = 0;
-out2:
- dput(dentry);
+ simple_end_creating(dentry);
+ return count;
+
out:
- inode_unlock(d_inode(root));
+ simple_failed_creating(dentry);
- if (err) {
- if (f)
- filp_close(f, NULL);
- kfree(e);
- return err;
- }
- return count;
+ if (f)
+ filp_close(f, NULL);
+ kfree(e);
+ return err;
}
static const struct file_operations bm_register_operations = {
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index fdf22264a8e9..85c78e133dad 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -259,7 +259,6 @@ static int devpts_parse_param(struct fs_context *fc, struct fs_parameter *param)
static int mknod_ptmx(struct super_block *sb, struct fs_context *fc)
{
int mode;
- int rc = -ENOMEM;
struct dentry *dentry;
struct inode *inode;
struct dentry *root = sb->s_root;
@@ -268,18 +267,15 @@ static int mknod_ptmx(struct super_block *sb, struct fs_context *fc)
kuid_t ptmx_uid = current_fsuid();
kgid_t ptmx_gid = current_fsgid();
- inode_lock(d_inode(root));
-
- /* If we have already created ptmx node, return */
- if (fsi->ptmx_dentry) {
- rc = 0;
- goto out;
- }
+ dentry = simple_start_creating(root, "ptmx");
+ if (IS_ERR(dentry)) {
+ if (dentry == ERR_PTR(-EEXIST)) {
+ /* If we have already created ptmx node, return */
+ return 0;
+ }
- dentry = d_alloc_name(root, "ptmx");
- if (!dentry) {
pr_err("Unable to alloc dentry for ptmx node\n");
- goto out;
+ return -ENOMEM;
}
/*
@@ -288,8 +284,8 @@ static int mknod_ptmx(struct super_block *sb, struct fs_context *fc)
inode = new_inode(sb);
if (!inode) {
pr_err("Unable to alloc inode for ptmx node\n");
- dput(dentry);
- goto out;
+ simple_failed_creating(dentry);
+ return -ENOMEM;
}
inode->i_ino = 2;
@@ -302,11 +298,8 @@ static int mknod_ptmx(struct super_block *sb, struct fs_context *fc)
d_add(dentry, inode);
- fsi->ptmx_dentry = dentry;
- rc = 0;
-out:
- inode_unlock(d_inode(root));
- return rc;
+ fsi->ptmx_dentry = simple_end_creating(dentry);
+ return 0;
}
static void update_ptmx_mode(struct pts_fs_info *fsi)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index bc6b776fc657..e0aac224172c 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1147,24 +1147,21 @@ static int __nfsd_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode,
static struct dentry *nfsd_mkdir(struct dentry *parent, struct nfsdfs_client *ncl, char *name)
{
- struct inode *dir = parent->d_inode;
struct dentry *dentry;
- int ret = -ENOMEM;
+ int ret;
+
+ dentry = simple_start_creating(parent, name);
+ if (IS_ERR(dentry))
+ return dentry;
- inode_lock(dir);
- dentry = d_alloc_name(parent, name);
- if (!dentry)
- goto out_err;
ret = __nfsd_mkdir(d_inode(parent), dentry, S_IFDIR | 0600, ncl);
if (ret)
goto out_err;
-out:
- inode_unlock(dir);
- return dentry;
+ return simple_end_creating(dentry);
+
out_err:
- dput(dentry);
- dentry = ERR_PTR(ret);
- goto out;
+ simple_failed_creating(dentry);
+ return ERR_PTR(ret);
}
#if IS_ENABLED(CONFIG_SUNRPC_GSS)
@@ -1193,19 +1190,18 @@ static int __nfsd_symlink(struct inode *dir, struct dentry *dentry,
static void _nfsd_symlink(struct dentry *parent, const char *name,
const char *content)
{
- struct inode *dir = parent->d_inode;
struct dentry *dentry;
int ret;
- inode_lock(dir);
- dentry = d_alloc_name(parent, name);
- if (!dentry)
- goto out;
+ dentry = simple_start_creating(parent, name);
+ if (IS_ERR(dentry))
+ return;
ret = __nfsd_symlink(d_inode(parent), dentry, S_IFLNK | 0777, content);
- if (ret)
- dput(dentry);
-out:
- inode_unlock(dir);
+ if (ret) {
+ simple_failed_creating(dentry);
+ return;
+ }
+ simple_end_creating(dentry);
}
#else
static inline void _nfsd_symlink(struct dentry *parent, const char *name,
@@ -1250,30 +1246,24 @@ static int nfsdfs_create_files(struct dentry *root,
struct dentry *dentry;
int i;
- inode_lock(dir);
for (i = 0; files->name && files->name[0]; i++, files++) {
- dentry = d_alloc_name(root, files->name);
- if (!dentry)
- goto out;
+ dentry = simple_start_creating(root, files->name);
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);;
inode = nfsd_get_inode(d_inode(root)->i_sb,
S_IFREG | files->mode);
if (!inode) {
- dput(dentry);
- goto out;
+ simple_failed_creating(dentry);
+ return -ENOMEM;
}
kref_get(&ncl->cl_ref);
inode->i_fop = files->ops;
inode->i_private = ncl;
d_add(dentry, inode);
fsnotify_create(dir, dentry);
- if (fdentries)
- fdentries[i] = dentry;
+ fdentries[i] = simple_end_creating(dentry);
}
- inode_unlock(dir);
return 0;
-out:
- inode_unlock(dir);
- return -ENOMEM;
}
/* on success, returns positive number unique to that client. */
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index fadf3817a9c5..d68e1f35e450 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -420,16 +420,14 @@ static int bpf_iter_link_pin_kernel(struct dentry *parent,
struct dentry *dentry;
int ret;
- inode_lock(parent->d_inode);
- dentry = lookup_noperm(&QSTR(name), parent);
- if (IS_ERR(dentry)) {
- inode_unlock(parent->d_inode);
- return PTR_ERR(dentry);
- }
+ dentry = simple_start_creating(parent, name);
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);
+
ret = bpf_mkobj_ops(dentry, mode, link, &bpf_link_iops,
&bpf_iter_fops);
- dput(dentry);
- inode_unlock(parent->d_inode);
+ /* bpf_mkobj_ops took the ref if needed, so we dput() here */
+ dput(simple_end_creating(dentry));
return ret;
}
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 391a586d0557..13260352198f 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -280,32 +280,20 @@ static struct dentry *aafs_create(const char *name, umode_t mode,
if (error)
return ERR_PTR(error);
- dir = d_inode(parent);
-
- inode_lock(dir);
- dentry = lookup_noperm(&QSTR(name), parent);
+ dentry = simple_start_creating(parent, name);
if (IS_ERR(dentry)) {
error = PTR_ERR(dentry);
- goto fail_lock;
- }
-
- if (d_really_is_positive(dentry)) {
- error = -EEXIST;
- goto fail_dentry;
+ goto fail;
}
error = __aafs_setup_d_inode(dir, dentry, mode, data, link, fops, iops);
if (error)
- goto fail_dentry;
- inode_unlock(dir);
-
- return dentry;
+ goto fail;
-fail_dentry:
- dput(dentry);
+ return simple_end_creating(dentry);
-fail_lock:
- inode_unlock(dir);
+fail:
+ simple_failed_creating(dentry);
simple_release_fs(&aafs_mnt, &aafs_count);
return ERR_PTR(error);
@@ -2567,8 +2555,7 @@ static int aa_mk_null_file(struct dentry *parent)
if (error)
return error;
- inode_lock(d_inode(parent));
- dentry = lookup_noperm(&QSTR(NULL_FILE_NAME), parent);
+ dentry = simple_start_creating(parent, NULL_FILE_NAME);
if (IS_ERR(dentry)) {
error = PTR_ERR(dentry);
goto out;
@@ -2576,7 +2563,8 @@ static int aa_mk_null_file(struct dentry *parent)
inode = new_inode(parent->d_inode->i_sb);
if (!inode) {
error = -ENOMEM;
- goto out1;
+ simple_failed_creating(dentry);
+ goto out;
}
inode->i_ino = get_next_ino();
@@ -2588,18 +2576,13 @@ static int aa_mk_null_file(struct dentry *parent)
aa_null.dentry = dget(dentry);
aa_null.mnt = mntget(mount);
- error = 0;
+ simple_end_creating(dentry);
-out1:
- dput(dentry);
out:
- inode_unlock(d_inode(parent));
simple_release_fs(&mount, &count);
return error;
}
-
-
static const char *policy_get_link(struct dentry *dentry,
struct inode *inode,
struct delayed_call *done)
diff --git a/security/inode.c b/security/inode.c
index 43382ef8896e..0d2fab99e71e 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -129,20 +129,14 @@ static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
dir = d_inode(parent);
- inode_lock(dir);
- dentry = lookup_noperm(&QSTR(name), parent);
+ dentry = simple_start_creating(parent, name);
if (IS_ERR(dentry))
goto out;
- if (d_really_is_positive(dentry)) {
- error = -EEXIST;
- goto out1;
- }
-
inode = new_inode(dir->i_sb);
if (!inode) {
error = -ENOMEM;
- goto out1;
+ goto out;
}
inode->i_ino = get_next_ino();
@@ -161,14 +155,11 @@ static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
inode->i_fop = fops;
}
d_instantiate(dentry, inode);
- inode_unlock(dir);
- return dentry;
+ return simple_end_creating(dentry);
-out1:
- dput(dentry);
- dentry = ERR_PTR(error);
out:
- inode_unlock(dir);
+ simple_failed_creating(dentry);
+ dentry = ERR_PTR(error);
if (pinned)
simple_release_fs(&mount, &mount_count);
return dentry;
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 9aa1d03ab612..5aa1ad5be587 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1949,15 +1949,16 @@ static const struct inode_operations swapover_dir_inode_operations = {
static struct dentry *sel_make_swapover_dir(struct super_block *sb,
unsigned long *ino)
{
- struct dentry *dentry = d_alloc_name(sb->s_root, ".swapover");
+ struct dentry *dentry;
struct inode *inode;
+ dentry = simple_start_creating(sb->s_root, ".swapover");
if (!dentry)
return ERR_PTR(-ENOMEM);
inode = sel_make_inode(sb, S_IFDIR);
if (!inode) {
- dput(dentry);
+ simple_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
}
@@ -1968,8 +1969,7 @@ static struct dentry *sel_make_swapover_dir(struct super_block *sb,
inode_lock(sb->s_root->d_inode);
d_add(dentry, inode);
inc_nlink(sb->s_root->d_inode);
- inode_unlock(sb->s_root->d_inode);
- return dentry;
+ return simple_end_creating(dentry);
}
#define NULL_FILE_NAME "null"
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-09 4:43 ` [PATCH v2 7/7] Use simple_start_creating() in various places NeilBrown
@ 2025-09-09 8:19 ` Al Viro
2025-09-10 3:01 ` NeilBrown
0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2025-09-09 8:19 UTC (permalink / raw)
To: neil
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, Sep 09, 2025 at 02:43:21PM +1000, NeilBrown wrote:
> d_instantiate(dentry, inode);
> - dget(dentry);
> -fail:
> - inode_unlock(d_inode(parent));
> - return dentry;
> + return simple_end_creating(dentry);
No. This is the wrong model - dget() belongs with d_instantiate()
here; your simple_end_creating() calling conventions are wrong.
What really happens is a controlled leak. simple_start_creating()
in the beginning is correct, but the right model here is to have
identical refcounting logics on the exits - the only difference
should be in having done that combination on success.
Please, leave those alone for now.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 6/7] VFS: introduce simple_end_creating() and simple_failed_creating()
2025-09-09 4:43 ` [PATCH v2 6/7] VFS: introduce simple_end_creating() and simple_failed_creating() NeilBrown
@ 2025-09-09 8:20 ` Al Viro
0 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2025-09-09 8:20 UTC (permalink / raw)
To: neil
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, Sep 09, 2025 at 02:43:20PM +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> These are partners of simple_start_creating().
> On failure we don't keep a reference. On success we do.
>
> We now Use these where simple_start_creating() is used, in debugfs,
> tracefs, and rpcpipefs.
NAK, for the same reason as with the next commit.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/7] VFS: discard err2 in filename_create()
2025-09-09 4:43 ` [PATCH v2 2/7] VFS: discard err2 in filename_create() NeilBrown
@ 2025-09-09 11:24 ` Jeff Layton
0 siblings, 0 replies; 25+ messages in thread
From: Jeff Layton @ 2025-09-09 11:24 UTC (permalink / raw)
To: neil, Alexander Viro, Christian Brauner, Amir Goldstein
Cc: Jan Kara, linux-fsdevel
On Tue, 2025-09-09 at 14:43 +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> Since 204a575e91f3 "VFS: add common error checks to lookup_one_qstr_excl()"
> filename_create() does not need to stash the error value from mnt_want_write()
> into a separate variable - the logic that used to clobber 'error' after the
> call of mnt_want_write() has migrated into lookup_one_qstr_excl().
>
> So there is no need for two different err variables.
> This patch discards "err2" and uses "error' throughout.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/namei.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index c7c6b255db2c..e2c2ab286bc0 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4169,7 +4169,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
> unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
> int type;
> - int err2;
> int error;
>
> error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
> @@ -4184,7 +4183,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> goto out;
>
> /* don't fail immediately if it's r/o, at least try to report other errors */
> - err2 = mnt_want_write(path->mnt);
> + error = mnt_want_write(path->mnt);
> /*
> * Do the final lookup. Suppress 'create' if there is a trailing
> * '/', and a directory wasn't requested.
> @@ -4197,17 +4196,16 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> if (IS_ERR(dentry))
> goto unlock;
>
> - if (unlikely(err2)) {
> - error = err2;
> + if (unlikely(error))
> goto fail;
> - }
> +
> return dentry;
> fail:
> dput(dentry);
> dentry = ERR_PTR(error);
> unlock:
> inode_unlock(path->dentry->d_inode);
> - if (!err2)
> + if (!error)
> mnt_drop_write(path->mnt);
> out:
> path_put(path);
Nice cleanup.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
IMO, Christian should go ahead and take in patches 2 and 3 in this
series. They make sense on their own.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 5/7] VFS: rename kern_path_locked() and related functions.
2025-09-09 4:43 ` [PATCH v2 5/7] VFS: rename kern_path_locked() and related functions NeilBrown
@ 2025-09-09 14:07 ` Amir Goldstein
2025-09-10 2:49 ` NeilBrown
0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2025-09-09 14:07 UTC (permalink / raw)
To: neil
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, Sep 9, 2025 at 6:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> kern_path_locked() is now only used to prepare for removing an object
> from the filesystem (and that is the only credible reason for wanting a
> positive locked dentry). Thus it corresponds to kern_path_create() and
> so should have a corresponding name.
>
> Unfortunately the name "kern_path_create" is somewhat misleading as it
> doesn't actually create anything. The recently added
> simple_start_creating() provides a better pattern I believe. The
> "start" can be matched with "end" to bracket the creating or removing.
>
> So this patch changes names:
>
> kern_path_locked -> start_removing_path
> kern_path_create -> start_creating_path
> user_path_create -> start_creating_user_path
> user_path_locked_at -> start_removing_user_path_at
> done_path_create -> end_creating_path
This looks nice.
With one comment below fixed feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>
> and also introduces end_removing_path() which is identical to
> end_creating_path().
>
> __start_removing_path (which was __kern_path_locked) is enhanced to
> call mnt_want_write() for consistency with the start_creating_path().
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> Documentation/filesystems/porting.rst | 12 ++++
> arch/powerpc/platforms/cell/spufs/syscalls.c | 4 +-
> drivers/base/devtmpfs.c | 22 +++-----
> fs/bcachefs/fs-ioctl.c | 10 ++--
> fs/init.c | 17 +++---
> fs/namei.c | 58 ++++++++++++--------
> fs/ocfs2/refcounttree.c | 4 +-
> fs/smb/server/vfs.c | 8 +--
> include/linux/namei.h | 14 +++--
> kernel/bpf/inode.c | 4 +-
> net/unix/af_unix.c | 6 +-
> 11 files changed, 92 insertions(+), 67 deletions(-)
>
> diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
> index 85f590254f07..e0494860be6b 100644
> --- a/Documentation/filesystems/porting.rst
> +++ b/Documentation/filesystems/porting.rst
> @@ -1285,3 +1285,15 @@ rather than a VMA, as the VMA at this stage is not yet valid.
> The vm_area_desc provides the minimum required information for a filesystem
> to initialise state upon memory mapping of a file-backed region, and output
> parameters for the file system to set this state.
> +
> +---
> +
> +**mandatory**
> +
> +Several functions are renamed:
> +
> +- kern_path_locked -> start_removing_path
> +- kern_path_create -> start_creating_path
> +- user_path_create -> start_creating_user_path
> +- user_path_locked_at -> start_removing_user_path_at
> +- done_path_create -> end_creating_path
> diff --git a/arch/powerpc/platforms/cell/spufs/syscalls.c b/arch/powerpc/platforms/cell/spufs/syscalls.c
> index 157e046e6e93..ea4ba1b6ce6a 100644
> --- a/arch/powerpc/platforms/cell/spufs/syscalls.c
> +++ b/arch/powerpc/platforms/cell/spufs/syscalls.c
> @@ -67,11 +67,11 @@ static long do_spu_create(const char __user *pathname, unsigned int flags,
> struct dentry *dentry;
> int ret;
>
> - dentry = user_path_create(AT_FDCWD, pathname, &path, LOOKUP_DIRECTORY);
> + dentry = start_creating_user_path(AT_FDCWD, pathname, &path, LOOKUP_DIRECTORY);
> ret = PTR_ERR(dentry);
> if (!IS_ERR(dentry)) {
> ret = spufs_create(&path, dentry, flags, mode, neighbor);
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> }
>
> return ret;
> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 31bfb3194b4c..9d4e46ad8352 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -176,7 +176,7 @@ static int dev_mkdir(const char *name, umode_t mode)
> struct dentry *dentry;
> struct path path;
>
> - dentry = kern_path_create(AT_FDCWD, name, &path, LOOKUP_DIRECTORY);
> + dentry = start_creating_path(AT_FDCWD, name, &path, LOOKUP_DIRECTORY);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
>
> @@ -184,7 +184,7 @@ static int dev_mkdir(const char *name, umode_t mode)
> if (!IS_ERR(dentry))
> /* mark as kernel-created inode */
> d_inode(dentry)->i_private = &thread;
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> return PTR_ERR_OR_ZERO(dentry);
> }
>
> @@ -222,10 +222,10 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
> struct path path;
> int err;
>
> - dentry = kern_path_create(AT_FDCWD, nodename, &path, 0);
> + dentry = start_creating_path(AT_FDCWD, nodename, &path, 0);
> if (dentry == ERR_PTR(-ENOENT)) {
> create_path(nodename);
> - dentry = kern_path_create(AT_FDCWD, nodename, &path, 0);
> + dentry = start_creating_path(AT_FDCWD, nodename, &path, 0);
> }
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
> @@ -246,7 +246,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
> /* mark as kernel-created inode */
> d_inode(dentry)->i_private = &thread;
> }
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> return err;
> }
>
> @@ -256,7 +256,7 @@ static int dev_rmdir(const char *name)
> struct dentry *dentry;
> int err;
>
> - dentry = kern_path_locked(name, &parent);
> + dentry = start_removing_path(name, &parent);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
> if (d_inode(dentry)->i_private == &thread)
> @@ -265,9 +265,7 @@ static int dev_rmdir(const char *name)
> else
> err = -EPERM;
>
> - dput(dentry);
> - inode_unlock(d_inode(parent.dentry));
> - path_put(&parent);
> + end_removing_path(&parent, dentry);
> return err;
> }
>
> @@ -325,7 +323,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> int deleted = 0;
> int err = 0;
>
> - dentry = kern_path_locked(nodename, &parent);
> + dentry = start_removing_path(nodename, &parent);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
>
> @@ -349,10 +347,8 @@ static int handle_remove(const char *nodename, struct device *dev)
> if (!err || err == -ENOENT)
> deleted = 1;
> }
> - dput(dentry);
> - inode_unlock(d_inode(parent.dentry));
> + end_removing_path(&parent, dentry);
>
> - path_put(&parent);
> if (deleted && strchr(nodename, '/'))
> delete_path(nodename);
> return err;
> diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
> index 4e72e654da96..43510da5e734 100644
> --- a/fs/bcachefs/fs-ioctl.c
> +++ b/fs/bcachefs/fs-ioctl.c
> @@ -255,7 +255,7 @@ static long bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
> snapshot_src = inode_inum(to_bch_ei(src_path.dentry->d_inode));
> }
>
> - dst_dentry = user_path_create(arg.dirfd,
> + dst_dentry = start_creating_user_path(arg.dirfd,
> (const char __user *)(unsigned long)arg.dst_ptr,
> &dst_path, lookup_flags);
> error = PTR_ERR_OR_ZERO(dst_dentry);
> @@ -314,7 +314,7 @@ static long bch2_ioctl_subvolume_create(struct bch_fs *c, struct file *filp,
> d_instantiate(dst_dentry, &inode->v);
> fsnotify_mkdir(dir, dst_dentry);
> err3:
> - done_path_create(&dst_path, dst_dentry);
> + end_creating_path(&dst_path, dst_dentry);
> err2:
> if (arg.src_ptr)
> path_put(&src_path);
> @@ -334,7 +334,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
> if (arg.flags)
> return -EINVAL;
>
> - victim = user_path_locked_at(arg.dirfd, name, &path);
> + victim = start_removing_user_path_at(arg.dirfd, name, &path);
> if (IS_ERR(victim))
> return PTR_ERR(victim);
>
> @@ -351,9 +351,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
> d_invalidate(victim);
> }
> err:
> - inode_unlock(dir);
> - dput(victim);
> - path_put(&path);
> + end_removing_path(&path, victim);
> return ret;
> }
>
> diff --git a/fs/init.c b/fs/init.c
> index eef5124885e3..07f592ccdba8 100644
> --- a/fs/init.c
> +++ b/fs/init.c
> @@ -149,7 +149,7 @@ int __init init_mknod(const char *filename, umode_t mode, unsigned int dev)
> else if (!(S_ISBLK(mode) || S_ISCHR(mode)))
> return -EINVAL;
>
> - dentry = kern_path_create(AT_FDCWD, filename, &path, 0);
> + dentry = start_creating_path(AT_FDCWD, filename, &path, 0);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
>
> @@ -158,7 +158,7 @@ int __init init_mknod(const char *filename, umode_t mode, unsigned int dev)
> if (!error)
> error = vfs_mknod(mnt_idmap(path.mnt), path.dentry->d_inode,
> dentry, mode, new_decode_dev(dev));
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> return error;
> }
>
> @@ -173,7 +173,7 @@ int __init init_link(const char *oldname, const char *newname)
> if (error)
> return error;
>
> - new_dentry = kern_path_create(AT_FDCWD, newname, &new_path, 0);
> + new_dentry = start_creating_path(AT_FDCWD, newname, &new_path, 0);
> error = PTR_ERR(new_dentry);
> if (IS_ERR(new_dentry))
> goto out;
> @@ -191,7 +191,7 @@ int __init init_link(const char *oldname, const char *newname)
> error = vfs_link(old_path.dentry, idmap, new_path.dentry->d_inode,
> new_dentry, NULL);
> out_dput:
> - done_path_create(&new_path, new_dentry);
> + end_creating_path(&new_path, new_dentry);
> out:
> path_put(&old_path);
> return error;
> @@ -203,14 +203,14 @@ int __init init_symlink(const char *oldname, const char *newname)
> struct path path;
> int error;
>
> - dentry = kern_path_create(AT_FDCWD, newname, &path, 0);
> + dentry = start_creating_path(AT_FDCWD, newname, &path, 0);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
> error = security_path_symlink(&path, dentry, oldname);
> if (!error)
> error = vfs_symlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> dentry, oldname);
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> return error;
> }
>
> @@ -225,7 +225,8 @@ int __init init_mkdir(const char *pathname, umode_t mode)
> struct path path;
> int error;
>
> - dentry = kern_path_create(AT_FDCWD, pathname, &path, LOOKUP_DIRECTORY);
> + dentry = start_creating_path(AT_FDCWD, pathname, &path,
> + LOOKUP_DIRECTORY);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
> mode = mode_strip_umask(d_inode(path.dentry), mode);
> @@ -236,7 +237,7 @@ int __init init_mkdir(const char *pathname, umode_t mode)
> if (IS_ERR(dentry))
> error = PTR_ERR(dentry);
> }
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> return error;
> }
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 4017bc8641d3..ee693d16086e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2758,7 +2758,8 @@ static int filename_parentat(int dfd, struct filename *name,
> }
>
> /* does lookup, returns the object with parent locked */
> -static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct path *path)
> +static struct dentry *__start_removing_path(int dfd, struct filename *name,
> + struct path *path)
> {
> struct path parent_path __free(path_put) = {};
> struct dentry *d;
> @@ -2770,15 +2771,25 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
> return ERR_PTR(error);
> if (unlikely(type != LAST_NORM))
> return ERR_PTR(-EINVAL);
This abnormal error handling pattern deserves a comment:
/* don't fail immediately if it's r/o, at least try to report other errors */
> + error = mnt_want_write(path->mnt);
> inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
> d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
> - if (IS_ERR(d)) {
> - inode_unlock(parent_path.dentry->d_inode);
> - return d;
> - }
> + if (IS_ERR(d))
> + goto unlock;
> + if (error)
> + goto fail;
> path->dentry = no_free_ptr(parent_path.dentry);
> path->mnt = no_free_ptr(parent_path.mnt);
> return d;
> +
> +fail:
> + dput(d);
> + d = ERR_PTR(error);
> +unlock:
> + inode_unlock(parent_path.dentry->d_inode);
> + if (!error)
> + mnt_drop_write(path->mnt);
> + return d;
> }
>
> /**
> @@ -2816,24 +2827,26 @@ struct dentry *kern_path_parent(const char *name, struct path *path)
> return d;
> }
>
> -struct dentry *kern_path_locked(const char *name, struct path *path)
> +struct dentry *start_removing_path(const char *name, struct path *path)
> {
> struct filename *filename = getname_kernel(name);
> - struct dentry *res = __kern_path_locked(AT_FDCWD, filename, path);
> + struct dentry *res = __start_removing_path(AT_FDCWD, filename, path);
>
> putname(filename);
> return res;
> }
>
> -struct dentry *user_path_locked_at(int dfd, const char __user *name, struct path *path)
> +struct dentry *start_removing_user_path_at(int dfd,
> + const char __user *name,
> + struct path *path)
> {
> struct filename *filename = getname(name);
> - struct dentry *res = __kern_path_locked(dfd, filename, path);
> + struct dentry *res = __start_removing_path(dfd, filename, path);
>
> putname(filename);
> return res;
> }
> -EXPORT_SYMBOL(user_path_locked_at);
> +EXPORT_SYMBOL(start_removing_user_path_at);
>
> int kern_path(const char *name, unsigned int flags, struct path *path)
> {
> @@ -4223,8 +4236,8 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> return dentry;
> }
>
> -struct dentry *kern_path_create(int dfd, const char *pathname,
> - struct path *path, unsigned int lookup_flags)
> +struct dentry *start_creating_path(int dfd, const char *pathname,
> + struct path *path, unsigned int lookup_flags)
> {
> struct filename *filename = getname_kernel(pathname);
> struct dentry *res = filename_create(dfd, filename, path, lookup_flags);
> @@ -4232,9 +4245,9 @@ struct dentry *kern_path_create(int dfd, const char *pathname,
> putname(filename);
> return res;
> }
> -EXPORT_SYMBOL(kern_path_create);
> +EXPORT_SYMBOL(start_creating_path);
>
> -void done_path_create(struct path *path, struct dentry *dentry)
> +void end_creating_path(struct path *path, struct dentry *dentry)
> {
> if (!IS_ERR(dentry))
> dput(dentry);
> @@ -4242,10 +4255,11 @@ void done_path_create(struct path *path, struct dentry *dentry)
> mnt_drop_write(path->mnt);
> path_put(path);
> }
> -EXPORT_SYMBOL(done_path_create);
> +EXPORT_SYMBOL(end_creating_path);
>
> -inline struct dentry *user_path_create(int dfd, const char __user *pathname,
> - struct path *path, unsigned int lookup_flags)
> +inline struct dentry *start_creating_user_path(
> + int dfd, const char __user *pathname,
> + struct path *path, unsigned int lookup_flags)
> {
> struct filename *filename = getname(pathname);
> struct dentry *res = filename_create(dfd, filename, path, lookup_flags);
> @@ -4253,7 +4267,7 @@ inline struct dentry *user_path_create(int dfd, const char __user *pathname,
> putname(filename);
> return res;
> }
> -EXPORT_SYMBOL(user_path_create);
> +EXPORT_SYMBOL(start_creating_user_path);
>
> /**
> * vfs_mknod - create device node or file
> @@ -4361,7 +4375,7 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
> break;
> }
> out2:
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> if (retry_estale(error, lookup_flags)) {
> lookup_flags |= LOOKUP_REVAL;
> goto retry;
> @@ -4465,7 +4479,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode)
> if (IS_ERR(dentry))
> error = PTR_ERR(dentry);
> }
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> if (retry_estale(error, lookup_flags)) {
> lookup_flags |= LOOKUP_REVAL;
> goto retry;
> @@ -4819,7 +4833,7 @@ int do_symlinkat(struct filename *from, int newdfd, struct filename *to)
> if (!error)
> error = vfs_symlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> dentry, from->name);
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> if (retry_estale(error, lookup_flags)) {
> lookup_flags |= LOOKUP_REVAL;
> goto retry;
> @@ -4988,7 +5002,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
> error = vfs_link(old_path.dentry, idmap, new_path.dentry->d_inode,
> new_dentry, &delegated_inode);
> out_dput:
> - done_path_create(&new_path, new_dentry);
> + end_creating_path(&new_path, new_dentry);
> if (delegated_inode) {
> error = break_deleg_wait(&delegated_inode);
> if (!error) {
> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
> index 8f732742b26e..267b50e8e42e 100644
> --- a/fs/ocfs2/refcounttree.c
> +++ b/fs/ocfs2/refcounttree.c
> @@ -4418,7 +4418,7 @@ int ocfs2_reflink_ioctl(struct inode *inode,
> return error;
> }
>
> - new_dentry = user_path_create(AT_FDCWD, newname, &new_path, 0);
> + new_dentry = start_creating_user_path(AT_FDCWD, newname, &new_path, 0);
> error = PTR_ERR(new_dentry);
> if (IS_ERR(new_dentry)) {
> mlog_errno(error);
> @@ -4435,7 +4435,7 @@ int ocfs2_reflink_ioctl(struct inode *inode,
> d_inode(new_path.dentry),
> new_dentry, preserve);
> out_dput:
> - done_path_create(&new_path, new_dentry);
> + end_creating_path(&new_path, new_dentry);
> out:
> path_put(&old_path);
>
> diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
> index 07739055ac9f..1cfa688904b2 100644
> --- a/fs/smb/server/vfs.c
> +++ b/fs/smb/server/vfs.c
> @@ -196,7 +196,7 @@ int ksmbd_vfs_create(struct ksmbd_work *work, const char *name, umode_t mode)
> pr_err("File(%s): creation failed (err:%d)\n", name, err);
> }
>
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> return err;
> }
>
> @@ -237,7 +237,7 @@ int ksmbd_vfs_mkdir(struct ksmbd_work *work, const char *name, umode_t mode)
> if (!err && dentry != d)
> ksmbd_vfs_inherit_owner(work, d_inode(path.dentry), d_inode(dentry));
>
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> if (err)
> pr_err("mkdir(%s): creation failed (err:%d)\n", name, err);
> return err;
> @@ -669,7 +669,7 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname,
> ksmbd_debug(VFS, "vfs_link failed err %d\n", err);
>
> out3:
> - done_path_create(&newpath, dentry);
> + end_creating_path(&newpath, dentry);
> out2:
> path_put(&oldpath);
> out1:
> @@ -1325,7 +1325,7 @@ struct dentry *ksmbd_vfs_kern_path_create(struct ksmbd_work *work,
> if (!abs_name)
> return ERR_PTR(-ENOMEM);
>
> - dent = kern_path_create(AT_FDCWD, abs_name, path, flags);
> + dent = start_creating_path(AT_FDCWD, abs_name, path, flags);
> kfree(abs_name);
> return dent;
> }
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 1d5038c21c20..a7800ef04e76 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -59,11 +59,15 @@ struct dentry *lookup_one_qstr_excl(const struct qstr *name,
> extern int kern_path(const char *, unsigned, struct path *);
> struct dentry *kern_path_parent(const char *name, struct path *parent);
>
> -extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
> -extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
> -extern void done_path_create(struct path *, struct dentry *);
> -extern struct dentry *kern_path_locked(const char *, struct path *);
> -extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
> +extern struct dentry *start_creating_path(int, const char *, struct path *, unsigned int);
> +extern struct dentry *start_creating_user_path(int, const char __user *, struct path *, unsigned int);
> +extern void end_creating_path(struct path *, struct dentry *);
> +extern struct dentry *start_removing_path(const char *, struct path *);
> +extern struct dentry *start_removing_user_path_at(int , const char __user *, struct path *);
> +static inline void end_removing_path(struct path *path , struct dentry *dentry)
> +{
> + end_creating_path(path, dentry);
> +}
> int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
> struct path *parent, struct qstr *last, int *type,
> const struct path *root);
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 5c2e96b19392..fadf3817a9c5 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -442,7 +442,7 @@ static int bpf_obj_do_pin(int path_fd, const char __user *pathname, void *raw,
> umode_t mode;
> int ret;
>
> - dentry = user_path_create(path_fd, pathname, &path, 0);
> + dentry = start_creating_user_path(path_fd, pathname, &path, 0);
> if (IS_ERR(dentry))
> return PTR_ERR(dentry);
>
> @@ -471,7 +471,7 @@ static int bpf_obj_do_pin(int path_fd, const char __user *pathname, void *raw,
> ret = -EPERM;
> }
> out:
> - done_path_create(&path, dentry);
> + end_creating_path(&path, dentry);
> return ret;
> }
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 6d7c110814ff..768098dec231 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1387,7 +1387,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
> * Get the parent directory, calculate the hash for last
> * component.
> */
> - dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0);
> + dentry = start_creating_path(AT_FDCWD, addr->name->sun_path, &parent, 0);
> if (IS_ERR(dentry)) {
> err = PTR_ERR(dentry);
> goto out;
> @@ -1417,7 +1417,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
> unix_table_double_unlock(net, old_hash, new_hash);
> unix_insert_bsd_socket(sk);
> mutex_unlock(&u->bindlock);
> - done_path_create(&parent, dentry);
> + end_creating_path(&parent, dentry);
> return 0;
>
> out_unlock:
> @@ -1427,7 +1427,7 @@ static int unix_bind_bsd(struct sock *sk, struct sockaddr_un *sunaddr,
> /* failed after successful mknod? unlink what we'd created... */
> vfs_unlink(idmap, d_inode(parent.dentry), dentry, NULL);
> out_path:
> - done_path_create(&parent, dentry);
> + end_creating_path(&parent, dentry);
> out:
> unix_release_addr(addr);
> return err == -EEXIST ? -EADDRINUSE : err;
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 5/7] VFS: rename kern_path_locked() and related functions.
2025-09-09 14:07 ` Amir Goldstein
@ 2025-09-10 2:49 ` NeilBrown
0 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2025-09-10 2:49 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, 10 Sep 2025, Amir Goldstein wrote:
> On Tue, Sep 9, 2025 at 6:48 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > kern_path_locked() is now only used to prepare for removing an object
> > from the filesystem (and that is the only credible reason for wanting a
> > positive locked dentry). Thus it corresponds to kern_path_create() and
> > so should have a corresponding name.
> >
> > Unfortunately the name "kern_path_create" is somewhat misleading as it
> > doesn't actually create anything. The recently added
> > simple_start_creating() provides a better pattern I believe. The
> > "start" can be matched with "end" to bracket the creating or removing.
> >
> > So this patch changes names:
> >
> > kern_path_locked -> start_removing_path
> > kern_path_create -> start_creating_path
> > user_path_create -> start_creating_user_path
> > user_path_locked_at -> start_removing_user_path_at
> > done_path_create -> end_creating_path
>
> This looks nice.
>
> With one comment below fixed feel free to add:
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks.
> > @@ -2770,15 +2771,25 @@ static struct dentry *__kern_path_locked(int dfd, struct filename *name, struct
> > return ERR_PTR(error);
> > if (unlikely(type != LAST_NORM))
> > return ERR_PTR(-EINVAL);
>
> This abnormal error handling pattern deserves a comment:
> /* don't fail immediately if it's r/o, at least try to report other errors */
Just like in start_creating_path(). Yes, that is fair. Done.
Thanks,
NeilBrown
>
> > + error = mnt_want_write(path->mnt);
> > inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
> > d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
> > - if (IS_ERR(d)) {
> > - inode_unlock(parent_path.dentry->d_inode);
> > - return d;
> > - }
> > + if (IS_ERR(d))
> > + goto unlock;
> > + if (error)
> > + goto fail;
> > path->dentry = no_free_ptr(parent_path.dentry);
> > path->mnt = no_free_ptr(parent_path.mnt);
> > return d;
> > +
> > +fail:
> > + dput(d);
> > + d = ERR_PTR(error);
> > +unlock:
> > + inode_unlock(parent_path.dentry->d_inode);
> > + if (!error)
> > + mnt_drop_write(path->mnt);
> > + return d;
> > }
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-09 8:19 ` Al Viro
@ 2025-09-10 3:01 ` NeilBrown
2025-09-10 4:12 ` Al Viro
0 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2025-09-10 3:01 UTC (permalink / raw)
To: Al Viro
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, 09 Sep 2025, Al Viro wrote:
> On Tue, Sep 09, 2025 at 02:43:21PM +1000, NeilBrown wrote:
>
> > d_instantiate(dentry, inode);
> > - dget(dentry);
> > -fail:
> > - inode_unlock(d_inode(parent));
> > - return dentry;
> > + return simple_end_creating(dentry);
>
> No. This is the wrong model - dget() belongs with d_instantiate()
> here; your simple_end_creating() calling conventions are wrong.
I can see that I shouldn't have removed the dget() there - thanks.
It is not entirely clear why hypfs_create_file() returns with two
references held to the dentry....
I see now one is added either to ->update_file or the list at
hypfs_last_dentry, and the other is disposed of by kill_litter_super().
But apart from that one error is there something broader wrong with the
patch? You say "the wrong model" but I don't see it.
Thanks,
NeilBrown
>
> What really happens is a controlled leak. simple_start_creating()
> in the beginning is correct, but the right model here is to have
> identical refcounting logics on the exits - the only difference
> should be in having done that combination on success.
>
> Please, leave those alone for now.
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 3:01 ` NeilBrown
@ 2025-09-10 4:12 ` Al Viro
2025-09-10 4:16 ` Al Viro
0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2025-09-10 4:12 UTC (permalink / raw)
To: NeilBrown
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Sep 10, 2025 at 01:01:49PM +1000, NeilBrown wrote:
> On Tue, 09 Sep 2025, Al Viro wrote:
> > On Tue, Sep 09, 2025 at 02:43:21PM +1000, NeilBrown wrote:
> >
> > > d_instantiate(dentry, inode);
> > > - dget(dentry);
> > > -fail:
> > > - inode_unlock(d_inode(parent));
> > > - return dentry;
> > > + return simple_end_creating(dentry);
> >
> > No. This is the wrong model - dget() belongs with d_instantiate()
> > here; your simple_end_creating() calling conventions are wrong.
>
> I can see that I shouldn't have removed the dget() there - thanks.
> It is not entirely clear why hypfs_create_file() returns with two
> references held to the dentry....
> I see now one is added either to ->update_file or the list at
> hypfs_last_dentry, and the other is disposed of by kill_litter_super().
>
> But apart from that one error is there something broader wrong with the
> patch? You say "the wrong model" but I don't see it.
See below for hypfs:
commit c0c58d3cadfcefcf25f4885b47c47d6e6a3f9aee
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu May 9 16:00:48 2024 -0400
hypfs: don't pin dentries twice
hypfs dentries end up with refcount 2 when they are not busy.
Refcount 1 is enough to keep them pinned, and going that way
allows to simplify things nicely:
* don't need to drop an extra reference before the
call of kill_litter_super() in ->kill_sb(); all we need
there is to reset the cleanup list - everything on it will
be taken out automatically.
* we can make use of simple_recursive_removal() on
tree rebuilds; just make sure that only children of root
end up in the cleanup list and hypfs_delete_tree() becomes
much simpler
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
index 96409573c75d..a4dc8e13d999 100644
--- a/arch/s390/hypfs/inode.c
+++ b/arch/s390/hypfs/inode.c
@@ -61,33 +61,17 @@ static void hypfs_update_update(struct super_block *sb)
static void hypfs_add_dentry(struct dentry *dentry)
{
- dentry->d_fsdata = hypfs_last_dentry;
- hypfs_last_dentry = dentry;
-}
-
-static void hypfs_remove(struct dentry *dentry)
-{
- struct dentry *parent;
-
- parent = dentry->d_parent;
- inode_lock(d_inode(parent));
- if (simple_positive(dentry)) {
- if (d_is_dir(dentry))
- simple_rmdir(d_inode(parent), dentry);
- else
- simple_unlink(d_inode(parent), dentry);
+ if (IS_ROOT(dentry->d_parent)) {
+ dentry->d_fsdata = hypfs_last_dentry;
+ hypfs_last_dentry = dentry;
}
- d_drop(dentry);
- dput(dentry);
- inode_unlock(d_inode(parent));
}
-static void hypfs_delete_tree(struct dentry *root)
+static void hypfs_delete_tree(void)
{
while (hypfs_last_dentry) {
- struct dentry *next_dentry;
- next_dentry = hypfs_last_dentry->d_fsdata;
- hypfs_remove(hypfs_last_dentry);
+ struct dentry *next_dentry = hypfs_last_dentry->d_fsdata;
+ simple_recursive_removal(hypfs_last_dentry, NULL);
hypfs_last_dentry = next_dentry;
}
}
@@ -184,14 +168,14 @@ static ssize_t hypfs_write_iter(struct kiocb *iocb, struct iov_iter *from)
rc = -EBUSY;
goto out;
}
- hypfs_delete_tree(sb->s_root);
+ hypfs_delete_tree();
if (machine_is_vm())
rc = hypfs_vm_create_files(sb->s_root);
else
rc = hypfs_diag_create_files(sb->s_root);
if (rc) {
pr_err("Updating the hypfs tree failed\n");
- hypfs_delete_tree(sb->s_root);
+ hypfs_delete_tree();
goto out;
}
hypfs_update_update(sb);
@@ -326,13 +310,9 @@ static void hypfs_kill_super(struct super_block *sb)
{
struct hypfs_sb_info *sb_info = sb->s_fs_info;
- if (sb->s_root)
- hypfs_delete_tree(sb->s_root);
- if (sb_info && sb_info->update_file)
- hypfs_remove(sb_info->update_file);
- kfree(sb->s_fs_info);
- sb->s_fs_info = NULL;
+ hypfs_last_dentry = NULL;
kill_litter_super(sb);
+ kfree(sb_info);
}
static struct dentry *hypfs_create_file(struct dentry *parent, const char *name,
@@ -367,7 +347,6 @@ static struct dentry *hypfs_create_file(struct dentry *parent, const char *name,
BUG();
inode->i_private = data;
d_instantiate(dentry, inode);
- dget(dentry);
fail:
inode_unlock(d_inode(parent));
return dentry;
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 4:12 ` Al Viro
@ 2025-09-10 4:16 ` Al Viro
2025-09-10 7:37 ` Al Viro
2025-09-10 23:13 ` NeilBrown
0 siblings, 2 replies; 25+ messages in thread
From: Al Viro @ 2025-09-10 4:16 UTC (permalink / raw)
To: NeilBrown
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Sep 10, 2025 at 05:12:49AM +0100, Al Viro wrote:
> On Wed, Sep 10, 2025 at 01:01:49PM +1000, NeilBrown wrote:
> > On Tue, 09 Sep 2025, Al Viro wrote:
> > > On Tue, Sep 09, 2025 at 02:43:21PM +1000, NeilBrown wrote:
> > >
> > > > d_instantiate(dentry, inode);
> > > > - dget(dentry);
> > > > -fail:
> > > > - inode_unlock(d_inode(parent));
> > > > - return dentry;
> > > > + return simple_end_creating(dentry);
> > >
> > > No. This is the wrong model - dget() belongs with d_instantiate()
> > > here; your simple_end_creating() calling conventions are wrong.
> >
> > I can see that I shouldn't have removed the dget() there - thanks.
> > It is not entirely clear why hypfs_create_file() returns with two
> > references held to the dentry....
> > I see now one is added either to ->update_file or the list at
> > hypfs_last_dentry, and the other is disposed of by kill_litter_super().
> >
> > But apart from that one error is there something broader wrong with the
> > patch? You say "the wrong model" but I don't see it.
>
> See below for hypfs:
... and see viro/vfs.git#work.persistency for the part of the queue that
had order already settled down (I'm reshuffling the tail at the moment;
hypfs commit is still in the leftovers pile - the whole thing used to
have a really messy topology, with most of the prep work that used to
be the cause of that topology already in mainline - e.g. rpc_pipefs
series, securityfs one, etc.)
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 4:16 ` Al Viro
@ 2025-09-10 7:37 ` Al Viro
2025-09-10 11:04 ` Jeff Layton
2025-09-10 12:34 ` Jeff Layton
2025-09-10 23:13 ` NeilBrown
1 sibling, 2 replies; 25+ messages in thread
From: Al Viro @ 2025-09-10 7:37 UTC (permalink / raw)
To: NeilBrown
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel, linux-nfs
> ... and see viro/vfs.git#work.persistency for the part of the queue that
> had order already settled down (I'm reshuffling the tail at the moment;
> hypfs commit is still in the leftovers pile - the whole thing used to
> have a really messy topology, with most of the prep work that used to
> be the cause of that topology already in mainline - e.g. rpc_pipefs
> series, securityfs one, etc.)
Speaking of which, nfsctl series contains the following and I'd like to
make sure that behaviour being fixed there *is* just an accident...
Could nfsd folks comment?
[PATCH] nfsctl: don't bump st_nlink of directory when creating a symlink in it
apparently blindly copied from mkdir...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index bc6b776fc657..282b961d8788 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1181,7 +1181,6 @@ static int __nfsd_symlink(struct inode *dir, struct dentry *dentry,
inode->i_size = strlen(content);
d_add(dentry, inode);
- inc_nlink(dir);
fsnotify_create(dir, dentry);
return 0;
}
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 7:37 ` Al Viro
@ 2025-09-10 11:04 ` Jeff Layton
2025-09-10 11:30 ` NeilBrown
2025-09-10 12:34 ` Jeff Layton
1 sibling, 1 reply; 25+ messages in thread
From: Jeff Layton @ 2025-09-10 11:04 UTC (permalink / raw)
To: Al Viro, NeilBrown
Cc: Christian Brauner, Amir Goldstein, Jan Kara, linux-fsdevel,
linux-nfs
On Wed, 2025-09-10 at 08:37 +0100, Al Viro wrote:
> > ... and see viro/vfs.git#work.persistency for the part of the queue that
> > had order already settled down (I'm reshuffling the tail at the moment;
> > hypfs commit is still in the leftovers pile - the whole thing used to
> > have a really messy topology, with most of the prep work that used to
> > be the cause of that topology already in mainline - e.g. rpc_pipefs
> > series, securityfs one, etc.)
>
> Speaking of which, nfsctl series contains the following and I'd like to
> make sure that behaviour being fixed there *is* just an accident...
> Could nfsd folks comment?
>
> [PATCH] nfsctl: don't bump st_nlink of directory when creating a symlink in it
>
>
> apparently blindly copied from mkdir...
>
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index bc6b776fc657..282b961d8788 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1181,7 +1181,6 @@ static int __nfsd_symlink(struct inode *dir, struct dentry *dentry,
> inode->i_size = strlen(content);
>
>
> d_add(dentry, inode);
> - inc_nlink(dir);
> fsnotify_create(dir, dentry);
> return 0;
> }
That is increasing the link count on the parent because it's adding a
dentry to "dir". The link count on a dir doesn't have much meaning, but
why do we need to remove it here, but keep the one in __nfsd_mkdir?
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 11:04 ` Jeff Layton
@ 2025-09-10 11:30 ` NeilBrown
2025-09-10 11:54 ` Jeff Layton
0 siblings, 1 reply; 25+ messages in thread
From: NeilBrown @ 2025-09-10 11:30 UTC (permalink / raw)
To: Jeff Layton
Cc: Al Viro, Christian Brauner, Amir Goldstein, Jan Kara,
linux-fsdevel, linux-nfs
On Wed, 10 Sep 2025, Jeff Layton wrote:
> On Wed, 2025-09-10 at 08:37 +0100, Al Viro wrote:
> > > ... and see viro/vfs.git#work.persistency for the part of the queue that
> > > had order already settled down (I'm reshuffling the tail at the moment;
> > > hypfs commit is still in the leftovers pile - the whole thing used to
> > > have a really messy topology, with most of the prep work that used to
> > > be the cause of that topology already in mainline - e.g. rpc_pipefs
> > > series, securityfs one, etc.)
> >
> > Speaking of which, nfsctl series contains the following and I'd like to
> > make sure that behaviour being fixed there *is* just an accident...
> > Could nfsd folks comment?
> >
> > [PATCH] nfsctl: don't bump st_nlink of directory when creating a symlink in it
> >
> >
> > apparently blindly copied from mkdir...
> >
> >
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index bc6b776fc657..282b961d8788 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -1181,7 +1181,6 @@ static int __nfsd_symlink(struct inode *dir, struct dentry *dentry,
> > inode->i_size = strlen(content);
> >
> >
> > d_add(dentry, inode);
> > - inc_nlink(dir);
> > fsnotify_create(dir, dentry);
> > return 0;
> > }
>
> That is increasing the link count on the parent because it's adding a
> dentry to "dir". The link count on a dir doesn't have much meaning, but
> why do we need to remove it here, but keep the one in __nfsd_mkdir?
The link count in an inode is the number of links *to* the inode.
A symlink (or file etc) in a directory doesn't imply a link to that
directory (they are links "from" the directory, but those aren't counted).
A directory in a directory, on the other hand, does imply a link to the
(parent) directory due to the ".." entry.
In fact the link count on a directory should always be 2 plus the number
of subdirectories (one for the name in the parent, one for "." in the
directory itself, and one for ".." in each subdirectory). Some "find"
style programs depend on that to a degree, though mostly as an
optimisation.
NeilBrown
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 11:30 ` NeilBrown
@ 2025-09-10 11:54 ` Jeff Layton
2025-09-10 18:28 ` Al Viro
0 siblings, 1 reply; 25+ messages in thread
From: Jeff Layton @ 2025-09-10 11:54 UTC (permalink / raw)
To: NeilBrown
Cc: Al Viro, Christian Brauner, Amir Goldstein, Jan Kara,
linux-fsdevel, linux-nfs
On Wed, 2025-09-10 at 21:30 +1000, NeilBrown wrote:
> On Wed, 10 Sep 2025, Jeff Layton wrote:
> > On Wed, 2025-09-10 at 08:37 +0100, Al Viro wrote:
> > > > ... and see viro/vfs.git#work.persistency for the part of the queue that
> > > > had order already settled down (I'm reshuffling the tail at the moment;
> > > > hypfs commit is still in the leftovers pile - the whole thing used to
> > > > have a really messy topology, with most of the prep work that used to
> > > > be the cause of that topology already in mainline - e.g. rpc_pipefs
> > > > series, securityfs one, etc.)
> > >
> > > Speaking of which, nfsctl series contains the following and I'd like to
> > > make sure that behaviour being fixed there *is* just an accident...
> > > Could nfsd folks comment?
> > >
> > > [PATCH] nfsctl: don't bump st_nlink of directory when creating a symlink in it
> > >
> > >
> > > apparently blindly copied from mkdir...
> > >
> > >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > ---
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index bc6b776fc657..282b961d8788 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -1181,7 +1181,6 @@ static int __nfsd_symlink(struct inode *dir, struct dentry *dentry,
> > > inode->i_size = strlen(content);
> > >
> > >
> > > d_add(dentry, inode);
> > > - inc_nlink(dir);
> > > fsnotify_create(dir, dentry);
> > > return 0;
> > > }
> >
> > That is increasing the link count on the parent because it's adding a
> > dentry to "dir". The link count on a dir doesn't have much meaning, but
> > why do we need to remove it here, but keep the one in __nfsd_mkdir?
>
> The link count in an inode is the number of links *to* the inode.
> A symlink (or file etc) in a directory doesn't imply a link to that
> directory (they are links "from" the directory, but those aren't counted).
> A directory in a directory, on the other hand, does imply a link to the
> (parent) directory due to the ".." entry.
>
> In fact the link count on a directory should always be 2 plus the number
> of subdirectories (one for the name in the parent, one for "." in the
> directory itself, and one for ".." in each subdirectory). Some "find"
> style programs depend on that to a degree, though mostly as an
> optimisation.
>
I'm having a hard time finding where that is defined in the POSIX
specs. The link count for normal files is fairly well-defined. The link
count for directories has always been more nebulous.
I think we would be well-served by a clearly-defined meaning for the
link count on a directory for Linux, because different filesystems do
this differently today.
Yours and Al's semantics seem fine (I don't have a real preference,
tbh), but we should codify this in Documentation/ since it is unclear.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 7:37 ` Al Viro
2025-09-10 11:04 ` Jeff Layton
@ 2025-09-10 12:34 ` Jeff Layton
1 sibling, 0 replies; 25+ messages in thread
From: Jeff Layton @ 2025-09-10 12:34 UTC (permalink / raw)
To: Al Viro, NeilBrown
Cc: Christian Brauner, Amir Goldstein, Jan Kara, linux-fsdevel,
linux-nfs
On Wed, 2025-09-10 at 08:37 +0100, Al Viro wrote:
> > ... and see viro/vfs.git#work.persistency for the part of the queue that
> > had order already settled down (I'm reshuffling the tail at the moment;
> > hypfs commit is still in the leftovers pile - the whole thing used to
> > have a really messy topology, with most of the prep work that used to
> > be the cause of that topology already in mainline - e.g. rpc_pipefs
> > series, securityfs one, etc.)
>
> Speaking of which, nfsctl series contains the following and I'd like to
> make sure that behaviour being fixed there *is* just an accident...
> Could nfsd folks comment?
>
> [PATCH] nfsctl: don't bump st_nlink of directory when creating a symlink in it
>
> apparently blindly copied from mkdir...
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index bc6b776fc657..282b961d8788 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1181,7 +1181,6 @@ static int __nfsd_symlink(struct inode *dir, struct dentry *dentry,
> inode->i_size = strlen(content);
>
> d_add(dentry, inode);
> - inc_nlink(dir);
> fsnotify_create(dir, dentry);
> return 0;
> }
Given Neil's explanation:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 11:54 ` Jeff Layton
@ 2025-09-10 18:28 ` Al Viro
0 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2025-09-10 18:28 UTC (permalink / raw)
To: Jeff Layton
Cc: NeilBrown, Christian Brauner, Amir Goldstein, Jan Kara,
linux-fsdevel, linux-nfs
On Wed, Sep 10, 2025 at 07:54:25AM -0400, Jeff Layton wrote:
> I'm having a hard time finding where that is defined in the POSIX
> specs. The link count for normal files is fairly well-defined. The link
> count for directories has always been more nebulous.
Try UNIX textbooks... <pulls some out> in Stevens that would be 4.14
and I'm pretty sure that it's covered in other standard ones.
History of that thing: on traditional filesystem "." and ".." are real
directory entries, "." refering to the inode of directory itself and
".." to that of parent (if it's not a root directory) or to directory
itself (if it is root). Link count is literally the number of directory
entries pointing to given inode. For directories that boils down to 2
if empty ("." + reference in parent or ".." for non-root and root resp.),
then each subdirectory adds 1 (".." in it points back to ours).
That goes for any local UNIX filesystem, and there hadn't been anything
else until RFS and NFS, both keeping the same rules (both coming from
the underlying filesystem layout). Try mkdir and ln -s on NFS, watch
what's happening to stat of parent.
The things got murkier for FAT, when its support got added - on-disk
layout has nothing like a link count. It has only one directory entry
for any non-directory *and* directory entries have object type of the
thing they are pointing to, so one can match the behaviour of normal UNIX
filesystem by going through the directory contents when reading an inode;
the cost is not particularly high, so that's what everyone did.
For original isofs (i.e. no rockridge extensions, no ownership, etc.)
that was considerably harder; I don't remember what SunOS hsfs had done
there (thankfully), our implementation started with "just slap 2 for
directories if RR is not there", but that switched to "we don't know the
exact answer, so use 1 to indicate that" pretty early <checks> 1.1.40 -
Aug '94; original isofs implementation went into the tree in Dec '92.
More complications came when... odd people came complaining about the
overflows when they got 65534 subdirectories in the same directory.
That had two sides to it - on-disk inode layout and userland ABI.
For the latter the long-term solution was to make st_nlink 32bit
(in newer variant of stat(2) if needed) and fail with EOVERFLOW if the
value doesn't fit into the ABI you are trying to use. For the latter...
some weird kludges followed, with the things eventually settling down
on "if you can't manage the expected value, at least report something
that couldn't be confused for it". Since 1 is normally impossible for
a directory, that turned into "can't tell you how many links are there".
That covered both "we don't have enough bits in the on-disk field" and
"we don't have that field on disk at all and can't be bothered calculating
it" (as in iso9660 case above).
Of course, for e.g. NFS the value we report is whatever the server
tells us; nobody is going to have client to readdirplus the entire
directory and count subdirectories in it just to check if server lies
and is inconsistent at that. But that's not really different from the
situation with local filesystem - we assume that the count in on-disk
inode matches the number of directory entries pointing to it.
The find(1) (well, tree-walkers in general, really) thing Neil has
mentioned is that on filesystems where readdir(3) gives you no reliable
dirent->d_type you need to stat every entry in order to decide whether
it's a subdirectory you would need to walk into. Being able to tell
"this directory has no subdirectories" allows to skip those stat(2) calls
when going through it. Same for "I've already seen 5 subdirectories,
stat on our directory has reported st_nlink being 7, so we'd already
seen all subdirectories here; no need to stat(2) further entries", for
that matter... On a sane filesystem you'd just look for entries with
->dt_type == DT_DIR and skip all those stat(2).
IOW, the real rules are
* st_nlink >= 2: st_nlink - 2 subdirectories
* st_nlink = 1: refused to report the number of subdirectories
* st_nlink = 0: fstat on something that had been removed
In case of corrupted filesystem, bullshitting server, etc. result might
have no relationship to reality, of course.
I don't know of any case where creation of symlinks in a directory would
affected the parent's link count. Frankly, I thought that was just
an accidental cut'n'paste from __nfsd_mkdir()... As long as nothing
in the userland is playing odd games with that st_nlink value, I'd say
we should remove the temptation to start doing that and return to the
usual semantics.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 7/7] Use simple_start_creating() in various places.
2025-09-10 4:16 ` Al Viro
2025-09-10 7:37 ` Al Viro
@ 2025-09-10 23:13 ` NeilBrown
1 sibling, 0 replies; 25+ messages in thread
From: NeilBrown @ 2025-09-10 23:13 UTC (permalink / raw)
To: Al Viro
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, 10 Sep 2025, Al Viro wrote:
> On Wed, Sep 10, 2025 at 05:12:49AM +0100, Al Viro wrote:
> > On Wed, Sep 10, 2025 at 01:01:49PM +1000, NeilBrown wrote:
> > > On Tue, 09 Sep 2025, Al Viro wrote:
> > > > On Tue, Sep 09, 2025 at 02:43:21PM +1000, NeilBrown wrote:
> > > >
> > > > > d_instantiate(dentry, inode);
> > > > > - dget(dentry);
> > > > > -fail:
> > > > > - inode_unlock(d_inode(parent));
> > > > > - return dentry;
> > > > > + return simple_end_creating(dentry);
> > > >
> > > > No. This is the wrong model - dget() belongs with d_instantiate()
> > > > here; your simple_end_creating() calling conventions are wrong.
> > >
> > > I can see that I shouldn't have removed the dget() there - thanks.
> > > It is not entirely clear why hypfs_create_file() returns with two
> > > references held to the dentry....
> > > I see now one is added either to ->update_file or the list at
> > > hypfs_last_dentry, and the other is disposed of by kill_litter_super().
> > >
> > > But apart from that one error is there something broader wrong with the
> > > patch? You say "the wrong model" but I don't see it.
> >
> > See below for hypfs:
>
> ... and see viro/vfs.git#work.persistency for the part of the queue that
> had order already settled down (I'm reshuffling the tail at the moment;
> hypfs commit is still in the leftovers pile - the whole thing used to
> have a really messy topology, with most of the prep work that used to
> be the cause of that topology already in mainline - e.g. rpc_pipefs
> series, securityfs one, etc.)
>
Thanks.
I now see you were saying that you are introducing a new model for these
simple_ filesystems and the simple_end_creating() I proposed does not
fit with that model. Fair enough - I do agree your model is better. I
assume it will eventually remove DCACHE_DENTRY_KILLED as setting that is
equivalent to clearing the new DCACHE_PERSISTENT.
So I'll leave those filesystems alone for now, though I would like to
change "start_creating()" in debugfs to something like
"debugfs_start_creating()" so I can use the generic name elsewhere.
But are you OK with 1-5 of this series? I've got a comment addition but no
other changes suggested in review.
May I add the start_creating() rename patch 6?
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable()
2025-09-09 4:43 ` [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable() NeilBrown
@ 2025-09-11 20:05 ` Al Viro
2025-09-11 23:02 ` NeilBrown
0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2025-09-11 20:05 UTC (permalink / raw)
To: neil
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, Sep 09, 2025 at 02:43:15PM +1000, NeilBrown wrote:
> Note that instead of always getting an exclusive lock, ovl now only gets
> a shared lock, and only sometimes. The exclusive lock was never needed.
what it is the locking environment in callers and what stabilizes
that list hanging off rdd, seeing that you now run through it without
having dir held exclusive?
> - int err;
> + int err = 0;
> struct dentry *dentry, *dir = path->dentry;
> const struct cred *old_cred;
>
> old_cred = ovl_override_creds(rdd->dentry->d_sb);
>
> - err = down_write_killable(&dir->d_inode->i_rwsem);
> - if (!err) {
> - while (rdd->first_maybe_whiteout) {
> - struct ovl_cache_entry *p =
> - rdd->first_maybe_whiteout;
> - rdd->first_maybe_whiteout = p->next_maybe_whiteout;
> - dentry = lookup_one(mnt_idmap(path->mnt),
> - &QSTR_LEN(p->name, p->len), dir);
> - if (!IS_ERR(dentry)) {
> - p->is_whiteout = ovl_is_whiteout(dentry);
> - dput(dentry);
> - }
> + while (rdd->first_maybe_whiteout) {
> + struct ovl_cache_entry *p =
> + rdd->first_maybe_whiteout;
> + rdd->first_maybe_whiteout = p->next_maybe_whiteout;
> + dentry = lookup_one_positive_killable(mnt_idmap(path->mnt),
> + &QSTR_LEN(p->name, p->len),
> + dir);
> + if (!IS_ERR(dentry)) {
> + p->is_whiteout = ovl_is_whiteout(dentry);
> + dput(dentry);
> + } else if (PTR_ERR(dentry) == -EINTR) {
> + err = -EINTR;
> + break;
> }
> - inode_unlock(dir->d_inode);
> }
> ovl_revert_creds(old_cred);
>
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 5d085428e471..551a1a01e5e7 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -80,6 +80,9 @@ struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
> struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
> struct qstr *name,
> struct dentry *base);
> +struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
> + struct qstr *name,
> + struct dentry *base);
>
> extern int follow_down_one(struct path *);
> extern int follow_down(struct path *path, unsigned int flags);
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable()
2025-09-11 20:05 ` Al Viro
@ 2025-09-11 23:02 ` NeilBrown
0 siblings, 0 replies; 25+ messages in thread
From: NeilBrown @ 2025-09-11 23:02 UTC (permalink / raw)
To: Al Viro
Cc: Christian Brauner, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, 12 Sep 2025, Al Viro wrote:
> On Tue, Sep 09, 2025 at 02:43:15PM +1000, NeilBrown wrote:
>
> > Note that instead of always getting an exclusive lock, ovl now only gets
> > a shared lock, and only sometimes. The exclusive lock was never needed.
>
> what it is the locking environment in callers and what stabilizes
> that list hanging off rdd, seeing that you now run through it without
> having dir held exclusive?
rdd is per-thread - always allocated on the stack.
In this case an iterate_dir() call has built a cache of a directory and
the maybe_whiteout and been constructed of everything that is DT_CHR.
Now, outside the readdir lock, we are lookup up each of those to check
if they are really whiteouts.
So no other thread can tough this list at all.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2025-09-11 23:02 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-09 4:43 [PATCH v2 0/8] VFS: more prep for change to directory locking NeilBrown
2025-09-09 4:43 ` [PATCH v2 1/7] VFS/ovl: add lookup_one_positive_killable() NeilBrown
2025-09-11 20:05 ` Al Viro
2025-09-11 23:02 ` NeilBrown
2025-09-09 4:43 ` [PATCH v2 2/7] VFS: discard err2 in filename_create() NeilBrown
2025-09-09 11:24 ` Jeff Layton
2025-09-09 4:43 ` [PATCH v2 3/7] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata NeilBrown
2025-09-09 4:43 ` [PATCH v2 4/7] VFS/audit: introduce kern_path_parent() for audit NeilBrown
2025-09-09 4:43 ` [PATCH v2 5/7] VFS: rename kern_path_locked() and related functions NeilBrown
2025-09-09 14:07 ` Amir Goldstein
2025-09-10 2:49 ` NeilBrown
2025-09-09 4:43 ` [PATCH v2 6/7] VFS: introduce simple_end_creating() and simple_failed_creating() NeilBrown
2025-09-09 8:20 ` Al Viro
2025-09-09 4:43 ` [PATCH v2 7/7] Use simple_start_creating() in various places NeilBrown
2025-09-09 8:19 ` Al Viro
2025-09-10 3:01 ` NeilBrown
2025-09-10 4:12 ` Al Viro
2025-09-10 4:16 ` Al Viro
2025-09-10 7:37 ` Al Viro
2025-09-10 11:04 ` Jeff Layton
2025-09-10 11:30 ` NeilBrown
2025-09-10 11:54 ` Jeff Layton
2025-09-10 18:28 ` Al Viro
2025-09-10 12:34 ` Jeff Layton
2025-09-10 23:13 ` NeilBrown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).