* [PATCH 01/11] debugfs: rename end_creating() to debugfs_end_creating()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-27 9:13 ` Amir Goldstein
2025-09-27 11:29 ` Jeff Layton
2025-09-26 2:49 ` [PATCH 02/11] VFS: introduce start_dirop() and end_dirop() NeilBrown
` (10 subsequent siblings)
11 siblings, 2 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
By not using the generic end_creating() name here we are free to use it
more globally for a more generic function.
This should have been done when start_creating() was renamed.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/debugfs/inode.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 661a99a7dfbe..b863c8d0cbcd 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -411,7 +411,7 @@ static struct dentry *failed_creating(struct dentry *dentry)
return ERR_PTR(-ENOMEM);
}
-static struct dentry *end_creating(struct dentry *dentry)
+static struct dentry *debugfs_end_creating(struct dentry *dentry)
{
inode_unlock(d_inode(dentry->d_parent));
return dentry;
@@ -458,7 +458,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
d_instantiate(dentry, inode);
fsnotify_create(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
struct dentry *debugfs_create_file_full(const char *name, umode_t mode,
@@ -605,7 +605,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
d_instantiate(dentry, inode);
inc_nlink(d_inode(dentry->d_parent));
fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL_GPL(debugfs_create_dir);
@@ -652,7 +652,7 @@ struct dentry *debugfs_create_automount(const char *name,
d_instantiate(dentry, inode);
inc_nlink(d_inode(dentry->d_parent));
fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL(debugfs_create_automount);
@@ -705,7 +705,7 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
inode->i_op = &debugfs_symlink_inode_operations;
inode->i_link = link;
d_instantiate(dentry, inode);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL_GPL(debugfs_create_symlink);
--
2.50.0.107.gf914562f5916.dirty
* Re: [PATCH 01/11] debugfs: rename end_creating() to debugfs_end_creating()
2025-09-26 2:49 ` [PATCH 01/11] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
@ 2025-09-27 9:13 ` Amir Goldstein
2025-09-27 11:29 ` Jeff Layton
1 sibling, 0 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-09-27 9:13 UTC
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> By not using the generic end_creating() name here we are free to use it
> more globally for a more generic function.
> This should have been done when start_creating() was renamed.
>
> Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/debugfs/inode.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 661a99a7dfbe..b863c8d0cbcd 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -411,7 +411,7 @@ static struct dentry *failed_creating(struct dentry *dentry)
> return ERR_PTR(-ENOMEM);
> }
>
> -static struct dentry *end_creating(struct dentry *dentry)
> +static struct dentry *debugfs_end_creating(struct dentry *dentry)
> {
> inode_unlock(d_inode(dentry->d_parent));
> return dentry;
> @@ -458,7 +458,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
>
> d_instantiate(dentry, inode);
> fsnotify_create(d_inode(dentry->d_parent), dentry);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
>
> struct dentry *debugfs_create_file_full(const char *name, umode_t mode,
> @@ -605,7 +605,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
> d_instantiate(dentry, inode);
> inc_nlink(d_inode(dentry->d_parent));
> fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
> EXPORT_SYMBOL_GPL(debugfs_create_dir);
>
> @@ -652,7 +652,7 @@ struct dentry *debugfs_create_automount(const char *name,
> d_instantiate(dentry, inode);
> inc_nlink(d_inode(dentry->d_parent));
> fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
> EXPORT_SYMBOL(debugfs_create_automount);
>
> @@ -705,7 +705,7 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
> inode->i_op = &debugfs_symlink_inode_operations;
> inode->i_link = link;
> d_instantiate(dentry, inode);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
> EXPORT_SYMBOL_GPL(debugfs_create_symlink);
>
> --
> 2.50.0.107.gf914562f5916.dirty
>
* Re: [PATCH 01/11] debugfs: rename end_creating() to debugfs_end_creating()
2025-09-26 2:49 ` [PATCH 01/11] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
2025-09-27 9:13 ` Amir Goldstein
@ 2025-09-27 11:29 ` Jeff Layton
1 sibling, 0 replies; 49+ messages in thread
From: Jeff Layton @ 2025-09-27 11:29 UTC
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein
Cc: Jan Kara, linux-fsdevel
On Fri, 2025-09-26 at 12:49 +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> By not using the generic end_creating() name here we are free to use it
> more globally for a more generic function.
> This should have been done when start_creating() was renamed.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/debugfs/inode.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 661a99a7dfbe..b863c8d0cbcd 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -411,7 +411,7 @@ static struct dentry *failed_creating(struct dentry *dentry)
> return ERR_PTR(-ENOMEM);
> }
>
> -static struct dentry *end_creating(struct dentry *dentry)
> +static struct dentry *debugfs_end_creating(struct dentry *dentry)
> {
> inode_unlock(d_inode(dentry->d_parent));
> return dentry;
> @@ -458,7 +458,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
>
> d_instantiate(dentry, inode);
> fsnotify_create(d_inode(dentry->d_parent), dentry);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
>
> struct dentry *debugfs_create_file_full(const char *name, umode_t mode,
> @@ -605,7 +605,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
> d_instantiate(dentry, inode);
> inc_nlink(d_inode(dentry->d_parent));
> fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
> EXPORT_SYMBOL_GPL(debugfs_create_dir);
>
> @@ -652,7 +652,7 @@ struct dentry *debugfs_create_automount(const char *name,
> d_instantiate(dentry, inode);
> inc_nlink(d_inode(dentry->d_parent));
> fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
> EXPORT_SYMBOL(debugfs_create_automount);
>
> @@ -705,7 +705,7 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
> inode->i_op = &debugfs_symlink_inode_operations;
> inode->i_link = link;
> d_instantiate(dentry, inode);
> - return end_creating(dentry);
> + return debugfs_end_creating(dentry);
> }
> EXPORT_SYMBOL_GPL(debugfs_create_symlink);
>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
* [PATCH 02/11] VFS: introduce start_dirop() and end_dirop()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
2025-09-26 2:49 ` [PATCH 01/11] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-26 16:41 ` Amir Goldstein
2025-09-26 2:49 ` [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
` (9 subsequent siblings)
11 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
The fact that directory operations (create,remove,rename) are protected
by a lock on the parent is known widely throughout the kernel.
In order to change this - to instead lock the target dentry - it is
best to centralise this knowledge so it can be changed in one place.
This patch introduces start_dirop() which is local to VFS code.
It performs the required locking for create and remove. Rename
will be handled separately.
Various functions with names like start_creating() or start_removing_path(),
some of which already exist, will export this functionality beyond the VFS.
end_dirop() is the partner of start_dirop(). It drops the lock and
releases the reference on the dentry.
It *is* exported so that various end_creating etc functions can be inline.
As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as
that won't unlock when the dentry IS_ERR(). For those cases we have
end_dirop_mkdir().
end_dirop() can always be called on the result of start_dirop(), but not
after vfs_mkdir().
end_dirop_mkdir() can only be called on the result of start_dirop() if
that was not an error, and can also be called on the result of
vfs_mkdir().
When we change vfs_mkdir() to drop the lock when it drops the dentry,
end_dirop_mkdir() can be discarded.
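The pairing rules above can be modelled in userspace. The following is a
minimal sketch and not kernel code: every model_* name is hypothetical, the
parent lock is a plain counter, and error pointers are imitated with a cast
in the style of ERR_PTR()/IS_ERR(). It only illustrates why end_dirop()
leaves the parent locked after a failed mkdir while end_dirop_mkdir() does
not.

```c
/* Userspace model of the start_dirop()/end_dirop() contract; NOT kernel code.
 * All model_* names are hypothetical stand-ins. */
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095
#define MODEL_ENOENT 2
#define MODEL_EIO 5

struct model_dir { int locked; };	/* stands in for the parent's inode lock */
struct model_dentry { struct model_dir *parent; int refs; };

static struct model_dentry *model_err_ptr(long err)
{
	return (struct model_dentry *)(intptr_t)err;
}

static int model_is_err(const struct model_dentry *de)
{
	return (uintptr_t)de >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

/* Lock the parent and "look up" child; on failure the lock is already dropped. */
static struct model_dentry *model_start_dirop(struct model_dir *dir,
					      struct model_dentry *child,
					      int lookup_fails)
{
	dir->locked++;
	if (lookup_fails) {
		dir->locked--;
		return model_err_ptr(-MODEL_ENOENT);
	}
	child->parent = dir;
	child->refs++;
	return child;
}

/* No-op on an error pointer; otherwise unlock the parent and drop the reference. */
static void model_end_dirop(struct model_dentry *de)
{
	if (!model_is_err(de)) {
		de->parent->locked--;
		de->refs--;
	}
}

/* Like model_end_dirop(), but an error pointer still unlocks the given parent. */
static void model_end_dirop_mkdir(struct model_dentry *de, struct model_dir *dir)
{
	if (model_is_err(de))
		dir->locked--;
	else
		model_end_dirop(de);
}

/* Models vfs_mkdir() failing: the dentry reference is dropped, the lock is kept. */
static struct model_dentry *model_failing_mkdir(struct model_dentry *de)
{
	de->refs--;
	return model_err_ptr(-MODEL_EIO);
}

/* Returns how many times the parent is still locked after a failed mkdir. */
static int model_lock_left_after_failed_mkdir(int use_mkdir_variant)
{
	struct model_dir dir = { 0 };
	struct model_dentry child = { 0 };
	struct model_dentry *de = model_start_dirop(&dir, &child, 0);

	assert(!model_is_err(de));
	de = model_failing_mkdir(de);
	if (use_mkdir_variant)
		model_end_dirop_mkdir(de, &dir);
	else
		model_end_dirop(de);	/* wrong partner: does nothing for an error */
	return dir.locked;
}
```

In this model, calling model_lock_left_after_failed_mkdir(0) leaves the
counter at 1 (the lock leaks), while passing 1 returns it to 0.
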
As well as adding start_dirop() and end_dirop()/end_dirop_mkdir()
this patch uses them in:
- simple_start_creating (which requires sharing lookup_noperm_common()
with libfs.c)
- start_removing_path / start_removing_user_path_at
- filename_create / end_creating_path()
- do_rmdir(), do_unlinkat()
The change in do_unlinkat() opens the opportunity for some cleanup.
As we don't need to unlock on lookup failure, "inode" can be local
to the non-error path. Also the "slashes" handler is moved
in-line with an "unlikely" annotation on the branch.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/internal.h | 3 ++
fs/libfs.c | 36 ++++++-------
fs/namei.c | 126 +++++++++++++++++++++++++++++++--------------
include/linux/fs.h | 3 ++
4 files changed, 110 insertions(+), 58 deletions(-)
diff --git a/fs/internal.h b/fs/internal.h
index a33d18ee5b74..d11fe787bbc1 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -67,6 +67,9 @@ int vfs_tmpfile(struct mnt_idmap *idmap,
const struct path *parentpath,
struct file *file, umode_t mode);
struct dentry *d_hash_and_lookup(struct dentry *, struct qstr *);
+struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags);
+int lookup_noperm_common(struct qstr *qname, struct dentry *base);
/*
* namespace.c
diff --git a/fs/libfs.c b/fs/libfs.c
index ce8c496a6940..fc979becd536 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2289,27 +2289,25 @@ void stashed_dentry_prune(struct dentry *dentry)
cmpxchg(stashed, dentry, NULL);
}
-/* parent must be held exclusive */
+/**
+ * simple_start_creating - prepare to create a given name
+ * @parent: directory in which to prepare to create the name
+ * @name: the name to be created
+ *
+ * The required lock is taken and a lookup is performed prior to creating an
+ * object in a directory. No permission checking is performed.
+ *
+ * Returns: a negative dentry on which vfs_create() or similar may
+ * be attempted, or an error.
+ */
struct dentry *simple_start_creating(struct dentry *parent, const char *name)
{
- struct dentry *dentry;
- struct inode *dir = d_inode(parent);
+ struct qstr qname = QSTR(name);
+ int err;
- inode_lock(dir);
- if (unlikely(IS_DEADDIR(dir))) {
- inode_unlock(dir);
- return ERR_PTR(-ENOENT);
- }
- dentry = lookup_noperm(&QSTR(name), parent);
- if (IS_ERR(dentry)) {
- inode_unlock(dir);
- return dentry;
- }
- if (dentry->d_inode) {
- dput(dentry);
- inode_unlock(dir);
- return ERR_PTR(-EEXIST);
- }
- return dentry;
+ err = lookup_noperm_common(&qname, parent);
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, &qname, LOOKUP_CREATE | LOOKUP_EXCL);
}
EXPORT_SYMBOL(simple_start_creating);
diff --git a/fs/namei.c b/fs/namei.c
index 507ca0d7878d..81cbaabbbe21 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2765,6 +2765,69 @@ static int filename_parentat(int dfd, struct filename *name,
return __filename_parentat(dfd, name, flags, parent, last, type, NULL);
}
+/**
+ * start_dirop - begin a create or remove dirop, performing locking and lookup
+ * @parent: the dentry of the parent in which the operation will occur
+ * @name: a qstr holding the name within that parent
+ * @lookup_flags: intent and other lookup flags.
+ *
+ * The lookup is performed and necessarly locks are taken so that, on success,
+ * the returned dentry can be operated on safely.
+ * The qstr must already have the hash value calculated.
+ *
+ * Returns: a locked dentry, or an error.
+ *
+ */
+struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags)
+{
+ struct dentry *dentry;
+ struct inode *dir = d_inode(parent);
+
+ inode_lock_nested(dir, I_MUTEX_PARENT);
+ dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
+ if (IS_ERR(dentry))
+ inode_unlock(dir);
+ return dentry;
+}
+
+/**
+ * end_dirop - signal completion of a dirop
+ * @de: the dentry which was returned by start_dirop or similar.
+ *
+ * If de is an error, nothing happens. Otherwise any lock taken to
+ * protect the dentry is dropped and the dentry itself is released (dput()).
+ */
+void end_dirop(struct dentry *de)
+{
+ if (!IS_ERR(de)) {
+ inode_unlock(de->d_parent->d_inode);
+ dput(de);
+ }
+}
+EXPORT_SYMBOL(end_dirop);
+
+/**
+ * end_dirop_mkdir - signal completion of a dirop which could have been vfs_mkdir
+ * @de: the dentry which was returned by start_dirop or similar.
+ * @parent: the parent in which the mkdir happened.
+ *
+ * Because vfs_mkdir() dput()s the dentry on failure, end_dirop() cannot
+ * be used with it. Instead this function must be used, and it must not
+ * be called if the original lookup failed.
+ *
+ * If de is an error the parent is unlocked, else this behaves the same as
+ * end_dirop().
+ */
+void end_dirop_mkdir(struct dentry *de, struct dentry *parent)
+{
+ if (IS_ERR(de))
+ inode_unlock(parent->d_inode);
+ else
+ end_dirop(de);
+}
+EXPORT_SYMBOL(end_dirop_mkdir);
+
/* does lookup, returns the object with parent locked */
static struct dentry *__start_removing_path(int dfd, struct filename *name,
struct path *path)
@@ -2781,10 +2844,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
return ERR_PTR(-EINVAL);
/* don't fail immediately if it's r/o, at least try to report other errors */
error = mnt_want_write(parent_path.mnt);
- inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
- d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
+ d = start_dirop(parent_path.dentry, &last, 0);
if (IS_ERR(d))
- goto unlock;
+ goto drop;
if (error)
goto fail;
path->dentry = no_free_ptr(parent_path.dentry);
@@ -2792,10 +2854,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
return d;
fail:
- dput(d);
+ end_dirop(d);
d = ERR_PTR(error);
-unlock:
- inode_unlock(parent_path.dentry->d_inode);
+drop:
if (!error)
mnt_drop_write(parent_path.mnt);
return d;
@@ -2910,7 +2971,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
}
EXPORT_SYMBOL(vfs_path_lookup);
-static int lookup_noperm_common(struct qstr *qname, struct dentry *base)
+int lookup_noperm_common(struct qstr *qname, struct dentry *base)
{
const char *name = qname->name;
u32 len = qname->len;
@@ -4223,21 +4284,18 @@ static struct dentry *filename_create(int dfd, struct filename *name,
*/
if (last.name[last.len] && !want_dir)
create_flags &= ~LOOKUP_CREATE;
- inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
- dentry = lookup_one_qstr_excl(&last, path->dentry,
- reval_flag | create_flags);
+ dentry = start_dirop(path->dentry, &last, reval_flag | create_flags);
if (IS_ERR(dentry))
- goto unlock;
+ goto out_drop_write;
if (unlikely(error))
goto fail;
return dentry;
fail:
- dput(dentry);
+ end_dirop(dentry);
dentry = ERR_PTR(error);
-unlock:
- inode_unlock(path->dentry->d_inode);
+out_drop_write:
if (!error)
mnt_drop_write(path->mnt);
out:
@@ -4258,9 +4316,7 @@ EXPORT_SYMBOL(start_creating_path);
void end_creating_path(struct path *path, struct dentry *dentry)
{
- if (!IS_ERR(dentry))
- dput(dentry);
- inode_unlock(path->dentry->d_inode);
+ end_dirop_mkdir(dentry, path->dentry);
mnt_drop_write(path->mnt);
path_put(path);
}
@@ -4592,8 +4648,7 @@ int do_rmdir(int dfd, struct filename *name)
if (error)
goto exit2;
- inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
- dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
+ dentry = start_dirop(path.dentry, &last, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit3;
@@ -4602,9 +4657,8 @@ int do_rmdir(int dfd, struct filename *name)
goto exit4;
error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
exit4:
- dput(dentry);
+ end_dirop(dentry);
exit3:
- inode_unlock(path.dentry->d_inode);
mnt_drop_write(path.mnt);
exit2:
path_put(&path);
@@ -4705,7 +4759,6 @@ int do_unlinkat(int dfd, struct filename *name)
struct path path;
struct qstr last;
int type;
- struct inode *inode = NULL;
struct inode *delegated_inode = NULL;
unsigned int lookup_flags = 0;
retry:
@@ -4721,14 +4774,19 @@ int do_unlinkat(int dfd, struct filename *name)
if (error)
goto exit2;
retry_deleg:
- inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
- dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
+ dentry = start_dirop(path.dentry, &last, lookup_flags);
error = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
+ struct inode *inode = NULL;
/* Why not before? Because we want correct error value */
- if (last.name[last.len])
- goto slashes;
+ if (unlikely(last.name[last.len])) {
+ if (d_is_dir(dentry))
+ error = -EISDIR;
+ else
+ error = -ENOTDIR;
+ goto exit3;
+ }
inode = dentry->d_inode;
ihold(inode);
error = security_path_unlink(&path, dentry);
@@ -4737,12 +4795,10 @@ int do_unlinkat(int dfd, struct filename *name)
error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
dentry, &delegated_inode);
exit3:
- dput(dentry);
+ end_dirop(dentry);
+ if (inode)
+ iput(inode); /* truncate the inode here */
}
- inode_unlock(path.dentry->d_inode);
- if (inode)
- iput(inode); /* truncate the inode here */
- inode = NULL;
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
if (!error)
@@ -4753,19 +4809,11 @@ int do_unlinkat(int dfd, struct filename *name)
path_put(&path);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
- inode = NULL;
goto retry;
}
exit1:
putname(name);
return error;
-
-slashes:
- if (d_is_dir(dentry))
- error = -EISDIR;
- else
- error = -ENOTDIR;
- goto exit3;
}
SYSCALL_DEFINE3(unlinkat, int, dfd, const char __user *, pathname, int, flag)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9e9d7c757efe..738554664d54 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3599,6 +3599,9 @@ extern void iterate_supers_type(struct file_system_type *,
void filesystems_freeze(void);
void filesystems_thaw(void);
+void end_dirop(struct dentry *de);
+void end_dirop_mkdir(struct dentry *de, struct dentry *parent);
+
extern int dcache_dir_open(struct inode *, struct file *);
extern int dcache_dir_close(struct inode *, struct file *);
extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
--
2.50.0.107.gf914562f5916.dirty
* Re: [PATCH 02/11] VFS: introduce start_dirop() and end_dirop()
2025-09-26 2:49 ` [PATCH 02/11] VFS: introduce start_dirop() and end_dirop() NeilBrown
@ 2025-09-26 16:41 ` Amir Goldstein
2025-09-27 11:32 ` NeilBrown
0 siblings, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-26 16:41 UTC
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> The fact that directory operations (create,remove,rename) are protected
> by a lock on the parent is known widely throughout the kernel.
> In order to change this - to instead lock the target dentry - it is
> best to centralise this knowledge so it can be changed in one place.
>
> This patch introduces start_dirop() which is local to VFS code.
> It performs the required locking for create and remove. Rename
> will be handled separately.
>
> Various functions with names like start_creating() or start_removing_path(),
> some of which already exist, will export this functionality beyond the VFS.
>
> end_dirop() is the partner of start_dirop(). It drops the lock and
> releases the reference on the dentry.
> It *is* exported so that various end_creating etc functions can be inline.
>
--- from here
> As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as
> that won't unlock when the dentry IS_ERR(). For those cases we have
> end_dirop_mkdir().
>
> end_dirop() can always be called on the result of start_dirop(), but not
> after vfs_mkdir().
> end_dirop_mkdir() can only be called on the result of start_dirop() if
> that was not an error, and can also be called on the result of
> vfs_mkdir().
>
> When we change vfs_mkdir() to drop the lock when it drops the dentry,
> end_dirop_mkdir() can be discarded.
---until here
I am really struggling to swallow end_dirop_mkdir() as a temp helper.
It has such fluid and weird semantics that nobody has a chance to
remember or guess them, it is scheduled to be removed, and it is
only used in two helpers, end_creating_path() and end_creating()
(right?), both of which have perfectly understandable and normal
semantics.
So how about we stop pretending that end_dirop_mkdir() is a sane
abstraction and open code it twice inside the helpers where
the code makes sense and is well documented?
Am I missing some subtlety? or missing the bigger picture?
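For comparison, the open-coded form suggested here can be sketched in
userspace. This is only an illustration under stated assumptions: the
model_* names are hypothetical, a NULL pointer stands in for an IS_ERR()
dentry, and simple counters stand in for the parent lock, the write access
and the path reference. It mirrors the body that end_creating_path() had
before this patch.

```c
/* Userspace illustration only; NOT kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct model_dentry { int refs; };
struct model_path {
	int parent_locked;	/* stands in for the parent's inode lock */
	int write_held;		/* stands in for mnt_want_write() */
	int refs;		/* stands in for the path reference */
};

/*
 * Open-coded equivalent of calling end_dirop_mkdir() from end_creating_path():
 * the dentry is released only if it is valid, but the parent is unlocked in
 * every case, because a failed mkdir has already dropped the dentry while
 * leaving the parent locked.
 */
static void model_end_creating_path(struct model_path *path,
				    struct model_dentry *dentry)
{
	if (dentry != NULL)		/* models !IS_ERR(dentry) */
		dentry->refs--;		/* models dput() */
	path->parent_locked = 0;	/* models inode_unlock() on the parent */
	path->write_held = 0;		/* models mnt_drop_write() */
	path->refs--;			/* models path_put() */
}
```

The point of the sketch is that the whole rule fits in one comment next to
the only code that needs it.
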
>
> As well as adding start_dirop() and end_dirop()/end_dirop_mkdir()
> this patch uses them in:
> - simple_start_creating (which requires sharing lookup_noperm_common()
> with libfs.c)
> - start_removing_path / start_removing_user_path_at
> - filename_create / end_creating_path()
> - do_rmdir(), do_unlinkat()
>
> The change in do_unlinkat() opens the opportunity for some cleanup.
> As we don't need to unlock on lookup failure, "inode" can be local
> to the non-error path. Also the "slashes" handler is moved
> in-line with an "unlikely" annotation on the branch.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/internal.h | 3 ++
> fs/libfs.c | 36 ++++++-------
> fs/namei.c | 126 +++++++++++++++++++++++++++++++--------------
> include/linux/fs.h | 3 ++
> 4 files changed, 110 insertions(+), 58 deletions(-)
>
> diff --git a/fs/internal.h b/fs/internal.h
> index a33d18ee5b74..d11fe787bbc1 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -67,6 +67,9 @@ int vfs_tmpfile(struct mnt_idmap *idmap,
> const struct path *parentpath,
> struct file *file, umode_t mode);
> struct dentry *d_hash_and_lookup(struct dentry *, struct qstr *);
> +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags);
> +int lookup_noperm_common(struct qstr *qname, struct dentry *base);
>
> /*
> * namespace.c
> diff --git a/fs/libfs.c b/fs/libfs.c
> index ce8c496a6940..fc979becd536 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -2289,27 +2289,25 @@ void stashed_dentry_prune(struct dentry *dentry)
> cmpxchg(stashed, dentry, NULL);
> }
>
> -/* parent must be held exclusive */
> +/**
> + * simple_start_creating - prepare to create a given name
> + * @parent: directory in which to prepare to create the name
> + * @name: the name to be created
> + *
> + * The required lock is taken and a lookup is performed prior to creating an
> + * object in a directory. No permission checking is performed.
> + *
> + * Returns: a negative dentry on which vfs_create() or similar may
> + * be attempted, or an error.
> + */
> struct dentry *simple_start_creating(struct dentry *parent, const char *name)
> {
> - struct dentry *dentry;
> - struct inode *dir = d_inode(parent);
> + struct qstr qname = QSTR(name);
> + int err;
>
> - inode_lock(dir);
> - if (unlikely(IS_DEADDIR(dir))) {
> - inode_unlock(dir);
> - return ERR_PTR(-ENOENT);
> - }
> - dentry = lookup_noperm(&QSTR(name), parent);
> - if (IS_ERR(dentry)) {
> - inode_unlock(dir);
> - return dentry;
> - }
> - if (dentry->d_inode) {
> - dput(dentry);
> - inode_unlock(dir);
> - return ERR_PTR(-EEXIST);
> - }
> - return dentry;
> + err = lookup_noperm_common(&qname, parent);
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, &qname, LOOKUP_CREATE | LOOKUP_EXCL);
> }
> EXPORT_SYMBOL(simple_start_creating);
> diff --git a/fs/namei.c b/fs/namei.c
> index 507ca0d7878d..81cbaabbbe21 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2765,6 +2765,69 @@ static int filename_parentat(int dfd, struct filename *name,
> return __filename_parentat(dfd, name, flags, parent, last, type, NULL);
> }
>
> +/**
> + * start_dirop - begin a create or remove dirop, performing locking and lookup
> + * @parent: the dentry of the parent in which the operation will occur
> + * @name: a qstr holding the name within that parent
> + * @lookup_flags: intent and other lookup flags.
> + *
> + * The lookup is performed and necessarly locks are taken so that, on success,
typo: necessarly -> necessary
> + * the returned dentry can be operated on safely.
> + * The qstr must already have the hash value calculated.
> + *
> + * Returns: a locked dentry, or an error.
> + *
> + */
> +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags)
> +{
> + struct dentry *dentry;
> + struct inode *dir = d_inode(parent);
> +
> + inode_lock_nested(dir, I_MUTEX_PARENT);
> + dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
> + if (IS_ERR(dentry))
> + inode_unlock(dir);
> + return dentry;
> +}
> +
> +/**
> + * end_dirop - signal completion of a dirop
> + * @de: the dentry which was returned by start_dirop or similar.
> + *
> + * If de is an error, nothing happens. Otherwise any lock taken to
> + * protect the dentry is dropped and the dentry itself is released (dput()).
> + */
> +void end_dirop(struct dentry *de)
> +{
> + if (!IS_ERR(de)) {
> + inode_unlock(de->d_parent->d_inode);
> + dput(de);
> + }
> +}
> +EXPORT_SYMBOL(end_dirop);
> +
> +/**
> + * end_dirop_mkdir - signal completion of a dirop which could have been vfs_mkdir
> + * @de: the dentry which was returned by start_dirop or similar.
> + * @parent: the parent in which the mkdir happened.
> + *
> + * Because vfs_mkdir() dput()s the dentry on failure, end_dirop() cannot
> + * be used with it. Instead this function must be used, and it must not
> + * be called if the original lookup failed.
> + *
> + * If de is an error the parent is unlocked, else this behaves the same as
> + * end_dirop().
> + */
> +void end_dirop_mkdir(struct dentry *de, struct dentry *parent)
> +{
> + if (IS_ERR(de))
> + inode_unlock(parent->d_inode);
> + else
> + end_dirop(de);
> +}
> +EXPORT_SYMBOL(end_dirop_mkdir);
> +
> /* does lookup, returns the object with parent locked */
> static struct dentry *__start_removing_path(int dfd, struct filename *name,
> struct path *path)
> @@ -2781,10 +2844,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> return ERR_PTR(-EINVAL);
> /* don't fail immediately if it's r/o, at least try to report other errors */
> error = mnt_want_write(parent_path.mnt);
> - inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
> - d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
> + d = start_dirop(parent_path.dentry, &last, 0);
> if (IS_ERR(d))
> - goto unlock;
> + goto drop;
> if (error)
> goto fail;
> path->dentry = no_free_ptr(parent_path.dentry);
> @@ -2792,10 +2854,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> return d;
>
> fail:
> - dput(d);
> + end_dirop(d);
> d = ERR_PTR(error);
> -unlock:
> - inode_unlock(parent_path.dentry->d_inode);
> +drop:
> if (!error)
> mnt_drop_write(parent_path.mnt);
> return d;
> @@ -2910,7 +2971,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
> }
> EXPORT_SYMBOL(vfs_path_lookup);
>
> -static int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> +int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> {
> const char *name = qname->name;
> u32 len = qname->len;
> @@ -4223,21 +4284,18 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> */
> if (last.name[last.len] && !want_dir)
> create_flags &= ~LOOKUP_CREATE;
> - inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path->dentry,
> - reval_flag | create_flags);
> + dentry = start_dirop(path->dentry, &last, reval_flag | create_flags);
> if (IS_ERR(dentry))
> - goto unlock;
> + goto out_drop_write;
>
> if (unlikely(error))
> goto fail;
>
> return dentry;
> fail:
> - dput(dentry);
> + end_dirop(dentry);
> dentry = ERR_PTR(error);
> -unlock:
> - inode_unlock(path->dentry->d_inode);
> +out_drop_write:
> if (!error)
> mnt_drop_write(path->mnt);
> out:
> @@ -4258,9 +4316,7 @@ EXPORT_SYMBOL(start_creating_path);
>
> void end_creating_path(struct path *path, struct dentry *dentry)
> {
> - if (!IS_ERR(dentry))
> - dput(dentry);
> - inode_unlock(path->dentry->d_inode);
> + end_dirop_mkdir(dentry, path->dentry);
I think it is better to open-code end_dirop_mkdir()
here and document the semantics of end_creating_path().
Yes, when you remove the parent lock there will be one
more place to remove a parent lock, but that place is in fs/namei.c,
not in filesystem code, so I think that's reasonable.
> mnt_drop_write(path->mnt);
> path_put(path);
> }
> @@ -4592,8 +4648,7 @@ int do_rmdir(int dfd, struct filename *name)
> if (error)
> goto exit2;
>
> - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> + dentry = start_dirop(path.dentry, &last, lookup_flags);
> error = PTR_ERR(dentry);
> if (IS_ERR(dentry))
> goto exit3;
> @@ -4602,9 +4657,8 @@ int do_rmdir(int dfd, struct filename *name)
> goto exit4;
> error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> exit4:
> - dput(dentry);
> + end_dirop(dentry);
> exit3:
> - inode_unlock(path.dentry->d_inode);
> mnt_drop_write(path.mnt);
> exit2:
> path_put(&path);
> @@ -4705,7 +4759,6 @@ int do_unlinkat(int dfd, struct filename *name)
> struct path path;
> struct qstr last;
> int type;
> - struct inode *inode = NULL;
> struct inode *delegated_inode = NULL;
> unsigned int lookup_flags = 0;
> retry:
> @@ -4721,14 +4774,19 @@ int do_unlinkat(int dfd, struct filename *name)
> if (error)
> goto exit2;
> retry_deleg:
> - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> + dentry = start_dirop(path.dentry, &last, lookup_flags);
> error = PTR_ERR(dentry);
Maybe it's just me, but possibly

        if (IS_ERR(dentry))
                goto exit_drop_write;

could make this code look a bit easier to follow, because...
> if (!IS_ERR(dentry)) {
> + struct inode *inode = NULL;
>
> /* Why not before? Because we want correct error value */
> - if (last.name[last.len])
> - goto slashes;
> + if (unlikely(last.name[last.len])) {
> + if (d_is_dir(dentry))
> + error = -EISDIR;
> + else
> + error = -ENOTDIR;
> + goto exit3;
> + }
> inode = dentry->d_inode;
> ihold(inode);
> error = security_path_unlink(&path, dentry);
> @@ -4737,12 +4795,10 @@ int do_unlinkat(int dfd, struct filename *name)
> error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> dentry, &delegated_inode);
> exit3:
> - dput(dentry);
> + end_dirop(dentry);
> + if (inode)
> + iput(inode); /* truncate the inode here */
> }
The exit3 goto label inside the conditional scope looks
unconventional and feels wrong.
If you do not agree, feel free to ignore this comment.
Thanks,
Amir.
* Re: [PATCH 02/11] VFS: introduce start_dirop() and end_dirop()
2025-09-26 16:41 ` Amir Goldstein
@ 2025-09-27 11:32 ` NeilBrown
0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-27 11:32 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sat, 27 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > The fact that directory operations (create,remove,rename) are protected
> > by a lock on the parent is known widely throughout the kernel.
> > In order to change this - to instead lock the target dentry - it is
> > best to centralise this knowledge so it can be changed in one place.
> >
> > This patch introduces start_dirop() which is local to VFS code.
> > It performs the required locking for create and remove. Rename
> > will be handled separately.
> >
> > Various functions with names like start_creating() or start_removing_path(),
> > some of which already exist, will export this functionality beyond the VFS.
> >
> > end_dirop() is the partner of start_dirop(). It drops the lock and
> > releases the reference on the dentry.
> > It *is* exported so that various end_creating etc functions can be inline.
> >
>
> --- from here
> > As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as
> > that won't unlock when the dentry IS_ERR(). For those cases we have
> > end_dirop_mkdir().
> >
> > end_dirop() can always be called on the result of start_dirop(), but not
> > after vfs_mkdir().
> > end_dirop_mkdir() can only be called on the result of start_dirop() if
> > that was not an error, and can also be called on the result of
> > vfs_mkdir().
> >
> > When we change vfs_mkdir() to drop the lock when it drops the dentry,
> > end_dirop_mkdir() can be discarded.
> ---until here
>
> I am really struggling to swallow end_dirop_mkdir() as a temp helper.
> It has such fluid and weird semantics that nobody has a chance to
> remember or guess them, it is scheduled to be removed, and it is
> only used in two helpers end_creating_path() and end_creating()
> (right?) both of which have perfectly understandable and normal
> semantics.
>
> So how about we stop pretending that end_dirop_mkdir() is a sane
> abstraction and open code it twice inside the helpers where
> the code makes sense and is well documented?
>
> Am I missing some subtlety? or missing the bigger picture?
No, I don't think you are missing anything important.
end_dirop_mkdir() was useful scaffolding for me as I was writing the
code, but now that I look back prompted by you I see that it doesn't add
anything to the final product. I'll remove it.
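For readers following the thread, the distinction under discussion can be modelled in userspace. What follows is a minimal sketch, not kernel code: the toy_* names and the "locked"/"refs" counters are hypothetical stand-ins for the parent inode lock and the dentry reference, and ERR_PTR()/IS_ERR() are simplified re-implementations of the kernel helpers. It illustrates why end_dirop() alone leaves the parent locked when it is handed the error pointer returned by a failed vfs_mkdir(), and roughly what open-coding that case in end_creating_path() amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Toy objects: "locked" models the parent inode lock, "refs" the dentry refcount. */
struct toy_dir { int locked; };
struct toy_dentry { struct toy_dir *parent; int refs; };

/* Models end_dirop(): does nothing at all when given an error pointer. */
static void toy_end_dirop(struct toy_dentry *de)
{
	if (!IS_ERR(de)) {
		de->parent->locked--;
		de->refs--;
	}
}

/* Models end_creating_path() with the mkdir-failure case open-coded. */
static void toy_end_creating(struct toy_dentry *de, struct toy_dir *parent)
{
	if (IS_ERR(de))
		parent->locked--;	/* dentry already dropped; parent is still locked */
	else
		toy_end_dirop(de);
}
```

Calling toy_end_dirop() on an error pointer leaves "locked" unchanged, whereas toy_end_creating() releases the parent in both cases, which is the behaviour end_creating_path() needs after vfs_mkdir().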
>
> >
> > As well as adding start_dirop() and end_dirop()/end_dirop_mkdir()
> > this patch uses them in:
> > - simple_start_creating (which requires sharing lookup_noperm_common()
> > with libfs.c)
> > - start_removing_path / start_removing_user_path_at
> > - filename_create / end_creating_path()
> > - do_rmdir(), do_unlinkat()
> >
> > The change in do_unlinkat() opens the opportunity for some cleanup.
> > As we don't need to unlock on lookup failure, "inode" can be local
> > to the non-error path. Also the "slashes" handler is moved
> > in-line with an "unlikely" annotation on the branch.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > fs/internal.h | 3 ++
> > fs/libfs.c | 36 ++++++-------
> > fs/namei.c | 126 +++++++++++++++++++++++++++++++--------------
> > include/linux/fs.h | 3 ++
> > 4 files changed, 110 insertions(+), 58 deletions(-)
> >
> > diff --git a/fs/internal.h b/fs/internal.h
> > index a33d18ee5b74..d11fe787bbc1 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -67,6 +67,9 @@ int vfs_tmpfile(struct mnt_idmap *idmap,
> > const struct path *parentpath,
> > struct file *file, umode_t mode);
> > struct dentry *d_hash_and_lookup(struct dentry *, struct qstr *);
> > +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> > + unsigned int lookup_flags);
> > +int lookup_noperm_common(struct qstr *qname, struct dentry *base);
> >
> > /*
> > * namespace.c
> > diff --git a/fs/libfs.c b/fs/libfs.c
> > index ce8c496a6940..fc979becd536 100644
> > --- a/fs/libfs.c
> > +++ b/fs/libfs.c
> > @@ -2289,27 +2289,25 @@ void stashed_dentry_prune(struct dentry *dentry)
> > cmpxchg(stashed, dentry, NULL);
> > }
> >
> > -/* parent must be held exclusive */
> > +/**
> > + * simple_start_creating - prepare to create a given name
> > + * @parent - directory in which to prepare to create the name
> > + * @name - the name to be created
> > + *
> > + * Required lock is taken and a lookup is performed prior to creating an
> > + * object in a directory. No permission checking is performed.
> > + *
> > + * Returns: a negative dentry on which vfs_create() or similar may
> > + * be attempted, or an error.
> > + */
> > struct dentry *simple_start_creating(struct dentry *parent, const char *name)
> > {
> > - struct dentry *dentry;
> > - struct inode *dir = d_inode(parent);
> > + struct qstr qname = QSTR(name);
> > + int err;
> >
> > - inode_lock(dir);
> > - if (unlikely(IS_DEADDIR(dir))) {
> > - inode_unlock(dir);
> > - return ERR_PTR(-ENOENT);
> > - }
> > - dentry = lookup_noperm(&QSTR(name), parent);
> > - if (IS_ERR(dentry)) {
> > - inode_unlock(dir);
> > - return dentry;
> > - }
> > - if (dentry->d_inode) {
> > - dput(dentry);
> > - inode_unlock(dir);
> > - return ERR_PTR(-EEXIST);
> > - }
> > - return dentry;
> > + err = lookup_noperm_common(&qname, parent);
> > + if (err)
> > + return ERR_PTR(err);
> > + return start_dirop(parent, &qname, LOOKUP_CREATE | LOOKUP_EXCL);
> > }
> > EXPORT_SYMBOL(simple_start_creating);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 507ca0d7878d..81cbaabbbe21 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2765,6 +2765,69 @@ static int filename_parentat(int dfd, struct filename *name,
> > return __filename_parentat(dfd, name, flags, parent, last, type, NULL);
> > }
> >
> > +/**
> > + * start_dirop - begin a create or remove dirop, performing locking and lookup
> > + * @parent - the dentry of the parent in which the operation will occur
> > + * @name - a qstr holding the name within that parent
> > + * @lookup_flags - intent and other lookup flags.
> > + *
> > + * The lookup is performed and necessary locks are taken so that, on success,
>
> typo: necessary
>
> > + * the returned dentry can be operated on safely.
> > + * The qstr must already have the hash value calculated.
> > + *
> > + * Returns: a locked dentry, or an error.
> > + *
> > + */
> > +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> > + unsigned int lookup_flags)
> > +{
> > + struct dentry *dentry;
> > + struct inode *dir = d_inode(parent);
> > +
> > + inode_lock_nested(dir, I_MUTEX_PARENT);
> > + dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
> > + if (IS_ERR(dentry))
> > + inode_unlock(dir);
> > + return dentry;
> > +}
> > +
> > +/**
> > + * end_dirop - signal completion of a dirop
> > + * @de - the dentry which was returned by start_dirop or similar.
> > + *
> > + * If the de is an error, nothing happens. Otherwise any lock taken to
> > + * protect the dentry is dropped and the dentry itself is released (dput()).
> > + */
> > +void end_dirop(struct dentry *de)
> > +{
> > + if (!IS_ERR(de)) {
> > + inode_unlock(de->d_parent->d_inode);
> > + dput(de);
> > + }
> > +}
> > +EXPORT_SYMBOL(end_dirop);
> > +
> > +/**
> > + * end_dirop_mkdir - signal completion of a dirop which could have been vfs_mkdir
> > + * @de - the dentry which was returned by start_dirop or similar.
> > + * @parent - the parent in which the mkdir happened.
> > + *
> > + * Because vfs_mkdir() dput()s the dentry on failure, end_dirop() cannot
> > + * be used with it. Instead this function must be used, and it must not
> > + * be called if the original lookup failed.
> > + *
> > + * If de is an error the parent is unlocked, else this behaves the same as
> > + * end_dirop().
> > + */
> > +void end_dirop_mkdir(struct dentry *de, struct dentry *parent)
> > +{
> > + if (IS_ERR(de))
> > + inode_unlock(parent->d_inode);
> > + else
> > + end_dirop(de);
> > +}
> > +EXPORT_SYMBOL(end_dirop_mkdir);
> > +
> > /* does lookup, returns the object with parent locked */
> > static struct dentry *__start_removing_path(int dfd, struct filename *name,
> > struct path *path)
> > @@ -2781,10 +2844,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> > return ERR_PTR(-EINVAL);
> > /* don't fail immediately if it's r/o, at least try to report other errors */
> > error = mnt_want_write(parent_path.mnt);
> > - inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
> > - d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
> > + d = start_dirop(parent_path.dentry, &last, 0);
> > if (IS_ERR(d))
> > - goto unlock;
> > + goto drop;
> > if (error)
> > goto fail;
> > path->dentry = no_free_ptr(parent_path.dentry);
> > @@ -2792,10 +2854,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> > return d;
> >
> > fail:
> > - dput(d);
> > + end_dirop(d);
> > d = ERR_PTR(error);
> > -unlock:
> > - inode_unlock(parent_path.dentry->d_inode);
> > +drop:
> > if (!error)
> > mnt_drop_write(parent_path.mnt);
> > return d;
> > @@ -2910,7 +2971,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
> > }
> > EXPORT_SYMBOL(vfs_path_lookup);
> >
> > -static int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> > +int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> > {
> > const char *name = qname->name;
> > u32 len = qname->len;
> > @@ -4223,21 +4284,18 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> > */
> > if (last.name[last.len] && !want_dir)
> > create_flags &= ~LOOKUP_CREATE;
> > - inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
> > - dentry = lookup_one_qstr_excl(&last, path->dentry,
> > - reval_flag | create_flags);
> > + dentry = start_dirop(path->dentry, &last, reval_flag | create_flags);
> > if (IS_ERR(dentry))
> > - goto unlock;
> > + goto out_drop_write;
> >
> > if (unlikely(error))
> > goto fail;
> >
> > return dentry;
> > fail:
> > - dput(dentry);
> > + end_dirop(dentry);
> > dentry = ERR_PTR(error);
> > -unlock:
> > - inode_unlock(path->dentry->d_inode);
> > +out_drop_write:
> > if (!error)
> > mnt_drop_write(path->mnt);
> > out:
> > @@ -4258,9 +4316,7 @@ EXPORT_SYMBOL(start_creating_path);
> >
> > void end_creating_path(struct path *path, struct dentry *dentry)
> > {
> > - if (!IS_ERR(dentry))
> > - dput(dentry);
> > - inode_unlock(path->dentry->d_inode);
> > + end_dirop_mkdir(dentry, path->dentry);
>
> I think it is better to open code end_dirop_mkdir()
> here and document semantics of end_creating_path()
> Yes, when you remove the parent lock there will be one
> more place to remove a parent lock, but that place is in fs/namei.c
> not in filesystems code, so I think that's reasonable.
>
> > mnt_drop_write(path->mnt);
> > path_put(path);
> > }
> > @@ -4592,8 +4648,7 @@ int do_rmdir(int dfd, struct filename *name)
> > if (error)
> > goto exit2;
> >
> > - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> > - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> > + dentry = start_dirop(path.dentry, &last, lookup_flags);
> > error = PTR_ERR(dentry);
> > if (IS_ERR(dentry))
> > goto exit3;
> > @@ -4602,9 +4657,8 @@ int do_rmdir(int dfd, struct filename *name)
> > goto exit4;
> > error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> > exit4:
> > - dput(dentry);
> > + end_dirop(dentry);
> > exit3:
> > - inode_unlock(path.dentry->d_inode);
> > mnt_drop_write(path.mnt);
> > exit2:
> > path_put(&path);
> > @@ -4705,7 +4759,6 @@ int do_unlinkat(int dfd, struct filename *name)
> > struct path path;
> > struct qstr last;
> > int type;
> > - struct inode *inode = NULL;
> > struct inode *delegated_inode = NULL;
> > unsigned int lookup_flags = 0;
> > retry:
> > @@ -4721,14 +4774,19 @@ int do_unlinkat(int dfd, struct filename *name)
> > if (error)
> > goto exit2;
> > retry_deleg:
> > - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> > - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> > + dentry = start_dirop(path.dentry, &last, lookup_flags);
> > error = PTR_ERR(dentry);
>
> Maybe it's just me, but possibly
>
> if (IS_ERR(dentry))
> goto exit_drop_write;
>
> Could make this code look a bit easier to follow, because...
Maybe... but that much refactoring really needs to be in a separate
patch. I was a bit uncomfortable with how much I did already.
Maybe I'll move it all to a follow-up patch.
>
> > if (!IS_ERR(dentry)) {
> > + struct inode *inode = NULL;
> >
> > /* Why not before? Because we want correct error value */
> > - if (last.name[last.len])
> > - goto slashes;
> > + if (unlikely(last.name[last.len])) {
> > + if (d_is_dir(dentry))
> > + error = -EISDIR;
> > + else
> > + error = -ENOTDIR;
> > + goto exit3;
> > + }
> > inode = dentry->d_inode;
> > ihold(inode);
> > error = security_path_unlink(&path, dentry);
> > @@ -4737,12 +4795,10 @@ int do_unlinkat(int dfd, struct filename *name)
> > error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> > dentry, &delegated_inode);
> > exit3:
> > - dput(dentry);
> > + end_dirop(dentry);
> > + if (inode)
> > + iput(inode); /* truncate the inode here */
> > }
>
> The exit3 goto label inside the conditional scope looks
> unconventional and feels wrong.
>
> If you do not agree, feel free to ignore this comment.
Thanks a lot,
NeilBrown
* [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
2025-09-26 2:49 ` [PATCH 01/11] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
2025-09-26 2:49 ` [PATCH 02/11] VFS: introduce start_dirop() and end_dirop() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-29 12:37 ` Jeff Layton
2025-09-30 8:54 ` Amir Goldstein
2025-09-26 2:49 ` [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() NeilBrown
` (8 subsequent siblings)
11 siblings, 2 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_creating() is similar to simple_start_creating() but is not so
simple.
It takes a qstr for the name, includes permission checking, and does NOT
report an error if the name already exists, returning a positive dentry
instead.
This is currently used by nfsd, cachefiles, and overlayfs.
end_creating() is called after the dentry has been used.
end_creating() drops the reference to the dentry as it is generally no
longer needed. This is exactly end_dirop_mkdir(),
but using that everywhere looks a bit odd...
These calls help encapsulate locking rules so that directory locking can
be changed.
Occasionally this change means that the parent lock is held for a
shorter period of time, for example in cachefiles_commit_tmpfile().
As this function now unlocks after an unlink and before the following
lookup, it is possible that the lookup could again find a positive
dentry, so a while loop is introduced there.
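The need for the loop can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every toy_* name is hypothetical, the "racing_creates" field is an invented device that simulates another task re-creating the name whenever the parent is unlocked, and all error handling is omitted. It is not the cachefiles code itself.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dir {
	int locked;		/* models the parent inode lock */
	bool name_exists;	/* whether the name currently has an inode */
	int racing_creates;	/* times a racer re-creates the name while unlocked */
};

struct toy_dentry { struct toy_dir *parent; bool positive; };

/* Models start_creating(): lock the parent, look up; an existing name is not an error. */
static struct toy_dentry toy_start_creating(struct toy_dir *dir)
{
	dir->locked++;
	return (struct toy_dentry){ .parent = dir, .positive = dir->name_exists };
}

/* Models end_creating(): once the lock is dropped a racer may re-create the name. */
static void toy_end_creating(struct toy_dentry *de)
{
	struct toy_dir *dir = de->parent;

	dir->locked--;
	if (dir->racing_creates > 0) {
		dir->racing_creates--;
		dir->name_exists = true;
	}
}

static void toy_unlink(struct toy_dentry *de)
{
	de->parent->name_exists = false;
	de->positive = false;
}

/* Returns how many unlinks were needed before a negative dentry was held under the lock. */
static int toy_commit(struct toy_dir *dir)
{
	int unlinks = 0;
	struct toy_dentry de = toy_start_creating(dir);

	while (de.positive) {		/* "if" would be wrong: the name can come back */
		toy_unlink(&de);
		unlinks++;
		toy_end_creating(&de);	/* lock dropped here ... */
		de = toy_start_creating(dir);	/* ... so the result must be re-checked */
	}
	dir->name_exists = true;	/* stands in for linking the tmpfile into place */
	toy_end_creating(&de);
	return unlinks;
}
```

With a single "if" in place of the "while", the model would proceed to the link step holding a positive dentry as soon as one racer intervened.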
In overlayfs the ovl_lookup_temp() function has ovl_tempname()
split out to be used in ovl_start_creating_temp(). The other use
of ovl_lookup_temp() is preparing for a rename. When rename handling
is updated, ovl_lookup_temp() will be removed.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/namei.c | 37 ++++++++--------
fs/namei.c | 27 ++++++++++++
fs/nfsd/nfs3proc.c | 14 +++---
fs/nfsd/nfs4proc.c | 14 +++---
fs/nfsd/nfs4recover.c | 16 +++----
fs/nfsd/nfsproc.c | 11 +++--
fs/nfsd/vfs.c | 42 +++++++-----------
fs/overlayfs/copy_up.c | 19 ++++----
fs/overlayfs/dir.c | 94 ++++++++++++++++++++++++----------------
fs/overlayfs/overlayfs.h | 8 ++++
fs/overlayfs/super.c | 32 +++++++-------
include/linux/namei.h | 18 ++++++++
12 files changed, 187 insertions(+), 145 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d1edb2ac3837..965b22b2f58d 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
_enter(",,%s", dirname);
/* search the current directory for the element name */
- inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
retry:
ret = cachefiles_inject_read_error();
if (ret == 0)
- subdir = lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir);
+ subdir = start_creating(&nop_mnt_idmap, dir, &QSTR(dirname));
else
subdir = ERR_PTR(ret);
trace_cachefiles_lookup(NULL, dir, subdir);
@@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
trace_cachefiles_mkdir(dir, subdir);
if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
- dput(subdir);
+ end_creating(subdir, dir);
goto retry;
}
ASSERT(d_backing_inode(subdir));
@@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
/* Tell rmdir() it's not allowed to delete the subdir */
inode_lock(d_inode(subdir));
- inode_unlock(d_inode(dir));
+ dget(subdir);
+ end_creating(subdir, dir);
if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
@@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
return ERR_PTR(-EBUSY);
mkdir_error:
- inode_unlock(d_inode(dir));
- if (!IS_ERR(subdir))
- dput(subdir);
+ end_creating(subdir, dir);
pr_err("mkdir %s failed with error %d\n", dirname, ret);
return ERR_PTR(ret);
lookup_error:
- inode_unlock(d_inode(dir));
ret = PTR_ERR(subdir);
pr_err("Lookup %s failed with error %d\n", dirname, ret);
return ERR_PTR(ret);
@@ -679,36 +676,37 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
_enter(",%pD", object->file);
- inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
ret = cachefiles_inject_read_error();
if (ret == 0)
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
+ dentry = start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name));
else
dentry = ERR_PTR(ret);
if (IS_ERR(dentry)) {
trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
cachefiles_trace_lookup_error);
_debug("lookup fail %ld", PTR_ERR(dentry));
- goto out_unlock;
+ goto out;
}
- if (!d_is_negative(dentry)) {
+ while (!d_is_negative(dentry)) {
ret = cachefiles_unlink(volume->cache, object, fan, dentry,
FSCACHE_OBJECT_IS_STALE);
if (ret < 0)
- goto out_dput;
+ goto out_end;
+
+ end_creating(dentry, fan);
- dput(dentry);
ret = cachefiles_inject_read_error();
if (ret == 0)
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
+ dentry = start_creating(&nop_mnt_idmap, fan,
+ &QSTR(object->d_name));
else
dentry = ERR_PTR(ret);
if (IS_ERR(dentry)) {
trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
cachefiles_trace_lookup_error);
_debug("lookup fail %ld", PTR_ERR(dentry));
- goto out_unlock;
+ goto out;
}
}
@@ -729,10 +727,9 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
success = true;
}
-out_dput:
- dput(dentry);
-out_unlock:
- inode_unlock(d_inode(fan));
+out_end:
+ end_creating(dentry, fan);
+out:
_leave(" = %u", success);
return success;
}
diff --git a/fs/namei.c b/fs/namei.c
index 81cbaabbbe21..064cb44a3a46 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3242,6 +3242,33 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
}
EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
+/**
+ * start_creating - prepare to create a given name with permission checking
+ * @idmap - idmap of the mount
+ * @parent - directory in which to prepare to create the name
+ * @name - the name to be created
+ *
+ * Locks are taken and a lookup is performed prior to creating
+ * an object in a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name already exists, a positive dentry is returned, so
+ * behaviour is similar to O_CREAT without O_EXCL, which doesn't fail
+ * with -EEXIST.
+ *
+ * Returns: a negative or positive dentry, or an error.
+ */
+struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, LOOKUP_CREATE);
+}
+EXPORT_SYMBOL(start_creating);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index b6d03e1ef5f7..e2aac0def2cb 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (host_err)
return nfserrno(host_err);
- inode_lock_nested(inode, I_MUTEX_PARENT);
-
- child = lookup_one(&nop_mnt_idmap,
- &QSTR_LEN(argp->name, argp->len),
- parent);
+ child = start_creating(&nop_mnt_idmap, parent,
+ &QSTR_LEN(argp->name, argp->len));
if (IS_ERR(child)) {
status = nfserrno(PTR_ERR(child));
- goto out;
+ goto out_write;
}
if (d_really_is_negative(child)) {
@@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
out:
- inode_unlock(inode);
- if (child && !IS_ERR(child))
- dput(child);
+ end_creating(child, parent);
+out_write:
fh_drop_write(fhp);
return status;
}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 71b428efcbb5..35d48221072f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -264,14 +264,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (is_create_with_attrs(open))
nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
- inode_lock_nested(inode, I_MUTEX_PARENT);
-
- child = lookup_one(&nop_mnt_idmap,
- &QSTR_LEN(open->op_fname, open->op_fnamelen),
- parent);
+ child = start_creating(&nop_mnt_idmap, parent,
+ &QSTR_LEN(open->op_fname, open->op_fnamelen));
if (IS_ERR(child)) {
status = nfserrno(PTR_ERR(child));
- goto out;
+ goto out_write;
}
if (d_really_is_negative(child)) {
@@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (attrs.na_aclerr)
open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
out:
- inode_unlock(inode);
+ end_creating(child, parent);
nfsd_attrs_free(&attrs);
- if (child && !IS_ERR(child))
- dput(child);
+out_write:
fh_drop_write(fhp);
return status;
}
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 2231192ec33f..93b2a3e764db 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -216,13 +216,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
goto out_creds;
dir = nn->rec_file->f_path.dentry;
- /* lock the parent */
- inode_lock(d_inode(dir));
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
+ dentry = start_creating(&nop_mnt_idmap, dir, &QSTR(dname));
if (IS_ERR(dentry)) {
status = PTR_ERR(dentry);
- goto out_unlock;
+ goto out;
}
if (d_really_is_positive(dentry))
/*
@@ -233,15 +231,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
* In the 4.0 case, we should never get here; but we may
* as well be forgiving and just succeed silently.
*/
- goto out_put;
+ goto out_end;
dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
if (IS_ERR(dentry))
status = PTR_ERR(dentry);
-out_put:
- if (!status)
- dput(dentry);
-out_unlock:
- inode_unlock(d_inode(dir));
+out_end:
+ end_creating(dentry, dir);
+out:
if (status == 0) {
if (nn->in_grace)
__nfsd4_create_reclaim_record_grace(clp, dname,
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 8f71f5748c75..ee1b16e921fd 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp)
goto done;
}
- inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
- dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len),
- dirfhp->fh_dentry);
+ dchild = start_creating(&nop_mnt_idmap, dirfhp->fh_dentry,
+ &QSTR_LEN(argp->name, argp->len));
if (IS_ERR(dchild)) {
resp->status = nfserrno(PTR_ERR(dchild));
- goto out_unlock;
+ goto out_write;
}
fh_init(newfhp, NFS_FHSIZE);
resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
if (!resp->status && d_really_is_negative(dchild))
resp->status = nfserr_noent;
- dput(dchild);
if (resp->status) {
if (resp->status != nfserr_noent)
goto out_unlock;
@@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
}
out_unlock:
- inode_unlock(dirfhp->fh_dentry->d_inode);
+ end_creating(dchild, dirfhp->fh_dentry);
+out_write:
fh_drop_write(dirfhp);
done:
fh_put(dirfhp);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index aa4a95713a48..90c830c59c60 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1605,19 +1605,16 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (host_err)
return nfserrno(host_err);
- inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
- dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
+ dchild = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
host_err = PTR_ERR(dchild);
- if (IS_ERR(dchild)) {
- err = nfserrno(host_err);
- goto out_unlock;
- }
+ if (IS_ERR(dchild))
+ return nfserrno(host_err);
+
err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
/*
* We unconditionally drop our ref to dchild as fh_compose will have
* already grabbed its own ref for it.
*/
- dput(dchild);
if (err)
goto out_unlock;
err = fh_fill_pre_attrs(fhp);
@@ -1626,7 +1623,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
fh_fill_post_attrs(fhp);
out_unlock:
- inode_unlock(dentry->d_inode);
+ end_creating(dchild, dentry);
return err;
}
@@ -1712,11 +1709,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
dentry = fhp->fh_dentry;
- inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
- dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
+ dnew = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
if (IS_ERR(dnew)) {
err = nfserrno(PTR_ERR(dnew));
- inode_unlock(dentry->d_inode);
goto out_drop_write;
}
err = fh_fill_pre_attrs(fhp);
@@ -1729,11 +1724,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
fh_fill_post_attrs(fhp);
out_unlock:
- inode_unlock(dentry->d_inode);
+ end_creating(dnew, dentry);
if (!err)
err = nfserrno(commit_metadata(fhp));
- dput(dnew);
- if (err==0) err = cerr;
+ if (!err)
+ err = cerr;
out_drop_write:
fh_drop_write(fhp);
out:
@@ -1788,32 +1783,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
ddir = ffhp->fh_dentry;
dirp = d_inode(ddir);
- inode_lock_nested(dirp, I_MUTEX_PARENT);
+ dnew = start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len));
- dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir);
if (IS_ERR(dnew)) {
host_err = PTR_ERR(dnew);
- goto out_unlock;
+ goto out_drop_write;
}
dold = tfhp->fh_dentry;
err = nfserr_noent;
if (d_really_is_negative(dold))
- goto out_dput;
+ goto out_unlock;
err = fh_fill_pre_attrs(ffhp);
if (err != nfs_ok)
- goto out_dput;
+ goto out_unlock;
host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
fh_fill_post_attrs(ffhp);
- inode_unlock(dirp);
+out_unlock:
+ end_creating(dnew, ddir);
if (!host_err) {
host_err = commit_metadata(ffhp);
if (!host_err)
host_err = commit_metadata(tfhp);
}
- dput(dnew);
out_drop_write:
fh_drop_write(tfhp);
if (host_err == -EBUSY) {
@@ -1828,12 +1822,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
}
out:
return err != nfs_ok ? err : nfserrno(host_err);
-
-out_dput:
- dput(dnew);
-out_unlock:
- inode_unlock(dirp);
- goto out_drop_write;
}
static void
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 27396fe63f6d..6a31ea34ff80 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -613,9 +613,9 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
if (err)
goto out;
- inode_lock_nested(udir, I_MUTEX_PARENT);
- upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
- c->dentry->d_name.len);
+ upper = ovl_start_creating_upper(ofs, upperdir,
+ &QSTR_LEN(c->dentry->d_name.name,
+ c->dentry->d_name.len));
err = PTR_ERR(upper);
if (!IS_ERR(upper)) {
err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
@@ -626,9 +626,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
ovl_dentry_set_upper_alias(c->dentry);
ovl_dentry_update_reval(c->dentry, upper);
}
- dput(upper);
+ end_creating(upper, upperdir);
}
- inode_unlock(udir);
if (err)
goto out;
@@ -894,16 +893,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
if (err)
goto out;
- inode_lock_nested(udir, I_MUTEX_PARENT);
-
- upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
- c->destname.len);
+ upper = ovl_start_creating_upper(ofs, c->destdir,
+ &QSTR_LEN(c->destname.name,
+ c->destname.len));
err = PTR_ERR(upper);
if (!IS_ERR(upper)) {
err = ovl_do_link(ofs, temp, udir, upper);
- dput(upper);
+ end_creating(upper, c->destdir);
}
- inode_unlock(udir);
if (err)
goto out;
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index dbd63a74df4b..0ae79efbfce7 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -59,15 +59,21 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
return 0;
}
-struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
+#define OVL_TEMPNAME_SIZE 20
+static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
{
- struct dentry *temp;
- char name[20];
static atomic_t temp_id = ATOMIC_INIT(0);
/* counter is allowed to wrap, since temp dentries are ephemeral */
- snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
+ snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
+}
+
+struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
+{
+ struct dentry *temp;
+ char name[OVL_TEMPNAME_SIZE];
+ ovl_tempname(name);
temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
if (!IS_ERR(temp) && temp->d_inode) {
pr_err("workdir/%s already exists\n", name);
@@ -78,6 +84,16 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
return temp;
}
+static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
+ struct dentry *workdir)
+{
+ char name[OVL_TEMPNAME_SIZE];
+
+ ovl_tempname(name);
+ return start_creating(ovl_upper_mnt_idmap(ofs), workdir,
+ &QSTR(name));
+}
+
static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
{
int err;
@@ -88,35 +104,31 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
guard(mutex)(&ofs->whiteout_lock);
if (!ofs->whiteout) {
- inode_lock_nested(wdir, I_MUTEX_PARENT);
- whiteout = ovl_lookup_temp(ofs, workdir);
- if (!IS_ERR(whiteout)) {
- err = ovl_do_whiteout(ofs, wdir, whiteout);
- if (err) {
- dput(whiteout);
- whiteout = ERR_PTR(err);
- }
- }
- inode_unlock(wdir);
+ whiteout = ovl_start_creating_temp(ofs, workdir);
if (IS_ERR(whiteout))
return whiteout;
- ofs->whiteout = whiteout;
+ err = ovl_do_whiteout(ofs, wdir, whiteout);
+ if (!err)
+ ofs->whiteout = dget(whiteout);
+ end_creating(whiteout, workdir);
+ if (err)
+ return ERR_PTR(err);
}
if (!ofs->no_shared_whiteout) {
- inode_lock_nested(wdir, I_MUTEX_PARENT);
- whiteout = ovl_lookup_temp(ofs, workdir);
- if (!IS_ERR(whiteout)) {
- err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
- if (err) {
- dput(whiteout);
- whiteout = ERR_PTR(err);
- }
- }
- inode_unlock(wdir);
- if (!IS_ERR(whiteout))
+ struct dentry *ret = NULL;
+
+ whiteout = ovl_start_creating_temp(ofs, workdir);
+ if (IS_ERR(whiteout))
return whiteout;
- if (PTR_ERR(whiteout) != -EMLINK) {
+ err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
+ if (!err)
+ ret = dget(whiteout);
+ end_creating(whiteout, workdir);
+ if (ret)
+ return ret;
+
+ if (err != -EMLINK) {
pr_warn("Failed to link whiteout - disabling whiteout inode sharing(nlink=%u, err=%lu)\n",
ofs->whiteout->d_inode->i_nlink,
PTR_ERR(whiteout));
@@ -225,10 +237,13 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
struct ovl_cattr *attr)
{
struct dentry *ret;
- inode_lock_nested(workdir->d_inode, I_MUTEX_PARENT);
- ret = ovl_create_real(ofs, workdir,
- ovl_lookup_temp(ofs, workdir), attr);
- inode_unlock(workdir->d_inode);
+ ret = ovl_start_creating_temp(ofs, workdir);
+ if (IS_ERR(ret))
+ return ret;
+ ret = ovl_create_real(ofs, workdir, ret, attr);
+ if (!IS_ERR(ret))
+ dget(ret);
+ end_creating(ret, workdir);
return ret;
}
@@ -327,18 +342,21 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
{
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
- struct inode *udir = upperdir->d_inode;
struct dentry *newdentry;
int err;
- inode_lock_nested(udir, I_MUTEX_PARENT);
- newdentry = ovl_create_real(ofs, upperdir,
- ovl_lookup_upper(ofs, dentry->d_name.name,
- upperdir, dentry->d_name.len),
- attr);
- inode_unlock(udir);
+ newdentry = ovl_start_creating_upper(ofs, upperdir,
+ &QSTR_LEN(dentry->d_name.name,
+ dentry->d_name.len));
if (IS_ERR(newdentry))
return PTR_ERR(newdentry);
+ newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
+ if (IS_ERR(newdentry)) {
+ end_creating(newdentry, upperdir);
+ return PTR_ERR(newdentry);
+ }
+ dget(newdentry);
+ end_creating(newdentry, upperdir);
if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
!ovl_allow_offline_changes(ofs)) {
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 4f84abaa0d68..c24c2da953bd 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -415,6 +415,14 @@ static inline struct dentry *ovl_lookup_upper_unlocked(struct ovl_fs *ofs,
&QSTR_LEN(name, len), base);
}
+static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ return start_creating(ovl_upper_mnt_idmap(ofs),
+ parent, name);
+}
+
static inline bool ovl_open_flags_need_copy_up(int flags)
{
if (!flags)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index bd3d7ba8fb95..67abb62e205b 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -300,8 +300,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
bool retried = false;
retry:
- inode_lock_nested(dir, I_MUTEX_PARENT);
- work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
+ work = ovl_start_creating_upper(ofs, ofs->workbasedir, &QSTR(name));
if (!IS_ERR(work)) {
struct iattr attr = {
@@ -310,14 +309,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
};
if (work->d_inode) {
+ dget(work);
+ end_creating(work, ofs->workbasedir);
+ if (persist)
+ return work;
err = -EEXIST;
- inode_unlock(dir);
if (retried)
goto out_dput;
-
- if (persist)
- return work;
-
retried = true;
err = ovl_workdir_cleanup(ofs, ofs->workbasedir, mnt, work, 0);
dput(work);
@@ -328,7 +326,9 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
}
work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
- inode_unlock(dir);
+ if (!IS_ERR(work))
+ dget(work);
+ end_creating(work, ofs->workbasedir);
err = PTR_ERR(work);
if (IS_ERR(work))
goto out_err;
@@ -366,7 +366,6 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
if (err)
goto out_dput;
} else {
- inode_unlock(dir);
err = PTR_ERR(work);
goto out_err;
}
@@ -616,14 +615,17 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
struct dentry *parent,
const char *name, umode_t mode)
{
- size_t len = strlen(name);
struct dentry *child;
- inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
- child = ovl_lookup_upper(ofs, name, parent, len);
- if (!IS_ERR(child) && !child->d_inode)
- child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
- inode_unlock(parent->d_inode);
+ child = ovl_start_creating_upper(ofs, parent, &QSTR(name));
+ if (!IS_ERR(child)) {
+ if (!child->d_inode)
+ child = ovl_create_real(ofs, parent, child,
+ OVL_CATTR(mode));
+ if (!IS_ERR(child))
+ dget(child);
+ end_creating(child, parent);
+ }
dput(parent);
return child;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index a7800ef04e76..4cbe930054a1 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -88,6 +88,24 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
struct qstr *name,
struct dentry *base);
+struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name);
+
+/* end_creating - finish action started with start_creating
+ * @child - dentry returned by start_creating()
+ * @parent - dentry given to start_creating()
+ *
+ * Unlock and release the child.
+ *
+ * Unlike end_dirop() this can only be called if start_creating() succeeded.
+ * It handles @child being an error as vfs_mkdir() might have converted the
+ * dentry to an error - in that case the parent still needs to be unlocked.
+ */
+static inline void end_creating(struct dentry *child, struct dentry *parent)
+{
+ end_dirop_mkdir(child, parent);
+}
+
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
extern int follow_up(struct path *);
--
2.50.0.107.gf914562f5916.dirty
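Several overlayfs callers in this patch take an extra reference with dget() before calling end_creating(), because end_creating() releases the reference that start_creating() returned. The block below is a minimal userspace sketch of that convention, modelled on the ovl_whiteout() hunk. It is not kernel code: every toy_* name is hypothetical, the lock and refcount are plain integers, and a NULL pointer stands in for an ERR_PTR() value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct toy_dir {
	int locked;		/* 1 while the "parent lock" is held */
};

struct toy_dentry {
	int refcount;		/* references currently held */
	int positive;		/* nonzero once an object exists for the name */
};

/* Models start_creating(): lock the parent, return the child with one reference. */
static struct toy_dentry *toy_start_creating(struct toy_dir *parent,
					     struct toy_dentry *child)
{
	assert(!parent->locked);
	parent->locked = 1;
	child->refcount++;
	return child;
}

/* Models dget(): take an extra reference and hand the pointer back. */
static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	d->refcount++;
	return d;
}

/*
 * Models end_creating(): drop the reference taken by toy_start_creating()
 * (skipped when child is NULL, the stand-in for an error pointer) and
 * always unlock the parent.
 */
static void toy_end_creating(struct toy_dentry *child, struct toy_dir *parent)
{
	if (child)
		child->refcount--;
	parent->locked = 0;
}

/*
 * Create-and-keep pattern: on success take a second reference before
 * toy_end_creating() drops the first one. err simulates the result of
 * the create step (0 on success).
 */
static struct toy_dentry *toy_create_and_keep(struct toy_dir *parent,
					      struct toy_dentry *slot, int err)
{
	struct toy_dentry *kept = NULL;
	struct toy_dentry *child = toy_start_creating(parent, slot);

	if (!err) {
		child->positive = 1;
		kept = toy_dget(child);
	}
	toy_end_creating(child, parent);
	return kept;
}
```

In this model a successful call leaves the caller holding exactly one reference with the parent unlocked, and a failed call keeps nothing.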
* Re: [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-09-26 2:49 ` [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
@ 2025-09-29 12:37 ` Jeff Layton
2025-09-30 5:37 ` NeilBrown
2025-09-30 8:54 ` Amir Goldstein
1 sibling, 1 reply; 49+ messages in thread
From: Jeff Layton @ 2025-09-29 12:37 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein
Cc: Jan Kara, linux-fsdevel
On Fri, 2025-09-26 at 12:49 +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> start_creating() is similar to simple_start_creating() but is not so
> simple.
> It takes a qstr for the name, includes permission checking, and does NOT
> report an error if the name already exists, returning a positive dentry
> instead.
>
> This is currently used by nfsd, cachefiles, and overlayfs.
>
> end_creating() is called after the dentry has been used.
> end_creating() drops the reference to the dentry as it is generally no
> longer needed. This is exactly end_dirop_mkdir(),
> but using that everywhere looks a bit odd...
>
> These calls help encapsulate locking rules so that directory locking can
> be changed.
>
> Occasionally this change means that the parent lock is held for a
> shorter period of time, for example in cachefiles_commit_tmpfile().
> As this function now unlocks after an unlink and before the following
> lookup, it is possible that the lookup could again find a positive
> dentry, so a while loop is introduced there.
>
> In overlayfs the ovl_lookup_temp() function has ovl_tempname()
> split out to be used in ovl_start_creating_temp(). The other use
> of ovl_lookup_temp() is preparing for a rename. When rename handling
> is updated, ovl_lookup_temp() will be removed.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/cachefiles/namei.c | 37 ++++++++--------
> fs/namei.c | 27 ++++++++++++
> fs/nfsd/nfs3proc.c | 14 +++---
> fs/nfsd/nfs4proc.c | 14 +++---
> fs/nfsd/nfs4recover.c | 16 +++----
> fs/nfsd/nfsproc.c | 11 +++--
> fs/nfsd/vfs.c | 42 +++++++-----------
> fs/overlayfs/copy_up.c | 19 ++++----
> fs/overlayfs/dir.c | 94 ++++++++++++++++++++++++----------------
> fs/overlayfs/overlayfs.h | 8 ++++
> fs/overlayfs/super.c | 32 +++++++-------
> include/linux/namei.h | 18 ++++++++
> 12 files changed, 187 insertions(+), 145 deletions(-)
>
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index d1edb2ac3837..965b22b2f58d 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> _enter(",,%s", dirname);
>
> /* search the current directory for the element name */
> - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
>
> retry:
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - subdir = lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir);
> + subdir = start_creating(&nop_mnt_idmap, dir, &QSTR(dirname));
> else
> subdir = ERR_PTR(ret);
> trace_cachefiles_lookup(NULL, dir, subdir);
> @@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> trace_cachefiles_mkdir(dir, subdir);
>
> if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
> - dput(subdir);
> + end_creating(subdir, dir);
> goto retry;
> }
> ASSERT(d_backing_inode(subdir));
> @@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
>
> /* Tell rmdir() it's not allowed to delete the subdir */
> inode_lock(d_inode(subdir));
> - inode_unlock(d_inode(dir));
> + dget(subdir);
> + end_creating(subdir, dir);
>
> if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
> pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
> @@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> return ERR_PTR(-EBUSY);
>
> mkdir_error:
> - inode_unlock(d_inode(dir));
> - if (!IS_ERR(subdir))
> - dput(subdir);
> + end_creating(subdir, dir);
> pr_err("mkdir %s failed with error %d\n", dirname, ret);
> return ERR_PTR(ret);
>
> lookup_error:
> - inode_unlock(d_inode(dir));
> ret = PTR_ERR(subdir);
> pr_err("Lookup %s failed with error %d\n", dirname, ret);
> return ERR_PTR(ret);
> @@ -679,36 +676,37 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
>
> _enter(",%pD", object->file);
>
> - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> + dentry = start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name));
> else
> dentry = ERR_PTR(ret);
> if (IS_ERR(dentry)) {
> trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> cachefiles_trace_lookup_error);
> _debug("lookup fail %ld", PTR_ERR(dentry));
> - goto out_unlock;
> + goto out;
> }
>
> - if (!d_is_negative(dentry)) {
> + while (!d_is_negative(dentry)) {
Can you explain why this changed from an if to a while? The existing
code doesn't seem to ever retry this operation.
> ret = cachefiles_unlink(volume->cache, object, fan, dentry,
> FSCACHE_OBJECT_IS_STALE);
> if (ret < 0)
> - goto out_dput;
> + goto out_end;
> +
> + end_creating(dentry, fan);
>
> - dput(dentry);
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> + dentry = start_creating(&nop_mnt_idmap, fan,
> + &QSTR(object->d_name));
> else
> dentry = ERR_PTR(ret);
> if (IS_ERR(dentry)) {
> trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> cachefiles_trace_lookup_error);
> _debug("lookup fail %ld", PTR_ERR(dentry));
> - goto out_unlock;
> + goto out;
> }
> }
>
> @@ -729,10 +727,9 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
> success = true;
> }
>
> -out_dput:
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(fan));
> +out_end:
> + end_creating(dentry, fan);
> +out:
> _leave(" = %u", success);
> return success;
> }
> diff --git a/fs/namei.c b/fs/namei.c
> index 81cbaabbbe21..064cb44a3a46 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3242,6 +3242,33 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
> }
> EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
>
> +/**
> + * start_creating - prepare to create a given name with permission checking
> + * @idmap - idmap of the mount
> + * @parent - directory in which to prepare to create the name
> + * @name - the name to be created
> + *
> + * Locks are taken and a lookup is performed prior to creating
> + * an object in a directory. Permission checking (MAY_EXEC) is performed
> + * against @idmap.
> + *
> + * If the name already exists, a positive dentry is returned, so
> + * behaviour is similar to O_CREAT without O_EXCL, which doesn't fail
> + * with -EEXIST.
> + *
> + * Returns: a negative or positive dentry, or an error.
> + */
> +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_one_common(idmap, name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, LOOKUP_CREATE);
> +}
> +EXPORT_SYMBOL(start_creating);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index b6d03e1ef5f7..e2aac0def2cb 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (host_err)
> return nfserrno(host_err);
>
> - inode_lock_nested(inode, I_MUTEX_PARENT);
> -
> - child = lookup_one(&nop_mnt_idmap,
> - &QSTR_LEN(argp->name, argp->len),
> - parent);
> + child = start_creating(&nop_mnt_idmap, parent,
> + &QSTR_LEN(argp->name, argp->len));
> if (IS_ERR(child)) {
> status = nfserrno(PTR_ERR(child));
> - goto out;
> + goto out_write;
> }
>
> if (d_really_is_negative(child)) {
> @@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
>
> out:
> - inode_unlock(inode);
> - if (child && !IS_ERR(child))
> - dput(child);
> + end_creating(child, parent);
> +out_write:
> fh_drop_write(fhp);
> return status;
> }
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 71b428efcbb5..35d48221072f 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -264,14 +264,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (is_create_with_attrs(open))
> nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
>
> - inode_lock_nested(inode, I_MUTEX_PARENT);
> -
> - child = lookup_one(&nop_mnt_idmap,
> - &QSTR_LEN(open->op_fname, open->op_fnamelen),
> - parent);
> + child = start_creating(&nop_mnt_idmap, parent,
> + &QSTR_LEN(open->op_fname, open->op_fnamelen));
> if (IS_ERR(child)) {
> status = nfserrno(PTR_ERR(child));
> - goto out;
> + goto out_write;
> }
>
> if (d_really_is_negative(child)) {
> @@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (attrs.na_aclerr)
> open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
> out:
> - inode_unlock(inode);
> + end_creating(child, parent);
> nfsd_attrs_free(&attrs);
> - if (child && !IS_ERR(child))
> - dput(child);
> +out_write:
> fh_drop_write(fhp);
> return status;
> }
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 2231192ec33f..93b2a3e764db 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -216,13 +216,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> goto out_creds;
>
> dir = nn->rec_file->f_path.dentry;
> - /* lock the parent */
> - inode_lock(d_inode(dir));
>
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
> + dentry = start_creating(&nop_mnt_idmap, dir, &QSTR(dname));
> if (IS_ERR(dentry)) {
> status = PTR_ERR(dentry);
> - goto out_unlock;
> + goto out;
> }
> if (d_really_is_positive(dentry))
> /*
> @@ -233,15 +231,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> * In the 4.0 case, we should never get here; but we may
> * as well be forgiving and just succeed silently.
> */
> - goto out_put;
> + goto out_end;
> dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
> if (IS_ERR(dentry))
> status = PTR_ERR(dentry);
> -out_put:
> - if (!status)
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(dir));
> +out_end:
> + end_creating(dentry, dir);
> +out:
> if (status == 0) {
> if (nn->in_grace)
> __nfsd4_create_reclaim_record_grace(clp, dname,
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index 8f71f5748c75..ee1b16e921fd 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> goto done;
> }
>
> - inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len),
> - dirfhp->fh_dentry);
> + dchild = start_creating(&nop_mnt_idmap, dirfhp->fh_dentry,
> + &QSTR_LEN(argp->name, argp->len));
> if (IS_ERR(dchild)) {
> resp->status = nfserrno(PTR_ERR(dchild));
> - goto out_unlock;
> + goto out_write;
> }
> fh_init(newfhp, NFS_FHSIZE);
> resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
> if (!resp->status && d_really_is_negative(dchild))
> resp->status = nfserr_noent;
> - dput(dchild);
> if (resp->status) {
> if (resp->status != nfserr_noent)
> goto out_unlock;
> @@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> }
>
> out_unlock:
> - inode_unlock(dirfhp->fh_dentry->d_inode);
> + end_creating(dchild, dirfhp->fh_dentry);
> +out_write:
> fh_drop_write(dirfhp);
> done:
> fh_put(dirfhp);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index aa4a95713a48..90c830c59c60 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1605,19 +1605,16 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (host_err)
> return nfserrno(host_err);
>
> - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + dchild = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> host_err = PTR_ERR(dchild);
> - if (IS_ERR(dchild)) {
> - err = nfserrno(host_err);
> - goto out_unlock;
> - }
> + if (IS_ERR(dchild))
> + return nfserrno(host_err);
> +
> err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
> /*
> * We unconditionally drop our ref to dchild as fh_compose will have
> * already grabbed its own ref for it.
> */
> - dput(dchild);
> if (err)
> goto out_unlock;
> err = fh_fill_pre_attrs(fhp);
> @@ -1626,7 +1623,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
> fh_fill_post_attrs(fhp);
> out_unlock:
> - inode_unlock(dentry->d_inode);
> + end_creating(dchild, dentry);
> return err;
> }
>
> @@ -1712,11 +1709,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
>
> dentry = fhp->fh_dentry;
> - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + dnew = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> if (IS_ERR(dnew)) {
> err = nfserrno(PTR_ERR(dnew));
> - inode_unlock(dentry->d_inode);
> goto out_drop_write;
> }
> err = fh_fill_pre_attrs(fhp);
> @@ -1729,11 +1724,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
> fh_fill_post_attrs(fhp);
> out_unlock:
> - inode_unlock(dentry->d_inode);
> + end_creating(dnew, dentry);
> if (!err)
> err = nfserrno(commit_metadata(fhp));
> - dput(dnew);
> - if (err==0) err = cerr;
> + if (!err)
> + err = cerr;
> out_drop_write:
> fh_drop_write(fhp);
> out:
> @@ -1788,32 +1783,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>
> ddir = ffhp->fh_dentry;
> dirp = d_inode(ddir);
> - inode_lock_nested(dirp, I_MUTEX_PARENT);
> + dnew = start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len));
>
> - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir);
> if (IS_ERR(dnew)) {
> host_err = PTR_ERR(dnew);
> - goto out_unlock;
> + goto out_drop_write;
> }
>
> dold = tfhp->fh_dentry;
>
> err = nfserr_noent;
> if (d_really_is_negative(dold))
> - goto out_dput;
> + goto out_unlock;
> err = fh_fill_pre_attrs(ffhp);
> if (err != nfs_ok)
> - goto out_dput;
> + goto out_unlock;
> host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
> fh_fill_post_attrs(ffhp);
> - inode_unlock(dirp);
> +out_unlock:
> + end_creating(dnew, ddir);
> if (!host_err) {
> host_err = commit_metadata(ffhp);
> if (!host_err)
> host_err = commit_metadata(tfhp);
> }
>
> - dput(dnew);
> out_drop_write:
> fh_drop_write(tfhp);
> if (host_err == -EBUSY) {
> @@ -1828,12 +1822,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> }
> out:
> return err != nfs_ok ? err : nfserrno(host_err);
> -
> -out_dput:
> - dput(dnew);
> -out_unlock:
> - inode_unlock(dirp);
> - goto out_drop_write;
> }
>
>
I do quite like the nfsd cleanup though!
>
> static void
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 27396fe63f6d..6a31ea34ff80 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -613,9 +613,9 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> if (err)
> goto out;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> - upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
> - c->dentry->d_name.len);
> + upper = ovl_start_creating_upper(ofs, upperdir,
> + &QSTR_LEN(c->dentry->d_name.name,
> + c->dentry->d_name.len));
> err = PTR_ERR(upper);
> if (!IS_ERR(upper)) {
> err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
> @@ -626,9 +626,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> ovl_dentry_set_upper_alias(c->dentry);
> ovl_dentry_update_reval(c->dentry, upper);
> }
> - dput(upper);
> + end_creating(upper, upperdir);
> }
> - inode_unlock(udir);
> if (err)
> goto out;
>
> @@ -894,16 +893,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
> if (err)
> goto out;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> -
> - upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
> - c->destname.len);
> + upper = ovl_start_creating_upper(ofs, c->destdir,
> + &QSTR_LEN(c->destname.name,
> + c->destname.len));
> err = PTR_ERR(upper);
> if (!IS_ERR(upper)) {
> err = ovl_do_link(ofs, temp, udir, upper);
> - dput(upper);
> + end_creating(upper, c->destdir);
> }
> - inode_unlock(udir);
>
> if (err)
> goto out;
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index dbd63a74df4b..0ae79efbfce7 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -59,15 +59,21 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> return 0;
> }
>
> -struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> +#define OVL_TEMPNAME_SIZE 20
> +static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> {
> - struct dentry *temp;
> - char name[20];
> static atomic_t temp_id = ATOMIC_INIT(0);
>
> /* counter is allowed to wrap, since temp dentries are ephemeral */
> - snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
> + snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
> +}
> +
> +struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> +{
> + struct dentry *temp;
> + char name[OVL_TEMPNAME_SIZE];
>
> + ovl_tempname(name);
> temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
> if (!IS_ERR(temp) && temp->d_inode) {
> pr_err("workdir/%s already exists\n", name);
> @@ -78,6 +84,16 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> return temp;
> }
>
> +static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
> + struct dentry *workdir)
> +{
> + char name[OVL_TEMPNAME_SIZE];
> +
> + ovl_tempname(name);
> + return start_creating(ovl_upper_mnt_idmap(ofs), workdir,
> + &QSTR(name));
> +}
> +
> static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> {
> int err;
> @@ -88,35 +104,31 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> guard(mutex)(&ofs->whiteout_lock);
>
> if (!ofs->whiteout) {
> - inode_lock_nested(wdir, I_MUTEX_PARENT);
> - whiteout = ovl_lookup_temp(ofs, workdir);
> - if (!IS_ERR(whiteout)) {
> - err = ovl_do_whiteout(ofs, wdir, whiteout);
> - if (err) {
> - dput(whiteout);
> - whiteout = ERR_PTR(err);
> - }
> - }
> - inode_unlock(wdir);
> + whiteout = ovl_start_creating_temp(ofs, workdir);
> if (IS_ERR(whiteout))
> return whiteout;
> - ofs->whiteout = whiteout;
> + err = ovl_do_whiteout(ofs, wdir, whiteout);
> + if (!err)
> + ofs->whiteout = dget(whiteout);
> + end_creating(whiteout, workdir);
> + if (err)
> + return ERR_PTR(err);
> }
>
> if (!ofs->no_shared_whiteout) {
> - inode_lock_nested(wdir, I_MUTEX_PARENT);
> - whiteout = ovl_lookup_temp(ofs, workdir);
> - if (!IS_ERR(whiteout)) {
> - err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> - if (err) {
> - dput(whiteout);
> - whiteout = ERR_PTR(err);
> - }
> - }
> - inode_unlock(wdir);
> - if (!IS_ERR(whiteout))
> + struct dentry *ret = NULL;
> +
> + whiteout = ovl_start_creating_temp(ofs, workdir);
> + if (IS_ERR(whiteout))
> return whiteout;
> - if (PTR_ERR(whiteout) != -EMLINK) {
> + err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> + if (!err)
> + ret = dget(whiteout);
> + end_creating(whiteout, workdir);
> + if (ret)
> + return ret;
> +
> + if (err != -EMLINK) {
> pr_warn("Failed to link whiteout - disabling whiteout inode sharing(nlink=%u, err=%lu)\n",
> ofs->whiteout->d_inode->i_nlink,
> PTR_ERR(whiteout));
> @@ -225,10 +237,13 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> struct ovl_cattr *attr)
> {
> struct dentry *ret;
> - inode_lock_nested(workdir->d_inode, I_MUTEX_PARENT);
> - ret = ovl_create_real(ofs, workdir,
> - ovl_lookup_temp(ofs, workdir), attr);
> - inode_unlock(workdir->d_inode);
> + ret = ovl_start_creating_temp(ofs, workdir);
> + if (IS_ERR(ret))
> + return ret;
> + ret = ovl_create_real(ofs, workdir, ret, attr);
> + if (!IS_ERR(ret))
> + dget(ret);
> + end_creating(ret, workdir);
> return ret;
> }
>
> @@ -327,18 +342,21 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
> {
> struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> - struct inode *udir = upperdir->d_inode;
> struct dentry *newdentry;
> int err;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> - newdentry = ovl_create_real(ofs, upperdir,
> - ovl_lookup_upper(ofs, dentry->d_name.name,
> - upperdir, dentry->d_name.len),
> - attr);
> - inode_unlock(udir);
> + newdentry = ovl_start_creating_upper(ofs, upperdir,
> + &QSTR_LEN(dentry->d_name.name,
> + dentry->d_name.len));
> if (IS_ERR(newdentry))
> return PTR_ERR(newdentry);
> + newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
> + if (IS_ERR(newdentry)) {
> + end_creating(newdentry, upperdir);
> + return PTR_ERR(newdentry);
> + }
> + dget(newdentry);
> + end_creating(newdentry, upperdir);
>
> if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
> !ovl_allow_offline_changes(ofs)) {
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 4f84abaa0d68..c24c2da953bd 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -415,6 +415,14 @@ static inline struct dentry *ovl_lookup_upper_unlocked(struct ovl_fs *ofs,
> &QSTR_LEN(name, len), base);
> }
>
> +static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
> + struct dentry *parent,
> + struct qstr *name)
> +{
> + return start_creating(ovl_upper_mnt_idmap(ofs),
> + parent, name);
> +}
> +
> static inline bool ovl_open_flags_need_copy_up(int flags)
> {
> if (!flags)
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index bd3d7ba8fb95..67abb62e205b 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -300,8 +300,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> bool retried = false;
>
> retry:
> - inode_lock_nested(dir, I_MUTEX_PARENT);
> - work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
> + work = ovl_start_creating_upper(ofs, ofs->workbasedir, &QSTR(name));
>
> if (!IS_ERR(work)) {
> struct iattr attr = {
> @@ -310,14 +309,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> };
>
> if (work->d_inode) {
> + dget(work);
> + end_creating(work, ofs->workbasedir);
> + if (persist)
> + return work;
> err = -EEXIST;
> - inode_unlock(dir);
> if (retried)
> goto out_dput;
> -
> - if (persist)
> - return work;
> -
> retried = true;
> err = ovl_workdir_cleanup(ofs, ofs->workbasedir, mnt, work, 0);
> dput(work);
> @@ -328,7 +326,9 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> }
>
> work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> - inode_unlock(dir);
> + if (!IS_ERR(work))
> + dget(work);
> + end_creating(work, ofs->workbasedir);
> err = PTR_ERR(work);
> if (IS_ERR(work))
> goto out_err;
> @@ -366,7 +366,6 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> if (err)
> goto out_dput;
> } else {
> - inode_unlock(dir);
> err = PTR_ERR(work);
> goto out_err;
> }
> @@ -616,14 +615,17 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
> struct dentry *parent,
> const char *name, umode_t mode)
> {
> - size_t len = strlen(name);
> struct dentry *child;
>
> - inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> - child = ovl_lookup_upper(ofs, name, parent, len);
> - if (!IS_ERR(child) && !child->d_inode)
> - child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
> - inode_unlock(parent->d_inode);
> + child = ovl_start_creating_upper(ofs, parent, &QSTR(name));
> + if (!IS_ERR(child)) {
> + if (!child->d_inode)
> + child = ovl_create_real(ofs, parent, child,
> + OVL_CATTR(mode));
> + if (!IS_ERR(child))
> + dget(child);
> + end_creating(child, parent);
> + }
> dput(parent);
>
> return child;
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index a7800ef04e76..4cbe930054a1 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -88,6 +88,24 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
> struct qstr *name,
> struct dentry *base);
>
> +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name);
> +
> +/* end_creating - finish action started with start_creating
> + * @child - dentry returned by start_creating()
> + * @parent - dentry given to start_creating()
> + *
> + * Unlock and release the child.
> + *
> + * Unlike end_dirop() this can only be called if start_creating() succeeded.
> + * It handles @child being an error as vfs_mkdir() might have converted the
> + * dentry to an error - in that case the parent still needs to be unlocked.
> + */
> +static inline void end_creating(struct dentry *child, struct dentry *parent)
> +{
> + end_dirop_mkdir(child, parent);
> +}
> +
> extern int follow_down_one(struct path *);
> extern int follow_down(struct path *path, unsigned int flags);
> extern int follow_up(struct path *);
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-09-29 12:37 ` Jeff Layton
@ 2025-09-30 5:37 ` NeilBrown
2025-09-30 10:19 ` Jeff Layton
0 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-09-30 5:37 UTC (permalink / raw)
To: Jeff Layton
Cc: Alexander Viro, Christian Brauner, Amir Goldstein, Jan Kara,
linux-fsdevel
On Mon, 29 Sep 2025, Jeff Layton wrote:
> On Fri, 2025-09-26 at 12:49 +1000, NeilBrown wrote:
> > From: NeilBrown <neil@brown.name>
> >
> > start_creating() is similar to simple_start_creating() but is not so
> > simple.
> > It takes a qstr for the name, includes permission checking, and does NOT
> > report an error if the name already exists, returning a positive dentry
> > instead.
> >
> > - if (!d_is_negative(dentry)) {
> > + while (!d_is_negative(dentry)) {
>
> Can you explain why this changed from an if to a while? The existing
> code doesn't seem to ever retry this operation.
I tried to explain that in the commit message:
> Occasionally this change means that the parent lock is held for a
> shorter period of time, for example in cachefiles_commit_tmpfile().
> As this function now unlocks after an unlink and before the following
> lookup, it is possible that the lookup could again find a positive
> dentry, so a while loop is introduced there.
Is there something I could do to make that clearer?
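To illustrate the race described in that paragraph, here is a minimal userspace sketch. It is not kernel code: fake_dir, pending_racers and the *_model() functions are hypothetical names invented for this illustration, and the "racer" stands in for any other task that re-creates the name while the parent is unlocked.

```c
#include <stdbool.h>

/* Hypothetical model of a directory holding a single name. */
struct fake_dir {
	bool locked;         /* models the parent directory lock */
	bool name_exists;    /* true if a lookup would return a positive dentry */
	int pending_racers;  /* how many times another task re-creates the name */
};

/* Models start_creating(): take the lock, report whether the name is positive. */
static bool start_creating_model(struct fake_dir *dir)
{
	dir->locked = true;
	return dir->name_exists;
}

/* Models the unlink performed while the parent is locked. */
static void unlink_model(struct fake_dir *dir)
{
	dir->name_exists = false;
}

/* Models end_creating(): drop the lock; a racer may then re-create the name. */
static void end_creating_model(struct fake_dir *dir)
{
	dir->locked = false;
	if (dir->pending_racers > 0) {
		dir->pending_racers--;
		dir->name_exists = true;
	}
}

/*
 * Models the loop in the patch: because the lock is dropped between the
 * unlink and the next lookup, the lookup can find a positive entry again,
 * so the check has to be repeated. Returns the number of unlinks done.
 * On return the directory is locked and the name is negative.
 */
static int commit_model(struct fake_dir *dir)
{
	int unlinks = 0;
	bool positive = start_creating_model(dir);

	while (positive) {
		unlink_model(dir);
		unlinks++;
		end_creating_model(dir);
		positive = start_creating_model(dir);
	}
	return unlinks;
}
```

With one racer and a name that already exists, this model needs two unlinks before the lookup comes back negative; a plain "if" would have proceeded with a positive entry after the first one. The model has no bound on the number of iterations, which matches the loop as posted.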
....
> > @@ -1828,12 +1822,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> > }
> > out:
> > return err != nfs_ok ? err : nfserrno(host_err);
> > -
> > -out_dput:
> > - dput(dnew);
> > -out_unlock:
> > - inode_unlock(dirp);
> > - goto out_drop_write;
> > }
> >
> >
>
>
> I do quite like the nfsd cleanup though!
>
>
Thanks!
NeilBrown
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-09-30 5:37 ` NeilBrown
@ 2025-09-30 10:19 ` Jeff Layton
0 siblings, 0 replies; 49+ messages in thread
From: Jeff Layton @ 2025-09-30 10:19 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Amir Goldstein, Jan Kara,
linux-fsdevel
On Tue, 2025-09-30 at 15:37 +1000, NeilBrown wrote:
> On Mon, 29 Sep 2025, Jeff Layton wrote:
> > On Fri, 2025-09-26 at 12:49 +1000, NeilBrown wrote:
> > > From: NeilBrown <neil@brown.name>
> > >
> > > start_creating() is similar to simple_start_creating() but is not so
> > > simple.
> > > It takes a qstr for the name, includes permission checking, and does NOT
> > > report an error if the name already exists, returning a positive dentry
> > > instead.
>
>
>
> > >
> > > - if (!d_is_negative(dentry)) {
> > > + while (!d_is_negative(dentry)) {
> >
> > Can you explain why this changed from an if to a while? The existing
> > code doesn't seem to ever retry this operation.
>
> I tried to explain that in the commit message:
>
> > Occasionally this change means that the parent lock is held for a
> > shorter period of time, for example in cachefiles_commit_tmpfile().
> > As this function now unlocks after an unlink and before the following
> > lookup, it is possible that the lookup could again find a positive
> > dentry, so a while loop is introduced there.
>
> Is there something I could do to make that clearer?
>
Clearly I didn't read the commit message well enough. Nothing I can
think of.
> ....
> > > @@ -1828,12 +1822,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> > > }
> > > out:
> > > return err != nfs_ok ? err : nfserrno(host_err);
> > > -
> > > -out_dput:
> > > - dput(dnew);
> > > -out_unlock:
> > > - inode_unlock(dirp);
> > > - goto out_drop_write;
> > > }
> > >
> > >
> >
> >
> > I do quite like the nfsd cleanup though!
> >
> >
>
> Thanks!
>
> NeilBrown
You can add:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-09-26 2:49 ` [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
2025-09-29 12:37 ` Jeff Layton
@ 2025-09-30 8:54 ` Amir Goldstein
2025-10-01 3:15 ` NeilBrown
1 sibling, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-30 8:54 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> start_creating() is similar to simple_start_creating() but is not so
> simple.
> It takes a qstr for the name, includes permission checking, and does NOT
> report an error if the name already exists, returning a positive dentry
> instead.
>
> This is currently used by nfsd, cachefiles, and overlayfs.
>
> end_creating() is called after the dentry has been used.
> end_creating() drops the reference to the dentry as it is generally no
> longer needed. This is exactly end_dirop_mkdir(),
> but using that everywhere looks a bit odd...
>
> These calls help encapsulate locking rules so that directory locking can
> be changed.
>
> Occasionally this change means that the parent lock is held for a
> shorter period of time, for example in cachefiles_commit_tmpfile().
> As this function now unlocks after an unlink and before the following
> lookup, it is possible that the lookup could again find a positive
> dentry, so a while loop is introduced there.
>
> In overlayfs the ovl_lookup_temp() function has ovl_tempname()
> split out to be used in ovl_start_creating_temp(). The other use
> of ovl_lookup_temp() is preparing for a rename. When rename handling
> is updated, ovl_lookup_temp() will be removed.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/cachefiles/namei.c | 37 ++++++++--------
> fs/namei.c | 27 ++++++++++++
> fs/nfsd/nfs3proc.c | 14 +++---
> fs/nfsd/nfs4proc.c | 14 +++---
> fs/nfsd/nfs4recover.c | 16 +++----
> fs/nfsd/nfsproc.c | 11 +++--
> fs/nfsd/vfs.c | 42 +++++++-----------
> fs/overlayfs/copy_up.c | 19 ++++----
> fs/overlayfs/dir.c | 94 ++++++++++++++++++++++++----------------
> fs/overlayfs/overlayfs.h | 8 ++++
> fs/overlayfs/super.c | 32 +++++++-------
> include/linux/namei.h | 18 ++++++++
> 12 files changed, 187 insertions(+), 145 deletions(-)
>
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index d1edb2ac3837..965b22b2f58d 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> _enter(",,%s", dirname);
>
> /* search the current directory for the element name */
> - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
>
> retry:
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - subdir = lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir);
> + subdir = start_creating(&nop_mnt_idmap, dir, &QSTR(dirname));
> else
> subdir = ERR_PTR(ret);
> trace_cachefiles_lookup(NULL, dir, subdir);
> @@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> trace_cachefiles_mkdir(dir, subdir);
>
> if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
> - dput(subdir);
> + end_creating(subdir, dir);
> goto retry;
> }
> ASSERT(d_backing_inode(subdir));
> @@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
>
> /* Tell rmdir() it's not allowed to delete the subdir */
> inode_lock(d_inode(subdir));
> - inode_unlock(d_inode(dir));
> + dget(subdir);
> + end_creating(subdir, dir);
>
> if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
> pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
> @@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> return ERR_PTR(-EBUSY);
>
> mkdir_error:
> - inode_unlock(d_inode(dir));
> - if (!IS_ERR(subdir))
> - dput(subdir);
> + end_creating(subdir, dir);
> pr_err("mkdir %s failed with error %d\n", dirname, ret);
> return ERR_PTR(ret);
>
> lookup_error:
> - inode_unlock(d_inode(dir));
> ret = PTR_ERR(subdir);
> pr_err("Lookup %s failed with error %d\n", dirname, ret);
> return ERR_PTR(ret);
> @@ -679,36 +676,37 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
>
> _enter(",%pD", object->file);
>
> - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> + dentry = start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name));
> else
> dentry = ERR_PTR(ret);
> if (IS_ERR(dentry)) {
> trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> cachefiles_trace_lookup_error);
> _debug("lookup fail %ld", PTR_ERR(dentry));
> - goto out_unlock;
> + goto out;
> }
>
> - if (!d_is_negative(dentry)) {
> + while (!d_is_negative(dentry)) {
> ret = cachefiles_unlink(volume->cache, object, fan, dentry,
> FSCACHE_OBJECT_IS_STALE);
> if (ret < 0)
> - goto out_dput;
> + goto out_end;
> +
> + end_creating(dentry, fan);
>
> - dput(dentry);
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> + dentry = start_creating(&nop_mnt_idmap, fan,
> + &QSTR(object->d_name));
> else
> dentry = ERR_PTR(ret);
> if (IS_ERR(dentry)) {
> trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> cachefiles_trace_lookup_error);
> _debug("lookup fail %ld", PTR_ERR(dentry));
> - goto out_unlock;
> + goto out;
> }
> }
>
> @@ -729,10 +727,9 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
> success = true;
> }
>
> -out_dput:
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(fan));
> +out_end:
> + end_creating(dentry, fan);
> +out:
> _leave(" = %u", success);
> return success;
> }
> diff --git a/fs/namei.c b/fs/namei.c
> index 81cbaabbbe21..064cb44a3a46 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3242,6 +3242,33 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
> }
> EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
>
> +/**
> + * start_creating - prepare to create a given name with permission checking
> + * @idmap - idmap of the mount
> + * @parent - directory in which to prepare to create the name
> + * @name - the name to be created
> + *
> + * Locks are taken and a lookup in performed prior to creating
typo: is performed
> + * an object in a directory. Permission checking (MAY_EXEC) is performed
> + * against @idmap.
> + *
> + * If the name already exists, a positive dentry is returned, so
> + * behaviour is similar to O_CREAT without O_EXCL, which doesn't fail
> + * with -EEXIST.
> + *
> + * Returns: a negative or positive dentry, or an error.
> + */
> +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_one_common(idmap, name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, LOOKUP_CREATE);
> +}
> +EXPORT_SYMBOL(start_creating);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index b6d03e1ef5f7..e2aac0def2cb 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (host_err)
> return nfserrno(host_err);
>
> - inode_lock_nested(inode, I_MUTEX_PARENT);
> -
> - child = lookup_one(&nop_mnt_idmap,
> - &QSTR_LEN(argp->name, argp->len),
> - parent);
> + child = start_creating(&nop_mnt_idmap, parent,
> + &QSTR_LEN(argp->name, argp->len));
> if (IS_ERR(child)) {
> status = nfserrno(PTR_ERR(child));
> - goto out;
> + goto out_write;
> }
>
> if (d_really_is_negative(child)) {
> @@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
>
> out:
> - inode_unlock(inode);
> - if (child && !IS_ERR(child))
> - dput(child);
> + end_creating(child, parent);
> +out_write:
> fh_drop_write(fhp);
> return status;
> }
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 71b428efcbb5..35d48221072f 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -264,14 +264,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (is_create_with_attrs(open))
> nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
>
> - inode_lock_nested(inode, I_MUTEX_PARENT);
> -
> - child = lookup_one(&nop_mnt_idmap,
> - &QSTR_LEN(open->op_fname, open->op_fnamelen),
> - parent);
> + child = start_creating(&nop_mnt_idmap, parent,
> + &QSTR_LEN(open->op_fname, open->op_fnamelen));
> if (IS_ERR(child)) {
> status = nfserrno(PTR_ERR(child));
> - goto out;
> + goto out_write;
> }
>
> if (d_really_is_negative(child)) {
> @@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (attrs.na_aclerr)
> open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
> out:
> - inode_unlock(inode);
> + end_creating(child, parent);
> nfsd_attrs_free(&attrs);
> - if (child && !IS_ERR(child))
> - dput(child);
> +out_write:
> fh_drop_write(fhp);
> return status;
> }
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 2231192ec33f..93b2a3e764db 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -216,13 +216,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> goto out_creds;
>
> dir = nn->rec_file->f_path.dentry;
> - /* lock the parent */
> - inode_lock(d_inode(dir));
>
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
> + dentry = start_creating(&nop_mnt_idmap, dir, &QSTR(dname));
> if (IS_ERR(dentry)) {
> status = PTR_ERR(dentry);
> - goto out_unlock;
> + goto out;
> }
> if (d_really_is_positive(dentry))
> /*
> @@ -233,15 +231,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> * In the 4.0 case, we should never get here; but we may
> * as well be forgiving and just succeed silently.
> */
> - goto out_put;
> + goto out_end;
> dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
> if (IS_ERR(dentry))
> status = PTR_ERR(dentry);
> -out_put:
> - if (!status)
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(dir));
> +out_end:
> + end_creating(dentry, dir);
> +out:
> if (status == 0) {
> if (nn->in_grace)
> __nfsd4_create_reclaim_record_grace(clp, dname,
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index 8f71f5748c75..ee1b16e921fd 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> goto done;
> }
>
> - inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len),
> - dirfhp->fh_dentry);
> + dchild = start_creating(&nop_mnt_idmap, dirfhp->fh_dentry,
> + &QSTR_LEN(argp->name, argp->len));
> if (IS_ERR(dchild)) {
> resp->status = nfserrno(PTR_ERR(dchild));
> - goto out_unlock;
> + goto out_write;
> }
> fh_init(newfhp, NFS_FHSIZE);
> resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
> if (!resp->status && d_really_is_negative(dchild))
> resp->status = nfserr_noent;
> - dput(dchild);
> if (resp->status) {
> if (resp->status != nfserr_noent)
> goto out_unlock;
> @@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> }
>
> out_unlock:
> - inode_unlock(dirfhp->fh_dentry->d_inode);
> + end_creating(dchild, dirfhp->fh_dentry);
> +out_write:
> fh_drop_write(dirfhp);
> done:
> fh_put(dirfhp);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index aa4a95713a48..90c830c59c60 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1605,19 +1605,16 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (host_err)
> return nfserrno(host_err);
>
> - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + dchild = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> host_err = PTR_ERR(dchild);
> - if (IS_ERR(dchild)) {
> - err = nfserrno(host_err);
> - goto out_unlock;
> - }
> + if (IS_ERR(dchild))
> + return nfserrno(host_err);
> +
> err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
> /*
> * We unconditionally drop our ref to dchild as fh_compose will have
> * already grabbed its own ref for it.
> */
> - dput(dchild);
> if (err)
> goto out_unlock;
> err = fh_fill_pre_attrs(fhp);
> @@ -1626,7 +1623,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
> fh_fill_post_attrs(fhp);
> out_unlock:
> - inode_unlock(dentry->d_inode);
> + end_creating(dchild, dentry);
> return err;
> }
>
> @@ -1712,11 +1709,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
>
> dentry = fhp->fh_dentry;
> - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + dnew = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> if (IS_ERR(dnew)) {
> err = nfserrno(PTR_ERR(dnew));
> - inode_unlock(dentry->d_inode);
> goto out_drop_write;
> }
> err = fh_fill_pre_attrs(fhp);
> @@ -1729,11 +1724,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
> fh_fill_post_attrs(fhp);
> out_unlock:
> - inode_unlock(dentry->d_inode);
> + end_creating(dnew, dentry);
> if (!err)
> err = nfserrno(commit_metadata(fhp));
> - dput(dnew);
> - if (err==0) err = cerr;
> + if (!err)
> + err = cerr;
> out_drop_write:
> fh_drop_write(fhp);
> out:
> @@ -1788,32 +1783,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>
> ddir = ffhp->fh_dentry;
> dirp = d_inode(ddir);
> - inode_lock_nested(dirp, I_MUTEX_PARENT);
> + dnew = start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len));
>
> - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir);
> if (IS_ERR(dnew)) {
> host_err = PTR_ERR(dnew);
> - goto out_unlock;
> + goto out_drop_write;
> }
>
> dold = tfhp->fh_dentry;
>
> err = nfserr_noent;
> if (d_really_is_negative(dold))
> - goto out_dput;
> + goto out_unlock;
> err = fh_fill_pre_attrs(ffhp);
> if (err != nfs_ok)
> - goto out_dput;
> + goto out_unlock;
> host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
> fh_fill_post_attrs(ffhp);
> - inode_unlock(dirp);
> +out_unlock:
> + end_creating(dnew, ddir);
> if (!host_err) {
> host_err = commit_metadata(ffhp);
> if (!host_err)
> host_err = commit_metadata(tfhp);
> }
>
> - dput(dnew);
> out_drop_write:
> fh_drop_write(tfhp);
> if (host_err == -EBUSY) {
> @@ -1828,12 +1822,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> }
> out:
> return err != nfs_ok ? err : nfserrno(host_err);
> -
> -out_dput:
> - dput(dnew);
> -out_unlock:
> - inode_unlock(dirp);
> - goto out_drop_write;
> }
>
> static void
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 27396fe63f6d..6a31ea34ff80 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -613,9 +613,9 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> if (err)
> goto out;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> - upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
> - c->dentry->d_name.len);
> + upper = ovl_start_creating_upper(ofs, upperdir,
> + &QSTR_LEN(c->dentry->d_name.name,
> + c->dentry->d_name.len));
> err = PTR_ERR(upper);
> if (!IS_ERR(upper)) {
> err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
> @@ -626,9 +626,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> ovl_dentry_set_upper_alias(c->dentry);
> ovl_dentry_update_reval(c->dentry, upper);
> }
> - dput(upper);
> + end_creating(upper, upperdir);
> }
> - inode_unlock(udir);
> if (err)
> goto out;
>
> @@ -894,16 +893,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
> if (err)
> goto out;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> -
> - upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
> - c->destname.len);
> + upper = ovl_start_creating_upper(ofs, c->destdir,
> + &QSTR_LEN(c->destname.name,
> + c->destname.len));
> err = PTR_ERR(upper);
> if (!IS_ERR(upper)) {
> err = ovl_do_link(ofs, temp, udir, upper);
> - dput(upper);
> + end_creating(upper, c->destdir);
> }
> - inode_unlock(udir);
>
> if (err)
> goto out;
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index dbd63a74df4b..0ae79efbfce7 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -59,15 +59,21 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> return 0;
> }
>
> -struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> +#define OVL_TEMPNAME_SIZE 20
> +static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> {
> - struct dentry *temp;
> - char name[20];
> static atomic_t temp_id = ATOMIC_INIT(0);
>
> /* counter is allowed to wrap, since temp dentries are ephemeral */
> - snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
> + snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
> +}
> +
> +struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> +{
> + struct dentry *temp;
> + char name[OVL_TEMPNAME_SIZE];
>
> + ovl_tempname(name);
> temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
> if (!IS_ERR(temp) && temp->d_inode) {
> pr_err("workdir/%s already exists\n", name);
> @@ -78,6 +84,16 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> return temp;
> }
>
> +static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
> + struct dentry *workdir)
> +{
> + char name[OVL_TEMPNAME_SIZE];
> +
> + ovl_tempname(name);
> + return start_creating(ovl_upper_mnt_idmap(ofs), workdir,
> + &QSTR(name));
> +}
> +
> static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> {
> int err;
> @@ -88,35 +104,31 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> guard(mutex)(&ofs->whiteout_lock);
>
> if (!ofs->whiteout) {
> - inode_lock_nested(wdir, I_MUTEX_PARENT);
> - whiteout = ovl_lookup_temp(ofs, workdir);
> - if (!IS_ERR(whiteout)) {
> - err = ovl_do_whiteout(ofs, wdir, whiteout);
> - if (err) {
> - dput(whiteout);
> - whiteout = ERR_PTR(err);
> - }
> - }
> - inode_unlock(wdir);
> + whiteout = ovl_start_creating_temp(ofs, workdir);
> if (IS_ERR(whiteout))
> return whiteout;
> - ofs->whiteout = whiteout;
> + err = ovl_do_whiteout(ofs, wdir, whiteout);
> + if (!err)
> + ofs->whiteout = dget(whiteout);
> + end_creating(whiteout, workdir);
> + if (err)
> + return ERR_PTR(err);
> }
>
> if (!ofs->no_shared_whiteout) {
> - inode_lock_nested(wdir, I_MUTEX_PARENT);
> - whiteout = ovl_lookup_temp(ofs, workdir);
> - if (!IS_ERR(whiteout)) {
> - err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> - if (err) {
> - dput(whiteout);
> - whiteout = ERR_PTR(err);
> - }
> - }
> - inode_unlock(wdir);
> - if (!IS_ERR(whiteout))
> + struct dentry *ret = NULL;
For clarity please name this var "link".
> +
> + whiteout = ovl_start_creating_temp(ofs, workdir);
> + if (IS_ERR(whiteout))
> return whiteout;
> - if (PTR_ERR(whiteout) != -EMLINK) {
> + err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> + if (!err)
> + ret = dget(whiteout);
> + end_creating(whiteout, workdir);
> + if (ret)
> + return ret;
> +
> + if (err != -EMLINK) {
> pr_warn("Failed to link whiteout - disabling whiteout inode sharing(nlink=%u, err=%lu)\n",
> ofs->whiteout->d_inode->i_nlink,
> PTR_ERR(whiteout));
> @@ -225,10 +237,13 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> struct ovl_cattr *attr)
> {
> struct dentry *ret;
> - inode_lock_nested(workdir->d_inode, I_MUTEX_PARENT);
> - ret = ovl_create_real(ofs, workdir,
> - ovl_lookup_temp(ofs, workdir), attr);
> - inode_unlock(workdir->d_inode);
> + ret = ovl_start_creating_temp(ofs, workdir);
> + if (IS_ERR(ret))
> + return ret;
> + ret = ovl_create_real(ofs, workdir, ret, attr);
> + if (!IS_ERR(ret))
> + dget(ret);
> + end_creating(ret, workdir);
> return ret;
> }
>
> @@ -327,18 +342,21 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
> {
> struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> - struct inode *udir = upperdir->d_inode;
> struct dentry *newdentry;
> int err;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> - newdentry = ovl_create_real(ofs, upperdir,
> - ovl_lookup_upper(ofs, dentry->d_name.name,
> - upperdir, dentry->d_name.len),
> - attr);
> - inode_unlock(udir);
> + newdentry = ovl_start_creating_upper(ofs, upperdir,
> + &QSTR_LEN(dentry->d_name.name,
> + dentry->d_name.len));
> if (IS_ERR(newdentry))
> return PTR_ERR(newdentry);
> + newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
> + if (IS_ERR(newdentry)) {
> + end_creating(newdentry, upperdir);
> + return PTR_ERR(newdentry);
> + }
> + dget(newdentry);
> + end_creating(newdentry, upperdir);
See suggestion below to make this:
newdentry = end_creating_dentry(newdentry, upperdir);
if (IS_ERR(newdentry))
return PTR_ERR(newdentry);
>
> if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
> !ovl_allow_offline_changes(ofs)) {
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 4f84abaa0d68..c24c2da953bd 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -415,6 +415,14 @@ static inline struct dentry *ovl_lookup_upper_unlocked(struct ovl_fs *ofs,
> &QSTR_LEN(name, len), base);
> }
>
> +static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
> + struct dentry *parent,
> + struct qstr *name)
> +{
> + return start_creating(ovl_upper_mnt_idmap(ofs),
> + parent, name);
> +}
> +
> static inline bool ovl_open_flags_need_copy_up(int flags)
> {
> if (!flags)
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index bd3d7ba8fb95..67abb62e205b 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -300,8 +300,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> bool retried = false;
>
> retry:
> - inode_lock_nested(dir, I_MUTEX_PARENT);
> - work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
> + work = ovl_start_creating_upper(ofs, ofs->workbasedir, &QSTR(name));
>
> if (!IS_ERR(work)) {
> struct iattr attr = {
> @@ -310,14 +309,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> };
>
> if (work->d_inode) {
> + dget(work);
> + end_creating(work, ofs->workbasedir);
> + if (persist)
> + return work;
> err = -EEXIST;
> - inode_unlock(dir);
> if (retried)
> goto out_dput;
> -
> - if (persist)
> - return work;
> -
> retried = true;
> err = ovl_workdir_cleanup(ofs, ofs->workbasedir, mnt, work, 0);
> dput(work);
> @@ -328,7 +326,9 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> }
>
> work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> - inode_unlock(dir);
> + if (!IS_ERR(work))
> + dget(work);
> + end_creating(work, ofs->workbasedir);
> err = PTR_ERR(work);
> if (IS_ERR(work))
> goto out_err;
> @@ -366,7 +366,6 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> if (err)
> goto out_dput;
> } else {
> - inode_unlock(dir);
> err = PTR_ERR(work);
> goto out_err;
> }
> @@ -616,14 +615,17 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
> struct dentry *parent,
> const char *name, umode_t mode)
> {
> - size_t len = strlen(name);
> struct dentry *child;
>
> - inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> - child = ovl_lookup_upper(ofs, name, parent, len);
> - if (!IS_ERR(child) && !child->d_inode)
> - child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
> - inode_unlock(parent->d_inode);
> + child = ovl_start_creating_upper(ofs, parent, &QSTR(name));
> + if (!IS_ERR(child)) {
> + if (!child->d_inode)
> + child = ovl_create_real(ofs, parent, child,
> + OVL_CATTR(mode));
> + if (!IS_ERR(child))
> + dget(child);
> + end_creating(child, parent);
We have a few of these open-coded, which is not so pretty IMO.
How about:
child = end_creating_dentry(child, parent);
which is a variant of the void end_creating() that does dget()
in the non-error case?
end_creating_dentry() could be matched with start_creating_dentry()
in common cases where we had a ref on the child before creating and
we want to keep the ref on the child after creating.
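A rough userspace model of the reference counting such a helper might do, as a sketch only. Nothing here is the kernel implementation: struct dentry is reduced to a counter and a lock flag, the ERR_PTR helpers are simplified stand-ins, end_creating() is modelled from its description in this patch (unlock the parent, drop the child unless it is an error), and end_creating_dentry() is the hypothetical helper proposed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in, NOT the kernel's struct dentry. */
struct dentry {
	int refcount;     /* models the dentry reference count */
	bool dir_locked;  /* on a parent: models the directory lock */
};

/* Simplified stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline struct dentry *dget(struct dentry *d) { d->refcount++; return d; }
static inline void dput(struct dentry *d) { d->refcount--; }

/*
 * Model of end_creating() as documented in the patch: the parent is
 * always unlocked, and the child reference is dropped unless @child
 * is an error pointer.
 */
static inline void end_creating(struct dentry *child, struct dentry *parent)
{
	parent->dir_locked = false;
	if (!IS_ERR(child))
		dput(child);
}

/*
 * Hypothetical variant: take an extra reference in the non-error case
 * so the caller still holds one after end_creating() drops its own.
 * An error pointer is passed through unchanged.
 */
static inline struct dentry *end_creating_dentry(struct dentry *child,
						 struct dentry *parent)
{
	if (!IS_ERR(child))
		dget(child);
	end_creating(child, parent);
	return child;
}
```

In this model the open-coded "if (!IS_ERR(child)) dget(child); end_creating(child, parent);" sequences collapse to a single call, and the caller can test the return value with IS_ERR() as before.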
> + }
> dput(parent);
>
> return child;
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index a7800ef04e76..4cbe930054a1 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -88,6 +88,24 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
> struct qstr *name,
> struct dentry *base);
>
> +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name);
> +
> +/* end_creating - finish action started with start_creating
> + * @child - dentry returned by start_creating()
> + * @parent - dentry given to start_creating()
> + *
> + * Unlock and release the child.
> + *
> + * Unlike end_dirop() this can only be called if start_creating() succeeded.
> + * It handles @child being and error as vfs_mkdir() might have converted the
> + * dentry to an error - in that case the parent still needs to be unlocked.
> + */
> +static inline void end_creating(struct dentry *child, struct dentry *parent)
> +{
> + end_dirop_mkdir(child, parent);
> +}
> +
That concludes my out-of-order review of this series.
The ovl changes look overall good to me.
I will wait for v2 without end_dirop_mkdir() to re-review this patch.
Feel free to take or discard my suggestion for end_creating_dentry().
I agree with Jeff that the conversion of the if condition to a while loop
in cachefiles feels odd, because it is not clear whether there should be a
stop condition. In any case, it would be best if the cachefiles developers
could review this code.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-09-30 8:54 ` Amir Goldstein
@ 2025-10-01 3:15 ` NeilBrown
2025-10-02 10:52 ` Amir Goldstein
0 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-10-01 3:15 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, 30 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > start_creating() is similar to simple_start_creating() but is not so
> > simple.
> > It takes a qstr for the name, includes permission checking, and does NOT
> > report an error if the name already exists, returning a positive dentry
> > instead.
> >
> > This is currently used by nfsd, cachefiles, and overlayfs.
> >
> > end_creating() is called after the dentry has been used.
> > end_creating() drops the reference to the dentry as it is generally no
> > longer needed. This is exactly end_dirop_mkdir(),
> > but using that everywhere looks a bit odd...
> >
> > These calls help encapsulate locking rules so that directory locking can
> > be changed.
> >
> > Occasionally this change means that the parent lock is held for a
> > shorter period of time, for example in cachefiles_commit_tmpfile().
> > As this function now unlocks after an unlink and before the following
> > lookup, it is possible that the lookup could again find a positive
> > dentry, so a while loop is introduced there.
> >
> > In overlayfs the ovl_lookup_temp() function has ovl_tempname()
> > split out to be used in ovl_start_creating_temp(). The other use
> > of ovl_lookup_temp() is preparing for a rename. When rename handling
> > is updated, ovl_lookup_temp() will be removed.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > fs/cachefiles/namei.c | 37 ++++++++--------
> > fs/namei.c | 27 ++++++++++++
> > fs/nfsd/nfs3proc.c | 14 +++---
> > fs/nfsd/nfs4proc.c | 14 +++---
> > fs/nfsd/nfs4recover.c | 16 +++----
> > fs/nfsd/nfsproc.c | 11 +++--
> > fs/nfsd/vfs.c | 42 +++++++-----------
> > fs/overlayfs/copy_up.c | 19 ++++----
> > fs/overlayfs/dir.c | 94 ++++++++++++++++++++++++----------------
> > fs/overlayfs/overlayfs.h | 8 ++++
> > fs/overlayfs/super.c | 32 +++++++-------
> > include/linux/namei.h | 18 ++++++++
> > 12 files changed, 187 insertions(+), 145 deletions(-)
> >
> > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> > index d1edb2ac3837..965b22b2f58d 100644
> > --- a/fs/cachefiles/namei.c
> > +++ b/fs/cachefiles/namei.c
> > @@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > _enter(",,%s", dirname);
> >
> > /* search the current directory for the element name */
> > - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
> >
> > retry:
> > ret = cachefiles_inject_read_error();
> > if (ret == 0)
> > - subdir = lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir);
> > + subdir = start_creating(&nop_mnt_idmap, dir, &QSTR(dirname));
> > else
> > subdir = ERR_PTR(ret);
> > trace_cachefiles_lookup(NULL, dir, subdir);
> > @@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > trace_cachefiles_mkdir(dir, subdir);
> >
> > if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
> > - dput(subdir);
> > + end_creating(subdir, dir);
> > goto retry;
> > }
> > ASSERT(d_backing_inode(subdir));
> > @@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> >
> > /* Tell rmdir() it's not allowed to delete the subdir */
> > inode_lock(d_inode(subdir));
> > - inode_unlock(d_inode(dir));
> > + dget(subdir);
> > + end_creating(subdir, dir);
> >
> > if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
> > pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
> > @@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > return ERR_PTR(-EBUSY);
> >
> > mkdir_error:
> > - inode_unlock(d_inode(dir));
> > - if (!IS_ERR(subdir))
> > - dput(subdir);
> > + end_creating(subdir, dir);
> > pr_err("mkdir %s failed with error %d\n", dirname, ret);
> > return ERR_PTR(ret);
> >
> > lookup_error:
> > - inode_unlock(d_inode(dir));
> > ret = PTR_ERR(subdir);
> > pr_err("Lookup %s failed with error %d\n", dirname, ret);
> > return ERR_PTR(ret);
> > @@ -679,36 +676,37 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
> >
> > _enter(",%pD", object->file);
> >
> > - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> > ret = cachefiles_inject_read_error();
> > if (ret == 0)
> > - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> > + dentry = start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name));
> > else
> > dentry = ERR_PTR(ret);
> > if (IS_ERR(dentry)) {
> > trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> > cachefiles_trace_lookup_error);
> > _debug("lookup fail %ld", PTR_ERR(dentry));
> > - goto out_unlock;
> > + goto out;
> > }
> >
> > - if (!d_is_negative(dentry)) {
> > + while (!d_is_negative(dentry)) {
> > ret = cachefiles_unlink(volume->cache, object, fan, dentry,
> > FSCACHE_OBJECT_IS_STALE);
> > if (ret < 0)
> > - goto out_dput;
> > + goto out_end;
> > +
> > + end_creating(dentry, fan);
> >
> > - dput(dentry);
> > ret = cachefiles_inject_read_error();
> > if (ret == 0)
> > - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> > + dentry = start_creating(&nop_mnt_idmap, fan,
> > + &QSTR(object->d_name));
> > else
> > dentry = ERR_PTR(ret);
> > if (IS_ERR(dentry)) {
> > trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> > cachefiles_trace_lookup_error);
> > _debug("lookup fail %ld", PTR_ERR(dentry));
> > - goto out_unlock;
> > + goto out;
> > }
> > }
> >
> > @@ -729,10 +727,9 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
> > success = true;
> > }
> >
> > -out_dput:
> > - dput(dentry);
> > -out_unlock:
> > - inode_unlock(d_inode(fan));
> > +out_end:
> > + end_creating(dentry, fan);
> > +out:
> > _leave(" = %u", success);
> > return success;
> > }
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 81cbaabbbe21..064cb44a3a46 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3242,6 +3242,33 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
> > }
> > EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
> >
> > +/**
> > + * start_creating - prepare to create a given name with permission checking
> > + * @idmap - idmap of the mount
> > + * @parent - directory in which to prepare to create the name
> > + * @name - the name to be created
> > + *
> > + * Locks are taken and a lookup in performed prior to creating
>
> typo: is performed
>
> > + * an object in a directory. Permission checking (MAY_EXEC) is performed
> > + * against @idmap.
> > + *
> > + * If the name already exists, a positive dentry is returned, so
> > + * behaviour is similar to O_CREAT without O_EXCL, which doesn't fail
> > + * with -EEXIST.
> > + *
> > + * Returns: a negative or positive dentry, or an error.
> > + */
> > +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> > + struct qstr *name)
> > +{
> > + int err = lookup_one_common(idmap, name, parent);
> > +
> > + if (err)
> > + return ERR_PTR(err);
> > + return start_dirop(parent, name, LOOKUP_CREATE);
> > +}
> > +EXPORT_SYMBOL(start_creating);
> > +
> > #ifdef CONFIG_UNIX98_PTYS
> > int path_pts(struct path *path)
> > {
> > diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> > index b6d03e1ef5f7..e2aac0def2cb 100644
> > --- a/fs/nfsd/nfs3proc.c
> > +++ b/fs/nfsd/nfs3proc.c
> > @@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > if (host_err)
> > return nfserrno(host_err);
> >
> > - inode_lock_nested(inode, I_MUTEX_PARENT);
> > -
> > - child = lookup_one(&nop_mnt_idmap,
> > - &QSTR_LEN(argp->name, argp->len),
> > - parent);
> > + child = start_creating(&nop_mnt_idmap, parent,
> > + &QSTR_LEN(argp->name, argp->len));
> > if (IS_ERR(child)) {
> > status = nfserrno(PTR_ERR(child));
> > - goto out;
> > + goto out_write;
> > }
> >
> > if (d_really_is_negative(child)) {
> > @@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
> >
> > out:
> > - inode_unlock(inode);
> > - if (child && !IS_ERR(child))
> > - dput(child);
> > + end_creating(child, parent);
> > +out_write:
> > fh_drop_write(fhp);
> > return status;
> > }
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 71b428efcbb5..35d48221072f 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -264,14 +264,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > if (is_create_with_attrs(open))
> > nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
> >
> > - inode_lock_nested(inode, I_MUTEX_PARENT);
> > -
> > - child = lookup_one(&nop_mnt_idmap,
> > - &QSTR_LEN(open->op_fname, open->op_fnamelen),
> > - parent);
> > + child = start_creating(&nop_mnt_idmap, parent,
> > + &QSTR_LEN(open->op_fname, open->op_fnamelen));
> > if (IS_ERR(child)) {
> > status = nfserrno(PTR_ERR(child));
> > - goto out;
> > + goto out_write;
> > }
> >
> > if (d_really_is_negative(child)) {
> > @@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > if (attrs.na_aclerr)
> > open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
> > out:
> > - inode_unlock(inode);
> > + end_creating(child, parent);
> > nfsd_attrs_free(&attrs);
> > - if (child && !IS_ERR(child))
> > - dput(child);
> > +out_write:
> > fh_drop_write(fhp);
> > return status;
> > }
> > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > index 2231192ec33f..93b2a3e764db 100644
> > --- a/fs/nfsd/nfs4recover.c
> > +++ b/fs/nfsd/nfs4recover.c
> > @@ -216,13 +216,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> > goto out_creds;
> >
> > dir = nn->rec_file->f_path.dentry;
> > - /* lock the parent */
> > - inode_lock(d_inode(dir));
> >
> > - dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
> > + dentry = start_creating(&nop_mnt_idmap, dir, &QSTR(dname));
> > if (IS_ERR(dentry)) {
> > status = PTR_ERR(dentry);
> > - goto out_unlock;
> > + goto out;
> > }
> > if (d_really_is_positive(dentry))
> > /*
> > @@ -233,15 +231,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> > * In the 4.0 case, we should never get here; but we may
> > * as well be forgiving and just succeed silently.
> > */
> > - goto out_put;
> > + goto out_end;
> > dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
> > if (IS_ERR(dentry))
> > status = PTR_ERR(dentry);
> > -out_put:
> > - if (!status)
> > - dput(dentry);
> > -out_unlock:
> > - inode_unlock(d_inode(dir));
> > +out_end:
> > + end_creating(dentry, dir);
> > +out:
> > if (status == 0) {
> > if (nn->in_grace)
> > __nfsd4_create_reclaim_record_grace(clp, dname,
> > diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> > index 8f71f5748c75..ee1b16e921fd 100644
> > --- a/fs/nfsd/nfsproc.c
> > +++ b/fs/nfsd/nfsproc.c
> > @@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> > goto done;
> > }
> >
> > - inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> > - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len),
> > - dirfhp->fh_dentry);
> > + dchild = start_creating(&nop_mnt_idmap, dirfhp->fh_dentry,
> > + &QSTR_LEN(argp->name, argp->len));
> > if (IS_ERR(dchild)) {
> > resp->status = nfserrno(PTR_ERR(dchild));
> > - goto out_unlock;
> > + goto out_write;
> > }
> > fh_init(newfhp, NFS_FHSIZE);
> > resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
> > if (!resp->status && d_really_is_negative(dchild))
> > resp->status = nfserr_noent;
> > - dput(dchild);
> > if (resp->status) {
> > if (resp->status != nfserr_noent)
> > goto out_unlock;
> > @@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> > }
> >
> > out_unlock:
> > - inode_unlock(dirfhp->fh_dentry->d_inode);
> > + end_creating(dchild, dirfhp->fh_dentry);
> > +out_write:
> > fh_drop_write(dirfhp);
> > done:
> > fh_put(dirfhp);
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index aa4a95713a48..90c830c59c60 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1605,19 +1605,16 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > if (host_err)
> > return nfserrno(host_err);
> >
> > - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> > - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> > + dchild = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> > host_err = PTR_ERR(dchild);
> > - if (IS_ERR(dchild)) {
> > - err = nfserrno(host_err);
> > - goto out_unlock;
> > - }
> > + if (IS_ERR(dchild))
> > + return nfserrno(host_err);
> > +
> > err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
> > /*
> > * We unconditionally drop our ref to dchild as fh_compose will have
> > * already grabbed its own ref for it.
> > */
> > - dput(dchild);
> > if (err)
> > goto out_unlock;
> > err = fh_fill_pre_attrs(fhp);
> > @@ -1626,7 +1623,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
> > fh_fill_post_attrs(fhp);
> > out_unlock:
> > - inode_unlock(dentry->d_inode);
> > + end_creating(dchild, dentry);
> > return err;
> > }
> >
> > @@ -1712,11 +1709,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > }
> >
> > dentry = fhp->fh_dentry;
> > - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> > - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> > + dnew = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> > if (IS_ERR(dnew)) {
> > err = nfserrno(PTR_ERR(dnew));
> > - inode_unlock(dentry->d_inode);
> > goto out_drop_write;
> > }
> > err = fh_fill_pre_attrs(fhp);
> > @@ -1729,11 +1724,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
> > fh_fill_post_attrs(fhp);
> > out_unlock:
> > - inode_unlock(dentry->d_inode);
> > + end_creating(dnew, dentry);
> > if (!err)
> > err = nfserrno(commit_metadata(fhp));
> > - dput(dnew);
> > - if (err==0) err = cerr;
> > + if (!err)
> > + err = cerr;
> > out_drop_write:
> > fh_drop_write(fhp);
> > out:
> > @@ -1788,32 +1783,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> >
> > ddir = ffhp->fh_dentry;
> > dirp = d_inode(ddir);
> > - inode_lock_nested(dirp, I_MUTEX_PARENT);
> > + dnew = start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len));
> >
> > - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir);
> > if (IS_ERR(dnew)) {
> > host_err = PTR_ERR(dnew);
> > - goto out_unlock;
> > + goto out_drop_write;
> > }
> >
> > dold = tfhp->fh_dentry;
> >
> > err = nfserr_noent;
> > if (d_really_is_negative(dold))
> > - goto out_dput;
> > + goto out_unlock;
> > err = fh_fill_pre_attrs(ffhp);
> > if (err != nfs_ok)
> > - goto out_dput;
> > + goto out_unlock;
> > host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
> > fh_fill_post_attrs(ffhp);
> > - inode_unlock(dirp);
> > +out_unlock:
> > + end_creating(dnew, ddir);
> > if (!host_err) {
> > host_err = commit_metadata(ffhp);
> > if (!host_err)
> > host_err = commit_metadata(tfhp);
> > }
> >
> > - dput(dnew);
> > out_drop_write:
> > fh_drop_write(tfhp);
> > if (host_err == -EBUSY) {
> > @@ -1828,12 +1822,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> > }
> > out:
> > return err != nfs_ok ? err : nfserrno(host_err);
> > -
> > -out_dput:
> > - dput(dnew);
> > -out_unlock:
> > - inode_unlock(dirp);
> > - goto out_drop_write;
> > }
> >
> > static void
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 27396fe63f6d..6a31ea34ff80 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -613,9 +613,9 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> > if (err)
> > goto out;
> >
> > - inode_lock_nested(udir, I_MUTEX_PARENT);
> > - upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
> > - c->dentry->d_name.len);
> > + upper = ovl_start_creating_upper(ofs, upperdir,
> > + &QSTR_LEN(c->dentry->d_name.name,
> > + c->dentry->d_name.len));
> > err = PTR_ERR(upper);
> > if (!IS_ERR(upper)) {
> > err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
> > @@ -626,9 +626,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> > ovl_dentry_set_upper_alias(c->dentry);
> > ovl_dentry_update_reval(c->dentry, upper);
> > }
> > - dput(upper);
> > + end_creating(upper, upperdir);
> > }
> > - inode_unlock(udir);
> > if (err)
> > goto out;
> >
> > @@ -894,16 +893,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
> > if (err)
> > goto out;
> >
> > - inode_lock_nested(udir, I_MUTEX_PARENT);
> > -
> > - upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
> > - c->destname.len);
> > + upper = ovl_start_creating_upper(ofs, c->destdir,
> > + &QSTR_LEN(c->destname.name,
> > + c->destname.len));
> > err = PTR_ERR(upper);
> > if (!IS_ERR(upper)) {
> > err = ovl_do_link(ofs, temp, udir, upper);
> > - dput(upper);
> > + end_creating(upper, c->destdir);
> > }
> > - inode_unlock(udir);
> >
> > if (err)
> > goto out;
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index dbd63a74df4b..0ae79efbfce7 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -59,15 +59,21 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> > return 0;
> > }
> >
> > -struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> > +#define OVL_TEMPNAME_SIZE 20
> > +static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> > {
> > - struct dentry *temp;
> > - char name[20];
> > static atomic_t temp_id = ATOMIC_INIT(0);
> >
> > /* counter is allowed to wrap, since temp dentries are ephemeral */
> > - snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
> > + snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
> > +}
> > +
> > +struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> > +{
> > + struct dentry *temp;
> > + char name[OVL_TEMPNAME_SIZE];
> >
> > + ovl_tempname(name);
> > temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
> > if (!IS_ERR(temp) && temp->d_inode) {
> > pr_err("workdir/%s already exists\n", name);
> > @@ -78,6 +84,16 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> > return temp;
> > }
> >
> > +static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
> > + struct dentry *workdir)
> > +{
> > + char name[OVL_TEMPNAME_SIZE];
> > +
> > + ovl_tempname(name);
> > + return start_creating(ovl_upper_mnt_idmap(ofs), workdir,
> > + &QSTR(name));
> > +}
> > +
> > static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> > {
> > int err;
> > @@ -88,35 +104,31 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> > guard(mutex)(&ofs->whiteout_lock);
> >
> > if (!ofs->whiteout) {
> > - inode_lock_nested(wdir, I_MUTEX_PARENT);
> > - whiteout = ovl_lookup_temp(ofs, workdir);
> > - if (!IS_ERR(whiteout)) {
> > - err = ovl_do_whiteout(ofs, wdir, whiteout);
> > - if (err) {
> > - dput(whiteout);
> > - whiteout = ERR_PTR(err);
> > - }
> > - }
> > - inode_unlock(wdir);
> > + whiteout = ovl_start_creating_temp(ofs, workdir);
> > if (IS_ERR(whiteout))
> > return whiteout;
> > - ofs->whiteout = whiteout;
> > + err = ovl_do_whiteout(ofs, wdir, whiteout);
> > + if (!err)
> > + ofs->whiteout = dget(whiteout);
> > + end_creating(whiteout, workdir);
> > + if (err)
> > + return ERR_PTR(err);
> > }
> >
> > if (!ofs->no_shared_whiteout) {
> > - inode_lock_nested(wdir, I_MUTEX_PARENT);
> > - whiteout = ovl_lookup_temp(ofs, workdir);
> > - if (!IS_ERR(whiteout)) {
> > - err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> > - if (err) {
> > - dput(whiteout);
> > - whiteout = ERR_PTR(err);
> > - }
> > - }
> > - inode_unlock(wdir);
> > - if (!IS_ERR(whiteout))
> > + struct dentry *ret = NULL;
>
> For clarity please name this var "link".
Is "link" really clearer than "ret"?
Maybe if I make it
    struct dentry *link;

    link = ovl_start_creating_temp(ofs, workdir);
    if (IS_ERR(link))
        return link;
    err = ovl_do_link(ofs, ofs->whiteout, wdir, link);
    if (!err)
        whiteout = dget(link);
    end_creating(link, workdir);
    if (!err)
        return whiteout;
Then "link" makes sense to me.
>
> > +
> > + whiteout = ovl_start_creating_temp(ofs, workdir);
> > + if (IS_ERR(whiteout))
> > return whiteout;
> > - if (PTR_ERR(whiteout) != -EMLINK) {
> > + err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> > + if (!err)
> > + ret = dget(whiteout);
> > + end_creating(whiteout, workdir);
> > + if (ret)
> > + return ret;
> > +
> > + if (err != -EMLINK) {
> > pr_warn("Failed to link whiteout - disabling whiteout inode sharing(nlink=%u, err=%lu)\n",
> > ofs->whiteout->d_inode->i_nlink,
> > PTR_ERR(whiteout));
> > @@ -225,10 +237,13 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> > struct ovl_cattr *attr)
> > {
> > struct dentry *ret;
> > - inode_lock_nested(workdir->d_inode, I_MUTEX_PARENT);
> > - ret = ovl_create_real(ofs, workdir,
> > - ovl_lookup_temp(ofs, workdir), attr);
> > - inode_unlock(workdir->d_inode);
> > + ret = ovl_start_creating_temp(ofs, workdir);
> > + if (IS_ERR(ret))
> > + return ret;
> > + ret = ovl_create_real(ofs, workdir, ret, attr);
> > + if (!IS_ERR(ret))
> > + dget(ret);
> > + end_creating(ret, workdir);
> > return ret;
> > }
> >
> > @@ -327,18 +342,21 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
> > {
> > struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> > struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> > - struct inode *udir = upperdir->d_inode;
> > struct dentry *newdentry;
> > int err;
> >
> > - inode_lock_nested(udir, I_MUTEX_PARENT);
> > - newdentry = ovl_create_real(ofs, upperdir,
> > - ovl_lookup_upper(ofs, dentry->d_name.name,
> > - upperdir, dentry->d_name.len),
> > - attr);
> > - inode_unlock(udir);
> > + newdentry = ovl_start_creating_upper(ofs, upperdir,
> > + &QSTR_LEN(dentry->d_name.name,
> > + dentry->d_name.len));
> > if (IS_ERR(newdentry))
> > return PTR_ERR(newdentry);
> > + newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
> > + if (IS_ERR(newdentry)) {
> > + end_creating(newdentry, upperdir);
> > + return PTR_ERR(newdentry);
> > + }
> > + dget(newdentry);
> > + end_creating(newdentry, upperdir);
>
> See suggestion below to make this:
>
> newdentry = end_creating_dentry(newdentry, upperdir);
> if (IS_ERR(newdentry))
> return PTR_ERR(newdentry);
>
> >
> > if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
> > !ovl_allow_offline_changes(ofs)) {
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 4f84abaa0d68..c24c2da953bd 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -415,6 +415,14 @@ static inline struct dentry *ovl_lookup_upper_unlocked(struct ovl_fs *ofs,
> > &QSTR_LEN(name, len), base);
> > }
> >
> > +static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
> > + struct dentry *parent,
> > + struct qstr *name)
> > +{
> > + return start_creating(ovl_upper_mnt_idmap(ofs),
> > + parent, name);
> > +}
> > +
> > static inline bool ovl_open_flags_need_copy_up(int flags)
> > {
> > if (!flags)
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index bd3d7ba8fb95..67abb62e205b 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -300,8 +300,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > bool retried = false;
> >
> > retry:
> > - inode_lock_nested(dir, I_MUTEX_PARENT);
> > - work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
> > + work = ovl_start_creating_upper(ofs, ofs->workbasedir, &QSTR(name));
> >
> > if (!IS_ERR(work)) {
> > struct iattr attr = {
> > @@ -310,14 +309,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > };
> >
> > if (work->d_inode) {
> > + dget(work);
> > + end_creating(work, ofs->workbasedir);
> > + if (persist)
> > + return work;
> > err = -EEXIST;
> > - inode_unlock(dir);
> > if (retried)
> > goto out_dput;
> > -
> > - if (persist)
> > - return work;
> > -
> > retried = true;
> > err = ovl_workdir_cleanup(ofs, ofs->workbasedir, mnt, work, 0);
> > dput(work);
> > @@ -328,7 +326,9 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > }
> >
> > work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> > - inode_unlock(dir);
> > + if (!IS_ERR(work))
> > + dget(work);
> > + end_creating(work, ofs->workbasedir);
> > err = PTR_ERR(work);
> > if (IS_ERR(work))
> > goto out_err;
> > @@ -366,7 +366,6 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > if (err)
> > goto out_dput;
> > } else {
> > - inode_unlock(dir);
> > err = PTR_ERR(work);
> > goto out_err;
> > }
> > @@ -616,14 +615,17 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
> > struct dentry *parent,
> > const char *name, umode_t mode)
> > {
> > - size_t len = strlen(name);
> > struct dentry *child;
> >
> > - inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> > - child = ovl_lookup_upper(ofs, name, parent, len);
> > - if (!IS_ERR(child) && !child->d_inode)
> > - child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
> > - inode_unlock(parent->d_inode);
> > + child = ovl_start_creating_upper(ofs, parent, &QSTR(name));
> > + if (!IS_ERR(child)) {
> > + if (!child->d_inode)
> > + child = ovl_create_real(ofs, parent, child,
> > + OVL_CATTR(mode));
> > + if (!IS_ERR(child))
> > + dget(child);
> > + end_creating(child, parent);
>
> We have a few of those things open code which are not so pretty IMO.
> How about:
>
> child = end_creating_dentry(child, parent);
>
> Which is a variant of the void end_creating() which does dget()
> in the non error case?
I have experimented with that idea. I'm not convinced.
There are six places where it would help.
One is in cachefiles, the rest are in overlayfs.
Two of them are just
dget()
end_creating()
The others have some sort of condition on error as you pointed out
above.
That is out of nearly 30 uses for end_creating().
I would rather declare the dentry variable as __free(end_dirop)
or similar so that "end_creating()" would not appear and the dget()
would stand alone with (hopefully) a clear meaning.
But I can't do that until the vfs_mkdir() issue is resolved.
Maybe it would end up being cleaner having an alternate form of
end_creating() which returns a reference. But I'm not convinced yet.
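
For readers unfamiliar with the __free() idea mentioned above: the kernel's
__free() annotation is built on the compiler's cleanup attribute, which runs a
handler when a variable goes out of scope. The following is a userspace sketch
of that pattern only. Every toy_* name and the TOY_FREE_END_CREATING macro are
hypothetical stand-ins, not the real VFS API, and a __free(end_dirop) class is
assumed rather than known to exist in this form.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; none of this is the real VFS API. */
struct toy_dir { int locked; };
struct toy_dentry { int refcount; struct toy_dir *parent; };

/* Models start_creating(): lock the parent and return a referenced child. */
static struct toy_dentry *toy_start_creating(struct toy_dir *dir,
					     struct toy_dentry *child)
{
	assert(!dir->locked);
	dir->locked = 1;
	child->parent = dir;
	child->refcount++;
	return child;
}

/* Models end_creating(): unlock the parent and drop the lookup reference. */
static void toy_end_creating(struct toy_dentry *child)
{
	child->parent->locked = 0;
	child->refcount--;
}

/* Cleanup handlers receive a pointer to the variable leaving scope. */
static void toy_end_creating_cleanup(struct toy_dentry **p)
{
	if (*p)
		toy_end_creating(*p);
}

/* Hypothetical analogue of a __free(...) annotation. */
#define TOY_FREE_END_CREATING \
	__attribute__((cleanup(toy_end_creating_cleanup)))

/* Returns the child with its own reference if keep is non-zero. */
static struct toy_dentry *toy_create(struct toy_dir *dir,
				     struct toy_dentry *slot, int keep)
{
	struct toy_dentry *child TOY_FREE_END_CREATING =
		toy_start_creating(dir, slot);

	if (!keep)
		return NULL;	/* cleanup unlocks and drops the reference */

	child->refcount++;	/* the stand-alone "dget()" */
	return child;		/* cleanup runs after the return value is read */
}
```

With this shape no explicit end call appears on any exit path, so the only
reference-count operation left visible in the function body is the extra get
that the caller wants to keep.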
>
> end_creating_dentry() could be matched with start_creating_dentry()
> in common cases where we had a ref on the child before creating and
> we want to keep the ref on the child after creating.
I don't think there is a useful connection between end_creating_dentry()
and start_creating_dentry(), so I wouldn't use that name.
end_creating_return() maybe, or end_creating_keep().
start_creating_dentry() takes an extra ref so it is always correct to
use end_creating() and you don't need a dget() because you already have
a ref. You only need the dget() (or to avoid the dput()) if a lookup
was performed by start_creating().
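
The reference-count argument above can be modelled in userspace. This is a
sketch under stated assumptions: the toy_* functions are hypothetical, they
only count references held by the caller, and toy_end_creating_keep() is one
possible shape for the variant being discussed, not an existing function.

```c
#include <assert.h>

struct toy_dentry { int refcount; };
struct toy_dir { int locked; struct toy_dentry child; };

/* Models start_creating(): lock the parent, look the name up, and
 * return the child with a reference taken by the lookup. */
static struct toy_dentry *toy_start_creating(struct toy_dir *dir)
{
	dir->locked = 1;
	dir->child.refcount++;
	return &dir->child;
}

/* Models start_creating_dentry(): the caller already holds a
 * reference, so an extra one is taken for end_creating() to drop. */
static struct toy_dentry *toy_start_creating_dentry(struct toy_dir *dir,
						    struct toy_dentry *held)
{
	dir->locked = 1;
	held->refcount++;
	return held;
}

/* Models end_creating(): drop one reference and unlock the parent. */
static void toy_end_creating(struct toy_dentry *child, struct toy_dir *dir)
{
	child->refcount--;
	dir->locked = 0;
}

/* Hypothetical variant: unlock, but hand the lookup reference to the
 * caller instead of dropping it. */
static struct toy_dentry *toy_end_creating_keep(struct toy_dentry *child,
						struct toy_dir *dir)
{
	dir->locked = 0;
	return child;
}
```

In this model a caller that wants to keep a child found by the lookup either
increments the count before toy_end_creating() or uses the keep variant. A
caller that came in through toy_start_creating_dentry() ends with the same
count it started with and needs neither.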
>
> > + }
> > dput(parent);
> >
> > return child;
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index a7800ef04e76..4cbe930054a1 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -88,6 +88,24 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
> > struct qstr *name,
> > struct dentry *base);
> >
> > +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> > + struct qstr *name);
> > +
> > +/* end_creating - finish action started with start_creating
> > + * @child - dentry returned by start_creating()
> > + * @parent - dentry given to start_creating()
> > + *
> > + * Unlock and release the child.
> > + *
> > + * Unlike end_dirop() this can only be called if start_creating() succeeded.
> > + * It handles @child being and error as vfs_mkdir() might have converted the
> > + * dentry to an error - in that case the parent still needs to be unlocked.
> > + */
> > +static inline void end_creating(struct dentry *child, struct dentry *parent)
> > +{
> > + end_dirop_mkdir(child, parent);
> > +}
> > +
>
> That concludes my out-of-order review of this series.
>
> The ovl changes look overall good to me.
> I will wait for v2 without end_dirop_mkdir() to re-review this patch.
>
> Feel free to take or discard my suggestion for end_creating_dentry().
>
> I agree with Jeff that the conversion of if condition to a while loop in
> cachefiles feels odd, because it is not clear if there should be a stop
> condition. Anyway, best if cachefiles developers could review this
> code anyway.
>
> Thanks,
> Amir.
>
Thanks a lot for the thorough review - I really appreciate it.
NeilBrown
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-10-01 3:15 ` NeilBrown
@ 2025-10-02 10:52 ` Amir Goldstein
0 siblings, 0 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-10-02 10:52 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 1, 2025 at 5:15 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Tue, 30 Sep 2025, Amir Goldstein wrote:
> > On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > start_creating() is similar to simple_start_creating() but is not so
> > > simple.
> > > It takes a qstr for the name, includes permission checking, and does NOT
> > > report an error if the name already exists, returning a positive dentry
> > > instead.
> > >
> > > This is currently used by nfsd, cachefiles, and overlayfs.
> > >
> > > end_creating() is called after the dentry has been used.
> > > end_creating() drops the reference to the dentry as it is generally no
> > > longer needed. This is exactly end_dirop_mkdir(),
> > > but using that everywhere looks a bit odd...
> > >
> > > These calls help encapsulate locking rules so that directory locking can
> > > be changed.
> > >
> > > Occasionally this change means that the parent lock is held for a
> > > shorter period of time, for example in cachefiles_commit_tmpfile().
> > > As this function now unlocks after an unlink and before the following
> > > lookup, it is possible that the lookup could again find a positive
> > > dentry, so a while loop is introduced there.
> > >
> > > In overlayfs the ovl_lookup_temp() function has ovl_tempname()
> > > split out to be used in ovl_start_creating_temp(). The other use
> > > of ovl_lookup_temp() is preparing for a rename. When rename handling
> > > is updated, ovl_lookup_temp() will be removed.
> > >
> > > Signed-off-by: NeilBrown <neil@brown.name>
> > > ---
> > > fs/cachefiles/namei.c | 37 ++++++++--------
> > > fs/namei.c | 27 ++++++++++++
> > > fs/nfsd/nfs3proc.c | 14 +++---
> > > fs/nfsd/nfs4proc.c | 14 +++---
> > > fs/nfsd/nfs4recover.c | 16 +++----
> > > fs/nfsd/nfsproc.c | 11 +++--
> > > fs/nfsd/vfs.c | 42 +++++++-----------
> > > fs/overlayfs/copy_up.c | 19 ++++----
> > > fs/overlayfs/dir.c | 94 ++++++++++++++++++++++++----------------
> > > fs/overlayfs/overlayfs.h | 8 ++++
> > > fs/overlayfs/super.c | 32 +++++++-------
> > > include/linux/namei.h | 18 ++++++++
> > > 12 files changed, 187 insertions(+), 145 deletions(-)
> > >
> > > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> > > index d1edb2ac3837..965b22b2f58d 100644
> > > --- a/fs/cachefiles/namei.c
> > > +++ b/fs/cachefiles/namei.c
> > > @@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > > _enter(",,%s", dirname);
> > >
> > > /* search the current directory for the element name */
> > > - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
> > >
> > > retry:
> > > ret = cachefiles_inject_read_error();
> > > if (ret == 0)
> > > - subdir = lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir);
> > > + subdir = start_creating(&nop_mnt_idmap, dir, &QSTR(dirname));
> > > else
> > > subdir = ERR_PTR(ret);
> > > trace_cachefiles_lookup(NULL, dir, subdir);
> > > @@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > > trace_cachefiles_mkdir(dir, subdir);
> > >
> > > if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
> > > - dput(subdir);
> > > + end_creating(subdir, dir);
> > > goto retry;
> > > }
> > > ASSERT(d_backing_inode(subdir));
> > > @@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > >
> > > /* Tell rmdir() it's not allowed to delete the subdir */
> > > inode_lock(d_inode(subdir));
> > > - inode_unlock(d_inode(dir));
> > > + dget(subdir);
> > > + end_creating(subdir, dir);
> > >
> > > if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
> > > pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
> > > @@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > > return ERR_PTR(-EBUSY);
> > >
> > > mkdir_error:
> > > - inode_unlock(d_inode(dir));
> > > - if (!IS_ERR(subdir))
> > > - dput(subdir);
> > > + end_creating(subdir, dir);
> > > pr_err("mkdir %s failed with error %d\n", dirname, ret);
> > > return ERR_PTR(ret);
> > >
> > > lookup_error:
> > > - inode_unlock(d_inode(dir));
> > > ret = PTR_ERR(subdir);
> > > pr_err("Lookup %s failed with error %d\n", dirname, ret);
> > > return ERR_PTR(ret);
> > > @@ -679,36 +676,37 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
> > >
> > > _enter(",%pD", object->file);
> > >
> > > - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> > > ret = cachefiles_inject_read_error();
> > > if (ret == 0)
> > > - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> > > + dentry = start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name));
> > > else
> > > dentry = ERR_PTR(ret);
> > > if (IS_ERR(dentry)) {
> > > trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> > > cachefiles_trace_lookup_error);
> > > _debug("lookup fail %ld", PTR_ERR(dentry));
> > > - goto out_unlock;
> > > + goto out;
> > > }
> > >
> > > - if (!d_is_negative(dentry)) {
> > > + while (!d_is_negative(dentry)) {
> > > ret = cachefiles_unlink(volume->cache, object, fan, dentry,
> > > FSCACHE_OBJECT_IS_STALE);
> > > if (ret < 0)
> > > - goto out_dput;
> > > + goto out_end;
> > > +
> > > + end_creating(dentry, fan);
> > >
> > > - dput(dentry);
> > > ret = cachefiles_inject_read_error();
> > > if (ret == 0)
> > > - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> > > + dentry = start_creating(&nop_mnt_idmap, fan,
> > > + &QSTR(object->d_name));
> > > else
> > > dentry = ERR_PTR(ret);
> > > if (IS_ERR(dentry)) {
> > > trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> > > cachefiles_trace_lookup_error);
> > > _debug("lookup fail %ld", PTR_ERR(dentry));
> > > - goto out_unlock;
> > > + goto out;
> > > }
> > > }
> > >
> > > @@ -729,10 +727,9 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
> > > success = true;
> > > }
> > >
> > > -out_dput:
> > > - dput(dentry);
> > > -out_unlock:
> > > - inode_unlock(d_inode(fan));
> > > +out_end:
> > > + end_creating(dentry, fan);
> > > +out:
> > > _leave(" = %u", success);
> > > return success;
> > > }
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index 81cbaabbbe21..064cb44a3a46 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -3242,6 +3242,33 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
> > > }
> > > EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
> > >
> > > +/**
> > > + * start_creating - prepare to create a given name with permission checking
> > > + * @idmap - idmap of the mount
> > > + * @parent - directory in which to prepare to create the name
> > > + * @name - the name to be created
> > > + *
> > > + * Locks are taken and a lookup in performed prior to creating
> >
> > typo: is performed
> >
> > > + * an object in a directory. Permission checking (MAY_EXEC) is performed
> > > + * against @idmap.
> > > + *
> > > + * If the name already exists, a positive dentry is returned, so
> > > + * behaviour is similar to O_CREAT without O_EXCL, which doesn't fail
> > > + * with -EEXIST.
> > > + *
> > > + * Returns: a negative or positive dentry, or an error.
> > > + */
> > > +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> > > + struct qstr *name)
> > > +{
> > > + int err = lookup_one_common(idmap, name, parent);
> > > +
> > > + if (err)
> > > + return ERR_PTR(err);
> > > + return start_dirop(parent, name, LOOKUP_CREATE);
> > > +}
> > > +EXPORT_SYMBOL(start_creating);
> > > +
> > > #ifdef CONFIG_UNIX98_PTYS
> > > int path_pts(struct path *path)
> > > {
> > > diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> > > index b6d03e1ef5f7..e2aac0def2cb 100644
> > > --- a/fs/nfsd/nfs3proc.c
> > > +++ b/fs/nfsd/nfs3proc.c
> > > @@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > if (host_err)
> > > return nfserrno(host_err);
> > >
> > > - inode_lock_nested(inode, I_MUTEX_PARENT);
> > > -
> > > - child = lookup_one(&nop_mnt_idmap,
> > > - &QSTR_LEN(argp->name, argp->len),
> > > - parent);
> > > + child = start_creating(&nop_mnt_idmap, parent,
> > > + &QSTR_LEN(argp->name, argp->len));
> > > if (IS_ERR(child)) {
> > > status = nfserrno(PTR_ERR(child));
> > > - goto out;
> > > + goto out_write;
> > > }
> > >
> > > if (d_really_is_negative(child)) {
> > > @@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
> > >
> > > out:
> > > - inode_unlock(inode);
> > > - if (child && !IS_ERR(child))
> > > - dput(child);
> > > + end_creating(child, parent);
> > > +out_write:
> > > fh_drop_write(fhp);
> > > return status;
> > > }
> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index 71b428efcbb5..35d48221072f 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -264,14 +264,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > if (is_create_with_attrs(open))
> > > nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
> > >
> > > - inode_lock_nested(inode, I_MUTEX_PARENT);
> > > -
> > > - child = lookup_one(&nop_mnt_idmap,
> > > - &QSTR_LEN(open->op_fname, open->op_fnamelen),
> > > - parent);
> > > + child = start_creating(&nop_mnt_idmap, parent,
> > > + &QSTR_LEN(open->op_fname, open->op_fnamelen));
> > > if (IS_ERR(child)) {
> > > status = nfserrno(PTR_ERR(child));
> > > - goto out;
> > > + goto out_write;
> > > }
> > >
> > > if (d_really_is_negative(child)) {
> > > @@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > if (attrs.na_aclerr)
> > > open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
> > > out:
> > > - inode_unlock(inode);
> > > + end_creating(child, parent);
> > > nfsd_attrs_free(&attrs);
> > > - if (child && !IS_ERR(child))
> > > - dput(child);
> > > +out_write:
> > > fh_drop_write(fhp);
> > > return status;
> > > }
> > > diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> > > index 2231192ec33f..93b2a3e764db 100644
> > > --- a/fs/nfsd/nfs4recover.c
> > > +++ b/fs/nfsd/nfs4recover.c
> > > @@ -216,13 +216,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> > > goto out_creds;
> > >
> > > dir = nn->rec_file->f_path.dentry;
> > > - /* lock the parent */
> > > - inode_lock(d_inode(dir));
> > >
> > > - dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
> > > + dentry = start_creating(&nop_mnt_idmap, dir, &QSTR(dname));
> > > if (IS_ERR(dentry)) {
> > > status = PTR_ERR(dentry);
> > > - goto out_unlock;
> > > + goto out;
> > > }
> > > if (d_really_is_positive(dentry))
> > > /*
> > > @@ -233,15 +231,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> > > * In the 4.0 case, we should never get here; but we may
> > > * as well be forgiving and just succeed silently.
> > > */
> > > - goto out_put;
> > > + goto out_end;
> > > dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
> > > if (IS_ERR(dentry))
> > > status = PTR_ERR(dentry);
> > > -out_put:
> > > - if (!status)
> > > - dput(dentry);
> > > -out_unlock:
> > > - inode_unlock(d_inode(dir));
> > > +out_end:
> > > + end_creating(dentry, dir);
> > > +out:
> > > if (status == 0) {
> > > if (nn->in_grace)
> > > __nfsd4_create_reclaim_record_grace(clp, dname,
> > > diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> > > index 8f71f5748c75..ee1b16e921fd 100644
> > > --- a/fs/nfsd/nfsproc.c
> > > +++ b/fs/nfsd/nfsproc.c
> > > @@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> > > goto done;
> > > }
> > >
> > > - inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> > > - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len),
> > > - dirfhp->fh_dentry);
> > > + dchild = start_creating(&nop_mnt_idmap, dirfhp->fh_dentry,
> > > + &QSTR_LEN(argp->name, argp->len));
> > > if (IS_ERR(dchild)) {
> > > resp->status = nfserrno(PTR_ERR(dchild));
> > > - goto out_unlock;
> > > + goto out_write;
> > > }
> > > fh_init(newfhp, NFS_FHSIZE);
> > > resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
> > > if (!resp->status && d_really_is_negative(dchild))
> > > resp->status = nfserr_noent;
> > > - dput(dchild);
> > > if (resp->status) {
> > > if (resp->status != nfserr_noent)
> > > goto out_unlock;
> > > @@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> > > }
> > >
> > > out_unlock:
> > > - inode_unlock(dirfhp->fh_dentry->d_inode);
> > > + end_creating(dchild, dirfhp->fh_dentry);
> > > +out_write:
> > > fh_drop_write(dirfhp);
> > > done:
> > > fh_put(dirfhp);
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index aa4a95713a48..90c830c59c60 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -1605,19 +1605,16 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > if (host_err)
> > > return nfserrno(host_err);
> > >
> > > - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> > > - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> > > + dchild = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> > > host_err = PTR_ERR(dchild);
> > > - if (IS_ERR(dchild)) {
> > > - err = nfserrno(host_err);
> > > - goto out_unlock;
> > > - }
> > > + if (IS_ERR(dchild))
> > > + return nfserrno(host_err);
> > > +
> > > err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
> > > /*
> > > * We unconditionally drop our ref to dchild as fh_compose will have
> > > * already grabbed its own ref for it.
> > > */
> > > - dput(dchild);
> > > if (err)
> > > goto out_unlock;
> > > err = fh_fill_pre_attrs(fhp);
> > > @@ -1626,7 +1623,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
> > > fh_fill_post_attrs(fhp);
> > > out_unlock:
> > > - inode_unlock(dentry->d_inode);
> > > + end_creating(dchild, dentry);
> > > return err;
> > > }
> > >
> > > @@ -1712,11 +1709,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > }
> > >
> > > dentry = fhp->fh_dentry;
> > > - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> > > - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> > > + dnew = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> > > if (IS_ERR(dnew)) {
> > > err = nfserrno(PTR_ERR(dnew));
> > > - inode_unlock(dentry->d_inode);
> > > goto out_drop_write;
> > > }
> > > err = fh_fill_pre_attrs(fhp);
> > > @@ -1729,11 +1724,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
> > > fh_fill_post_attrs(fhp);
> > > out_unlock:
> > > - inode_unlock(dentry->d_inode);
> > > + end_creating(dnew, dentry);
> > > if (!err)
> > > err = nfserrno(commit_metadata(fhp));
> > > - dput(dnew);
> > > - if (err==0) err = cerr;
> > > + if (!err)
> > > + err = cerr;
> > > out_drop_write:
> > > fh_drop_write(fhp);
> > > out:
> > > @@ -1788,32 +1783,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> > >
> > > ddir = ffhp->fh_dentry;
> > > dirp = d_inode(ddir);
> > > - inode_lock_nested(dirp, I_MUTEX_PARENT);
> > > + dnew = start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len));
> > >
> > > - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir);
> > > if (IS_ERR(dnew)) {
> > > host_err = PTR_ERR(dnew);
> > > - goto out_unlock;
> > > + goto out_drop_write;
> > > }
> > >
> > > dold = tfhp->fh_dentry;
> > >
> > > err = nfserr_noent;
> > > if (d_really_is_negative(dold))
> > > - goto out_dput;
> > > + goto out_unlock;
> > > err = fh_fill_pre_attrs(ffhp);
> > > if (err != nfs_ok)
> > > - goto out_dput;
> > > + goto out_unlock;
> > > host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
> > > fh_fill_post_attrs(ffhp);
> > > - inode_unlock(dirp);
> > > +out_unlock:
> > > + end_creating(dnew, ddir);
> > > if (!host_err) {
> > > host_err = commit_metadata(ffhp);
> > > if (!host_err)
> > > host_err = commit_metadata(tfhp);
> > > }
> > >
> > > - dput(dnew);
> > > out_drop_write:
> > > fh_drop_write(tfhp);
> > > if (host_err == -EBUSY) {
> > > @@ -1828,12 +1822,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> > > }
> > > out:
> > > return err != nfs_ok ? err : nfserrno(host_err);
> > > -
> > > -out_dput:
> > > - dput(dnew);
> > > -out_unlock:
> > > - inode_unlock(dirp);
> > > - goto out_drop_write;
> > > }
> > >
> > > static void
> > > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > > index 27396fe63f6d..6a31ea34ff80 100644
> > > --- a/fs/overlayfs/copy_up.c
> > > +++ b/fs/overlayfs/copy_up.c
> > > @@ -613,9 +613,9 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> > > if (err)
> > > goto out;
> > >
> > > - inode_lock_nested(udir, I_MUTEX_PARENT);
> > > - upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
> > > - c->dentry->d_name.len);
> > > + upper = ovl_start_creating_upper(ofs, upperdir,
> > > + &QSTR_LEN(c->dentry->d_name.name,
> > > + c->dentry->d_name.len));
> > > err = PTR_ERR(upper);
> > > if (!IS_ERR(upper)) {
> > > err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
> > > @@ -626,9 +626,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> > > ovl_dentry_set_upper_alias(c->dentry);
> > > ovl_dentry_update_reval(c->dentry, upper);
> > > }
> > > - dput(upper);
> > > + end_creating(upper, upperdir);
> > > }
> > > - inode_unlock(udir);
> > > if (err)
> > > goto out;
> > >
> > > @@ -894,16 +893,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
> > > if (err)
> > > goto out;
> > >
> > > - inode_lock_nested(udir, I_MUTEX_PARENT);
> > > -
> > > - upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
> > > - c->destname.len);
> > > + upper = ovl_start_creating_upper(ofs, c->destdir,
> > > + &QSTR_LEN(c->destname.name,
> > > + c->destname.len));
> > > err = PTR_ERR(upper);
> > > if (!IS_ERR(upper)) {
> > > err = ovl_do_link(ofs, temp, udir, upper);
> > > - dput(upper);
> > > + end_creating(upper, c->destdir);
> > > }
> > > - inode_unlock(udir);
> > >
> > > if (err)
> > > goto out;
> > > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > > index dbd63a74df4b..0ae79efbfce7 100644
> > > --- a/fs/overlayfs/dir.c
> > > +++ b/fs/overlayfs/dir.c
> > > @@ -59,15 +59,21 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> > > return 0;
> > > }
> > >
> > > -struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> > > +#define OVL_TEMPNAME_SIZE 20
> > > +static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> > > {
> > > - struct dentry *temp;
> > > - char name[20];
> > > static atomic_t temp_id = ATOMIC_INIT(0);
> > >
> > > /* counter is allowed to wrap, since temp dentries are ephemeral */
> > > - snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
> > > + snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
> > > +}
> > > +
> > > +struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> > > +{
> > > + struct dentry *temp;
> > > + char name[OVL_TEMPNAME_SIZE];
> > >
> > > + ovl_tempname(name);
> > > temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
> > > if (!IS_ERR(temp) && temp->d_inode) {
> > > pr_err("workdir/%s already exists\n", name);
> > > @@ -78,6 +84,16 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> > > return temp;
> > > }
> > >
> > > +static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
> > > + struct dentry *workdir)
> > > +{
> > > + char name[OVL_TEMPNAME_SIZE];
> > > +
> > > + ovl_tempname(name);
> > > + return start_creating(ovl_upper_mnt_idmap(ofs), workdir,
> > > + &QSTR(name));
> > > +}
> > > +
> > > static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> > > {
> > > int err;
> > > @@ -88,35 +104,31 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> > > guard(mutex)(&ofs->whiteout_lock);
> > >
> > > if (!ofs->whiteout) {
> > > - inode_lock_nested(wdir, I_MUTEX_PARENT);
> > > - whiteout = ovl_lookup_temp(ofs, workdir);
> > > - if (!IS_ERR(whiteout)) {
> > > - err = ovl_do_whiteout(ofs, wdir, whiteout);
> > > - if (err) {
> > > - dput(whiteout);
> > > - whiteout = ERR_PTR(err);
> > > - }
> > > - }
> > > - inode_unlock(wdir);
> > > + whiteout = ovl_start_creating_temp(ofs, workdir);
> > > if (IS_ERR(whiteout))
> > > return whiteout;
> > > - ofs->whiteout = whiteout;
> > > + err = ovl_do_whiteout(ofs, wdir, whiteout);
> > > + if (!err)
> > > + ofs->whiteout = dget(whiteout);
> > > + end_creating(whiteout, workdir);
> > > + if (err)
> > > + return ERR_PTR(err);
> > > }
> > >
> > > if (!ofs->no_shared_whiteout) {
> > > - inode_lock_nested(wdir, I_MUTEX_PARENT);
> > > - whiteout = ovl_lookup_temp(ofs, workdir);
> > > - if (!IS_ERR(whiteout)) {
> > > - err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> > > - if (err) {
> > > - dput(whiteout);
> > > - whiteout = ERR_PTR(err);
> > > - }
> > > - }
> > > - inode_unlock(wdir);
> > > - if (!IS_ERR(whiteout))
> > > + struct dentry *ret = NULL;
> >
> > For clarity please name this var "link".
>
> Is "link" really clearer than "ret"?
>
> Maybe if I make it
> struct dentry *link;
> link = ovl_start_creating_temp(ofs, workdir);
> if (IS_ERR(link))
> return link;
> err = ovl_do_link(ofs, ofs->whiteout, wdir, link);
> if (!err)
> whiteout = dget(link);
> end_creating(whiteout, workdir);
> if (!err)
> return whiteout;
>
> Then "link" makes sense to me.
That looks fine, but now I do not understand why ret was needed in the
first place, so I would rather stay with:

        err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
        if (!err)
                dget(whiteout);
        end_creating(whiteout, workdir);
        if (!err)
                return whiteout;

unless I am missing something.
>
> >
> > > +
> > > + whiteout = ovl_start_creating_temp(ofs, workdir);
> > > + if (IS_ERR(whiteout))
> > > return whiteout;
> > > - if (PTR_ERR(whiteout) != -EMLINK) {
> > > + err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> > > + if (!err)
> > > + ret = dget(whiteout);
> > > + end_creating(whiteout, workdir);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (err != -EMLINK) {
> > > pr_warn("Failed to link whiteout - disabling whiteout inode sharing(nlink=%u, err=%lu)\n",
> > > ofs->whiteout->d_inode->i_nlink,
> > > PTR_ERR(whiteout));
> > > @@ -225,10 +237,13 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> > > struct ovl_cattr *attr)
> > > {
> > > struct dentry *ret;
> > > - inode_lock_nested(workdir->d_inode, I_MUTEX_PARENT);
> > > - ret = ovl_create_real(ofs, workdir,
> > > - ovl_lookup_temp(ofs, workdir), attr);
> > > - inode_unlock(workdir->d_inode);
> > > + ret = ovl_start_creating_temp(ofs, workdir);
> > > + if (IS_ERR(ret))
> > > + return ret;
> > > + ret = ovl_create_real(ofs, workdir, ret, attr);
> > > + if (!IS_ERR(ret))
> > > + dget(ret);
> > > + end_creating(ret, workdir);
> > > return ret;
> > > }
> > >
> > > @@ -327,18 +342,21 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
> > > {
> > > struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> > > struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> > > - struct inode *udir = upperdir->d_inode;
> > > struct dentry *newdentry;
> > > int err;
> > >
> > > - inode_lock_nested(udir, I_MUTEX_PARENT);
> > > - newdentry = ovl_create_real(ofs, upperdir,
> > > - ovl_lookup_upper(ofs, dentry->d_name.name,
> > > - upperdir, dentry->d_name.len),
> > > - attr);
> > > - inode_unlock(udir);
> > > + newdentry = ovl_start_creating_upper(ofs, upperdir,
> > > + &QSTR_LEN(dentry->d_name.name,
> > > + dentry->d_name.len));
> > > if (IS_ERR(newdentry))
> > > return PTR_ERR(newdentry);
> > > + newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
> > > + if (IS_ERR(newdentry)) {
> > > + end_creating(newdentry, upperdir);
> > > + return PTR_ERR(newdentry);
> > > + }
> > > + dget(newdentry);
> > > + end_creating(newdentry, upperdir);
> >
> > See suggestion below to make this:
> >
> > newdentry = end_creating_dentry(newdentry, upperdir);
> > if (IS_ERR(newdentry))
> > return PTR_ERR(newdentry);
> >
> > >
> > > if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
> > > !ovl_allow_offline_changes(ofs)) {
> > > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > > index 4f84abaa0d68..c24c2da953bd 100644
> > > --- a/fs/overlayfs/overlayfs.h
> > > +++ b/fs/overlayfs/overlayfs.h
> > > @@ -415,6 +415,14 @@ static inline struct dentry *ovl_lookup_upper_unlocked(struct ovl_fs *ofs,
> > > &QSTR_LEN(name, len), base);
> > > }
> > >
> > > +static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
> > > + struct dentry *parent,
> > > + struct qstr *name)
> > > +{
> > > + return start_creating(ovl_upper_mnt_idmap(ofs),
> > > + parent, name);
> > > +}
> > > +
> > > static inline bool ovl_open_flags_need_copy_up(int flags)
> > > {
> > > if (!flags)
> > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > > index bd3d7ba8fb95..67abb62e205b 100644
> > > --- a/fs/overlayfs/super.c
> > > +++ b/fs/overlayfs/super.c
> > > @@ -300,8 +300,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > > bool retried = false;
> > >
> > > retry:
> > > - inode_lock_nested(dir, I_MUTEX_PARENT);
> > > - work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
> > > + work = ovl_start_creating_upper(ofs, ofs->workbasedir, &QSTR(name));
> > >
> > > if (!IS_ERR(work)) {
> > > struct iattr attr = {
> > > @@ -310,14 +309,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > > };
> > >
> > > if (work->d_inode) {
> > > + dget(work);
> > > + end_creating(work, ofs->workbasedir);
> > > + if (persist)
> > > + return work;
> > > err = -EEXIST;
> > > - inode_unlock(dir);
> > > if (retried)
> > > goto out_dput;
> > > -
> > > - if (persist)
> > > - return work;
> > > -
> > > retried = true;
> > > err = ovl_workdir_cleanup(ofs, ofs->workbasedir, mnt, work, 0);
> > > dput(work);
> > > @@ -328,7 +326,9 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > > }
> > >
> > > work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> > > - inode_unlock(dir);
> > > + if (!IS_ERR(work))
> > > + dget(work);
> > > + end_creating(work, ofs->workbasedir);
> > > err = PTR_ERR(work);
> > > if (IS_ERR(work))
> > > goto out_err;
> > > @@ -366,7 +366,6 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> > > if (err)
> > > goto out_dput;
> > > } else {
> > > - inode_unlock(dir);
> > > err = PTR_ERR(work);
> > > goto out_err;
> > > }
> > > @@ -616,14 +615,17 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
> > > struct dentry *parent,
> > > const char *name, umode_t mode)
> > > {
> > > - size_t len = strlen(name);
> > > struct dentry *child;
> > >
> > > - inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> > > - child = ovl_lookup_upper(ofs, name, parent, len);
> > > - if (!IS_ERR(child) && !child->d_inode)
> > > - child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
> > > - inode_unlock(parent->d_inode);
> > > + child = ovl_start_creating_upper(ofs, parent, &QSTR(name));
> > > + if (!IS_ERR(child)) {
> > > + if (!child->d_inode)
> > > + child = ovl_create_real(ofs, parent, child,
> > > + OVL_CATTR(mode));
> > > + if (!IS_ERR(child))
> > > + dget(child);
> > > + end_creating(child, parent);
> >
> > We have a few of those things open coded, which are not so pretty IMO.
> > How about:
> >
> > child = end_creating_dentry(child, parent);
> >
> > Which is a variant of the void end_creating() which does dget()
> > in the non error case?
>
> I have experimented with that idea. I'm not convinced.
>
> There are six places where it would help.
> One is in cachefiles, the rest are in overlayfs.
> Two of them are just
> dget()
> end_creating()
>
> The others have some sort of condition on error, as you pointed out
> above.
>
> That is out of nearly 30 uses for end_creating().
>
> I would rather declare the dentry variable as __free(end_dirop)
> or similar so that "end_creating()" would not appear and the dget()
> would stand alone with (hopefully) a clear meaning.
> But I can't do that until the vfs_mkdir() issue is resolved.
>
Sounds like a nice plan.
> Maybe it would end up being cleaner having an alternate form of
> end_creating() which returns a reference. But I'm not convinced yet.
>
Not sure myself.
> >
> > end_creating_dentry() could be matched with start_creating_dentry()
> > in common cases where we had a ref on the child before creating and
> > we want to keep the ref on the child after creating.
>
> I don't think there is a useful connection between end_creating_dentry()
> and start_creating_dentry(), so I wouldn't use that name.
> end_creating_return() maybe, or end_creating_keep().
>
> start_creating_dentry() takes an extra ref so it is always correct to
> use end_creating() and you don't need a dget() because you already have
> a ref. You only need the dget() (or to avoid the dput()) if a lookup
> was performed by start_creating().
>
FWIW, the code with end_creating_keep() in your branch looks nice IMO.
Might as well just call it end_creating_dget(), but naming is hard.
Let's see what others think.
Thanks,
Amir.
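
The dget()-before-end_creating() idiom debated in this message can be
illustrated with a toy userspace model. This is only a sketch: every
toy_* name below is hypothetical, the struct is not the kernel's dentry,
error handling is left out, and toy_end_creating_keep() merely assumes the
semantics described in the thread (take a reference, then finish the
operation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry; NOT the kernel structure. */
struct toy_dentry {
	int refcount;        /* references held on this entry */
	bool parent_locked;  /* stands in for the parent directory lock */
};

/* Models start_creating(): lock the parent, return a referenced child. */
static struct toy_dentry *toy_start_creating(struct toy_dentry *d)
{
	d->parent_locked = true;
	d->refcount++;
	return d;
}

/* Models dget(): take one extra reference. */
static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	d->refcount++;
	return d;
}

/* Models end_creating(): drop the lookup reference and unlock. */
static void toy_end_creating(struct toy_dentry *d)
{
	assert(d->parent_locked);
	d->refcount--;
	d->parent_locked = false;
}

/* Open-coded form: an explicit dget() keeps the child alive. */
static struct toy_dentry *toy_create_open_coded(struct toy_dentry *d)
{
	struct toy_dentry *child = toy_start_creating(d);

	toy_dget(child);
	toy_end_creating(child);
	return child;
}

/* Hypothetical helper: end the operation but hand back a reference. */
static struct toy_dentry *toy_end_creating_keep(struct toy_dentry *d)
{
	toy_dget(d);
	toy_end_creating(d);
	return d;
}

/* Same result as the open-coded form, with the dget() folded in. */
static struct toy_dentry *toy_create_with_keep(struct toy_dentry *d)
{
	return toy_end_creating_keep(toy_start_creating(d));
}
```

In this model both callers finish with the parent unlocked and exactly one
reference left on the child, which is the reference the caller keeps.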
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (2 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-27 9:12 ` Amir Goldstein
2025-10-02 17:02 ` Jeff Layton
2025-09-26 2:49 ` [PATCH 05/11] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
` (7 subsequent siblings)
11 siblings, 2 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_removing() is similar to start_creating() but will only return a
positive dentry with the expectation that it will be removed. This is
used by nfsd, cachefiles, and overlayfs. They are changed to also use
end_removing() to terminate the action begun by start_removing().
end_removing() is a simple alias for end_dirop().
Apart from changes to the error paths, as we no longer need to unlock on
a lookup error, an effect on callers is that they don't need to test if
the found dentry is positive or negative - they can be sure it is
positive.
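
As a rough illustration of the calling convention described above, here
is a toy userspace model. It is a sketch only: the toy_* names and the
fixed-size directory are hypothetical and are not the kernel types. It
shows the two properties the description relies on: a failed start
leaves nothing for the caller to undo, and a successful start always
yields an existing (positive) entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 4

/* Toy directory; NOT a kernel structure. */
struct toy_dir {
	bool locked;                   /* stands in for the parent inode lock */
	const char *names[TOY_SLOTS];  /* names that currently exist */
	int refs[TOY_SLOTS];           /* per-name reference counts */
};

/*
 * Models start_removing(): on success the directory is locked and the
 * index of an existing entry is returned with a reference held.  On
 * failure a negative errno is returned and the directory is unlocked.
 */
static int toy_start_removing(struct toy_dir *dir, const char *name)
{
	assert(!dir->locked);
	dir->locked = true;
	for (int i = 0; i < TOY_SLOTS; i++) {
		if (dir->names[i] && strcmp(dir->names[i], name) == 0) {
			dir->refs[i]++;
			return i;
		}
	}
	dir->locked = false;  /* lookup failed: caller has nothing to undo */
	return -ENOENT;
}

/* Models end_removing(): drop the reference and unlock. */
static void toy_end_removing(struct toy_dir *dir, int idx)
{
	if (idx < 0)
		return;  /* an error result is tolerated and ignored */
	dir->refs[idx]--;
	dir->locked = false;
}

/* A caller: no positive/negative test is needed after a successful start. */
static int toy_unlink(struct toy_dir *dir, const char *name)
{
	int idx = toy_start_removing(dir, name);

	if (idx < 0)
		return idx;
	dir->names[idx] = NULL;  /* the removal itself */
	toy_end_removing(dir, idx);
	return 0;
}
```

The caller has a single error return after the start call and a single
end call on every other path, which mirrors the shape of the converted
nfsd4_unlink_clid_dir() in the diff below.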
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/namei.c | 25 ++++++++++---------------
fs/namei.c | 27 +++++++++++++++++++++++++++
fs/nfsd/nfs4recover.c | 18 +++++-------------
fs/nfsd/vfs.c | 26 ++++++++++----------------
fs/overlayfs/dir.c | 15 +++++++--------
fs/overlayfs/overlayfs.h | 8 ++++++++
include/linux/namei.h | 17 +++++++++++++++++
7 files changed, 84 insertions(+), 52 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 965b22b2f58d..3064d439807b 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -260,6 +260,7 @@ static int cachefiles_unlink(struct cachefiles_cache *cache,
* - File backed objects are unlinked
* - Directory backed objects are stuffed into the graveyard for userspace to
* delete
+ * On entry dir must be locked. It will be unlocked on exit.
*/
int cachefiles_bury_object(struct cachefiles_cache *cache,
struct cachefiles_object *object,
@@ -275,7 +276,8 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
_enter(",'%pd','%pd'", dir, rep);
if (rep->d_parent != dir) {
- inode_unlock(d_inode(dir));
+ dget(rep);
+ end_removing(rep);
_leave(" = -ESTALE");
return -ESTALE;
}
@@ -286,16 +288,16 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
* by a file struct.
*/
ret = cachefiles_unlink(cache, object, dir, rep, why);
- dput(rep);
+ end_removing(rep);
- inode_unlock(d_inode(dir));
_leave(" = %d", ret);
return ret;
}
/* directories have to be moved to the graveyard */
_debug("move stale object to graveyard");
- inode_unlock(d_inode(dir));
+ dget(rep);
+ end_removing(rep);
try_again:
/* first step is to make up a grave dentry in the graveyard */
@@ -745,26 +747,20 @@ static struct dentry *cachefiles_lookup_for_cull(struct cachefiles_cache *cache,
struct dentry *victim;
int ret = -ENOENT;
- inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+ victim = start_removing(&nop_mnt_idmap, dir, &QSTR(filename));
- victim = lookup_one(&nop_mnt_idmap, &QSTR(filename), dir);
if (IS_ERR(victim))
goto lookup_error;
- if (d_is_negative(victim))
- goto lookup_put;
if (d_inode(victim)->i_flags & S_KERNEL_FILE)
goto lookup_busy;
return victim;
lookup_busy:
ret = -EBUSY;
-lookup_put:
- inode_unlock(d_inode(dir));
- dput(victim);
+ end_removing(victim);
return ERR_PTR(ret);
lookup_error:
- inode_unlock(d_inode(dir));
ret = PTR_ERR(victim);
if (ret == -ENOENT)
return ERR_PTR(-ESTALE); /* Probably got retired by the netfs */
@@ -812,18 +808,17 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
ret = cachefiles_bury_object(cache, NULL, dir, victim,
FSCACHE_OBJECT_WAS_CULLED);
+ dput(victim);
if (ret < 0)
goto error;
fscache_count_culled();
- dput(victim);
_leave(" = 0");
return 0;
error_unlock:
- inode_unlock(d_inode(dir));
+ end_removing(victim);
error:
- dput(victim);
if (ret == -ENOENT)
return -ESTALE; /* Probably got retired by the netfs */
diff --git a/fs/namei.c b/fs/namei.c
index 064cb44a3a46..0d9e98961758 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3269,6 +3269,33 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
}
EXPORT_SYMBOL(start_creating);
+/**
+ * start_removing - prepare to remove a given name with permission checking
+ * @idmap - idmap of the mount
+ * @parent - directory in which to find the name
+ * @name - the name to be removed
+ *
+ * Locks are taken and a lookup is performed prior to removing
+ * an object from a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name doesn't exist, an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * Returns: a positive dentry, or an error.
+ */
+struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, 0);
+}
+EXPORT_SYMBOL(start_removing);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 93b2a3e764db..0f33e13a9da2 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -345,20 +345,12 @@ nfsd4_unlink_clid_dir(char *name, struct nfsd_net *nn)
dprintk("NFSD: nfsd4_unlink_clid_dir. name %s\n", name);
dir = nn->rec_file->f_path.dentry;
- inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(name), dir);
- if (IS_ERR(dentry)) {
- status = PTR_ERR(dentry);
- goto out_unlock;
- }
- status = -ENOENT;
- if (d_really_is_negative(dentry))
- goto out;
+ dentry = start_removing(&nop_mnt_idmap, dir, &QSTR(name));
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);
+
status = vfs_rmdir(&nop_mnt_idmap, d_inode(dir), dentry);
-out:
- dput(dentry);
-out_unlock:
- inode_unlock(d_inode(dir));
+ end_removing(dentry);
return status;
}
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 90c830c59c60..d5b4550fd8f6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -2021,7 +2021,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
{
struct dentry *dentry, *rdentry;
struct inode *dirp;
- struct inode *rinode;
+ struct inode *rinode = NULL;
__be32 err;
int host_err;
@@ -2040,24 +2040,21 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
dentry = fhp->fh_dentry;
dirp = d_inode(dentry);
- inode_lock_nested(dirp, I_MUTEX_PARENT);
- rdentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
+ rdentry = start_removing(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
+
host_err = PTR_ERR(rdentry);
if (IS_ERR(rdentry))
- goto out_unlock;
+ goto out_drop_write;
- if (d_really_is_negative(rdentry)) {
- dput(rdentry);
- host_err = -ENOENT;
- goto out_unlock;
- }
- rinode = d_inode(rdentry);
err = fh_fill_pre_attrs(fhp);
if (err != nfs_ok)
goto out_unlock;
+ rinode = d_inode(rdentry);
+ /* Prevent truncation until after locks dropped */
ihold(rinode);
+
if (!type)
type = d_inode(rdentry)->i_mode & S_IFMT;
@@ -2079,10 +2076,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
}
fh_fill_post_attrs(fhp);
- inode_unlock(dirp);
- if (!host_err)
+out_unlock:
+ end_removing(rdentry);
+ if (!err && !host_err)
host_err = commit_metadata(fhp);
- dput(rdentry);
iput(rinode); /* truncate the inode here */
out_drop_write:
@@ -2100,9 +2097,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
}
out:
return err != nfs_ok ? err : nfserrno(host_err);
-out_unlock:
- inode_unlock(dirp);
- goto out_drop_write;
}
/*
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 0ae79efbfce7..c4057b4a050d 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -841,17 +841,17 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
goto out;
}
- inode_lock_nested(dir, I_MUTEX_PARENT);
- upper = ovl_lookup_upper(ofs, dentry->d_name.name, upperdir,
- dentry->d_name.len);
+ upper = ovl_start_removing_upper(ofs, upperdir,
+ &QSTR_LEN(dentry->d_name.name,
+ dentry->d_name.len));
err = PTR_ERR(upper);
if (IS_ERR(upper))
- goto out_unlock;
+ goto out_dput;
err = -ESTALE;
if ((opaquedir && upper != opaquedir) ||
(!opaquedir && !ovl_matches_upper(dentry, upper)))
- goto out_dput_upper;
+ goto out_unlock;
if (is_dir)
err = ovl_do_rmdir(ofs, dir, upper);
@@ -867,10 +867,9 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
*/
if (!err)
d_drop(dentry);
-out_dput_upper:
- dput(upper);
out_unlock:
- inode_unlock(dir);
+ end_removing(upper);
+out_dput:
dput(opaquedir);
out:
return err;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c24c2da953bd..915af58459b7 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -423,6 +423,14 @@ static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
parent, name);
}
+static inline struct dentry *ovl_start_removing_upper(struct ovl_fs *ofs,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ return start_removing(ovl_upper_mnt_idmap(ofs),
+ parent, name);
+}
+
static inline bool ovl_open_flags_need_copy_up(int flags)
{
if (!flags)
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 4cbe930054a1..63941fdbc23d 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -90,6 +90,8 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
+struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name);
/* end_creating - finish action started with start_creating
* @child - dentry returned by start_creating()
@@ -106,6 +108,21 @@ static inline void end_creating(struct dentry *child, struct dentry *parent)
end_dirop_mkdir(child, parent);
}
+/* end_removing - finish action started with start_removing
+ * @child - dentry returned by start_removing()
+ * @parent - dentry given to start_removing()
+ *
+ * Unlock and release the child.
+ *
+ * This is identical to end_dirop(). It can be passed the result of
+ * start_removing() whether that was successful or not, but it is not needed
+ * if start_removing() failed.
+ */
+static inline void end_removing(struct dentry *child)
+{
+ end_dirop(child);
+}
+
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
extern int follow_up(struct path *);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
2025-09-26 2:49 ` [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() NeilBrown
@ 2025-09-27 9:12 ` Amir Goldstein
2025-10-02 17:02 ` Jeff Layton
1 sibling, 0 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-09-27 9:12 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> start_removing() is similar to start_creating() but will only return a
> positive dentry with the expectation that it will be removed. This is
> used by nfsd, cachefiles, and overlayfs. They are changed to also use
> end_removing() to terminate the action begun by start_removing(). This
> is a simple alias for end_dirop().
>
> Apart from changes to the error paths, as we no longer need to unlock on
> a lookup error, an effect on callers is that they don't need to test if
> the found dentry is positive or negative - they can be sure it is
> positive.
>
> Signed-off-by: NeilBrown <neil@brown.name>
I may be reviewing out of order to make progress,
because the start_creating() patches are harder to chew on.
For this one though, with the minor nits below fixed, feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/cachefiles/namei.c | 25 ++++++++++---------------
> fs/namei.c | 27 +++++++++++++++++++++++++++
> fs/nfsd/nfs4recover.c | 18 +++++-------------
> fs/nfsd/vfs.c | 26 ++++++++++----------------
> fs/overlayfs/dir.c | 15 +++++++--------
> fs/overlayfs/overlayfs.h | 8 ++++++++
> include/linux/namei.h | 17 +++++++++++++++++
> 7 files changed, 84 insertions(+), 52 deletions(-)
>
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 965b22b2f58d..3064d439807b 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -260,6 +260,7 @@ static int cachefiles_unlink(struct cachefiles_cache *cache,
> * - File backed objects are unlinked
> * - Directory backed objects are stuffed into the graveyard for userspace to
> * delete
> + * On entry dir must be locked. It will be unlocked on exit.
> */
> int cachefiles_bury_object(struct cachefiles_cache *cache,
> struct cachefiles_object *object,
> @@ -275,7 +276,8 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
> _enter(",'%pd','%pd'", dir, rep);
>
> if (rep->d_parent != dir) {
> - inode_unlock(d_inode(dir));
> + dget(rep);
> + end_removing(rep);
This looks odd, so it deserves a comment.
> _leave(" = -ESTALE");
> return -ESTALE;
> }
> @@ -286,16 +288,16 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
> * by a file struct.
> */
> ret = cachefiles_unlink(cache, object, dir, rep, why);
> - dput(rep);
> + end_removing(rep);
>
> - inode_unlock(d_inode(dir));
> _leave(" = %d", ret);
> return ret;
> }
>
> /* directories have to be moved to the graveyard */
> _debug("move stale object to graveyard");
> - inode_unlock(d_inode(dir));
> + dget(rep);
> + end_removing(rep);
ditto
>
> try_again:
> /* first step is to make up a grave dentry in the graveyard */
> @@ -745,26 +747,20 @@ static struct dentry *cachefiles_lookup_for_cull(struct cachefiles_cache *cache,
> struct dentry *victim;
> int ret = -ENOENT;
>
> - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
> + victim = start_removing(&nop_mnt_idmap, dir, &QSTR(filename));
>
> - victim = lookup_one(&nop_mnt_idmap, &QSTR(filename), dir);
> if (IS_ERR(victim))
> goto lookup_error;
> - if (d_is_negative(victim))
> - goto lookup_put;
> if (d_inode(victim)->i_flags & S_KERNEL_FILE)
> goto lookup_busy;
> return victim;
>
> lookup_busy:
> ret = -EBUSY;
> -lookup_put:
> - inode_unlock(d_inode(dir));
> - dput(victim);
> + end_removing(victim);
> return ERR_PTR(ret);
>
> lookup_error:
> - inode_unlock(d_inode(dir));
> ret = PTR_ERR(victim);
> if (ret == -ENOENT)
> return ERR_PTR(-ESTALE); /* Probably got retired by the netfs */
> @@ -812,18 +808,17 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
>
> ret = cachefiles_bury_object(cache, NULL, dir, victim,
> FSCACHE_OBJECT_WAS_CULLED);
> + dput(victim);
> if (ret < 0)
> goto error;
>
> fscache_count_culled();
> - dput(victim);
> _leave(" = 0");
> return 0;
>
> error_unlock:
> - inode_unlock(d_inode(dir));
> + end_removing(victim);
> error:
> - dput(victim);
> if (ret == -ENOENT)
> return -ESTALE; /* Probably got retired by the netfs */
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 064cb44a3a46..0d9e98961758 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3269,6 +3269,33 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> }
> EXPORT_SYMBOL(start_creating);
>
> +/**
> + * start_removing - prepare to remove a given name with permission checking
> + * @idmap - idmap of the mount
> + * @parent - directory in which to find the name
> + * @name - the name to be removed
> + *
> + * Locks are taken and a lookup in performed prior to removing
> + * an object from a directory. Permission checking (MAY_EXEC) is performed
> + * against @idmap.
> + *
> + * If the name doesn't exist, an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * Returns: a positive dentry, or an error.
> + */
> +struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_one_common(idmap, name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, 0);
> +}
> +EXPORT_SYMBOL(start_removing);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 93b2a3e764db..0f33e13a9da2 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -345,20 +345,12 @@ nfsd4_unlink_clid_dir(char *name, struct nfsd_net *nn)
> dprintk("NFSD: nfsd4_unlink_clid_dir. name %s\n", name);
>
> dir = nn->rec_file->f_path.dentry;
> - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(name), dir);
> - if (IS_ERR(dentry)) {
> - status = PTR_ERR(dentry);
> - goto out_unlock;
> - }
> - status = -ENOENT;
> - if (d_really_is_negative(dentry))
> - goto out;
> + dentry = start_removing(&nop_mnt_idmap, dir, &QSTR(name));
> + if (IS_ERR(dentry))
> + return PTR_ERR(dentry);
> +
> status = vfs_rmdir(&nop_mnt_idmap, d_inode(dir), dentry);
> -out:
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(dir));
> + end_removing(dentry);
> return status;
> }
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 90c830c59c60..d5b4550fd8f6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -2021,7 +2021,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> {
> struct dentry *dentry, *rdentry;
> struct inode *dirp;
> - struct inode *rinode;
> + struct inode *rinode = NULL;
> __be32 err;
> int host_err;
>
> @@ -2040,24 +2040,21 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>
> dentry = fhp->fh_dentry;
> dirp = d_inode(dentry);
> - inode_lock_nested(dirp, I_MUTEX_PARENT);
>
> - rdentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + rdentry = start_removing(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> +
> host_err = PTR_ERR(rdentry);
> if (IS_ERR(rdentry))
> - goto out_unlock;
> + goto out_drop_write;
>
> - if (d_really_is_negative(rdentry)) {
> - dput(rdentry);
> - host_err = -ENOENT;
> - goto out_unlock;
> - }
> - rinode = d_inode(rdentry);
> err = fh_fill_pre_attrs(fhp);
> if (err != nfs_ok)
> goto out_unlock;
>
> + rinode = d_inode(rdentry);
> + /* Prevent truncation until after locks dropped */
> ihold(rinode);
> +
> if (!type)
> type = d_inode(rdentry)->i_mode & S_IFMT;
>
> @@ -2079,10 +2076,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> }
> fh_fill_post_attrs(fhp);
>
> - inode_unlock(dirp);
> - if (!host_err)
> +out_unlock:
> + end_removing(rdentry);
> + if (!err && !host_err)
> host_err = commit_metadata(fhp);
> - dput(rdentry);
> iput(rinode); /* truncate the inode here */
>
> out_drop_write:
> @@ -2100,9 +2097,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> }
> out:
> return err != nfs_ok ? err : nfserrno(host_err);
> -out_unlock:
> - inode_unlock(dirp);
> - goto out_drop_write;
> }
>
> /*
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 0ae79efbfce7..c4057b4a050d 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -841,17 +841,17 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
> goto out;
> }
>
> - inode_lock_nested(dir, I_MUTEX_PARENT);
> - upper = ovl_lookup_upper(ofs, dentry->d_name.name, upperdir,
> - dentry->d_name.len);
> + upper = ovl_start_removing_upper(ofs, upperdir,
> + &QSTR_LEN(dentry->d_name.name,
> + dentry->d_name.len));
> err = PTR_ERR(upper);
> if (IS_ERR(upper))
> - goto out_unlock;
> + goto out_dput;
>
> err = -ESTALE;
> if ((opaquedir && upper != opaquedir) ||
> (!opaquedir && !ovl_matches_upper(dentry, upper)))
> - goto out_dput_upper;
> + goto out_unlock;
>
> if (is_dir)
> err = ovl_do_rmdir(ofs, dir, upper);
> @@ -867,10 +867,9 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
> */
> if (!err)
> d_drop(dentry);
> -out_dput_upper:
> - dput(upper);
> out_unlock:
> - inode_unlock(dir);
> + end_removing(upper);
> +out_dput:
> dput(opaquedir);
> out:
> return err;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c24c2da953bd..915af58459b7 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -423,6 +423,14 @@ static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
> parent, name);
> }
>
> +static inline struct dentry *ovl_start_removing_upper(struct ovl_fs *ofs,
> + struct dentry *parent,
> + struct qstr *name)
> +{
> + return start_removing(ovl_upper_mnt_idmap(ofs),
> + parent, name);
> +}
> +
> static inline bool ovl_open_flags_need_copy_up(int flags)
> {
> if (!flags)
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 4cbe930054a1..63941fdbc23d 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -90,6 +90,8 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
>
> struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> +struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name);
>
> /* end_creating - finish action started with start_creating
> * @child - dentry returned by start_creating()
> @@ -106,6 +108,21 @@ static inline void end_creating(struct dentry *child, struct dentry *parent)
> end_dirop_mkdir(child, parent);
> }
>
> +/* end_removing - finish action started with start_removing
> + * @child - dentry returned by start_removing()
> + * @parent - dentry given to start_removing()
> + *
> + * Unlock and release the child.
> + *
> + * This is identical to end_dirop(). It can be passed the result of
> + * start_removing() whether that was successful or not, but it not needed
> + * if start_removing() failed.
> + */
> +static inline void end_removing(struct dentry *child)
> +{
> + end_dirop(child);
> +}
> +
> extern int follow_down_one(struct path *);
> extern int follow_down(struct path *path, unsigned int flags);
> extern int follow_up(struct path *);
> --
> 2.50.0.107.gf914562f5916.dirty
>
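For readers following the dget()/end_removing() nit above: judging only
from the diff, the old code at those two sites did a bare inode_unlock(),
while end_removing() is documented as "Unlock and release the child", so
the added dget() appears to be there to cancel out the dput() half. The
sketch below is a userspace model of that balance, not kernel code. Every
toy_* name is hypothetical, and the comment wording inside
toy_unlock_keep_ref() is only one possible phrasing of the comment being
asked for.

```c
/* Userspace model only: the toy_* names do not exist in the kernel. */
struct toy_dentry {
	int refcount;		/* stands in for the dentry reference count */
	int *parent_locked;	/* stands in for the parent's inode lock */
};

/* Model of dget(): take one extra reference. */
static void toy_dget(struct toy_dentry *d)
{
	d->refcount++;
}

/* Model of end_removing(): unlock the parent and drop one reference. */
static void toy_end_removing(struct toy_dentry *d)
{
	*d->parent_locked = 0;
	d->refcount--;
}

static void toy_unlock_keep_ref(struct toy_dentry *d)
{
	/*
	 * The caller still owns its reference and will drop it later.
	 * toy_end_removing() drops one, so take an extra one first;
	 * the net effect is only to unlock the parent.
	 */
	toy_dget(d);
	toy_end_removing(d);
}
```

In this model the pair leaves the reference count where it started and
only releases the lock, which matches what the removed inode_unlock()
lines did.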
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
2025-09-26 2:49 ` [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() NeilBrown
2025-09-27 9:12 ` Amir Goldstein
@ 2025-10-02 17:02 ` Jeff Layton
1 sibling, 0 replies; 49+ messages in thread
From: Jeff Layton @ 2025-10-02 17:02 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein
Cc: Jan Kara, linux-fsdevel
On Fri, 2025-09-26 at 12:49 +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> start_removing() is similar to start_creating() but will only return a
> positive dentry with the expectation that it will be removed. This is
> used by nfsd, cachefiles, and overlayfs. They are changed to also use
> end_removing() to terminate the action begun by start_removing(). This
> is a simple alias for end_dirop().
>
> Apart from changes to the error paths, as we no longer need to unlock on
> a lookup error, an effect on callers is that they don't need to test if
> the found dentry is positive or negative - they can be sure it is
> positive.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/cachefiles/namei.c | 25 ++++++++++---------------
> fs/namei.c | 27 +++++++++++++++++++++++++++
> fs/nfsd/nfs4recover.c | 18 +++++-------------
> fs/nfsd/vfs.c | 26 ++++++++++----------------
> fs/overlayfs/dir.c | 15 +++++++--------
> fs/overlayfs/overlayfs.h | 8 ++++++++
> include/linux/namei.h | 17 +++++++++++++++++
> 7 files changed, 84 insertions(+), 52 deletions(-)
>
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 965b22b2f58d..3064d439807b 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -260,6 +260,7 @@ static int cachefiles_unlink(struct cachefiles_cache *cache,
> * - File backed objects are unlinked
> * - Directory backed objects are stuffed into the graveyard for userspace to
> * delete
> + * On entry dir must be locked. It will be unlocked on exit.
> */
> int cachefiles_bury_object(struct cachefiles_cache *cache,
> struct cachefiles_object *object,
> @@ -275,7 +276,8 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
> _enter(",'%pd','%pd'", dir, rep);
>
> if (rep->d_parent != dir) {
> - inode_unlock(d_inode(dir));
> + dget(rep);
> + end_removing(rep);
> _leave(" = -ESTALE");
> return -ESTALE;
> }
> @@ -286,16 +288,16 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
> * by a file struct.
> */
> ret = cachefiles_unlink(cache, object, dir, rep, why);
> - dput(rep);
> + end_removing(rep);
>
> - inode_unlock(d_inode(dir));
> _leave(" = %d", ret);
> return ret;
> }
>
> /* directories have to be moved to the graveyard */
> _debug("move stale object to graveyard");
> - inode_unlock(d_inode(dir));
> + dget(rep);
> + end_removing(rep);
>
> try_again:
> /* first step is to make up a grave dentry in the graveyard */
> @@ -745,26 +747,20 @@ static struct dentry *cachefiles_lookup_for_cull(struct cachefiles_cache *cache,
> struct dentry *victim;
> int ret = -ENOENT;
>
> - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
> + victim = start_removing(&nop_mnt_idmap, dir, &QSTR(filename));
>
> - victim = lookup_one(&nop_mnt_idmap, &QSTR(filename), dir);
> if (IS_ERR(victim))
> goto lookup_error;
> - if (d_is_negative(victim))
> - goto lookup_put;
> if (d_inode(victim)->i_flags & S_KERNEL_FILE)
> goto lookup_busy;
> return victim;
>
> lookup_busy:
> ret = -EBUSY;
> -lookup_put:
> - inode_unlock(d_inode(dir));
> - dput(victim);
> + end_removing(victim);
> return ERR_PTR(ret);
>
> lookup_error:
> - inode_unlock(d_inode(dir));
> ret = PTR_ERR(victim);
> if (ret == -ENOENT)
> return ERR_PTR(-ESTALE); /* Probably got retired by the netfs */
> @@ -812,18 +808,17 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
>
> ret = cachefiles_bury_object(cache, NULL, dir, victim,
> FSCACHE_OBJECT_WAS_CULLED);
> + dput(victim);
> if (ret < 0)
> goto error;
>
> fscache_count_culled();
> - dput(victim);
> _leave(" = 0");
> return 0;
>
> error_unlock:
> - inode_unlock(d_inode(dir));
> + end_removing(victim);
> error:
> - dput(victim);
> if (ret == -ENOENT)
> return -ESTALE; /* Probably got retired by the netfs */
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 064cb44a3a46..0d9e98961758 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3269,6 +3269,33 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> }
> EXPORT_SYMBOL(start_creating);
>
> +/**
> + * start_removing - prepare to remove a given name with permission checking
> + * @idmap - idmap of the mount
> + * @parent - directory in which to find the name
> + * @name - the name to be removed
> + *
> + * Locks are taken and a lookup in performed prior to removing
> + * an object from a directory. Permission checking (MAY_EXEC) is performed
> + * against @idmap.
> + *
> + * If the name doesn't exist, an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * Returns: a positive dentry, or an error.
> + */
> +struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_one_common(idmap, name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, 0);
> +}
> +EXPORT_SYMBOL(start_removing);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index 93b2a3e764db..0f33e13a9da2 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -345,20 +345,12 @@ nfsd4_unlink_clid_dir(char *name, struct nfsd_net *nn)
> dprintk("NFSD: nfsd4_unlink_clid_dir. name %s\n", name);
>
> dir = nn->rec_file->f_path.dentry;
> - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(name), dir);
> - if (IS_ERR(dentry)) {
> - status = PTR_ERR(dentry);
> - goto out_unlock;
> - }
> - status = -ENOENT;
> - if (d_really_is_negative(dentry))
> - goto out;
> + dentry = start_removing(&nop_mnt_idmap, dir, &QSTR(name));
> + if (IS_ERR(dentry))
> + return PTR_ERR(dentry);
> +
> status = vfs_rmdir(&nop_mnt_idmap, d_inode(dir), dentry);
> -out:
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(dir));
> + end_removing(dentry);
> return status;
> }
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 90c830c59c60..d5b4550fd8f6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -2021,7 +2021,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> {
> struct dentry *dentry, *rdentry;
> struct inode *dirp;
> - struct inode *rinode;
> + struct inode *rinode = NULL;
> __be32 err;
> int host_err;
>
> @@ -2040,24 +2040,21 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>
> dentry = fhp->fh_dentry;
> dirp = d_inode(dentry);
> - inode_lock_nested(dirp, I_MUTEX_PARENT);
>
> - rdentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + rdentry = start_removing(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> +
> host_err = PTR_ERR(rdentry);
> if (IS_ERR(rdentry))
> - goto out_unlock;
> + goto out_drop_write;
>
> - if (d_really_is_negative(rdentry)) {
> - dput(rdentry);
> - host_err = -ENOENT;
> - goto out_unlock;
> - }
> - rinode = d_inode(rdentry);
> err = fh_fill_pre_attrs(fhp);
> if (err != nfs_ok)
> goto out_unlock;
>
> + rinode = d_inode(rdentry);
> + /* Prevent truncation until after locks dropped */
> ihold(rinode);
> +
> if (!type)
> type = d_inode(rdentry)->i_mode & S_IFMT;
>
> @@ -2079,10 +2076,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> }
> fh_fill_post_attrs(fhp);
>
> - inode_unlock(dirp);
> - if (!host_err)
> +out_unlock:
> + end_removing(rdentry);
> + if (!err && !host_err)
> host_err = commit_metadata(fhp);
> - dput(rdentry);
> iput(rinode); /* truncate the inode here */
>
> out_drop_write:
> @@ -2100,9 +2097,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> }
> out:
> return err != nfs_ok ? err : nfserrno(host_err);
> -out_unlock:
> - inode_unlock(dirp);
> - goto out_drop_write;
> }
>
>
I like how the new helper simplifies this code.
>
> /*
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 0ae79efbfce7..c4057b4a050d 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -841,17 +841,17 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
> goto out;
> }
>
> - inode_lock_nested(dir, I_MUTEX_PARENT);
> - upper = ovl_lookup_upper(ofs, dentry->d_name.name, upperdir,
> - dentry->d_name.len);
> + upper = ovl_start_removing_upper(ofs, upperdir,
> + &QSTR_LEN(dentry->d_name.name,
> + dentry->d_name.len));
> err = PTR_ERR(upper);
> if (IS_ERR(upper))
> - goto out_unlock;
> + goto out_dput;
>
> err = -ESTALE;
> if ((opaquedir && upper != opaquedir) ||
> (!opaquedir && !ovl_matches_upper(dentry, upper)))
> - goto out_dput_upper;
> + goto out_unlock;
>
> if (is_dir)
> err = ovl_do_rmdir(ofs, dir, upper);
> @@ -867,10 +867,9 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
> */
> if (!err)
> d_drop(dentry);
> -out_dput_upper:
> - dput(upper);
> out_unlock:
> - inode_unlock(dir);
> + end_removing(upper);
> +out_dput:
> dput(opaquedir);
> out:
> return err;
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c24c2da953bd..915af58459b7 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -423,6 +423,14 @@ static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
> parent, name);
> }
>
> +static inline struct dentry *ovl_start_removing_upper(struct ovl_fs *ofs,
> + struct dentry *parent,
> + struct qstr *name)
> +{
> + return start_removing(ovl_upper_mnt_idmap(ofs),
> + parent, name);
> +}
> +
> static inline bool ovl_open_flags_need_copy_up(int flags)
> {
> if (!flags)
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 4cbe930054a1..63941fdbc23d 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -90,6 +90,8 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
>
> struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> +struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name);
>
> /* end_creating - finish action started with start_creating
> * @child - dentry returned by start_creating()
> @@ -106,6 +108,21 @@ static inline void end_creating(struct dentry *child, struct dentry *parent)
> end_dirop_mkdir(child, parent);
> }
>
> +/* end_removing - finish action started with start_removing
> + * @child - dentry returned by start_removing()
> + * @parent - dentry given to start_removing()
> + *
> + * Unlock and release the child.
> + *
> + * This is identical to end_dirop(). It can be passed the result of
> + * start_removing() whether that was successful or not, but it not needed
> + * if start_removing() failed.
> + */
> +static inline void end_removing(struct dentry *child)
> +{
> + end_dirop(child);
> +}
> +
> extern int follow_down_one(struct path *);
> extern int follow_down(struct path *path, unsigned int flags);
> extern int follow_up(struct path *);
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 05/11] VFS: introduce start_creating_noperm() and start_removing_noperm()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (3 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-28 12:26 ` Amir Goldstein
2025-10-02 17:13 ` Jeff Layton
2025-09-26 2:49 ` [PATCH 06/11] VFS: introduce start_removing_dentry() NeilBrown
` (6 subsequent siblings)
11 siblings, 2 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
xfs, fuse, ipc/mqueue need variants of start_creating or start_removing
which do not check permissions.
This patch adds _noperm versions of these functions.
Note that do_mq_open() was only calling mntget() so it could call
path_put() - it didn't really need an extra reference on the mnt.
Now it doesn't call mntget() and uses end_creating() which does
the dput() half of path_put().
Signed-off-by: NeilBrown <neil@brown.name>
---
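Not for the commit log: a rough userspace model of the calling pattern
the conversions in this patch end up with, in case it helps review. It
assumes, as the series describes, that a failed start returns an error
with nothing held, and that a successful start returns a positive entry
with the parent locked. All toy_* names are hypothetical, an array index
stands in for the dentry, and permission checking is simply absent
(which is the only sense in which this models the _noperm variant).

```c
/* Userspace model only: none of these toy_* names exist in the kernel. */
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_dir {
	int locked;		/* stands in for the parent's inode lock */
	const char *names[4];	/* entries currently in the directory */
	int refs[4];		/* per-entry reference count */
};

/* Model of start_removing_noperm(): lock, look up, fail cleanly if absent. */
static int toy_start_removing(struct toy_dir *dir, const char *name)
{
	dir->locked = 1;
	for (int i = 0; i < 4; i++) {
		if (dir->names[i] && strcmp(dir->names[i], name) == 0) {
			dir->refs[i]++;	/* caller now holds a reference */
			return i;	/* found: the lock stays held */
		}
	}
	dir->locked = 0;	/* not found: drop the lock before returning */
	return -ENOENT;
}

/* Model of end_removing(): drop the reference and the parent lock. */
static void toy_end_removing(struct toy_dir *dir, int idx)
{
	dir->refs[idx]--;
	dir->locked = 0;
}

/* Caller shaped like the converted mq_unlink(): one early return. */
static int toy_unlink(struct toy_dir *dir, const char *name)
{
	int idx = toy_start_removing(dir, name);

	if (idx < 0)
		return idx;	/* nothing to unlock or put */
	dir->names[idx] = NULL;	/* the "removal" itself */
	toy_end_removing(dir, idx);
	return 0;
}
```

The point of the shape is that the caller needs neither an unlock label
for the lookup-failure path nor a separate negative-entry test.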
fs/fuse/dir.c | 19 +++++++---------
fs/namei.c | 48 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/orphanage.c | 11 ++++-----
include/linux/namei.h | 2 ++
ipc/mqueue.c | 31 +++++++++-----------------
5 files changed, 73 insertions(+), 38 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 5c569c3cb53f..88bc512639e2 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1404,27 +1404,25 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
if (!parent)
return -ENOENT;
- inode_lock_nested(parent, I_MUTEX_PARENT);
if (!S_ISDIR(parent->i_mode))
- goto unlock;
+ goto put_parent;
err = -ENOENT;
dir = d_find_alias(parent);
if (!dir)
- goto unlock;
+ goto put_parent;
- name->hash = full_name_hash(dir, name->name, name->len);
- entry = d_lookup(dir, name);
+ entry = start_removing_noperm(dir, name);
dput(dir);
- if (!entry)
- goto unlock;
+ if (IS_ERR(entry))
+ goto put_parent;
fuse_dir_changed(parent);
if (!(flags & FUSE_EXPIRE_ONLY))
d_invalidate(entry);
fuse_invalidate_entry_cache(entry);
- if (child_nodeid != 0 && d_really_is_positive(entry)) {
+ if (child_nodeid != 0) {
inode_lock(d_inode(entry));
if (get_node_id(d_inode(entry)) != child_nodeid) {
err = -ENOENT;
@@ -1452,10 +1450,9 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
} else {
err = 0;
}
- dput(entry);
- unlock:
- inode_unlock(parent);
+ end_removing(entry);
+ put_parent:
iput(parent);
return err;
}
diff --git a/fs/namei.c b/fs/namei.c
index 0d9e98961758..bd5c45801756 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3296,6 +3296,54 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
}
EXPORT_SYMBOL(start_removing);
+/**
+ * start_creating_noperm - prepare to create a given name without permission checking
+ * @parent - directory in which to prepare to create the name
+ * @name - the name to be created
+ *
+ * Locks are taken and a lookup is performed prior to creating
+ * an object in a directory.
+ *
+ * If the name already exists, a positive dentry is returned.
+ *
+ * Returns: a negative or positive dentry, or an error.
+ */
+struct dentry *start_creating_noperm(struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_noperm_common(name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, LOOKUP_CREATE);
+}
+EXPORT_SYMBOL(start_creating_noperm);
+
+/**
+ * start_removing_noperm - prepare to remove a given name without permission checking
+ * @parent - directory in which to find the name
+ * @name - the name to be removed
+ *
+ * Locks are taken and a lookup is performed prior to removing
+ * an object from a directory.
+ *
+ * If the name doesn't exist, an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * Returns: a positive dentry, or an error.
+ */
+struct dentry *start_removing_noperm(struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_noperm_common(name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, 0);
+}
+EXPORT_SYMBOL(start_removing_noperm);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index 9c12cb844231..e732605924a1 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -152,11 +152,10 @@ xrep_orphanage_create(
}
/* Try to find the orphanage directory. */
- inode_lock_nested(root_inode, I_MUTEX_PARENT);
- orphanage_dentry = lookup_noperm(&QSTR(ORPHANAGE), root_dentry);
+ orphanage_dentry = start_creating_noperm(root_dentry, &QSTR(ORPHANAGE));
if (IS_ERR(orphanage_dentry)) {
error = PTR_ERR(orphanage_dentry);
- goto out_unlock_root;
+ goto out_dput_root;
}
/*
@@ -170,7 +169,7 @@ xrep_orphanage_create(
orphanage_dentry, 0750);
error = PTR_ERR(orphanage_dentry);
if (IS_ERR(orphanage_dentry))
- goto out_unlock_root;
+ goto out_dput_orphanage;
}
/* Not a directory? Bail out. */
@@ -200,9 +199,7 @@ xrep_orphanage_create(
sc->orphanage_ilock_flags = 0;
out_dput_orphanage:
- dput(orphanage_dentry);
-out_unlock_root:
- inode_unlock(VFS_I(sc->mp->m_rootip));
+ end_creating(orphanage_dentry, root_dentry);
out_dput_root:
dput(root_dentry);
out:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 63941fdbc23d..20a88a46fe92 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -92,6 +92,8 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
+struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
+struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
/* end_creating - finish action started with start_creating
* @child - dentry returned by start_creating()
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 093551fe66a7..060e8e9c4f59 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -913,13 +913,11 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
goto out_putname;
ro = mnt_want_write(mnt); /* we'll drop it in any case */
- inode_lock(d_inode(root));
- path.dentry = lookup_noperm(&QSTR(name->name), root);
+ path.dentry = start_creating_noperm(root, &QSTR(name->name));
if (IS_ERR(path.dentry)) {
error = PTR_ERR(path.dentry);
goto out_putfd;
}
- path.mnt = mntget(mnt);
error = prepare_open(path.dentry, oflag, ro, mode, name, attr);
if (!error) {
struct file *file = dentry_open(&path, oflag, current_cred());
@@ -928,13 +926,12 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
else
error = PTR_ERR(file);
}
- path_put(&path);
out_putfd:
if (error) {
put_unused_fd(fd);
fd = error;
}
- inode_unlock(d_inode(root));
+ end_creating(path.dentry, root);
if (!ro)
mnt_drop_write(mnt);
out_putname:
@@ -957,7 +954,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
int err;
struct filename *name;
struct dentry *dentry;
- struct inode *inode = NULL;
+ struct inode *inode;
struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
struct vfsmount *mnt = ipc_ns->mq_mnt;
@@ -969,26 +966,20 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
err = mnt_want_write(mnt);
if (err)
goto out_name;
- inode_lock_nested(d_inode(mnt->mnt_root), I_MUTEX_PARENT);
- dentry = lookup_noperm(&QSTR(name->name), mnt->mnt_root);
+ dentry = start_removing_noperm(mnt->mnt_root, &QSTR(name->name));
if (IS_ERR(dentry)) {
err = PTR_ERR(dentry);
- goto out_unlock;
+ goto out_drop_write;
}
inode = d_inode(dentry);
- if (!inode) {
- err = -ENOENT;
- } else {
- ihold(inode);
- err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
- dentry, NULL);
- }
- dput(dentry);
-
-out_unlock:
- inode_unlock(d_inode(mnt->mnt_root));
+ ihold(inode);
+ err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
+ dentry, NULL);
+ end_removing(dentry);
iput(inode);
+
+out_drop_write:
mnt_drop_write(mnt);
out_name:
putname(name);
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 05/11] VFS: introduce start_creating_noperm() and start_removing_noperm()
2025-09-26 2:49 ` [PATCH 05/11] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
@ 2025-09-28 12:26 ` Amir Goldstein
2025-10-02 17:13 ` Jeff Layton
1 sibling, 0 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-09-28 12:26 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> xfs, fuse, ipc/mqueue need variants of start_creating or start_removing
> which do not check permissions.
> This patch adds _noperm versions of these functions.
>
> Note that do_mq_open() was only calling mntget() so it could call
> path_put() - it didn't really need an extra reference on the mnt.
> Now it doesn't call mntget() and uses end_creating() which does
> the dput() half of path_put().
>
> Signed-off-by: NeilBrown <neil@brown.name>
Feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
But see question below
> ---
> fs/fuse/dir.c | 19 +++++++---------
> fs/namei.c | 48 ++++++++++++++++++++++++++++++++++++++++
> fs/xfs/scrub/orphanage.c | 11 ++++-----
> include/linux/namei.h | 2 ++
> ipc/mqueue.c | 31 +++++++++-----------------
> 5 files changed, 73 insertions(+), 38 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 5c569c3cb53f..88bc512639e2 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1404,27 +1404,25 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> if (!parent)
> return -ENOENT;
>
> - inode_lock_nested(parent, I_MUTEX_PARENT);
> if (!S_ISDIR(parent->i_mode))
> - goto unlock;
> + goto put_parent;
>
> err = -ENOENT;
> dir = d_find_alias(parent);
> if (!dir)
> - goto unlock;
> + goto put_parent;
>
> - name->hash = full_name_hash(dir, name->name, name->len);
> - entry = d_lookup(dir, name);
> + entry = start_removing_noperm(dir, name);
> dput(dir);
> - if (!entry)
> - goto unlock;
> + if (IS_ERR(entry))
> + goto put_parent;
>
> fuse_dir_changed(parent);
> if (!(flags & FUSE_EXPIRE_ONLY))
> d_invalidate(entry);
> fuse_invalidate_entry_cache(entry);
>
> - if (child_nodeid != 0 && d_really_is_positive(entry)) {
> + if (child_nodeid != 0) {
> inode_lock(d_inode(entry));
> if (get_node_id(d_inode(entry)) != child_nodeid) {
> err = -ENOENT;
> @@ -1452,10 +1450,9 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> } else {
> err = 0;
> }
> - dput(entry);
>
> - unlock:
> - inode_unlock(parent);
> + end_removing(entry);
> + put_parent:
> iput(parent);
> return err;
> }
> diff --git a/fs/namei.c b/fs/namei.c
> index 0d9e98961758..bd5c45801756 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3296,6 +3296,54 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing);
>
> +/**
> + * start_creating_noperm - prepare to create a given name without permission checking
> + * @parent - directory in which to prepare to create the name
> + * @name - the name to be created
> + *
> + * Locks are taken and a lookup is performed prior to creating
> + * an object in a directory.
> + *
> + * If the name already exists, a positive dentry is returned.
> + *
> + * Returns: a negative or positive dentry, or an error.
> + */
> +struct dentry *start_creating_noperm(struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_noperm_common(name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, LOOKUP_CREATE);
> +}
> +EXPORT_SYMBOL(start_creating_noperm);
> +
> +/**
> + * start_removing_noperm - prepare to remove a given name without permission checking
> + * @parent - directory in which to find the name
> + * @name - the name to be removed
> + *
> + * Locks are taken and a lookup is performed prior to removing
> + * an object from a directory.
> + *
> + * If the name doesn't exist, an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * Returns: a positive dentry, or an error.
I noticed that this does not say "...whose parent is @parent"
and "whose d_parent/d_name are guaranteed to remain stable
until the call to end_removing()".
Do you think this is something that should be spelled out before
the parent lock is dropped?
The reason I am asking is...
> + */
> +struct dentry *start_removing_noperm(struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_noperm_common(name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, 0);
> +}
> +EXPORT_SYMBOL(start_removing_noperm);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
> index 9c12cb844231..e732605924a1 100644
> --- a/fs/xfs/scrub/orphanage.c
> +++ b/fs/xfs/scrub/orphanage.c
> @@ -152,11 +152,10 @@ xrep_orphanage_create(
> }
>
> /* Try to find the orphanage directory. */
> - inode_lock_nested(root_inode, I_MUTEX_PARENT);
> - orphanage_dentry = lookup_noperm(&QSTR(ORPHANAGE), root_dentry);
> + orphanage_dentry = start_creating_noperm(root_dentry, &QSTR(ORPHANAGE));
> if (IS_ERR(orphanage_dentry)) {
> error = PTR_ERR(orphanage_dentry);
> - goto out_unlock_root;
> + goto out_dput_root;
> }
>
> /*
> @@ -170,7 +169,7 @@ xrep_orphanage_create(
> orphanage_dentry, 0750);
> error = PTR_ERR(orphanage_dentry);
> if (IS_ERR(orphanage_dentry))
> - goto out_unlock_root;
> + goto out_dput_orphanage;
> }
>
> /* Not a directory? Bail out. */
> @@ -200,9 +199,7 @@ xrep_orphanage_create(
> sc->orphanage_ilock_flags = 0;
>
> out_dput_orphanage:
> - dput(orphanage_dentry);
> -out_unlock_root:
> - inode_unlock(VFS_I(sc->mp->m_rootip));
> + end_creating(orphanage_dentry, root_dentry);
> out_dput_root:
> dput(root_dentry);
> out:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 63941fdbc23d..20a88a46fe92 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -92,6 +92,8 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> +struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> +struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
>
> /* end_creating - finish action started with start_creating
> * @child - dentry returned by start_creating()
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 093551fe66a7..060e8e9c4f59 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -913,13 +913,11 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> goto out_putname;
>
> ro = mnt_want_write(mnt); /* we'll drop it in any case */
> - inode_lock(d_inode(root));
> - path.dentry = lookup_noperm(&QSTR(name->name), root);
> + path.dentry = start_creating_noperm(root, &QSTR(name->name));
> if (IS_ERR(path.dentry)) {
> error = PTR_ERR(path.dentry);
> goto out_putfd;
> }
> - path.mnt = mntget(mnt);
> error = prepare_open(path.dentry, oflag, ro, mode, name, attr);
> if (!error) {
> struct file *file = dentry_open(&path, oflag, current_cred());
> @@ -928,13 +926,12 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> else
> error = PTR_ERR(file);
> }
> - path_put(&path);
> out_putfd:
> if (error) {
> put_unused_fd(fd);
> fd = error;
> }
> - inode_unlock(d_inode(root));
> + end_creating(path.dentry, root);
> if (!ro)
> mnt_drop_write(mnt);
> out_putname:
> @@ -957,7 +954,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> int err;
> struct filename *name;
> struct dentry *dentry;
> - struct inode *inode = NULL;
> + struct inode *inode;
> struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
> struct vfsmount *mnt = ipc_ns->mq_mnt;
>
> @@ -969,26 +966,20 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> err = mnt_want_write(mnt);
> if (err)
> goto out_name;
> - inode_lock_nested(d_inode(mnt->mnt_root), I_MUTEX_PARENT);
> - dentry = lookup_noperm(&QSTR(name->name), mnt->mnt_root);
> + dentry = start_removing_noperm(mnt->mnt_root, &QSTR(name->name));
> if (IS_ERR(dentry)) {
> err = PTR_ERR(dentry);
> - goto out_unlock;
> + goto out_drop_write;
> }
>
> inode = d_inode(dentry);
> - if (!inode) {
> - err = -ENOENT;
> - } else {
> - ihold(inode);
> - err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> - dentry, NULL);
> - }
> - dput(dentry);
> -
> -out_unlock:
> - inode_unlock(d_inode(mnt->mnt_root));
> + ihold(inode);
> + err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> + dentry, NULL);
This code (rightly) assumes that mnt->mnt_root == dentry->d_parent.
Maybe that's obvious and does not need any clarification, but since
you are properly documenting a new interface, maybe this is worth
mentioning for clarity.
Really up to you.
Thanks,
Amir.
> + end_removing(dentry);
> iput(inode);
> +
> +out_drop_write:
> mnt_drop_write(mnt);
> out_name:
> putname(name);
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 05/11] VFS: introduce start_creating_noperm() and start_removing_noperm()
2025-09-26 2:49 ` [PATCH 05/11] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
2025-09-28 12:26 ` Amir Goldstein
@ 2025-10-02 17:13 ` Jeff Layton
1 sibling, 0 replies; 49+ messages in thread
From: Jeff Layton @ 2025-10-02 17:13 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein
Cc: Jan Kara, linux-fsdevel
On Fri, 2025-09-26 at 12:49 +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> xfs, fuse, ipc/mqueue need variants of start_creating or start_removing
> which do not check permissions.
> This patch adds _noperm versions of these functions.
>
> Note that do_mq_open() was only calling mntget() so it could call
> path_put() - it didn't really need an extra reference on the mnt.
> Now it doesn't call mntget() and uses end_creating() which does
> the dput() half of path_put().
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/fuse/dir.c | 19 +++++++---------
> fs/namei.c | 48 ++++++++++++++++++++++++++++++++++++++++
> fs/xfs/scrub/orphanage.c | 11 ++++-----
> include/linux/namei.h | 2 ++
> ipc/mqueue.c | 31 +++++++++-----------------
> 5 files changed, 73 insertions(+), 38 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 5c569c3cb53f..88bc512639e2 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1404,27 +1404,25 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> if (!parent)
> return -ENOENT;
>
> - inode_lock_nested(parent, I_MUTEX_PARENT);
> if (!S_ISDIR(parent->i_mode))
> - goto unlock;
> + goto put_parent;
>
> err = -ENOENT;
> dir = d_find_alias(parent);
> if (!dir)
> - goto unlock;
> + goto put_parent;
>
> - name->hash = full_name_hash(dir, name->name, name->len);
> - entry = d_lookup(dir, name);
> + entry = start_removing_noperm(dir, name);
> dput(dir);
> - if (!entry)
> - goto unlock;
> + if (IS_ERR(entry))
> + goto put_parent;
>
> fuse_dir_changed(parent);
> if (!(flags & FUSE_EXPIRE_ONLY))
> d_invalidate(entry);
> fuse_invalidate_entry_cache(entry);
>
> - if (child_nodeid != 0 && d_really_is_positive(entry)) {
> + if (child_nodeid != 0) {
> inode_lock(d_inode(entry));
> if (get_node_id(d_inode(entry)) != child_nodeid) {
> err = -ENOENT;
> @@ -1452,10 +1450,9 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> } else {
> err = 0;
> }
> - dput(entry);
>
> - unlock:
> - inode_unlock(parent);
> + end_removing(entry);
> + put_parent:
> iput(parent);
> return err;
> }
> diff --git a/fs/namei.c b/fs/namei.c
> index 0d9e98961758..bd5c45801756 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3296,6 +3296,54 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing);
>
> +/**
> + * start_creating_noperm - prepare to create a given name without permission checking
> + * @parent - directory in which to prepare to create the name
> + * @name - the name to be created
> + *
> + * Locks are taken and a lookup is performed prior to creating
> + * an object in a directory.
> + *
> + * If the name already exists, a positive dentry is returned.
> + *
> + * Returns: a negative or positive dentry, or an error.
> + */
> +struct dentry *start_creating_noperm(struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_noperm_common(name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, LOOKUP_CREATE);
> +}
> +EXPORT_SYMBOL(start_creating_noperm);
> +
> +/**
> + * start_removing_noperm - prepare to remove a given name without permission checking
> + * @parent - directory in which to find the name
> + * @name - the name to be removed
> + *
> + * Locks are taken and a lookup is performed prior to removing
> + * an object from a directory.
> + *
> + * If the name doesn't exist, an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * Returns: a positive dentry, or an error.
> + */
> +struct dentry *start_removing_noperm(struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_noperm_common(name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, 0);
> +}
> +EXPORT_SYMBOL(start_removing_noperm);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
> index 9c12cb844231..e732605924a1 100644
> --- a/fs/xfs/scrub/orphanage.c
> +++ b/fs/xfs/scrub/orphanage.c
> @@ -152,11 +152,10 @@ xrep_orphanage_create(
> }
>
> /* Try to find the orphanage directory. */
> - inode_lock_nested(root_inode, I_MUTEX_PARENT);
> - orphanage_dentry = lookup_noperm(&QSTR(ORPHANAGE), root_dentry);
> + orphanage_dentry = start_creating_noperm(root_dentry, &QSTR(ORPHANAGE));
> if (IS_ERR(orphanage_dentry)) {
> error = PTR_ERR(orphanage_dentry);
> - goto out_unlock_root;
> + goto out_dput_root;
> }
>
> /*
> @@ -170,7 +169,7 @@ xrep_orphanage_create(
> orphanage_dentry, 0750);
> error = PTR_ERR(orphanage_dentry);
> if (IS_ERR(orphanage_dentry))
> - goto out_unlock_root;
> + goto out_dput_orphanage;
> }
>
> /* Not a directory? Bail out. */
> @@ -200,9 +199,7 @@ xrep_orphanage_create(
> sc->orphanage_ilock_flags = 0;
>
> out_dput_orphanage:
> - dput(orphanage_dentry);
> -out_unlock_root:
> - inode_unlock(VFS_I(sc->mp->m_rootip));
> + end_creating(orphanage_dentry, root_dentry);
> out_dput_root:
> dput(root_dentry);
> out:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 63941fdbc23d..20a88a46fe92 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -92,6 +92,8 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> +struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> +struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
>
> /* end_creating - finish action started with start_creating
> * @child - dentry returned by start_creating()
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 093551fe66a7..060e8e9c4f59 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -913,13 +913,11 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> goto out_putname;
>
> ro = mnt_want_write(mnt); /* we'll drop it in any case */
> - inode_lock(d_inode(root));
> - path.dentry = lookup_noperm(&QSTR(name->name), root);
> + path.dentry = start_creating_noperm(root, &QSTR(name->name));
> if (IS_ERR(path.dentry)) {
> error = PTR_ERR(path.dentry);
> goto out_putfd;
> }
> - path.mnt = mntget(mnt);
> error = prepare_open(path.dentry, oflag, ro, mode, name, attr);
> if (!error) {
> struct file *file = dentry_open(&path, oflag, current_cred());
> @@ -928,13 +926,12 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> else
> error = PTR_ERR(file);
> }
> - path_put(&path);
> out_putfd:
> if (error) {
> put_unused_fd(fd);
> fd = error;
> }
> - inode_unlock(d_inode(root));
> + end_creating(path.dentry, root);
> if (!ro)
> mnt_drop_write(mnt);
> out_putname:
> @@ -957,7 +954,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> int err;
> struct filename *name;
> struct dentry *dentry;
> - struct inode *inode = NULL;
> + struct inode *inode;
> struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
> struct vfsmount *mnt = ipc_ns->mq_mnt;
>
> @@ -969,26 +966,20 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> err = mnt_want_write(mnt);
> if (err)
> goto out_name;
> - inode_lock_nested(d_inode(mnt->mnt_root), I_MUTEX_PARENT);
> - dentry = lookup_noperm(&QSTR(name->name), mnt->mnt_root);
> + dentry = start_removing_noperm(mnt->mnt_root, &QSTR(name->name));
> if (IS_ERR(dentry)) {
> err = PTR_ERR(dentry);
> - goto out_unlock;
> + goto out_drop_write;
> }
>
> inode = d_inode(dentry);
> - if (!inode) {
> - err = -ENOENT;
> - } else {
> - ihold(inode);
> - err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> - dentry, NULL);
> - }
> - dput(dentry);
> -
> -out_unlock:
> - inode_unlock(d_inode(mnt->mnt_root));
> + ihold(inode);
> + err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> + dentry, NULL);
> + end_removing(dentry);
> iput(inode);
> +
> +out_drop_write:
> mnt_drop_write(mnt);
> out_name:
> putname(name);
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 06/11] VFS: introduce start_removing_dentry()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (4 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 05/11] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-27 9:32 ` Amir Goldstein
2025-10-02 17:19 ` Jeff Layton
2025-09-26 2:49 ` [PATCH 07/11] VFS: add start_creating_killable() and start_removing_killable() NeilBrown
` (5 subsequent siblings)
11 siblings, 2 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_removing_dentry() is similar to start_removing() but instead of
providing a name for lookup, the target dentry is given.
start_removing_dentry() checks that the dentry is still hashed and in
the parent, and if so it locks and increases the refcount so that
end_removing() can be used to finish the operation.
This is used in cachefiles, overlayfs, smb/server, and apparmor.
There will be other users including ecryptfs.
As start_removing_dentry() takes an extra reference to the dentry (to be
put by end_removing()), there is no need to explicitly take an extra
reference to stop d_delete() from using dentry_unlink_inode() to negate
the dentry - as in cachefiles_delete_object(), and ksmbd_vfs_unlink().
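As an illustration only, the contract described above can be modelled in
userspace C. This is a sketch, not the fs/namei.c code: struct model_dentry,
the "locked" flag standing in for the parent's inode lock, and the model_*
function names are all assumptions made for the example, and the ERR_PTR
helpers are local stand-ins for the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily simplified model of a dentry. */
struct model_dentry {
	struct model_dentry *parent;
	bool locked;	/* models the directory inode lock (single-threaded) */
	bool dead;	/* models IS_DEADDIR() on a directory */
	bool hashed;	/* models !d_unhashed() */
	int refcount;
};

/*
 * Lock the parent, then validate the child: parent still alive, child
 * still in this parent, child still hashed. On success an extra
 * reference is taken; on failure the lock is dropped again.
 */
static struct model_dentry *
model_start_removing_dentry(struct model_dentry *parent,
			    struct model_dentry *child)
{
	assert(!parent->locked);
	parent->locked = true;
	if (parent->dead || child->parent != parent || !child->hashed) {
		parent->locked = false;
		return ERR_PTR(-EINVAL);
	}
	child->refcount++;
	return child;
}

/*
 * Drop the reference and the parent lock. Error pointers are ignored,
 * so callers may pass the start result through unconditionally.
 */
static void model_end_removing(struct model_dentry *child)
{
	if (IS_ERR(child))
		return;
	child->refcount--;
	child->parent->locked = false;
}
```

Because model_end_removing() ignores error pointers, a caller can pass it the
result of model_start_removing_dentry() without a separate error path, which
is how the converted callers in this patch use end_removing().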
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/interface.c | 14 +++++++++-----
fs/cachefiles/namei.c | 22 ++++++++++++----------
fs/cachefiles/volume.c | 10 +++++++---
fs/namei.c | 29 +++++++++++++++++++++++++++++
fs/overlayfs/dir.c | 10 ++++------
fs/overlayfs/readdir.c | 8 ++++----
fs/smb/server/vfs.c | 27 ++++-----------------------
include/linux/namei.h | 2 ++
security/apparmor/apparmorfs.c | 8 ++++----
9 files changed, 75 insertions(+), 55 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 3e63cfe15874..3f8a6f1a8fc3 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -9,6 +9,7 @@
#include <linux/mount.h>
#include <linux/xattr.h>
#include <linux/file.h>
+#include <linux/namei.h>
#include <linux/falloc.h>
#include <trace/events/fscache.h>
#include "internal.h"
@@ -428,11 +429,14 @@ static bool cachefiles_invalidate_cookie(struct fscache_cookie *cookie)
if (!old_tmpfile) {
struct cachefiles_volume *volume = object->volume;
struct dentry *fan = volume->fanout[(u8)cookie->key_hash];
-
- inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
- cachefiles_bury_object(volume->cache, object, fan,
- old_file->f_path.dentry,
- FSCACHE_OBJECT_INVALIDATED);
+ struct dentry *obj;
+
+ obj = start_removing_dentry(fan, old_file->f_path.dentry);
+ if (!IS_ERR(obj))
+ cachefiles_bury_object(volume->cache, object,
+ fan, obj,
+ FSCACHE_OBJECT_INVALIDATED);
+ end_removing(obj);
}
fput(old_file);
}
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 3064d439807b..80a3055d8ae5 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -424,13 +424,12 @@ int cachefiles_delete_object(struct cachefiles_object *object,
_enter(",OBJ%x{%pD}", object->debug_id, object->file);
- /* Stop the dentry being negated if it's only pinned by a file struct. */
- dget(dentry);
-
- inode_lock_nested(d_backing_inode(fan), I_MUTEX_PARENT);
- ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
- inode_unlock(d_backing_inode(fan));
- dput(dentry);
+ dentry = start_removing_dentry(fan, dentry);
+ if (IS_ERR(dentry))
+ ret = PTR_ERR(dentry);
+ else
+ ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
+ end_removing(dentry);
return ret;
}
@@ -643,9 +642,12 @@ bool cachefiles_look_up_object(struct cachefiles_object *object)
if (!d_is_reg(dentry)) {
pr_err("%pd is not a file\n", dentry);
- inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
- ret = cachefiles_bury_object(volume->cache, object, fan, dentry,
- FSCACHE_OBJECT_IS_WEIRD);
+ struct dentry *de = start_removing_dentry(fan, dentry);
+ if (!IS_ERR(de))
+ ret = cachefiles_bury_object(volume->cache, object,
+ fan, de,
+ FSCACHE_OBJECT_IS_WEIRD);
+ end_removing(de);
dput(dentry);
if (ret < 0)
return false;
diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
index 781aac4ef274..ddf95ff5daf0 100644
--- a/fs/cachefiles/volume.c
+++ b/fs/cachefiles/volume.c
@@ -7,6 +7,7 @@
#include <linux/fs.h>
#include <linux/slab.h>
+#include <linux/namei.h>
#include "internal.h"
#include <trace/events/fscache.h>
@@ -58,9 +59,12 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
if (ret < 0) {
if (ret != -ESTALE)
goto error_dir;
- inode_lock_nested(d_inode(cache->store), I_MUTEX_PARENT);
- cachefiles_bury_object(cache, NULL, cache->store, vdentry,
- FSCACHE_VOLUME_IS_WEIRD);
+ vdentry = start_removing_dentry(cache->store, vdentry);
+ if (!IS_ERR(vdentry))
+ cachefiles_bury_object(cache, NULL, cache->store,
+ vdentry,
+ FSCACHE_VOLUME_IS_WEIRD);
+ end_removing(vdentry);
cachefiles_put_directory(volume->dentry);
cond_resched();
goto retry;
diff --git a/fs/namei.c b/fs/namei.c
index bd5c45801756..cb4d40af12ae 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3344,6 +3344,35 @@ struct dentry *start_removing_noperm(struct dentry *parent,
}
EXPORT_SYMBOL(start_removing_noperm);
+/**
+ * start_removing_dentry - prepare to remove a given dentry
+ * @parent - directory from which dentry should be removed
+ * @child - the dentry to be removed
+ *
+ * A lock is taken to protect the dentry against other dirops and
+ * the validity of the dentry is checked: correct parent and still hashed.
+ *
+ * If the dentry is valid a reference is taken and returned. If not
+ * an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * Returns: the valid dentry, or an error.
+ */
+struct dentry *start_removing_dentry(struct dentry *parent,
+ struct dentry *child)
+{
+ inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
+ if (unlikely(IS_DEADDIR(parent->d_inode) ||
+ child->d_parent != parent ||
+ d_unhashed(child))) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-EINVAL);
+ }
+ return dget(child);
+}
+EXPORT_SYMBOL(start_removing_dentry);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index c4057b4a050d..74b1ef5860a4 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -47,14 +47,12 @@ static int ovl_cleanup_locked(struct ovl_fs *ofs, struct inode *wdir,
int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
struct dentry *wdentry)
{
- int err;
-
- err = ovl_parent_lock(workdir, wdentry);
- if (err)
- return err;
+ wdentry = start_removing_dentry(workdir, wdentry);
+ if (IS_ERR(wdentry))
+ return PTR_ERR(wdentry);
ovl_cleanup_locked(ofs, workdir->d_inode, wdentry);
- ovl_parent_unlock(workdir);
+ end_removing(wdentry);
return 0;
}
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 15cb06fa0c9a..213ff42556e7 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1158,11 +1158,11 @@ int ovl_workdir_cleanup(struct ovl_fs *ofs, struct dentry *parent,
if (!d_is_dir(dentry) || level > 1)
return ovl_cleanup(ofs, parent, dentry);
- err = ovl_parent_lock(parent, dentry);
- if (err)
- return err;
+ dentry = start_removing_dentry(parent, dentry);
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);
err = ovl_do_rmdir(ofs, parent->d_inode, dentry);
- ovl_parent_unlock(parent);
+ end_removing(dentry);
if (err) {
struct path path = { .mnt = mnt, .dentry = dentry };
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 1cfa688904b2..56b755a05c4e 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -48,24 +48,6 @@ static void ksmbd_vfs_inherit_owner(struct ksmbd_work *work,
i_uid_write(inode, i_uid_read(parent_inode));
}
-/**
- * ksmbd_vfs_lock_parent() - lock parent dentry if it is stable
- * @parent: parent dentry
- * @child: child dentry
- *
- * Returns: %0 on success, %-ENOENT if the parent dentry is not stable
- */
-int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child)
-{
- inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
- if (child->d_parent != parent) {
- inode_unlock(d_inode(parent));
- return -ENOENT;
- }
-
- return 0;
-}
-
static int ksmbd_vfs_path_lookup(struct ksmbd_share_config *share_conf,
char *pathname, unsigned int flags,
struct path *path, bool do_lock)
@@ -1083,18 +1065,17 @@ int ksmbd_vfs_unlink(struct file *filp)
return err;
dir = dget_parent(dentry);
- err = ksmbd_vfs_lock_parent(dir, dentry);
- if (err)
+ dentry = start_removing_dentry(dir, dentry);
+ err = PTR_ERR(dentry);
+ if (IS_ERR(dentry))
goto out;
- dget(dentry);
if (S_ISDIR(d_inode(dentry)->i_mode))
err = vfs_rmdir(idmap, d_inode(dir), dentry);
else
err = vfs_unlink(idmap, d_inode(dir), dentry, NULL);
- dput(dentry);
- inode_unlock(d_inode(dir));
+ end_removing(dentry);
if (err)
ksmbd_debug(VFS, "failed to delete, err %d\n", err);
out:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 20a88a46fe92..32a007f1043e 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -94,6 +94,8 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
+struct dentry *start_removing_dentry(struct dentry *parent,
+ struct dentry *child);
/* end_creating - finish action started with start_creating
* @child - dentry returned by start_creating()
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 391a586d0557..9d08d103f142 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -355,17 +355,17 @@ static void aafs_remove(struct dentry *dentry)
if (!dentry || IS_ERR(dentry))
return;
+ /* ->d_parent is stable as rename is not supported */
dir = d_inode(dentry->d_parent);
- inode_lock(dir);
- if (simple_positive(dentry)) {
+ dentry = start_removing_dentry(dentry->d_parent, dentry);
+ if (!IS_ERR(dentry) && simple_positive(dentry)) {
if (d_is_dir(dentry))
simple_rmdir(dir, dentry);
else
simple_unlink(dir, dentry);
d_delete(dentry);
- dput(dentry);
}
- inode_unlock(dir);
+ end_removing(dentry);
simple_release_fs(&aafs_mnt, &aafs_count);
}
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 06/11] VFS: introduce start_removing_dentry()
2025-09-26 2:49 ` [PATCH 06/11] VFS: introduce start_removing_dentry() NeilBrown
@ 2025-09-27 9:32 ` Amir Goldstein
2025-09-27 11:55 ` NeilBrown
2025-10-02 17:19 ` Jeff Layton
1 sibling, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-27 9:32 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> start_removing_dentry() is similar to start_removing() but instead of
> providing a name for lookup, the target dentry is given.
>
> start_removing_dentry() checks that the dentry is still hashed and in
> the parent, and if so it locks and increases the refcount so that
> end_removing() can be used to finish the operation.
>
> This is used in cachefiles, overlayfs, smb/server, and apparmor.
>
> There will be other users including ecryptfs.
>
> As start_removing_dentry() takes an extra reference to the dentry (to be
> put by end_removing()), there is no need to explicitly take an extra
> reference to stop d_delete() from using dentry_unlink_inode() to negate
> the dentry - as in cachefiles_delete_object(), and ksmbd_vfs_unlink().
>
> Signed-off-by: NeilBrown <neil@brown.name>
Feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
After answering/fixing the questions below...
> ---
> fs/cachefiles/interface.c | 14 +++++++++-----
> fs/cachefiles/namei.c | 22 ++++++++++++----------
> fs/cachefiles/volume.c | 10 +++++++---
> fs/namei.c | 29 +++++++++++++++++++++++++++++
> fs/overlayfs/dir.c | 10 ++++------
> fs/overlayfs/readdir.c | 8 ++++----
> fs/smb/server/vfs.c | 27 ++++-----------------------
> include/linux/namei.h | 2 ++
> security/apparmor/apparmorfs.c | 8 ++++----
> 9 files changed, 75 insertions(+), 55 deletions(-)
>
> diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> index 3e63cfe15874..3f8a6f1a8fc3 100644
> --- a/fs/cachefiles/interface.c
> +++ b/fs/cachefiles/interface.c
> @@ -9,6 +9,7 @@
> #include <linux/mount.h>
> #include <linux/xattr.h>
> #include <linux/file.h>
> +#include <linux/namei.h>
> #include <linux/falloc.h>
> #include <trace/events/fscache.h>
> #include "internal.h"
> @@ -428,11 +429,14 @@ static bool cachefiles_invalidate_cookie(struct fscache_cookie *cookie)
> if (!old_tmpfile) {
> struct cachefiles_volume *volume = object->volume;
> struct dentry *fan = volume->fanout[(u8)cookie->key_hash];
> -
> - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> - cachefiles_bury_object(volume->cache, object, fan,
> - old_file->f_path.dentry,
> - FSCACHE_OBJECT_INVALIDATED);
> + struct dentry *obj;
> +
> + obj = start_removing_dentry(fan, old_file->f_path.dentry);
> + if (!IS_ERR(obj))
> + cachefiles_bury_object(volume->cache, object,
> + fan, obj,
> + FSCACHE_OBJECT_INVALIDATED);
> + end_removing(obj);
> }
> fput(old_file);
> }
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 3064d439807b..80a3055d8ae5 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -424,13 +424,12 @@ int cachefiles_delete_object(struct cachefiles_object *object,
>
> _enter(",OBJ%x{%pD}", object->debug_id, object->file);
>
> - /* Stop the dentry being negated if it's only pinned by a file struct. */
> - dget(dentry);
> -
> - inode_lock_nested(d_backing_inode(fan), I_MUTEX_PARENT);
> - ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
> - inode_unlock(d_backing_inode(fan));
> - dput(dentry);
> + dentry = start_removing_dentry(fan, dentry);
> + if (IS_ERR(dentry))
> + ret = PTR_ERR(dentry);
> + else
> + ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
> + end_removing(dentry);
> return ret;
> }
>
> @@ -643,9 +642,12 @@ bool cachefiles_look_up_object(struct cachefiles_object *object)
>
> if (!d_is_reg(dentry)) {
> pr_err("%pd is not a file\n", dentry);
> - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> - ret = cachefiles_bury_object(volume->cache, object, fan, dentry,
> - FSCACHE_OBJECT_IS_WEIRD);
> + struct dentry *de = start_removing_dentry(fan, dentry);
> + if (!IS_ERR(de))
I see that other callers do not check return value from
cachefiles_bury_object(), but this call site does.
Shouldn't we treat this error as well (assign it to ret)?
Thanks,
Amir.
> + ret = cachefiles_bury_object(volume->cache, object,
> + fan, de,
> + FSCACHE_OBJECT_IS_WEIRD);
> + end_removing(de);
> dput(dentry);
> if (ret < 0)
> return false;
> diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
> index 781aac4ef274..ddf95ff5daf0 100644
> --- a/fs/cachefiles/volume.c
> +++ b/fs/cachefiles/volume.c
> @@ -7,6 +7,7 @@
>
> #include <linux/fs.h>
> #include <linux/slab.h>
> +#include <linux/namei.h>
> #include "internal.h"
> #include <trace/events/fscache.h>
>
> @@ -58,9 +59,12 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
> if (ret < 0) {
> if (ret != -ESTALE)
> goto error_dir;
> - inode_lock_nested(d_inode(cache->store), I_MUTEX_PARENT);
> - cachefiles_bury_object(cache, NULL, cache->store, vdentry,
> - FSCACHE_VOLUME_IS_WEIRD);
> + vdentry = start_removing_dentry(cache->store, vdentry);
> + if (!IS_ERR(vdentry))
> + cachefiles_bury_object(cache, NULL, cache->store,
> + vdentry,
> + FSCACHE_VOLUME_IS_WEIRD);
> + end_removing(vdentry);
> cachefiles_put_directory(volume->dentry);
> cond_resched();
> goto retry;
> diff --git a/fs/namei.c b/fs/namei.c
> index bd5c45801756..cb4d40af12ae 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3344,6 +3344,35 @@ struct dentry *start_removing_noperm(struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing_noperm);
>
> +/**
> + * start_removing_dentry - prepare to remove a given dentry
> + * @parent - directory from which dentry should be removed
> + * @child - the dentry to be removed
> + *
> + * A lock is taken to protect the dentry again other dirops and
> + * the validity of the dentry is checked: correct parent and still hashed.
> + *
> + * If the dentry is valid a reference is taken and returned. If not
> + * an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * Returns: the valid dentry, or an error.
> + */
> +struct dentry *start_removing_dentry(struct dentry *parent,
> + struct dentry *child)
> +{
> + inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> + if (unlikely(IS_DEADDIR(parent->d_inode) ||
> + child->d_parent != parent ||
> + d_unhashed(child))) {
> + inode_unlock(parent->d_inode);
> + return ERR_PTR(-EINVAL);
> + }
> + return dget(child);
> +}
> +EXPORT_SYMBOL(start_removing_dentry);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index c4057b4a050d..74b1ef5860a4 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -47,14 +47,12 @@ static int ovl_cleanup_locked(struct ovl_fs *ofs, struct inode *wdir,
> int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> struct dentry *wdentry)
> {
> - int err;
> -
> - err = ovl_parent_lock(workdir, wdentry);
> - if (err)
> - return err;
> + wdentry = start_removing_dentry(workdir, wdentry);
> + if (IS_ERR(wdentry))
> + return PTR_ERR(wdentry);
>
> ovl_cleanup_locked(ofs, workdir->d_inode, wdentry);
> - ovl_parent_unlock(workdir);
> + end_removing(wdentry);
>
> return 0;
> }
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 15cb06fa0c9a..213ff42556e7 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -1158,11 +1158,11 @@ int ovl_workdir_cleanup(struct ovl_fs *ofs, struct dentry *parent,
> if (!d_is_dir(dentry) || level > 1)
> return ovl_cleanup(ofs, parent, dentry);
>
> - err = ovl_parent_lock(parent, dentry);
> - if (err)
> - return err;
> + dentry = start_removing_dentry(parent, dentry);
> + if (IS_ERR(dentry))
> + return PTR_ERR(dentry);
> err = ovl_do_rmdir(ofs, parent->d_inode, dentry);
> - ovl_parent_unlock(parent);
> + end_removing(dentry);
> if (err) {
> struct path path = { .mnt = mnt, .dentry = dentry };
>
> diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
> index 1cfa688904b2..56b755a05c4e 100644
> --- a/fs/smb/server/vfs.c
> +++ b/fs/smb/server/vfs.c
> @@ -48,24 +48,6 @@ static void ksmbd_vfs_inherit_owner(struct ksmbd_work *work,
> i_uid_write(inode, i_uid_read(parent_inode));
> }
>
> -/**
> - * ksmbd_vfs_lock_parent() - lock parent dentry if it is stable
> - * @parent: parent dentry
> - * @child: child dentry
> - *
> - * Returns: %0 on success, %-ENOENT if the parent dentry is not stable
> - */
> -int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child)
> -{
> - inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
> - if (child->d_parent != parent) {
> - inode_unlock(d_inode(parent));
> - return -ENOENT;
> - }
> -
> - return 0;
> -}
> -
> static int ksmbd_vfs_path_lookup(struct ksmbd_share_config *share_conf,
> char *pathname, unsigned int flags,
> struct path *path, bool do_lock)
> @@ -1083,18 +1065,17 @@ int ksmbd_vfs_unlink(struct file *filp)
> return err;
>
> dir = dget_parent(dentry);
> - err = ksmbd_vfs_lock_parent(dir, dentry);
> - if (err)
> + dentry = start_removing_dentry(dir, dentry);
> + err = PTR_ERR(dentry);
> + if (IS_ERR(dentry))
> goto out;
> - dget(dentry);
>
> if (S_ISDIR(d_inode(dentry)->i_mode))
> err = vfs_rmdir(idmap, d_inode(dir), dentry);
> else
> err = vfs_unlink(idmap, d_inode(dir), dentry, NULL);
>
> - dput(dentry);
> - inode_unlock(d_inode(dir));
> + end_removing(dentry);
> if (err)
> ksmbd_debug(VFS, "failed to delete, err %d\n", err);
> out:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 20a88a46fe92..32a007f1043e 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -94,6 +94,8 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
> +struct dentry *start_removing_dentry(struct dentry *parent,
> + struct dentry *child);
>
> /* end_creating - finish action started with start_creating
> * @child - dentry returned by start_creating()
> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> index 391a586d0557..9d08d103f142 100644
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -355,17 +355,17 @@ static void aafs_remove(struct dentry *dentry)
> if (!dentry || IS_ERR(dentry))
> return;
>
> + /* ->d_parent is stable as rename is not supported */
> dir = d_inode(dentry->d_parent);
> - inode_lock(dir);
> - if (simple_positive(dentry)) {
> + dentry = start_removing_dentry(dentry->d_parent, dentry);
> + if (!IS_ERR(dentry) && simple_positive(dentry)) {
> if (d_is_dir(dentry))
> simple_rmdir(dir, dentry);
> else
> simple_unlink(dir, dentry);
> d_delete(dentry);
> - dput(dentry);
> }
> - inode_unlock(dir);
> + end_removing(dentry);
> simple_release_fs(&aafs_mnt, &aafs_count);
> }
>
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 06/11] VFS: introduce start_removing_dentry()
2025-09-27 9:32 ` Amir Goldstein
@ 2025-09-27 11:55 ` NeilBrown
0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-27 11:55 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sat, 27 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > start_removing_dentry() is similar to start_removing() but instead of
> > providing a name for lookup, the target dentry is given.
> >
> > start_removing_dentry() checks that the dentry is still hashed and in
> > the parent, and if so it locks and increases the refcount so that
> > end_removing() can be used to finish the operation.
> >
> > This is used in cachefiles, overlayfs, smb/server, and apparmor.
> >
> > There will be other users including ecryptfs.
> >
> > As start_removing_dentry() takes an extra reference to the dentry (to be
> > put by end_removing()), there is no need to explicitly take an extra
> > reference to stop d_delete() from using dentry_unlink_inode() to negate
> > the dentry - as in cachefiles_delete_object(), and ksmbd_vfs_unlink().
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
>
> Feel free to add:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>
> After answering/fixing the questions below...
>
> > ---
> > fs/cachefiles/interface.c | 14 +++++++++-----
> > fs/cachefiles/namei.c | 22 ++++++++++++----------
> > fs/cachefiles/volume.c | 10 +++++++---
> > fs/namei.c | 29 +++++++++++++++++++++++++++++
> > fs/overlayfs/dir.c | 10 ++++------
> > fs/overlayfs/readdir.c | 8 ++++----
> > fs/smb/server/vfs.c | 27 ++++-----------------------
> > include/linux/namei.h | 2 ++
> > security/apparmor/apparmorfs.c | 8 ++++----
> > 9 files changed, 75 insertions(+), 55 deletions(-)
> >
> > diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> > index 3e63cfe15874..3f8a6f1a8fc3 100644
> > --- a/fs/cachefiles/interface.c
> > +++ b/fs/cachefiles/interface.c
> > @@ -9,6 +9,7 @@
> > #include <linux/mount.h>
> > #include <linux/xattr.h>
> > #include <linux/file.h>
> > +#include <linux/namei.h>
> > #include <linux/falloc.h>
> > #include <trace/events/fscache.h>
> > #include "internal.h"
> > @@ -428,11 +429,14 @@ static bool cachefiles_invalidate_cookie(struct fscache_cookie *cookie)
> > if (!old_tmpfile) {
> > struct cachefiles_volume *volume = object->volume;
> > struct dentry *fan = volume->fanout[(u8)cookie->key_hash];
> > -
> > - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> > - cachefiles_bury_object(volume->cache, object, fan,
> > - old_file->f_path.dentry,
> > - FSCACHE_OBJECT_INVALIDATED);
> > + struct dentry *obj;
> > +
> > + obj = start_removing_dentry(fan, old_file->f_path.dentry);
> > + if (!IS_ERR(obj))
> > + cachefiles_bury_object(volume->cache, object,
> > + fan, obj,
> > + FSCACHE_OBJECT_INVALIDATED);
> > + end_removing(obj);
> > }
> > fput(old_file);
> > }
> > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> > index 3064d439807b..80a3055d8ae5 100644
> > --- a/fs/cachefiles/namei.c
> > +++ b/fs/cachefiles/namei.c
> > @@ -424,13 +424,12 @@ int cachefiles_delete_object(struct cachefiles_object *object,
> >
> > _enter(",OBJ%x{%pD}", object->debug_id, object->file);
> >
> > - /* Stop the dentry being negated if it's only pinned by a file struct. */
> > - dget(dentry);
> > -
> > - inode_lock_nested(d_backing_inode(fan), I_MUTEX_PARENT);
> > - ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
> > - inode_unlock(d_backing_inode(fan));
> > - dput(dentry);
> > + dentry = start_removing_dentry(fan, dentry);
> > + if (IS_ERR(dentry))
> > + ret = PTR_ERR(dentry);
> > + else
> > + ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
> > + end_removing(dentry);
> > return ret;
> > }
> >
> > @@ -643,9 +642,12 @@ bool cachefiles_look_up_object(struct cachefiles_object *object)
> >
> > if (!d_is_reg(dentry)) {
> > pr_err("%pd is not a file\n", dentry);
> > - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> > - ret = cachefiles_bury_object(volume->cache, object, fan, dentry,
> > - FSCACHE_OBJECT_IS_WEIRD);
> > + struct dentry *de = start_removing_dentry(fan, dentry);
> > + if (!IS_ERR(de))
>
> I see that other callers do not check return value from
> cachefiles_bury_object(), but this call site does.
> Shouldn't we treat this error as well (assign it to ret)?
Yes, that makes sense.

	if (IS_ERR(de))
		ret = PTR_ERR(de);
	else
		ret = cachefiles_bury_object(.....)

Thanks,
NeilBrown
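
For readers following along, the error handling agreed on above can be tried
outside the kernel with a small userspace model. This is only a sketch under
stated assumptions: every mock_* name is hypothetical, the lock is modelled as
a boolean, and the fact that end_removing() tolerates an error pointer is
inferred from the call sites in the quoted diff rather than from its
definition. The ERR_PTR()/PTR_ERR()/IS_ERR() helpers are re-implemented here
in the style of the kernel's, not copied from it.

```c
/* Userspace model only: nothing here is kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer, in the style of the kernel helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical stand-in for struct dentry. */
struct mock_dentry {
	struct mock_dentry *parent;
	bool locked;	/* models the parent directory's inode lock */
	bool dead;	/* models IS_DEADDIR() on the parent */
	bool unhashed;	/* models d_unhashed() on the child */
	int refcount;
};

/* Lock the parent, validate the child, and take a reference on success. */
static struct mock_dentry *
mock_start_removing_dentry(struct mock_dentry *parent,
			   struct mock_dentry *child)
{
	assert(!parent->locked);
	parent->locked = true;
	if (parent->dead || child->parent != parent || child->unhashed) {
		parent->locked = false;
		return ERR_PTR(-EINVAL);
	}
	child->refcount++;
	return child;
}

/* Assumed behaviour: a no-op for error pointers, else unlock and drop ref. */
static void mock_end_removing(struct mock_dentry *child)
{
	if (IS_ERR(child))
		return;
	assert(child->parent->locked);
	child->parent->locked = false;
	child->refcount--;
}

/* Hypothetical stand-in for the bury step; returns a caller-chosen result. */
static int mock_bury(struct mock_dentry *child, int result)
{
	assert(child->parent->locked);	/* must run under the parent lock */
	return result;
}

/* The call-site shape from the reply above: both failures reach 'ret'. */
static int mock_bury_weird_object(struct mock_dentry *fan,
				  struct mock_dentry *dentry,
				  int bury_result)
{
	struct mock_dentry *de = mock_start_removing_dentry(fan, dentry);
	int ret;

	if (IS_ERR(de))
		ret = PTR_ERR(de);
	else
		ret = mock_bury(de, bury_result);
	mock_end_removing(de);
	return ret;
}
```

In this model a child that has been unhashed, or that no longer belongs to
the given parent, makes mock_bury_weird_object() return -EINVAL with the lock
released and the refcount unchanged, which is the case the review comment
asked to have reported through ret.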
>
> Thanks,
> Amir.
>
> > + ret = cachefiles_bury_object(volume->cache, object,
> > + fan, de,
> > + FSCACHE_OBJECT_IS_WEIRD);
> > + end_removing(de);
> > dput(dentry);
> > if (ret < 0)
> > return false;
> > diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
> > index 781aac4ef274..ddf95ff5daf0 100644
> > --- a/fs/cachefiles/volume.c
> > +++ b/fs/cachefiles/volume.c
> > @@ -7,6 +7,7 @@
> >
> > #include <linux/fs.h>
> > #include <linux/slab.h>
> > +#include <linux/namei.h>
> > #include "internal.h"
> > #include <trace/events/fscache.h>
> >
> > @@ -58,9 +59,12 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
> > if (ret < 0) {
> > if (ret != -ESTALE)
> > goto error_dir;
> > - inode_lock_nested(d_inode(cache->store), I_MUTEX_PARENT);
> > - cachefiles_bury_object(cache, NULL, cache->store, vdentry,
> > - FSCACHE_VOLUME_IS_WEIRD);
> > + vdentry = start_removing_dentry(cache->store, vdentry);
> > + if (!IS_ERR(vdentry))
> > + cachefiles_bury_object(cache, NULL, cache->store,
> > + vdentry,
> > + FSCACHE_VOLUME_IS_WEIRD);
> > + end_removing(vdentry);
> > cachefiles_put_directory(volume->dentry);
> > cond_resched();
> > goto retry;
> > diff --git a/fs/namei.c b/fs/namei.c
> > index bd5c45801756..cb4d40af12ae 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3344,6 +3344,35 @@ struct dentry *start_removing_noperm(struct dentry *parent,
> > }
> > EXPORT_SYMBOL(start_removing_noperm);
> >
> > +/**
> > + * start_removing_dentry - prepare to remove a given dentry
> > + * @parent - directory from which dentry should be removed
> > + * @child - the dentry to be removed
> > + *
> > + * A lock is taken to protect the dentry again other dirops and
> > + * the validity of the dentry is checked: correct parent and still hashed.
> > + *
> > + * If the dentry is valid a reference is taken and returned. If not
> > + * an error is returned.
> > + *
> > + * end_removing() should be called when removal is complete, or aborted.
> > + *
> > + * Returns: the valid dentry, or an error.
> > + */
> > +struct dentry *start_removing_dentry(struct dentry *parent,
> > + struct dentry *child)
> > +{
> > + inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> > + if (unlikely(IS_DEADDIR(parent->d_inode) ||
> > + child->d_parent != parent ||
> > + d_unhashed(child))) {
> > + inode_unlock(parent->d_inode);
> > + return ERR_PTR(-EINVAL);
> > + }
> > + return dget(child);
> > +}
> > +EXPORT_SYMBOL(start_removing_dentry);
> > +
> > #ifdef CONFIG_UNIX98_PTYS
> > int path_pts(struct path *path)
> > {
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index c4057b4a050d..74b1ef5860a4 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -47,14 +47,12 @@ static int ovl_cleanup_locked(struct ovl_fs *ofs, struct inode *wdir,
> > int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> > struct dentry *wdentry)
> > {
> > - int err;
> > -
> > - err = ovl_parent_lock(workdir, wdentry);
> > - if (err)
> > - return err;
> > + wdentry = start_removing_dentry(workdir, wdentry);
> > + if (IS_ERR(wdentry))
> > + return PTR_ERR(wdentry);
> >
> > ovl_cleanup_locked(ofs, workdir->d_inode, wdentry);
> > - ovl_parent_unlock(workdir);
> > + end_removing(wdentry);
> >
> > return 0;
> > }
> > diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> > index 15cb06fa0c9a..213ff42556e7 100644
> > --- a/fs/overlayfs/readdir.c
> > +++ b/fs/overlayfs/readdir.c
> > @@ -1158,11 +1158,11 @@ int ovl_workdir_cleanup(struct ovl_fs *ofs, struct dentry *parent,
> > if (!d_is_dir(dentry) || level > 1)
> > return ovl_cleanup(ofs, parent, dentry);
> >
> > - err = ovl_parent_lock(parent, dentry);
> > - if (err)
> > - return err;
> > + dentry = start_removing_dentry(parent, dentry);
> > + if (IS_ERR(dentry))
> > + return PTR_ERR(dentry);
> > err = ovl_do_rmdir(ofs, parent->d_inode, dentry);
> > - ovl_parent_unlock(parent);
> > + end_removing(dentry);
> > if (err) {
> > struct path path = { .mnt = mnt, .dentry = dentry };
> >
> > diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
> > index 1cfa688904b2..56b755a05c4e 100644
> > --- a/fs/smb/server/vfs.c
> > +++ b/fs/smb/server/vfs.c
> > @@ -48,24 +48,6 @@ static void ksmbd_vfs_inherit_owner(struct ksmbd_work *work,
> > i_uid_write(inode, i_uid_read(parent_inode));
> > }
> >
> > -/**
> > - * ksmbd_vfs_lock_parent() - lock parent dentry if it is stable
> > - * @parent: parent dentry
> > - * @child: child dentry
> > - *
> > - * Returns: %0 on success, %-ENOENT if the parent dentry is not stable
> > - */
> > -int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child)
> > -{
> > - inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
> > - if (child->d_parent != parent) {
> > - inode_unlock(d_inode(parent));
> > - return -ENOENT;
> > - }
> > -
> > - return 0;
> > -}
> > -
> > static int ksmbd_vfs_path_lookup(struct ksmbd_share_config *share_conf,
> > char *pathname, unsigned int flags,
> > struct path *path, bool do_lock)
> > @@ -1083,18 +1065,17 @@ int ksmbd_vfs_unlink(struct file *filp)
> > return err;
> >
> > dir = dget_parent(dentry);
> > - err = ksmbd_vfs_lock_parent(dir, dentry);
> > - if (err)
> > + dentry = start_removing_dentry(dir, dentry);
> > + err = PTR_ERR(dentry);
> > + if (IS_ERR(dentry))
> > goto out;
> > - dget(dentry);
> >
> > if (S_ISDIR(d_inode(dentry)->i_mode))
> > err = vfs_rmdir(idmap, d_inode(dir), dentry);
> > else
> > err = vfs_unlink(idmap, d_inode(dir), dentry, NULL);
> >
> > - dput(dentry);
> > - inode_unlock(d_inode(dir));
> > + end_removing(dentry);
> > if (err)
> > ksmbd_debug(VFS, "failed to delete, err %d\n", err);
> > out:
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 20a88a46fe92..32a007f1043e 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -94,6 +94,8 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> > struct qstr *name);
> > struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> > struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
> > +struct dentry *start_removing_dentry(struct dentry *parent,
> > + struct dentry *child);
> >
> > /* end_creating - finish action started with start_creating
> > * @child - dentry returned by start_creating()
> > diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> > index 391a586d0557..9d08d103f142 100644
> > --- a/security/apparmor/apparmorfs.c
> > +++ b/security/apparmor/apparmorfs.c
> > @@ -355,17 +355,17 @@ static void aafs_remove(struct dentry *dentry)
> > if (!dentry || IS_ERR(dentry))
> > return;
> >
> > + /* ->d_parent is stable as rename is not supported */
> > dir = d_inode(dentry->d_parent);
> > - inode_lock(dir);
> > - if (simple_positive(dentry)) {
> > + dentry = start_removing_dentry(dentry->d_parent, dentry);
> > + if (!IS_ERR(dentry) && simple_positive(dentry)) {
> > if (d_is_dir(dentry))
> > simple_rmdir(dir, dentry);
> > else
> > simple_unlink(dir, dentry);
> > d_delete(dentry);
> > - dput(dentry);
> > }
> > - inode_unlock(dir);
> > + end_removing(dentry);
> > simple_release_fs(&aafs_mnt, &aafs_count);
> > }
> >
> > --
> > 2.50.0.107.gf914562f5916.dirty
> >
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 06/11] VFS: introduce start_removing_dentry()
2025-09-26 2:49 ` [PATCH 06/11] VFS: introduce start_removing_dentry() NeilBrown
2025-09-27 9:32 ` Amir Goldstein
@ 2025-10-02 17:19 ` Jeff Layton
1 sibling, 0 replies; 49+ messages in thread
From: Jeff Layton @ 2025-10-02 17:19 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein
Cc: Jan Kara, linux-fsdevel
On Fri, 2025-09-26 at 12:49 +1000, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> start_removing_dentry() is similar to start_removing() but instead of
> providing a name for lookup, the target dentry is given.
>
> start_removing_dentry() checks that the dentry is still hashed and in
> the parent, and if so it locks and increases the refcount so that
> end_removing() can be used to finish the operation.
>
> This is used in cachefiles, overlayfs, smb/server, and apparmor.
>
> There will be other users including ecryptfs.
>
> As start_removing_dentry() takes an extra reference to the dentry (to be
> put by end_removing()), there is no need to explicitly take an extra
> reference to stop d_delete() from using dentry_unlink_inode() to negate
> the dentry - as in cachefiles_delete_object(), and ksmbd_vfs_unlink().
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/cachefiles/interface.c | 14 +++++++++-----
> fs/cachefiles/namei.c | 22 ++++++++++++----------
> fs/cachefiles/volume.c | 10 +++++++---
> fs/namei.c | 29 +++++++++++++++++++++++++++++
> fs/overlayfs/dir.c | 10 ++++------
> fs/overlayfs/readdir.c | 8 ++++----
> fs/smb/server/vfs.c | 27 ++++-----------------------
> include/linux/namei.h | 2 ++
> security/apparmor/apparmorfs.c | 8 ++++----
> 9 files changed, 75 insertions(+), 55 deletions(-)
>
> diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> index 3e63cfe15874..3f8a6f1a8fc3 100644
> --- a/fs/cachefiles/interface.c
> +++ b/fs/cachefiles/interface.c
> @@ -9,6 +9,7 @@
> #include <linux/mount.h>
> #include <linux/xattr.h>
> #include <linux/file.h>
> +#include <linux/namei.h>
> #include <linux/falloc.h>
> #include <trace/events/fscache.h>
> #include "internal.h"
> @@ -428,11 +429,14 @@ static bool cachefiles_invalidate_cookie(struct fscache_cookie *cookie)
> if (!old_tmpfile) {
> struct cachefiles_volume *volume = object->volume;
> struct dentry *fan = volume->fanout[(u8)cookie->key_hash];
> -
> - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> - cachefiles_bury_object(volume->cache, object, fan,
> - old_file->f_path.dentry,
> - FSCACHE_OBJECT_INVALIDATED);
> + struct dentry *obj;
> +
> + obj = start_removing_dentry(fan, old_file->f_path.dentry);
> + if (!IS_ERR(obj))
> + cachefiles_bury_object(volume->cache, object,
> + fan, obj,
> + FSCACHE_OBJECT_INVALIDATED);
> + end_removing(obj);
> }
> fput(old_file);
> }
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 3064d439807b..80a3055d8ae5 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -424,13 +424,12 @@ int cachefiles_delete_object(struct cachefiles_object *object,
>
> _enter(",OBJ%x{%pD}", object->debug_id, object->file);
>
> - /* Stop the dentry being negated if it's only pinned by a file struct. */
> - dget(dentry);
> -
> - inode_lock_nested(d_backing_inode(fan), I_MUTEX_PARENT);
> - ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
> - inode_unlock(d_backing_inode(fan));
> - dput(dentry);
> + dentry = start_removing_dentry(fan, dentry);
> + if (IS_ERR(dentry))
> + ret = PTR_ERR(dentry);
> + else
> + ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
> + end_removing(dentry);
> return ret;
> }
>
> @@ -643,9 +642,12 @@ bool cachefiles_look_up_object(struct cachefiles_object *object)
>
> if (!d_is_reg(dentry)) {
> pr_err("%pd is not a file\n", dentry);
> - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> - ret = cachefiles_bury_object(volume->cache, object, fan, dentry,
> - FSCACHE_OBJECT_IS_WEIRD);
> + struct dentry *de = start_removing_dentry(fan, dentry);
> + if (!IS_ERR(de))
> + ret = cachefiles_bury_object(volume->cache, object,
> + fan, de,
> + FSCACHE_OBJECT_IS_WEIRD);
> + end_removing(de);
> dput(dentry);
> if (ret < 0)
> return false;
> diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
> index 781aac4ef274..ddf95ff5daf0 100644
> --- a/fs/cachefiles/volume.c
> +++ b/fs/cachefiles/volume.c
> @@ -7,6 +7,7 @@
>
> #include <linux/fs.h>
> #include <linux/slab.h>
> +#include <linux/namei.h>
> #include "internal.h"
> #include <trace/events/fscache.h>
>
> @@ -58,9 +59,12 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
> if (ret < 0) {
> if (ret != -ESTALE)
> goto error_dir;
> - inode_lock_nested(d_inode(cache->store), I_MUTEX_PARENT);
> - cachefiles_bury_object(cache, NULL, cache->store, vdentry,
> - FSCACHE_VOLUME_IS_WEIRD);
> + vdentry = start_removing_dentry(cache->store, vdentry);
> + if (!IS_ERR(vdentry))
> + cachefiles_bury_object(cache, NULL, cache->store,
> + vdentry,
> + FSCACHE_VOLUME_IS_WEIRD);
> + end_removing(vdentry);
> cachefiles_put_directory(volume->dentry);
> cond_resched();
> goto retry;
> diff --git a/fs/namei.c b/fs/namei.c
> index bd5c45801756..cb4d40af12ae 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3344,6 +3344,35 @@ struct dentry *start_removing_noperm(struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing_noperm);
>
> +/**
> + * start_removing_dentry - prepare to remove a given dentry
> + * @parent - directory from which dentry should be removed
> + * @child - the dentry to be removed
> + *
> + * A lock is taken to protect the dentry again other dirops and
> + * the validity of the dentry is checked: correct parent and still hashed.
> + *
> + * If the dentry is valid a reference is taken and returned. If not
> + * an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * Returns: the valid dentry, or an error.
> + */
> +struct dentry *start_removing_dentry(struct dentry *parent,
> + struct dentry *child)
> +{
> + inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> + if (unlikely(IS_DEADDIR(parent->d_inode) ||
> + child->d_parent != parent ||
> + d_unhashed(child))) {
> + inode_unlock(parent->d_inode);
> + return ERR_PTR(-EINVAL);
> + }
> + return dget(child);
> +}
> +EXPORT_SYMBOL(start_removing_dentry);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index c4057b4a050d..74b1ef5860a4 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -47,14 +47,12 @@ static int ovl_cleanup_locked(struct ovl_fs *ofs, struct inode *wdir,
> int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> struct dentry *wdentry)
> {
> - int err;
> -
> - err = ovl_parent_lock(workdir, wdentry);
> - if (err)
> - return err;
> + wdentry = start_removing_dentry(workdir, wdentry);
> + if (IS_ERR(wdentry))
> + return PTR_ERR(wdentry);
>
> ovl_cleanup_locked(ofs, workdir->d_inode, wdentry);
> - ovl_parent_unlock(workdir);
> + end_removing(wdentry);
>
> return 0;
> }
> diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
> index 15cb06fa0c9a..213ff42556e7 100644
> --- a/fs/overlayfs/readdir.c
> +++ b/fs/overlayfs/readdir.c
> @@ -1158,11 +1158,11 @@ int ovl_workdir_cleanup(struct ovl_fs *ofs, struct dentry *parent,
> if (!d_is_dir(dentry) || level > 1)
> return ovl_cleanup(ofs, parent, dentry);
>
> - err = ovl_parent_lock(parent, dentry);
> - if (err)
> - return err;
> + dentry = start_removing_dentry(parent, dentry);
> + if (IS_ERR(dentry))
> + return PTR_ERR(dentry);
> err = ovl_do_rmdir(ofs, parent->d_inode, dentry);
> - ovl_parent_unlock(parent);
> + end_removing(dentry);
> if (err) {
> struct path path = { .mnt = mnt, .dentry = dentry };
>
> diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
> index 1cfa688904b2..56b755a05c4e 100644
> --- a/fs/smb/server/vfs.c
> +++ b/fs/smb/server/vfs.c
> @@ -48,24 +48,6 @@ static void ksmbd_vfs_inherit_owner(struct ksmbd_work *work,
> i_uid_write(inode, i_uid_read(parent_inode));
> }
>
> -/**
> - * ksmbd_vfs_lock_parent() - lock parent dentry if it is stable
> - * @parent: parent dentry
> - * @child: child dentry
> - *
> - * Returns: %0 on success, %-ENOENT if the parent dentry is not stable
> - */
> -int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child)
> -{
> - inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
> - if (child->d_parent != parent) {
> - inode_unlock(d_inode(parent));
> - return -ENOENT;
> - }
> -
> - return 0;
> -}
> -
> static int ksmbd_vfs_path_lookup(struct ksmbd_share_config *share_conf,
> char *pathname, unsigned int flags,
> struct path *path, bool do_lock)
> @@ -1083,18 +1065,17 @@ int ksmbd_vfs_unlink(struct file *filp)
> return err;
>
> dir = dget_parent(dentry);
> - err = ksmbd_vfs_lock_parent(dir, dentry);
> - if (err)
> + dentry = start_removing_dentry(dir, dentry);
> + err = PTR_ERR(dentry);
> + if (IS_ERR(dentry))
> goto out;
> - dget(dentry);
>
> if (S_ISDIR(d_inode(dentry)->i_mode))
> err = vfs_rmdir(idmap, d_inode(dir), dentry);
> else
> err = vfs_unlink(idmap, d_inode(dir), dentry, NULL);
>
> - dput(dentry);
> - inode_unlock(d_inode(dir));
> + end_removing(dentry);
> if (err)
> ksmbd_debug(VFS, "failed to delete, err %d\n", err);
> out:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 20a88a46fe92..32a007f1043e 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -94,6 +94,8 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
> +struct dentry *start_removing_dentry(struct dentry *parent,
> + struct dentry *child);
>
> /* end_creating - finish action started with start_creating
> * @child - dentry returned by start_creating()
> diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
> index 391a586d0557..9d08d103f142 100644
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -355,17 +355,17 @@ static void aafs_remove(struct dentry *dentry)
> if (!dentry || IS_ERR(dentry))
> return;
>
> + /* ->d_parent is stable as rename is not supported */
> dir = d_inode(dentry->d_parent);
> - inode_lock(dir);
> - if (simple_positive(dentry)) {
> + dentry = start_removing_dentry(dentry->d_parent, dentry);
> + if (!IS_ERR(dentry) && simple_positive(dentry)) {
> if (d_is_dir(dentry))
> simple_rmdir(dir, dentry);
> else
> simple_unlink(dir, dentry);
> d_delete(dentry);
> - dput(dentry);
> }
> - inode_unlock(dir);
> + end_removing(dentry);
> simple_release_fs(&aafs_mnt, &aafs_count);
> }
>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 07/11] VFS: add start_creating_killable() and start_removing_killable()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (5 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 06/11] VFS: introduce start_removing_dentry() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-28 12:05 ` Amir Goldstein
2025-09-26 2:49 ` [PATCH 08/11] VFS/nfsd/ovl: introduce start_renaming() and end_renaming() NeilBrown
` (4 subsequent siblings)
11 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
These are similar to start_creating() and start_removing(), but allow a
fatal signal to abort waiting for the lock.
They are used in btrfs for subvol creation and removal.
btrfs_may_create() no longer needs IS_DEADDIR() as
start_creating_killable() includes that check.
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/btrfs/ioctl.c | 43 +++++++----------------
fs/namei.c | 80 +++++++++++++++++++++++++++++++++++++++++--
include/linux/namei.h | 6 ++++
3 files changed, 95 insertions(+), 34 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7e13de2bdcbf..3a007f59f7f2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -880,8 +880,6 @@ static inline int btrfs_may_create(struct mnt_idmap *idmap,
{
if (d_really_is_positive(child))
return -EEXIST;
- if (IS_DEADDIR(dir))
- return -ENOENT;
if (!fsuidgid_has_mapping(dir->i_sb, idmap))
return -EOVERFLOW;
return inode_permission(idmap, dir, MAY_WRITE | MAY_EXEC);
@@ -904,14 +902,9 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
struct fscrypt_str name_str = FSTR_INIT((char *)qname->name, qname->len);
int ret;
- ret = down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT);
- if (ret == -EINTR)
- return ret;
-
- dentry = lookup_one(idmap, qname, parent);
- ret = PTR_ERR(dentry);
+ dentry = start_creating_killable(idmap, parent, qname);
if (IS_ERR(dentry))
- goto out_unlock;
+ return PTR_ERR(dentry);
ret = btrfs_may_create(idmap, dir, dentry);
if (ret)
@@ -940,9 +933,7 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
out_up_read:
up_read(&fs_info->subvol_sem);
out_dput:
- dput(dentry);
-out_unlock:
- btrfs_inode_unlock(BTRFS_I(dir), 0);
+ end_creating(dentry, parent);
return ret;
}
@@ -2417,18 +2408,10 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
goto free_subvol_name;
}
- ret = down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT);
- if (ret == -EINTR)
- goto free_subvol_name;
- dentry = lookup_one(idmap, &QSTR(subvol_name), parent);
+ dentry = start_removing_killable(idmap, parent, &QSTR(subvol_name));
if (IS_ERR(dentry)) {
ret = PTR_ERR(dentry);
- goto out_unlock_dir;
- }
-
- if (d_really_is_negative(dentry)) {
- ret = -ENOENT;
- goto out_dput;
+ goto out_end_removing;
}
inode = d_inode(dentry);
@@ -2449,7 +2432,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
*/
ret = -EPERM;
if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
- goto out_dput;
+ goto out_end_removing;
/*
* Do not allow deletion if the parent dir is the same
@@ -2460,21 +2443,21 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
*/
ret = -EINVAL;
if (root == dest)
- goto out_dput;
+ goto out_end_removing;
ret = inode_permission(idmap, inode, MAY_WRITE | MAY_EXEC);
if (ret)
- goto out_dput;
+ goto out_end_removing;
}
/* check if subvolume may be deleted by a user */
ret = btrfs_may_delete(idmap, dir, dentry, 1);
if (ret)
- goto out_dput;
+ goto out_end_removing;
if (btrfs_ino(BTRFS_I(inode)) != BTRFS_FIRST_FREE_OBJECTID) {
ret = -EINVAL;
- goto out_dput;
+ goto out_end_removing;
}
btrfs_inode_lock(BTRFS_I(inode), 0);
@@ -2483,10 +2466,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
if (!ret)
d_delete_notify(dir, dentry);
-out_dput:
- dput(dentry);
-out_unlock_dir:
- btrfs_inode_unlock(BTRFS_I(dir), 0);
+out_end_removing:
+ end_removing(dentry);
free_subvol_name:
kfree(subvol_name_ptr);
free_parent:
diff --git a/fs/namei.c b/fs/namei.c
index cb4d40af12ae..f5c96f801b74 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2778,19 +2778,33 @@ static int filename_parentat(int dfd, struct filename *name,
* Returns: a locked dentry, or an error.
*
*/
-struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
- unsigned int lookup_flags)
+static struct dentry *__start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags,
+ unsigned int state)
{
struct dentry *dentry;
struct inode *dir = d_inode(parent);
- inode_lock_nested(dir, I_MUTEX_PARENT);
+ if (state == TASK_KILLABLE) {
+ int ret = down_write_killable_nested(&dir->i_rwsem,
+ I_MUTEX_PARENT);
+ if (ret)
+ return ERR_PTR(ret);
+ } else {
+ inode_lock_nested(dir, I_MUTEX_PARENT);
+ }
dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
if (IS_ERR(dentry))
inode_unlock(dir);
return dentry;
}
+struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags)
+{
+ return __start_dirop(parent, name, lookup_flags, TASK_NORMAL);
+}
+
/**
* end_dirop - signal completion of a dirop
* @de - the dentry which was returned by start_dirop or similar.
@@ -3296,6 +3310,66 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
}
EXPORT_SYMBOL(start_removing);
+/**
+ * start_creating_killable - prepare to create a given name with permission checking
+ * @idmap - idmap of the mount
+ * @parent - directory in which to prepare to create the name
+ * @name - the name to be created
+ *
+ * Locks are taken and a lookup is performed prior to creating
+ * an object in a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name already exists, a positive dentry is returned.
+ *
+ * If a fatal signal is received or was already pending, the function
+ * aborts with -EINTR.
+ *
+ * Returns: a negative or positive dentry, or an error.
+ */
+struct dentry *start_creating_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return __start_dirop(parent, name, LOOKUP_CREATE, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(start_creating_killable);
+
+/**
+ * start_removing_killable - prepare to remove a given name with permission checking
+ * @idmap - idmap of the mount
+ * @parent - directory in which to find the name
+ * @name - the name to be removed
+ *
+ * Locks are taken and a lookup is performed prior to removing
+ * an object from a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name doesn't exist, an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * If a fatal signal is received or was already pending, the function
+ * aborts with -EINTR.
+ *
+ * Returns: a positive dentry, or an error.
+ */
+struct dentry *start_removing_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return __start_dirop(parent, name, 0, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(start_removing_killable);
+
/**
* start_creating_noperm - prepare to create a given name without permission checking
* @parent - directory in which to prepare to create the name
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 32a007f1043e..9771ec940b72 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -92,6 +92,12 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
+struct dentry *start_creating_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name);
+struct dentry *start_removing_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name);
struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_dentry(struct dentry *parent,
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 07/11] VFS: add start_creating_killable() and start_removing_killable()
2025-09-26 2:49 ` [PATCH 07/11] VFS: add start_creating_killable() and start_removing_killable() NeilBrown
@ 2025-09-28 12:05 ` Amir Goldstein
2025-09-29 1:44 ` NeilBrown
0 siblings, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-28 12:05 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> These are similar to start_creating() and start_removing(), but allow a
> fatal signal to abort waiting for the lock.
>
> They are used in btrfs for subvol creation and removal.
>
> btrfs_may_create() no longer needs IS_DEADDIR() and
> start_creating_killable() includes that check.
TBH, I think there is not much to gain from this removal
and now the comment
/* copy of may_create in fs/namei.c() */
is less accurate, so I would not change btrfs_may_create()
>
> Signed-off-by: NeilBrown <neil@brown.name>
Apart from that and the other comment below, the callers look good
so you may add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/btrfs/ioctl.c | 43 +++++++----------------
> fs/namei.c | 80 +++++++++++++++++++++++++++++++++++++++++--
> include/linux/namei.h | 6 ++++
> 3 files changed, 95 insertions(+), 34 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 7e13de2bdcbf..3a007f59f7f2 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -880,8 +880,6 @@ static inline int btrfs_may_create(struct mnt_idmap *idmap,
> {
> if (d_really_is_positive(child))
> return -EEXIST;
> - if (IS_DEADDIR(dir))
> - return -ENOENT;
> if (!fsuidgid_has_mapping(dir->i_sb, idmap))
> return -EOVERFLOW;
> return inode_permission(idmap, dir, MAY_WRITE | MAY_EXEC);
> @@ -904,14 +902,9 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
> struct fscrypt_str name_str = FSTR_INIT((char *)qname->name, qname->len);
> int ret;
>
> - ret = down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT);
> - if (ret == -EINTR)
> - return ret;
> -
> - dentry = lookup_one(idmap, qname, parent);
> - ret = PTR_ERR(dentry);
> + dentry = start_creating_killable(idmap, parent, qname);
> if (IS_ERR(dentry))
> - goto out_unlock;
> + return PTR_ERR(dentry);
>
> ret = btrfs_may_create(idmap, dir, dentry);
> if (ret)
> @@ -940,9 +933,7 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
> out_up_read:
> up_read(&fs_info->subvol_sem);
> out_dput:
> - dput(dentry);
> -out_unlock:
> - btrfs_inode_unlock(BTRFS_I(dir), 0);
> + end_creating(dentry, parent);
> return ret;
> }
>
> @@ -2417,18 +2408,10 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
> goto free_subvol_name;
> }
>
> - ret = down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT);
> - if (ret == -EINTR)
> - goto free_subvol_name;
> - dentry = lookup_one(idmap, &QSTR(subvol_name), parent);
> + dentry = start_removing_killable(idmap, parent, &QSTR(subvol_name));
> if (IS_ERR(dentry)) {
> ret = PTR_ERR(dentry);
> - goto out_unlock_dir;
> - }
> -
> - if (d_really_is_negative(dentry)) {
> - ret = -ENOENT;
> - goto out_dput;
> + goto out_end_removing;
> }
>
> inode = d_inode(dentry);
> @@ -2449,7 +2432,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
> */
> ret = -EPERM;
> if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
> - goto out_dput;
> + goto out_end_removing;
>
> /*
> * Do not allow deletion if the parent dir is the same
> @@ -2460,21 +2443,21 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
> */
> ret = -EINVAL;
> if (root == dest)
> - goto out_dput;
> + goto out_end_removing;
>
> ret = inode_permission(idmap, inode, MAY_WRITE | MAY_EXEC);
> if (ret)
> - goto out_dput;
> + goto out_end_removing;
> }
>
> /* check if subvolume may be deleted by a user */
> ret = btrfs_may_delete(idmap, dir, dentry, 1);
> if (ret)
> - goto out_dput;
> + goto out_end_removing;
>
> if (btrfs_ino(BTRFS_I(inode)) != BTRFS_FIRST_FREE_OBJECTID) {
> ret = -EINVAL;
> - goto out_dput;
> + goto out_end_removing;
> }
>
> btrfs_inode_lock(BTRFS_I(inode), 0);
> @@ -2483,10 +2466,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
> if (!ret)
> d_delete_notify(dir, dentry);
>
> -out_dput:
> - dput(dentry);
> -out_unlock_dir:
> - btrfs_inode_unlock(BTRFS_I(dir), 0);
> +out_end_removing:
> + end_removing(dentry);
> free_subvol_name:
> kfree(subvol_name_ptr);
> free_parent:
> diff --git a/fs/namei.c b/fs/namei.c
> index cb4d40af12ae..f5c96f801b74 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2778,19 +2778,33 @@ static int filename_parentat(int dfd, struct filename *name,
> * Returns: a locked dentry, or an error.
> *
> */
> -struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> - unsigned int lookup_flags)
> +static struct dentry *__start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags,
> + unsigned int state)
> {
> struct dentry *dentry;
> struct inode *dir = d_inode(parent);
>
> - inode_lock_nested(dir, I_MUTEX_PARENT);
> + if (state == TASK_KILLABLE) {
> + int ret = down_write_killable_nested(&dir->i_rwsem,
> + I_MUTEX_PARENT);
> + if (ret)
> + return ERR_PTR(ret);
> + } else {
> + inode_lock_nested(dir, I_MUTEX_PARENT);
> + }
IIRC, Al is not fond of helpers that lock conditionally
(or are conditionally killable).
Unless you plan to have many other users of __start_dirop(),
the compiler is likely to inline three copies of it, so
a separate start_dirop_killable() variant without the conditional
may make more sense here.
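The shape suggested here can be modelled in userspace. The sketch below is an illustration only, not kernel code: struct dir, dir_lock(), dir_lock_killable(), dirop_lookup_locked() and both *_model() entry points are hypothetical stand-ins, and a boolean flag plays the role of a pending fatal signal. It shows two entry points that each lock unconditionally in their own way and share only the post-lock lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in for a directory inode; not a kernel type. */
struct dir {
	bool locked;
	bool signal_pending;	/* models a fatal signal while waiting */
	bool name_exists;	/* models the result of the lookup */
};

/* Models inode_lock_nested(): cannot fail. */
static void dir_lock(struct dir *d)
{
	assert(!d->locked);
	d->locked = true;
}

/* Models down_write_killable_nested(): returns 0 or -EINTR. */
static int dir_lock_killable(struct dir *d)
{
	if (d->signal_pending)
		return -EINTR;
	assert(!d->locked);
	d->locked = true;
	return 0;
}

static void dir_unlock(struct dir *d)
{
	assert(d->locked);
	d->locked = false;
}

/* Shared tail: look up under the lock, drop the lock on failure. */
static int dirop_lookup_locked(struct dir *d)
{
	if (!d->name_exists) {
		dir_unlock(d);
		return -ENOENT;
	}
	return 0;
}

/* Non-killable entry point: no "state" argument, no branch. */
int start_dirop_model(struct dir *d)
{
	dir_lock(d);
	return dirop_lookup_locked(d);
}

/* Killable entry point: the only place that can return -EINTR. */
int start_dirop_killable_model(struct dir *d)
{
	int ret = dir_lock_killable(d);

	if (ret)
		return ret;
	return dirop_lookup_locked(d);
}
```

In this model a caller that gets a non-zero return never holds the lock, whichever entry point it used, which is the property the real helpers also need to preserve.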
Thanks,
Amir.
> dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
> if (IS_ERR(dentry))
> inode_unlock(dir);
> return dentry;
> }
>
> +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags)
> +{
> + return __start_dirop(parent, name, lookup_flags, TASK_NORMAL);
> +}
> +
> /**
> * end_dirop - signal completion of a dirop
> * @de - the dentry which was returned by start_dirop or similar.
> @@ -3296,6 +3310,66 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing);
>
> +/**
> + * start_creating_killable - prepare to create a given name with permission checking
> + * @idmap - idmap of the mount
> + * @parent - directory in which to prepare to create the name
> + * @name - the name to be created
> + *
> + * Locks are taken and a lookup in performed prior to creating
> + * an object in a directory. Permission checking (MAY_EXEC) is performed
> + * against @idmap.
> + *
> + * If the name already exists, a positive dentry is returned.
> + *
> + * If a signal is received or was already pending, the function aborts
> + * with -EINTR;
> + *
> + * Returns: a negative or positive dentry, or an error.
> + */
> +struct dentry *start_creating_killable(struct mnt_idmap *idmap,
> + struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_one_common(idmap, name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return __start_dirop(parent, name, LOOKUP_CREATE, TASK_KILLABLE);
> +}
> +EXPORT_SYMBOL(start_creating_killable);
> +
> +/**
> + * start_removing_killable - prepare to remove a given name with permission checking
> + * @idmap - idmap of the mount
> + * @parent - directory in which to find the name
> + * @name - the name to be removed
> + *
> + * Locks are taken and a lookup in performed prior to removing
> + * an object from a directory. Permission checking (MAY_EXEC) is performed
> + * against @idmap.
> + *
> + * If the name doesn't exist, an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * If a signal is received or was already pending, the function aborts
> + * with -EINTR;
> + *
> + * Returns: a positive dentry, or an error.
> + */
> +struct dentry *start_removing_killable(struct mnt_idmap *idmap,
> + struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_one_common(idmap, name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return __start_dirop(parent, name, 0, TASK_KILLABLE);
> +}
> +EXPORT_SYMBOL(start_removing_killable);
> +
> /**
> * start_creating_noperm - prepare to create a given name without permission checking
> * @parent - directory in which to prepare to create the name
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 32a007f1043e..9771ec940b72 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -92,6 +92,12 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> +struct dentry *start_creating_killable(struct mnt_idmap *idmap,
> + struct dentry *parent,
> + struct qstr *name);
> +struct dentry *start_removing_killable(struct mnt_idmap *idmap,
> + struct dentry *parent,
> + struct qstr *name);
> struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
> struct dentry *start_removing_dentry(struct dentry *parent,
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 07/11] VFS: add start_creating_killable() and start_removing_killable()
2025-09-28 12:05 ` Amir Goldstein
@ 2025-09-29 1:44 ` NeilBrown
0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-29 1:44 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sun, 28 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > These are similar to start_creating() and start_removing(), but allow a
> > fatal signal to abort waiting for the lock.
> >
> > They are used in btrfs for subvol creation and removal.
> >
> > btrfs_may_create() no longer needs IS_DEADDIR() and
> > start_creating_killable() includes that check.
>
> TBH, I think there is not much to gain from this removal
> and now the comment
> /* copy of may_create in fs/namei.c() */
> is less accurate, so I would not change btrfs_may_create()
That is reasonable. I actually wanted to remove btrfs_may_create()
completely but found that some of it was still needed.
I had another look and I think that if an idmap arg is added to
vfs_mkobj(), then using that is the "correct" way to create arbitrary
objects, and that is really what btrfs is doing here. vfs_mkobj()
incorporates the may_create() test itself.
>
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
>
> Apart from that and the other comment below, the callers look good
> so you may add:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 08/11] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (6 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 07/11] VFS: add start_creating_killable() and start_removing_killable() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-29 11:23 ` Amir Goldstein
2025-09-26 2:49 ` [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
` (3 subsequent siblings)
11 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_renaming() combines name lookup and locking to prepare for rename.
It is used when two names need to be looked up, as in nfsd and overlayfs;
cases where one or both dentries are already available will be handled
separately.
__start_renaming() avoids the inode_permission check and hash
calculation and is suitable after filename_parentat() in do_renameat2().
It subsumes quite a bit of code from that function.
start_renaming() does calculate the hash and check X permission and is
suitable elsewhere:
- nfsd_rename()
- ovl_rename()
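
As a rough illustration of the lookup-and-unwind flow that these helpers centralise, here is a userspace model. It is a sketch under assumptions, not the kernel implementation: struct obj, struct ctx, obj_get(), obj_put(), lookup(), start_pair() and end_pair() are hypothetical names, the ERR_PTR helpers are re-declared locally in the spirit of include/linux/err.h, and the RENAME_EXCHANGE case is left out. The "trap" argument stands for the dentry returned by lock_rename().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Local re-declarations modelled on the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct obj { int refs; };			/* stand-in for a dentry */
struct ctx { int locked; struct obj *old, *new; };	/* stand-in for renamedata */

static struct obj *obj_get(struct obj *o) { o->refs++; return o; }
static void obj_put(struct obj *o) { o->refs--; }

/* Returns a referenced object, or an error pointer if err is non-zero. */
static struct obj *lookup(struct obj *o, long err)
{
	return err ? ERR_PTR(err) : obj_get(o);
}

int start_pair(struct ctx *c, struct obj *a, long a_err,
	       struct obj *b, long b_err, struct obj *trap)
{
	struct obj *d1, *d2;
	long err;

	c->locked = 1;				/* models lock_rename() */
	d1 = lookup(a, a_err);
	if (IS_ERR(d1))
		goto out_1;
	d2 = lookup(b, b_err);
	if (IS_ERR(d2))
		goto out_2;
	if (d1 == trap) {			/* source is an ancestor of target */
		err = -EINVAL;
		goto out_3;
	}
	if (d2 == trap) {			/* target is an ancestor of source */
		err = -ENOTEMPTY;
		goto out_3;
	}
	c->old = d1;
	c->new = d2;
	return 0;

out_3:
	obj_put(d2);
	d2 = ERR_PTR(err);	/* carry the error down the cascade */
out_2:
	obj_put(d1);
	d1 = d2;
out_1:
	c->locked = 0;				/* models unlock_rename() */
	return (int)PTR_ERR(d1);
}

/* Counterpart of a successful start_pair(): unlock, then drop both refs. */
void end_pair(struct ctx *c)
{
	c->locked = 0;
	obj_put(c->old);
	obj_put(c->new);
}
```

The point of the cascade is that every failure leaves the caller with no lock and no references, so a caller only needs end_pair() on the success path.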
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 197 ++++++++++++++++++++++++++++-----------
fs/nfsd/vfs.c | 73 +++++----------
fs/overlayfs/dir.c | 72 ++++++--------
fs/overlayfs/overlayfs.h | 14 +++
include/linux/namei.h | 3 +
5 files changed, 214 insertions(+), 145 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index f5c96f801b74..79a8b3b47e4d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3684,6 +3684,129 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
}
EXPORT_SYMBOL(unlock_rename);
+/**
+ * __start_renaming - lookup and lock names for rename
+ * @rd: rename data containing parent and flags, and
+ * for receiving found dentries
+ * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
+ * LOOKUP_NO_SYMLINKS etc).
+ * @old_last: name of object in @rd.old_parent
+ * @new_last: name of object in @rd.new_parent
+ *
+ * Look up two names and ensure locks are in place for
+ * rename.
+ *
+ * On success the found dentrys are stored in @rd.old_dentry,
+ * @rd.new_dentry. These references and the lock are dropped by
+ * end_renaming().
+ *
+ * The passed in qstrs must have the hash calculated, and no permission
+ * checking is performed.
+ *
+ * Returns: zero or an error.
+ */
+static int
+__start_renaming(struct renamedata *rd, int lookup_flags,
+ struct qstr *old_last, struct qstr *new_last)
+{
+ struct dentry *trap;
+ struct dentry *d1, *d2;
+ int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+ int err;
+
+ if (rd->flags & RENAME_EXCHANGE)
+ target_flags = 0;
+ if (rd->flags & RENAME_NOREPLACE)
+ target_flags |= LOOKUP_EXCL;
+
+ trap = lock_rename(rd->old_parent, rd->new_parent);
+ if (IS_ERR(trap))
+ return PTR_ERR(trap);
+
+ d1 = lookup_one_qstr_excl(old_last, rd->old_parent,
+ lookup_flags);
+ if (IS_ERR(d1))
+ goto out_unlock_1;
+
+ d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
+ lookup_flags | target_flags);
+ if (IS_ERR(d2))
+ goto out_unlock_2;
+
+ if (d1 == trap) {
+ /* source is an ancestor of target */
+ err = -EINVAL;
+ goto out_unlock_3;
+ }
+
+ if (d2 == trap) {
+ /* target is an ancestor of source */
+ if (rd->flags & RENAME_EXCHANGE)
+ err = -EINVAL;
+ else
+ err = -ENOTEMPTY;
+ goto out_unlock_3;
+ }
+
+ rd->old_dentry = d1;
+ rd->new_dentry = d2;
+ return 0;
+
+out_unlock_3:
+ dput(d2);
+ d2 = ERR_PTR(err);
+out_unlock_2:
+ dput(d1);
+ d1 = d2;
+out_unlock_1:
+ unlock_rename(rd->old_parent, rd->new_parent);
+ return PTR_ERR(d1);
+}
+
+/**
+ * start_renaming - lookup and lock names for rename with permission checking
+ * @rd: rename data containing parent and flags, and
+ * for receiving found dentries
+ * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
+ * LOOKUP_NO_SYMLINKS etc).
+ * @old_last: name of object in @rd.old_parent
+ * @new_last: name of object in @rd.new_parent
+ *
+ * Look up two names and ensure locks are in place for
+ * rename.
+ *
+ * On success the found dentrys are stored in @rd.old_dentry,
+ * @rd.new_dentry. These references and the lock are dropped by
+ * end_renaming().
+ *
+ * The passed in qstrs need not have the hash calculated, and basic
+ * eXecute permission checking is performed against @rd.mnt_idmap.
+ *
+ * Returns: zero or an error.
+ */
+int start_renaming(struct renamedata *rd, int lookup_flags,
+ struct qstr *old_last, struct qstr *new_last)
+{
+ int err;
+
+ err = lookup_one_common(rd->mnt_idmap, old_last, rd->old_parent);
+ if (err)
+ return err;
+ err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
+ if (err)
+ return err;
+ return __start_renaming(rd, lookup_flags, old_last, new_last);
+}
+EXPORT_SYMBOL(start_renaming);
+
+void end_renaming(struct renamedata *rd)
+{
+ unlock_rename(rd->old_parent, rd->new_parent);
+ dput(rd->old_dentry);
+ dput(rd->new_dentry);
+}
+EXPORT_SYMBOL(end_renaming);
+
/**
* vfs_prepare_mode - prepare the mode to be used for a new inode
* @idmap: idmap of the mount the inode was found from
@@ -5509,14 +5632,11 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
struct filename *to, unsigned int flags)
{
struct renamedata rd;
- struct dentry *old_dentry, *new_dentry;
- struct dentry *trap;
struct path old_path, new_path;
struct qstr old_last, new_last;
int old_type, new_type;
struct inode *delegated_inode = NULL;
- unsigned int lookup_flags = 0, target_flags =
- LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+ unsigned int lookup_flags = 0;
bool should_retry = false;
int error = -EINVAL;
@@ -5527,11 +5647,6 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
(flags & RENAME_EXCHANGE))
goto put_names;
- if (flags & RENAME_EXCHANGE)
- target_flags = 0;
- if (flags & RENAME_NOREPLACE)
- target_flags |= LOOKUP_EXCL;
-
retry:
error = filename_parentat(olddfd, from, lookup_flags, &old_path,
&old_last, &old_type);
@@ -5561,66 +5676,40 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
goto exit2;
retry_deleg:
- trap = lock_rename(new_path.dentry, old_path.dentry);
- if (IS_ERR(trap)) {
- error = PTR_ERR(trap);
+ rd.old_parent = old_path.dentry;
+ rd.mnt_idmap = mnt_idmap(old_path.mnt);
+ rd.new_parent = new_path.dentry;
+ rd.delegated_inode = &delegated_inode;
+ rd.flags = flags;
+
+ error = __start_renaming(&rd, lookup_flags, &old_last, &new_last);
+ if (error)
goto exit_lock_rename;
- }
- old_dentry = lookup_one_qstr_excl(&old_last, old_path.dentry,
- lookup_flags);
- error = PTR_ERR(old_dentry);
- if (IS_ERR(old_dentry))
- goto exit3;
- new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
- lookup_flags | target_flags);
- error = PTR_ERR(new_dentry);
- if (IS_ERR(new_dentry))
- goto exit4;
if (flags & RENAME_EXCHANGE) {
- if (!d_is_dir(new_dentry)) {
+ if (!d_is_dir(rd.new_dentry)) {
error = -ENOTDIR;
if (new_last.name[new_last.len])
- goto exit5;
+ goto exit_unlock;
}
}
/* unless the source is a directory trailing slashes give -ENOTDIR */
- if (!d_is_dir(old_dentry)) {
+ if (!d_is_dir(rd.old_dentry)) {
error = -ENOTDIR;
if (old_last.name[old_last.len])
- goto exit5;
+ goto exit_unlock;
if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
- goto exit5;
- }
- /* source should not be ancestor of target */
- error = -EINVAL;
- if (old_dentry == trap)
- goto exit5;
- /* target should not be an ancestor of source */
- if (!(flags & RENAME_EXCHANGE))
- error = -ENOTEMPTY;
- if (new_dentry == trap)
- goto exit5;
+ goto exit_unlock;
+ }
- error = security_path_rename(&old_path, old_dentry,
- &new_path, new_dentry, flags);
+ error = security_path_rename(&old_path, rd.old_dentry,
+ &new_path, rd.new_dentry, flags);
if (error)
- goto exit5;
+ goto exit_unlock;
- rd.old_parent = old_path.dentry;
- rd.old_dentry = old_dentry;
- rd.mnt_idmap = mnt_idmap(old_path.mnt);
- rd.new_parent = new_path.dentry;
- rd.new_dentry = new_dentry;
- rd.delegated_inode = &delegated_inode;
- rd.flags = flags;
error = vfs_rename(&rd);
-exit5:
- dput(new_dentry);
-exit4:
- dput(old_dentry);
-exit3:
- unlock_rename(new_path.dentry, old_path.dentry);
+exit_unlock:
+ end_renaming(&rd);
exit_lock_rename:
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index d5b4550fd8f6..091112d931f9 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1862,11 +1862,12 @@ __be32
nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
struct svc_fh *tfhp, char *tname, int tlen)
{
- struct dentry *fdentry, *tdentry, *odentry, *ndentry, *trap;
+ struct dentry *fdentry, *tdentry;
int type = S_IFDIR;
+ struct renamedata rd = {};
__be32 err;
int host_err;
- bool close_cached = false;
+ struct dentry *close_cached;
trace_nfsd_vfs_rename(rqstp, ffhp, tfhp, fname, flen, tname, tlen);
@@ -1892,15 +1893,22 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
goto out;
retry:
+ close_cached = NULL;
host_err = fh_want_write(ffhp);
if (host_err) {
err = nfserrno(host_err);
goto out;
}
- trap = lock_rename(tdentry, fdentry);
- if (IS_ERR(trap)) {
- err = nfserr_xdev;
+ rd.mnt_idmap = &nop_mnt_idmap;
+ rd.old_parent = fdentry;
+ rd.new_parent = tdentry;
+
+ host_err = start_renaming(&rd, 0, &QSTR_LEN(fname, flen),
+ &QSTR_LEN(tname, tlen));
+
+ if (host_err) {
+ err = nfserrno(host_err);
goto out_want_write;
}
err = fh_fill_pre_attrs(ffhp);
@@ -1910,48 +1918,23 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (err != nfs_ok)
goto out_unlock;
- odentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), fdentry);
- host_err = PTR_ERR(odentry);
- if (IS_ERR(odentry))
- goto out_nfserr;
+ type = d_inode(rd.old_dentry)->i_mode & S_IFMT;
+
+ if (d_inode(rd.new_dentry))
+ type = d_inode(rd.new_dentry)->i_mode & S_IFMT;
- host_err = -ENOENT;
- if (d_really_is_negative(odentry))
- goto out_dput_old;
- host_err = -EINVAL;
- if (odentry == trap)
- goto out_dput_old;
- type = d_inode(odentry)->i_mode & S_IFMT;
-
- ndentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(tname, tlen), tdentry);
- host_err = PTR_ERR(ndentry);
- if (IS_ERR(ndentry))
- goto out_dput_old;
- if (d_inode(ndentry))
- type = d_inode(ndentry)->i_mode & S_IFMT;
- host_err = -ENOTEMPTY;
- if (ndentry == trap)
- goto out_dput_new;
-
- if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
- nfsd_has_cached_files(ndentry)) {
- close_cached = true;
- goto out_dput_old;
+ if ((rd.new_dentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
+ nfsd_has_cached_files(rd.new_dentry)) {
+ close_cached = dget(rd.new_dentry);
+ goto out_unlock;
} else {
- struct renamedata rd = {
- .mnt_idmap = &nop_mnt_idmap,
- .old_parent = fdentry,
- .old_dentry = odentry,
- .new_parent = tdentry,
- .new_dentry = ndentry,
- };
int retries;
for (retries = 1;;) {
host_err = vfs_rename(&rd);
if (host_err != -EAGAIN || !retries--)
break;
- if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry)))
+ if (!nfsd_wait_for_delegreturn(rqstp, d_inode(rd.old_dentry)))
break;
}
if (!host_err) {
@@ -1960,11 +1943,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
host_err = commit_metadata(ffhp);
}
}
- out_dput_new:
- dput(ndentry);
- out_dput_old:
- dput(odentry);
- out_nfserr:
if (host_err == -EBUSY) {
/*
* See RFC 8881 Section 18.26.4 para 1-3: NFSv4 RENAME
@@ -1983,7 +1961,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
fh_fill_post_attrs(tfhp);
}
out_unlock:
- unlock_rename(tdentry, fdentry);
+ end_renaming(&rd);
out_want_write:
fh_drop_write(ffhp);
@@ -1994,9 +1972,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
* until this point and then reattempt the whole shebang.
*/
if (close_cached) {
- close_cached = false;
- nfsd_close_cached_files(ndentry);
- dput(ndentry);
+ nfsd_close_cached_files(close_cached);
+ dput(close_cached);
goto retry;
}
out:
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 74b1ef5860a4..b37aefe465a2 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -1099,9 +1099,7 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
int err;
struct dentry *old_upperdir;
struct dentry *new_upperdir;
- struct dentry *olddentry = NULL;
- struct dentry *newdentry = NULL;
- struct dentry *trap, *de;
+ struct renamedata rd = {};
bool old_opaque;
bool new_opaque;
bool cleanup_whiteout = false;
@@ -1208,29 +1206,21 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
}
}
- trap = lock_rename(new_upperdir, old_upperdir);
- if (IS_ERR(trap)) {
- err = PTR_ERR(trap);
- goto out_revert_creds;
- }
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = old_upperdir;
+ rd.new_parent = new_upperdir;
+ rd.flags = flags;
- de = ovl_lookup_upper(ofs, old->d_name.name, old_upperdir,
- old->d_name.len);
- err = PTR_ERR(de);
- if (IS_ERR(de))
- goto out_unlock;
- olddentry = de;
+ err = start_renaming(&rd, 0,
+ &QSTR_LEN(old->d_name.name, old->d_name.len),
+ &QSTR_LEN(new->d_name.name, new->d_name.len));
- err = -ESTALE;
- if (!ovl_matches_upper(old, olddentry))
- goto out_unlock;
+ if (err)
+ goto out_revert_creds;
- de = ovl_lookup_upper(ofs, new->d_name.name, new_upperdir,
- new->d_name.len);
- err = PTR_ERR(de);
- if (IS_ERR(de))
+ err = -ESTALE;
+ if (!ovl_matches_upper(old, rd.old_dentry))
goto out_unlock;
- newdentry = de;
old_opaque = ovl_dentry_is_opaque(old);
new_opaque = ovl_dentry_is_opaque(new);
@@ -1238,15 +1228,15 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
err = -ESTALE;
if (d_inode(new) && ovl_dentry_upper(new)) {
if (opaquedir) {
- if (newdentry != opaquedir)
+ if (rd.new_dentry != opaquedir)
goto out_unlock;
} else {
- if (!ovl_matches_upper(new, newdentry))
+ if (!ovl_matches_upper(new, rd.new_dentry))
goto out_unlock;
}
} else {
- if (!d_is_negative(newdentry)) {
- if (!new_opaque || !ovl_upper_is_whiteout(ofs, newdentry))
+ if (!d_is_negative(rd.new_dentry)) {
+ if (!new_opaque || !ovl_upper_is_whiteout(ofs, rd.new_dentry))
goto out_unlock;
} else {
if (flags & RENAME_EXCHANGE)
@@ -1254,19 +1244,14 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
}
}
- if (olddentry == trap)
- goto out_unlock;
- if (newdentry == trap)
- goto out_unlock;
-
- if (olddentry->d_inode == newdentry->d_inode)
+ if (rd.old_dentry->d_inode == rd.new_dentry->d_inode)
goto out_unlock;
err = 0;
if (ovl_type_merge_or_lower(old))
err = ovl_set_redirect(old, samedir);
else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
- err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
+ err = ovl_set_opaque_xerr(old, rd.old_dentry, -EXDEV);
if (err)
goto out_unlock;
@@ -1274,19 +1259,22 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
err = ovl_set_redirect(new, samedir);
else if (!overwrite && new_is_dir && !new_opaque &&
ovl_type_merge(old->d_parent))
- err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
+ err = ovl_set_opaque_xerr(new, rd.new_dentry, -EXDEV);
if (err)
goto out_unlock;
- err = ovl_do_rename(ofs, old_upperdir, olddentry,
- new_upperdir, newdentry, flags);
- unlock_rename(new_upperdir, old_upperdir);
+ err = ovl_do_rename_rd(&rd);
+
+ dget(rd.new_dentry);
+ end_renaming(&rd);
+
+ if (!err && cleanup_whiteout) {
+ ovl_cleanup(ofs, old_upperdir, rd.new_dentry);
+ }
+ dput(rd.new_dentry);
if (err)
goto out_revert_creds;
- if (cleanup_whiteout)
- ovl_cleanup(ofs, old_upperdir, newdentry);
-
if (overwrite && d_inode(new)) {
if (new_is_dir)
clear_nlink(d_inode(new));
@@ -1311,14 +1299,12 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
else
ovl_drop_write(old);
out:
- dput(newdentry);
- dput(olddentry);
dput(opaquedir);
ovl_cache_free(&list);
return err;
out_unlock:
- unlock_rename(new_upperdir, old_upperdir);
+ end_renaming(&rd);
goto out_revert_creds;
}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 915af58459b7..181fc46195f2 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -378,6 +378,20 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
return err;
}
+static inline int ovl_do_rename_rd(struct renamedata *rd)
+{
+ int err;
+
+ pr_debug("rename(%pd2, %pd2, 0x%x)\n", rd->old_dentry, rd->new_dentry,
+ rd->flags);
+ err = vfs_rename(rd);
+ if (err) {
+ pr_debug("...rename(%pd2, %pd2, ...) = %i\n",
+ rd->old_dentry, rd->new_dentry, err);
+ }
+ return err;
+}
+
static inline int ovl_do_whiteout(struct ovl_fs *ofs,
struct inode *dir, struct dentry *dentry)
{
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 9771ec940b72..b65ed6e1e91a 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -140,6 +140,9 @@ extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
+int start_renaming(struct renamedata *rd, int lookup_flags,
+ struct qstr *old_last, struct qstr *new_last);
+void end_renaming(struct renamedata *rd);
/**
* mode_strip_umask - handle vfs umask stripping
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 08/11] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
2025-09-26 2:49 ` [PATCH 08/11] VFS/nfsd/ovl: introduce start_renaming() and end_renaming() NeilBrown
@ 2025-09-29 11:23 ` Amir Goldstein
0 siblings, 0 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-09-29 11:23 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> start_renaming() combines name lookup and locking to prepare for rename.
> It is used when two names need to be looked up as in nfsd and overlayfs -
> cases where one or both dentrys are already available will be handled
> separately.
>
> __start_renaming() avoids the inode_permission check and hash
> calculation and is suitable after filename_parentat() in do_renameat2().
> It subsumes quite a bit of code from that function.
>
> start_renaming() does calculate the hash and check X permission and is
> suitable elsewhere:
> - nfsd_rename()
> - ovl_rename()
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/namei.c | 197 ++++++++++++++++++++++++++++-----------
> fs/nfsd/vfs.c | 73 +++++----------
> fs/overlayfs/dir.c | 72 ++++++--------
> fs/overlayfs/overlayfs.h | 14 +++
> include/linux/namei.h | 3 +
> 5 files changed, 214 insertions(+), 145 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index f5c96f801b74..79a8b3b47e4d 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3684,6 +3684,129 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
> }
> EXPORT_SYMBOL(unlock_rename);
>
> +/**
> + * __start_renaming - lookup and lock names for rename
> + * @rd: rename data containing parent and flags, and
> + * for receiving found dentries
> + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> + * LOOKUP_NO_SYMLINKS etc).
> + * @old_last: name of object in @rd.old_parent
> + * @new_last: name of object in @rd.new_parent
> + *
> + * Look up two names and ensure locks are in place for
> + * rename.
> + *
> + * On success the found dentrys are stored in @rd.old_dentry,
Any reason for this odd spelling of dentries?
> + * @rd.new_dentry. These references and the lock are dropped by
> + * end_renaming().
> + *
> + * The passed in qstrs must have the hash calculated, and no permission
> + * checking is performed.
> + *
> + * Returns: zero or an error.
> + */
> +static int
> +__start_renaming(struct renamedata *rd, int lookup_flags,
> + struct qstr *old_last, struct qstr *new_last)
> +{
> + struct dentry *trap;
> + struct dentry *d1, *d2;
> + int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> + int err;
> +
> + if (rd->flags & RENAME_EXCHANGE)
> + target_flags = 0;
> + if (rd->flags & RENAME_NOREPLACE)
> + target_flags |= LOOKUP_EXCL;
> +
> + trap = lock_rename(rd->old_parent, rd->new_parent);
> + if (IS_ERR(trap))
> + return PTR_ERR(trap);
> +
> + d1 = lookup_one_qstr_excl(old_last, rd->old_parent,
> + lookup_flags);
err = PTR_ERR(d1);
> + if (IS_ERR(d1))
> + goto out_unlock_1;
> +
> + d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
> + lookup_flags | target_flags);
err = PTR_ERR(d2);
> + if (IS_ERR(d2))
> + goto out_unlock_2;
> +
> + if (d1 == trap) {
> + /* source is an ancestor of target */
> + err = -EINVAL;
> + goto out_unlock_3;
> + }
> +
> + if (d2 == trap) {
> + /* target is an ancestor of source */
> + if (rd->flags & RENAME_EXCHANGE)
> + err = -EINVAL;
> + else
> + err = -ENOTEMPTY;
> + goto out_unlock_3;
> + }
> +
> + rd->old_dentry = d1;
> + rd->new_dentry = d2;
> + return 0;
> +
I'd rather avoid meaningless label names.
> +out_unlock_3:
out_dput_d2:
> + dput(d2);
> + d2 = ERR_PTR(err);
This is not pretty IMO, much cleaner to assign err before goto
> +out_unlock_2:
out_dput_d1:
> + dput(d1);
> + d1 = d2;
This is not pretty IMO, much cleaner to assign err before goto
> +out_unlock_1:
out_unlock:
> + unlock_rename(rd->old_parent, rd->new_parent);
> + return PTR_ERR(d1);
return err;
> +}
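To illustrate the unwinding shape asked for above, here is a minimal
userspace sketch. It is not kernel code and not part of the proposed patch:
the model_* types and helpers are hypothetical stand-ins for dentries,
lookup_one_qstr_excl(), dput() and lock_rename()/unlock_rename(). The only
point is that err is assigned before each goto and each label names what
it undoes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_ref { int refs; };
struct model_lock { int held; };

/* Pretend lookup: fails with fail_err if non-zero, else takes a reference. */
static int model_lookup(struct model_ref *r, int fail_err)
{
	if (fail_err)
		return fail_err;
	r->refs++;
	return 0;
}

static void model_put(struct model_ref *r)
{
	r->refs--;
}

/*
 * Model of the lock + two lookups + trap check sequence.
 * On success the lock stays held and both references are kept.
 * On failure everything acquired so far is released in reverse order.
 */
int model_start_renaming(struct model_lock *lk,
			 struct model_ref *d1, struct model_ref *d2,
			 int err1, int err2, int d1_is_trap)
{
	int err;

	lk->held = 1;

	err = model_lookup(d1, err1);
	if (err)
		goto out_unlock;

	err = model_lookup(d2, err2);
	if (err)
		goto out_put_d1;

	if (d1_is_trap) {
		/* source is an ancestor of target */
		err = -EINVAL;
		goto out_put_d2;
	}
	return 0;

out_put_d2:
	model_put(d2);
out_put_d1:
	model_put(d1);
out_unlock:
	lk->held = 0;
	return err;
}
```

With this shape the function returns a plain int throughout, so there is no
need to smuggle the error through d1/d2 with ERR_PTR()/PTR_ERR().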
> +
> +/**
> + * start_renaming - lookup and lock names for rename with permission checking
> + * @rd: rename data containing parent and flags, and
> + * for receiving found dentries
> + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> + * LOOKUP_NO_SYMLINKS etc).
> + * @old_last: name of object in @rd.old_parent
> + * @new_last: name of object in @rd.new_parent
> + *
> + * Look up two names and ensure locks are in place for
> + * rename.
> + *
> + * On success the found dentrys are stored in @rd.old_dentry,
> + * @rd.new_dentry. These references and the lock are dropped by
> + * end_renaming().
> + *
> + * The passed in qstrs need not have the hash calculated, and basic
> + * eXecute permission checking is performed against @rd.mnt_idmap.
> + *
> + * Returns: zero or an error.
> + */
> +int start_renaming(struct renamedata *rd, int lookup_flags,
> + struct qstr *old_last, struct qstr *new_last)
> +{
> + int err;
> +
> + err = lookup_one_common(rd->mnt_idmap, old_last, rd->old_parent);
> + if (err)
> + return err;
> + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> + if (err)
> + return err;
> + return __start_renaming(rd, lookup_flags, old_last, new_last);
> +}
> +EXPORT_SYMBOL(start_renaming);
> +
> +void end_renaming(struct renamedata *rd)
> +{
> + unlock_rename(rd->old_parent, rd->new_parent);
> + dput(rd->old_dentry);
> + dput(rd->new_dentry);
> +}
> +EXPORT_SYMBOL(end_renaming);
> +
> /**
> * vfs_prepare_mode - prepare the mode to be used for a new inode
> * @idmap: idmap of the mount the inode was found from
> @@ -5509,14 +5632,11 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> struct filename *to, unsigned int flags)
> {
> struct renamedata rd;
> - struct dentry *old_dentry, *new_dentry;
> - struct dentry *trap;
> struct path old_path, new_path;
> struct qstr old_last, new_last;
> int old_type, new_type;
> struct inode *delegated_inode = NULL;
> - unsigned int lookup_flags = 0, target_flags =
> - LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> + unsigned int lookup_flags = 0;
> bool should_retry = false;
> int error = -EINVAL;
>
> @@ -5527,11 +5647,6 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> (flags & RENAME_EXCHANGE))
> goto put_names;
>
> - if (flags & RENAME_EXCHANGE)
> - target_flags = 0;
> - if (flags & RENAME_NOREPLACE)
> - target_flags |= LOOKUP_EXCL;
> -
> retry:
> error = filename_parentat(olddfd, from, lookup_flags, &old_path,
> &old_last, &old_type);
> @@ -5561,66 +5676,40 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> goto exit2;
>
> retry_deleg:
> - trap = lock_rename(new_path.dentry, old_path.dentry);
> - if (IS_ERR(trap)) {
> - error = PTR_ERR(trap);
> + rd.old_parent = old_path.dentry;
> + rd.mnt_idmap = mnt_idmap(old_path.mnt);
> + rd.new_parent = new_path.dentry;
> + rd.delegated_inode = &delegated_inode;
> + rd.flags = flags;
> +
> + error = __start_renaming(&rd, lookup_flags, &old_last, &new_last);
> + if (error)
> goto exit_lock_rename;
> - }
>
> - old_dentry = lookup_one_qstr_excl(&old_last, old_path.dentry,
> - lookup_flags);
> - error = PTR_ERR(old_dentry);
> - if (IS_ERR(old_dentry))
> - goto exit3;
> - new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
> - lookup_flags | target_flags);
> - error = PTR_ERR(new_dentry);
> - if (IS_ERR(new_dentry))
> - goto exit4;
> if (flags & RENAME_EXCHANGE) {
> - if (!d_is_dir(new_dentry)) {
> + if (!d_is_dir(rd.new_dentry)) {
> error = -ENOTDIR;
> if (new_last.name[new_last.len])
> - goto exit5;
> + goto exit_unlock;
> }
> }
> /* unless the source is a directory trailing slashes give -ENOTDIR */
> - if (!d_is_dir(old_dentry)) {
> + if (!d_is_dir(rd.old_dentry)) {
> error = -ENOTDIR;
> if (old_last.name[old_last.len])
> - goto exit5;
> + goto exit_unlock;
> if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
> - goto exit5;
> - }
> - /* source should not be ancestor of target */
> - error = -EINVAL;
> - if (old_dentry == trap)
> - goto exit5;
> - /* target should not be an ancestor of source */
> - if (!(flags & RENAME_EXCHANGE))
> - error = -ENOTEMPTY;
> - if (new_dentry == trap)
> - goto exit5;
> + goto exit_unlock;
> + }
>
> - error = security_path_rename(&old_path, old_dentry,
> - &new_path, new_dentry, flags);
> + error = security_path_rename(&old_path, rd.old_dentry,
> + &new_path, rd.new_dentry, flags);
> if (error)
> - goto exit5;
> + goto exit_unlock;
>
> - rd.old_parent = old_path.dentry;
> - rd.old_dentry = old_dentry;
> - rd.mnt_idmap = mnt_idmap(old_path.mnt);
> - rd.new_parent = new_path.dentry;
> - rd.new_dentry = new_dentry;
> - rd.delegated_inode = &delegated_inode;
> - rd.flags = flags;
> error = vfs_rename(&rd);
> -exit5:
> - dput(new_dentry);
> -exit4:
> - dput(old_dentry);
> -exit3:
> - unlock_rename(new_path.dentry, old_path.dentry);
> +exit_unlock:
> + end_renaming(&rd);
> exit_lock_rename:
> if (delegated_inode) {
> error = break_deleg_wait(&delegated_inode);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index d5b4550fd8f6..091112d931f9 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1862,11 +1862,12 @@ __be32
> nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> struct svc_fh *tfhp, char *tname, int tlen)
> {
> - struct dentry *fdentry, *tdentry, *odentry, *ndentry, *trap;
> + struct dentry *fdentry, *tdentry;
> int type = S_IFDIR;
> + struct renamedata rd = {};
> __be32 err;
> int host_err;
> - bool close_cached = false;
> + struct dentry *close_cached;
>
> trace_nfsd_vfs_rename(rqstp, ffhp, tfhp, fname, flen, tname, tlen);
>
> @@ -1892,15 +1893,22 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> goto out;
>
> retry:
> + close_cached = NULL;
> host_err = fh_want_write(ffhp);
> if (host_err) {
> err = nfserrno(host_err);
> goto out;
> }
>
> - trap = lock_rename(tdentry, fdentry);
> - if (IS_ERR(trap)) {
> - err = nfserr_xdev;
> + rd.mnt_idmap = &nop_mnt_idmap;
> + rd.old_parent = fdentry;
> + rd.new_parent = tdentry;
> +
> + host_err = start_renaming(&rd, 0, &QSTR_LEN(fname, flen),
> + &QSTR_LEN(tname, tlen));
> +
> + if (host_err) {
> + err = nfserrno(host_err);
> goto out_want_write;
> }
> err = fh_fill_pre_attrs(ffhp);
> @@ -1910,48 +1918,23 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> if (err != nfs_ok)
> goto out_unlock;
>
> - odentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), fdentry);
> - host_err = PTR_ERR(odentry);
> - if (IS_ERR(odentry))
> - goto out_nfserr;
> + type = d_inode(rd.old_dentry)->i_mode & S_IFMT;
> +
> + if (d_inode(rd.new_dentry))
> + type = d_inode(rd.new_dentry)->i_mode & S_IFMT;
>
> - host_err = -ENOENT;
> - if (d_really_is_negative(odentry))
> - goto out_dput_old;
> - host_err = -EINVAL;
> - if (odentry == trap)
> - goto out_dput_old;
> - type = d_inode(odentry)->i_mode & S_IFMT;
> -
> - ndentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(tname, tlen), tdentry);
> - host_err = PTR_ERR(ndentry);
> - if (IS_ERR(ndentry))
> - goto out_dput_old;
> - if (d_inode(ndentry))
> - type = d_inode(ndentry)->i_mode & S_IFMT;
> - host_err = -ENOTEMPTY;
> - if (ndentry == trap)
> - goto out_dput_new;
> -
> - if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
> - nfsd_has_cached_files(ndentry)) {
> - close_cached = true;
> - goto out_dput_old;
> + if ((rd.new_dentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
> + nfsd_has_cached_files(rd.new_dentry)) {
> + close_cached = dget(rd.new_dentry);
> + goto out_unlock;
> } else {
> - struct renamedata rd = {
> - .mnt_idmap = &nop_mnt_idmap,
> - .old_parent = fdentry,
> - .old_dentry = odentry,
> - .new_parent = tdentry,
> - .new_dentry = ndentry,
> - };
> int retries;
>
> for (retries = 1;;) {
> host_err = vfs_rename(&rd);
> if (host_err != -EAGAIN || !retries--)
> break;
> - if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry)))
> + if (!nfsd_wait_for_delegreturn(rqstp, d_inode(rd.old_dentry)))
> break;
> }
> if (!host_err) {
> @@ -1960,11 +1943,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> host_err = commit_metadata(ffhp);
> }
> }
> - out_dput_new:
> - dput(ndentry);
> - out_dput_old:
> - dput(odentry);
> - out_nfserr:
> if (host_err == -EBUSY) {
> /*
> * See RFC 8881 Section 18.26.4 para 1-3: NFSv4 RENAME
> @@ -1983,7 +1961,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> fh_fill_post_attrs(tfhp);
> }
> out_unlock:
> - unlock_rename(tdentry, fdentry);
> + end_renaming(&rd);
> out_want_write:
> fh_drop_write(ffhp);
>
> @@ -1994,9 +1972,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> * until this point and then reattempt the whole shebang.
> */
> if (close_cached) {
> - close_cached = false;
> - nfsd_close_cached_files(ndentry);
> - dput(ndentry);
> + nfsd_close_cached_files(close_cached);
> + dput(close_cached);
> goto retry;
> }
> out:
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 74b1ef5860a4..b37aefe465a2 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -1099,9 +1099,7 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> int err;
> struct dentry *old_upperdir;
> struct dentry *new_upperdir;
> - struct dentry *olddentry = NULL;
> - struct dentry *newdentry = NULL;
> - struct dentry *trap, *de;
> + struct renamedata rd = {};
> bool old_opaque;
> bool new_opaque;
> bool cleanup_whiteout = false;
> @@ -1208,29 +1206,21 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> }
> }
>
> - trap = lock_rename(new_upperdir, old_upperdir);
> - if (IS_ERR(trap)) {
> - err = PTR_ERR(trap);
> - goto out_revert_creds;
> - }
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = old_upperdir;
> + rd.new_parent = new_upperdir;
> + rd.flags = flags;
>
> - de = ovl_lookup_upper(ofs, old->d_name.name, old_upperdir,
> - old->d_name.len);
> - err = PTR_ERR(de);
> - if (IS_ERR(de))
> - goto out_unlock;
> - olddentry = de;
> + err = start_renaming(&rd, 0,
> + &QSTR_LEN(old->d_name.name, old->d_name.len),
> + &QSTR_LEN(new->d_name.name, new->d_name.len));
>
> - err = -ESTALE;
> - if (!ovl_matches_upper(old, olddentry))
> - goto out_unlock;
> + if (err)
> + goto out_revert_creds;
>
> - de = ovl_lookup_upper(ofs, new->d_name.name, new_upperdir,
> - new->d_name.len);
> - err = PTR_ERR(de);
> - if (IS_ERR(de))
> + err = -ESTALE;
> + if (!ovl_matches_upper(old, rd.old_dentry))
> goto out_unlock;
> - newdentry = de;
>
> old_opaque = ovl_dentry_is_opaque(old);
> new_opaque = ovl_dentry_is_opaque(new);
> @@ -1238,15 +1228,15 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> err = -ESTALE;
> if (d_inode(new) && ovl_dentry_upper(new)) {
> if (opaquedir) {
> - if (newdentry != opaquedir)
> + if (rd.new_dentry != opaquedir)
> goto out_unlock;
> } else {
> - if (!ovl_matches_upper(new, newdentry))
> + if (!ovl_matches_upper(new, rd.new_dentry))
> goto out_unlock;
> }
> } else {
> - if (!d_is_negative(newdentry)) {
> - if (!new_opaque || !ovl_upper_is_whiteout(ofs, newdentry))
> + if (!d_is_negative(rd.new_dentry)) {
> + if (!new_opaque || !ovl_upper_is_whiteout(ofs, rd.new_dentry))
> goto out_unlock;
> } else {
> if (flags & RENAME_EXCHANGE)
> @@ -1254,19 +1244,14 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> }
> }
>
> - if (olddentry == trap)
> - goto out_unlock;
> - if (newdentry == trap)
> - goto out_unlock;
> -
> - if (olddentry->d_inode == newdentry->d_inode)
> + if (rd.old_dentry->d_inode == rd.new_dentry->d_inode)
> goto out_unlock;
>
> err = 0;
> if (ovl_type_merge_or_lower(old))
> err = ovl_set_redirect(old, samedir);
> else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
> - err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
> + err = ovl_set_opaque_xerr(old, rd.old_dentry, -EXDEV);
> if (err)
> goto out_unlock;
>
> @@ -1274,19 +1259,22 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> err = ovl_set_redirect(new, samedir);
> else if (!overwrite && new_is_dir && !new_opaque &&
> ovl_type_merge(old->d_parent))
> - err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
> + err = ovl_set_opaque_xerr(new, rd.new_dentry, -EXDEV);
> if (err)
> goto out_unlock;
>
> - err = ovl_do_rename(ofs, old_upperdir, olddentry,
> - new_upperdir, newdentry, flags);
> - unlock_rename(new_upperdir, old_upperdir);
> + err = ovl_do_rename_rd(&rd);
> +
> + dget(rd.new_dentry);
> + end_renaming(&rd);
> +
> + if (!err && cleanup_whiteout) {
> + ovl_cleanup(ofs, old_upperdir, rd.new_dentry);
> + }
> + dput(rd.new_dentry);
I would restructure this for better clarity:
if (!err && cleanup_whiteout)
whiteout = dget(rd.new_dentry);
end_renaming(&rd);
if (err)
goto out_revert_creds;
if (whiteout) {
ovl_cleanup(ofs, old_upperdir, whiteout);
dput(whiteout);
}
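A rough userspace model of that ordering, for illustration only. The
model_* names are hypothetical stand-ins for the dentry refcount,
end_renaming() and ovl_cleanup(); the sketch assumes, as in the patch, that
end_renaming() drops the reference held in rd.new_dentry, so an extra
reference is taken only when the whiteout cleanup will really run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_dentry {
	int refs;
	bool cleaned;
};

struct model_rd {
	struct model_dentry *new_dentry;
	bool locked;
};

static struct model_dentry *model_dget(struct model_dentry *d)
{
	d->refs++;
	return d;
}

static void model_dput(struct model_dentry *d)
{
	if (d)
		d->refs--;
}

/* Stand-in for end_renaming(): unlock and drop the lookup reference. */
static void model_end_renaming(struct model_rd *rd)
{
	rd->locked = false;
	model_dput(rd->new_dentry);
}

/*
 * Tail of the rename path: err is the result of the rename itself.
 * The extra reference is taken only on the path that needs it.
 */
int model_rename_tail(struct model_rd *rd, int err, bool cleanup_whiteout)
{
	struct model_dentry *whiteout = NULL;

	if (!err && cleanup_whiteout)
		whiteout = model_dget(rd->new_dentry);
	model_end_renaming(rd);
	if (err)
		return err;
	if (whiteout) {
		/* stands in for ovl_cleanup(); records that it ran unlocked */
		whiteout->cleaned = !rd->locked;
		model_dput(whiteout);
	}
	return 0;
}
```

The error path then never touches the extra reference at all, which is what
makes the flow easier to follow than an unconditional dget()/dput() pair.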
> if (err)
> goto out_revert_creds;
>
> - if (cleanup_whiteout)
> - ovl_cleanup(ofs, old_upperdir, newdentry);
> -
> if (overwrite && d_inode(new)) {
> if (new_is_dir)
> clear_nlink(d_inode(new));
> @@ -1311,14 +1299,12 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> else
> ovl_drop_write(old);
> out:
> - dput(newdentry);
> - dput(olddentry);
> dput(opaquedir);
> ovl_cache_free(&list);
> return err;
>
> out_unlock:
> - unlock_rename(new_upperdir, old_upperdir);
> + end_renaming(&rd);
> goto out_revert_creds;
> }
>
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 915af58459b7..181fc46195f2 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -378,6 +378,20 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
> return err;
> }
>
> +static inline int ovl_do_rename_rd(struct renamedata *rd)
> +{
> + int err;
> +
> + pr_debug("rename(%pd2, %pd2, 0x%x)\n", rd->old_dentry, rd->new_dentry,
> + rd->flags);
> + err = vfs_rename(rd);
> + if (err) {
> + pr_debug("...rename(%pd2, %pd2, ...) = %i\n",
> + rd->old_dentry, rd->new_dentry, err);
> + }
> + return err;
> +}
> +
This was factored out of ovl_do_rename().
Please avoid duplication and call this from ovl_do_rename().
Even if ovl_do_rename() has no callers at the end of your series and you
are going to remove it, please avoid copying this code mid-series.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (7 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 08/11] VFS/nfsd/ovl: introduce start_renaming() and end_renaming() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-26 15:43 ` kernel test robot
` (2 more replies)
2025-09-26 2:49 ` [PATCH 10/11] Add start_renaming_two_dentrys() NeilBrown
` (2 subsequent siblings)
11 siblings, 3 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
Several callers perform a rename on a dentry they already have, and only
require lookup for the target name. This includes smb/server and a few
different places in overlayfs.
start_renaming_dentry() performs the required lookup and takes the
required lock using lock_rename_child().
It is used in three places in overlayfs and in ksmbd_vfs_rename().
In the ksmbd case, the parent of the source is not important - the
source must be renamed from wherever it is. So start_renaming_dentry()
allows rd->old_parent to be NULL and only checks it if it is non-NULL.
On success rd->old_parent will be the parent of old_dentry with an extra
reference taken. The other start_renaming functions now also take the extra
reference and end_renaming() now drops this reference as well.
ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are
all removed as they are no longer needed.
OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so
that ovl_check_rename_whiteout() can access them.
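The old_parent handling described above can be modelled in userspace
roughly as follows. This is only a sketch under stated assumptions, not the
kernel implementation: the model_* types and functions are hypothetical,
locking and the target lookup are left out, and only the "NULL means any
parent" check and the extra references are shown.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_dentry {
	struct model_dentry *d_parent;
	int refs;
	int unhashed;
};

struct model_rd {
	struct model_dentry *old_parent;	/* may be NULL on entry */
	struct model_dentry *old_dentry;
};

/*
 * If the caller named a parent, the source must still live there.
 * If old_parent is NULL, the current parent is accepted and recorded.
 * On success a reference is taken on both the source and its parent.
 */
int model_start_renaming_dentry(struct model_rd *rd,
				struct model_dentry *old_dentry)
{
	if (old_dentry->unhashed ||
	    (rd->old_parent && rd->old_parent != old_dentry->d_parent))
		return -EINVAL;

	old_dentry->refs++;
	old_dentry->d_parent->refs++;
	rd->old_dentry = old_dentry;
	rd->old_parent = old_dentry->d_parent;
	return 0;
}

/* Drops exactly the references taken by the start function. */
void model_end_renaming(struct model_rd *rd)
{
	rd->old_dentry->refs--;
	rd->old_parent->refs--;
}
```

A ksmbd-style caller would leave old_parent NULL and read it back after a
successful start; an overlayfs-style caller would set it up front and treat
-EINVAL as "the source moved underneath us".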
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 106 ++++++++++++++++++++++++++++++++++++---
fs/overlayfs/copy_up.c | 47 ++++++++---------
fs/overlayfs/dir.c | 19 +------
fs/overlayfs/overlayfs.h | 8 +--
fs/overlayfs/super.c | 20 ++++----
fs/overlayfs/util.c | 11 ----
fs/smb/server/vfs.c | 60 ++++------------------
include/linux/namei.h | 2 +
8 files changed, 147 insertions(+), 126 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 79a8b3b47e4d..aca6de83d255 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3686,7 +3686,7 @@ EXPORT_SYMBOL(unlock_rename);
/**
* __start_renaming - lookup and lock names for rename
- * @rd: rename data containing parent and flags, and
+ * @rd: rename data containing parents and flags, and
* for receiving found dentries
* @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
* LOOKUP_NO_SYMLINKS etc).
@@ -3697,8 +3697,8 @@ EXPORT_SYMBOL(unlock_rename);
* rename.
*
* On success the found dentrys are stored in @rd.old_dentry,
- * @rd.new_dentry. These references and the lock are dropped by
- * end_renaming().
+ * @rd.new_dentry and an extra ref is taken on @rd.old_parent.
+ * These references and the lock are dropped by end_renaming().
*
* The passed in qstrs must have the hash calculated, and no permission
* checking is performed.
@@ -3750,6 +3750,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
rd->old_dentry = d1;
rd->new_dentry = d2;
+ dget(rd->old_parent);
return 0;
out_unlock_3:
@@ -3765,7 +3766,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
/**
* start_renaming - lookup and lock names for rename with permission checking
- * @rd: rename data containing parent and flags, and
+ * @rd: rename data containing parents and flags, and
* for receiving found dentries
* @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
* LOOKUP_NO_SYMLINKS etc).
@@ -3776,8 +3777,8 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
* rename.
*
* On success the found dentrys are stored in @rd.old_dentry,
- * @rd.new_dentry. These references and the lock are dropped by
- * end_renaming().
+ * @rd.new_dentry. Also the refcount on @rd->old_parent is increased.
+ * These references and the lock are dropped by end_renaming().
*
* The passed in qstrs need not have the hash calculated, and basic
* eXecute permission checking is performed against @rd.mnt_idmap.
@@ -3799,11 +3800,104 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
}
EXPORT_SYMBOL(start_renaming);
+static int
+__start_renaming_dentry(struct renamedata *rd, int lookup_flags,
+ struct dentry *old_dentry, struct qstr *new_last)
+{
+ struct dentry *trap;
+ struct dentry *d2;
+ int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+ int err;
+
+ if (rd->flags & RENAME_EXCHANGE)
+ target_flags = 0;
+ if (rd->flags & RENAME_NOREPLACE)
+ target_flags |= LOOKUP_EXCL;
+
+ /* Already have the dentry - need to be sure to lock the correct parent */
+ trap = lock_rename_child(old_dentry, rd->new_parent);
+ if (IS_ERR(trap))
+ return PTR_ERR(trap);
+ if (d_unhashed(old_dentry) ||
+ (rd->old_parent && rd->old_parent != old_dentry->d_parent)) {
+ /* dentry was removed, or moved and explicit parent requested */
+ d2 = ERR_PTR(-EINVAL);
+ goto out_unlock_2;
+ }
+
+ d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
+ lookup_flags | target_flags);
+ if (IS_ERR(d2))
+ goto out_unlock_2;
+
+ if (old_dentry == trap) {
+ /* source is an ancestor of target */
+ err = -EINVAL;
+ goto out_unlock_3;
+ }
+
+ if (d2 == trap) {
+ /* target is an ancestor of source */
+ if (rd->flags & RENAME_EXCHANGE)
+ err = -EINVAL;
+ else
+ err = -ENOTEMPTY;
+ goto out_unlock_3;
+ }
+
+ rd->old_dentry = dget(old_dentry);
+ rd->new_dentry = d2;
+ rd->old_parent = dget(old_dentry->d_parent);
+ return 0;
+
+out_unlock_3:
+ dput(d2);
+ d2 = ERR_PTR(err);
+out_unlock_2:
+ unlock_rename(old_dentry->d_parent, rd->new_parent);
+ return PTR_ERR(d2);
+}
+
+/**
+ * start_renaming_dentry - lookup and lock name for rename with permission checking
+ * @rd: rename data containing parents and flags, and
+ * for receiving found dentries
+ * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
+ * LOOKUP_NO_SYMLINKS etc).
+ * @old_dentry: dentry of name to move
+ * @new_last: name of target in @rd.new_parent
+ *
+ * Look up target name and ensure locks are in place for
+ * rename.
+ *
+ * On success the found dentry is stored in @rd.new_dentry and
+ * @rd.old_parent is confirmed to be the parent of @old_dentry. If it
+ * was originally %NULL, it is set. In either case a reference is taken.
+ *
+ * References and the lock can be dropped with end_renaming()
+ *
+ * The passed in qstr need not have the hash calculated, and basic
+ * eXecute permission checking is performed against @rd.mnt_idmap.
+ *
+ * Returns: zero or an error.
+ */
+int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
+ struct dentry *old_dentry, struct qstr *new_last)
+{
+ int err;
+
+ err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
+ if (err)
+ return err;
+ return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
+}
+
void end_renaming(struct renamedata *rd)
{
unlock_rename(rd->old_parent, rd->new_parent);
dput(rd->old_dentry);
dput(rd->new_dentry);
+ dput(rd->old_parent);
}
EXPORT_SYMBOL(end_renaming);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 6a31ea34ff80..3f19548b5d48 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -523,8 +523,8 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
{
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
- struct dentry *index = NULL;
struct dentry *temp = NULL;
+ struct renamedata rd = {};
struct qstr name = { };
int err;
@@ -556,17 +556,15 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
if (err)
goto out;
- err = ovl_parent_lock(indexdir, temp);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = indexdir;
+ rd.new_parent = indexdir;
+ err = start_renaming_dentry(&rd, 0, temp, &name);
if (err)
goto out;
- index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
- if (IS_ERR(index)) {
- err = PTR_ERR(index);
- } else {
- err = ovl_do_rename(ofs, indexdir, temp, indexdir, index, 0);
- dput(index);
- }
- ovl_parent_unlock(indexdir);
+
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
out:
if (err)
ovl_cleanup(ofs, indexdir, temp);
@@ -763,7 +761,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
struct inode *inode;
struct path path = { .mnt = ovl_upper_mnt(ofs) };
- struct dentry *temp, *upper, *trap;
+ struct renamedata rd = {};
+ struct dentry *temp;
struct ovl_cu_creds cc;
int err;
struct ovl_cattr cattr = {
@@ -807,29 +806,27 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
* ovl_copy_up_data(), so lock workdir and destdir and make sure that
* temp wasn't moved before copy up completion or cleanup.
*/
- trap = lock_rename(c->workdir, c->destdir);
- if (trap || temp->d_parent != c->workdir) {
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = c->workdir;
+ rd.new_parent = c->destdir;
+ rd.flags = 0;
+ err = start_renaming_dentry(&rd, 0, temp,
+ &QSTR_LEN(c->destname.name, c->destname.len));
+ if (err == -EINVAL || err == -EXDEV) {
/* temp or workdir moved underneath us? abort without cleanup */
dput(temp);
err = -EIO;
- if (!IS_ERR(trap))
- unlock_rename(c->workdir, c->destdir);
goto out;
}
-
- err = ovl_copy_up_metadata(c, temp);
if (err)
goto cleanup;
- upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
- c->destname.len);
- err = PTR_ERR(upper);
- if (IS_ERR(upper))
+ err = ovl_copy_up_metadata(c, temp);
+ if (err)
goto cleanup;
- err = ovl_do_rename(ofs, c->workdir, temp, c->destdir, upper, 0);
- unlock_rename(c->workdir, c->destdir);
- dput(upper);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err)
goto cleanup_unlocked;
@@ -851,7 +848,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
return err;
cleanup:
- unlock_rename(c->workdir, c->destdir);
+ end_renaming(&rd);
cleanup_unlocked:
ovl_cleanup(ofs, c->workdir, temp);
dput(temp);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index b37aefe465a2..54423ad00e1c 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -57,8 +57,7 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
return 0;
}
-#define OVL_TEMPNAME_SIZE 20
-static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
+void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
{
static atomic_t temp_id = ATOMIC_INIT(0);
@@ -66,22 +65,6 @@ static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
}
-struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
-{
- struct dentry *temp;
- char name[OVL_TEMPNAME_SIZE];
-
- ovl_tempname(name);
- temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
- if (!IS_ERR(temp) && temp->d_inode) {
- pr_err("workdir/%s already exists\n", name);
- dput(temp);
- temp = ERR_PTR(-EIO);
- }
-
- return temp;
-}
-
static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
struct dentry *workdir)
{
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 181fc46195f2..a8bc144f9d62 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -454,11 +454,6 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
}
/* util.c */
-int ovl_parent_lock(struct dentry *parent, struct dentry *child);
-static inline void ovl_parent_unlock(struct dentry *parent)
-{
- inode_unlock(parent->d_inode);
-}
int ovl_get_write_access(struct dentry *dentry);
void ovl_put_write_access(struct dentry *dentry);
void ovl_start_write(struct dentry *dentry);
@@ -893,7 +888,8 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs,
struct dentry *parent, struct dentry *newdentry,
struct ovl_cattr *attr);
int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir, struct dentry *dentry);
-struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir);
+#define OVL_TEMPNAME_SIZE 20
+void ovl_tempname(char name[OVL_TEMPNAME_SIZE]);
struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
struct ovl_cattr *attr);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 67abb62e205b..1af489272d10 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -559,6 +559,8 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
struct dentry *dest;
struct dentry *whiteout;
struct name_snapshot name;
+ struct renamedata rd = {};
+ char name2[OVL_TEMPNAME_SIZE];
int err;
temp = ovl_create_temp(ofs, workdir, OVL_CATTR(S_IFREG | 0));
@@ -566,23 +568,21 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
if (IS_ERR(temp))
return err;
- err = ovl_parent_lock(workdir, temp);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = workdir;
+ rd.new_parent = workdir;
+ rd.flags = RENAME_WHITEOUT;
+ ovl_tempname(name2);
+ err = start_renaming_dentry(&rd, 0, temp, &QSTR(name2));
if (err) {
dput(temp);
return err;
}
- dest = ovl_lookup_temp(ofs, workdir);
- err = PTR_ERR(dest);
- if (IS_ERR(dest)) {
- dput(temp);
- ovl_parent_unlock(workdir);
- return err;
- }
/* Name is inline and stable - using snapshot as a copy helper */
take_dentry_name_snapshot(&name, temp);
- err = ovl_do_rename(ofs, workdir, temp, workdir, dest, RENAME_WHITEOUT);
- ovl_parent_unlock(workdir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err) {
if (err == -EINVAL)
err = 0;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 41033bac96cb..bfe44eba903f 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -1548,14 +1548,3 @@ void ovl_copyattr(struct inode *inode)
i_size_write(inode, i_size_read(realinode));
spin_unlock(&inode->i_lock);
}
-
-int ovl_parent_lock(struct dentry *parent, struct dentry *child)
-{
- inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
- if (!child ||
- (!d_unhashed(child) && child->d_parent == parent))
- return 0;
-
- inode_unlock(parent->d_inode);
- return -EINVAL;
-}
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 56b755a05c4e..8961b7e38782 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -662,7 +662,6 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname,
int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
char *newname, int flags)
{
- struct dentry *old_parent, *new_dentry, *trap;
struct dentry *old_child = old_path->dentry;
struct path new_path;
struct qstr new_last;
@@ -672,7 +671,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
struct ksmbd_file *parent_fp;
int new_type;
int err, lookup_flags = LOOKUP_NO_SYMLINKS;
- int target_lookup_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
if (ksmbd_override_fsids(work))
return -ENOMEM;
@@ -683,14 +681,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
goto revert_fsids;
}
- /*
- * explicitly handle file overwrite case, for compatibility with
- * filesystems that may not support rename flags (e.g: fuse)
- */
- if (flags & RENAME_NOREPLACE)
- target_lookup_flags |= LOOKUP_EXCL;
- flags &= ~(RENAME_NOREPLACE);
-
retry:
err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH,
&new_path, &new_last, &new_type,
@@ -707,17 +697,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
if (err)
goto out2;
- trap = lock_rename_child(old_child, new_path.dentry);
- if (IS_ERR(trap)) {
- err = PTR_ERR(trap);
+ rd.mnt_idmap = mnt_idmap(old_path->mnt);
+ rd.old_parent = NULL;
+ rd.new_parent = new_path.dentry;
+ rd.flags = flags;
+ rd.delegated_inode = NULL,
+ err = start_renaming_dentry(&rd, lookup_flags, old_child, &new_last);
+ if (err)
goto out_drop_write;
- }
-
- old_parent = dget(old_child->d_parent);
- if (d_unhashed(old_child)) {
- err = -EINVAL;
- goto out3;
- }
parent_fp = ksmbd_lookup_fd_inode(old_child->d_parent);
if (parent_fp) {
@@ -730,44 +717,17 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
ksmbd_fd_put(work, parent_fp);
}
- new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
- lookup_flags | target_lookup_flags);
- if (IS_ERR(new_dentry)) {
- err = PTR_ERR(new_dentry);
- goto out3;
- }
-
- if (d_is_symlink(new_dentry)) {
+ if (d_is_symlink(rd.new_dentry)) {
err = -EACCES;
- goto out4;
- }
-
- if (old_child == trap) {
- err = -EINVAL;
- goto out4;
- }
-
- if (new_dentry == trap) {
- err = -ENOTEMPTY;
- goto out4;
+ goto out3;
}
- rd.mnt_idmap = mnt_idmap(old_path->mnt),
- rd.old_parent = old_parent,
- rd.old_dentry = old_child,
- rd.new_parent = new_path.dentry,
- rd.new_dentry = new_dentry,
- rd.flags = flags,
- rd.delegated_inode = NULL,
err = vfs_rename(&rd);
if (err)
ksmbd_debug(VFS, "vfs_rename failed err %d\n", err);
-out4:
- dput(new_dentry);
out3:
- dput(old_parent);
- unlock_rename(old_parent, new_path.dentry);
+ end_renaming(&rd);
out_drop_write:
mnt_drop_write(old_path->mnt);
out2:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index b65ed6e1e91a..ada0f6cc38bc 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -142,6 +142,8 @@ extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
int start_renaming(struct renamedata *rd, int lookup_flags,
struct qstr *old_last, struct qstr *new_last);
+int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
+ struct dentry *old_dentry, struct qstr *new_last);
void end_renaming(struct renamedata *rd);
/**
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
2025-09-26 2:49 ` [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
@ 2025-09-26 15:43 ` kernel test robot
2025-09-26 17:17 ` kernel test robot
2025-09-30 7:08 ` Amir Goldstein
2 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2025-09-26 15:43 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein,
Jeff Layton
Cc: llvm, oe-kbuild-all, Jan Kara, linux-fsdevel
Hi NeilBrown,
kernel test robot noticed the following build warnings:
[auto build test WARNING on brauner-vfs/vfs.all]
[also build test WARNING on next-20250925]
[cannot apply to driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus viro-vfs/for-next linus/master v6.17-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/debugfs-rename-end_creating-to-debugfs_end_creating/20250926-105302
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20250926025015.1747294-10-neilb%40ownmail.net
patch subject: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
config: x86_64-buildonly-randconfig-001-20250926 (https://download.01.org/0day-ci/archive/20250926/202509262345.LLJy17UN-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250926/202509262345.LLJy17UN-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509262345.LLJy17UN-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> fs/overlayfs/super.c:609:7: warning: variable 'dest' is uninitialized when used here [-Wuninitialized]
609 | dput(dest);
| ^~~~
fs/overlayfs/super.c:559:21: note: initialize the variable 'dest' to silence this warning
559 | struct dentry *dest;
| ^
| = NULL
1 warning generated.
vim +/dest +609 fs/overlayfs/super.c
6ee8acf0f72b89 Miklos Szeredi 2017-11-09 550
cad218ab332078 Amir Goldstein 2020-02-20 551 /*
cad218ab332078 Amir Goldstein 2020-02-20 552 * Returns 1 if RENAME_WHITEOUT is supported, 0 if not supported and
cad218ab332078 Amir Goldstein 2020-02-20 553 * negative values if error is encountered.
cad218ab332078 Amir Goldstein 2020-02-20 554 */
576bb263450bbb Christian Brauner 2022-04-04 555 static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
cad218ab332078 Amir Goldstein 2020-02-20 556 {
576bb263450bbb Christian Brauner 2022-04-04 557 struct dentry *workdir = ofs->workdir;
cad218ab332078 Amir Goldstein 2020-02-20 558 struct dentry *temp;
cad218ab332078 Amir Goldstein 2020-02-20 559 struct dentry *dest;
cad218ab332078 Amir Goldstein 2020-02-20 560 struct dentry *whiteout;
cad218ab332078 Amir Goldstein 2020-02-20 561 struct name_snapshot name;
18cc48cabac8b0 NeilBrown 2025-09-26 562 struct renamedata rd = {};
18cc48cabac8b0 NeilBrown 2025-09-26 563 char name2[OVL_TEMPNAME_SIZE];
cad218ab332078 Amir Goldstein 2020-02-20 564 int err;
cad218ab332078 Amir Goldstein 2020-02-20 565
576bb263450bbb Christian Brauner 2022-04-04 566 temp = ovl_create_temp(ofs, workdir, OVL_CATTR(S_IFREG | 0));
cad218ab332078 Amir Goldstein 2020-02-20 567 err = PTR_ERR(temp);
cad218ab332078 Amir Goldstein 2020-02-20 568 if (IS_ERR(temp))
d2c995581c7c5d NeilBrown 2025-07-16 569 return err;
cad218ab332078 Amir Goldstein 2020-02-20 570
18cc48cabac8b0 NeilBrown 2025-09-26 571 rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
18cc48cabac8b0 NeilBrown 2025-09-26 572 rd.old_parent = workdir;
18cc48cabac8b0 NeilBrown 2025-09-26 573 rd.new_parent = workdir;
18cc48cabac8b0 NeilBrown 2025-09-26 574 rd.flags = RENAME_WHITEOUT;
18cc48cabac8b0 NeilBrown 2025-09-26 575 ovl_tempname(name2);
18cc48cabac8b0 NeilBrown 2025-09-26 576 err = start_renaming_dentry(&rd, 0, temp, &QSTR(name2));
d2c995581c7c5d NeilBrown 2025-07-16 577 if (err) {
d2c995581c7c5d NeilBrown 2025-07-16 578 dput(temp);
d2c995581c7c5d NeilBrown 2025-07-16 579 return err;
d2c995581c7c5d NeilBrown 2025-07-16 580 }
cad218ab332078 Amir Goldstein 2020-02-20 581
cad218ab332078 Amir Goldstein 2020-02-20 582 /* Name is inline and stable - using snapshot as a copy helper */
cad218ab332078 Amir Goldstein 2020-02-20 583 take_dentry_name_snapshot(&name, temp);
18cc48cabac8b0 NeilBrown 2025-09-26 584 err = ovl_do_rename_rd(&rd);
18cc48cabac8b0 NeilBrown 2025-09-26 585 end_renaming(&rd);
cad218ab332078 Amir Goldstein 2020-02-20 586 if (err) {
cad218ab332078 Amir Goldstein 2020-02-20 587 if (err == -EINVAL)
cad218ab332078 Amir Goldstein 2020-02-20 588 err = 0;
cad218ab332078 Amir Goldstein 2020-02-20 589 goto cleanup_temp;
cad218ab332078 Amir Goldstein 2020-02-20 590 }
cad218ab332078 Amir Goldstein 2020-02-20 591
09d56cc88c2470 NeilBrown 2025-07-16 592 whiteout = ovl_lookup_upper_unlocked(ofs, name.name.name,
09d56cc88c2470 NeilBrown 2025-07-16 593 workdir, name.name.len);
cad218ab332078 Amir Goldstein 2020-02-20 594 err = PTR_ERR(whiteout);
cad218ab332078 Amir Goldstein 2020-02-20 595 if (IS_ERR(whiteout))
cad218ab332078 Amir Goldstein 2020-02-20 596 goto cleanup_temp;
cad218ab332078 Amir Goldstein 2020-02-20 597
bc8df7a3dc0359 Alexander Larsson 2023-08-23 598 err = ovl_upper_is_whiteout(ofs, whiteout);
cad218ab332078 Amir Goldstein 2020-02-20 599
cad218ab332078 Amir Goldstein 2020-02-20 600 /* Best effort cleanup of whiteout and temp file */
cad218ab332078 Amir Goldstein 2020-02-20 601 if (err)
fe4d3360f9cbb5 NeilBrown 2025-07-16 602 ovl_cleanup(ofs, workdir, whiteout);
cad218ab332078 Amir Goldstein 2020-02-20 603 dput(whiteout);
cad218ab332078 Amir Goldstein 2020-02-20 604
cad218ab332078 Amir Goldstein 2020-02-20 605 cleanup_temp:
fe4d3360f9cbb5 NeilBrown 2025-07-16 606 ovl_cleanup(ofs, workdir, temp);
cad218ab332078 Amir Goldstein 2020-02-20 607 release_dentry_name_snapshot(&name);
cad218ab332078 Amir Goldstein 2020-02-20 608 dput(temp);
cad218ab332078 Amir Goldstein 2020-02-20 @609 dput(dest);
cad218ab332078 Amir Goldstein 2020-02-20 610
cad218ab332078 Amir Goldstein 2020-02-20 611 return err;
cad218ab332078 Amir Goldstein 2020-02-20 612 }
cad218ab332078 Amir Goldstein 2020-02-20 613
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
2025-09-26 2:49 ` [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
2025-09-26 15:43 ` kernel test robot
@ 2025-09-26 17:17 ` kernel test robot
2025-09-30 7:08 ` Amir Goldstein
2 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2025-09-26 17:17 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein,
Jeff Layton
Cc: oe-kbuild-all, Jan Kara, linux-fsdevel
Hi NeilBrown,
kernel test robot noticed the following build errors:
[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on next-20250926]
[cannot apply to driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus viro-vfs/for-next linus/master v6.17-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/debugfs-rename-end_creating-to-debugfs_end_creating/20250926-105302
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20250926025015.1747294-10-neilb%40ownmail.net
patch subject: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
config: x86_64-buildonly-randconfig-005-20250926 (https://download.01.org/0day-ci/archive/20250927/202509270016.YwqJ8gSc-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250927/202509270016.YwqJ8gSc-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509270016.YwqJ8gSc-lkp@intel.com/
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "start_renaming_dentry" [fs/overlayfs/overlay.ko] undefined!
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
2025-09-26 2:49 ` [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
2025-09-26 15:43 ` kernel test robot
2025-09-26 17:17 ` kernel test robot
@ 2025-09-30 7:08 ` Amir Goldstein
2025-10-01 1:45 ` NeilBrown
2025-10-01 4:35 ` NeilBrown
2 siblings, 2 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-09-30 7:08 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> Several callers perform a rename on a dentry they already have, and only
> require lookup for the target name. This includes smb/server and a few
> different places in overlayfs.
>
> start_renaming_dentry() performs the required lookup and takes the
> required lock using lock_rename_child()
>
> It is used in three places in overlayfs and in ksmbd_vfs_rename().
>
> In the ksmbd case, the parent of the source is not important - the
> source must be renamed from wherever it is. So start_renaming_dentry()
> allows rd->old_parent to be NULL and only checks it if it is non-NULL.
> On success rd->old_parent will be the parent of old_dentry with an extra
> reference taken.
It is not clear to me why you need to take that extra ref.
It looks very unnatural for start_renaming/end_renaming
to take a ref on old_parent and not on new_parent.
If ksmbd needs old_parent, it can use old->d_parent after
start_renaming_dentry(); it should be stable, right?
So what's the point of taking this extra ref?
> Other start_renaming function also now take the extra
> reference and end_renaming() now drops this reference as well.
>
> ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are
> all removed as they are no longer needed.
>
> OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so
> that ovl_check_rename_whiteout() can access them.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/namei.c | 106 ++++++++++++++++++++++++++++++++++++---
> fs/overlayfs/copy_up.c | 47 ++++++++---------
> fs/overlayfs/dir.c | 19 +------
> fs/overlayfs/overlayfs.h | 8 +--
> fs/overlayfs/super.c | 20 ++++----
> fs/overlayfs/util.c | 11 ----
> fs/smb/server/vfs.c | 60 ++++------------------
> include/linux/namei.h | 2 +
> 8 files changed, 147 insertions(+), 126 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 79a8b3b47e4d..aca6de83d255 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3686,7 +3686,7 @@ EXPORT_SYMBOL(unlock_rename);
>
> /**
> * __start_renaming - lookup and lock names for rename
> - * @rd: rename data containing parent and flags, and
> + * @rd: rename data containing parents and flags, and
> * for receiving found dentries
> * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> * LOOKUP_NO_SYMLINKS etc).
> @@ -3697,8 +3697,8 @@ EXPORT_SYMBOL(unlock_rename);
> * rename.
> *
> * On success the found dentrys are stored in @rd.old_dentry,
> - * @rd.new_dentry. These references and the lock are dropped by
> - * end_renaming().
> + * @rd.new_dentry and an extra ref is taken on @rd.old_parent.
> + * These references and the lock are dropped by end_renaming().
> *
> * The passed in qstrs must have the hash calculated, and no permission
> * checking is performed.
> @@ -3750,6 +3750,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
>
> rd->old_dentry = d1;
> rd->new_dentry = d2;
> + dget(rd->old_parent);
> return 0;
>
> out_unlock_3:
> @@ -3765,7 +3766,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
>
> /**
> * start_renaming - lookup and lock names for rename with permission checking
> - * @rd: rename data containing parent and flags, and
> + * @rd: rename data containing parents and flags, and
> * for receiving found dentries
> * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> * LOOKUP_NO_SYMLINKS etc).
> @@ -3776,8 +3777,8 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> * rename.
> *
> * On success the found dentrys are stored in @rd.old_dentry,
> - * @rd.new_dentry. These references and the lock are dropped by
> - * end_renaming().
> + * @rd.new_dentry. Also the refcount on @rd->old_parent is increased.
> + * These references and the lock are dropped by end_renaming().
> *
> * The passed in qstrs need not have the hash calculated, and basic
> * eXecute permission checking is performed against @rd.mnt_idmap.
> @@ -3799,11 +3800,104 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
> }
> EXPORT_SYMBOL(start_renaming);
>
> +static int
> +__start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> + struct dentry *old_dentry, struct qstr *new_last)
> +{
> + struct dentry *trap;
> + struct dentry *d2;
> + int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> + int err;
> +
> + if (rd->flags & RENAME_EXCHANGE)
> + target_flags = 0;
> + if (rd->flags & RENAME_NOREPLACE)
> + target_flags |= LOOKUP_EXCL;
> +
> + /* Already have the dentry - need to be sure to lock the correct parent */
> + trap = lock_rename_child(old_dentry, rd->new_parent);
> + if (IS_ERR(trap))
> + return PTR_ERR(trap);
> + if (d_unhashed(old_dentry) ||
> + (rd->old_parent && rd->old_parent != old_dentry->d_parent)) {
> + /* dentry was removed, or moved and explicit parent requested */
> + d2 = ERR_PTR(-EINVAL);
> + goto out_unlock_2;
> + }
> +
> + d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
> + lookup_flags | target_flags);
> + if (IS_ERR(d2))
> + goto out_unlock_2;
> +
> + if (old_dentry == trap) {
> + /* source is an ancestor of target */
> + err = -EINVAL;
> + goto out_unlock_3;
> + }
> +
> + if (d2 == trap) {
> + /* target is an ancestor of source */
> + if (rd->flags & RENAME_EXCHANGE)
> + err = -EINVAL;
> + else
> + err = -ENOTEMPTY;
> + goto out_unlock_3;
> + }
> +
> + rd->old_dentry = dget(old_dentry);
> + rd->new_dentry = d2;
> + rd->old_parent = dget(old_dentry->d_parent);
> + return 0;
> +
> +out_unlock_3:
> + dput(d2);
> + d2 = ERR_PTR(err);
> +out_unlock_2:
> + unlock_rename(old_dentry->d_parent, rd->new_parent);
> + return PTR_ERR(d2);
Please assign err before goto and simplify:
out_dput:
dput(d2);
out_unlock:
unlock_rename(old_dentry->d_parent, rd->new_parent);
return err;
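For illustration only, the shape of that error path can be modelled in a
small userspace sketch. Nothing below is kernel code: toy_state,
toy_start_renaming() and its two boolean parameters are hypothetical
stand-ins for the rename lock, the target lookup and the trap check. The
only point is that err is assigned before every goto, so the labels just
unwind and return it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state that the real function manages. */
struct toy_state {
	bool locked;      /* models the lock taken by lock_rename_child() */
	int target_refs;  /* models the reference returned by the lookup */
};

/*
 * Toy model of the suggested flow: err is set before each goto, so the
 * unwind labels never have to convert between pointers and error codes.
 */
int toy_start_renaming(struct toy_state *s, bool lookup_fails, bool is_trap)
{
	int err;

	assert(s);
	s->locked = true;              /* take the lock */

	if (lookup_fails) {
		err = -ENOENT;         /* nothing was looked up, only unlock */
		goto out_unlock;
	}
	s->target_refs++;              /* lookup succeeded, we hold a ref */

	if (is_trap) {
		err = -EINVAL;         /* must drop the ref, then unlock */
		goto out_dput;
	}
	return 0;                      /* success: lock and ref stay held */

out_dput:
	s->target_refs--;
out_unlock:
	s->locked = false;
	return err;
}
```

On success the toy lock and reference remain held, mirroring what
end_renaming() would later release; on either failure both are back to
their initial state.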
> +}
> +
> +/**
> + * start_renaming_dentry - lookup and lock name for rename with permission checking
> + * @rd: rename data containing parents and flags, and
> + * for receiving found dentries
> + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> + * LOOKUP_NO_SYMLINKS etc).
> + * @old_dentry: dentry of name to move
> + * @new_last: name of target in @rd.new_parent
> + *
> + * Look up target name and ensure locks are in place for
> + * rename.
> + *
> + * On success the found dentry is stored in @rd.new_dentry and
> + * @rd.old_parent is confirmed to be the parent of @old_dentry. If it
> + * was originally %NULL, it is set. In either case a refernence is taken.
Typo: %NULL, typo: refernence
> + *
> + * References and the lock can be dropped with end_renaming()
> + *
> + * The passed in qstr need not have the hash calculated, and basic
> + * eXecute permission checking is performed against @rd.mnt_idmap.
> + *
> + * Returns: zero or an error.
> + */
> +int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> + struct dentry *old_dentry, struct qstr *new_last)
> +{
> + int err;
> +
> + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> + if (err)
> + return err;
> + return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
> +}
> +
> void end_renaming(struct renamedata *rd)
> {
> unlock_rename(rd->old_parent, rd->new_parent);
> dput(rd->old_dentry);
> dput(rd->new_dentry);
> + dput(rd->old_parent);
> }
> EXPORT_SYMBOL(end_renaming);
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 6a31ea34ff80..3f19548b5d48 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -523,8 +523,8 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> {
> struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
> - struct dentry *index = NULL;
> struct dentry *temp = NULL;
> + struct renamedata rd = {};
> struct qstr name = { };
> int err;
>
> @@ -556,17 +556,15 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> if (err)
> goto out;
>
> - err = ovl_parent_lock(indexdir, temp);
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = indexdir;
> + rd.new_parent = indexdir;
> + err = start_renaming_dentry(&rd, 0, temp, &name);
> if (err)
> goto out;
> - index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
> - if (IS_ERR(index)) {
> - err = PTR_ERR(index);
> - } else {
> - err = ovl_do_rename(ofs, indexdir, temp, indexdir, index, 0);
> - dput(index);
> - }
> - ovl_parent_unlock(indexdir);
> +
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
> out:
> if (err)
> ovl_cleanup(ofs, indexdir, temp);
> @@ -763,7 +761,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
> struct inode *inode;
> struct path path = { .mnt = ovl_upper_mnt(ofs) };
> - struct dentry *temp, *upper, *trap;
> + struct renamedata rd = {};
> + struct dentry *temp;
> struct ovl_cu_creds cc;
> int err;
> struct ovl_cattr cattr = {
> @@ -807,29 +806,27 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> * ovl_copy_up_data(), so lock workdir and destdir and make sure that
> * temp wasn't moved before copy up completion or cleanup.
> */
> - trap = lock_rename(c->workdir, c->destdir);
> - if (trap || temp->d_parent != c->workdir) {
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = c->workdir;
> + rd.new_parent = c->destdir;
> + rd.flags = 0;
> + err = start_renaming_dentry(&rd, 0, temp,
> + &QSTR_LEN(c->destname.name, c->destname.len));
> + if (err == -EINVAL || err == -EXDEV) {
This error code whitelist is not needed and is too fragile anyway.
After your commit
9d23967b18c64 ("ovl: simplify an error path in ovl_copy_up_workdir()")
any locking error is treated the same - it does not matter why
lock_rename() or start_renaming_dentry() failed.
> /* temp or workdir moved underneath us? abort without cleanup */
> dput(temp);
> err = -EIO;
> - if (!IS_ERR(trap))
> - unlock_rename(c->workdir, c->destdir);
> goto out;
> }
Frankly, we could get rid of the "abort without cleanup"
comment and instead do: err = -EIO; goto cleanup_unlocked;
because before cleanup_unlocked was introduced, cleanup relied on
lock_rename() to take the lock for the cleanup, but we don't need
that anymore.
To be clear, I don't think it is important to goto cleanup_unlocked;
leaving goto out is fine because we are not very sympathetic
to changes to underlying layers while ovl is mounted, so we should
not really care about this cleanup, but for the sake of simpler code
I wouldn't mind the goto cleanup_unlocked.
> -
> - err = ovl_copy_up_metadata(c, temp);
> if (err)
> goto cleanup;
Is this right? Should we be calling end_renaming() on error?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
2025-09-30 7:08 ` Amir Goldstein
@ 2025-10-01 1:45 ` NeilBrown
2025-10-02 10:56 ` Amir Goldstein
2025-10-01 4:35 ` NeilBrown
1 sibling, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-10-01 1:45 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, 30 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > Several callers perform a rename on a dentry they already have, and only
> > require lookup for the target name. This includes smb/server and a few
> > different places in overlayfs.
> >
> > start_renaming_dentry() performs the required lookup and takes the
> > required lock using lock_rename_child()
> >
> > It is used in three places in overlayfs and in ksmbd_vfs_rename().
> >
> > In the ksmbd case, the parent of the source is not important - the
> > source must be renamed from wherever it is. So start_renaming_dentry()
> > allows rd->old_parent to be NULL and only checks it if it is non-NULL.
> > On success rd->old_parent will be the parent of old_dentry with an extra
> > reference taken.
>
> It is not clear to me why you need to take that extra ref.
> It looks very unnatural for start_renaming/end_renaming
> to take ref on old_parent and not on new_parent.
There is an important difference between old_parent and new_parent.
After the rename, new_parent will still be valid as we will hold a
reference through whichever child is in that parent.
However we might not still have a reference that keeps old_parent valid,
unless we take one ourselves.
>
> If ksmbd needs old_parent it can use old->d_parent after
> the start_renaming_dentry() it should be stable. right?
> So what's the point of taking this extra ref?
It is not that ksmbd might need old_parent, it is that end_renaming()
needs old_parent so it can be unlocked. If we don't explicitly take a
reference, then we cannot be sure that the reference that was found in
start_renaming_dentry() is still valid after vfs_rename() has moved the
old_dentry out of it.
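The lifetime argument above can be illustrated with a toy userspace
model. This is a sketch under stated assumptions, not kernel code:
toy_dentry, toy_dget(), toy_dput(), toy_rename() and toy_rename_cycle()
are hypothetical names, and the model assumes only that a child holds
one reference on its parent and that a rename transfers that reference
to the new parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: refcount, lock flag and parent pointer. */
struct toy_dentry {
	int refcount;
	bool locked;
	bool freed;
	struct toy_dentry *parent;
};

struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	if (d)
		d->refcount++;
	return d;
}

/* Drop one reference; a dying dentry also drops its ref on its parent. */
void toy_dput(struct toy_dentry *d)
{
	while (d) {
		assert(d->refcount > 0);
		if (--d->refcount > 0)
			return;
		d->freed = true;  /* a real dentry would be freed here */
		d = d->parent;
	}
}

/* Move child under new_parent, transferring its parent reference. */
void toy_rename(struct toy_dentry *child, struct toy_dentry *new_parent)
{
	struct toy_dentry *old_parent = child->parent;

	child->parent = toy_dget(new_parent);
	toy_dput(old_parent);
}

/*
 * Run one start/rename/end cycle in which the caller holds references
 * only on the child and on new_parent. Returns true if old_parent was
 * still alive when the end step needed to unlock it.
 */
bool toy_rename_cycle(bool take_extra_ref)
{
	struct toy_dentry old_parent = { .refcount = 0 };
	struct toy_dentry new_parent = { .refcount = 1 };
	struct toy_dentry child = { .refcount = 1 };
	bool alive;

	child.parent = toy_dget(&old_parent);  /* child pins its parent */

	/* "start": lock both parents, optionally pin old_parent */
	old_parent.locked = new_parent.locked = true;
	if (take_extra_ref)
		toy_dget(&old_parent);

	toy_rename(&child, &new_parent);

	/* "end": old_parent must still be valid so it can be unlocked */
	alive = !old_parent.freed;
	old_parent.locked = new_parent.locked = false;
	if (take_extra_ref)
		toy_dput(&old_parent);
	return alive;
}
```

In this model toy_rename_cycle(false) reports that old_parent lost its
last reference during the rename, while toy_rename_cycle(true) keeps it
alive until the end step. new_parent stays pinned in both cases by the
child that now lives in it.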
>
> > Other start_renaming function also now take the extra
> > reference and end_renaming() now drops this reference as well.
> >
> > ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are
> > all removed as they are no longer needed.
> >
> > OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so
> > that ovl_check_rename_whiteout() can access them.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > fs/namei.c | 106 ++++++++++++++++++++++++++++++++++++---
> > fs/overlayfs/copy_up.c | 47 ++++++++---------
> > fs/overlayfs/dir.c | 19 +------
> > fs/overlayfs/overlayfs.h | 8 +--
> > fs/overlayfs/super.c | 20 ++++----
> > fs/overlayfs/util.c | 11 ----
> > fs/smb/server/vfs.c | 60 ++++------------------
> > include/linux/namei.h | 2 +
> > 8 files changed, 147 insertions(+), 126 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 79a8b3b47e4d..aca6de83d255 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3686,7 +3686,7 @@ EXPORT_SYMBOL(unlock_rename);
> >
> > /**
> > * __start_renaming - lookup and lock names for rename
> > - * @rd: rename data containing parent and flags, and
> > + * @rd: rename data containing parents and flags, and
> > * for receiving found dentries
> > * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > * LOOKUP_NO_SYMLINKS etc).
> > @@ -3697,8 +3697,8 @@ EXPORT_SYMBOL(unlock_rename);
> > * rename.
> > *
> > * On success the found dentrys are stored in @rd.old_dentry,
> > - * @rd.new_dentry. These references and the lock are dropped by
> > - * end_renaming().
> > + * @rd.new_dentry and an extra ref is taken on @rd.old_parent.
> > + * These references and the lock are dropped by end_renaming().
> > *
> > * The passed in qstrs must have the hash calculated, and no permission
> > * checking is performed.
> > @@ -3750,6 +3750,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> >
> > rd->old_dentry = d1;
> > rd->new_dentry = d2;
> > + dget(rd->old_parent);
> > return 0;
> >
> > out_unlock_3:
> > @@ -3765,7 +3766,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> >
> > /**
> > * start_renaming - lookup and lock names for rename with permission checking
> > - * @rd: rename data containing parent and flags, and
> > + * @rd: rename data containing parents and flags, and
> > * for receiving found dentries
> > * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > * LOOKUP_NO_SYMLINKS etc).
> > @@ -3776,8 +3777,8 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> > * rename.
> > *
> > * On success the found dentrys are stored in @rd.old_dentry,
> > - * @rd.new_dentry. These references and the lock are dropped by
> > - * end_renaming().
> > + * @rd.new_dentry. Also the refcount on @rd->old_parent is increased.
> > + * These references and the lock are dropped by end_renaming().
> > *
> > * The passed in qstrs need not have the hash calculated, and basic
> > * eXecute permission checking is performed against @rd.mnt_idmap.
> > @@ -3799,11 +3800,104 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
> > }
> > EXPORT_SYMBOL(start_renaming);
> >
> > +static int
> > +__start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> > + struct dentry *old_dentry, struct qstr *new_last)
> > +{
> > + struct dentry *trap;
> > + struct dentry *d2;
> > + int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> > + int err;
> > +
> > + if (rd->flags & RENAME_EXCHANGE)
> > + target_flags = 0;
> > + if (rd->flags & RENAME_NOREPLACE)
> > + target_flags |= LOOKUP_EXCL;
> > +
> > + /* Already have the dentry - need to be sure to lock the correct parent */
> > + trap = lock_rename_child(old_dentry, rd->new_parent);
> > + if (IS_ERR(trap))
> > + return PTR_ERR(trap);
> > + if (d_unhashed(old_dentry) ||
> > + (rd->old_parent && rd->old_parent != old_dentry->d_parent)) {
> > + /* dentry was removed, or moved and explicit parent requested */
> > + d2 = ERR_PTR(-EINVAL);
> > + goto out_unlock_2;
> > + }
> > +
> > + d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
> > + lookup_flags | target_flags);
> > + if (IS_ERR(d2))
> > + goto out_unlock_2;
> > +
> > + if (old_dentry == trap) {
> > + /* source is an ancestor of target */
> > + err = -EINVAL;
> > + goto out_unlock_3;
> > + }
> > +
> > + if (d2 == trap) {
> > + /* target is an ancestor of source */
> > + if (rd->flags & RENAME_EXCHANGE)
> > + err = -EINVAL;
> > + else
> > + err = -ENOTEMPTY;
> > + goto out_unlock_3;
> > + }
> > +
> > + rd->old_dentry = dget(old_dentry);
> > + rd->new_dentry = d2;
> > + rd->old_parent = dget(old_dentry->d_parent);
> > + return 0;
> > +
> > +out_unlock_3:
> > + dput(d2);
> > + d2 = ERR_PTR(err);
> > +out_unlock_2:
> > + unlock_rename(old_dentry->d_parent, rd->new_parent);
> > + return PTR_ERR(d2);
>
> Please assign err before goto and simplify:
>
> out_dput:
> dput(d2);
> out_unlock:
> unlock_rename(old_dentry->d_parent, rd->new_parent);
> return err;
I'll try that change and see if I like the result.
>
> > +}
> > +
> > +/**
> > + * start_renaming_dentry - lookup and lock name for rename with permission checking
> > + * @rd: rename data containing parents and flags, and
> > + * for receiving found dentries
> > + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > + * LOOKUP_NO_SYMLINKS etc).
> > + * @old_dentry: dentry of name to move
> > + * @new_last: name of target in @rd.new_parent
> > + *
> > + * Look up target name and ensure locks are in place for
> > + * rename.
> > + *
> > + * On success the found dentry is stored in @rd.new_dentry and
> > + * @rd.old_parent is confirmed to be the parent of @old_dentry. If it
> > + * was originally %NULL, it is set. In either case a refernence is taken.
>
> Typo: %NULL, typo: refernence
>
> > + *
> > + * References and the lock can be dropped with end_renaming()
> > + *
> > + * The passed in qstr need not have the hash calculated, and basic
> > + * eXecute permission checking is performed against @rd.mnt_idmap.
> > + *
> > + * Returns: zero or an error.
> > + */
> > +int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> > + struct dentry *old_dentry, struct qstr *new_last)
> > +{
> > + int err;
> > +
> > + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> > + if (err)
> > + return err;
> > + return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
> > +}
> > +
> > void end_renaming(struct renamedata *rd)
> > {
> > unlock_rename(rd->old_parent, rd->new_parent);
> > dput(rd->old_dentry);
> > dput(rd->new_dentry);
> > + dput(rd->old_parent);
> > }
> > EXPORT_SYMBOL(end_renaming);
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 6a31ea34ff80..3f19548b5d48 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -523,8 +523,8 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> > {
> > struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> > struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
> > - struct dentry *index = NULL;
> > struct dentry *temp = NULL;
> > + struct renamedata rd = {};
> > struct qstr name = { };
> > int err;
> >
> > @@ -556,17 +556,15 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> > if (err)
> > goto out;
> >
> > - err = ovl_parent_lock(indexdir, temp);
> > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > + rd.old_parent = indexdir;
> > + rd.new_parent = indexdir;
> > + err = start_renaming_dentry(&rd, 0, temp, &name);
> > if (err)
> > goto out;
> > - index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
> > - if (IS_ERR(index)) {
> > - err = PTR_ERR(index);
> > - } else {
> > - err = ovl_do_rename(ofs, indexdir, temp, indexdir, index, 0);
> > - dput(index);
> > - }
> > - ovl_parent_unlock(indexdir);
> > +
> > + err = ovl_do_rename_rd(&rd);
> > + end_renaming(&rd);
> > out:
> > if (err)
> > ovl_cleanup(ofs, indexdir, temp);
> > @@ -763,7 +761,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> > struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
> > struct inode *inode;
> > struct path path = { .mnt = ovl_upper_mnt(ofs) };
> > - struct dentry *temp, *upper, *trap;
> > + struct renamedata rd = {};
> > + struct dentry *temp;
> > struct ovl_cu_creds cc;
> > int err;
> > struct ovl_cattr cattr = {
> > @@ -807,29 +806,27 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> > * ovl_copy_up_data(), so lock workdir and destdir and make sure that
> > * temp wasn't moved before copy up completion or cleanup.
> > */
> > - trap = lock_rename(c->workdir, c->destdir);
> > - if (trap || temp->d_parent != c->workdir) {
> > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > + rd.old_parent = c->workdir;
> > + rd.new_parent = c->destdir;
> > + rd.flags = 0;
> > + err = start_renaming_dentry(&rd, 0, temp,
> > + &QSTR_LEN(c->destname.name, c->destname.len));
> > + if (err == -EINVAL || err == -EXDEV) {
>
> This error code whitelist is not needed and is too fragile anyway.
> After your commit
> 9d23967b18c64 ("ovl: simplify an error path in ovl_copy_up_workdir()")
> any locking error is treated the same - it does not matter what the
> reason for lock_rename() or start_renaming_dentry() is.
>
> > /* temp or workdir moved underneath us? abort without cleanup */
> > dput(temp);
> > err = -EIO;
> > - if (!IS_ERR(trap))
> > - unlock_rename(c->workdir, c->destdir);
> > goto out;
> > }
>
> Frankly, we could get rid of the "abort without cleanup"
> comment and instead: err = -EIO; goto cleanup_unlocked;
> because before cleanup_unlocked, cleanup was relying on the
> lock_rename() to take the lock for the cleanup, but we don't need
> that anymore.
>
> To be clear, I don't think it is important to goto cleanup_unlocked,
> leaving goto out is fine because we are not very sympathetic
> to changes to underlying layers while ovl is mounted, so we should
> not really care about this cleanup, but for the sake of simpler code
> I wouldn't mind the goto cleanup_unlocked.
So I think you are saying that if start_renaming_dentry() returns an
error, we should map that to -EIO and cleanup?
I can do that - sure.
>
> > -
> > - err = ovl_copy_up_metadata(c, temp);
> > if (err)
> > goto cleanup;
>
> Is this right? should we be calling end_renaming() on error?
You are right - that should be "goto cleanup_unlocked".
Thanks,
NeilBrown
>
> Thanks,
> Amir.
>
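The error-path shape suggested above (assign err before each goto, with labels named for what they undo) can be sketched outside the kernel. This is only an illustration under stated assumptions: fake_lock, fake_dentry, fake_lookup() and start_op() are hypothetical stand-ins, not the VFS types or helpers from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real VFS types. */
struct fake_lock { int held; };
struct fake_dentry { int refs; };

static void fake_lock_take(struct fake_lock *l) { l->held++; }
static void fake_unlock(struct fake_lock *l) { l->held--; }
static void fake_dput(struct fake_dentry *d) { d->refs--; }

/* Hypothetical lookup: takes a reference on success, or reports an error. */
static int fake_lookup(struct fake_dentry *d, bool fail)
{
	if (fail)
		return -ENOENT;
	d->refs++;
	return 0;
}

/*
 * Single-"err" shape: every failure assigns err before the goto, and the
 * labels name what they undo, so no ERR_PTR()/PTR_ERR() round trip is needed.
 */
static int start_op(struct fake_lock *lock, struct fake_dentry *target,
		    bool stale, bool lookup_fails, bool is_trap)
{
	int err;

	fake_lock_take(lock);

	err = -EINVAL;
	if (stale)
		goto out_unlock;

	err = fake_lookup(target, lookup_fails);
	if (err)
		goto out_unlock;

	err = -ENOTEMPTY;
	if (is_trap)
		goto out_dput;

	return 0;	/* success: lock and reference stay held */

out_dput:
	fake_dput(target);
out_unlock:
	fake_unlock(lock);
	return err;
}
```

On every failure path the lock count and reference count return to their starting values, which is the property the simplified labels are meant to make easy to check by eye.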
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
2025-10-01 1:45 ` NeilBrown
@ 2025-10-02 10:56 ` Amir Goldstein
0 siblings, 0 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-10-02 10:56 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 1, 2025 at 3:45 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Tue, 30 Sep 2025, Amir Goldstein wrote:
> > On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > Several callers perform a rename on a dentry they already have, and only
> > > require lookup for the target name. This includes smb/server and a few
> > > different places in overlayfs.
> > >
> > > start_renaming_dentry() performs the required lookup and takes the
> > > required lock using lock_rename_child()
> > >
> > > It is used in three places in overlayfs and in ksmbd_vfs_rename().
> > >
> > > In the ksmbd case, the parent of the source is not important - the
> > > source must be renamed from wherever it is. So start_renaming_dentry()
> > > allows rd->old_parent to be NULL and only checks it if it is non-NULL.
> > > On success rd->old_parent will be the parent of old_dentry with an extra
> > > reference taken.
> >
> > It is not clear to me why you need to take that extra ref.
> > It looks very unnatural for start_renaming/end_renaming
> > to take ref on old_parent and not on new_parent.
>
> There is an important difference between old_parent and new_parent.
> After the rename, new_parent will still be valid as we will hold a
> reference through whichever child is in that parent.
> However we might not still have a reference that keeps old_parent valid,
> unless we take one ourselves.
>
> >
> > If ksmbd needs old_parent it can use old->d_parent after
> > the start_renaming_dentry() it should be stable. right?
> > So what's the point of taking this extra ref?
>
> It is not that ksmbd might need old_parent, it is that end_renaming()
> needs old_parent so it can be unlocked. If we don't explicitly take a
> reference, then we cannot be sure that the reference that was found in
> start_renaming_dentry() is still valid after vfs_rename() has moved the
> old_dentry out of it.
>
> >
> > > Other start_renaming functions also now take the extra
> > > reference and end_renaming() now drops this reference as well.
> > >
> > > ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are
> > > all removed as they are no longer needed.
> > >
> > > OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so
> > > that ovl_check_rename_whiteout() can access them.
> > >
> > > Signed-off-by: NeilBrown <neil@brown.name>
> > > ---
> > > fs/namei.c | 106 ++++++++++++++++++++++++++++++++++++---
> > > fs/overlayfs/copy_up.c | 47 ++++++++---------
> > > fs/overlayfs/dir.c | 19 +------
> > > fs/overlayfs/overlayfs.h | 8 +--
> > > fs/overlayfs/super.c | 20 ++++----
> > > fs/overlayfs/util.c | 11 ----
> > > fs/smb/server/vfs.c | 60 ++++------------------
> > > include/linux/namei.h | 2 +
> > > 8 files changed, 147 insertions(+), 126 deletions(-)
> > >
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index 79a8b3b47e4d..aca6de83d255 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -3686,7 +3686,7 @@ EXPORT_SYMBOL(unlock_rename);
> > >
> > > /**
> > > * __start_renaming - lookup and lock names for rename
> > > - * @rd: rename data containing parent and flags, and
> > > + * @rd: rename data containing parents and flags, and
> > > * for receiving found dentries
> > > * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > > * LOOKUP_NO_SYMLINKS etc).
> > > @@ -3697,8 +3697,8 @@ EXPORT_SYMBOL(unlock_rename);
> > > * rename.
> > > *
> > > * On success the found dentrys are stored in @rd.old_dentry,
> > > - * @rd.new_dentry. These references and the lock are dropped by
> > > - * end_renaming().
> > > + * @rd.new_dentry and an extra ref is taken on @rd.old_parent.
> > > + * These references and the lock are dropped by end_renaming().
> > > *
> > > * The passed in qstrs must have the hash calculated, and no permission
> > > * checking is performed.
> > > @@ -3750,6 +3750,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> > >
> > > rd->old_dentry = d1;
> > > rd->new_dentry = d2;
> > > + dget(rd->old_parent);
> > > return 0;
> > >
> > > out_unlock_3:
> > > @@ -3765,7 +3766,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> > >
> > > /**
> > > * start_renaming - lookup and lock names for rename with permission checking
> > > - * @rd: rename data containing parent and flags, and
> > > + * @rd: rename data containing parents and flags, and
> > > * for receiving found dentries
> > > * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > > * LOOKUP_NO_SYMLINKS etc).
> > > @@ -3776,8 +3777,8 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> > > * rename.
> > > *
> > > * On success the found dentrys are stored in @rd.old_dentry,
> > > - * @rd.new_dentry. These references and the lock are dropped by
> > > - * end_renaming().
> > > + * @rd.new_dentry. Also the refcount on @rd->old_parent is increased.
> > > + * These references and the lock are dropped by end_renaming().
> > > *
> > > * The passed in qstrs need not have the hash calculated, and basic
> > > * eXecute permission checking is performed against @rd.mnt_idmap.
> > > @@ -3799,11 +3800,104 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
> > > }
> > > EXPORT_SYMBOL(start_renaming);
> > >
> > > +static int
> > > +__start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> > > + struct dentry *old_dentry, struct qstr *new_last)
> > > +{
> > > + struct dentry *trap;
> > > + struct dentry *d2;
> > > + int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> > > + int err;
> > > +
> > > + if (rd->flags & RENAME_EXCHANGE)
> > > + target_flags = 0;
> > > + if (rd->flags & RENAME_NOREPLACE)
> > > + target_flags |= LOOKUP_EXCL;
> > > +
> > > + /* Already have the dentry - need to be sure to lock the correct parent */
> > > + trap = lock_rename_child(old_dentry, rd->new_parent);
> > > + if (IS_ERR(trap))
> > > + return PTR_ERR(trap);
> > > + if (d_unhashed(old_dentry) ||
> > > + (rd->old_parent && rd->old_parent != old_dentry->d_parent)) {
> > > + /* dentry was removed, or moved and explicit parent requested */
> > > + d2 = ERR_PTR(-EINVAL);
> > > + goto out_unlock_2;
> > > + }
> > > +
> > > + d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
> > > + lookup_flags | target_flags);
> > > + if (IS_ERR(d2))
> > > + goto out_unlock_2;
> > > +
> > > + if (old_dentry == trap) {
> > > + /* source is an ancestor of target */
> > > + err = -EINVAL;
> > > + goto out_unlock_3;
> > > + }
> > > +
> > > + if (d2 == trap) {
> > > + /* target is an ancestor of source */
> > > + if (rd->flags & RENAME_EXCHANGE)
> > > + err = -EINVAL;
> > > + else
> > > + err = -ENOTEMPTY;
> > > + goto out_unlock_3;
> > > + }
> > > +
> > > + rd->old_dentry = dget(old_dentry);
> > > + rd->new_dentry = d2;
> > > + rd->old_parent = dget(old_dentry->d_parent);
> > > + return 0;
> > > +
> > > +out_unlock_3:
> > > + dput(d2);
> > > + d2 = ERR_PTR(err);
> > > +out_unlock_2:
> > > + unlock_rename(old_dentry->d_parent, rd->new_parent);
> > > + return PTR_ERR(d2);
> >
> > Please assign err before goto and simplify:
> >
> > out_dput:
> > dput(d2);
> > out_unlock:
> > unlock_rename(old_dentry->d_parent, rd->new_parent);
> > return err;
>
> I'll try that change and see if I like the result.
>
> >
> > > +}
> > > +
> > > +/**
> > > + * start_renaming_dentry - lookup and lock name for rename with permission checking
> > > + * @rd: rename data containing parents and flags, and
> > > + * for receiving found dentries
> > > + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > > + * LOOKUP_NO_SYMLINKS etc).
> > > + * @old_dentry: dentry of name to move
> > > + * @new_last: name of target in @rd.new_parent
> > > + *
> > > + * Look up target name and ensure locks are in place for
> > > + * rename.
> > > + *
> > > + * On success the found dentry is stored in @rd.new_dentry and
> > > + * @rd.old_parent is confirmed to be the parent of @old_dentry. If it
> > > + * was originally %NULL, it is set. In either case a refernence is taken.
> >
> > Typo: %NULL, typo: refernence
> >
> > > + *
> > > + * References and the lock can be dropped with end_renaming()
> > > + *
> > > + * The passed in qstr need not have the hash calculated, and basic
> > > + * eXecute permission checking is performed against @rd.mnt_idmap.
> > > + *
> > > + * Returns: zero or an error.
> > > + */
> > > +int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> > > + struct dentry *old_dentry, struct qstr *new_last)
> > > +{
> > > + int err;
> > > +
> > > + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> > > + if (err)
> > > + return err;
> > > + return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
> > > +}
> > > +
> > > void end_renaming(struct renamedata *rd)
> > > {
> > > unlock_rename(rd->old_parent, rd->new_parent);
> > > dput(rd->old_dentry);
> > > dput(rd->new_dentry);
> > > + dput(rd->old_parent);
> > > }
> > > EXPORT_SYMBOL(end_renaming);
> > >
> > > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > > index 6a31ea34ff80..3f19548b5d48 100644
> > > --- a/fs/overlayfs/copy_up.c
> > > +++ b/fs/overlayfs/copy_up.c
> > > @@ -523,8 +523,8 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> > > {
> > > struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> > > struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
> > > - struct dentry *index = NULL;
> > > struct dentry *temp = NULL;
> > > + struct renamedata rd = {};
> > > struct qstr name = { };
> > > int err;
> > >
> > > @@ -556,17 +556,15 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> > > if (err)
> > > goto out;
> > >
> > > - err = ovl_parent_lock(indexdir, temp);
> > > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > > + rd.old_parent = indexdir;
> > > + rd.new_parent = indexdir;
> > > + err = start_renaming_dentry(&rd, 0, temp, &name);
> > > if (err)
> > > goto out;
> > > - index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
> > > - if (IS_ERR(index)) {
> > > - err = PTR_ERR(index);
> > > - } else {
> > > - err = ovl_do_rename(ofs, indexdir, temp, indexdir, index, 0);
> > > - dput(index);
> > > - }
> > > - ovl_parent_unlock(indexdir);
> > > +
> > > + err = ovl_do_rename_rd(&rd);
> > > + end_renaming(&rd);
> > > out:
> > > if (err)
> > > ovl_cleanup(ofs, indexdir, temp);
> > > @@ -763,7 +761,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> > > struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
> > > struct inode *inode;
> > > struct path path = { .mnt = ovl_upper_mnt(ofs) };
> > > - struct dentry *temp, *upper, *trap;
> > > + struct renamedata rd = {};
> > > + struct dentry *temp;
> > > struct ovl_cu_creds cc;
> > > int err;
> > > struct ovl_cattr cattr = {
> > > @@ -807,29 +806,27 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> > > * ovl_copy_up_data(), so lock workdir and destdir and make sure that
> > > * temp wasn't moved before copy up completion or cleanup.
> > > */
> > > - trap = lock_rename(c->workdir, c->destdir);
> > > - if (trap || temp->d_parent != c->workdir) {
> > > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > > + rd.old_parent = c->workdir;
> > > + rd.new_parent = c->destdir;
> > > + rd.flags = 0;
> > > + err = start_renaming_dentry(&rd, 0, temp,
> > > + &QSTR_LEN(c->destname.name, c->destname.len));
> > > + if (err == -EINVAL || err == -EXDEV) {
> >
> > This error code whitelist is not needed and is too fragile anyway.
> > After your commit
> > 9d23967b18c64 ("ovl: simplify an error path in ovl_copy_up_workdir()")
> > any locking error is treated the same - it does not matter what the
> > reason for lock_rename() or start_renaming_dentry() is.
> >
> > > /* temp or workdir moved underneath us? abort without cleanup */
> > > dput(temp);
> > > err = -EIO;
> > > - if (!IS_ERR(trap))
> > > - unlock_rename(c->workdir, c->destdir);
> > > goto out;
> > > }
> >
> > Frankly, we could get rid of the "abort without cleanup"
> > comment and instead: err = -EIO; goto cleanup_unlocked;
> > because before cleanup_unlocked, cleanup was relying on the
> > lock_rename() to take the lock for the cleanup, but we don't need
> > that anymore.
> >
> > To be clear, I don't think it is important to goto cleanup_unlocked,
> > leaving goto out is fine because we are not very sympathetic
> > to changes to underlying layers while ovl is mounted, so we should
> > not really care about this cleanup, but for the sake of simpler code
> > I wouldn't mind the goto cleanup_unlocked.
>
> So I think you are saying that if start_renaming_dentry() returns an
> error, we should map that to -EIO and cleanup?
>
Yes. That sounds right.
Thanks,
Amir.
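
The reasoning quoted above for pinning old_parent (a child keeps its current parent alive, so after the move only new_parent is guaranteed to stay valid) can be illustrated with a toy reference-count model. This is a sketch under stated assumptions, not kernel code: struct node, node_move(), ctx_start() and ctx_end() are hypothetical names, and "freed" is only a flag standing in for real destruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not struct dentry: a node pins its parent with one reference. */
struct node {
	int refs;
	bool freed;
	struct node *parent;
};

static struct node *node_get(struct node *n)
{
	n->refs++;
	return n;
}

static void node_put(struct node *n)
{
	assert(!n->freed && n->refs > 0);
	if (--n->refs == 0)
		n->freed = true;	/* stand-in for actually freeing it */
}

/* Re-parent @child: the new parent gains a reference, the old one loses it. */
static void node_move(struct node *child, struct node *new_parent)
{
	struct node *old = child->parent;

	child->parent = node_get(new_parent);
	node_put(old);
}

struct rename_ctx {
	struct node *old_parent;
	struct node *new_parent;
	struct node *child;
};

/* Mirrors the idea in the patch: pin old_parent for the whole operation. */
static void ctx_start(struct rename_ctx *c, struct node *child,
		      struct node *new_parent)
{
	c->child = node_get(child);
	c->new_parent = new_parent;
	c->old_parent = node_get(child->parent);
}

/* old_parent must still be valid here: this is where it would be unlocked. */
static bool ctx_end(struct rename_ctx *c)
{
	bool old_parent_valid = !c->old_parent->freed;

	node_put(c->child);
	node_put(c->old_parent);
	return old_parent_valid;
}
```

In this model, a move performed between ctx_start() and ctx_end() leaves old_parent alive until ctx_end() drops the pin, whereas the same move without the pin drops the last reference to old_parent immediately.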
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
2025-09-30 7:08 ` Amir Goldstein
2025-10-01 1:45 ` NeilBrown
@ 2025-10-01 4:35 ` NeilBrown
1 sibling, 0 replies; 49+ messages in thread
From: NeilBrown @ 2025-10-01 4:35 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, 30 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> > + * On success the found dentry is stored in @rd.new_dentry and
> > + * @rd.old_parent is confirmed to be the parent of @old_dentry. If it
> > + * was originally %NULL, it is set. In either case a refernence is taken.
>
> Typo: %NULL, typo: refernence
%NULL isn't a typo. As documented in kernel-doc.rst
%CONST
should be used to give appropriate formatting to constant.
Thanks for the refernence fix!
NeilBrown
>
> > + *
> > + * References and the lock can be dropped with end_renaming()
> > + *
> > + * The passed in qstr need not have the hash calculated, and basic
> > + * eXecute permission checking is performed against @rd.mnt_idmap.
> > + *
> > + * Returns: zero or an error.
> > + */
> > +int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> > + struct dentry *old_dentry, struct qstr *new_last)
> > +{
> > + int err;
> > +
> > + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> > + if (err)
> > + return err;
> > + return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
> > +}
> > +
> > void end_renaming(struct renamedata *rd)
> > {
> > unlock_rename(rd->old_parent, rd->new_parent);
> > dput(rd->old_dentry);
> > dput(rd->new_dentry);
> > + dput(rd->old_parent);
> > }
> > EXPORT_SYMBOL(end_renaming);
> >
> > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> > index 6a31ea34ff80..3f19548b5d48 100644
> > --- a/fs/overlayfs/copy_up.c
> > +++ b/fs/overlayfs/copy_up.c
> > @@ -523,8 +523,8 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> > {
> > struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> > struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
> > - struct dentry *index = NULL;
> > struct dentry *temp = NULL;
> > + struct renamedata rd = {};
> > struct qstr name = { };
> > int err;
> >
> > @@ -556,17 +556,15 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> > if (err)
> > goto out;
> >
> > - err = ovl_parent_lock(indexdir, temp);
> > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > + rd.old_parent = indexdir;
> > + rd.new_parent = indexdir;
> > + err = start_renaming_dentry(&rd, 0, temp, &name);
> > if (err)
> > goto out;
> > - index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
> > - if (IS_ERR(index)) {
> > - err = PTR_ERR(index);
> > - } else {
> > - err = ovl_do_rename(ofs, indexdir, temp, indexdir, index, 0);
> > - dput(index);
> > - }
> > - ovl_parent_unlock(indexdir);
> > +
> > + err = ovl_do_rename_rd(&rd);
> > + end_renaming(&rd);
> > out:
> > if (err)
> > ovl_cleanup(ofs, indexdir, temp);
> > @@ -763,7 +761,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> > struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
> > struct inode *inode;
> > struct path path = { .mnt = ovl_upper_mnt(ofs) };
> > - struct dentry *temp, *upper, *trap;
> > + struct renamedata rd = {};
> > + struct dentry *temp;
> > struct ovl_cu_creds cc;
> > int err;
> > struct ovl_cattr cattr = {
> > @@ -807,29 +806,27 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> > * ovl_copy_up_data(), so lock workdir and destdir and make sure that
> > * temp wasn't moved before copy up completion or cleanup.
> > */
> > - trap = lock_rename(c->workdir, c->destdir);
> > - if (trap || temp->d_parent != c->workdir) {
> > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > + rd.old_parent = c->workdir;
> > + rd.new_parent = c->destdir;
> > + rd.flags = 0;
> > + err = start_renaming_dentry(&rd, 0, temp,
> > + &QSTR_LEN(c->destname.name, c->destname.len));
> > + if (err == -EINVAL || err == -EXDEV) {
>
> This error code whitelist is not needed and is too fragile anyway.
> After your commit
> 9d23967b18c64 ("ovl: simplify an error path in ovl_copy_up_workdir()")
> any locking error is treated the same - it does not matter what the
> reason for lock_rename() or start_renaming_dentry() is.
>
> > /* temp or workdir moved underneath us? abort without cleanup */
> > dput(temp);
> > err = -EIO;
> > - if (!IS_ERR(trap))
> > - unlock_rename(c->workdir, c->destdir);
> > goto out;
> > }
>
> Frankly, we could get rid of the "abort without cleanup"
> comment and instead: err = -EIO; goto cleanup_unlocked;
> because before cleanup_unlocked, cleanup was relying on the
> lock_rename() to take the lock for the cleanup, but we don't need
> that anymore.
>
> To be clear, I don't think it is important to goto cleanup_unlocked,
> leaving goto out is fine because we are not very sympathetic
> to changes to underlying layers while ovl is mounted, so we should
> not really care about this cleanup, but for the sake of simpler code
> I wouldn't mind the goto cleanup_unlocked.
>
> > -
> > - err = ovl_copy_up_metadata(c, temp);
> > if (err)
> > goto cleanup;
>
> Is this right? should we be calling end_renaming() on error?
>
> Thanks,
> Amir.
>
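
For readers unfamiliar with the %CONST convention mentioned above, here is a small kernel-doc style comment on a made-up function. It is only a sketch: pick_parent() is hypothetical and does not appear in the patch; the comment shows @param for arguments and %NULL for a constant, as kernel-doc.rst describes.

```c
#include <stddef.h>

/**
 * pick_parent() - choose which parent pointer to use (hypothetical helper)
 * @requested: parent named by the caller, or %NULL if the caller does not care
 * @actual: parent found once the locks are held
 *
 * If @requested is %NULL, @actual is accepted as-is. Otherwise the two
 * must match.
 *
 * Return: the parent to use, or %NULL if @requested does not match @actual.
 */
static const void *pick_parent(const void *requested, const void *actual)
{
	if (requested && requested != actual)
		return NULL;
	return actual;
}
```

The function body loosely mirrors the "only check old_parent if it is non-NULL" rule from the patch description, purely to give the comment something to document.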
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH 10/11] Add start_renaming_two_dentrys()
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (8 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-30 7:46 ` Amir Goldstein
2025-09-26 2:49 ` [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs NeilBrown
2025-09-26 15:47 ` [PATCH 00/11] Create APIs to centralise locking for directory ops Amir Goldstein
11 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
A few callers want to lock for a rename and already have both dentrys.
Also debugfs does want to perform a lookup but doesn't want permission
checking, so start_renaming_dentry() cannot be used.
This patch introduces start_renaming_two_dentrys() which is given both
dentrys. debugfs performs one lookup itself. As it will only continue
with a negative dentry and as those cannot be renamed or unlinked, it is
safe to do the lookup before getting the rename locks.
overlayfs uses start_renaming_two_dentrys() in three places and selinux
uses it twice in sel_make_policy_nodes().
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/debugfs/inode.c | 48 +++++++++++++--------------
fs/namei.c | 63 ++++++++++++++++++++++++++++++++++++
fs/overlayfs/dir.c | 42 ++++++++++++++++--------
include/linux/namei.h | 2 ++
security/selinux/selinuxfs.c | 27 ++++++++++------
5 files changed, 133 insertions(+), 49 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index b863c8d0cbcd..2aad67b8174e 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -842,7 +842,8 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
int error = 0;
const char *new_name;
struct name_snapshot old_name;
- struct dentry *parent, *target;
+ struct dentry *target;
+ struct renamedata rd = {};
struct inode *dir;
va_list ap;
@@ -855,36 +856,31 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
if (!new_name)
return -ENOMEM;
- parent = dget_parent(dentry);
- dir = d_inode(parent);
- inode_lock(dir);
+ rd.old_parent = dget_parent(dentry);
+ rd.new_parent = rd.old_parent;
+ rd.flags = RENAME_NOREPLACE;
+ target = lookup_noperm_unlocked(&QSTR(new_name), rd.new_parent);
+ if (IS_ERR(target))
+ return PTR_ERR(target);
- take_dentry_name_snapshot(&old_name, dentry);
-
- if (WARN_ON_ONCE(dentry->d_parent != parent)) {
- error = -EINVAL;
- goto out;
- }
- if (strcmp(old_name.name.name, new_name) == 0)
- goto out;
- target = lookup_noperm(&QSTR(new_name), parent);
- if (IS_ERR(target)) {
- error = PTR_ERR(target);
- goto out;
- }
- if (d_really_is_positive(target)) {
- dput(target);
- error = -EINVAL;
+ error = start_renaming_two_dentrys(&rd, dentry, target);
+ if (error) {
+ if (error == -EEXIST && target == dentry)
+ /* it isn't an error to rename a thing to itself */
+ error = 0;
goto out;
}
- simple_rename_timestamp(dir, dentry, dir, target);
- d_move(dentry, target);
- dput(target);
+
+ dir = d_inode(rd.old_parent);
+ take_dentry_name_snapshot(&old_name, dentry);
+ simple_rename_timestamp(dir, dentry, dir, rd.new_dentry);
+ d_move(dentry, rd.new_dentry);
fsnotify_move(dir, dir, &old_name.name, d_is_dir(dentry), NULL, dentry);
-out:
release_dentry_name_snapshot(&old_name);
- inode_unlock(dir);
- dput(parent);
+ end_renaming(&rd);
+out:
+ dput(rd.old_parent);
+ dput(target);
kfree_const(new_name);
return error;
}
diff --git a/fs/namei.c b/fs/namei.c
index aca6de83d255..23f9adb43401 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3892,6 +3892,69 @@ int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
}
+/**
+ * start_renaming_two_dentrys - Lock two dentries in given parents for rename
+ * @rd: rename data containing parent
+ * @old_dentry: dentry of name to move
+ * @new_dentry: dentry to move to
+ *
+ * Ensure locks are in place for rename and check parentage is still correct.
+ *
+ * On success the two dentrys are stored in @rd.old_dentry and @rd.new_dentry and
+ * @rd.old_parent and @rd.new_parent are confirmed to be the parents of the dentries.
+ *
+ * References and the lock can be dropped with end_renaming()
+ *
+ * Returns: zero or an error.
+ */
+int
+start_renaming_two_dentrys(struct renamedata *rd,
+ struct dentry *old_dentry, struct dentry *new_dentry)
+{
+ struct dentry *trap;
+ int err;
+
+ /* Already have the dentry - need to be sure to lock the correct parent */
+ trap = lock_rename_child(old_dentry, rd->new_parent);
+ if (IS_ERR(trap))
+ return PTR_ERR(trap);
+ err = -EINVAL;
+ if (d_unhashed(old_dentry) ||
+ (rd->old_parent && rd->old_parent != old_dentry->d_parent))
+ /* old_dentry was removed, or moved and explicit parent requested */
+ goto out_unlock;
+ if (d_unhashed(new_dentry) ||
+ rd->new_parent != new_dentry->d_parent)
+ /* new_dentry was removed or moved */
+ goto out_unlock;
+
+ if (old_dentry == trap)
+ /* source is an ancestor of target */
+ goto out_unlock;
+
+ if (new_dentry == trap) {
+ /* target is an ancestor of source */
+ if (rd->flags & RENAME_EXCHANGE)
+ err = -EINVAL;
+ else
+ err = -ENOTEMPTY;
+ goto out_unlock;
+ }
+
+ err = -EEXIST;
+ if (d_is_positive(new_dentry) && (rd->flags & RENAME_NOREPLACE))
+ goto out_unlock;
+
+ rd->old_dentry = dget(old_dentry);
+ rd->new_dentry = dget(new_dentry);
+ rd->old_parent = dget(old_dentry->d_parent);
+ return 0;
+
+out_unlock:
+ unlock_rename(old_dentry->d_parent, rd->new_parent);
+ return err;
+}
+
void end_renaming(struct renamedata *rd)
{
unlock_rename(rd->old_parent, rd->new_parent);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 54423ad00e1c..e8c369e3e277 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -125,6 +125,7 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
struct dentry *dentry)
{
struct dentry *whiteout;
+ struct renamedata rd = {};
int err;
int flags = 0;
@@ -136,10 +137,13 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
if (d_is_dir(dentry))
flags = RENAME_EXCHANGE;
- err = ovl_lock_rename_workdir(ofs->workdir, whiteout, dir, dentry);
+ rd.old_parent = ofs->workdir;
+ rd.new_parent = dir;
+ rd.flags = flags;
+ err = start_renaming_two_dentrys(&rd, whiteout, dentry);
if (!err) {
- err = ovl_do_rename(ofs, ofs->workdir, whiteout, dir, dentry, flags);
- unlock_rename(ofs->workdir, dir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
}
if (err)
goto kill_whiteout;
@@ -363,6 +367,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *workdir = ovl_workdir(dentry);
struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
+ struct renamedata rd = {};
struct path upperpath;
struct dentry *upper;
struct dentry *opaquedir;
@@ -388,7 +393,11 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
if (IS_ERR(opaquedir))
goto out;
- err = ovl_lock_rename_workdir(workdir, opaquedir, upperdir, upper);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = workdir;
+ rd.new_parent = upperdir;
+ rd.flags = RENAME_EXCHANGE;
+ err = start_renaming_two_dentrys(&rd, opaquedir, upper);
if (err)
goto out_cleanup_unlocked;
@@ -406,8 +415,8 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
if (err)
goto out_cleanup;
- err = ovl_do_rename(ofs, workdir, opaquedir, upperdir, upper, RENAME_EXCHANGE);
- unlock_rename(workdir, upperdir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err)
goto out_cleanup_unlocked;
@@ -420,7 +429,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
return opaquedir;
out_cleanup:
- unlock_rename(workdir, upperdir);
+ end_renaming(&rd);
out_cleanup_unlocked:
ovl_cleanup(ofs, workdir, opaquedir);
dput(opaquedir);
@@ -443,6 +452,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *workdir = ovl_workdir(dentry);
struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
+ struct renamedata rd = {};
struct dentry *upper;
struct dentry *newdentry;
int err;
@@ -474,7 +484,11 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
if (IS_ERR(newdentry))
goto out_dput;
- err = ovl_lock_rename_workdir(workdir, newdentry, upperdir, upper);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = workdir;
+ rd.new_parent = upperdir;
+ rd.flags = 0;
+ err = start_renaming_two_dentrys(&rd, newdentry, upper);
if (err)
goto out_cleanup_unlocked;
@@ -511,16 +525,16 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
if (err)
goto out_cleanup;
- err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper,
- RENAME_EXCHANGE);
- unlock_rename(workdir, upperdir);
+ rd.flags = RENAME_EXCHANGE;
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err)
goto out_cleanup_unlocked;
ovl_cleanup(ofs, workdir, upper);
} else {
- err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper, 0);
- unlock_rename(workdir, upperdir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err)
goto out_cleanup_unlocked;
}
@@ -540,7 +554,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
return err;
out_cleanup:
- unlock_rename(workdir, upperdir);
+ end_renaming(&rd);
out_cleanup_unlocked:
ovl_cleanup(ofs, workdir, newdentry);
dput(newdentry);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index ada0f6cc38bc..434b10476e40 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -144,6 +144,8 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
struct qstr *old_last, struct qstr *new_last);
int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
struct dentry *old_dentry, struct qstr *new_last);
+int start_renaming_two_dentrys(struct renamedata *rd,
+ struct dentry *old_dentry, struct dentry *new_dentry);
void end_renaming(struct renamedata *rd);
/**
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 9aa1d03ab612..13d413107a29 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -506,6 +506,7 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
{
int ret = 0;
struct dentry *tmp_parent, *tmp_bool_dir, *tmp_class_dir;
+ struct renamedata rd = {};
unsigned int bool_num = 0;
char **bool_names = NULL;
int *bool_values = NULL;
@@ -539,22 +540,30 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
if (ret)
goto out;
- lock_rename(tmp_parent, fsi->sb->s_root);
+ rd.old_parent = tmp_parent;
+ rd.new_parent = fsi->sb->s_root;
/* booleans */
- d_exchange(tmp_bool_dir, fsi->bool_dir);
+ ret = start_renaming_two_dentrys(&rd, tmp_bool_dir, fsi->bool_dir);
+ if (!ret) {
+ d_exchange(tmp_bool_dir, fsi->bool_dir);
- swap(fsi->bool_num, bool_num);
- swap(fsi->bool_pending_names, bool_names);
- swap(fsi->bool_pending_values, bool_values);
+ swap(fsi->bool_num, bool_num);
+ swap(fsi->bool_pending_names, bool_names);
+ swap(fsi->bool_pending_values, bool_values);
- fsi->bool_dir = tmp_bool_dir;
+ fsi->bool_dir = tmp_bool_dir;
+ end_renaming(&rd);
+ }
/* classes */
- d_exchange(tmp_class_dir, fsi->class_dir);
- fsi->class_dir = tmp_class_dir;
+ ret = start_renaming_two_dentrys(&rd, tmp_class_dir, fsi->class_dir);
+ if (ret == 0) {
+ d_exchange(tmp_class_dir, fsi->class_dir);
+ fsi->class_dir = tmp_class_dir;
- unlock_rename(tmp_parent, fsi->sb->s_root);
+ end_renaming(&rd);
+ }
out:
sel_remove_old_bool_data(bool_num, bool_names, bool_values);
--
2.50.0.107.gf914562f5916.dirty
* Re: [PATCH 10/11] Add start_renaming_two_dentrys()
2025-09-26 2:49 ` [PATCH 10/11] Add start_renaming_two_dentrys() NeilBrown
@ 2025-09-30 7:46 ` Amir Goldstein
2025-10-01 4:14 ` NeilBrown
0 siblings, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-30 7:46 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> A few callers want to lock for a rename and already have both dentrys.
> Also debugfs does want to perform a lookup but doesn't want permission
> checking, so start_renaming_dentry() cannot be used.
>
> This patch introduces start_renaming_two_dentrys() which is given both
> dentrys. debugfs performs one lookup itself. As it will only continue
> with a negative dentry and as those cannot be renamed or unlinked, it is
> safe to do the lookup before getting the rename locks.
>
> overlayfs uses start_renaming_two_dentrys() in three places and selinux
> uses it twice in sel_make_policy_nodes().
>
start_renaming_two_dentries() please
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/debugfs/inode.c | 48 +++++++++++++--------------
> fs/namei.c | 63 ++++++++++++++++++++++++++++++++++++
> fs/overlayfs/dir.c | 42 ++++++++++++++++--------
> include/linux/namei.h | 2 ++
> security/selinux/selinuxfs.c | 27 ++++++++++------
> 5 files changed, 133 insertions(+), 49 deletions(-)
>
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index b863c8d0cbcd..2aad67b8174e 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -842,7 +842,8 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
> int error = 0;
> const char *new_name;
> struct name_snapshot old_name;
> - struct dentry *parent, *target;
> + struct dentry *target;
> + struct renamedata rd = {};
> struct inode *dir;
> va_list ap;
>
> @@ -855,36 +856,31 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
> if (!new_name)
> return -ENOMEM;
>
> - parent = dget_parent(dentry);
> - dir = d_inode(parent);
> - inode_lock(dir);
> + rd.old_parent = dget_parent(dentry);
> + rd.new_parent = rd.old_parent;
> + rd.flags = RENAME_NOREPLACE;
> + target = lookup_noperm_unlocked(&QSTR(new_name), rd.new_parent);
> + if (IS_ERR(target))
> + return PTR_ERR(target);
>
> - take_dentry_name_snapshot(&old_name, dentry);
> -
> - if (WARN_ON_ONCE(dentry->d_parent != parent)) {
> - error = -EINVAL;
> - goto out;
> - }
> - if (strcmp(old_name.name.name, new_name) == 0)
> - goto out;
> - target = lookup_noperm(&QSTR(new_name), parent);
> - if (IS_ERR(target)) {
> - error = PTR_ERR(target);
> - goto out;
> - }
> - if (d_really_is_positive(target)) {
> - dput(target);
> - error = -EINVAL;
> + error = start_renaming_two_dentrys(&rd, dentry, target);
> + if (error) {
> + if (error == -EEXIST && target == dentry)
> + /* it isn't an error to rename a thing to itself */
> + error = 0;
> goto out;
> }
> - simple_rename_timestamp(dir, dentry, dir, target);
> - d_move(dentry, target);
> - dput(target);
> +
> + dir = d_inode(rd.old_parent);
> + take_dentry_name_snapshot(&old_name, dentry);
> + simple_rename_timestamp(dir, dentry, dir, rd.new_dentry);
> + d_move(dentry, rd.new_dentry);
> fsnotify_move(dir, dir, &old_name.name, d_is_dir(dentry), NULL, dentry);
> -out:
> release_dentry_name_snapshot(&old_name);
> - inode_unlock(dir);
> - dput(parent);
> + end_renaming(&rd);
> +out:
> + dput(rd.old_parent);
> + dput(target);
> kfree_const(new_name);
> return error;
> }
> diff --git a/fs/namei.c b/fs/namei.c
> index aca6de83d255..23f9adb43401 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3892,6 +3892,69 @@ int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
> }
>
> +/**
> + * start_renaming_two_dentrys - Lock to dentries in given parents for rename
two_dentries please
> + * @rd: rename data containing parent
> + * @old_dentry: dentry of name to move
> + * @new_dentry: dentry to move to
> + *
> + * Ensure locks are in place for rename and check parentage is still correct.
> + *
> + * On success the two dentrys are stored in @rd.old_dentry and @rd.new_dentry and
> + * @rd.old_parent and @rd.new_parent are confirmed to be the parents of the dentruies.
typo: dentruies
> + *
> + * References and the lock can be dropped with end_renaming()
> + *
> + * Returns: zero or an error.
> + */
> +int
> +start_renaming_two_dentrys(struct renamedata *rd,
> + struct dentry *old_dentry, struct dentry *new_dentry)
> +{
> + struct dentry *trap;
> + int err;
> +
> + /* Already have the dentry - need to be sure to lock the correct parent */
> + trap = lock_rename_child(old_dentry, rd->new_parent);
> + if (IS_ERR(trap))
> + return PTR_ERR(trap);
> + err = -EINVAL;
> + if (d_unhashed(old_dentry) ||
> + (rd->old_parent && rd->old_parent != old_dentry->d_parent))
> + /* old_dentry was removed, or moved and explicit parent requested */
> + goto out_unlock;
> + if (d_unhashed(new_dentry) ||
> + rd->new_parent != new_dentry->d_parent)
> + /* new_dentry was removed or moved */
> + goto out_unlock;
> +
> + if (old_dentry == trap)
> + /* source is an ancestor of target */
> + goto out_unlock;
> +
> + if (new_dentry == trap) {
> + /* target is an ancestor of source */
> + if (rd->flags & RENAME_EXCHANGE)
> + err = -EINVAL;
> + else
> + err = -ENOTEMPTY;
> + goto out_unlock;
> + }
> +
> + err = -EEXIST;
> + if (d_is_positive(new_dentry) && (rd->flags & RENAME_NOREPLACE))
> + goto out_unlock;
> +
> + rd->old_dentry = dget(old_dentry);
> + rd->new_dentry = dget(new_dentry);
> + rd->old_parent = dget(old_dentry->d_parent);
This asymmetry between old_parent and new_parent is especially
odd with two dentries and particularly with RENAME_EXCHANGE
where the two dentries are alike.
Is the old_parent ref really needed?
> + return 0;
> +
> +out_unlock:
> + unlock_rename(old_dentry->d_parent, rd->new_parent);
> + return err;
> +}
needs EXPORT_SYMBOL_GPL
> +
> void end_renaming(struct renamedata *rd)
> {
> unlock_rename(rd->old_parent, rd->new_parent);
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 54423ad00e1c..e8c369e3e277 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -125,6 +125,7 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
> struct dentry *dentry)
> {
> struct dentry *whiteout;
> + struct renamedata rd = {};
> int err;
> int flags = 0;
>
> @@ -136,10 +137,13 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
> if (d_is_dir(dentry))
> flags = RENAME_EXCHANGE;
>
> - err = ovl_lock_rename_workdir(ofs->workdir, whiteout, dir, dentry);
> + rd.old_parent = ofs->workdir;
> + rd.new_parent = dir;
> + rd.flags = flags;
> + err = start_renaming_two_dentrys(&rd, whiteout, dentry);
> if (!err) {
> - err = ovl_do_rename(ofs, ofs->workdir, whiteout, dir, dentry, flags);
> - unlock_rename(ofs->workdir, dir);
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
> }
> if (err)
> goto kill_whiteout;
> @@ -363,6 +367,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> struct dentry *workdir = ovl_workdir(dentry);
> struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> + struct renamedata rd = {};
> struct path upperpath;
> struct dentry *upper;
> struct dentry *opaquedir;
> @@ -388,7 +393,11 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> if (IS_ERR(opaquedir))
> goto out;
>
> - err = ovl_lock_rename_workdir(workdir, opaquedir, upperdir, upper);
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = workdir;
> + rd.new_parent = upperdir;
> + rd.flags = RENAME_EXCHANGE;
> + err = start_renaming_two_dentrys(&rd, opaquedir, upper);
> if (err)
> goto out_cleanup_unlocked;
>
> @@ -406,8 +415,8 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> if (err)
> goto out_cleanup;
>
> - err = ovl_do_rename(ofs, workdir, opaquedir, upperdir, upper, RENAME_EXCHANGE);
> - unlock_rename(workdir, upperdir);
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
> if (err)
> goto out_cleanup_unlocked;
>
> @@ -420,7 +429,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> return opaquedir;
>
> out_cleanup:
> - unlock_rename(workdir, upperdir);
> + end_renaming(&rd);
> out_cleanup_unlocked:
> ovl_cleanup(ofs, workdir, opaquedir);
> dput(opaquedir);
> @@ -443,6 +452,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> struct dentry *workdir = ovl_workdir(dentry);
> struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> + struct renamedata rd = {};
> struct dentry *upper;
> struct dentry *newdentry;
> int err;
> @@ -474,7 +484,11 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> if (IS_ERR(newdentry))
> goto out_dput;
>
> - err = ovl_lock_rename_workdir(workdir, newdentry, upperdir, upper);
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = workdir;
> + rd.new_parent = upperdir;
> + rd.flags = 0;
> + err = start_renaming_two_dentrys(&rd, newdentry, upper);
> if (err)
> goto out_cleanup_unlocked;
>
> @@ -511,16 +525,16 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> if (err)
> goto out_cleanup;
>
> - err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper,
> - RENAME_EXCHANGE);
> - unlock_rename(workdir, upperdir);
> + rd.flags = RENAME_EXCHANGE;
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
> if (err)
> goto out_cleanup_unlocked;
>
> ovl_cleanup(ofs, workdir, upper);
> } else {
> - err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper, 0);
> - unlock_rename(workdir, upperdir);
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
> if (err)
> goto out_cleanup_unlocked;
> }
> @@ -540,7 +554,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> return err;
>
> out_cleanup:
> - unlock_rename(workdir, upperdir);
> + end_renaming(&rd);
> out_cleanup_unlocked:
> ovl_cleanup(ofs, workdir, newdentry);
> dput(newdentry);
ovl changes look fine to me.
with change of helper name and typo fixes feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
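For reference, the order of the checks in start_renaming_two_dentrys() and
the error each one yields can be modelled in userspace. This is only a
sketch under stated assumptions: struct mock_dentry, the MOCK_* flags and
mock_check_two_dentries() are made up for illustration and are not kernel
interfaces, and the trap argument stands in for whatever
lock_rename_child() returned once the locks were taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, used only by this model. */
#define MOCK_RENAME_NOREPLACE (1 << 0)
#define MOCK_RENAME_EXCHANGE  (1 << 1)

/* Mock stand-in for struct dentry; only the fields the checks look at. */
struct mock_dentry {
	struct mock_dentry *parent;
	bool unhashed;
	bool positive;
};

/*
 * Model of the checks made once the rename locks are held.
 * @trap models the dentry returned by lock_rename_child(), or NULL.
 * Returns 0 if the rename may proceed, otherwise a negative errno.
 */
static int mock_check_two_dentries(const struct mock_dentry *old_parent,
				   const struct mock_dentry *new_parent,
				   const struct mock_dentry *old_dentry,
				   const struct mock_dentry *new_dentry,
				   const struct mock_dentry *trap,
				   unsigned int flags)
{
	assert(old_dentry && new_dentry);

	/* old_dentry was removed, or moved while an explicit parent was given */
	if (old_dentry->unhashed ||
	    (old_parent && old_parent != old_dentry->parent))
		return -EINVAL;
	/* new_dentry was removed or moved */
	if (new_dentry->unhashed || new_parent != new_dentry->parent)
		return -EINVAL;
	/* source is an ancestor of target */
	if (old_dentry == trap)
		return -EINVAL;
	/* target is an ancestor of source */
	if (new_dentry == trap)
		return (flags & MOCK_RENAME_EXCHANGE) ? -EINVAL : -ENOTEMPTY;
	/* NOREPLACE refuses to overwrite an existing target */
	if (new_dentry->positive && (flags & MOCK_RENAME_NOREPLACE))
		return -EEXIST;
	return 0;
}
```

In this model the parentage checks come first, so a dentry that was moved
or removed is reported as -EINVAL even when RENAME_NOREPLACE would also
have failed with -EEXIST.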
* Re: [PATCH 10/11] Add start_renaming_two_dentrys()
2025-09-30 7:46 ` Amir Goldstein
@ 2025-10-01 4:14 ` NeilBrown
0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2025-10-01 4:14 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Tue, 30 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > A few callers want to lock for a rename and already have both dentrys.
> > Also debugfs does want to perform a lookup but doesn't want permission
> > checking, so start_renaming_dentry() cannot be used.
> >
> > This patch introduces start_renaming_two_dentrys() which is given both
> > dentrys. debugfs performs one lookup itself. As it will only continue
> > with a negative dentry and as those cannot be renamed or unlinked, it is
> > safe to do the lookup before getting the rename locks.
> >
> > overlayfs uses start_renaming_two_dentrys() in three places and selinux
> > uses it twice in sel_make_policy_nodes().
> >
>
> start_renaming_two_dentries() please
I don't really like "two_dentries" as you wouldn't find it when
searching for "dentry". But maybe that doesn't matter. I can't think
of a better name so I've made the change as you suggest.
>
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > fs/debugfs/inode.c | 48 +++++++++++++--------------
> > fs/namei.c | 63 ++++++++++++++++++++++++++++++++++++
> > fs/overlayfs/dir.c | 42 ++++++++++++++++--------
> > include/linux/namei.h | 2 ++
> > security/selinux/selinuxfs.c | 27 ++++++++++------
> > 5 files changed, 133 insertions(+), 49 deletions(-)
> >
> > diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> > index b863c8d0cbcd..2aad67b8174e 100644
> > --- a/fs/debugfs/inode.c
> > +++ b/fs/debugfs/inode.c
> > @@ -842,7 +842,8 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
> > int error = 0;
> > const char *new_name;
> > struct name_snapshot old_name;
> > - struct dentry *parent, *target;
> > + struct dentry *target;
> > + struct renamedata rd = {};
> > struct inode *dir;
> > va_list ap;
> >
> > @@ -855,36 +856,31 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
> > if (!new_name)
> > return -ENOMEM;
> >
> > - parent = dget_parent(dentry);
> > - dir = d_inode(parent);
> > - inode_lock(dir);
> > + rd.old_parent = dget_parent(dentry);
> > + rd.new_parent = rd.old_parent;
> > + rd.flags = RENAME_NOREPLACE;
> > + target = lookup_noperm_unlocked(&QSTR(new_name), rd.new_parent);
> > + if (IS_ERR(target))
> > + return PTR_ERR(target);
> >
> > - take_dentry_name_snapshot(&old_name, dentry);
> > -
> > - if (WARN_ON_ONCE(dentry->d_parent != parent)) {
> > - error = -EINVAL;
> > - goto out;
> > - }
> > - if (strcmp(old_name.name.name, new_name) == 0)
> > - goto out;
> > - target = lookup_noperm(&QSTR(new_name), parent);
> > - if (IS_ERR(target)) {
> > - error = PTR_ERR(target);
> > - goto out;
> > - }
> > - if (d_really_is_positive(target)) {
> > - dput(target);
> > - error = -EINVAL;
> > + error = start_renaming_two_dentrys(&rd, dentry, target);
> > + if (error) {
> > + if (error == -EEXIST && target == dentry)
> > + /* it isn't an error to rename a thing to itself */
> > + error = 0;
> > goto out;
> > }
> > - simple_rename_timestamp(dir, dentry, dir, target);
> > - d_move(dentry, target);
> > - dput(target);
> > +
> > + dir = d_inode(rd.old_parent);
> > + take_dentry_name_snapshot(&old_name, dentry);
> > + simple_rename_timestamp(dir, dentry, dir, rd.new_dentry);
> > + d_move(dentry, rd.new_dentry);
> > fsnotify_move(dir, dir, &old_name.name, d_is_dir(dentry), NULL, dentry);
> > -out:
> > release_dentry_name_snapshot(&old_name);
> > - inode_unlock(dir);
> > - dput(parent);
> > + end_renaming(&rd);
> > +out:
> > + dput(rd.old_parent);
> > + dput(target);
> > kfree_const(new_name);
> > return error;
> > }
> > diff --git a/fs/namei.c b/fs/namei.c
> > index aca6de83d255..23f9adb43401 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3892,6 +3892,69 @@ int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> > return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
> > }
> >
> > +/**
> > + * start_renaming_two_dentrys - Lock to dentries in given parents for rename
>
> two_dentries please
>
> > + * @rd: rename data containing parent
> > + * @old_dentry: dentry of name to move
> > + * @new_dentry: dentry to move to
> > + *
> > + * Ensure locks are in place for rename and check parentage is still correct.
> > + *
> > + * On success the two dentrys are stored in @rd.old_dentry and @rd.new_dentry and
> > + * @rd.old_parent and @rd.new_parent are confirmed to be the parents of the dentruies.
>
> typo: dentruies
>
> > + *
> > + * References and the lock can be dropped with end_renaming()
> > + *
> > + * Returns: zero or an error.
> > + */
> > +int
> > +start_renaming_two_dentrys(struct renamedata *rd,
> > + struct dentry *old_dentry, struct dentry *new_dentry)
> > +{
> > + struct dentry *trap;
> > + int err;
> > +
> > + /* Already have the dentry - need to be sure to lock the correct parent */
> > + trap = lock_rename_child(old_dentry, rd->new_parent);
> > + if (IS_ERR(trap))
> > + return PTR_ERR(trap);
> > + err = -EINVAL;
> > + if (d_unhashed(old_dentry) ||
> > + (rd->old_parent && rd->old_parent != old_dentry->d_parent))
> > + /* old_dentry was removed, or moved and explicit parent requested */
> > + goto out_unlock;
> > + if (d_unhashed(new_dentry) ||
> > + rd->new_parent != new_dentry->d_parent)
> > + /* new_dentry was removed or moved */
> > + goto out_unlock;
> > +
> > + if (old_dentry == trap)
> > + /* source is an ancestor of target */
> > + goto out_unlock;
> > +
> > + if (new_dentry == trap) {
> > + /* target is an ancestor of source */
> > + if (rd->flags & RENAME_EXCHANGE)
> > + err = -EINVAL;
> > + else
> > + err = -ENOTEMPTY;
> > + goto out_unlock;
> > + }
> > +
> > + err = -EEXIST;
> > + if (d_is_positive(new_dentry) && (rd->flags & RENAME_NOREPLACE))
> > + goto out_unlock;
> > +
> > + rd->old_dentry = dget(old_dentry);
> > + rd->new_dentry = dget(new_dentry);
> > + rd->old_parent = dget(old_dentry->d_parent);
>
> This asymmetry between old_parent and new_parent is especially
> odd with two dentries and particularly with RENAME_EXCHANGE
> where the two dentries are alike.
>
> Is the old_parent ref really needed?
Yes, as end_renaming() needs to know it can still use the ref.
I've added a note to start_renaming_dentry() to say why the
reference is taken.
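One way to picture this, as a userspace sketch only: if the move makes the
child drop its own reference on the old parent, then the pin stored in
rd.old_parent is what keeps that parent usable until the unlock. The
mock_* types and helpers below are hypothetical stand-ins, not the dcache
API, and the assumption that the child holds exactly one reference on its
parent is a simplification made for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-in for a directory or child dentry. */
struct mock_dentry {
	struct mock_dentry *parent;
	int refcount;
	bool locked;
};

/* Mock stand-in for struct renamedata. */
struct mock_renamedata {
	struct mock_dentry *old_parent, *new_parent;
	struct mock_dentry *old_dentry, *new_dentry;
};

static struct mock_dentry *mock_dget(struct mock_dentry *d)
{
	d->refcount++;
	return d;
}

static void mock_dput(struct mock_dentry *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Lock both parents and pin everything mock_end_renaming() will touch. */
static void mock_start_renaming(struct mock_renamedata *rd,
				struct mock_dentry *old_dentry,
				struct mock_dentry *new_dentry)
{
	old_dentry->parent->locked = true;
	rd->new_parent->locked = true;
	rd->old_dentry = mock_dget(old_dentry);
	rd->new_dentry = mock_dget(new_dentry);
	rd->old_parent = mock_dget(old_dentry->parent);
}

/* Moving the child drops its reference on the old parent (model assumption). */
static void mock_move(struct mock_renamedata *rd)
{
	mock_dput(rd->old_dentry->parent);
	rd->old_dentry->parent = mock_dget(rd->new_parent);
}

/* Unlock via the parents recorded in rd, then drop the pins taken at start. */
static void mock_end_renaming(struct mock_renamedata *rd)
{
	rd->old_parent->locked = false;
	rd->new_parent->locked = false;
	mock_dput(rd->old_dentry);
	mock_dput(rd->new_dentry);
	mock_dput(rd->old_parent);
}
```

In the model, after mock_move() the only reference left on the old parent
is the one held through rd.old_parent, and it is still locked. Without
that pin the count would already have reached zero before the unlock.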
>
> > + return 0;
> > +
> > +out_unlock:
> > + unlock_rename(old_dentry->d_parent, rd->new_parent);
> > + return err;
> > +}
>
> > needs EXPORT_SYMBOL_GPL
Yes, thanks.
>
> > +
> > void end_renaming(struct renamedata *rd)
> > {
> > unlock_rename(rd->old_parent, rd->new_parent);
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index 54423ad00e1c..e8c369e3e277 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -125,6 +125,7 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
> > struct dentry *dentry)
> > {
> > struct dentry *whiteout;
> > + struct renamedata rd = {};
> > int err;
> > int flags = 0;
> >
> > @@ -136,10 +137,13 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
> > if (d_is_dir(dentry))
> > flags = RENAME_EXCHANGE;
> >
> > - err = ovl_lock_rename_workdir(ofs->workdir, whiteout, dir, dentry);
> > + rd.old_parent = ofs->workdir;
> > + rd.new_parent = dir;
> > + rd.flags = flags;
> > + err = start_renaming_two_dentrys(&rd, whiteout, dentry);
> > if (!err) {
> > - err = ovl_do_rename(ofs, ofs->workdir, whiteout, dir, dentry, flags);
> > - unlock_rename(ofs->workdir, dir);
> > + err = ovl_do_rename_rd(&rd);
> > + end_renaming(&rd);
> > }
> > if (err)
> > goto kill_whiteout;
> > @@ -363,6 +367,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> > struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> > struct dentry *workdir = ovl_workdir(dentry);
> > struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> > + struct renamedata rd = {};
> > struct path upperpath;
> > struct dentry *upper;
> > struct dentry *opaquedir;
> > @@ -388,7 +393,11 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> > if (IS_ERR(opaquedir))
> > goto out;
> >
> > - err = ovl_lock_rename_workdir(workdir, opaquedir, upperdir, upper);
> > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > + rd.old_parent = workdir;
> > + rd.new_parent = upperdir;
> > + rd.flags = RENAME_EXCHANGE;
> > + err = start_renaming_two_dentrys(&rd, opaquedir, upper);
> > if (err)
> > goto out_cleanup_unlocked;
> >
> > @@ -406,8 +415,8 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> > if (err)
> > goto out_cleanup;
> >
> > - err = ovl_do_rename(ofs, workdir, opaquedir, upperdir, upper, RENAME_EXCHANGE);
> > - unlock_rename(workdir, upperdir);
> > + err = ovl_do_rename_rd(&rd);
> > + end_renaming(&rd);
> > if (err)
> > goto out_cleanup_unlocked;
> >
> > @@ -420,7 +429,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
> > return opaquedir;
> >
> > out_cleanup:
> > - unlock_rename(workdir, upperdir);
> > + end_renaming(&rd);
> > out_cleanup_unlocked:
> > ovl_cleanup(ofs, workdir, opaquedir);
> > dput(opaquedir);
> > @@ -443,6 +452,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> > struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> > struct dentry *workdir = ovl_workdir(dentry);
> > struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> > + struct renamedata rd = {};
> > struct dentry *upper;
> > struct dentry *newdentry;
> > int err;
> > @@ -474,7 +484,11 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> > if (IS_ERR(newdentry))
> > goto out_dput;
> >
> > - err = ovl_lock_rename_workdir(workdir, newdentry, upperdir, upper);
> > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > + rd.old_parent = workdir;
> > + rd.new_parent = upperdir;
> > + rd.flags = 0;
> > + err = start_renaming_two_dentrys(&rd, newdentry, upper);
> > if (err)
> > goto out_cleanup_unlocked;
> >
> > @@ -511,16 +525,16 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> > if (err)
> > goto out_cleanup;
> >
> > - err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper,
> > - RENAME_EXCHANGE);
> > - unlock_rename(workdir, upperdir);
> > + rd.flags = RENAME_EXCHANGE;
> > + err = ovl_do_rename_rd(&rd);
> > + end_renaming(&rd);
> > if (err)
> > goto out_cleanup_unlocked;
> >
> > ovl_cleanup(ofs, workdir, upper);
> > } else {
> > - err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper, 0);
> > - unlock_rename(workdir, upperdir);
> > + err = ovl_do_rename_rd(&rd);
> > + end_renaming(&rd);
> > if (err)
> > goto out_cleanup_unlocked;
> > }
> > @@ -540,7 +554,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
> > return err;
> >
> > out_cleanup:
> > - unlock_rename(workdir, upperdir);
> > + end_renaming(&rd);
> > out_cleanup_unlocked:
> > ovl_cleanup(ofs, workdir, newdentry);
> > dput(newdentry);
>
> ovl changes look fine to me.
>
> with change of helper name and typo fixes feel free to add:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Done. Thanks for your help.
NeilBrown
>
>
> Thanks,
> Amir.
>
* [PATCH 11/11] ecryptfs: use new start_creating/start_removing APIs
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (9 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 10/11] Add start_renaming_two_dentrys() NeilBrown
@ 2025-09-26 2:49 ` NeilBrown
2025-09-26 16:03 ` kernel test robot
2025-09-28 12:50 ` Amir Goldstein
2025-09-26 15:47 ` [PATCH 00/11] Create APIs to centralise locking for directory ops Amir Goldstein
11 siblings, 2 replies; 49+ messages in thread
From: NeilBrown @ 2025-09-26 2:49 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
This requires the addition of start_creating_dentry().
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/ecryptfs/inode.c | 153 ++++++++++++++++++++----------------------
fs/namei.c | 41 ++++++++++-
include/linux/namei.h | 2 +
3 files changed, 113 insertions(+), 83 deletions(-)
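
As a rough guide to the contract the ecryptfs callers below rely on, here
is a userspace model based on the lock_parent() helper that this patch
removes: lock the parent, confirm the dentry still belongs to it, and fail
with -EINVAL otherwise. It is a sketch, not the fs/namei.c code. The
mock_* names are hypothetical, and dropping the lock on failure is an
assumption inferred from the callers returning immediately on error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-in for struct dentry. */
struct mock_dentry {
	struct mock_dentry *parent;
	bool locked;	/* stands in for the directory's inode lock */
};

/*
 * Lock @parent and confirm @child is still one of its children.
 * Returns 0 with @parent locked, or -EINVAL with @parent unlocked.
 */
static int mock_start_creating_dentry(struct mock_dentry *parent,
				      struct mock_dentry *child)
{
	assert(!parent->locked);
	parent->locked = true;
	if (child->parent != parent) {
		/* child was moved before the lock was taken */
		parent->locked = false;
		return -EINVAL;
	}
	return 0;
}

/* Release the lock on the parent of @child, found through the child. */
static void mock_end_creating(struct mock_dentry *child)
{
	child->parent->locked = false;
}
```

The point of the pairing is that the caller no longer handles the lower
directory inode to unlock it; the end helper finds the parent through the
dentry that the start helper validated.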
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index abd954c6a14e..25ef6ea8b150 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -24,18 +24,26 @@
#include <linux/unaligned.h>
#include "ecryptfs_kernel.h"
-static int lock_parent(struct dentry *dentry,
- struct dentry **lower_dentry,
- struct inode **lower_dir)
+static struct dentry *ecryptfs_start_creating_dentry(struct dentry *dentry)
{
- struct dentry *lower_dir_dentry;
+ struct dentry *parent = dget_parent(dentry->d_parent);
+ struct dentry *ret;
- lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent);
- *lower_dir = d_inode(lower_dir_dentry);
- *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+ ret = start_creating_dentry(ecryptfs_dentry_to_lower(parent),
+ ecryptfs_dentry_to_lower(dentry));
+ dput(parent);
+ return ret;
+}
- inode_lock_nested(*lower_dir, I_MUTEX_PARENT);
- return (*lower_dentry)->d_parent == lower_dir_dentry ? 0 : -EINVAL;
+static struct dentry *ecryptfs_start_removing_dentry(struct dentry *dentry)
+{
+ struct dentry *parent = dget_parent(dentry->d_parent);
+ struct dentry *ret;
+
+ ret = start_removing_dentry(ecryptfs_dentry_to_lower(parent),
+ ecryptfs_dentry_to_lower(dentry));
+ dput(parent);
+ return ret;
}
static int ecryptfs_inode_test(struct inode *inode, void *lower_inode)
@@ -141,15 +149,12 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
struct inode *lower_dir;
int rc;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- dget(lower_dentry); // don't even try to make the lower negative
- if (!rc) {
- if (d_unhashed(lower_dentry))
- rc = -EINVAL;
- else
- rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry,
- NULL);
- }
+ lower_dentry = ecryptfs_start_removing_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+
+ lower_dir = lower_dentry->d_parent->d_inode;
+ rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry, NULL);
if (rc) {
printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
goto out_unlock;
@@ -158,8 +163,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
set_nlink(inode, ecryptfs_inode_to_lower(inode)->i_nlink);
inode_set_ctime_to_ts(inode, inode_get_ctime(dir));
out_unlock:
- dput(lower_dentry);
- inode_unlock(lower_dir);
+ end_removing(lower_dentry);
if (!rc)
d_drop(dentry);
return rc;
@@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
struct inode *lower_dir;
struct inode *inode;
- rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
- if (!rc)
- rc = vfs_create(&nop_mnt_idmap, lower_dir,
- lower_dentry, mode, true);
+ lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
+ if (IS_ERR(lower_dentry))
+ return ERR_CAST(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+ rc = vfs_create(&nop_mnt_idmap, lower_dir,
+ lower_dentry, mode, true);
if (rc) {
printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
"rc = [%d]\n", __func__, rc);
@@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
fsstack_copy_attr_times(directory_inode, lower_dir);
fsstack_copy_inode_size(directory_inode, lower_dir);
out_lock:
- inode_unlock(lower_dir);
+ end_creating(lower_dentry, NULL);
return inode;
}
@@ -442,10 +448,12 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
file_size_save = i_size_read(d_inode(old_dentry));
lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
- rc = lock_parent(new_dentry, &lower_new_dentry, &lower_dir);
- if (!rc)
- rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
- lower_new_dentry, NULL);
+ lower_new_dentry = ecryptfs_start_creating_dentry(new_dentry);
+ if (IS_ERR(lower_new_dentry))
+ return PTR_ERR(lower_new_dentry);
+ lower_dir = lower_new_dentry->d_parent->d_inode;
+ rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
+ lower_new_dentry, NULL);
if (rc || d_really_is_negative(lower_new_dentry))
goto out_lock;
rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
@@ -457,7 +465,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
ecryptfs_inode_to_lower(d_inode(old_dentry))->i_nlink);
i_size_write(d_inode(new_dentry), file_size_save);
out_lock:
- inode_unlock(lower_dir);
+ end_creating(lower_new_dentry, NULL);
return rc;
}
@@ -477,9 +485,11 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
size_t encoded_symlen;
struct ecryptfs_mount_crypt_stat *mount_crypt_stat = NULL;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- if (rc)
- goto out_lock;
+ lower_dentry = ecryptfs_start_creating_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+
mount_crypt_stat = &ecryptfs_superblock_to_private(
dir->i_sb)->mount_crypt_stat;
rc = ecryptfs_encrypt_and_encode_filename(&encoded_symname,
@@ -499,7 +509,7 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
fsstack_copy_attr_times(dir, lower_dir);
fsstack_copy_inode_size(dir, lower_dir);
out_lock:
- inode_unlock(lower_dir);
+ end_creating(lower_dentry, NULL);
if (d_really_is_negative(dentry))
d_drop(dentry);
return rc;
@@ -510,12 +520,14 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
{
int rc;
struct dentry *lower_dentry;
+ struct dentry *lower_dir_dentry;
struct inode *lower_dir;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- if (rc)
- goto out;
-
+ lower_dentry = ecryptfs_start_creating_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return lower_dentry;
+ lower_dir_dentry = dget(lower_dentry->d_parent);
+ lower_dir = lower_dir_dentry->d_inode;
lower_dentry = vfs_mkdir(&nop_mnt_idmap, lower_dir,
lower_dentry, mode);
rc = PTR_ERR(lower_dentry);
@@ -531,7 +543,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
fsstack_copy_inode_size(dir, lower_dir);
set_nlink(dir, lower_dir->i_nlink);
out:
- inode_unlock(lower_dir);
+ end_creating(lower_dentry, lower_dir_dentry);
if (d_really_is_negative(dentry))
d_drop(dentry);
return ERR_PTR(rc);
@@ -543,21 +555,18 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
struct inode *lower_dir;
int rc;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- dget(lower_dentry); // don't even try to make the lower negative
- if (!rc) {
- if (d_unhashed(lower_dentry))
- rc = -EINVAL;
- else
- rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
- }
+ lower_dentry = ecryptfs_start_removing_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+
+ rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
if (!rc) {
clear_nlink(d_inode(dentry));
fsstack_copy_attr_times(dir, lower_dir);
set_nlink(dir, lower_dir->i_nlink);
}
- dput(lower_dentry);
- inode_unlock(lower_dir);
+ end_removing(lower_dentry);
if (!rc)
d_drop(dentry);
return rc;
@@ -571,10 +580,12 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *lower_dentry;
struct inode *lower_dir;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- if (!rc)
- rc = vfs_mknod(&nop_mnt_idmap, lower_dir,
- lower_dentry, mode, dev);
+ lower_dentry = ecryptfs_start_creating_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+
+ rc = vfs_mknod(&nop_mnt_idmap, lower_dir, lower_dentry, mode, dev);
if (rc || d_really_is_negative(lower_dentry))
goto out;
rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb);
@@ -583,7 +594,7 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
fsstack_copy_attr_times(dir, lower_dir);
fsstack_copy_inode_size(dir, lower_dir);
out:
- inode_unlock(lower_dir);
+ end_removing(lower_dentry);
if (d_really_is_negative(dentry))
d_drop(dentry);
return rc;
@@ -599,7 +610,6 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
struct dentry *lower_new_dentry;
struct dentry *lower_old_dir_dentry;
struct dentry *lower_new_dir_dentry;
- struct dentry *trap;
struct inode *target_inode;
struct renamedata rd = {};
@@ -614,31 +624,13 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
target_inode = d_inode(new_dentry);
- trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
- if (IS_ERR(trap))
- return PTR_ERR(trap);
- dget(lower_new_dentry);
- rc = -EINVAL;
- if (lower_old_dentry->d_parent != lower_old_dir_dentry)
- goto out_lock;
- if (lower_new_dentry->d_parent != lower_new_dir_dentry)
- goto out_lock;
- if (d_unhashed(lower_old_dentry) || d_unhashed(lower_new_dentry))
- goto out_lock;
- /* source should not be ancestor of target */
- if (trap == lower_old_dentry)
- goto out_lock;
- /* target should not be ancestor of source */
- if (trap == lower_new_dentry) {
- rc = -ENOTEMPTY;
- goto out_lock;
- }
+ rd.mnt_idmap = &nop_mnt_idmap;
+ rd.old_parent = lower_old_dir_dentry;
+ rd.new_parent = lower_new_dir_dentry;
+ rc = start_renaming_two_dentry(&rd, lower_old_dentry, lower_new_dentry);
+ if (rc)
+ return rc;
- rd.mnt_idmap = &nop_mnt_idmap;
- rd.old_parent = lower_old_dir_dentry;
- rd.old_dentry = lower_old_dentry;
- rd.new_parent = lower_new_dir_dentry;
- rd.new_dentry = lower_new_dentry;
rc = vfs_rename(&rd);
if (rc)
goto out_lock;
@@ -649,8 +641,7 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
if (new_dir != old_dir)
fsstack_copy_attr_all(old_dir, d_inode(lower_old_dir_dentry));
out_lock:
- dput(lower_new_dentry);
- unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+ end_renaming(&rd);
return rc;
}
diff --git a/fs/namei.c b/fs/namei.c
index 23f9adb43401..80a687a95da0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3418,6 +3418,39 @@ struct dentry *start_removing_noperm(struct dentry *parent,
}
EXPORT_SYMBOL(start_removing_noperm);
+/**
+ * start_creating_dentry - prepare to create a given dentry
+ * @parent - directory from which dentry should be removed
+ * @child - the dentry to be removed
+ *
+ * A lock is taken to protect the dentry again other dirops and
+ * the validity of the dentry is checked: correct parent and still hashed.
+ *
+ * If the dentry is valid and negative a reference is taken and
+ * returned. If not an error is returned.
+ *
+ * end_creating() should be called when creation is complete, or aborted.
+ *
+ * Returns: the valid dentry, or an error.
+ */
+struct dentry *start_creating_dentry(struct dentry *parent,
+ struct dentry *child)
+{
+ inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
+ if (unlikely(IS_DEADDIR(parent->d_inode) ||
+ child->d_parent != parent ||
+ d_unhashed(child))) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-EINVAL);
+ }
+ if (d_is_positive(child)) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-EEXIST);
+ }
+ return dget(child);
+}
+EXPORT_SYMBOL(start_creating_dentry);
+
/**
* start_removing_dentry - prepare to remove a given dentry
* @parent - directory from which dentry should be removed
@@ -3426,8 +3459,8 @@ EXPORT_SYMBOL(start_removing_noperm);
* A lock is taken to protect the dentry again other dirops and
* the validity of the dentry is checked: correct parent and still hashed.
*
- * If the dentry is valid a reference is taken and returned. If not
- * an error is returned.
+ * If the dentry is valid and positive a reference is taken and
+ * returned. If not an error is returned.
*
* end_removing() should be called when removal is complete, or aborted.
*
@@ -3443,6 +3476,10 @@ struct dentry *start_removing_dentry(struct dentry *parent,
inode_unlock(parent->d_inode);
return ERR_PTR(-EINVAL);
}
+ if (d_is_negative(child)) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-ENOENT);
+ }
return dget(child);
}
EXPORT_SYMBOL(start_removing_dentry);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 434b10476e40..7ed299567da8 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -100,6 +100,8 @@ struct dentry *start_removing_killable(struct mnt_idmap *idmap,
struct qstr *name);
struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
+struct dentry *start_creating_dentry(struct dentry *parent,
+ struct dentry *child);
struct dentry *start_removing_dentry(struct dentry *parent,
struct dentry *child);
--
2.50.0.107.gf914562f5916.dirty
* Re: [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
2025-09-26 2:49 ` [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs NeilBrown
@ 2025-09-26 16:03 ` kernel test robot
2025-09-28 12:50 ` Amir Goldstein
1 sibling, 0 replies; 49+ messages in thread
From: kernel test robot @ 2025-09-26 16:03 UTC (permalink / raw)
To: NeilBrown, Alexander Viro, Christian Brauner, Amir Goldstein,
Jeff Layton
Cc: oe-kbuild-all, Jan Kara, linux-fsdevel
Hi NeilBrown,
kernel test robot noticed the following build errors:
[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on next-20250925]
[cannot apply to driver-core/driver-core-testing driver-core/driver-core-next driver-core/driver-core-linus viro-vfs/for-next linus/master v6.17-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/debugfs-rename-end_creating-to-debugfs_end_creating/20250926-105302
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20250926025015.1747294-12-neilb%40ownmail.net
patch subject: [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
config: x86_64-buildonly-randconfig-003-20250926 (https://download.01.org/0day-ci/archive/20250926/202509262333.TsoLDUkJ-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250926/202509262333.TsoLDUkJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509262333.TsoLDUkJ-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
Warning: fs/namei.c:2815 function parameter 'de' not described in 'end_dirop'
Warning: fs/namei.c:2836 function parameter 'de' not described in 'end_dirop_mkdir'
Warning: fs/namei.c:2836 function parameter 'parent' not described in 'end_dirop_mkdir'
Warning: fs/namei.c:3276 function parameter 'idmap' not described in 'start_creating'
Warning: fs/namei.c:3276 function parameter 'parent' not described in 'start_creating'
Warning: fs/namei.c:3276 function parameter 'name' not described in 'start_creating'
Warning: fs/namei.c:3303 function parameter 'idmap' not described in 'start_removing'
Warning: fs/namei.c:3303 function parameter 'parent' not described in 'start_removing'
Warning: fs/namei.c:3303 function parameter 'name' not described in 'start_removing'
Warning: fs/namei.c:3332 function parameter 'idmap' not described in 'start_creating_killable'
Warning: fs/namei.c:3332 function parameter 'parent' not described in 'start_creating_killable'
Warning: fs/namei.c:3332 function parameter 'name' not described in 'start_creating_killable'
Warning: fs/namei.c:3363 function parameter 'idmap' not described in 'start_removing_killable'
Warning: fs/namei.c:3363 function parameter 'parent' not described in 'start_removing_killable'
Warning: fs/namei.c:3363 function parameter 'name' not described in 'start_removing_killable'
Warning: fs/namei.c:3386 function parameter 'parent' not described in 'start_creating_noperm'
Warning: fs/namei.c:3386 function parameter 'name' not described in 'start_creating_noperm'
Warning: fs/namei.c:3411 function parameter 'parent' not described in 'start_removing_noperm'
Warning: fs/namei.c:3411 function parameter 'name' not described in 'start_removing_noperm'
>> Warning: fs/namei.c:3437 function parameter 'parent' not described in 'start_creating_dentry'
>> Warning: fs/namei.c:3437 function parameter 'child' not described in 'start_creating_dentry'
Warning: fs/namei.c:3470 function parameter 'parent' not described in 'start_removing_dentry'
Warning: fs/namei.c:3470 function parameter 'child' not described in 'start_removing_dentry'
--
fs/ecryptfs/inode.c: In function 'ecryptfs_rename':
>> fs/ecryptfs/inode.c:630:14: error: implicit declaration of function 'start_renaming_two_dentry'; did you mean 'start_renaming_two_dentrys'? [-Wimplicit-function-declaration]
630 | rc = start_renaming_two_dentry(&rd, lower_old_dentry, lower_new_dentry);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
| start_renaming_two_dentrys
vim +630 fs/ecryptfs/inode.c
602
603 static int
604 ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
605 struct dentry *old_dentry, struct inode *new_dir,
606 struct dentry *new_dentry, unsigned int flags)
607 {
608 int rc;
609 struct dentry *lower_old_dentry;
610 struct dentry *lower_new_dentry;
611 struct dentry *lower_old_dir_dentry;
612 struct dentry *lower_new_dir_dentry;
613 struct inode *target_inode;
614 struct renamedata rd = {};
615
616 if (flags)
617 return -EINVAL;
618
619 lower_old_dir_dentry = ecryptfs_dentry_to_lower(old_dentry->d_parent);
620 lower_new_dir_dentry = ecryptfs_dentry_to_lower(new_dentry->d_parent);
621
622 lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
623 lower_new_dentry = ecryptfs_dentry_to_lower(new_dentry);
624
625 target_inode = d_inode(new_dentry);
626
627 rd.mnt_idmap = &nop_mnt_idmap;
628 rd.old_parent = lower_old_dir_dentry;
629 rd.new_parent = lower_new_dir_dentry;
> 630 rc = start_renaming_two_dentry(&rd, lower_old_dentry, lower_new_dentry);
631 if (rc)
632 return rc;
633
634 rc = vfs_rename(&rd);
635 if (rc)
636 goto out_lock;
637 if (target_inode)
638 fsstack_copy_attr_all(target_inode,
639 ecryptfs_inode_to_lower(target_inode));
640 fsstack_copy_attr_all(new_dir, d_inode(lower_new_dir_dentry));
641 if (new_dir != old_dir)
642 fsstack_copy_attr_all(old_dir, d_inode(lower_old_dir_dentry));
643 out_lock:
644 end_renaming(&rd);
645 return rc;
646 }
647
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
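
Editorial note on the two warnings marked ">>" above: kernel-doc expects each parameter line to read "@name: description" with a colon, whereas the new comment block writes "@parent - ...", so the parameters are reported as undescribed. The following is a minimal user-space sketch, not the kernel implementation. The type struct model_dentry, its fields, the @err out-parameter and the function name are all hypothetical stand-ins; only the order of the checks and the error codes are taken from the patch, and the parameter descriptions are suggested wording.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct dentry; not the kernel type. */
struct model_dentry {
	struct model_dentry *parent;	/* models d_parent */
	bool hashed;			/* models !d_unhashed() */
	bool positive;			/* models d_is_positive() */
	bool dead;			/* models IS_DEADDIR() on a directory */
	bool locked;			/* models the directory inode lock */
	int refcount;			/* models the dentry reference count */
};

/**
 * model_start_creating_dentry - prepare to create a given dentry (model)
 * @parent: directory in which the dentry is to be created
 * @child: the dentry to be created; it must be hashed and negative
 * @err: set to 0 on success, or to a negative errno on failure
 *
 * Return: @child with an extra reference and @parent left locked, or
 * NULL with @parent unlocked again.
 */
struct model_dentry *model_start_creating_dentry(struct model_dentry *parent,
						 struct model_dentry *child,
						 int *err)
{
	assert(parent && child && err);	/* model-only sanity check */

	parent->locked = true;
	/* Same validity test as the patch: live parent, right parent, hashed. */
	if (parent->dead || child->parent != parent || !child->hashed) {
		parent->locked = false;
		*err = -EINVAL;
		return NULL;
	}
	/* Creation needs a negative dentry. */
	if (child->positive) {
		parent->locked = false;
		*err = -EEXIST;
		return NULL;
	}
	child->refcount++;
	*err = 0;
	return child;
}
```

The model reports errors through @err instead of ERR_PTR() only so that it builds outside the kernel.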
* Re: [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
2025-09-26 2:49 ` [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs NeilBrown
2025-09-26 16:03 ` kernel test robot
@ 2025-09-28 12:50 ` Amir Goldstein
2025-09-29 5:26 ` NeilBrown
1 sibling, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-28 12:50 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> This requires the addition of start_creating_dentry().
>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/ecryptfs/inode.c | 153 ++++++++++++++++++++----------------------
> fs/namei.c | 41 ++++++++++-
> include/linux/namei.h | 2 +
> 3 files changed, 113 insertions(+), 83 deletions(-)
>
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index abd954c6a14e..25ef6ea8b150 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -24,18 +24,26 @@
> #include <linux/unaligned.h>
> #include "ecryptfs_kernel.h"
>
> -static int lock_parent(struct dentry *dentry,
> - struct dentry **lower_dentry,
> - struct inode **lower_dir)
> +static struct dentry *ecryptfs_start_creating_dentry(struct dentry *dentry)
> {
> - struct dentry *lower_dir_dentry;
> + struct dentry *parent = dget_parent(dentry->d_parent);
> + struct dentry *ret;
>
> - lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent);
> - *lower_dir = d_inode(lower_dir_dentry);
> - *lower_dentry = ecryptfs_dentry_to_lower(dentry);
> + ret = start_creating_dentry(ecryptfs_dentry_to_lower(parent),
> + ecryptfs_dentry_to_lower(dentry));
> + dput(parent);
> + return ret;
> +}
>
> - inode_lock_nested(*lower_dir, I_MUTEX_PARENT);
> - return (*lower_dentry)->d_parent == lower_dir_dentry ? 0 : -EINVAL;
> +static struct dentry *ecryptfs_start_removing_dentry(struct dentry *dentry)
> +{
> + struct dentry *parent = dget_parent(dentry->d_parent);
> + struct dentry *ret;
> +
> + ret = start_removing_dentry(ecryptfs_dentry_to_lower(parent),
> + ecryptfs_dentry_to_lower(dentry));
> + dput(parent);
> + return ret;
> }
>
> static int ecryptfs_inode_test(struct inode *inode, void *lower_inode)
> @@ -141,15 +149,12 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
> struct inode *lower_dir;
> int rc;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - dget(lower_dentry); // don't even try to make the lower negative
> - if (!rc) {
> - if (d_unhashed(lower_dentry))
> - rc = -EINVAL;
> - else
> - rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry,
> - NULL);
> - }
> + lower_dentry = ecryptfs_start_removing_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> +
> + lower_dir = lower_dentry->d_parent->d_inode;
> + rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry, NULL);
> if (rc) {
> printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
> goto out_unlock;
> @@ -158,8 +163,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
> set_nlink(inode, ecryptfs_inode_to_lower(inode)->i_nlink);
> inode_set_ctime_to_ts(inode, inode_get_ctime(dir));
> out_unlock:
> - dput(lower_dentry);
> - inode_unlock(lower_dir);
> + end_removing(lower_dentry);
> if (!rc)
> d_drop(dentry);
> return rc;
> @@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
> struct inode *lower_dir;
> struct inode *inode;
>
> - rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
> - if (!rc)
> - rc = vfs_create(&nop_mnt_idmap, lower_dir,
> - lower_dentry, mode, true);
> + lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
> + if (IS_ERR(lower_dentry))
> + return ERR_CAST(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> + rc = vfs_create(&nop_mnt_idmap, lower_dir,
> + lower_dentry, mode, true);
> if (rc) {
> printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
> "rc = [%d]\n", __func__, rc);
> @@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
> fsstack_copy_attr_times(directory_inode, lower_dir);
> fsstack_copy_inode_size(directory_inode, lower_dir);
> out_lock:
> - inode_unlock(lower_dir);
> + end_creating(lower_dentry, NULL);
These calls were surprising to me.
I did not recall any documentation that @parent could be NULL
when calling end_creating(). In fact, the documentation specifically
says that it should be the parent used for start_creating().
So either introduce end_creating_dentry(), which makes it clear
that it does not take an ERR_PTR child,
Or add WARN_ON to end_creating() in case it is called with NULL
parent and an ERR_PTR child to avoid dereferencing parent->d_inode
in that case.
Thanks,
Amir.
> return inode;
> }
>
> @@ -442,10 +448,12 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
>
> file_size_save = i_size_read(d_inode(old_dentry));
> lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
> - rc = lock_parent(new_dentry, &lower_new_dentry, &lower_dir);
> - if (!rc)
> - rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
> - lower_new_dentry, NULL);
> + lower_new_dentry = ecryptfs_start_creating_dentry(new_dentry);
> + if (IS_ERR(lower_new_dentry))
> + return PTR_ERR(lower_new_dentry);
> + lower_dir = lower_new_dentry->d_parent->d_inode;
> + rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
> + lower_new_dentry, NULL);
> if (rc || d_really_is_negative(lower_new_dentry))
> goto out_lock;
> rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
> @@ -457,7 +465,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
> ecryptfs_inode_to_lower(d_inode(old_dentry))->i_nlink);
> i_size_write(d_inode(new_dentry), file_size_save);
> out_lock:
> - inode_unlock(lower_dir);
> + end_creating(lower_new_dentry, NULL);
> return rc;
> }
>
> @@ -477,9 +485,11 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
> size_t encoded_symlen;
> struct ecryptfs_mount_crypt_stat *mount_crypt_stat = NULL;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - if (rc)
> - goto out_lock;
> + lower_dentry = ecryptfs_start_creating_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> +
> mount_crypt_stat = &ecryptfs_superblock_to_private(
> dir->i_sb)->mount_crypt_stat;
> rc = ecryptfs_encrypt_and_encode_filename(&encoded_symname,
> @@ -499,7 +509,7 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
> fsstack_copy_attr_times(dir, lower_dir);
> fsstack_copy_inode_size(dir, lower_dir);
> out_lock:
> - inode_unlock(lower_dir);
> + end_creating(lower_dentry, NULL);
> if (d_really_is_negative(dentry))
> d_drop(dentry);
> return rc;
> @@ -510,12 +520,14 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> {
> int rc;
> struct dentry *lower_dentry;
> + struct dentry *lower_dir_dentry;
> struct inode *lower_dir;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - if (rc)
> - goto out;
> -
> + lower_dentry = ecryptfs_start_creating_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return lower_dentry;
> + lower_dir_dentry = dget(lower_dentry->d_parent);
> + lower_dir = lower_dir_dentry->d_inode;
> lower_dentry = vfs_mkdir(&nop_mnt_idmap, lower_dir,
> lower_dentry, mode);
> rc = PTR_ERR(lower_dentry);
> @@ -531,7 +543,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> fsstack_copy_inode_size(dir, lower_dir);
> set_nlink(dir, lower_dir->i_nlink);
> out:
> - inode_unlock(lower_dir);
> + end_creating(lower_dentry, lower_dir_dentry);
> if (d_really_is_negative(dentry))
> d_drop(dentry);
> return ERR_PTR(rc);
> @@ -543,21 +555,18 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
> struct inode *lower_dir;
> int rc;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - dget(lower_dentry); // don't even try to make the lower negative
> - if (!rc) {
> - if (d_unhashed(lower_dentry))
> - rc = -EINVAL;
> - else
> - rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
> - }
> + lower_dentry = ecryptfs_start_removing_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> +
> + rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
> if (!rc) {
> clear_nlink(d_inode(dentry));
> fsstack_copy_attr_times(dir, lower_dir);
> set_nlink(dir, lower_dir->i_nlink);
> }
> - dput(lower_dentry);
> - inode_unlock(lower_dir);
> + end_removing(lower_dentry);
> if (!rc)
> d_drop(dentry);
> return rc;
> @@ -571,10 +580,12 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
> struct dentry *lower_dentry;
> struct inode *lower_dir;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - if (!rc)
> - rc = vfs_mknod(&nop_mnt_idmap, lower_dir,
> - lower_dentry, mode, dev);
> + lower_dentry = ecryptfs_start_creating_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> +
> + rc = vfs_mknod(&nop_mnt_idmap, lower_dir, lower_dentry, mode, dev);
> if (rc || d_really_is_negative(lower_dentry))
> goto out;
> rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb);
> @@ -583,7 +594,7 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
> fsstack_copy_attr_times(dir, lower_dir);
> fsstack_copy_inode_size(dir, lower_dir);
> out:
> - inode_unlock(lower_dir);
> + end_removing(lower_dentry);
> if (d_really_is_negative(dentry))
> d_drop(dentry);
> return rc;
> @@ -599,7 +610,6 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
> struct dentry *lower_new_dentry;
> struct dentry *lower_old_dir_dentry;
> struct dentry *lower_new_dir_dentry;
> - struct dentry *trap;
> struct inode *target_inode;
> struct renamedata rd = {};
>
> @@ -614,31 +624,13 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
>
> target_inode = d_inode(new_dentry);
>
> - trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
> - if (IS_ERR(trap))
> - return PTR_ERR(trap);
> - dget(lower_new_dentry);
> - rc = -EINVAL;
> - if (lower_old_dentry->d_parent != lower_old_dir_dentry)
> - goto out_lock;
> - if (lower_new_dentry->d_parent != lower_new_dir_dentry)
> - goto out_lock;
> - if (d_unhashed(lower_old_dentry) || d_unhashed(lower_new_dentry))
> - goto out_lock;
> - /* source should not be ancestor of target */
> - if (trap == lower_old_dentry)
> - goto out_lock;
> - /* target should not be ancestor of source */
> - if (trap == lower_new_dentry) {
> - rc = -ENOTEMPTY;
> - goto out_lock;
> - }
> + rd.mnt_idmap = &nop_mnt_idmap;
> + rd.old_parent = lower_old_dir_dentry;
> + rd.new_parent = lower_new_dir_dentry;
> + rc = start_renaming_two_dentry(&rd, lower_old_dentry, lower_new_dentry);
> + if (rc)
> + return rc;
>
> - rd.mnt_idmap = &nop_mnt_idmap;
> - rd.old_parent = lower_old_dir_dentry;
> - rd.old_dentry = lower_old_dentry;
> - rd.new_parent = lower_new_dir_dentry;
> - rd.new_dentry = lower_new_dentry;
> rc = vfs_rename(&rd);
> if (rc)
> goto out_lock;
> @@ -649,8 +641,7 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
> if (new_dir != old_dir)
> fsstack_copy_attr_all(old_dir, d_inode(lower_old_dir_dentry));
> out_lock:
> - dput(lower_new_dentry);
> - unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
> + end_renaming(&rd);
> return rc;
> }
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 23f9adb43401..80a687a95da0 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3418,6 +3418,39 @@ struct dentry *start_removing_noperm(struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing_noperm);
>
> +/**
> + * start_creating_dentry - prepare to create a given dentry
> + * @parent - directory from which dentry should be removed
> + * @child - the dentry to be removed
> + *
> + * A lock is taken to protect the dentry again other dirops and
> + * the validity of the dentry is checked: correct parent and still hashed.
> + *
> + * If the dentry is valid and negative a reference is taken and
> + * returned. If not an error is returned.
> + *
> + * end_creating() should be called when creation is complete, or aborted.
> + *
> + * Returns: the valid dentry, or an error.
> + */
> +struct dentry *start_creating_dentry(struct dentry *parent,
> + struct dentry *child)
> +{
> + inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> + if (unlikely(IS_DEADDIR(parent->d_inode) ||
> + child->d_parent != parent ||
> + d_unhashed(child))) {
> + inode_unlock(parent->d_inode);
> + return ERR_PTR(-EINVAL);
> + }
> + if (d_is_positive(child)) {
> + inode_unlock(parent->d_inode);
> + return ERR_PTR(-EEXIST);
> + }
> + return dget(child);
> +}
> +EXPORT_SYMBOL(start_creating_dentry);
> +
> /**
> * start_removing_dentry - prepare to remove a given dentry
> * @parent - directory from which dentry should be removed
> @@ -3426,8 +3459,8 @@ EXPORT_SYMBOL(start_removing_noperm);
> * A lock is taken to protect the dentry again other dirops and
> * the validity of the dentry is checked: correct parent and still hashed.
> *
> - * If the dentry is valid a reference is taken and returned. If not
> - * an error is returned.
> + * If the dentry is valid and positive a reference is taken and
> + * returned. If not an error is returned.
> *
> * end_removing() should be called when removal is complete, or aborted.
> *
> @@ -3443,6 +3476,10 @@ struct dentry *start_removing_dentry(struct dentry *parent,
> inode_unlock(parent->d_inode);
> return ERR_PTR(-EINVAL);
> }
> + if (d_is_negative(child)) {
> + inode_unlock(parent->d_inode);
> + return ERR_PTR(-ENOENT);
> + }
> return dget(child);
> }
> EXPORT_SYMBOL(start_removing_dentry);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 434b10476e40..7ed299567da8 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -100,6 +100,8 @@ struct dentry *start_removing_killable(struct mnt_idmap *idmap,
> struct qstr *name);
> struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
> +struct dentry *start_creating_dentry(struct dentry *parent,
> + struct dentry *child);
> struct dentry *start_removing_dentry(struct dentry *parent,
> struct dentry *child);
>
> --
> 2.50.0.107.gf914562f5916.dirty
>
* Re: [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
2025-09-28 12:50 ` Amir Goldstein
@ 2025-09-29 5:26 ` NeilBrown
2025-09-29 7:53 ` Amir Goldstein
0 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-09-29 5:26 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sun, 28 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > This requires the addition of start_creating_dentry().
> >
...
> > @@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
> > struct inode *lower_dir;
> > struct inode *inode;
> >
> > - rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
> > - if (!rc)
> > - rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > - lower_dentry, mode, true);
> > + lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
> > + if (IS_ERR(lower_dentry))
> > + return ERR_CAST(lower_dentry);
> > + lower_dir = lower_dentry->d_parent->d_inode;
> > + rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > + lower_dentry, mode, true);
> > if (rc) {
> > printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
> > "rc = [%d]\n", __func__, rc);
> > @@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
> > fsstack_copy_attr_times(directory_inode, lower_dir);
> > fsstack_copy_inode_size(directory_inode, lower_dir);
> > out_lock:
> > - inode_unlock(lower_dir);
> > + end_creating(lower_dentry, NULL);
>
> These calls were surprising to me.
> I did not recall any documentation that @parent could be NULL
> when calling end_creating(). In fact, the documentation specifically
> says that it should be the parent used for start_creating().
I've updated the documentation for end_creating() to say that the parent is
not needed when vfs_mkdir() wasn't used.
>
> So either introduce end_creating_dentry(), which makes it clear
> that it does not take an ERR_PTR child,
it would be end_creating_not_mkdir() :-)
> Or add WARN_ON to end_creating() in case it is called with NULL
> parent and an ERR_PTR child to avoid dereferencing parent->d_inode
> in that case.
I don't think a WARN_ON is particularly useful immediately before a
NULL-pointer dereference.
Thanks for highlighting this - clarification of the documentation is
needed.
NeilBrown
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
2025-09-29 5:26 ` NeilBrown
@ 2025-09-29 7:53 ` Amir Goldstein
2025-10-01 1:31 ` NeilBrown
0 siblings, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-29 7:53 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Mon, Sep 29, 2025 at 7:26 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Sun, 28 Sep 2025, Amir Goldstein wrote:
> > On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > This requires the addition of start_creating_dentry().
> > >
> ...
> > > @@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
> > > struct inode *lower_dir;
> > > struct inode *inode;
> > >
> > > - rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
> > > - if (!rc)
> > > - rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > > - lower_dentry, mode, true);
> > > + lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
> > > + if (IS_ERR(lower_dentry))
> > > + return ERR_CAST(lower_dentry);
> > > + lower_dir = lower_dentry->d_parent->d_inode;
> > > + rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > > + lower_dentry, mode, true);
> > > if (rc) {
> > > printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
> > > "rc = [%d]\n", __func__, rc);
> > > @@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
> > > fsstack_copy_attr_times(directory_inode, lower_dir);
> > > fsstack_copy_inode_size(directory_inode, lower_dir);
> > > out_lock:
> > > - inode_unlock(lower_dir);
> > > + end_creating(lower_dentry, NULL);
> >
> > These calls were surprising to me.
> > I did not recall any documentation that @parent could be NULL
> > when calling end_creating(). In fact, the documentation specifically
> > says that it should be the parent used for start_creating().
>
> I've updated the documentation for end_creating() say that the parent is
> not needed when vfs_mkdir() wasn't used.
>
This was not what I was aiming for at all.
This is exactly the bad interface that end_dirop_mkdir() was.
A well-designed scope interface like start_XXX/end_XXX should not depend
on what happened between start_XXX and end_XXX.
If start_XXX succeeds you MUST call end_XXX,
end of story, no ifs and buts and no conditional arguments that matter
only if mkdir was called. This is bad IMO.
> >
> > So either introduce end_creating_dentry(), which makes it clear
> > that it does not take an ERR_PTR child,
>
> it would be end_creating_not_mkdir() :-)
>
OK, but that is not the emphasis.
The emphasis is that dentry is not an ERR_PTR,
because in all the callers where you pass NULL parent
the error case is checked beforehand.
static inline void end_creating_dentry(struct dentry *child)
{
	if (!WARN_ON(IS_ERR(child)))
		end_dirop(child);
}
If someone uses end_creating_dentry() after failed mkdir
the assertion would trigger.
> > Or add WARN_ON to end_creating() in case it is called with NULL
> > parent and an ERR_PTR child to avoid dereferencing parent->d_inode
> > in that case.
>
> I don't think a WARN_ON is particularly useful immediately before a
> NULL-pointer dereference.
Of course I did not mean WARN_ON and then continue to dereference NULL;
that's never the correct use of WARN_ON.
static inline void end_creating(struct dentry *child, struct dentry *parent)
{
	if (!IS_ERR(child)) {
		end_dirop(child);
	} else if (!WARN_ON(!parent)) {
		/* The parent is still locked despite the error from
		 * vfs_mkdir() - must unlock it.
		 */
		inode_unlock(parent->d_inode);
	}
}
static inline void end_creating_dentry(struct dentry *child)
{
	end_creating(child, NULL);
}
To me, this:
end_creating_dentry(lower_dentry);
Is more clear than this:
end_creating(lower_dentry, NULL);
But my main concern was about adding the assertion
and documenting that @parent may be NULL as long as
it can be deduced from @child->d_parent (right?).
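For readers following the thread, the behaviour of the guarded end_creating() proposed here can be modelled in userspace. This is only a sketch under stated assumptions: struct inode, struct dentry, inode_lock(), inode_unlock(), end_dirop() and WARN_ON() below are simplified mock-ups (the lock is a plain counter, and the mock end_dirop() omits the dput() that a real one would be expected to do). None of it is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for kernel types; everything here is a mock-up. */
struct inode { int locked; };
struct dentry { struct inode *d_inode; struct dentry *d_parent; };

#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Simplified WARN_ON: count and report the warning, return the condition. */
static int warn_count;
static inline bool warn_on(bool cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}
#define WARN_ON(c) warn_on((c), #c)

/* The "lock" is just a counter so the tests can observe lock balance. */
static inline void inode_lock(struct inode *i) { i->locked++; }
static inline void inode_unlock(struct inode *i) { i->locked--; }

/* Mock end_dirop(): unlock the parent reached through child->d_parent. */
static inline void end_dirop(struct dentry *child)
{
	assert(child->d_parent);
	inode_unlock(child->d_parent->d_inode);
}

/* The guarded variant from the mail above. */
static inline void end_creating(struct dentry *child, struct dentry *parent)
{
	if (!IS_ERR(child)) {
		/* Parent is deduced from the child; @parent is ignored. */
		end_dirop(child);
	} else if (!WARN_ON(!parent)) {
		/* Error child: only @parent can tell us what to unlock. */
		inode_unlock(parent->d_inode);
	}
}
```

In this model a valid child with a NULL parent unlocks cleanly, an error child with a real parent unlocks cleanly, and an error child with a NULL parent warns and leaves the lock held instead of dereferencing NULL.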
Thanks,
Amir.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
2025-09-29 7:53 ` Amir Goldstein
@ 2025-10-01 1:31 ` NeilBrown
2025-10-02 10:25 ` Amir Goldstein
0 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-10-01 1:31 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Mon, 29 Sep 2025, Amir Goldstein wrote:
> On Mon, Sep 29, 2025 at 7:26 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > On Sun, 28 Sep 2025, Amir Goldstein wrote:
> > > On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> > > >
> > > > From: NeilBrown <neil@brown.name>
> > > >
> > > > This requires the addition of start_creating_dentry().
> > > >
> > ...
> > > > @@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
> > > > struct inode *lower_dir;
> > > > struct inode *inode;
> > > >
> > > > - rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
> > > > - if (!rc)
> > > > - rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > > > - lower_dentry, mode, true);
> > > > + lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
> > > > + if (IS_ERR(lower_dentry))
> > > > + return ERR_CAST(lower_dentry);
> > > > + lower_dir = lower_dentry->d_parent->d_inode;
> > > > + rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > > > + lower_dentry, mode, true);
> > > > if (rc) {
> > > > printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
> > > > "rc = [%d]\n", __func__, rc);
> > > > @@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
> > > > fsstack_copy_attr_times(directory_inode, lower_dir);
> > > > fsstack_copy_inode_size(directory_inode, lower_dir);
> > > > out_lock:
> > > > - inode_unlock(lower_dir);
> > > > + end_creating(lower_dentry, NULL);
> > >
> > > These calls were surprising to me.
> > > I did not recall any documentation that @parent could be NULL
> > > when calling end_creating(). In fact, the documentation specifically
> > > says that it should be the parent used for start_creating().
> >
> > I've updated the documentation for end_creating() say that the parent is
> > not needed when vfs_mkdir() wasn't used.
> >
>
> This was not what I was aiming for at all.
> This is exactly the bad interface that end_dirop_mkdir() was.
There is a reason for that. vfs_mkdir() has a bad interface and somehow
we need to accommodate it. Once we fix vfs_mkdir() the second arg to
end_creating() goes away. Until then we need it, but don't always use
it.
> A well designed scope interface like strart_XXX/end_XXX should not depend
> on what happened between the
> start_XXX to end_XXX.
> If start_XXX succeeds you MUST call end_XXX
And that is what we do.
> end of story, no ifs and buts and conditional arguments only
> if mkdir was called. This is bad IMO.
The practical reality is that the second argument is ignored if
vfs_mkdir() wasn't used. This isn't a function of the design of
end_creating(), it is a function of the design of vfs_mkdir().
>
>
>
> > >
> > > So either introduce end_creating_dentry(), which makes it clear
> > > that it does not take an ERR_PTR child,
> >
> > it would be end_creating_not_mkdir() :-)
> >
>
> OK, but that is not the emphasis.
> The emphasis is that dentry is not PTR_ERR,
> because in all the callers where you pass NULL parent
> the error case is checked beforehand.
No, in all other cases there cannot be an error. Only vfs_mkdir()
returns a dentry that might be IS_ERR(), and consumes the dentry that
was passed in. All other vfs_foo() return an integer error and don't
consume the dentry.
"vfs_mkdir() was used" and "dentry might be IS_ERR()" are logically
equivalent statements.
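The two return conventions being contrasted can be illustrated with a small userspace model. This is a sketch only: mock_vfs_create() and mock_vfs_mkdir() are hypothetical stand-ins with invented signatures, the reference count is a plain integer standing in for dget()/dput(), and the `fail` flag exists purely to force the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock dentry: a reference count and a flag, nothing more. */
struct dentry { int refcount; int is_dir; };

#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Integer-returning style (modelled on the "all other vfs_foo()" case):
 * the caller's dentry is untouched on failure, so the caller can still
 * use it, including to find the parent that needs unlocking.
 */
static inline int mock_vfs_create(struct dentry *d, bool fail)
{
	(void)d;
	return fail ? -EIO : 0;
}

/*
 * Dentry-returning style (modelled on vfs_mkdir() as described above):
 * on failure the passed-in dentry is consumed and an ERR_PTR comes back,
 * so the caller no longer has a child from which to reach the parent.
 */
static inline struct dentry *mock_vfs_mkdir(struct dentry *d, bool fail)
{
	if (fail) {
		d->refcount--;		/* stands in for dput() */
		return ERR_PTR(-EIO);
	}
	d->is_dir = 1;
	return d;
}
```

After a failed mock_vfs_create() the caller still holds a usable dentry, while after a failed mock_vfs_mkdir() it holds only an error pointer. That is why, in this model, only the mkdir path needs the parent passed separately.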
>
> static inline void end_creating_dentry(struct dentry *child)
> {
> if (!(WARN_ON(IS_ERR(child))
> end_dirop(child);
> }
>
> If someone uses end_creating_dentry() after failed mkdir
> the assertion would trigger.
But you NEED end_creating() after a failed vfs_mkdir(). You still need
to unlock the parent.
"end_creating_dentry()" looks like it is a pair to
"start_creating_dentry()" but the two are quite unrelated.
>
> > > Or add WARN_ON to end_creating() in case it is called with NULL
> > > parent and an ERR_PTR child to avoid dereferencing parent->d_inode
> > > in that case.
> >
> > I don't think a WARN_ON is particularly useful immediately before a
> > NULL-pointer dereference.
>
> Of course I did not mean WARN_ON and contoinue to dereference NULL
> that's never the correct use of WARN_ON.
>
> static inline void end_creating(struct dentry *child, struct dentry *parent)
> {
> if (!IS_ERR(child)) {
> end_dirop(child);
> } else if (!WARN_ON(!parent)) {
> /* The parent is still locked despite the error from
> * vfs_mkdir() - must unlock it.
> */
> inode_unlock(parent->d_inode);
> }
> }
>
> static inline void end_creating_dentry(struct dentry *child)
> {
> end_creating(child, NULL);
> }
>
> To me, this:
>
> end_creating_dentry(lower_dentry);
>
> Is more clear than this:
>
> end_creating(lower_dentry, NULL);
>
> But my main concern was about adding the assertion
> and documenting that @parent may be NULL as long as
> it can be deduced from @child->d_parent (right?).
If it really bothers you to pass NULL I'll change it to pass the actual
parent.
end_creating(lower_dentry, lower_dentry->d_parent);
Would you find that less bothersome?
Thanks,
NeilBrown
>
> Thanks,
> Amir.
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
2025-10-01 1:31 ` NeilBrown
@ 2025-10-02 10:25 ` Amir Goldstein
0 siblings, 0 replies; 49+ messages in thread
From: Amir Goldstein @ 2025-10-02 10:25 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 1, 2025 at 3:31 AM NeilBrown <neilb@ownmail.net> wrote:
>
> On Mon, 29 Sep 2025, Amir Goldstein wrote:
> > On Mon, Sep 29, 2025 at 7:26 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > On Sun, 28 Sep 2025, Amir Goldstein wrote:
> > > > On Fri, Sep 26, 2025 at 4:51 AM NeilBrown <neilb@ownmail.net> wrote:
> > > > >
> > > > > From: NeilBrown <neil@brown.name>
> > > > >
> > > > > This requires the addition of start_creating_dentry().
> > > > >
> > > ...
> > > > > @@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
> > > > > struct inode *lower_dir;
> > > > > struct inode *inode;
> > > > >
> > > > > - rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
> > > > > - if (!rc)
> > > > > - rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > > > > - lower_dentry, mode, true);
> > > > > + lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
> > > > > + if (IS_ERR(lower_dentry))
> > > > > + return ERR_CAST(lower_dentry);
> > > > > + lower_dir = lower_dentry->d_parent->d_inode;
> > > > > + rc = vfs_create(&nop_mnt_idmap, lower_dir,
> > > > > + lower_dentry, mode, true);
> > > > > if (rc) {
> > > > > printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
> > > > > "rc = [%d]\n", __func__, rc);
> > > > > @@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
> > > > > fsstack_copy_attr_times(directory_inode, lower_dir);
> > > > > fsstack_copy_inode_size(directory_inode, lower_dir);
> > > > > out_lock:
> > > > > - inode_unlock(lower_dir);
> > > > > + end_creating(lower_dentry, NULL);
> > > >
> > > > These calls were surprising to me.
> > > > I did not recall any documentation that @parent could be NULL
> > > > when calling end_creating(). In fact, the documentation specifically
> > > > says that it should be the parent used for start_creating().
> > >
> > > I've updated the documentation for end_creating() say that the parent is
> > > not needed when vfs_mkdir() wasn't used.
> > >
> >
> > This was not what I was aiming for at all.
> > This is exactly the bad interface that end_dirop_mkdir() was.
>
> There is a reason for that. vfs_mkdir() has a bad interface and somehow
> we need to accommodate it. Once we fix vfs_mkdir() the second arg to
> end_creating() goes away. Until then we need it, but don't always use
> it.
>
OK, as I don't have a suggestion for an easier way to form this series,
if others don't mind this scaffolding that goes away at the end,
I won't stand in your way.
>
> > A well designed scope interface like strart_XXX/end_XXX should not depend
> > on what happened between the
> > start_XXX to end_XXX.
> > If start_XXX succeeds you MUST call end_XXX
>
> And that is what we do.
>
> > end of story, no ifs and buts and conditional arguments only
> > if mkdir was called. This is bad IMO.
>
> The practical reality is that the second argument is ignored if
> vfs_mkdir() wasn't used. This isn't a function of the design of
> end_creating(), it is a function of the design of vfs_mkdir().
>
Yeah, that's one hell of a weird interface, but I can live with that
if it goes away at the end of this series, not at some future time.
> >
> >
> >
> > > >
> > > > So either introduce end_creating_dentry(), which makes it clear
> > > > that it does not take an ERR_PTR child,
> > >
> > > it would be end_creating_not_mkdir() :-)
> > >
> >
> > OK, but that is not the emphasis.
> > The emphasis is that dentry is not PTR_ERR,
> > because in all the callers where you pass NULL parent
> > the error case is checked beforehand.
>
> No, it all other cases there is there cannot be an error. Only
> vfs_mkdir() returns a dentry that might be IS_ERR(), and consume the
> dentry that was passed in. All other vfs_foo() return an integer error
> and don't consume the dentry.
>
> "vfs_mkdir() was used" and "dentry migth be IS_ERR()" are logically
> equivalent statements.
>
> >
> > static inline void end_creating_dentry(struct dentry *child)
> > {
> > if (!(WARN_ON(IS_ERR(child))
> > end_dirop(child);
> > }
> >
> > If someone uses end_creating_dentry() after failed mkdir
> > the assertion would trigger.
>
> But you NEED end_creating() after a failed vfs_mkdir(). You still need
> to unlock the parent.
>
> "end_creating_dentry()" look like it is a pair to
> "start_creating_dentry()" but the two are quite unrelated.
>
>
> >
> > > > Or add WARN_ON to end_creating() in case it is called with NULL
> > > > parent and an ERR_PTR child to avoid dereferencing parent->d_inode
> > > > in that case.
> > >
> > > I don't think a WARN_ON is particularly useful immediately before a
> > > NULL-pointer dereference.
> >
> > Of course I did not mean WARN_ON and contoinue to dereference NULL
> > that's never the correct use of WARN_ON.
> >
> > static inline void end_creating(struct dentry *child, struct dentry *parent)
> > {
> > if (!IS_ERR(child)) {
> > end_dirop(child);
> > } else if (!WARN_ON(!parent)) {
> > /* The parent is still locked despite the error from
> > * vfs_mkdir() - must unlock it.
> > */
> > inode_unlock(parent->d_inode);
> > }
> > }
> >
> > static inline void end_creating_dentry(struct dentry *child)
> > {
> > end_creating(child, NULL);
> > }
> >
> > To me, this:
> >
> > end_creating_dentry(lower_dentry);
> >
> > Is more clear than this:
> >
> > end_creating(lower_dentry, NULL);
> >
> > But my main concern was about adding the assertion
> > and documenting that @parent may be NULL as long as
> > it can be deduced from @child->d_parent (right?).
>
> If it really bothers you to pass NULL I'll change it to pass the actual
> parent.
> end_creating(lower_dentry, lower_dentry->d_parent);
>
> Would you find that less bothersome?
No, that's not needed.
If parent arg goes away at the end of the series, NULL is fine
and I don't need elaborate documentation for something that
does not exist at the end of the series.
Feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 00/11] Create APIs to centralise locking for directory ops
2025-09-26 2:49 [PATCH 00/11] Create APIs to centralise locking for directory ops NeilBrown
` (10 preceding siblings ...)
2025-09-26 2:49 ` [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs NeilBrown
@ 2025-09-26 15:47 ` Amir Goldstein
2025-09-27 11:20 ` NeilBrown
11 siblings, 1 reply; 49+ messages in thread
From: Amir Goldstein @ 2025-09-26 15:47 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> This is the next batch in my ongoing work to change directory op locking.
>
> The series creates a number of interfaces that combine locking and lookup, or
> sometimes do the locking without lookup.
> After this series there are still a few places where non-VFS code knows
> about the locking rules. Places that call simple_start_creating()
> still have explicit unlock on the parent (I think). Al is doing work
> on those places so I'll wait until he is finished.
> Also there explicit locking one place in nfsd which is changed by an
> in-flight patch. That lands it can be updated to use these interfaces.
>
> The first patch here should have been part of the last patch of the
> previous series - sorry for leaving it out. It should probably be
> squashed into that patch.
>
> I've combined the new interface with changes is various places to use
> the new interfaces. I think it is easier to reveiew the design that way.
> If necessary I can split these out to have separate patches for each place
> that new APIs are used if the general design is accepted.
>
> NeilBrown
>
> [PATCH 01/11] debugfs: rename end_creating() to
> [PATCH 02/11] VFS: introduce start_dirop() and end_dirop()
> [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and
> [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and
> [PATCH 05/11] VFS: introduce start_creating_noperm() and
> [PATCH 06/11] VFS: introduce start_removing_dentry()
> [PATCH 07/11] VFS: add start_creating_killable() and
> [PATCH 08/11] VFS/nfsd/ovl: introduce start_renaming() and
> [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
> [PATCH 10/11] Add start_renaming_two_dentrys()
> [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
Overall looks like nice abstractions.
Will try to look closer in next few days.
Can you please share a branch for testing.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 00/11] Create APIs to centralise locking for directory ops
2025-09-26 15:47 ` [PATCH 00/11] Create APIs to centralise locking for directory ops Amir Goldstein
@ 2025-09-27 11:20 ` NeilBrown
2025-10-01 5:04 ` NeilBrown
0 siblings, 1 reply; 49+ messages in thread
From: NeilBrown @ 2025-09-27 11:20 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sat, 27 Sep 2025, Amir Goldstein wrote:
> On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > This is the next batch in my ongoing work to change directory op locking.
> >
> > The series creates a number of interfaces that combine locking and lookup, or
> > sometimes do the locking without lookup.
> > After this series there are still a few places where non-VFS code knows
> > about the locking rules. Places that call simple_start_creating()
> > still have explicit unlock on the parent (I think). Al is doing work
> > on those places so I'll wait until he is finished.
> > Also there explicit locking one place in nfsd which is changed by an
> > in-flight patch. That lands it can be updated to use these interfaces.
> >
> > The first patch here should have been part of the last patch of the
> > previous series - sorry for leaving it out. It should probably be
> > squashed into that patch.
> >
> > I've combined the new interface with changes is various places to use
> > the new interfaces. I think it is easier to reveiew the design that way.
> > If necessary I can split these out to have separate patches for each place
> > that new APIs are used if the general design is accepted.
> >
> > NeilBrown
> >
> > [PATCH 01/11] debugfs: rename end_creating() to
> > [PATCH 02/11] VFS: introduce start_dirop() and end_dirop()
> > [PATCH 03/11] VFS/nfsd/cachefiles/ovl: add start_creating() and
> > [PATCH 04/11] VFS/nfsd/cachefiles/ovl: introduce start_removing() and
> > [PATCH 05/11] VFS: introduce start_creating_noperm() and
> > [PATCH 06/11] VFS: introduce start_removing_dentry()
> > [PATCH 07/11] VFS: add start_creating_killable() and
> > [PATCH 08/11] VFS/nfsd/ovl: introduce start_renaming() and
> > [PATCH 09/11] VFS/ovl/smb: introduce start_renaming_dentry()
> > [PATCH 10/11] Add start_renaming_two_dentrys()
> > [PATCH 11/11] ecryptfs: use new start_creaing/start_removing APIs
>
> Overall looks like nice abstractions.
> Will try to look closer in next few days.
Thanks.
>
> Can you please share a branch for testing.
https://github.com/neilbrown/linux branch pdirops
I may update that as I process other review.
Thanks,
NeilBrown
>
> Thanks,
> Amir.
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 00/11] Create APIs to centralise locking for directory ops
2025-09-27 11:20 ` NeilBrown
@ 2025-10-01 5:04 ` NeilBrown
0 siblings, 0 replies; 49+ messages in thread
From: NeilBrown @ 2025-10-01 5:04 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sat, 27 Sep 2025, NeilBrown wrote:
> On Sat, 27 Sep 2025, Amir Goldstein wrote:
> > On Fri, Sep 26, 2025 at 4:50 AM NeilBrown <neilb@ownmail.net> wrote:
>
> >
> > Can you please share a branch for testing.
>
> https://github.com/neilbrown/linux branch pdirops
>
> I may update that as I process other review.
>
I have updated this with responses to all review comments (I hope).
I've added a patch to change vfs_mkdir() to unlock on error,
and one which introduces end_creating_keep() so we can think about
whether it is worthwhile.
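To make the effect of the first of those patches concrete, here is a rough userspace model of why an "unlock on error" vfs_mkdir() would let the parent argument to end_creating() disappear. It is an assumption-laden sketch, not the code in the pdirops branch: mock_vfs_mkdir(), mock_end_creating() and the counter-based lock are all hypothetical, and the real patch may well differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins; the lock is a counter so balance can be checked. */
struct inode { int locked; };
struct dentry { struct inode *d_inode; struct dentry *d_parent; };

#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline void inode_lock(struct inode *i) { i->locked++; }
static inline void inode_unlock(struct inode *i) { i->locked--; }

/*
 * Hypothetical mkdir that cleans up after itself: on failure it drops
 * the parent lock before returning the error pointer.
 */
static inline struct dentry *mock_vfs_mkdir(struct dentry *child, bool fail)
{
	if (fail) {
		inode_unlock(child->d_parent->d_inode);
		return ERR_PTR(-EIO);
	}
	return child;
}

/*
 * With that behaviour, ending the operation needs only the child:
 * an error pointer means the lock is already gone, anything else
 * carries its own d_parent.
 */
static inline void mock_end_creating(struct dentry *child)
{
	if (!IS_ERR(child))
		inode_unlock(child->d_parent->d_inode);
}
```

In this model the lock count returns to zero on both the success and the failure path without the caller ever passing the parent.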
I don't plan to resend the whole series until after -rc1 is out.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 49+ messages in thread