* [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops.
@ 2025-10-15 1:46 NeilBrown
2025-10-15 1:46 ` [PATCH v2 01/14] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
` (14 more replies)
0 siblings, 15 replies; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:46 UTC
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
Here is a new series in response to review (thanks!).
The series creates a number of interfaces that combine locking and lookup, or
sometimes do the locking without lookup.
After this series there are still a few places where non-VFS code knows
about the locking rules. Places that call simple_start_creating()
still have explicit unlock on the parent (I think). Al is doing work
on those places so I'll wait until he is finished.
Also there is explicit locking in one place in nfsd which is changed by an
in-flight patch. Once that lands it can be updated to use these interfaces.
The first patch here should have been part of the last patch of the
previous series - sorry for leaving it out.
I've combined the new interfaces with changes in various places to use
the new interfaces. I think it is easier to review the design that way.
If necessary I can split these out to have separate patches for each place
that new APIs are used if the general design is accepted.
NeilBrown
[PATCH v2 01/14] debugfs: rename end_creating() to
[PATCH v2 02/14] VFS: introduce start_dirop() and end_dirop()
[PATCH v2 03/14] VFS: tidy up do_unlinkat()
[PATCH v2 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and
[PATCH v2 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing()
[PATCH v2 06/14] VFS: introduce start_creating_noperm() and
[PATCH v2 07/14] VFS: introduce start_removing_dentry()
[PATCH v2 08/14] VFS: add start_creating_killable() and
[PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and
[PATCH v2 10/14] VFS/ovl/smb: introduce start_renaming_dentry()
[PATCH v2 11/14] Add start_renaming_two_dentries()
[PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs
[PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure.
[PATCH v2 14/14] VFS: introduce end_creating_keep()
* [PATCH v2 01/14] debugfs: rename end_creating() to debugfs_end_creating()
From: NeilBrown @ 2025-10-15 1:46 UTC
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
By not using the generic end_creating() name here we are free to use it
more globally for a more generic function.
This should have been done when start_creating() was renamed.
For consistency, also rename failed_creating().
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/debugfs/inode.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 661a99a7dfbe..f241b9df642a 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -403,7 +403,7 @@ static struct dentry *debugfs_start_creating(const char *name,
return dentry;
}
-static struct dentry *failed_creating(struct dentry *dentry)
+static struct dentry *debugfs_failed_creating(struct dentry *dentry)
{
inode_unlock(d_inode(dentry->d_parent));
dput(dentry);
@@ -411,7 +411,7 @@ static struct dentry *failed_creating(struct dentry *dentry)
return ERR_PTR(-ENOMEM);
}
-static struct dentry *end_creating(struct dentry *dentry)
+static struct dentry *debugfs_end_creating(struct dentry *dentry)
{
inode_unlock(d_inode(dentry->d_parent));
return dentry;
@@ -435,7 +435,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
return dentry;
if (!(debugfs_allow & DEBUGFS_ALLOW_API)) {
- failed_creating(dentry);
+ debugfs_failed_creating(dentry);
return ERR_PTR(-EPERM);
}
@@ -443,7 +443,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
if (unlikely(!inode)) {
pr_err("out of free dentries, can not create file '%s'\n",
name);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
inode->i_mode = mode;
@@ -458,7 +458,7 @@ static struct dentry *__debugfs_create_file(const char *name, umode_t mode,
d_instantiate(dentry, inode);
fsnotify_create(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
struct dentry *debugfs_create_file_full(const char *name, umode_t mode,
@@ -585,7 +585,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
return dentry;
if (!(debugfs_allow & DEBUGFS_ALLOW_API)) {
- failed_creating(dentry);
+ debugfs_failed_creating(dentry);
return ERR_PTR(-EPERM);
}
@@ -593,7 +593,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
if (unlikely(!inode)) {
pr_err("out of free dentries, can not create directory '%s'\n",
name);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
@@ -605,7 +605,7 @@ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)
d_instantiate(dentry, inode);
inc_nlink(d_inode(dentry->d_parent));
fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL_GPL(debugfs_create_dir);
@@ -632,7 +632,7 @@ struct dentry *debugfs_create_automount(const char *name,
return dentry;
if (!(debugfs_allow & DEBUGFS_ALLOW_API)) {
- failed_creating(dentry);
+ debugfs_failed_creating(dentry);
return ERR_PTR(-EPERM);
}
@@ -640,7 +640,7 @@ struct dentry *debugfs_create_automount(const char *name,
if (unlikely(!inode)) {
pr_err("out of free dentries, can not create automount '%s'\n",
name);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
make_empty_dir_inode(inode);
@@ -652,7 +652,7 @@ struct dentry *debugfs_create_automount(const char *name,
d_instantiate(dentry, inode);
inc_nlink(d_inode(dentry->d_parent));
fsnotify_mkdir(d_inode(dentry->d_parent), dentry);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL(debugfs_create_automount);
@@ -699,13 +699,13 @@ struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent,
pr_err("out of free dentries, can not create symlink '%s'\n",
name);
kfree(link);
- return failed_creating(dentry);
+ return debugfs_failed_creating(dentry);
}
inode->i_mode = S_IFLNK | S_IRWXUGO;
inode->i_op = &debugfs_symlink_inode_operations;
inode->i_link = link;
d_instantiate(dentry, inode);
- return end_creating(dentry);
+ return debugfs_end_creating(dentry);
}
EXPORT_SYMBOL_GPL(debugfs_create_symlink);
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 02/14] VFS: introduce start_dirop() and end_dirop()
From: NeilBrown @ 2025-10-15 1:46 UTC
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
The fact that directory operations (create, remove, rename) are protected
by a lock on the parent is known widely throughout the kernel.
In order to change this - to instead lock the target dentry - it is
best to centralise this knowledge so it can be changed in one place.
This patch introduces start_dirop() which is local to VFS code.
It performs the required locking for create and remove. Rename
will be handled separately.
Various functions with names like start_creating() or start_removing_path(),
some of which already exist, will export this functionality beyond the VFS.
end_dirop() is the partner of start_dirop(). It drops the lock and
releases the reference on the dentry.
It *is* exported so that the various end_creating() etc. functions can be inline.
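The pairing described above can be sketched in userspace. This is only a toy model under stated assumptions: struct toy_dir, struct toy_dentry, toy_start_dirop() and toy_end_dirop() are hypothetical names invented for illustration, the "lock" is a plain flag rather than the inode lock, and the lookup is a linear scan that returns -ENOENT for a missing name. It is not the kernel implementation; it only shows that an error return leaves nothing locked and that the end function is safe to call on an error pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct toy_dir;

/* Hypothetical toy dentry: a name, a parent and a reference count. */
struct toy_dentry {
	struct toy_dir *parent;
	const char *name;
	int refcount;
};

/* Hypothetical toy directory: a flag models the parent lock. */
struct toy_dir {
	bool locked;
	struct toy_dentry *children;
	int nr_children;
};

/* Lock the parent, look the name up, and unlock again only on failure. */
static struct toy_dentry *toy_start_dirop(struct toy_dir *dir, const char *name)
{
	struct toy_dentry *de = ERR_PTR(-ENOENT);
	int i;

	assert(!dir->locked);
	dir->locked = true;
	for (i = 0; i < dir->nr_children; i++) {
		if (strcmp(dir->children[i].name, name) == 0) {
			de = &dir->children[i];
			de->refcount++;	/* reference held by the caller */
			break;
		}
	}
	if (IS_ERR(de))
		dir->locked = false;
	return de;
}

/* Partner of toy_start_dirop(): a no-op when handed an error pointer. */
static void toy_end_dirop(struct toy_dentry *de)
{
	if (!IS_ERR(de)) {
		de->parent->locked = false;
		de->refcount--;	/* models dput() */
	}
}
```

A caller can pass whatever toy_start_dirop() returned straight to toy_end_dirop() without testing it first, which is the property the commit message relies on.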
As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as
that won't unlock when the dentry IS_ERR(). For now we need an explicit
unlock when dentry IS_ERR(). I hope to change vfs_mkdir() to unlock
when it drops a dentry so that explicit unlock can go away.
end_dirop() can always be called on the result of start_dirop(), but not
after vfs_mkdir(). After a vfs_mkdir() we still may need the explicit
unlock as seen in end_creating_path().
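The vfs_mkdir() special case can be illustrated with another userspace toy. All names here (toy_mkdir(), toy_end_dirop(), toy_end_creating_path()) are hypothetical and the lock is just a flag; the only behaviour taken from the description above is that a failed mkdir drops the dentry and returns an error while the parent stays locked, so the caller needs an explicit unlock in that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct toy_dir { bool locked; };		/* flag models the parent lock */
struct toy_dentry { struct toy_dir *parent; int refcount; };

/* Does nothing for an error pointer, so it cannot unlock after a failed mkdir. */
static void toy_end_dirop(struct toy_dentry *de)
{
	if (!IS_ERR(de)) {
		de->parent->locked = false;
		de->refcount--;
	}
}

/*
 * Toy mkdir: on failure it drops the dentry reference itself and
 * returns an error pointer, leaving the parent still locked.
 */
static struct toy_dentry *toy_mkdir(struct toy_dentry *de, bool fail)
{
	if (fail) {
		de->refcount--;
		return ERR_PTR(-ENOSPC);
	}
	return de;
}

/* The parent is passed separately so it can be unlocked on error. */
static void toy_end_creating_path(struct toy_dir *parent, struct toy_dentry *de)
{
	if (IS_ERR(de))
		parent->locked = false;	/* explicit unlock */
	else
		toy_end_dirop(de);
}
```

Calling toy_end_dirop() alone on the failed result would leave the toy parent locked, which is why the error path needs the parent handed in explicitly.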
As well as adding start_dirop() and end_dirop(), this patch uses them in:
- simple_start_creating (which requires sharing lookup_noperm_common()
with libfs.c)
- start_removing_path / start_removing_user_path_at
- filename_create / end_creating_path()
- do_rmdir(), do_unlinkat()
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/internal.h | 3 ++
fs/libfs.c | 36 ++++++++---------
fs/namei.c | 98 ++++++++++++++++++++++++++++++++++------------
include/linux/fs.h | 2 +
4 files changed, 95 insertions(+), 44 deletions(-)
diff --git a/fs/internal.h b/fs/internal.h
index 9b2b4d116880..d08d5e2235e9 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -67,6 +67,9 @@ int vfs_tmpfile(struct mnt_idmap *idmap,
const struct path *parentpath,
struct file *file, umode_t mode);
struct dentry *d_hash_and_lookup(struct dentry *, struct qstr *);
+struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags);
+int lookup_noperm_common(struct qstr *qname, struct dentry *base);
/*
* namespace.c
diff --git a/fs/libfs.c b/fs/libfs.c
index ce8c496a6940..02371f45ef7d 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2289,27 +2289,25 @@ void stashed_dentry_prune(struct dentry *dentry)
cmpxchg(stashed, dentry, NULL);
}
-/* parent must be held exclusive */
+/**
+ * simple_start_creating - prepare to create a given name
+ * @parent: directory in which to prepare to create the name
+ * @name: the name to be created
+ *
+ * Required lock is taken and a lookup is performed prior to creating an
+ * object in a directory. No permission checking is performed.
+ *
+ * Returns: a negative dentry on which vfs_create() or similar may
+ * be attempted, or an error.
+ */
struct dentry *simple_start_creating(struct dentry *parent, const char *name)
{
- struct dentry *dentry;
- struct inode *dir = d_inode(parent);
+ struct qstr qname = QSTR(name);
+ int err;
- inode_lock(dir);
- if (unlikely(IS_DEADDIR(dir))) {
- inode_unlock(dir);
- return ERR_PTR(-ENOENT);
- }
- dentry = lookup_noperm(&QSTR(name), parent);
- if (IS_ERR(dentry)) {
- inode_unlock(dir);
- return dentry;
- }
- if (dentry->d_inode) {
- dput(dentry);
- inode_unlock(dir);
- return ERR_PTR(-EEXIST);
- }
- return dentry;
+ err = lookup_noperm_common(&qname, parent);
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, &qname, LOOKUP_CREATE | LOOKUP_EXCL);
}
EXPORT_SYMBOL(simple_start_creating);
diff --git a/fs/namei.c b/fs/namei.c
index 7377020a2cba..3618efd4bcaa 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2765,6 +2765,48 @@ static int filename_parentat(int dfd, struct filename *name,
return __filename_parentat(dfd, name, flags, parent, last, type, NULL);
}
+/**
+ * start_dirop - begin a create or remove dirop, performing locking and lookup
+ * @parent: the dentry of the parent in which the operation will occur
+ * @name: a qstr holding the name within that parent
+ * @lookup_flags: intent and other lookup flags.
+ *
+ * The lookup is performed and necessary locks are taken so that, on success,
+ * the returned dentry can be operated on safely.
+ * The qstr must already have the hash value calculated.
+ *
+ * Returns: a locked dentry, or an error.
+ *
+ */
+struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags)
+{
+ struct dentry *dentry;
+ struct inode *dir = d_inode(parent);
+
+ inode_lock_nested(dir, I_MUTEX_PARENT);
+ dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
+ if (IS_ERR(dentry))
+ inode_unlock(dir);
+ return dentry;
+}
+
+/**
+ * end_dirop - signal completion of a dirop
+ * @de: the dentry which was returned by start_dirop or similar.
+ *
+ * If the de is an error, nothing happens. Otherwise any lock taken to
+ * protect the dentry is dropped and the dentry itself is released (dput()).
+ */
+void end_dirop(struct dentry *de)
+{
+ if (!IS_ERR(de)) {
+ inode_unlock(de->d_parent->d_inode);
+ dput(de);
+ }
+}
+EXPORT_SYMBOL(end_dirop);
+
/* does lookup, returns the object with parent locked */
static struct dentry *__start_removing_path(int dfd, struct filename *name,
struct path *path)
@@ -2781,10 +2823,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
return ERR_PTR(-EINVAL);
/* don't fail immediately if it's r/o, at least try to report other errors */
error = mnt_want_write(parent_path.mnt);
- inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
- d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
+ d = start_dirop(parent_path.dentry, &last, 0);
if (IS_ERR(d))
- goto unlock;
+ goto drop;
if (error)
goto fail;
path->dentry = no_free_ptr(parent_path.dentry);
@@ -2792,10 +2833,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
return d;
fail:
- dput(d);
+ end_dirop(d);
d = ERR_PTR(error);
-unlock:
- inode_unlock(parent_path.dentry->d_inode);
+drop:
if (!error)
mnt_drop_write(parent_path.mnt);
return d;
@@ -2910,7 +2950,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
}
EXPORT_SYMBOL(vfs_path_lookup);
-static int lookup_noperm_common(struct qstr *qname, struct dentry *base)
+int lookup_noperm_common(struct qstr *qname, struct dentry *base)
{
const char *name = qname->name;
u32 len = qname->len;
@@ -4223,21 +4263,18 @@ static struct dentry *filename_create(int dfd, struct filename *name,
*/
if (last.name[last.len] && !want_dir)
create_flags &= ~LOOKUP_CREATE;
- inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
- dentry = lookup_one_qstr_excl(&last, path->dentry,
- reval_flag | create_flags);
+ dentry = start_dirop(path->dentry, &last, reval_flag | create_flags);
if (IS_ERR(dentry))
- goto unlock;
+ goto out_drop_write;
if (unlikely(error))
goto fail;
return dentry;
fail:
- dput(dentry);
+ end_dirop(dentry);
dentry = ERR_PTR(error);
-unlock:
- inode_unlock(path->dentry->d_inode);
+out_drop_write:
if (!error)
mnt_drop_write(path->mnt);
out:
@@ -4256,11 +4293,26 @@ struct dentry *start_creating_path(int dfd, const char *pathname,
}
EXPORT_SYMBOL(start_creating_path);
+/**
+ * end_creating_path - finish a code section started by start_creating_path()
+ * @path: the path instantiated by start_creating_path()
+ * @dentry: the dentry returned by start_creating_path()
+ *
+ * end_creating_path() will unlock any locks taken by start_creating_path()
+ * and drop any references that were taken. It should only be called
+ * if start_creating_path() returned a non-error.
+ * If vfs_mkdir() was called and it returned an error, that error *should*
+ * be passed to end_creating_path() together with the path.
+ */
void end_creating_path(const struct path *path, struct dentry *dentry)
{
- if (!IS_ERR(dentry))
- dput(dentry);
- inode_unlock(path->dentry->d_inode);
+ if (IS_ERR(dentry))
+ /* The parent is still locked despite the error from
+ * vfs_mkdir() - must unlock it.
+ */
+ inode_unlock(path->dentry->d_inode);
+ else
+ end_dirop(dentry);
mnt_drop_write(path->mnt);
path_put(path);
}
@@ -4592,8 +4644,7 @@ int do_rmdir(int dfd, struct filename *name)
if (error)
goto exit2;
- inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
- dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
+ dentry = start_dirop(path.dentry, &last, lookup_flags);
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit3;
@@ -4602,9 +4653,8 @@ int do_rmdir(int dfd, struct filename *name)
goto exit4;
error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
exit4:
- dput(dentry);
+ end_dirop(dentry);
exit3:
- inode_unlock(path.dentry->d_inode);
mnt_drop_write(path.mnt);
exit2:
path_put(&path);
@@ -4721,8 +4771,7 @@ int do_unlinkat(int dfd, struct filename *name)
if (error)
goto exit2;
retry_deleg:
- inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
- dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
+ dentry = start_dirop(path.dentry, &last, lookup_flags);
error = PTR_ERR(dentry);
if (!IS_ERR(dentry)) {
@@ -4737,9 +4786,8 @@ int do_unlinkat(int dfd, struct filename *name)
error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
dentry, &delegated_inode);
exit3:
- dput(dentry);
+ end_dirop(dentry);
}
- inode_unlock(path.dentry->d_inode);
if (inode)
iput(inode); /* truncate the inode here */
inode = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..f4543612ef1e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3609,6 +3609,8 @@ extern void iterate_supers_type(struct file_system_type *,
void filesystems_freeze(void);
void filesystems_thaw(void);
+void end_dirop(struct dentry *de);
+
extern int dcache_dir_open(struct inode *, struct file *);
extern int dcache_dir_close(struct inode *, struct file *);
extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 03/14] VFS: tidy up do_unlinkat()
From: NeilBrown @ 2025-10-15 1:46 UTC
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
The simplification of locking in the previous patch opens up some room
for tidying up do_unlinkat():
- change all "exit" labels to describe what will happen at the label.
- always goto an exit label on an error - unwrap the "if (!IS_ERR())" branch.
- move the "slashes" handling inline, but mark it as unlikely().
- simplify use of the "inode" variable - we no longer need to test for NULL.
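The label convention in the list above can be sketched in isolation. This is a userspace toy, not do_unlinkat(): struct toy_state, toy_do_unlink(), toy_all_released() and the fail_at parameter are hypothetical, and each "resource" is just a flag. It only shows labels named after the cleanup they perform, with every error path jumping forward to the first cleanup it still owes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags, one per resource held during the operation. */
struct toy_state {
	bool name_held;
	bool path_held;
	bool write_held;
	bool dirop_held;
};

static bool toy_all_released(const struct toy_state *s)
{
	return !s->name_held && !s->path_held &&
	       !s->write_held && !s->dirop_held;
}

/* fail_at selects which step fails (0 means every step succeeds). */
static int toy_do_unlink(struct toy_state *s, int fail_at)
{
	int error;

	s->name_held = true;
	error = (fail_at == 1) ? -ENOENT : 0;	/* stands in for the parent lookup */
	if (error)
		goto exit_putname;
	s->path_held = true;

	error = (fail_at == 2) ? -EROFS : 0;	/* stands in for write access */
	if (error)
		goto exit_path_put;
	s->write_held = true;

	error = (fail_at == 3) ? -ENOENT : 0;	/* stands in for lock + lookup */
	if (error)
		goto exit_drop_write;
	s->dirop_held = true;

	error = (fail_at == 4) ? -EPERM : 0;	/* stands in for the unlink itself */

	s->dirop_held = false;			/* lock and dentry released */
exit_drop_write:
	s->write_held = false;
exit_path_put:
	s->path_held = false;
exit_putname:
	s->name_held = false;
	return error;
}
```

Because each label says what it undoes, a reader can check an error path by matching the label name against the last resource acquired before the goto.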
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 55 ++++++++++++++++++++++++++----------------------------
1 file changed, 26 insertions(+), 29 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 3618efd4bcaa..9effaad115d9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4755,65 +4755,62 @@ int do_unlinkat(int dfd, struct filename *name)
struct path path;
struct qstr last;
int type;
- struct inode *inode = NULL;
+ struct inode *inode;
struct inode *delegated_inode = NULL;
unsigned int lookup_flags = 0;
retry:
error = filename_parentat(dfd, name, lookup_flags, &path, &last, &type);
if (error)
- goto exit1;
+ goto exit_putname;
error = -EISDIR;
if (type != LAST_NORM)
- goto exit2;
+ goto exit_path_put;
error = mnt_want_write(path.mnt);
if (error)
- goto exit2;
+ goto exit_path_put;
retry_deleg:
dentry = start_dirop(path.dentry, &last, lookup_flags);
error = PTR_ERR(dentry);
- if (!IS_ERR(dentry)) {
+ if (IS_ERR(dentry))
+ goto exit_drop_write;
- /* Why not before? Because we want correct error value */
- if (last.name[last.len])
- goto slashes;
- inode = dentry->d_inode;
- ihold(inode);
- error = security_path_unlink(&path, dentry);
- if (error)
- goto exit3;
- error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
- dentry, &delegated_inode);
-exit3:
+ /* Why not before? Because we want correct error value */
+ if (unlikely(last.name[last.len])) {
+ if (d_is_dir(dentry))
+ error = -EISDIR;
+ else
+ error = -ENOTDIR;
end_dirop(dentry);
+ goto exit_drop_write;
}
- if (inode)
- iput(inode); /* truncate the inode here */
- inode = NULL;
+ inode = dentry->d_inode;
+ ihold(inode);
+ error = security_path_unlink(&path, dentry);
+ if (error)
+ goto exit_end_dirop;
+ error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
+ dentry, &delegated_inode);
+exit_end_dirop:
+ end_dirop(dentry);
+ iput(inode); /* truncate the inode here */
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
if (!error)
goto retry_deleg;
}
+exit_drop_write:
mnt_drop_write(path.mnt);
-exit2:
+exit_path_put:
path_put(&path);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
- inode = NULL;
goto retry;
}
-exit1:
+exit_putname:
putname(name);
return error;
-
-slashes:
- if (d_is_dir(dentry))
- error = -EISDIR;
- else
- error = -ENOTDIR;
- goto exit3;
}
SYSCALL_DEFINE3(unlinkat, int, dfd, const char __user *, pathname, int, flag)
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
From: NeilBrown @ 2025-10-15 1:46 UTC
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_creating() is similar to simple_start_creating() but is not so
simple.
It takes a qstr for the name, includes permission checking, and does NOT
report an error if the name already exists, returning a positive dentry
instead.
This is currently used by nfsd, cachefiles, and overlayfs.
end_creating() is called after the dentry has been used.
end_creating() drops the reference to the dentry as it is generally no
longer needed. This is exactly the first section of end_creating_path()
so that function is changed to call the new end_creating().
These calls help encapsulate locking rules so that directory locking can
be changed.
Occasionally this change means that the parent lock is held for a
shorter period of time, for example in cachefiles_commit_tmpfile().
As this function now unlocks after an unlink and before the following
lookup, it is possible that the lookup could again find a positive
dentry, so a while loop is introduced there.
In overlayfs the ovl_lookup_temp() function has ovl_tempname()
split out to be used in ovl_start_creating_temp(). The other use
of ovl_lookup_temp() is preparing for a rename. When rename handling
is updated, ovl_lookup_temp() will be removed.
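The reason for the while loop mentioned above can be shown with a small userspace model. Everything here is hypothetical: struct toy_dir tracks a single name, toy_start_creating() and toy_end_creating() only flip a flag, and racers_left simulates another thread recreating the name each time the lock is dropped. The sketch assumes, as the description says, that the lookup after an unlink may find a positive dentry again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy directory tracking one name of interest. */
struct toy_dir {
	bool locked;
	bool name_exists;
	int racers_left;	/* times a rival recreates the name while unlocked */
};

struct toy_dentry {
	struct toy_dir *parent;
	bool positive;
};

/* Lock, look up, and return a positive or negative result; never -EEXIST. */
static struct toy_dentry toy_start_creating(struct toy_dir *dir)
{
	struct toy_dentry de;

	assert(!dir->locked);
	dir->locked = true;
	de.parent = dir;
	de.positive = dir->name_exists;
	return de;
}

/* Unlock; once the lock is gone a simulated rival may create the name. */
static void toy_end_creating(struct toy_dentry *de)
{
	struct toy_dir *dir = de->parent;

	assert(dir->locked);
	dir->locked = false;
	if (dir->racers_left > 0) {
		dir->racers_left--;
		dir->name_exists = true;
	}
}

static void toy_unlink(struct toy_dentry *de)
{
	assert(de->parent->locked);
	de->parent->name_exists = false;
	de->positive = false;
}

/* Returns how many unlinks were needed before the name could be installed. */
static int toy_commit(struct toy_dir *dir)
{
	struct toy_dentry de = toy_start_creating(dir);
	int unlinks = 0;

	while (de.positive) {
		toy_unlink(&de);
		unlinks++;
		toy_end_creating(&de);		/* lock dropped here */
		de = toy_start_creating(dir);	/* may be positive again */
	}
	dir->name_exists = true;		/* install under the lock */
	toy_end_creating(&de);
	return unlinks;
}
```

With no rivals the loop body runs at most once, matching the comment in the cachefiles hunk that the loop only repeats when another thread races to create the same object.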
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/namei.c | 41 ++++++++---------
fs/namei.c | 35 ++++++++++++---
fs/nfsd/nfs3proc.c | 14 +++---
fs/nfsd/nfs4proc.c | 14 +++---
fs/nfsd/nfs4recover.c | 16 +++----
fs/nfsd/nfsproc.c | 11 +++--
fs/nfsd/vfs.c | 52 +++++++++-------------
fs/overlayfs/copy_up.c | 19 ++++----
fs/overlayfs/dir.c | 96 +++++++++++++++++++++++-----------------
fs/overlayfs/overlayfs.h | 8 ++++
fs/overlayfs/super.c | 32 +++++++-------
include/linux/namei.h | 33 ++++++++++++++
12 files changed, 213 insertions(+), 158 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d1edb2ac3837..0a136eb434da 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
_enter(",,%s", dirname);
/* search the current directory for the element name */
- inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
retry:
ret = cachefiles_inject_read_error();
if (ret == 0)
- subdir = lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir);
+ subdir = start_creating(&nop_mnt_idmap, dir, &QSTR(dirname));
else
subdir = ERR_PTR(ret);
trace_cachefiles_lookup(NULL, dir, subdir);
@@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
trace_cachefiles_mkdir(dir, subdir);
if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
- dput(subdir);
+ end_creating(subdir, dir);
goto retry;
}
ASSERT(d_backing_inode(subdir));
@@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
/* Tell rmdir() it's not allowed to delete the subdir */
inode_lock(d_inode(subdir));
- inode_unlock(d_inode(dir));
+ dget(subdir);
+ end_creating(subdir, dir);
if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
@@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
return ERR_PTR(-EBUSY);
mkdir_error:
- inode_unlock(d_inode(dir));
- if (!IS_ERR(subdir))
- dput(subdir);
+ end_creating(subdir, dir);
pr_err("mkdir %s failed with error %d\n", dirname, ret);
return ERR_PTR(ret);
lookup_error:
- inode_unlock(d_inode(dir));
ret = PTR_ERR(subdir);
pr_err("Lookup %s failed with error %d\n", dirname, ret);
return ERR_PTR(ret);
@@ -679,36 +676,41 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
_enter(",%pD", object->file);
- inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
ret = cachefiles_inject_read_error();
if (ret == 0)
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
+ dentry = start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name));
else
dentry = ERR_PTR(ret);
if (IS_ERR(dentry)) {
trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
cachefiles_trace_lookup_error);
_debug("lookup fail %ld", PTR_ERR(dentry));
- goto out_unlock;
+ goto out;
}
- if (!d_is_negative(dentry)) {
+ /*
+ * This loop will only execute more than once if some other thread
+ * races to create the object we are trying to create.
+ */
+ while (!d_is_negative(dentry)) {
ret = cachefiles_unlink(volume->cache, object, fan, dentry,
FSCACHE_OBJECT_IS_STALE);
if (ret < 0)
- goto out_dput;
+ goto out_end;
+
+ end_creating(dentry, fan);
- dput(dentry);
ret = cachefiles_inject_read_error();
if (ret == 0)
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
+ dentry = start_creating(&nop_mnt_idmap, fan,
+ &QSTR(object->d_name));
else
dentry = ERR_PTR(ret);
if (IS_ERR(dentry)) {
trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
cachefiles_trace_lookup_error);
_debug("lookup fail %ld", PTR_ERR(dentry));
- goto out_unlock;
+ goto out;
}
}
@@ -729,10 +731,9 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
success = true;
}
-out_dput:
- dput(dentry);
-out_unlock:
- inode_unlock(d_inode(fan));
+out_end:
+ end_creating(dentry, fan);
+out:
_leave(" = %u", success);
return success;
}
diff --git a/fs/namei.c b/fs/namei.c
index 9effaad115d9..9972b0257a4c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3221,6 +3221,33 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
}
EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
+/**
+ * start_creating - prepare to create a given name with permission checking
+ * @idmap: idmap of the mount
+ * @parent: directory in which to prepare to create the name
+ * @name: the name to be created
+ *
+ * Locks are taken and a lookup is performed prior to creating
+ * an object in a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name already exists, a positive dentry is returned, so
+ * behaviour is similar to O_CREAT without O_EXCL, which doesn't fail
+ * with -EEXIST.
+ *
+ * Returns: a negative or positive dentry, or an error.
+ */
+struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, LOOKUP_CREATE);
+}
+EXPORT_SYMBOL(start_creating);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
@@ -4306,13 +4333,7 @@ EXPORT_SYMBOL(start_creating_path);
*/
void end_creating_path(const struct path *path, struct dentry *dentry)
{
- if (IS_ERR(dentry))
- /* The parent is still locked despite the error from
- * vfs_mkdir() - must unlock it.
- */
- inode_unlock(path->dentry->d_inode);
- else
- end_dirop(dentry);
+ end_creating(dentry, path->dentry);
mnt_drop_write(path->mnt);
path_put(path);
}
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index b6d03e1ef5f7..e2aac0def2cb 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (host_err)
return nfserrno(host_err);
- inode_lock_nested(inode, I_MUTEX_PARENT);
-
- child = lookup_one(&nop_mnt_idmap,
- &QSTR_LEN(argp->name, argp->len),
- parent);
+ child = start_creating(&nop_mnt_idmap, parent,
+ &QSTR_LEN(argp->name, argp->len));
if (IS_ERR(child)) {
status = nfserrno(PTR_ERR(child));
- goto out;
+ goto out_write;
}
if (d_really_is_negative(child)) {
@@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
out:
- inode_unlock(inode);
- if (child && !IS_ERR(child))
- dput(child);
+ end_creating(child, parent);
+out_write:
fh_drop_write(fhp);
return status;
}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index e466cf52d7d7..b2c95e8e7c68 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -264,14 +264,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (is_create_with_attrs(open))
nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
- inode_lock_nested(inode, I_MUTEX_PARENT);
-
- child = lookup_one(&nop_mnt_idmap,
- &QSTR_LEN(open->op_fname, open->op_fnamelen),
- parent);
+ child = start_creating(&nop_mnt_idmap, parent,
+ &QSTR_LEN(open->op_fname, open->op_fnamelen));
if (IS_ERR(child)) {
status = nfserrno(PTR_ERR(child));
- goto out;
+ goto out_write;
}
if (d_really_is_negative(child)) {
@@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (attrs.na_aclerr)
open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
out:
- inode_unlock(inode);
+ end_creating(child, parent);
nfsd_attrs_free(&attrs);
- if (child && !IS_ERR(child))
- dput(child);
+out_write:
fh_drop_write(fhp);
return status;
}
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index e2b9472e5c78..c247a7c3291c 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -195,13 +195,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
goto out_creds;
dir = nn->rec_file->f_path.dentry;
- /* lock the parent */
- inode_lock(d_inode(dir));
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
+ dentry = start_creating(&nop_mnt_idmap, dir, &QSTR(dname));
if (IS_ERR(dentry)) {
status = PTR_ERR(dentry);
- goto out_unlock;
+ goto out;
}
if (d_really_is_positive(dentry))
/*
@@ -212,15 +210,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
* In the 4.0 case, we should never get here; but we may
* as well be forgiving and just succeed silently.
*/
- goto out_put;
+ goto out_end;
dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
if (IS_ERR(dentry))
status = PTR_ERR(dentry);
-out_put:
- if (!status)
- dput(dentry);
-out_unlock:
- inode_unlock(d_inode(dir));
+out_end:
+ end_creating(dentry, dir);
+out:
if (status == 0) {
if (nn->in_grace)
__nfsd4_create_reclaim_record_grace(clp, dname,
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 8f71f5748c75..ee1b16e921fd 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp)
goto done;
}
- inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
- dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len),
- dirfhp->fh_dentry);
+ dchild = start_creating(&nop_mnt_idmap, dirfhp->fh_dentry,
+ &QSTR_LEN(argp->name, argp->len));
if (IS_ERR(dchild)) {
resp->status = nfserrno(PTR_ERR(dchild));
- goto out_unlock;
+ goto out_write;
}
fh_init(newfhp, NFS_FHSIZE);
resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
if (!resp->status && d_really_is_negative(dchild))
resp->status = nfserr_noent;
- dput(dchild);
if (resp->status) {
if (resp->status != nfserr_noent)
goto out_unlock;
@@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
}
out_unlock:
- inode_unlock(dirfhp->fh_dentry->d_inode);
+ end_creating(dchild, dirfhp->fh_dentry);
+out_write:
fh_drop_write(dirfhp);
done:
fh_put(dirfhp);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 9cb20d4aeab1..4efd3688e081 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1521,7 +1521,7 @@ nfsd_check_ignore_resizing(struct iattr *iap)
iap->ia_valid &= ~ATTR_SIZE;
}
-/* The parent directory should already be locked: */
+/* The parent directory should already be locked - we will unlock */
__be32
nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct nfsd_attrs *attrs,
@@ -1587,8 +1587,9 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
out:
- if (!IS_ERR(dchild))
- dput(dchild);
+ if (!err)
+ fh_fill_post_attrs(fhp);
+ end_creating(dchild, dentry);
return err;
out_nfserr:
@@ -1626,28 +1627,26 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (host_err)
return nfserrno(host_err);
- inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
- dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
+ dchild = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
host_err = PTR_ERR(dchild);
- if (IS_ERR(dchild)) {
- err = nfserrno(host_err);
- goto out_unlock;
- }
+ if (IS_ERR(dchild))
+ return nfserrno(host_err);
+
err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
/*
* We unconditionally drop our ref to dchild as fh_compose will have
* already grabbed its own ref for it.
*/
- dput(dchild);
if (err)
goto out_unlock;
err = fh_fill_pre_attrs(fhp);
if (err != nfs_ok)
goto out_unlock;
err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
- fh_fill_post_attrs(fhp);
+ return err;
+
out_unlock:
- inode_unlock(dentry->d_inode);
+ end_creating(dchild, dentry);
return err;
}
@@ -1733,11 +1732,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
dentry = fhp->fh_dentry;
- inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
- dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
+ dnew = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
if (IS_ERR(dnew)) {
err = nfserrno(PTR_ERR(dnew));
- inode_unlock(dentry->d_inode);
goto out_drop_write;
}
err = fh_fill_pre_attrs(fhp);
@@ -1750,11 +1747,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
fh_fill_post_attrs(fhp);
out_unlock:
- inode_unlock(dentry->d_inode);
+ end_creating(dnew, dentry);
if (!err)
err = nfserrno(commit_metadata(fhp));
- dput(dnew);
- if (err==0) err = cerr;
+ if (!err)
+ err = cerr;
out_drop_write:
fh_drop_write(fhp);
out:
@@ -1809,32 +1806,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
ddir = ffhp->fh_dentry;
dirp = d_inode(ddir);
- inode_lock_nested(dirp, I_MUTEX_PARENT);
+ dnew = start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len));
- dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir);
if (IS_ERR(dnew)) {
host_err = PTR_ERR(dnew);
- goto out_unlock;
+ goto out_drop_write;
}
dold = tfhp->fh_dentry;
err = nfserr_noent;
if (d_really_is_negative(dold))
- goto out_dput;
+ goto out_unlock;
err = fh_fill_pre_attrs(ffhp);
if (err != nfs_ok)
- goto out_dput;
+ goto out_unlock;
host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
fh_fill_post_attrs(ffhp);
- inode_unlock(dirp);
+out_unlock:
+ end_creating(dnew, ddir);
if (!host_err) {
host_err = commit_metadata(ffhp);
if (!host_err)
host_err = commit_metadata(tfhp);
}
- dput(dnew);
out_drop_write:
fh_drop_write(tfhp);
if (host_err == -EBUSY) {
@@ -1849,12 +1845,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
}
out:
return err != nfs_ok ? err : nfserrno(host_err);
-
-out_dput:
- dput(dnew);
-out_unlock:
- inode_unlock(dirp);
- goto out_drop_write;
}
static void
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index aac7e34f56c1..7a31ca9bdea2 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -613,9 +613,9 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
if (err)
goto out;
- inode_lock_nested(udir, I_MUTEX_PARENT);
- upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
- c->dentry->d_name.len);
+ upper = ovl_start_creating_upper(ofs, upperdir,
+ &QSTR_LEN(c->dentry->d_name.name,
+ c->dentry->d_name.len));
err = PTR_ERR(upper);
if (!IS_ERR(upper)) {
err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
@@ -626,9 +626,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
ovl_dentry_set_upper_alias(c->dentry);
ovl_dentry_update_reval(c->dentry, upper);
}
- dput(upper);
+ end_creating(upper, upperdir);
}
- inode_unlock(udir);
if (err)
goto out;
@@ -894,16 +893,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
if (err)
goto out;
- inode_lock_nested(udir, I_MUTEX_PARENT);
-
- upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
- c->destname.len);
+ upper = ovl_start_creating_upper(ofs, c->destdir,
+ &QSTR_LEN(c->destname.name,
+ c->destname.len));
err = PTR_ERR(upper);
if (!IS_ERR(upper)) {
err = ovl_do_link(ofs, temp, udir, upper);
- dput(upper);
+ end_creating(upper, c->destdir);
}
- inode_unlock(udir);
if (err)
goto out;
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a5e9ddf3023b..a8a24abee6b3 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -59,15 +59,21 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
return 0;
}
-struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
+#define OVL_TEMPNAME_SIZE 20
+static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
{
- struct dentry *temp;
- char name[20];
static atomic_t temp_id = ATOMIC_INIT(0);
/* counter is allowed to wrap, since temp dentries are ephemeral */
- snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
+ snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
+}
+
+struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
+{
+ struct dentry *temp;
+ char name[OVL_TEMPNAME_SIZE];
+ ovl_tempname(name);
temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
if (!IS_ERR(temp) && temp->d_inode) {
pr_err("workdir/%s already exists\n", name);
@@ -78,45 +84,49 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
return temp;
}
+static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
+ struct dentry *workdir)
+{
+ char name[OVL_TEMPNAME_SIZE];
+
+ ovl_tempname(name);
+ return start_creating(ovl_upper_mnt_idmap(ofs), workdir,
+ &QSTR(name));
+}
+
static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
{
int err;
- struct dentry *whiteout;
+ struct dentry *whiteout, *link;
struct dentry *workdir = ofs->workdir;
struct inode *wdir = workdir->d_inode;
guard(mutex)(&ofs->whiteout_lock);
if (!ofs->whiteout) {
- inode_lock_nested(wdir, I_MUTEX_PARENT);
- whiteout = ovl_lookup_temp(ofs, workdir);
- if (!IS_ERR(whiteout)) {
- err = ovl_do_whiteout(ofs, wdir, whiteout);
- if (err) {
- dput(whiteout);
- whiteout = ERR_PTR(err);
- }
- }
- inode_unlock(wdir);
+ whiteout = ovl_start_creating_temp(ofs, workdir);
if (IS_ERR(whiteout))
return whiteout;
- ofs->whiteout = whiteout;
+ err = ovl_do_whiteout(ofs, wdir, whiteout);
+ if (!err)
+ ofs->whiteout = dget(whiteout);
+ end_creating(whiteout, workdir);
+ if (err)
+ return ERR_PTR(err);
}
if (!ofs->no_shared_whiteout) {
- inode_lock_nested(wdir, I_MUTEX_PARENT);
- whiteout = ovl_lookup_temp(ofs, workdir);
- if (!IS_ERR(whiteout)) {
- err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
- if (err) {
- dput(whiteout);
- whiteout = ERR_PTR(err);
- }
- }
- inode_unlock(wdir);
- if (!IS_ERR(whiteout))
- return whiteout;
- if (PTR_ERR(whiteout) != -EMLINK) {
+ link = ovl_start_creating_temp(ofs, workdir);
+ if (IS_ERR(link))
+ return link;
+ err = ovl_do_link(ofs, ofs->whiteout, wdir, link);
+ if (!err)
+ whiteout = dget(link);
+ end_creating(link, workdir);
+ if (!err)
+ return whiteout;
+
+ if (err != -EMLINK) {
pr_warn("Failed to link whiteout - disabling whiteout inode sharing(nlink=%u, err=%lu)\n",
ofs->whiteout->d_inode->i_nlink,
PTR_ERR(whiteout));
@@ -252,10 +262,13 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
struct ovl_cattr *attr)
{
struct dentry *ret;
- inode_lock_nested(workdir->d_inode, I_MUTEX_PARENT);
- ret = ovl_create_real(ofs, workdir,
- ovl_lookup_temp(ofs, workdir), attr);
- inode_unlock(workdir->d_inode);
+ ret = ovl_start_creating_temp(ofs, workdir);
+ if (IS_ERR(ret))
+ return ret;
+ ret = ovl_create_real(ofs, workdir, ret, attr);
+ if (!IS_ERR(ret))
+ dget(ret);
+ end_creating(ret, workdir);
return ret;
}
@@ -354,18 +367,21 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
{
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
- struct inode *udir = upperdir->d_inode;
struct dentry *newdentry;
int err;
- inode_lock_nested(udir, I_MUTEX_PARENT);
- newdentry = ovl_create_real(ofs, upperdir,
- ovl_lookup_upper(ofs, dentry->d_name.name,
- upperdir, dentry->d_name.len),
- attr);
- inode_unlock(udir);
+ newdentry = ovl_start_creating_upper(ofs, upperdir,
+ &QSTR_LEN(dentry->d_name.name,
+ dentry->d_name.len));
if (IS_ERR(newdentry))
return PTR_ERR(newdentry);
+ newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
+ if (IS_ERR(newdentry)) {
+ end_creating(newdentry, upperdir);
+ return PTR_ERR(newdentry);
+ }
+ dget(newdentry);
+ end_creating(newdentry, upperdir);
if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
!ovl_allow_offline_changes(ofs)) {
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c8fd5951fc5e..beeba96cfcb2 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -415,6 +415,14 @@ static inline struct dentry *ovl_lookup_upper_unlocked(struct ovl_fs *ofs,
&QSTR_LEN(name, len), base);
}
+static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ return start_creating(ovl_upper_mnt_idmap(ofs),
+ parent, name);
+}
+
static inline bool ovl_open_flags_need_copy_up(int flags)
{
if (!flags)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 43ee4c7296a7..6e0816c1147a 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -310,8 +310,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
bool retried = false;
retry:
- inode_lock_nested(dir, I_MUTEX_PARENT);
- work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
+ work = ovl_start_creating_upper(ofs, ofs->workbasedir, &QSTR(name));
if (!IS_ERR(work)) {
struct iattr attr = {
@@ -320,14 +319,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
};
if (work->d_inode) {
+ dget(work);
+ end_creating(work, ofs->workbasedir);
+ if (persist)
+ return work;
err = -EEXIST;
- inode_unlock(dir);
if (retried)
goto out_dput;
-
- if (persist)
- return work;
-
retried = true;
err = ovl_workdir_cleanup(ofs, ofs->workbasedir, mnt, work, 0);
dput(work);
@@ -338,7 +336,9 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
}
work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
- inode_unlock(dir);
+ if (!IS_ERR(work))
+ dget(work);
+ end_creating(work, ofs->workbasedir);
err = PTR_ERR(work);
if (IS_ERR(work))
goto out_err;
@@ -376,7 +376,6 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
if (err)
goto out_dput;
} else {
- inode_unlock(dir);
err = PTR_ERR(work);
goto out_err;
}
@@ -626,14 +625,17 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
struct dentry *parent,
const char *name, umode_t mode)
{
- size_t len = strlen(name);
struct dentry *child;
- inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
- child = ovl_lookup_upper(ofs, name, parent, len);
- if (!IS_ERR(child) && !child->d_inode)
- child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
- inode_unlock(parent->d_inode);
+ child = ovl_start_creating_upper(ofs, parent, &QSTR(name));
+ if (!IS_ERR(child)) {
+ if (!child->d_inode)
+ child = ovl_create_real(ofs, parent, child,
+ OVL_CATTR(mode));
+ if (!IS_ERR(child))
+ dget(child);
+ end_creating(child, parent);
+ }
dput(parent);
return child;
diff --git a/include/linux/namei.h b/include/linux/namei.h
index fed86221c69c..3f92c1a16878 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -88,6 +88,39 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
struct qstr *name,
struct dentry *base);
+struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name);
+
+/**
+ * end_creating - finish action started with start_creating
+ * @child: dentry returned by start_creating() or vfs_mkdir()
+ * @parent: dentry given to start_creating()
+ *
+ * Unlock and release the child.
+ *
+ * Unlike end_dirop() this can only be called if start_creating() succeeded.
+ * It handles @child being an error as vfs_mkdir() might have converted the
+ * dentry to an error - in that case the parent still needs to be unlocked.
+ *
+ * If vfs_mkdir() was called then the value returned from that function
+ * should be given for @child rather than the original dentry, as vfs_mkdir()
+ * may have provided a new dentry. Even if vfs_mkdir() returns an error
+ * it must be given to end_creating().
+ *
+ * If vfs_mkdir() was not called, then @child will be a valid dentry and
+ * @parent will be ignored.
+ */
+static inline void end_creating(struct dentry *child, struct dentry *parent)
+{
+ if (IS_ERR(child))
+ /* The parent is still locked despite the error from
+ * vfs_mkdir() - must unlock it.
+ */
+ inode_unlock(parent->d_inode);
+ else
+ end_dirop(child);
+}
+
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
extern int follow_up(struct path *);
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (3 preceding siblings ...)
2025-10-15 1:46 ` [PATCH v2 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
@ 2025-10-15 1:46 ` NeilBrown
2025-10-15 1:46 ` [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
` (9 subsequent siblings)
14 siblings, 0 replies; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:46 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_removing() is similar to start_creating() but will only return a
positive dentry with the expectation that it will be removed. This is
used by nfsd, cachefiles, and overlayfs. They are changed to also use
end_removing() to terminate the action begun by start_removing(). This
is a simple alias for end_dirop().
Apart from changes to the error paths, as we no longer need to unlock on
a lookup error, an effect on callers is that they don't need to test if
the found dentry is positive or negative - they can be sure it is
positive.
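As a rough illustration of the calling convention described above, here is a
small userspace model. It is only a sketch: the toy_* types and functions are
hypothetical stand-ins invented for this example and are not the kernel
implementation. It shows that a failed lookup returns with nothing locked, and
that a successful one always yields a positive dentry.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical toy types: a directory with a lock flag and one child. */
struct toy_dir;

struct toy_dentry {
	struct toy_dir *parent;
	int refcount;
	int positive;		/* non-zero if the name exists */
};

struct toy_dir {
	int locked;
	struct toy_dentry child;
};

/*
 * Model of start_removing(): lock the parent and look up the name.
 * If the name does not exist, unlock again and return -ENOENT, so the
 * caller never sees a negative dentry and never unlocks on failure.
 */
static struct toy_dentry *toy_start_removing(struct toy_dir *dir)
{
	dir->locked = 1;
	if (!dir->child.positive) {
		dir->locked = 0;
		return ERR_PTR(-ENOENT);
	}
	dir->child.parent = dir;
	dir->child.refcount++;
	return &dir->child;
}

/* Model of end_removing(): unlock the parent and drop the reference. */
static void toy_end_removing(struct toy_dentry *child)
{
	if (IS_ERR(child))
		return;		/* nothing was locked, nothing to release */
	child->parent->locked = 0;
	child->refcount--;
}

/* A caller in the new style: no negative-dentry test, one exit path. */
static int toy_remove(struct toy_dir *dir)
{
	struct toy_dentry *victim = toy_start_removing(dir);

	if (IS_ERR(victim))
		return (int)PTR_ERR(victim);
	victim->positive = 0;	/* stands in for vfs_unlink()/vfs_rmdir() */
	toy_end_removing(victim);
	return 0;
}
```

Calling toy_remove() twice on a directory whose child exists succeeds the
first time and fails with -ENOENT the second time; in both cases the
directory is left unlocked.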
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/namei.c | 32 ++++++++++++++------------------
fs/namei.c | 27 +++++++++++++++++++++++++++
fs/nfsd/nfs4recover.c | 18 +++++-------------
fs/nfsd/vfs.c | 26 ++++++++++----------------
fs/overlayfs/dir.c | 15 +++++++--------
fs/overlayfs/overlayfs.h | 8 ++++++++
include/linux/namei.h | 18 ++++++++++++++++++
7 files changed, 89 insertions(+), 55 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 0a136eb434da..c7f0c6ab9b88 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -260,6 +260,7 @@ static int cachefiles_unlink(struct cachefiles_cache *cache,
* - File backed objects are unlinked
* - Directory backed objects are stuffed into the graveyard for userspace to
* delete
+ * On entry dir must be locked. It will be unlocked on exit.
*/
int cachefiles_bury_object(struct cachefiles_cache *cache,
struct cachefiles_object *object,
@@ -274,28 +275,30 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
_enter(",'%pd','%pd'", dir, rep);
+ /* end_removing() will dput() @rep but we need to keep
+ * a ref, so take one now. This also stops the dentry
+ * being negated when unlinked which we need.
+ */
+ dget(rep);
+
if (rep->d_parent != dir) {
- inode_unlock(d_inode(dir));
+ end_removing(rep);
_leave(" = -ESTALE");
return -ESTALE;
}
/* non-directories can just be unlinked */
if (!d_is_dir(rep)) {
- dget(rep); /* Stop the dentry being negated if it's only pinned
- * by a file struct.
- */
ret = cachefiles_unlink(cache, object, dir, rep, why);
- dput(rep);
+ end_removing(rep);
- inode_unlock(d_inode(dir));
_leave(" = %d", ret);
return ret;
}
/* directories have to be moved to the graveyard */
_debug("move stale object to graveyard");
- inode_unlock(d_inode(dir));
+ end_removing(rep);
try_again:
/* first step is to make up a grave dentry in the graveyard */
@@ -749,26 +752,20 @@ static struct dentry *cachefiles_lookup_for_cull(struct cachefiles_cache *cache,
struct dentry *victim;
int ret = -ENOENT;
- inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+ victim = start_removing(&nop_mnt_idmap, dir, &QSTR(filename));
- victim = lookup_one(&nop_mnt_idmap, &QSTR(filename), dir);
if (IS_ERR(victim))
goto lookup_error;
- if (d_is_negative(victim))
- goto lookup_put;
if (d_inode(victim)->i_flags & S_KERNEL_FILE)
goto lookup_busy;
return victim;
lookup_busy:
ret = -EBUSY;
-lookup_put:
- inode_unlock(d_inode(dir));
- dput(victim);
+ end_removing(victim);
return ERR_PTR(ret);
lookup_error:
- inode_unlock(d_inode(dir));
ret = PTR_ERR(victim);
if (ret == -ENOENT)
return ERR_PTR(-ESTALE); /* Probably got retired by the netfs */
@@ -816,18 +813,17 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
ret = cachefiles_bury_object(cache, NULL, dir, victim,
FSCACHE_OBJECT_WAS_CULLED);
+ dput(victim);
if (ret < 0)
goto error;
fscache_count_culled();
- dput(victim);
_leave(" = 0");
return 0;
error_unlock:
- inode_unlock(d_inode(dir));
+ end_removing(victim);
error:
- dput(victim);
if (ret == -ENOENT)
return -ESTALE; /* Probably got retired by the netfs */
diff --git a/fs/namei.c b/fs/namei.c
index 9972b0257a4c..ae833dfa277c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3248,6 +3248,33 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
}
EXPORT_SYMBOL(start_creating);
+/**
+ * start_removing - prepare to remove a given name with permission checking
+ * @idmap: idmap of the mount
+ * @parent: directory in which to find the name
+ * @name: the name to be removed
+ *
+ * Locks are taken and a lookup is performed prior to removing
+ * an object from a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name doesn't exist, an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * Returns: a positive dentry, or an error.
+ */
+struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, 0);
+}
+EXPORT_SYMBOL(start_removing);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index c247a7c3291c..3eefaa2202e3 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -324,20 +324,12 @@ nfsd4_unlink_clid_dir(char *name, struct nfsd_net *nn)
dprintk("NFSD: nfsd4_unlink_clid_dir. name %s\n", name);
dir = nn->rec_file->f_path.dentry;
- inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
- dentry = lookup_one(&nop_mnt_idmap, &QSTR(name), dir);
- if (IS_ERR(dentry)) {
- status = PTR_ERR(dentry);
- goto out_unlock;
- }
- status = -ENOENT;
- if (d_really_is_negative(dentry))
- goto out;
+ dentry = start_removing(&nop_mnt_idmap, dir, &QSTR(name));
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);
+
status = vfs_rmdir(&nop_mnt_idmap, d_inode(dir), dentry);
-out:
- dput(dentry);
-out_unlock:
- inode_unlock(d_inode(dir));
+ end_removing(dentry);
return status;
}
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 4efd3688e081..cd64ffe12e0b 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -2044,7 +2044,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
{
struct dentry *dentry, *rdentry;
struct inode *dirp;
- struct inode *rinode;
+ struct inode *rinode = NULL;
__be32 err;
int host_err;
@@ -2063,24 +2063,21 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
dentry = fhp->fh_dentry;
dirp = d_inode(dentry);
- inode_lock_nested(dirp, I_MUTEX_PARENT);
- rdentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
+ rdentry = start_removing(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
+
host_err = PTR_ERR(rdentry);
if (IS_ERR(rdentry))
- goto out_unlock;
+ goto out_drop_write;
- if (d_really_is_negative(rdentry)) {
- dput(rdentry);
- host_err = -ENOENT;
- goto out_unlock;
- }
- rinode = d_inode(rdentry);
err = fh_fill_pre_attrs(fhp);
if (err != nfs_ok)
goto out_unlock;
+ rinode = d_inode(rdentry);
+ /* Prevent truncation until after locks dropped */
ihold(rinode);
+
if (!type)
type = d_inode(rdentry)->i_mode & S_IFMT;
@@ -2102,10 +2099,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
}
fh_fill_post_attrs(fhp);
- inode_unlock(dirp);
- if (!host_err)
+out_unlock:
+ end_removing(rdentry);
+ if (!err && !host_err)
host_err = commit_metadata(fhp);
- dput(rdentry);
iput(rinode); /* truncate the inode here */
out_drop_write:
@@ -2123,9 +2120,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
}
out:
return err != nfs_ok ? err : nfserrno(host_err);
-out_unlock:
- inode_unlock(dirp);
- goto out_drop_write;
}
/*
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a8a24abee6b3..b5247c9e1903 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -866,17 +866,17 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
goto out;
}
- inode_lock_nested(dir, I_MUTEX_PARENT);
- upper = ovl_lookup_upper(ofs, dentry->d_name.name, upperdir,
- dentry->d_name.len);
+ upper = ovl_start_removing_upper(ofs, upperdir,
+ &QSTR_LEN(dentry->d_name.name,
+ dentry->d_name.len));
err = PTR_ERR(upper);
if (IS_ERR(upper))
- goto out_unlock;
+ goto out_dput;
err = -ESTALE;
if ((opaquedir && upper != opaquedir) ||
(!opaquedir && !ovl_matches_upper(dentry, upper)))
- goto out_dput_upper;
+ goto out_unlock;
if (is_dir)
err = ovl_do_rmdir(ofs, dir, upper);
@@ -892,10 +892,9 @@ static int ovl_remove_upper(struct dentry *dentry, bool is_dir,
*/
if (!err)
d_drop(dentry);
-out_dput_upper:
- dput(upper);
out_unlock:
- inode_unlock(dir);
+ end_removing(upper);
+out_dput:
dput(opaquedir);
out:
return err;
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index beeba96cfcb2..49ad65f829dc 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -423,6 +423,14 @@ static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
parent, name);
}
+static inline struct dentry *ovl_start_removing_upper(struct ovl_fs *ofs,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ return start_removing(ovl_upper_mnt_idmap(ofs),
+ parent, name);
+}
+
static inline bool ovl_open_flags_need_copy_up(int flags)
{
if (!flags)
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 3f92c1a16878..9ee76e88f3dd 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -90,6 +90,8 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
+struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
+ struct qstr *name);
/**
* end_creating - finish action started with start_creating
@@ -121,6 +123,22 @@ static inline void end_creating(struct dentry *child, struct dentry *parent)
end_dirop(child);
}
+/**
+ * end_removing - finish action started with start_removing
+ * @child: dentry returned by start_removing()
+ * @parent: dentry given to start_removing()
+ *
+ * Unlock and release the child.
+ *
+ * This is identical to end_dirop(). It can be passed the result of
+ * start_removing() whether that was successful or not, but it is not needed
+ * if start_removing() failed.
+ */
+static inline void end_removing(struct dentry *child)
+{
+ end_dirop(child);
+}
+
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
extern int follow_up(struct path *);
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (4 preceding siblings ...)
2025-10-15 1:46 ` [PATCH v2 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() NeilBrown
@ 2025-10-15 1:46 ` NeilBrown
2025-10-19 10:15 ` Amir Goldstein
2025-10-20 8:36 ` kernel test robot
2025-10-15 1:46 ` [PATCH v2 07/14] VFS: introduce start_removing_dentry() NeilBrown
` (8 subsequent siblings)
14 siblings, 2 replies; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:46 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
xfs, fuse, ipc/mqueue need variants of start_creating or start_removing
which do not check permissions.
This patch adds _noperm versions of these functions.
Note that do_mq_open() was only calling mntget() so it could call
path_put() - it didn't really need an extra reference on the mnt.
Now it doesn't call mntget() and uses end_creating() which does
the dput() half of path_put().
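The callers converted here rely on end_creating() accepting either a valid
dentry or an error pointer, as its kernel-doc earlier in this series
describes. The following userspace model is a sketch of that behaviour only.
The model_* names are hypothetical, and it assumes a mkdir-like step that
releases the dentry and returns an error pointer on failure while the parent
stays locked.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical toy types for the model. */
struct model_dir { int locked; };
struct model_dentry { struct model_dir *parent; int refcount; };

static struct model_dentry slot;	/* the single child in this model */

/* Model of start_creating_noperm(): lock the parent, return the child. */
static struct model_dentry *model_start_creating(struct model_dir *dir)
{
	dir->locked = 1;
	slot.parent = dir;
	slot.refcount = 1;
	return &slot;
}

/* Model of end_dirop(): unlock via the child's parent, drop the ref. */
static void model_end_dirop(struct model_dentry *child)
{
	if (IS_ERR(child))
		return;
	child->parent->locked = 0;
	child->refcount--;
}

/*
 * Model of end_creating(): if the child became an error pointer the
 * parent is still locked, so unlock it through @parent instead.
 */
static void model_end_creating(struct model_dentry *child,
			       struct model_dir *parent)
{
	if (IS_ERR(child))
		parent->locked = 0;
	else
		model_end_dirop(child);
}

/* Assumed mkdir-like step: on failure it consumes the dentry. */
static struct model_dentry *model_mkdir(struct model_dentry *child, int fail)
{
	if (fail) {
		child->refcount--;
		return ERR_PTR(-EIO);
	}
	return child;
}

/* Caller with a single cleanup call for both outcomes. */
static int model_create(struct model_dir *dir, int fail)
{
	struct model_dentry *child = model_start_creating(dir);
	int err;

	child = model_mkdir(child, fail);
	err = IS_ERR(child) ? (int)PTR_ERR(child) : 0;
	model_end_creating(child, dir);
	return err;
}
```

In this model, model_create() leaves the directory unlocked and the reference
count at zero whether the mkdir-like step succeeds or fails, which is the
property that lets a caller funnel both outcomes into one cleanup label.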
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/fuse/dir.c | 19 +++++++---------
fs/namei.c | 48 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/scrub/orphanage.c | 11 ++++-----
include/linux/namei.h | 2 ++
ipc/mqueue.c | 31 +++++++++-----------------
5 files changed, 73 insertions(+), 38 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index ecaec0fea3a1..40ca94922349 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1397,27 +1397,25 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
if (!parent)
return -ENOENT;
- inode_lock_nested(parent, I_MUTEX_PARENT);
if (!S_ISDIR(parent->i_mode))
- goto unlock;
+ goto put_parent;
err = -ENOENT;
dir = d_find_alias(parent);
if (!dir)
- goto unlock;
+ goto put_parent;
- name->hash = full_name_hash(dir, name->name, name->len);
- entry = d_lookup(dir, name);
+ entry = start_removing_noperm(dir, name);
dput(dir);
- if (!entry)
- goto unlock;
+ if (IS_ERR(entry))
+ goto put_parent;
fuse_dir_changed(parent);
if (!(flags & FUSE_EXPIRE_ONLY))
d_invalidate(entry);
fuse_invalidate_entry_cache(entry);
- if (child_nodeid != 0 && d_really_is_positive(entry)) {
+ if (child_nodeid != 0) {
inode_lock(d_inode(entry));
if (get_node_id(d_inode(entry)) != child_nodeid) {
err = -ENOENT;
@@ -1445,10 +1443,9 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
} else {
err = 0;
}
- dput(entry);
- unlock:
- inode_unlock(parent);
+ end_removing(entry);
+ put_parent:
iput(parent);
return err;
}
diff --git a/fs/namei.c b/fs/namei.c
index ae833dfa277c..696e4b794416 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3275,6 +3275,54 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
}
EXPORT_SYMBOL(start_removing);
+/**
+ * start_creating_noperm - prepare to create a given name without permission checking
+ * @parent: directory in which to prepare to create the name
+ * @name: the name to be created
+ *
+ * Locks are taken and a lookup is performed prior to creating
+ * an object in a directory.
+ *
+ * If the name already exists, a positive dentry is returned.
+ *
+ * Returns: a negative or positive dentry, or an error.
+ */
+struct dentry *start_creating_noperm(struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_noperm_common(name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, LOOKUP_CREATE);
+}
+EXPORT_SYMBOL(start_creating_noperm);
+
+/**
+ * start_removing_noperm - prepare to remove a given name without permission checking
+ * @parent: directory in which to find the name
+ * @name: the name to be removed
+ *
+ * Locks are taken and a lookup is performed prior to removing
+ * an object from a directory.
+ *
+ * If the name doesn't exist, an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * Returns: a positive dentry, or an error.
+ */
+struct dentry *start_removing_noperm(struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_noperm_common(name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return start_dirop(parent, name, 0);
+}
+EXPORT_SYMBOL(start_removing_noperm);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index 9c12cb844231..e732605924a1 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -152,11 +152,10 @@ xrep_orphanage_create(
}
/* Try to find the orphanage directory. */
- inode_lock_nested(root_inode, I_MUTEX_PARENT);
- orphanage_dentry = lookup_noperm(&QSTR(ORPHANAGE), root_dentry);
+ orphanage_dentry = start_creating_noperm(root_dentry, &QSTR(ORPHANAGE));
if (IS_ERR(orphanage_dentry)) {
error = PTR_ERR(orphanage_dentry);
- goto out_unlock_root;
+ goto out_dput_root;
}
/*
@@ -170,7 +169,7 @@ xrep_orphanage_create(
orphanage_dentry, 0750);
error = PTR_ERR(orphanage_dentry);
if (IS_ERR(orphanage_dentry))
- goto out_unlock_root;
+ goto out_dput_orphanage;
}
/* Not a directory? Bail out. */
@@ -200,9 +199,7 @@ xrep_orphanage_create(
sc->orphanage_ilock_flags = 0;
out_dput_orphanage:
- dput(orphanage_dentry);
-out_unlock_root:
- inode_unlock(VFS_I(sc->mp->m_rootip));
+ end_creating(orphanage_dentry, root_dentry);
out_dput_root:
dput(root_dentry);
out:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 9ee76e88f3dd..688e157d6afc 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -92,6 +92,8 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
+struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
+struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
/**
* end_creating - finish action started with start_creating
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 093551fe66a7..060e8e9c4f59 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -913,13 +913,11 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
goto out_putname;
ro = mnt_want_write(mnt); /* we'll drop it in any case */
- inode_lock(d_inode(root));
- path.dentry = lookup_noperm(&QSTR(name->name), root);
+ path.dentry = start_creating_noperm(root, &QSTR(name->name));
if (IS_ERR(path.dentry)) {
error = PTR_ERR(path.dentry);
goto out_putfd;
}
- path.mnt = mntget(mnt);
error = prepare_open(path.dentry, oflag, ro, mode, name, attr);
if (!error) {
struct file *file = dentry_open(&path, oflag, current_cred());
@@ -928,13 +926,12 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
else
error = PTR_ERR(file);
}
- path_put(&path);
out_putfd:
if (error) {
put_unused_fd(fd);
fd = error;
}
- inode_unlock(d_inode(root));
+ end_creating(path.dentry, root);
if (!ro)
mnt_drop_write(mnt);
out_putname:
@@ -957,7 +954,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
int err;
struct filename *name;
struct dentry *dentry;
- struct inode *inode = NULL;
+ struct inode *inode;
struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
struct vfsmount *mnt = ipc_ns->mq_mnt;
@@ -969,26 +966,20 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
err = mnt_want_write(mnt);
if (err)
goto out_name;
- inode_lock_nested(d_inode(mnt->mnt_root), I_MUTEX_PARENT);
- dentry = lookup_noperm(&QSTR(name->name), mnt->mnt_root);
+ dentry = start_removing_noperm(mnt->mnt_root, &QSTR(name->name));
if (IS_ERR(dentry)) {
err = PTR_ERR(dentry);
- goto out_unlock;
+ goto out_drop_write;
}
inode = d_inode(dentry);
- if (!inode) {
- err = -ENOENT;
- } else {
- ihold(inode);
- err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
- dentry, NULL);
- }
- dput(dentry);
-
-out_unlock:
- inode_unlock(d_inode(mnt->mnt_root));
+ ihold(inode);
+ err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
+ dentry, NULL);
+ end_removing(dentry);
iput(inode);
+
+out_drop_write:
mnt_drop_write(mnt);
out_name:
putname(name);
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 07/14] VFS: introduce start_removing_dentry()
From: NeilBrown @ 2025-10-15 1:46 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_removing_dentry() is similar to start_removing() but instead of
providing a name for lookup, the target dentry is given.
start_removing_dentry() checks that the dentry is still hashed and in
the parent, and if so it locks and increases the refcount so that
end_removing() can be used to finish the operation.
This is used in cachefiles, overlayfs, smb/server, and apparmor.
There will be other users including ecryptfs.
As start_removing_dentry() takes an extra reference to the dentry (to be
put by end_removing()), callers no longer need to take their own extra
reference to stop d_delete() from using dentry_unlink_inode() to negate
the dentry, as cachefiles_delete_object() and ksmbd_vfs_unlink() did.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
---
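As a rough illustration of the check order described in the commit message
above, here is a minimal userspace sketch. It is not kernel code: struct
dentry_model, the model_* helpers and the lock_depth counter are
hypothetical stand-ins for struct dentry, ERR_PTR()/IS_ERR()/PTR_ERR() and
the parent's inode lock. That model_end_removing() tolerates an error
pointer is an assumption inferred from the callers in this patch, which
call end_removing() unconditionally.

```c
/* Userspace model only; every name here is hypothetical. */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dentry_model {
	struct dentry_model *parent;
	bool dead;       /* models IS_DEADDIR() on a parent */
	bool hashed;     /* models !d_unhashed() */
	bool positive;   /* models !d_is_negative() */
	int refcount;    /* models the dentry reference count */
	int lock_depth;  /* models the parent's inode lock */
};

/* Look-alikes for ERR_PTR(), IS_ERR() and PTR_ERR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Lock the parent, validate the child, then take a reference. */
struct dentry_model *
model_start_removing_dentry(struct dentry_model *parent,
			    struct dentry_model *child)
{
	parent->lock_depth++;
	if (parent->dead || child->parent != parent || !child->hashed) {
		parent->lock_depth--;
		return model_err_ptr(-EINVAL);
	}
	if (!child->positive) {
		parent->lock_depth--;
		return model_err_ptr(-ENOENT);
	}
	child->refcount++;
	return child;
}

/* Drop the lock and the reference; do nothing for an error pointer. */
void model_end_removing(struct dentry_model *child)
{
	if (model_is_err(child))
		return;
	child->parent->lock_depth--;
	child->refcount--;
}
```

On every failure path the model leaves the parent unlocked and the child's
reference count unchanged, so a caller can pair the start and end calls
unconditionally, which is the shape the cachefiles call sites take.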
fs/cachefiles/interface.c | 14 +++++++++-----
fs/cachefiles/namei.c | 24 ++++++++++++++----------
fs/cachefiles/volume.c | 10 +++++++---
fs/namei.c | 33 +++++++++++++++++++++++++++++++++
fs/overlayfs/dir.c | 10 ++++------
fs/overlayfs/readdir.c | 8 ++++----
fs/smb/server/vfs.c | 27 ++++-----------------------
include/linux/namei.h | 2 ++
security/apparmor/apparmorfs.c | 8 ++++----
9 files changed, 81 insertions(+), 55 deletions(-)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 3e63cfe15874..3f8a6f1a8fc3 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -9,6 +9,7 @@
#include <linux/mount.h>
#include <linux/xattr.h>
#include <linux/file.h>
+#include <linux/namei.h>
#include <linux/falloc.h>
#include <trace/events/fscache.h>
#include "internal.h"
@@ -428,11 +429,14 @@ static bool cachefiles_invalidate_cookie(struct fscache_cookie *cookie)
if (!old_tmpfile) {
struct cachefiles_volume *volume = object->volume;
struct dentry *fan = volume->fanout[(u8)cookie->key_hash];
-
- inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
- cachefiles_bury_object(volume->cache, object, fan,
- old_file->f_path.dentry,
- FSCACHE_OBJECT_INVALIDATED);
+ struct dentry *obj;
+
+ obj = start_removing_dentry(fan, old_file->f_path.dentry);
+ if (!IS_ERR(obj))
+ cachefiles_bury_object(volume->cache, object,
+ fan, obj,
+ FSCACHE_OBJECT_INVALIDATED);
+ end_removing(obj);
}
fput(old_file);
}
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index c7f0c6ab9b88..b97a40917a32 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -425,13 +425,12 @@ int cachefiles_delete_object(struct cachefiles_object *object,
_enter(",OBJ%x{%pD}", object->debug_id, object->file);
- /* Stop the dentry being negated if it's only pinned by a file struct. */
- dget(dentry);
-
- inode_lock_nested(d_backing_inode(fan), I_MUTEX_PARENT);
- ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
- inode_unlock(d_backing_inode(fan));
- dput(dentry);
+ dentry = start_removing_dentry(fan, dentry);
+ if (IS_ERR(dentry))
+ ret = PTR_ERR(dentry);
+ else
+ ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
+ end_removing(dentry);
return ret;
}
@@ -644,9 +643,14 @@ bool cachefiles_look_up_object(struct cachefiles_object *object)
if (!d_is_reg(dentry)) {
pr_err("%pd is not a file\n", dentry);
- inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
- ret = cachefiles_bury_object(volume->cache, object, fan, dentry,
- FSCACHE_OBJECT_IS_WEIRD);
+ struct dentry *de = start_removing_dentry(fan, dentry);
+ if (IS_ERR(de))
+ ret = PTR_ERR(de);
+ else
+ ret = cachefiles_bury_object(volume->cache, object,
+ fan, de,
+ FSCACHE_OBJECT_IS_WEIRD);
+ end_removing(de);
dput(dentry);
if (ret < 0)
return false;
diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
index 781aac4ef274..ddf95ff5daf0 100644
--- a/fs/cachefiles/volume.c
+++ b/fs/cachefiles/volume.c
@@ -7,6 +7,7 @@
#include <linux/fs.h>
#include <linux/slab.h>
+#include <linux/namei.h>
#include "internal.h"
#include <trace/events/fscache.h>
@@ -58,9 +59,12 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
if (ret < 0) {
if (ret != -ESTALE)
goto error_dir;
- inode_lock_nested(d_inode(cache->store), I_MUTEX_PARENT);
- cachefiles_bury_object(cache, NULL, cache->store, vdentry,
- FSCACHE_VOLUME_IS_WEIRD);
+ vdentry = start_removing_dentry(cache->store, vdentry);
+ if (!IS_ERR(vdentry))
+ cachefiles_bury_object(cache, NULL, cache->store,
+ vdentry,
+ FSCACHE_VOLUME_IS_WEIRD);
+ end_removing(vdentry);
cachefiles_put_directory(volume->dentry);
cond_resched();
goto retry;
diff --git a/fs/namei.c b/fs/namei.c
index 696e4b794416..bfc443bec8a9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3323,6 +3323,39 @@ struct dentry *start_removing_noperm(struct dentry *parent,
}
EXPORT_SYMBOL(start_removing_noperm);
+/**
+ * start_removing_dentry - prepare to remove a given dentry
+ * @parent: directory from which dentry should be removed
+ * @child: the dentry to be removed
+ *
+ * A lock is taken to protect the dentry against other dirops and
+ * the validity of the dentry is checked: correct parent and still hashed.
+ *
+ * If the dentry is valid and positive, a reference is taken and
+ * returned. If not, an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * Returns: the valid dentry, or an error.
+ */
+struct dentry *start_removing_dentry(struct dentry *parent,
+ struct dentry *child)
+{
+ inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
+ if (unlikely(IS_DEADDIR(parent->d_inode) ||
+ child->d_parent != parent ||
+ d_unhashed(child))) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-EINVAL);
+ }
+ if (d_is_negative(child)) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-ENOENT);
+ }
+ return dget(child);
+}
+EXPORT_SYMBOL(start_removing_dentry);
+
#ifdef CONFIG_UNIX98_PTYS
int path_pts(struct path *path)
{
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index b5247c9e1903..c8d0885ee5e0 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -47,14 +47,12 @@ static int ovl_cleanup_locked(struct ovl_fs *ofs, struct inode *wdir,
int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
struct dentry *wdentry)
{
- int err;
-
- err = ovl_parent_lock(workdir, wdentry);
- if (err)
- return err;
+ wdentry = start_removing_dentry(workdir, wdentry);
+ if (IS_ERR(wdentry))
+ return PTR_ERR(wdentry);
ovl_cleanup_locked(ofs, workdir->d_inode, wdentry);
- ovl_parent_unlock(workdir);
+ end_removing(wdentry);
return 0;
}
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 1e9792cc557b..77ecc39fc33a 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -1242,11 +1242,11 @@ int ovl_workdir_cleanup(struct ovl_fs *ofs, struct dentry *parent,
if (!d_is_dir(dentry) || level > 1)
return ovl_cleanup(ofs, parent, dentry);
- err = ovl_parent_lock(parent, dentry);
- if (err)
- return err;
+ dentry = start_removing_dentry(parent, dentry);
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);
err = ovl_do_rmdir(ofs, parent->d_inode, dentry);
- ovl_parent_unlock(parent);
+ end_removing(dentry);
if (err) {
struct path path = { .mnt = mnt, .dentry = dentry };
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 891ed2dc2b73..7c4ddc43ab39 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -49,24 +49,6 @@ static void ksmbd_vfs_inherit_owner(struct ksmbd_work *work,
i_uid_write(inode, i_uid_read(parent_inode));
}
-/**
- * ksmbd_vfs_lock_parent() - lock parent dentry if it is stable
- * @parent: parent dentry
- * @child: child dentry
- *
- * Returns: %0 on success, %-ENOENT if the parent dentry is not stable
- */
-int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child)
-{
- inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
- if (child->d_parent != parent) {
- inode_unlock(d_inode(parent));
- return -ENOENT;
- }
-
- return 0;
-}
-
static int ksmbd_vfs_path_lookup(struct ksmbd_share_config *share_conf,
char *pathname, unsigned int flags,
struct path *path, bool do_lock)
@@ -1084,18 +1066,17 @@ int ksmbd_vfs_unlink(struct file *filp)
return err;
dir = dget_parent(dentry);
- err = ksmbd_vfs_lock_parent(dir, dentry);
- if (err)
+ dentry = start_removing_dentry(dir, dentry);
+ err = PTR_ERR(dentry);
+ if (IS_ERR(dentry))
goto out;
- dget(dentry);
if (S_ISDIR(d_inode(dentry)->i_mode))
err = vfs_rmdir(idmap, d_inode(dir), dentry);
else
err = vfs_unlink(idmap, d_inode(dir), dentry, NULL);
- dput(dentry);
- inode_unlock(d_inode(dir));
+ end_removing(dentry);
if (err)
ksmbd_debug(VFS, "failed to delete, err %d\n", err);
out:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 688e157d6afc..7e916e9d7726 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -94,6 +94,8 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
+struct dentry *start_removing_dentry(struct dentry *parent,
+ struct dentry *child);
/**
* end_creating - finish action started with start_creating
diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c
index 391a586d0557..9d08d103f142 100644
--- a/security/apparmor/apparmorfs.c
+++ b/security/apparmor/apparmorfs.c
@@ -355,17 +355,17 @@ static void aafs_remove(struct dentry *dentry)
if (!dentry || IS_ERR(dentry))
return;
+ /* ->d_parent is stable as rename is not supported */
dir = d_inode(dentry->d_parent);
- inode_lock(dir);
- if (simple_positive(dentry)) {
+ dentry = start_removing_dentry(dentry->d_parent, dentry);
+ if (!IS_ERR(dentry) && simple_positive(dentry)) {
if (d_is_dir(dentry))
simple_rmdir(dir, dentry);
else
simple_unlink(dir, dentry);
d_delete(dentry);
- dput(dentry);
}
- inode_unlock(dir);
+ end_removing(dentry);
simple_release_fs(&aafs_mnt, &aafs_count);
}
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 08/14] VFS: add start_creating_killable() and start_removing_killable()
From: NeilBrown @ 2025-10-15 1:47 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
These are similar to start_creating() and start_removing(), but allow a
fatal signal to abort waiting for the lock.
They are used in btrfs for subvol creation and removal.
btrfs_may_create() no longer needs IS_DEADDIR() as
start_creating_killable() includes that check.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
---
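The difference between the normal and killable variants can be sketched
in userspace as below. This is a simplified model under stated
assumptions, not the kernel implementation: struct dir_model, enum
model_wait and the model_* functions are hypothetical, the fatal signal is
represented by a plain flag, and the functions return an int where the
real interfaces return a dentry or an error pointer. Returning -ENOENT for
a missing name on removal is an assumption based on the check that the
btrfs caller used to perform itself.

```c
/* Userspace model only; every name here is hypothetical. */
#include <errno.h>
#include <stdbool.h>

enum model_wait { MODEL_WAIT_NORMAL, MODEL_WAIT_KILLABLE };

struct dir_model {
	int lock_depth;            /* models the directory's i_rwsem */
	bool fatal_signal_pending; /* models a fatal signal for the caller */
	bool has_name;             /* whether lookup finds a positive entry */
};

/* A killable wait gives up with -EINTR; a normal wait always succeeds. */
int model_lock_dir(struct dir_model *dir, enum model_wait state)
{
	if (state == MODEL_WAIT_KILLABLE && dir->fatal_signal_pending)
		return -EINTR;
	dir->lock_depth++;
	return 0;
}

/* Lock, then look up; a failed lookup drops the lock before returning. */
int model_start_dirop(struct dir_model *dir, bool want_create,
		      enum model_wait state)
{
	int ret = model_lock_dir(dir, state);

	if (ret)
		return ret; /* the lock was never taken */
	if (!want_create && !dir->has_name) {
		dir->lock_depth--;
		return -ENOENT;
	}
	return 0;
}

int model_start_creating_killable(struct dir_model *dir)
{
	return model_start_dirop(dir, true, MODEL_WAIT_KILLABLE);
}

int model_start_removing_killable(struct dir_model *dir)
{
	return model_start_dirop(dir, false, MODEL_WAIT_KILLABLE);
}

/* Only call this after a start function returned 0. */
void model_end_dirop(struct dir_model *dir)
{
	dir->lock_depth--;
}
```

Because every failing start call returns with the lock released, a caller
needs only one cleanup label, which matches how the btrfs error paths
collapse in this patch.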
fs/btrfs/ioctl.c | 41 +++++++---------------
fs/namei.c | 80 +++++++++++++++++++++++++++++++++++++++++--
include/linux/namei.h | 6 ++++
3 files changed, 95 insertions(+), 32 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 185bef0df1c2..4fbfdd8faf6a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -904,14 +904,9 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
struct fscrypt_str name_str = FSTR_INIT((char *)qname->name, qname->len);
int ret;
- ret = down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT);
- if (ret == -EINTR)
- return ret;
-
- dentry = lookup_one(idmap, qname, parent);
- ret = PTR_ERR(dentry);
+ dentry = start_creating_killable(idmap, parent, qname);
if (IS_ERR(dentry))
- goto out_unlock;
+ return PTR_ERR(dentry);
ret = btrfs_may_create(idmap, dir, dentry);
if (ret)
@@ -940,9 +935,7 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
out_up_read:
up_read(&fs_info->subvol_sem);
out_dput:
- dput(dentry);
-out_unlock:
- btrfs_inode_unlock(BTRFS_I(dir), 0);
+ end_creating(dentry, parent);
return ret;
}
@@ -2417,18 +2410,10 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
goto free_subvol_name;
}
- ret = down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT);
- if (ret == -EINTR)
- goto free_subvol_name;
- dentry = lookup_one(idmap, &QSTR(subvol_name), parent);
+ dentry = start_removing_killable(idmap, parent, &QSTR(subvol_name));
if (IS_ERR(dentry)) {
ret = PTR_ERR(dentry);
- goto out_unlock_dir;
- }
-
- if (d_really_is_negative(dentry)) {
- ret = -ENOENT;
- goto out_dput;
+ goto out_end_removing;
}
inode = d_inode(dentry);
@@ -2449,7 +2434,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
*/
ret = -EPERM;
if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED))
- goto out_dput;
+ goto out_end_removing;
/*
* Do not allow deletion if the parent dir is the same
@@ -2460,21 +2445,21 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
*/
ret = -EINVAL;
if (root == dest)
- goto out_dput;
+ goto out_end_removing;
ret = inode_permission(idmap, inode, MAY_WRITE | MAY_EXEC);
if (ret)
- goto out_dput;
+ goto out_end_removing;
}
/* check if subvolume may be deleted by a user */
ret = btrfs_may_delete(idmap, dir, dentry, 1);
if (ret)
- goto out_dput;
+ goto out_end_removing;
if (btrfs_ino(BTRFS_I(inode)) != BTRFS_FIRST_FREE_OBJECTID) {
ret = -EINVAL;
- goto out_dput;
+ goto out_end_removing;
}
btrfs_inode_lock(BTRFS_I(inode), 0);
@@ -2483,10 +2468,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
if (!ret)
d_delete_notify(dir, dentry);
-out_dput:
- dput(dentry);
-out_unlock_dir:
- btrfs_inode_unlock(BTRFS_I(dir), 0);
+out_end_removing:
+ end_removing(dentry);
free_subvol_name:
kfree(subvol_name_ptr);
free_parent:
diff --git a/fs/namei.c b/fs/namei.c
index bfc443bec8a9..04d2819bd351 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2778,19 +2778,33 @@ static int filename_parentat(int dfd, struct filename *name,
* Returns: a locked dentry, or an error.
*
*/
-struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
- unsigned int lookup_flags)
+static struct dentry *__start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags,
+ unsigned int state)
{
struct dentry *dentry;
struct inode *dir = d_inode(parent);
- inode_lock_nested(dir, I_MUTEX_PARENT);
+ if (state == TASK_KILLABLE) {
+ int ret = down_write_killable_nested(&dir->i_rwsem,
+ I_MUTEX_PARENT);
+ if (ret)
+ return ERR_PTR(ret);
+ } else {
+ inode_lock_nested(dir, I_MUTEX_PARENT);
+ }
dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
if (IS_ERR(dentry))
inode_unlock(dir);
return dentry;
}
+struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
+ unsigned int lookup_flags)
+{
+ return __start_dirop(parent, name, lookup_flags, TASK_NORMAL);
+}
+
/**
* end_dirop - signal completion of a dirop
* @de: the dentry which was returned by start_dirop or similar.
@@ -3275,6 +3289,66 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
}
EXPORT_SYMBOL(start_removing);
+/**
+ * start_creating_killable - prepare to create a given name with permission checking
+ * @idmap: idmap of the mount
+ * @parent: directory in which to prepare to create the name
+ * @name: the name to be created
+ *
+ * Locks are taken and a lookup is performed prior to creating
+ * an object in a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name already exists, a positive dentry is returned.
+ *
+ * If a fatal signal is received or was already pending, the function
+ * aborts with -EINTR.
+ *
+ * Returns: a negative or positive dentry, or an error.
+ */
+struct dentry *start_creating_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return __start_dirop(parent, name, LOOKUP_CREATE, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(start_creating_killable);
+
+/**
+ * start_removing_killable - prepare to remove a given name with permission checking
+ * @idmap: idmap of the mount
+ * @parent: directory in which to find the name
+ * @name: the name to be removed
+ *
+ * Locks are taken and a lookup is performed prior to removing
+ * an object from a directory. Permission checking (MAY_EXEC) is performed
+ * against @idmap.
+ *
+ * If the name doesn't exist, an error is returned.
+ *
+ * end_removing() should be called when removal is complete, or aborted.
+ *
+ * If a fatal signal is received or was already pending, the function
+ * aborts with -EINTR.
+ *
+ * Returns: a positive dentry, or an error.
+ */
+struct dentry *start_removing_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name)
+{
+ int err = lookup_one_common(idmap, name, parent);
+
+ if (err)
+ return ERR_PTR(err);
+ return __start_dirop(parent, name, 0, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(start_removing_killable);
+
/**
* start_creating_noperm - prepare to create a given name without permission checking
* @parent: directory in which to prepare to create the name
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 7e916e9d7726..e5cff89679df 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -92,6 +92,12 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
struct qstr *name);
+struct dentry *start_creating_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name);
+struct dentry *start_removing_killable(struct mnt_idmap *idmap,
+ struct dentry *parent,
+ struct qstr *name);
struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_dentry(struct dentry *parent,
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
From: NeilBrown @ 2025-10-15 1:47 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
start_renaming() combines name lookup and locking to prepare for rename.
It is used when two names need to be looked up, as in nfsd and overlayfs.
Cases where one or both dentries are already available will be handled
separately.
__start_renaming() avoids the inode_permission check and hash
calculation and is suitable after filename_parentat() in do_renameat2().
It subsumes quite a bit of code from that function.
start_renaming() does calculate the hash and check X permission and is
suitable elsewhere:
- nfsd_rename()
- ovl_rename()
Signed-off-by: NeilBrown <neil@brown.name>
---
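The flag handling and the ancestor ("trap") checks that this patch moves
out of do_renameat2() can be summarised with the small userspace sketch
below. It is an illustration only: the MODEL_* constants have made-up
values and do not match the kernel's RENAME_* or LOOKUP_* definitions, and
model_target_flags() and model_check_trap() are hypothetical names for
logic that lives inline in __start_renaming().

```c
/* Userspace model only; names and flag values are hypothetical. */
#include <errno.h>
#include <stddef.h>

#define MODEL_RENAME_NOREPLACE     (1u << 0)
#define MODEL_RENAME_EXCHANGE      (1u << 1)

#define MODEL_LOOKUP_CREATE        (1u << 0)
#define MODEL_LOOKUP_EXCL          (1u << 1)
#define MODEL_LOOKUP_RENAME_TARGET (1u << 2)

/* Extra lookup flags applied to the target name only. */
unsigned int model_target_flags(unsigned int rename_flags)
{
	unsigned int flags = MODEL_LOOKUP_RENAME_TARGET | MODEL_LOOKUP_CREATE;

	if (rename_flags & MODEL_RENAME_EXCHANGE)
		flags = 0; /* both names must already exist */
	if (rename_flags & MODEL_RENAME_NOREPLACE)
		flags |= MODEL_LOOKUP_EXCL; /* target must not exist */
	return flags;
}

/*
 * trap models the value returned by lock_rename(): NULL, or the
 * dentry that neither source nor target is allowed to be.
 */
int model_check_trap(const void *source, const void *target,
		     const void *trap, unsigned int rename_flags)
{
	if (source == trap)
		return -EINVAL; /* source is an ancestor of target */
	if (target == trap) {
		/* target is an ancestor of source */
		if (rename_flags & MODEL_RENAME_EXCHANGE)
			return -EINVAL;
		return -ENOTEMPTY;
	}
	return 0;
}
```

Keeping both decisions in one helper means nfsd and overlayfs get the same
error codes as the rename system call without repeating the checks.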
fs/namei.c | 197 ++++++++++++++++++++++++++++-----------
fs/nfsd/vfs.c | 73 +++++----------
fs/overlayfs/dir.c | 72 ++++++--------
fs/overlayfs/overlayfs.h | 14 +++
include/linux/namei.h | 3 +
5 files changed, 214 insertions(+), 145 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index 04d2819bd351..a2553df8f34e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3667,6 +3667,129 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
}
EXPORT_SYMBOL(unlock_rename);
+/**
+ * __start_renaming - lookup and lock names for rename
+ * @rd: rename data containing parent and flags, and
+ * for receiving found dentries
+ * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
+ * LOOKUP_NO_SYMLINKS etc).
+ * @old_last: name of object in @rd.old_parent
+ * @new_last: name of object in @rd.new_parent
+ *
+ * Look up two names and ensure locks are in place for
+ * rename.
+ *
+ * On success the found dentries are stored in @rd.old_dentry and
+ * @rd.new_dentry. These references and the lock are dropped by
+ * end_renaming().
+ *
+ * The passed in qstrs must have the hash calculated, and no permission
+ * checking is performed.
+ *
+ * Returns: zero or an error.
+ */
+static int
+__start_renaming(struct renamedata *rd, int lookup_flags,
+ struct qstr *old_last, struct qstr *new_last)
+{
+ struct dentry *trap;
+ struct dentry *d1, *d2;
+ int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+ int err;
+
+ if (rd->flags & RENAME_EXCHANGE)
+ target_flags = 0;
+ if (rd->flags & RENAME_NOREPLACE)
+ target_flags |= LOOKUP_EXCL;
+
+ trap = lock_rename(rd->old_parent, rd->new_parent);
+ if (IS_ERR(trap))
+ return PTR_ERR(trap);
+
+ d1 = lookup_one_qstr_excl(old_last, rd->old_parent,
+ lookup_flags);
+ if (IS_ERR(d1))
+ goto out_unlock_1;
+
+ d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
+ lookup_flags | target_flags);
+ if (IS_ERR(d2))
+ goto out_unlock_2;
+
+ if (d1 == trap) {
+ /* source is an ancestor of target */
+ err = -EINVAL;
+ goto out_unlock_3;
+ }
+
+ if (d2 == trap) {
+ /* target is an ancestor of source */
+ if (rd->flags & RENAME_EXCHANGE)
+ err = -EINVAL;
+ else
+ err = -ENOTEMPTY;
+ goto out_unlock_3;
+ }
+
+ rd->old_dentry = d1;
+ rd->new_dentry = d2;
+ return 0;
+
+out_unlock_3:
+ dput(d2);
+ d2 = ERR_PTR(err);
+out_unlock_2:
+ dput(d1);
+ d1 = d2;
+out_unlock_1:
+ unlock_rename(rd->old_parent, rd->new_parent);
+ return PTR_ERR(d1);
+}
+
+/**
+ * start_renaming - lookup and lock names for rename with permission checking
+ * @rd: rename data containing parent and flags, and
+ * for receiving found dentries
+ * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
+ * LOOKUP_NO_SYMLINKS etc).
+ * @old_last: name of object in @rd.old_parent
+ * @new_last: name of object in @rd.new_parent
+ *
+ * Look up two names and ensure locks are in place for
+ * rename.
+ *
+ * On success the found dentries are stored in @rd.old_dentry and
+ * @rd.new_dentry. These references and the lock are dropped by
+ * end_renaming().
+ *
+ * The passed in qstrs need not have the hash calculated, and basic
+ * eXecute permission checking is performed against @rd.mnt_idmap.
+ *
+ * Returns: zero or an error.
+ */
+int start_renaming(struct renamedata *rd, int lookup_flags,
+ struct qstr *old_last, struct qstr *new_last)
+{
+ int err;
+
+ err = lookup_one_common(rd->mnt_idmap, old_last, rd->old_parent);
+ if (err)
+ return err;
+ err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
+ if (err)
+ return err;
+ return __start_renaming(rd, lookup_flags, old_last, new_last);
+}
+EXPORT_SYMBOL(start_renaming);
+
+void end_renaming(struct renamedata *rd)
+{
+ unlock_rename(rd->old_parent, rd->new_parent);
+ dput(rd->old_dentry);
+ dput(rd->new_dentry);
+}
+EXPORT_SYMBOL(end_renaming);
+
/**
* vfs_prepare_mode - prepare the mode to be used for a new inode
* @idmap: idmap of the mount the inode was found from
@@ -5504,14 +5627,11 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
struct filename *to, unsigned int flags)
{
struct renamedata rd;
- struct dentry *old_dentry, *new_dentry;
- struct dentry *trap;
struct path old_path, new_path;
struct qstr old_last, new_last;
int old_type, new_type;
struct inode *delegated_inode = NULL;
- unsigned int lookup_flags = 0, target_flags =
- LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+ unsigned int lookup_flags = 0;
bool should_retry = false;
int error = -EINVAL;
@@ -5522,11 +5642,6 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
(flags & RENAME_EXCHANGE))
goto put_names;
- if (flags & RENAME_EXCHANGE)
- target_flags = 0;
- if (flags & RENAME_NOREPLACE)
- target_flags |= LOOKUP_EXCL;
-
retry:
error = filename_parentat(olddfd, from, lookup_flags, &old_path,
&old_last, &old_type);
@@ -5556,66 +5671,40 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
goto exit2;
retry_deleg:
- trap = lock_rename(new_path.dentry, old_path.dentry);
- if (IS_ERR(trap)) {
- error = PTR_ERR(trap);
+ rd.old_parent = old_path.dentry;
+ rd.mnt_idmap = mnt_idmap(old_path.mnt);
+ rd.new_parent = new_path.dentry;
+ rd.delegated_inode = &delegated_inode;
+ rd.flags = flags;
+
+ error = __start_renaming(&rd, lookup_flags, &old_last, &new_last);
+ if (error)
goto exit_lock_rename;
- }
- old_dentry = lookup_one_qstr_excl(&old_last, old_path.dentry,
- lookup_flags);
- error = PTR_ERR(old_dentry);
- if (IS_ERR(old_dentry))
- goto exit3;
- new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
- lookup_flags | target_flags);
- error = PTR_ERR(new_dentry);
- if (IS_ERR(new_dentry))
- goto exit4;
if (flags & RENAME_EXCHANGE) {
- if (!d_is_dir(new_dentry)) {
+ if (!d_is_dir(rd.new_dentry)) {
error = -ENOTDIR;
if (new_last.name[new_last.len])
- goto exit5;
+ goto exit_unlock;
}
}
/* unless the source is a directory trailing slashes give -ENOTDIR */
- if (!d_is_dir(old_dentry)) {
+ if (!d_is_dir(rd.old_dentry)) {
error = -ENOTDIR;
if (old_last.name[old_last.len])
- goto exit5;
+ goto exit_unlock;
if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
- goto exit5;
- }
- /* source should not be ancestor of target */
- error = -EINVAL;
- if (old_dentry == trap)
- goto exit5;
- /* target should not be an ancestor of source */
- if (!(flags & RENAME_EXCHANGE))
- error = -ENOTEMPTY;
- if (new_dentry == trap)
- goto exit5;
+ goto exit_unlock;
+ }
- error = security_path_rename(&old_path, old_dentry,
- &new_path, new_dentry, flags);
+ error = security_path_rename(&old_path, rd.old_dentry,
+ &new_path, rd.new_dentry, flags);
if (error)
- goto exit5;
+ goto exit_unlock;
- rd.old_parent = old_path.dentry;
- rd.old_dentry = old_dentry;
- rd.mnt_idmap = mnt_idmap(old_path.mnt);
- rd.new_parent = new_path.dentry;
- rd.new_dentry = new_dentry;
- rd.delegated_inode = &delegated_inode;
- rd.flags = flags;
error = vfs_rename(&rd);
-exit5:
- dput(new_dentry);
-exit4:
- dput(old_dentry);
-exit3:
- unlock_rename(new_path.dentry, old_path.dentry);
+exit_unlock:
+ end_renaming(&rd);
exit_lock_rename:
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index cd64ffe12e0b..62109885d4db 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1885,11 +1885,12 @@ __be32
nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
struct svc_fh *tfhp, char *tname, int tlen)
{
- struct dentry *fdentry, *tdentry, *odentry, *ndentry, *trap;
+ struct dentry *fdentry, *tdentry;
int type = S_IFDIR;
+ struct renamedata rd = {};
__be32 err;
int host_err;
- bool close_cached = false;
+ struct dentry *close_cached;
trace_nfsd_vfs_rename(rqstp, ffhp, tfhp, fname, flen, tname, tlen);
@@ -1915,15 +1916,22 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
goto out;
retry:
+ close_cached = NULL;
host_err = fh_want_write(ffhp);
if (host_err) {
err = nfserrno(host_err);
goto out;
}
- trap = lock_rename(tdentry, fdentry);
- if (IS_ERR(trap)) {
- err = nfserr_xdev;
+ rd.mnt_idmap = &nop_mnt_idmap;
+ rd.old_parent = fdentry;
+ rd.new_parent = tdentry;
+
+ host_err = start_renaming(&rd, 0, &QSTR_LEN(fname, flen),
+ &QSTR_LEN(tname, tlen));
+
+ if (host_err) {
+ err = nfserrno(host_err);
goto out_want_write;
}
err = fh_fill_pre_attrs(ffhp);
@@ -1933,48 +1941,23 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (err != nfs_ok)
goto out_unlock;
- odentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), fdentry);
- host_err = PTR_ERR(odentry);
- if (IS_ERR(odentry))
- goto out_nfserr;
+ type = d_inode(rd.old_dentry)->i_mode & S_IFMT;
+
+ if (d_inode(rd.new_dentry))
+ type = d_inode(rd.new_dentry)->i_mode & S_IFMT;
- host_err = -ENOENT;
- if (d_really_is_negative(odentry))
- goto out_dput_old;
- host_err = -EINVAL;
- if (odentry == trap)
- goto out_dput_old;
- type = d_inode(odentry)->i_mode & S_IFMT;
-
- ndentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(tname, tlen), tdentry);
- host_err = PTR_ERR(ndentry);
- if (IS_ERR(ndentry))
- goto out_dput_old;
- if (d_inode(ndentry))
- type = d_inode(ndentry)->i_mode & S_IFMT;
- host_err = -ENOTEMPTY;
- if (ndentry == trap)
- goto out_dput_new;
-
- if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
- nfsd_has_cached_files(ndentry)) {
- close_cached = true;
- goto out_dput_old;
+ if ((rd.new_dentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
+ nfsd_has_cached_files(rd.new_dentry)) {
+ close_cached = dget(rd.new_dentry);
+ goto out_unlock;
} else {
- struct renamedata rd = {
- .mnt_idmap = &nop_mnt_idmap,
- .old_parent = fdentry,
- .old_dentry = odentry,
- .new_parent = tdentry,
- .new_dentry = ndentry,
- };
int retries;
for (retries = 1;;) {
host_err = vfs_rename(&rd);
if (host_err != -EAGAIN || !retries--)
break;
- if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry)))
+ if (!nfsd_wait_for_delegreturn(rqstp, d_inode(rd.old_dentry)))
break;
}
if (!host_err) {
@@ -1983,11 +1966,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
host_err = commit_metadata(ffhp);
}
}
- out_dput_new:
- dput(ndentry);
- out_dput_old:
- dput(odentry);
- out_nfserr:
if (host_err == -EBUSY) {
/*
* See RFC 8881 Section 18.26.4 para 1-3: NFSv4 RENAME
@@ -2006,7 +1984,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
fh_fill_post_attrs(tfhp);
}
out_unlock:
- unlock_rename(tdentry, fdentry);
+ end_renaming(&rd);
out_want_write:
fh_drop_write(ffhp);
@@ -2017,9 +1995,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
* until this point and then reattempt the whole shebang.
*/
if (close_cached) {
- close_cached = false;
- nfsd_close_cached_files(ndentry);
- dput(ndentry);
+ nfsd_close_cached_files(close_cached);
+ dput(close_cached);
goto retry;
}
out:
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index c8d0885ee5e0..ded86855e91c 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -1124,9 +1124,7 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
int err;
struct dentry *old_upperdir;
struct dentry *new_upperdir;
- struct dentry *olddentry = NULL;
- struct dentry *newdentry = NULL;
- struct dentry *trap, *de;
+ struct renamedata rd = {};
bool old_opaque;
bool new_opaque;
bool cleanup_whiteout = false;
@@ -1233,29 +1231,21 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
}
}
- trap = lock_rename(new_upperdir, old_upperdir);
- if (IS_ERR(trap)) {
- err = PTR_ERR(trap);
- goto out_revert_creds;
- }
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = old_upperdir;
+ rd.new_parent = new_upperdir;
+ rd.flags = flags;
- de = ovl_lookup_upper(ofs, old->d_name.name, old_upperdir,
- old->d_name.len);
- err = PTR_ERR(de);
- if (IS_ERR(de))
- goto out_unlock;
- olddentry = de;
+ err = start_renaming(&rd, 0,
+ &QSTR_LEN(old->d_name.name, old->d_name.len),
+ &QSTR_LEN(new->d_name.name, new->d_name.len));
- err = -ESTALE;
- if (!ovl_matches_upper(old, olddentry))
- goto out_unlock;
+ if (err)
+ goto out_revert_creds;
- de = ovl_lookup_upper(ofs, new->d_name.name, new_upperdir,
- new->d_name.len);
- err = PTR_ERR(de);
- if (IS_ERR(de))
+ err = -ESTALE;
+ if (!ovl_matches_upper(old, rd.old_dentry))
goto out_unlock;
- newdentry = de;
old_opaque = ovl_dentry_is_opaque(old);
new_opaque = ovl_dentry_is_opaque(new);
@@ -1263,15 +1253,15 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
err = -ESTALE;
if (d_inode(new) && ovl_dentry_upper(new)) {
if (opaquedir) {
- if (newdentry != opaquedir)
+ if (rd.new_dentry != opaquedir)
goto out_unlock;
} else {
- if (!ovl_matches_upper(new, newdentry))
+ if (!ovl_matches_upper(new, rd.new_dentry))
goto out_unlock;
}
} else {
- if (!d_is_negative(newdentry)) {
- if (!new_opaque || !ovl_upper_is_whiteout(ofs, newdentry))
+ if (!d_is_negative(rd.new_dentry)) {
+ if (!new_opaque || !ovl_upper_is_whiteout(ofs, rd.new_dentry))
goto out_unlock;
} else {
if (flags & RENAME_EXCHANGE)
@@ -1279,19 +1269,14 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
}
}
- if (olddentry == trap)
- goto out_unlock;
- if (newdentry == trap)
- goto out_unlock;
-
- if (olddentry->d_inode == newdentry->d_inode)
+ if (rd.old_dentry->d_inode == rd.new_dentry->d_inode)
goto out_unlock;
err = 0;
if (ovl_type_merge_or_lower(old))
err = ovl_set_redirect(old, samedir);
else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
- err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
+ err = ovl_set_opaque_xerr(old, rd.old_dentry, -EXDEV);
if (err)
goto out_unlock;
@@ -1299,19 +1284,22 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
err = ovl_set_redirect(new, samedir);
else if (!overwrite && new_is_dir && !new_opaque &&
ovl_type_merge(old->d_parent))
- err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
+ err = ovl_set_opaque_xerr(new, rd.new_dentry, -EXDEV);
if (err)
goto out_unlock;
- err = ovl_do_rename(ofs, old_upperdir, olddentry,
- new_upperdir, newdentry, flags);
- unlock_rename(new_upperdir, old_upperdir);
+ err = ovl_do_rename_rd(&rd);
+
+ dget(rd.new_dentry);
+ end_renaming(&rd);
+
+ if (!err && cleanup_whiteout) {
+ ovl_cleanup(ofs, old_upperdir, rd.new_dentry);
+ }
+ dput(rd.new_dentry);
if (err)
goto out_revert_creds;
- if (cleanup_whiteout)
- ovl_cleanup(ofs, old_upperdir, newdentry);
-
if (overwrite && d_inode(new)) {
if (new_is_dir)
clear_nlink(d_inode(new));
@@ -1336,14 +1324,12 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
else
ovl_drop_write(old);
out:
- dput(newdentry);
- dput(olddentry);
dput(opaquedir);
ovl_cache_free(&list);
return err;
out_unlock:
- unlock_rename(new_upperdir, old_upperdir);
+ end_renaming(&rd);
goto out_revert_creds;
}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 49ad65f829dc..aecb527e0524 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -378,6 +378,20 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
return err;
}
+static inline int ovl_do_rename_rd(struct renamedata *rd)
+{
+ int err;
+
+ pr_debug("rename(%pd2, %pd2, 0x%x)\n", rd->old_dentry, rd->new_dentry,
+ rd->flags);
+ err = vfs_rename(rd);
+ if (err) {
+ pr_debug("...rename(%pd2, %pd2, ...) = %i\n",
+ rd->old_dentry, rd->new_dentry, err);
+ }
+ return err;
+}
+
static inline int ovl_do_whiteout(struct ovl_fs *ofs,
struct inode *dir, struct dentry *dentry)
{
diff --git a/include/linux/namei.h b/include/linux/namei.h
index e5cff89679df..19c3d8e336d5 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -156,6 +156,9 @@ extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
+int start_renaming(struct renamedata *rd, int lookup_flags,
+ struct qstr *old_last, struct qstr *new_last);
+void end_renaming(struct renamedata *rd);
/**
* mode_strip_umask - handle vfs umask stripping
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 10/14] VFS/ovl/smb: introduce start_renaming_dentry()
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (8 preceding siblings ...)
2025-10-15 1:47 ` [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming() NeilBrown
@ 2025-10-15 1:47 ` NeilBrown
2025-10-19 10:31 ` Amir Goldstein
2025-10-15 1:47 ` [PATCH v2 11/14] Add start_renaming_two_dentries() NeilBrown
` (4 subsequent siblings)
14 siblings, 1 reply; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:47 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
Several callers perform a rename on a dentry they already have, and only
require lookup for the target name. This includes smb/server and a few
different places in overlayfs.
start_renaming_dentry() performs the required lookup and takes the
required lock using lock_rename_child().
It is used in three places in overlayfs and in ksmbd_vfs_rename().
In the ksmbd case, the parent of the source is not important - the
source must be renamed from wherever it is. So start_renaming_dentry()
allows rd->old_parent to be NULL and only checks it if it is non-NULL.
On success rd->old_parent will be the parent of old_dentry with an extra
reference taken. The other start_renaming functions now also take this
extra reference, and end_renaming() now drops it as well.
ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are
all removed as they are no longer needed.
OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so
that ovl_check_rename_whiteout() can access them.
ovl_copy_up_workdir() now always cleans up on error.
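
The reference handling described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not
kernel code: every toy_* name is hypothetical, the "lock" is a plain
flag, the target dentry is passed in rather than looked up, and the
trap/ancestor checks are left out. It shows that a NULL old_parent is
filled in from the source dentry, that a mismatching explicit parent
fails with -EINVAL, and that the end function drops the extra parent
reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dentry; not the kernel type. */
struct toy_dentry {
	struct toy_dentry *parent;
	int refcount;
	bool locked;	/* models the directory's inode lock */
	bool unhashed;	/* true once the name has been removed */
};

/* Toy stand-in for struct renamedata. */
struct toy_renamedata {
	struct toy_dentry *old_parent;	/* may be NULL on entry */
	struct toy_dentry *old_dentry;
	struct toy_dentry *new_parent;
	struct toy_dentry *new_dentry;
};

static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	if (d)
		d->refcount++;
	return d;
}

static void toy_dput(struct toy_dentry *d)
{
	if (d)
		d->refcount--;
}

/* Set or clear the lock flag on both parents (they may be the same). */
static void toy_set_locked(struct toy_dentry *a, struct toy_dentry *b, bool v)
{
	a->locked = v;
	b->locked = v;
}

/*
 * Models the start step: lock first, then validate the source's parent.
 * @target stands in for the dentry the real lookup would return.
 */
static int toy_start_renaming_dentry(struct toy_renamedata *rd,
				     struct toy_dentry *old_dentry,
				     struct toy_dentry *target)
{
	struct toy_dentry *parent = old_dentry->parent;

	toy_set_locked(parent, rd->new_parent, true);
	if (old_dentry->unhashed ||
	    (rd->old_parent && rd->old_parent != parent)) {
		/* removed, or moved while an explicit parent was requested */
		toy_set_locked(parent, rd->new_parent, false);
		return -EINVAL;
	}
	rd->old_dentry = toy_dget(old_dentry);
	rd->new_dentry = toy_dget(target);
	rd->old_parent = toy_dget(parent);	/* the extra reference */
	return 0;
}

/* Models the end step: unlock, then drop all three references. */
static void toy_end_renaming(struct toy_renamedata *rd)
{
	toy_set_locked(rd->old_parent, rd->new_parent, false);
	toy_dput(rd->old_dentry);
	toy_dput(rd->new_dentry);
	toy_dput(rd->old_parent);
}
```

Holding the parent reference inside the rename data means the end step
always has a stable pointer to unlock, even if the caller never knew
the source's parent.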
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/namei.c | 108 ++++++++++++++++++++++++++++++++++++---
fs/overlayfs/copy_up.c | 54 +++++++++-----------
fs/overlayfs/dir.c | 19 +------
fs/overlayfs/overlayfs.h | 8 +--
fs/overlayfs/super.c | 22 ++++----
fs/overlayfs/util.c | 11 ----
fs/smb/server/vfs.c | 60 ++++------------------
include/linux/namei.h | 2 +
8 files changed, 150 insertions(+), 134 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index a2553df8f34e..4e694b82e309 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3669,7 +3669,7 @@ EXPORT_SYMBOL(unlock_rename);
/**
* __start_renaming - lookup and lock names for rename
- * @rd: rename data containing parent and flags, and
+ * @rd: rename data containing parents and flags, and
* for receiving found dentries
* @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
* LOOKUP_NO_SYMLINKS etc).
@@ -3680,8 +3680,8 @@ EXPORT_SYMBOL(unlock_rename);
* rename.
*
* On success the found dentrys are stored in @rd.old_dentry,
- * @rd.new_dentry. These references and the lock are dropped by
- * end_renaming().
+ * @rd.new_dentry and an extra ref is taken on @rd.old_parent.
+ * These references and the lock are dropped by end_renaming().
*
* The passed in qstrs must have the hash calculated, and no permission
* checking is performed.
@@ -3733,6 +3733,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
rd->old_dentry = d1;
rd->new_dentry = d2;
+ dget(rd->old_parent);
return 0;
out_unlock_3:
@@ -3748,7 +3749,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
/**
* start_renaming - lookup and lock names for rename with permission checking
- * @rd: rename data containing parent and flags, and
+ * @rd: rename data containing parents and flags, and
* for receiving found dentries
* @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
* LOOKUP_NO_SYMLINKS etc).
@@ -3759,8 +3760,8 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
* rename.
*
* On success the found dentrys are stored in @rd.old_dentry,
- * @rd.new_dentry. These references and the lock are dropped by
- * end_renaming().
+ * @rd.new_dentry. Also the refcount on @rd->old_parent is increased.
+ * These references and the lock are dropped by end_renaming().
*
* The passed in qstrs need not have the hash calculated, and basic
* eXecute permission checking is performed against @rd.mnt_idmap.
@@ -3782,11 +3783,106 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
}
EXPORT_SYMBOL(start_renaming);
+static int
+__start_renaming_dentry(struct renamedata *rd, int lookup_flags,
+ struct dentry *old_dentry, struct qstr *new_last)
+{
+ struct dentry *trap;
+ struct dentry *d2;
+ int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
+ int err;
+
+ if (rd->flags & RENAME_EXCHANGE)
+ target_flags = 0;
+ if (rd->flags & RENAME_NOREPLACE)
+ target_flags |= LOOKUP_EXCL;
+
+ /* Already have the dentry - need to be sure to lock the correct parent */
+ trap = lock_rename_child(old_dentry, rd->new_parent);
+ if (IS_ERR(trap))
+ return PTR_ERR(trap);
+ if (d_unhashed(old_dentry) ||
+ (rd->old_parent && rd->old_parent != old_dentry->d_parent)) {
+ /* dentry was removed, or moved and explicit parent requested */
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
+ d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
+ lookup_flags | target_flags);
+ err = PTR_ERR(d2);
+ if (IS_ERR(d2))
+ goto out_unlock;
+
+ if (old_dentry == trap) {
+ /* source is an ancestor of target */
+ err = -EINVAL;
+ goto out_dput_d2;
+ }
+
+ if (d2 == trap) {
+ /* target is an ancestor of source */
+ if (rd->flags & RENAME_EXCHANGE)
+ err = -EINVAL;
+ else
+ err = -ENOTEMPTY;
+ goto out_dput_d2;
+ }
+
+ rd->old_dentry = dget(old_dentry);
+ rd->new_dentry = d2;
+ rd->old_parent = dget(old_dentry->d_parent);
+ return 0;
+
+out_dput_d2:
+ dput(d2);
+out_unlock:
+ unlock_rename(old_dentry->d_parent, rd->new_parent);
+ return err;
+}
+
+/**
+ * start_renaming_dentry - lookup and lock name for rename with permission checking
+ * @rd: rename data containing parents and flags, and
+ * for receiving found dentries
+ * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
+ * LOOKUP_NO_SYMLINKS etc).
+ * @old_dentry: dentry of name to move
+ * @new_last: name of target in @rd.new_parent
+ *
+ * Look up target name and ensure locks are in place for
+ * rename.
+ *
+ * On success the found dentry is stored in @rd.new_dentry and
+ * @rd.old_parent is confirmed to be the parent of @old_dentry. If it
+ * was originally %NULL, it is set. In either case a reference is taken
+ * so that end_renaming() can have a stable reference to unlock.
+ *
+ * References and the lock can be dropped with end_renaming()
+ *
+ * The passed in qstr need not have the hash calculated, and basic
+ * eXecute permission checking is performed against @rd.mnt_idmap.
+ *
+ * Returns: zero or an error.
+ */
+int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
+ struct dentry *old_dentry, struct qstr *new_last)
+{
+ int err;
+
+ err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
+ if (err)
+ return err;
+ return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
+}
+EXPORT_SYMBOL(start_renaming_dentry);
+
void end_renaming(struct renamedata *rd)
{
unlock_rename(rd->old_parent, rd->new_parent);
dput(rd->old_dentry);
dput(rd->new_dentry);
+ dput(rd->old_parent);
}
EXPORT_SYMBOL(end_renaming);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 7a31ca9bdea2..27014ada11c7 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -523,8 +523,8 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
{
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
- struct dentry *index = NULL;
struct dentry *temp = NULL;
+ struct renamedata rd = {};
struct qstr name = { };
int err;
@@ -556,17 +556,15 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
if (err)
goto out;
- err = ovl_parent_lock(indexdir, temp);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = indexdir;
+ rd.new_parent = indexdir;
+ err = start_renaming_dentry(&rd, 0, temp, &name);
if (err)
goto out;
- index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
- if (IS_ERR(index)) {
- err = PTR_ERR(index);
- } else {
- err = ovl_do_rename(ofs, indexdir, temp, indexdir, index, 0);
- dput(index);
- }
- ovl_parent_unlock(indexdir);
+
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
out:
if (err)
ovl_cleanup(ofs, indexdir, temp);
@@ -763,7 +761,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
struct inode *inode;
struct path path = { .mnt = ovl_upper_mnt(ofs) };
- struct dentry *temp, *upper, *trap;
+ struct renamedata rd = {};
+ struct dentry *temp;
struct ovl_cu_creds cc;
int err;
struct ovl_cattr cattr = {
@@ -807,29 +806,24 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
* ovl_copy_up_data(), so lock workdir and destdir and make sure that
* temp wasn't moved before copy up completion or cleanup.
*/
- trap = lock_rename(c->workdir, c->destdir);
- if (trap || temp->d_parent != c->workdir) {
- /* temp or workdir moved underneath us? abort without cleanup */
- dput(temp);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = c->workdir;
+ rd.new_parent = c->destdir;
+ rd.flags = 0;
+ err = start_renaming_dentry(&rd, 0, temp,
+ &QSTR_LEN(c->destname.name, c->destname.len));
+ if (err) {
+ /* temp or workdir moved underneath us? map to -EIO */
err = -EIO;
- if (!IS_ERR(trap))
- unlock_rename(c->workdir, c->destdir);
- goto out;
}
-
- err = ovl_copy_up_metadata(c, temp);
if (err)
- goto cleanup;
+ goto cleanup_unlocked;
- upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
- c->destname.len);
- err = PTR_ERR(upper);
- if (IS_ERR(upper))
- goto cleanup;
+ err = ovl_copy_up_metadata(c, temp);
+ if (!err)
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
- err = ovl_do_rename(ofs, c->workdir, temp, c->destdir, upper, 0);
- unlock_rename(c->workdir, c->destdir);
- dput(upper);
if (err)
goto cleanup_unlocked;
@@ -850,8 +844,6 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
return err;
-cleanup:
- unlock_rename(c->workdir, c->destdir);
cleanup_unlocked:
ovl_cleanup(ofs, c->workdir, temp);
dput(temp);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index ded86855e91c..6367cebdbd48 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -57,8 +57,7 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
return 0;
}
-#define OVL_TEMPNAME_SIZE 20
-static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
+void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
{
static atomic_t temp_id = ATOMIC_INIT(0);
@@ -66,22 +65,6 @@ static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
}
-struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
-{
- struct dentry *temp;
- char name[OVL_TEMPNAME_SIZE];
-
- ovl_tempname(name);
- temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
- if (!IS_ERR(temp) && temp->d_inode) {
- pr_err("workdir/%s already exists\n", name);
- dput(temp);
- temp = ERR_PTR(-EIO);
- }
-
- return temp;
-}
-
static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
struct dentry *workdir)
{
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index aecb527e0524..a9ecab16dba6 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -454,11 +454,6 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
}
/* util.c */
-int ovl_parent_lock(struct dentry *parent, struct dentry *child);
-static inline void ovl_parent_unlock(struct dentry *parent)
-{
- inode_unlock(parent->d_inode);
-}
int ovl_get_write_access(struct dentry *dentry);
void ovl_put_write_access(struct dentry *dentry);
void ovl_start_write(struct dentry *dentry);
@@ -895,7 +890,8 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs,
struct dentry *parent, struct dentry *newdentry,
struct ovl_cattr *attr);
int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir, struct dentry *dentry);
-struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir);
+#define OVL_TEMPNAME_SIZE 20
+void ovl_tempname(char name[OVL_TEMPNAME_SIZE]);
struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
struct ovl_cattr *attr);
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 6e0816c1147a..a721ef2b90e8 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -566,9 +566,10 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
{
struct dentry *workdir = ofs->workdir;
struct dentry *temp;
- struct dentry *dest;
struct dentry *whiteout;
struct name_snapshot name;
+ struct renamedata rd = {};
+ char name2[OVL_TEMPNAME_SIZE];
int err;
temp = ovl_create_temp(ofs, workdir, OVL_CATTR(S_IFREG | 0));
@@ -576,23 +577,21 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
if (IS_ERR(temp))
return err;
- err = ovl_parent_lock(workdir, temp);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = workdir;
+ rd.new_parent = workdir;
+ rd.flags = RENAME_WHITEOUT;
+ ovl_tempname(name2);
+ err = start_renaming_dentry(&rd, 0, temp, &QSTR(name2));
if (err) {
dput(temp);
return err;
}
- dest = ovl_lookup_temp(ofs, workdir);
- err = PTR_ERR(dest);
- if (IS_ERR(dest)) {
- dput(temp);
- ovl_parent_unlock(workdir);
- return err;
- }
/* Name is inline and stable - using snapshot as a copy helper */
take_dentry_name_snapshot(&name, temp);
- err = ovl_do_rename(ofs, workdir, temp, workdir, dest, RENAME_WHITEOUT);
- ovl_parent_unlock(workdir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err) {
if (err == -EINVAL)
err = 0;
@@ -616,7 +615,6 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
ovl_cleanup(ofs, workdir, temp);
release_dentry_name_snapshot(&name);
dput(temp);
- dput(dest);
return err;
}
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index f76672f2e686..46387aeb6be6 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -1548,14 +1548,3 @@ void ovl_copyattr(struct inode *inode)
i_size_write(inode, i_size_read(realinode));
spin_unlock(&inode->i_lock);
}
-
-int ovl_parent_lock(struct dentry *parent, struct dentry *child)
-{
- inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
- if (!child ||
- (!d_unhashed(child) && child->d_parent == parent))
- return 0;
-
- inode_unlock(parent->d_inode);
- return -EINVAL;
-}
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index 7c4ddc43ab39..f54b5b0aaba2 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -663,7 +663,6 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname,
int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
char *newname, int flags)
{
- struct dentry *old_parent, *new_dentry, *trap;
struct dentry *old_child = old_path->dentry;
struct path new_path;
struct qstr new_last;
@@ -673,7 +672,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
struct ksmbd_file *parent_fp;
int new_type;
int err, lookup_flags = LOOKUP_NO_SYMLINKS;
- int target_lookup_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
if (ksmbd_override_fsids(work))
return -ENOMEM;
@@ -684,14 +682,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
goto revert_fsids;
}
- /*
- * explicitly handle file overwrite case, for compatibility with
- * filesystems that may not support rename flags (e.g: fuse)
- */
- if (flags & RENAME_NOREPLACE)
- target_lookup_flags |= LOOKUP_EXCL;
- flags &= ~(RENAME_NOREPLACE);
-
retry:
err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH,
&new_path, &new_last, &new_type,
@@ -708,17 +698,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
if (err)
goto out2;
- trap = lock_rename_child(old_child, new_path.dentry);
- if (IS_ERR(trap)) {
- err = PTR_ERR(trap);
+ rd.mnt_idmap = mnt_idmap(old_path->mnt);
+ rd.old_parent = NULL;
+ rd.new_parent = new_path.dentry;
+ rd.flags = flags;
+ rd.delegated_inode = NULL;
+ err = start_renaming_dentry(&rd, lookup_flags, old_child, &new_last);
+ if (err)
goto out_drop_write;
- }
-
- old_parent = dget(old_child->d_parent);
- if (d_unhashed(old_child)) {
- err = -EINVAL;
- goto out3;
- }
parent_fp = ksmbd_lookup_fd_inode(old_child->d_parent);
if (parent_fp) {
@@ -731,44 +718,17 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
ksmbd_fd_put(work, parent_fp);
}
- new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
- lookup_flags | target_lookup_flags);
- if (IS_ERR(new_dentry)) {
- err = PTR_ERR(new_dentry);
- goto out3;
- }
-
- if (d_is_symlink(new_dentry)) {
+ if (d_is_symlink(rd.new_dentry)) {
err = -EACCES;
- goto out4;
- }
-
- if (old_child == trap) {
- err = -EINVAL;
- goto out4;
- }
-
- if (new_dentry == trap) {
- err = -ENOTEMPTY;
- goto out4;
+ goto out3;
}
- rd.mnt_idmap = mnt_idmap(old_path->mnt),
- rd.old_parent = old_parent,
- rd.old_dentry = old_child,
- rd.new_parent = new_path.dentry,
- rd.new_dentry = new_dentry,
- rd.flags = flags,
- rd.delegated_inode = NULL,
err = vfs_rename(&rd);
if (err)
ksmbd_debug(VFS, "vfs_rename failed err %d\n", err);
-out4:
- dput(new_dentry);
out3:
- dput(old_parent);
- unlock_rename(old_parent, new_path.dentry);
+ end_renaming(&rd);
out_drop_write:
mnt_drop_write(old_path->mnt);
out2:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 19c3d8e336d5..f73001e3719a 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -158,6 +158,8 @@ extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
int start_renaming(struct renamedata *rd, int lookup_flags,
struct qstr *old_last, struct qstr *new_last);
+int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
+ struct dentry *old_dentry, struct qstr *new_last);
void end_renaming(struct renamedata *rd);
/**
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 11/14] Add start_renaming_two_dentries()
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (9 preceding siblings ...)
2025-10-15 1:47 ` [PATCH v2 10/14] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
@ 2025-10-15 1:47 ` NeilBrown
2025-10-15 1:47 ` [PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs NeilBrown
` (3 subsequent siblings)
14 siblings, 0 replies; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:47 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
A few callers want to lock for a rename and already have both dentries.
Also debugfs does want to perform a lookup but doesn't want permission
checking, so start_renaming_dentry() cannot be used.
This patch introduces start_renaming_two_dentries() which is given both
dentries. debugfs performs one lookup itself. As it will only continue
with a negative dentry and as those cannot be renamed or unlinked, it is
safe to do the lookup before getting the rename locks.
overlayfs uses start_renaming_two_dentries() in three places and selinux
uses it twice in sel_make_policy_nodes().
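
As a rough illustration of the checks applied once both dentries are
already in hand, here is a userspace sketch. It is an assumption-laden
model rather than the kernel implementation: the toy_* names and
TOY_RENAME_NOREPLACE are hypothetical, locking and the trap/ancestor
checks are omitted, and only the parentage and no-replace checks are
modelled. The second function mirrors the debugfs caller, which treats
renaming a name to itself as success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RENAME_NOREPLACE 1u	/* hypothetical flag value */

/* Toy stand-in for struct dentry; not the kernel type. */
struct toy_dentry {
	struct toy_dentry *parent;
	bool positive;	/* true if the name refers to an existing object */
	bool unhashed;	/* true once the name has been removed */
};

/* Validate both dentries against the parents named by the caller. */
static int toy_check_two_dentries(const struct toy_dentry *old_parent,
				  const struct toy_dentry *new_parent,
				  const struct toy_dentry *old_dentry,
				  const struct toy_dentry *new_dentry,
				  unsigned int flags)
{
	/* source was removed, or moved while an explicit parent was given */
	if (old_dentry->unhashed ||
	    (old_parent && old_parent != old_dentry->parent))
		return -EINVAL;
	/* target was removed or moved */
	if (new_dentry->unhashed || new_parent != new_dentry->parent)
		return -EINVAL;
	/* an existing target is refused when replacement is not allowed */
	if (new_dentry->positive && (flags & TOY_RENAME_NOREPLACE))
		return -EEXIST;
	return 0;
}

/* debugfs-style caller: same directory, never replace an existing name. */
static int toy_change_name(const struct toy_dentry *dir,
			   const struct toy_dentry *dentry,
			   const struct toy_dentry *target)
{
	int err = toy_check_two_dentries(dir, dir, dentry, target,
					 TOY_RENAME_NOREPLACE);

	if (err == -EEXIST && target == dentry)
		return 0;	/* renaming a thing to itself is not an error */
	return err;
}
```

Doing the lookup before taking the locks is only reasonable here
because the caller proceeds solely with a negative target, and the
checks above are repeated after locking to catch anything that changed
in between.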
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/debugfs/inode.c | 48 ++++++++++++--------------
fs/namei.c | 65 ++++++++++++++++++++++++++++++++++++
fs/overlayfs/dir.c | 42 +++++++++++++++--------
include/linux/namei.h | 2 ++
security/selinux/selinuxfs.c | 27 ++++++++++-----
5 files changed, 135 insertions(+), 49 deletions(-)
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index f241b9df642a..532bd7c46baf 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -842,7 +842,8 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
int error = 0;
const char *new_name;
struct name_snapshot old_name;
- struct dentry *parent, *target;
+ struct dentry *target;
+ struct renamedata rd = {};
struct inode *dir;
va_list ap;
@@ -855,36 +856,31 @@ int __printf(2, 3) debugfs_change_name(struct dentry *dentry, const char *fmt, .
if (!new_name)
return -ENOMEM;
- parent = dget_parent(dentry);
- dir = d_inode(parent);
- inode_lock(dir);
+ rd.old_parent = dget_parent(dentry);
+ rd.new_parent = rd.old_parent;
+ rd.flags = RENAME_NOREPLACE;
+ target = lookup_noperm_unlocked(&QSTR(new_name), rd.new_parent);
+ if (IS_ERR(target))
+ return PTR_ERR(target);
- take_dentry_name_snapshot(&old_name, dentry);
-
- if (WARN_ON_ONCE(dentry->d_parent != parent)) {
- error = -EINVAL;
- goto out;
- }
- if (strcmp(old_name.name.name, new_name) == 0)
- goto out;
- target = lookup_noperm(&QSTR(new_name), parent);
- if (IS_ERR(target)) {
- error = PTR_ERR(target);
- goto out;
- }
- if (d_really_is_positive(target)) {
- dput(target);
- error = -EINVAL;
+ error = start_renaming_two_dentries(&rd, dentry, target);
+ if (error) {
+ if (error == -EEXIST && target == dentry)
+ /* it isn't an error to rename a thing to itself */
+ error = 0;
goto out;
}
- simple_rename_timestamp(dir, dentry, dir, target);
- d_move(dentry, target);
- dput(target);
+
+ dir = d_inode(rd.old_parent);
+ take_dentry_name_snapshot(&old_name, dentry);
+ simple_rename_timestamp(dir, dentry, dir, rd.new_dentry);
+ d_move(dentry, rd.new_dentry);
fsnotify_move(dir, dir, &old_name.name, d_is_dir(dentry), NULL, dentry);
-out:
release_dentry_name_snapshot(&old_name);
- inode_unlock(dir);
- dput(parent);
+ end_renaming(&rd);
+out:
+ dput(rd.old_parent);
+ dput(target);
kfree_const(new_name);
return error;
}
diff --git a/fs/namei.c b/fs/namei.c
index 4e694b82e309..0a5261640ae5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3877,6 +3877,71 @@ int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
}
EXPORT_SYMBOL(start_renaming_dentry);
+/**
+ * start_renaming_two_dentries - Lock two dentries in given parents for rename
+ * @rd: rename data containing parent
+ * @old_dentry: dentry of name to move
+ * @new_dentry: dentry to move to
+ *
+ * Ensure locks are in place for rename and check parentage is still correct.
+ *
+ * On success the two dentries are stored in @rd.old_dentry and
+ * @rd.new_dentry and @rd.old_parent and @rd.new_parent are confirmed to
+ * be the parents of the dentries.
+ *
+ * References and the lock can be dropped with end_renaming()
+ *
+ * Returns: zero or an error.
+ */
+int
+start_renaming_two_dentries(struct renamedata *rd,
+ struct dentry *old_dentry, struct dentry *new_dentry)
+{
+ struct dentry *trap;
+ int err;
+
+ /* Already have the dentry - need to be sure to lock the correct parent */
+ trap = lock_rename_child(old_dentry, rd->new_parent);
+ if (IS_ERR(trap))
+ return PTR_ERR(trap);
+ err = -EINVAL;
+ if (d_unhashed(old_dentry) ||
+ (rd->old_parent && rd->old_parent != old_dentry->d_parent))
+ /* old_dentry was removed, or moved and explicit parent requested */
+ goto out_unlock;
+ if (d_unhashed(new_dentry) ||
+ rd->new_parent != new_dentry->d_parent)
+ /* new_dentry was removed or moved */
+ goto out_unlock;
+
+ if (old_dentry == trap)
+ /* source is an ancestor of target */
+ goto out_unlock;
+
+ if (new_dentry == trap) {
+ /* target is an ancestor of source */
+ if (rd->flags & RENAME_EXCHANGE)
+ err = -EINVAL;
+ else
+ err = -ENOTEMPTY;
+ goto out_unlock;
+ }
+
+ err = -EEXIST;
+ if (d_is_positive(new_dentry) && (rd->flags & RENAME_NOREPLACE))
+ goto out_unlock;
+
+ rd->old_dentry = dget(old_dentry);
+ rd->new_dentry = dget(new_dentry);
+ rd->old_parent = dget(old_dentry->d_parent);
+ return 0;
+
+out_unlock:
+ unlock_rename(old_dentry->d_parent, rd->new_parent);
+ return err;
+}
+EXPORT_SYMBOL(start_renaming_two_dentries);
+
void end_renaming(struct renamedata *rd)
{
unlock_rename(rd->old_parent, rd->new_parent);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 6367cebdbd48..dfd1d4b48948 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -123,6 +123,7 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
struct dentry *dentry)
{
struct dentry *whiteout;
+ struct renamedata rd = {};
int err;
int flags = 0;
@@ -134,10 +135,13 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct dentry *dir,
if (d_is_dir(dentry))
flags = RENAME_EXCHANGE;
- err = ovl_lock_rename_workdir(ofs->workdir, whiteout, dir, dentry);
+ rd.old_parent = ofs->workdir;
+ rd.new_parent = dir;
+ rd.flags = flags;
+ err = start_renaming_two_dentries(&rd, whiteout, dentry);
if (!err) {
- err = ovl_do_rename(ofs, ofs->workdir, whiteout, dir, dentry, flags);
- unlock_rename(ofs->workdir, dir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
}
if (err)
goto kill_whiteout;
@@ -388,6 +392,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *workdir = ovl_workdir(dentry);
struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
+ struct renamedata rd = {};
struct path upperpath;
struct dentry *upper;
struct dentry *opaquedir;
@@ -413,7 +418,11 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
if (IS_ERR(opaquedir))
goto out;
- err = ovl_lock_rename_workdir(workdir, opaquedir, upperdir, upper);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = workdir;
+ rd.new_parent = upperdir;
+ rd.flags = RENAME_EXCHANGE;
+ err = start_renaming_two_dentries(&rd, opaquedir, upper);
if (err)
goto out_cleanup_unlocked;
@@ -431,8 +440,8 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
if (err)
goto out_cleanup;
- err = ovl_do_rename(ofs, workdir, opaquedir, upperdir, upper, RENAME_EXCHANGE);
- unlock_rename(workdir, upperdir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err)
goto out_cleanup_unlocked;
@@ -445,7 +454,7 @@ static struct dentry *ovl_clear_empty(struct dentry *dentry,
return opaquedir;
out_cleanup:
- unlock_rename(workdir, upperdir);
+ end_renaming(&rd);
out_cleanup_unlocked:
ovl_cleanup(ofs, workdir, opaquedir);
dput(opaquedir);
@@ -468,6 +477,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
struct dentry *workdir = ovl_workdir(dentry);
struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
+ struct renamedata rd = {};
struct dentry *upper;
struct dentry *newdentry;
int err;
@@ -499,7 +509,11 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
if (IS_ERR(newdentry))
goto out_dput;
- err = ovl_lock_rename_workdir(workdir, newdentry, upperdir, upper);
+ rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
+ rd.old_parent = workdir;
+ rd.new_parent = upperdir;
+ rd.flags = 0;
+ err = start_renaming_two_dentries(&rd, newdentry, upper);
if (err)
goto out_cleanup_unlocked;
@@ -536,16 +550,16 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
if (err)
goto out_cleanup;
- err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper,
- RENAME_EXCHANGE);
- unlock_rename(workdir, upperdir);
+ rd.flags = RENAME_EXCHANGE;
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err)
goto out_cleanup_unlocked;
ovl_cleanup(ofs, workdir, upper);
} else {
- err = ovl_do_rename(ofs, workdir, newdentry, upperdir, upper, 0);
- unlock_rename(workdir, upperdir);
+ err = ovl_do_rename_rd(&rd);
+ end_renaming(&rd);
if (err)
goto out_cleanup_unlocked;
}
@@ -565,7 +579,7 @@ static int ovl_create_over_whiteout(struct dentry *dentry, struct inode *inode,
return err;
out_cleanup:
- unlock_rename(workdir, upperdir);
+ end_renaming(&rd);
out_cleanup_unlocked:
ovl_cleanup(ofs, workdir, newdentry);
dput(newdentry);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index f73001e3719a..a99ac8b7e24a 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -160,6 +160,8 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
struct qstr *old_last, struct qstr *new_last);
int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
struct dentry *old_dentry, struct qstr *new_last);
+int start_renaming_two_dentries(struct renamedata *rd,
+ struct dentry *old_dentry, struct dentry *new_dentry);
void end_renaming(struct renamedata *rd);
/**
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 232e087bce3e..a224ef9bb831 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -506,6 +506,7 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
{
int ret = 0;
struct dentry *tmp_parent, *tmp_bool_dir, *tmp_class_dir;
+ struct renamedata rd = {};
unsigned int bool_num = 0;
char **bool_names = NULL;
int *bool_values = NULL;
@@ -539,22 +540,30 @@ static int sel_make_policy_nodes(struct selinux_fs_info *fsi,
if (ret)
goto out;
- lock_rename(tmp_parent, fsi->sb->s_root);
+ rd.old_parent = tmp_parent;
+ rd.new_parent = fsi->sb->s_root;
/* booleans */
- d_exchange(tmp_bool_dir, fsi->bool_dir);
+ ret = start_renaming_two_dentries(&rd, tmp_bool_dir, fsi->bool_dir);
+ if (!ret) {
+ d_exchange(tmp_bool_dir, fsi->bool_dir);
- swap(fsi->bool_num, bool_num);
- swap(fsi->bool_pending_names, bool_names);
- swap(fsi->bool_pending_values, bool_values);
+ swap(fsi->bool_num, bool_num);
+ swap(fsi->bool_pending_names, bool_names);
+ swap(fsi->bool_pending_values, bool_values);
- fsi->bool_dir = tmp_bool_dir;
+ fsi->bool_dir = tmp_bool_dir;
+ end_renaming(&rd);
+ }
/* classes */
- d_exchange(tmp_class_dir, fsi->class_dir);
- fsi->class_dir = tmp_class_dir;
+ ret = start_renaming_two_dentries(&rd, tmp_class_dir, fsi->class_dir);
+ if (ret == 0) {
+ d_exchange(tmp_class_dir, fsi->class_dir);
+ fsi->class_dir = tmp_class_dir;
- unlock_rename(tmp_parent, fsi->sb->s_root);
+ end_renaming(&rd);
+ }
out:
sel_remove_old_bool_data(bool_num, bool_names, bool_values);
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (10 preceding siblings ...)
2025-10-15 1:47 ` [PATCH v2 11/14] Add start_renaming_two_dentries() NeilBrown
@ 2025-10-15 1:47 ` NeilBrown
2025-10-19 10:38 ` Amir Goldstein
2025-10-15 1:47 ` [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure NeilBrown
` (2 subsequent siblings)
14 siblings, 1 reply; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:47 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
This requires the addition of start_creating_dentry() which is given the
dentry which has already been found, and asks for it to be locked and
its parent validated.
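
The checks that start_creating_dentry() performs can be modelled in
userspace roughly as below. This is only a sketch under stated
assumptions: struct toy_dentry and the toy_* helpers are hypothetical
stand-ins invented for illustration, a boolean replaces the parent's
inode lock, and an errno return replaces the kernel's ERR_PTR()
convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry; hypothetical, not the kernel structure. */
struct toy_dentry {
	struct toy_dentry *parent;	/* models d_parent */
	bool hashed;			/* models !d_unhashed() */
	bool positive;			/* models d_is_positive() */
	bool dead;			/* models IS_DEADDIR() on a directory */
	bool locked;			/* models the directory's inode lock */
	int refcount;			/* models the dentry reference count */
};

/*
 * Lock @parent, then check that @child is still a hashed, negative
 * child of @parent.  On success the parent stays locked, a reference
 * is taken on @child and 0 is returned.  On failure the parent is
 * unlocked again and a negative errno is returned.
 */
static int toy_start_creating_dentry(struct toy_dentry *parent,
				     struct toy_dentry *child)
{
	assert(!parent->locked);
	parent->locked = true;
	if (parent->dead || child->parent != parent || !child->hashed) {
		parent->locked = false;
		return -EINVAL;
	}
	if (child->positive) {
		parent->locked = false;
		return -EEXIST;
	}
	child->refcount++;
	return 0;
}

/* Drop the reference and unlock the parent, as end_creating() would. */
static void toy_end_creating(struct toy_dentry *child)
{
	assert(child->parent->locked);
	child->refcount--;
	child->parent->locked = false;
}
```

In this model a successful toy_start_creating_dentry() must always be
paired with toy_end_creating(), while a failed call leaves nothing to
undo.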
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/ecryptfs/inode.c | 153 ++++++++++++++++++++----------------------
fs/namei.c | 33 +++++++++
include/linux/namei.h | 2 +
3 files changed, 107 insertions(+), 81 deletions(-)
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index ed1394da8d6b..b3702105d236 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -24,18 +24,26 @@
#include <linux/unaligned.h>
#include "ecryptfs_kernel.h"
-static int lock_parent(struct dentry *dentry,
- struct dentry **lower_dentry,
- struct inode **lower_dir)
+static struct dentry *ecryptfs_start_creating_dentry(struct dentry *dentry)
{
- struct dentry *lower_dir_dentry;
+ struct dentry *parent = dget_parent(dentry);
+ struct dentry *ret;
- lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent);
- *lower_dir = d_inode(lower_dir_dentry);
- *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+ ret = start_creating_dentry(ecryptfs_dentry_to_lower(parent),
+ ecryptfs_dentry_to_lower(dentry));
+ dput(parent);
+ return ret;
+}
- inode_lock_nested(*lower_dir, I_MUTEX_PARENT);
- return (*lower_dentry)->d_parent == lower_dir_dentry ? 0 : -EINVAL;
+static struct dentry *ecryptfs_start_removing_dentry(struct dentry *dentry)
+{
+ struct dentry *parent = dget_parent(dentry);
+ struct dentry *ret;
+
+ ret = start_removing_dentry(ecryptfs_dentry_to_lower(parent),
+ ecryptfs_dentry_to_lower(dentry));
+ dput(parent);
+ return ret;
}
static int ecryptfs_inode_test(struct inode *inode, void *lower_inode)
@@ -141,15 +149,12 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
struct inode *lower_dir;
int rc;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- dget(lower_dentry); // don't even try to make the lower negative
- if (!rc) {
- if (d_unhashed(lower_dentry))
- rc = -EINVAL;
- else
- rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry,
- NULL);
- }
+ lower_dentry = ecryptfs_start_removing_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+
+ lower_dir = lower_dentry->d_parent->d_inode;
+ rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry, NULL);
if (rc) {
printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
goto out_unlock;
@@ -158,8 +163,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
set_nlink(inode, ecryptfs_inode_to_lower(inode)->i_nlink);
inode_set_ctime_to_ts(inode, inode_get_ctime(dir));
out_unlock:
- dput(lower_dentry);
- inode_unlock(lower_dir);
+ end_removing(lower_dentry);
if (!rc)
d_drop(dentry);
return rc;
@@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
struct inode *lower_dir;
struct inode *inode;
- rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
- if (!rc)
- rc = vfs_create(&nop_mnt_idmap, lower_dir,
- lower_dentry, mode, true);
+ lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
+ if (IS_ERR(lower_dentry))
+ return ERR_CAST(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+ rc = vfs_create(&nop_mnt_idmap, lower_dir,
+ lower_dentry, mode, true);
if (rc) {
printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
"rc = [%d]\n", __func__, rc);
@@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
fsstack_copy_attr_times(directory_inode, lower_dir);
fsstack_copy_inode_size(directory_inode, lower_dir);
out_lock:
- inode_unlock(lower_dir);
+ end_creating(lower_dentry, NULL);
return inode;
}
@@ -433,10 +439,12 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
file_size_save = i_size_read(d_inode(old_dentry));
lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
- rc = lock_parent(new_dentry, &lower_new_dentry, &lower_dir);
- if (!rc)
- rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
- lower_new_dentry, NULL);
+ lower_new_dentry = ecryptfs_start_creating_dentry(new_dentry);
+ if (IS_ERR(lower_new_dentry))
+ return PTR_ERR(lower_new_dentry);
+ lower_dir = lower_new_dentry->d_parent->d_inode;
+ rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
+ lower_new_dentry, NULL);
if (rc || d_really_is_negative(lower_new_dentry))
goto out_lock;
rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
@@ -448,7 +456,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
ecryptfs_inode_to_lower(d_inode(old_dentry))->i_nlink);
i_size_write(d_inode(new_dentry), file_size_save);
out_lock:
- inode_unlock(lower_dir);
+ end_creating(lower_new_dentry, NULL);
return rc;
}
@@ -468,9 +476,11 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
size_t encoded_symlen;
struct ecryptfs_mount_crypt_stat *mount_crypt_stat = NULL;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- if (rc)
- goto out_lock;
+ lower_dentry = ecryptfs_start_creating_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+
mount_crypt_stat = &ecryptfs_superblock_to_private(
dir->i_sb)->mount_crypt_stat;
rc = ecryptfs_encrypt_and_encode_filename(&encoded_symname,
@@ -490,7 +500,7 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
fsstack_copy_attr_times(dir, lower_dir);
fsstack_copy_inode_size(dir, lower_dir);
out_lock:
- inode_unlock(lower_dir);
+ end_creating(lower_dentry, NULL);
if (d_really_is_negative(dentry))
d_drop(dentry);
return rc;
@@ -501,12 +511,14 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
{
int rc;
struct dentry *lower_dentry;
+ struct dentry *lower_dir_dentry;
struct inode *lower_dir;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- if (rc)
- goto out;
-
+ lower_dentry = ecryptfs_start_creating_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return lower_dentry;
+ lower_dir_dentry = dget(lower_dentry->d_parent);
+ lower_dir = lower_dir_dentry->d_inode;
lower_dentry = vfs_mkdir(&nop_mnt_idmap, lower_dir,
lower_dentry, mode);
rc = PTR_ERR(lower_dentry);
@@ -522,7 +534,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
fsstack_copy_inode_size(dir, lower_dir);
set_nlink(dir, lower_dir->i_nlink);
out:
- inode_unlock(lower_dir);
+ end_creating(lower_dentry, lower_dir_dentry);
if (d_really_is_negative(dentry))
d_drop(dentry);
return ERR_PTR(rc);
@@ -534,21 +546,18 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
struct inode *lower_dir;
int rc;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- dget(lower_dentry); // don't even try to make the lower negative
- if (!rc) {
- if (d_unhashed(lower_dentry))
- rc = -EINVAL;
- else
- rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
- }
+ lower_dentry = ecryptfs_start_removing_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+
+ rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
if (!rc) {
clear_nlink(d_inode(dentry));
fsstack_copy_attr_times(dir, lower_dir);
set_nlink(dir, lower_dir->i_nlink);
}
- dput(lower_dentry);
- inode_unlock(lower_dir);
+ end_removing(lower_dentry);
if (!rc)
d_drop(dentry);
return rc;
@@ -562,10 +571,12 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *lower_dentry;
struct inode *lower_dir;
- rc = lock_parent(dentry, &lower_dentry, &lower_dir);
- if (!rc)
- rc = vfs_mknod(&nop_mnt_idmap, lower_dir,
- lower_dentry, mode, dev);
+ lower_dentry = ecryptfs_start_creating_dentry(dentry);
+ if (IS_ERR(lower_dentry))
+ return PTR_ERR(lower_dentry);
+ lower_dir = lower_dentry->d_parent->d_inode;
+
+ rc = vfs_mknod(&nop_mnt_idmap, lower_dir, lower_dentry, mode, dev);
if (rc || d_really_is_negative(lower_dentry))
goto out;
rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb);
@@ -574,7 +585,7 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
fsstack_copy_attr_times(dir, lower_dir);
fsstack_copy_inode_size(dir, lower_dir);
out:
- inode_unlock(lower_dir);
+ end_removing(lower_dentry);
if (d_really_is_negative(dentry))
d_drop(dentry);
return rc;
@@ -590,7 +601,6 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
struct dentry *lower_new_dentry;
struct dentry *lower_old_dir_dentry;
struct dentry *lower_new_dir_dentry;
- struct dentry *trap;
struct inode *target_inode;
struct renamedata rd = {};
@@ -605,31 +615,13 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
target_inode = d_inode(new_dentry);
- trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
- if (IS_ERR(trap))
- return PTR_ERR(trap);
- dget(lower_new_dentry);
- rc = -EINVAL;
- if (lower_old_dentry->d_parent != lower_old_dir_dentry)
- goto out_lock;
- if (lower_new_dentry->d_parent != lower_new_dir_dentry)
- goto out_lock;
- if (d_unhashed(lower_old_dentry) || d_unhashed(lower_new_dentry))
- goto out_lock;
- /* source should not be ancestor of target */
- if (trap == lower_old_dentry)
- goto out_lock;
- /* target should not be ancestor of source */
- if (trap == lower_new_dentry) {
- rc = -ENOTEMPTY;
- goto out_lock;
- }
+ rd.mnt_idmap = &nop_mnt_idmap;
+ rd.old_parent = lower_old_dir_dentry;
+ rd.new_parent = lower_new_dir_dentry;
+ rc = start_renaming_two_dentries(&rd, lower_old_dentry, lower_new_dentry);
+ if (rc)
+ return rc;
- rd.mnt_idmap = &nop_mnt_idmap;
- rd.old_parent = lower_old_dir_dentry;
- rd.old_dentry = lower_old_dentry;
- rd.new_parent = lower_new_dir_dentry;
- rd.new_dentry = lower_new_dentry;
rc = vfs_rename(&rd);
if (rc)
goto out_lock;
@@ -640,8 +632,7 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
if (new_dir != old_dir)
fsstack_copy_attr_all(old_dir, d_inode(lower_old_dir_dentry));
out_lock:
- dput(lower_new_dentry);
- unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+ end_renaming(&rd);
return rc;
}
diff --git a/fs/namei.c b/fs/namei.c
index 0a5261640ae5..91e484dbc239 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3397,6 +3397,39 @@ struct dentry *start_removing_noperm(struct dentry *parent,
}
EXPORT_SYMBOL(start_removing_noperm);
+/**
+ * start_creating_dentry - prepare to create a given dentry
+ * @parent: directory in which dentry should be created
+ * @child: the dentry to be created
+ *
+ * A lock is taken to protect the dentry against other dirops and
+ * the validity of the dentry is checked: correct parent and still hashed.
+ *
+ * If the dentry is valid and negative a reference is taken and
+ * returned. If not an error is returned.
+ *
+ * end_creating() should be called when creation is complete, or aborted.
+ *
+ * Returns: the valid dentry, or an error.
+ */
+struct dentry *start_creating_dentry(struct dentry *parent,
+ struct dentry *child)
+{
+ inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
+ if (unlikely(IS_DEADDIR(parent->d_inode) ||
+ child->d_parent != parent ||
+ d_unhashed(child))) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-EINVAL);
+ }
+ if (d_is_positive(child)) {
+ inode_unlock(parent->d_inode);
+ return ERR_PTR(-EEXIST);
+ }
+ return dget(child);
+}
+EXPORT_SYMBOL(start_creating_dentry);
+
/**
* start_removing_dentry - prepare to remove a given dentry
* @parent: directory from which dentry should be removed
diff --git a/include/linux/namei.h b/include/linux/namei.h
index a99ac8b7e24a..208aed1d6728 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -100,6 +100,8 @@ struct dentry *start_removing_killable(struct mnt_idmap *idmap,
struct qstr *name);
struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
+struct dentry *start_creating_dentry(struct dentry *parent,
+ struct dentry *child);
struct dentry *start_removing_dentry(struct dentry *parent,
struct dentry *child);
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure.
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (11 preceding siblings ...)
2025-10-15 1:47 ` [PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs NeilBrown
@ 2025-10-15 1:47 ` NeilBrown
2025-10-19 10:46 ` Amir Goldstein
2025-10-15 1:47 ` [PATCH v2 14/14] VFS: introduce end_creating_keep() NeilBrown
2025-10-19 10:50 ` [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops Amir Goldstein
14 siblings, 1 reply; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:47 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
vfs_mkdir() already drops the reference to the dentry on failure but it
leaves the parent locked.
This complicates end_creating() which needs to unlock the parent even
though the dentry is no longer available.
If we change vfs_mkdir() to unlock on failure as well as releasing the
dentry, we can remove the "parent" arg from end_creating() and simplify
the rules for calling it.
Note that cachefiles_get_directory() can choose to substitute an error
instead of actually calling vfs_mkdir(), for fault injection. In that
case it needs to call end_creating(), just as vfs_mkdir() now does on
error.
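
The new calling convention can be illustrated with a small userspace
model. It is a sketch, not the kernel code: struct toy_dir, struct
toy_dentry and every toy_* function are hypothetical names, and the
model assumes (as the updated end_creating() comment suggests) that
passing an error pointer to end_creating() is a harmless no-op.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; hypothetical, not the kernel structures. */
struct toy_dir { bool locked; };
struct toy_dentry { struct toy_dir *parent; int refcount; };

/* Userspace imitations of the kernel's ERR_PTR() and IS_ERR(). */
static struct toy_dentry *toy_err_ptr(int err)
{
	return (struct toy_dentry *)(intptr_t)err;
}

static bool toy_is_err(const struct toy_dentry *d)
{
	return (uintptr_t)d >= (uintptr_t)-4095;
}

/* Unlock the parent and drop the reference; an error pointer is a no-op. */
static void toy_end_creating(struct toy_dentry *child)
{
	if (toy_is_err(child))
		return;
	assert(child->parent->locked);
	child->parent->locked = false;
	child->refcount--;
}

/*
 * Model of the changed vfs_mkdir() contract: on failure it unlocks the
 * parent and releases the dentry itself, so the caller is left holding
 * only an error pointer.  @fail_with is a positive errno, or 0 for
 * success.
 */
static struct toy_dentry *toy_mkdir(struct toy_dentry *child, int fail_with)
{
	if (fail_with) {
		toy_end_creating(child);
		return toy_err_ptr(-fail_with);
	}
	return child;
}
```

A caller in this model can pass whatever toy_mkdir() returned straight
to toy_end_creating() without remembering the parent, which is the
simplification the single-argument end_creating() aims for.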
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/btrfs/ioctl.c | 2 +-
fs/cachefiles/namei.c | 14 ++++++++------
fs/ecryptfs/inode.c | 8 ++++----
fs/namei.c | 4 ++--
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/nfs4recover.c | 2 +-
fs/nfsd/nfsproc.c | 2 +-
fs/nfsd/vfs.c | 8 ++++----
fs/overlayfs/copy_up.c | 4 ++--
fs/overlayfs/dir.c | 13 ++++++-------
fs/overlayfs/super.c | 6 +++---
fs/xfs/scrub/orphanage.c | 2 +-
include/linux/namei.h | 28 +++++++++-------------------
ipc/mqueue.c | 2 +-
15 files changed, 45 insertions(+), 54 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4fbfdd8faf6a..90ef777eae25 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -935,7 +935,7 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
out_up_read:
up_read(&fs_info->subvol_sem);
out_dput:
- end_creating(dentry, parent);
+ end_creating(dentry);
return ret;
}
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index b97a40917a32..10f010dc9946 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -130,8 +130,10 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
ret = cachefiles_inject_write_error();
if (ret == 0)
subdir = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), subdir, 0700);
- else
+ else {
+ end_creating(subdir);
subdir = ERR_PTR(ret);
+ }
if (IS_ERR(subdir)) {
trace_cachefiles_vfs_error(NULL, d_inode(dir), ret,
cachefiles_trace_mkdir_error);
@@ -140,7 +142,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
trace_cachefiles_mkdir(dir, subdir);
if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
- end_creating(subdir, dir);
+ end_creating(subdir);
goto retry;
}
ASSERT(d_backing_inode(subdir));
@@ -154,7 +156,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
/* Tell rmdir() it's not allowed to delete the subdir */
inode_lock(d_inode(subdir));
dget(subdir);
- end_creating(subdir, dir);
+ end_creating(subdir);
if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
@@ -196,7 +198,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
return ERR_PTR(-EBUSY);
mkdir_error:
- end_creating(subdir, dir);
+ end_creating(subdir);
pr_err("mkdir %s failed with error %d\n", dirname, ret);
return ERR_PTR(ret);
@@ -705,7 +707,7 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
if (ret < 0)
goto out_end;
- end_creating(dentry, fan);
+ end_creating(dentry);
ret = cachefiles_inject_read_error();
if (ret == 0)
@@ -739,7 +741,7 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
}
out_end:
- end_creating(dentry, fan);
+ end_creating(dentry);
out:
_leave(" = %u", success);
return success;
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index b3702105d236..90d74ecc5028 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -211,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
fsstack_copy_attr_times(directory_inode, lower_dir);
fsstack_copy_inode_size(directory_inode, lower_dir);
out_lock:
- end_creating(lower_dentry, NULL);
+ end_creating(lower_dentry);
return inode;
}
@@ -456,7 +456,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
ecryptfs_inode_to_lower(d_inode(old_dentry))->i_nlink);
i_size_write(d_inode(new_dentry), file_size_save);
out_lock:
- end_creating(lower_new_dentry, NULL);
+ end_creating(lower_new_dentry);
return rc;
}
@@ -500,7 +500,7 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
fsstack_copy_attr_times(dir, lower_dir);
fsstack_copy_inode_size(dir, lower_dir);
out_lock:
- end_creating(lower_dentry, NULL);
+ end_creating(lower_dentry);
if (d_really_is_negative(dentry))
d_drop(dentry);
return rc;
@@ -534,7 +534,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
fsstack_copy_inode_size(dir, lower_dir);
set_nlink(dir, lower_dir->i_nlink);
out:
- end_creating(lower_dentry, lower_dir_dentry);
+ end_creating(lower_dentry);
if (d_really_is_negative(dentry))
d_drop(dentry);
return ERR_PTR(rc);
diff --git a/fs/namei.c b/fs/namei.c
index 91e484dbc239..ba831fc6cce8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4832,7 +4832,7 @@ EXPORT_SYMBOL(start_creating_path);
*/
void end_creating_path(const struct path *path, struct dentry *dentry)
{
- end_creating(dentry, path->dentry);
+ end_creating(dentry);
mnt_drop_write(path->mnt);
path_put(path);
}
@@ -5034,7 +5034,7 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
return dentry;
err:
- dput(dentry);
+ end_creating(dentry);
return ERR_PTR(error);
}
EXPORT_SYMBOL(vfs_mkdir);
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index e2aac0def2cb..6b39e4aff959 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -364,7 +364,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
out:
- end_creating(child, parent);
+ end_creating(child);
out_write:
fh_drop_write(fhp);
return status;
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b2c95e8e7c68..524cb07a477c 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -376,7 +376,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (attrs.na_aclerr)
open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
out:
- end_creating(child, parent);
+ end_creating(child);
nfsd_attrs_free(&attrs);
out_write:
fh_drop_write(fhp);
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 3eefaa2202e3..18c08395b273 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -215,7 +215,7 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
if (IS_ERR(dentry))
status = PTR_ERR(dentry);
out_end:
- end_creating(dentry, dir);
+ end_creating(dentry);
out:
if (status == 0) {
if (nn->in_grace)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index ee1b16e921fd..28f03a6a3cc3 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -421,7 +421,7 @@ nfsd_proc_create(struct svc_rqst *rqstp)
}
out_unlock:
- end_creating(dchild, dirfhp->fh_dentry);
+ end_creating(dchild);
out_write:
fh_drop_write(dirfhp);
done:
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 62109885d4db..6e9a57863904 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1589,7 +1589,7 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
out:
if (!err)
fh_fill_post_attrs(fhp);
- end_creating(dchild, dentry);
+ end_creating(dchild);
return err;
out_nfserr:
@@ -1646,7 +1646,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
return err;
out_unlock:
- end_creating(dchild, dentry);
+ end_creating(dchild);
return err;
}
@@ -1747,7 +1747,7 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
fh_fill_post_attrs(fhp);
out_unlock:
- end_creating(dnew, dentry);
+ end_creating(dnew);
if (!err)
err = nfserrno(commit_metadata(fhp));
if (!err)
@@ -1824,7 +1824,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
fh_fill_post_attrs(ffhp);
out_unlock:
- end_creating(dnew, ddir);
+ end_creating(dnew);
if (!host_err) {
host_err = commit_metadata(ffhp);
if (!host_err)
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 27014ada11c7..36949856ddea 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -624,7 +624,7 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
ovl_dentry_set_upper_alias(c->dentry);
ovl_dentry_update_reval(c->dentry, upper);
}
- end_creating(upper, upperdir);
+ end_creating(upper);
}
if (err)
goto out;
@@ -891,7 +891,7 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
err = PTR_ERR(upper);
if (!IS_ERR(upper)) {
err = ovl_do_link(ofs, temp, udir, upper);
- end_creating(upper, c->destdir);
+ end_creating(upper);
}
if (err)
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index dfd1d4b48948..00dc797f2da7 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -91,7 +91,7 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
err = ovl_do_whiteout(ofs, wdir, whiteout);
if (!err)
ofs->whiteout = dget(whiteout);
- end_creating(whiteout, workdir);
+ end_creating(whiteout);
if (err)
return ERR_PTR(err);
}
@@ -103,7 +103,7 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
err = ovl_do_link(ofs, ofs->whiteout, wdir, link);
if (!err)
whiteout = dget(link);
- end_creating(link, workdir);
+ end_creating(link);
if (!err)
return whiteout;;
@@ -253,7 +253,7 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
ret = ovl_create_real(ofs, workdir, ret, attr);
if (!IS_ERR(ret))
dget(ret);
- end_creating(ret, workdir);
+ end_creating(ret);
return ret;
}
@@ -361,12 +361,11 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
if (IS_ERR(newdentry))
return PTR_ERR(newdentry);
newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
- if (IS_ERR(newdentry)) {
- end_creating(newdentry, upperdir);
+ if (IS_ERR(newdentry))
return PTR_ERR(newdentry);
- }
+
dget(newdentry);
- end_creating(newdentry, upperdir);
+ end_creating(newdentry);
if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
!ovl_allow_offline_changes(ofs)) {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index a721ef2b90e8..3acda985c8a3 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -320,7 +320,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
if (work->d_inode) {
dget(work);
- end_creating(work, ofs->workbasedir);
+ end_creating(work);
if (persist)
return work;
err = -EEXIST;
@@ -338,7 +338,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
if (!IS_ERR(work))
dget(work);
- end_creating(work, ofs->workbasedir);
+ end_creating(work);
err = PTR_ERR(work);
if (IS_ERR(work))
goto out_err;
@@ -632,7 +632,7 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
OVL_CATTR(mode));
if (!IS_ERR(child))
dget(child);
- end_creating(child, parent);
+ end_creating(child);
}
dput(parent);
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index e732605924a1..b77c2b6b6d44 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -199,7 +199,7 @@ xrep_orphanage_create(
sc->orphanage_ilock_flags = 0;
out_dput_orphanage:
- end_creating(orphanage_dentry, root_dentry);
+ end_creating(orphanage_dentry);
out_dput_root:
dput(root_dentry);
out:
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 208aed1d6728..0ef73d739a31 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -105,34 +105,24 @@ struct dentry *start_creating_dentry(struct dentry *parent,
struct dentry *start_removing_dentry(struct dentry *parent,
struct dentry *child);
-/**
- * end_creating - finish action started with start_creating
- * @child: dentry returned by start_creating() or vfs_mkdir()
- * @parent: dentry given to start_creating(),
- *
- * Unlock and release the child.
+/* end_creating - finish action started with start_creating
+ * @child: dentry returned by start_creating() or vfs_mkdir()
*
- * Unlike end_dirop() this can only be called if start_creating() succeeded.
- * It handles @child being and error as vfs_mkdir() might have converted the
- * dentry to an error - in that case the parent still needs to be unlocked.
+ * Unlock and release the child. This can be called after
+ * start_creating() whether that function succeeded or not,
+ * but it is not needed on failure.
*
* If vfs_mkdir() was called then the value returned from that function
* should be given for @child rather than the original dentry, as vfs_mkdir()
- * may have provided a new dentry. Even if vfs_mkdir() returns an error
- * it must be given to end_creating().
+ * may have provided a new dentry.
+ *
*
* If vfs_mkdir() was not called, then @child will be a valid dentry and
* @parent will be ignored.
*/
-static inline void end_creating(struct dentry *child, struct dentry *parent)
+static inline void end_creating(struct dentry *child)
{
- if (IS_ERR(child))
- /* The parent is still locked despite the error from
- * vfs_mkdir() - must unlock it.
- */
- inode_unlock(parent->d_inode);
- else
- end_dirop(child);
+ end_dirop(child);
}
/**
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 060e8e9c4f59..7713a61aa431 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -931,7 +931,7 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
put_unused_fd(fd);
fd = error;
}
- end_creating(path.dentry, root);
+ end_creating(path.dentry);
if (!ro)
mnt_drop_write(mnt);
out_putname:
--
2.50.0.107.gf914562f5916.dirty
* [PATCH v2 14/14] VFS: introduce end_creating_keep()
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (12 preceding siblings ...)
2025-10-15 1:47 ` [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure NeilBrown
@ 2025-10-15 1:47 ` NeilBrown
2025-10-19 10:39 ` Amir Goldstein
2025-10-19 10:50 ` [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops Amir Goldstein
14 siblings, 1 reply; 32+ messages in thread
From: NeilBrown @ 2025-10-15 1:47 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton
Cc: Jan Kara, linux-fsdevel
From: NeilBrown <neil@brown.name>
Occasionally the caller of end_creating() wants to keep using the dentry.
Rather than requiring them to dget() the dentry (when not an error)
before calling end_creating(), provide end_creating_keep() which does
this.
cachefiles and overlayfs make use of this.
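
The pattern being folded into the helper can be sketched in userspace
as follows. All names here (struct toy_dir, struct toy_dentry, the
toy_* functions) are hypothetical and exist only for illustration; the
sketch assumes the helper takes the extra reference only when the
dentry is not an error pointer, matching the dget() calls it replaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; hypothetical, not the kernel structures. */
struct toy_dir { bool locked; };
struct toy_dentry { struct toy_dir *parent; int refcount; };

/* Userspace imitations of the kernel's ERR_PTR() and IS_ERR(). */
static struct toy_dentry *toy_err_ptr(int err)
{
	return (struct toy_dentry *)(intptr_t)err;
}

static bool toy_is_err(const struct toy_dentry *d)
{
	return (uintptr_t)d >= (uintptr_t)-4095;
}

/* Unlock the parent and drop the reference; an error pointer is a no-op. */
static void toy_end_creating(struct toy_dentry *child)
{
	if (toy_is_err(child))
		return;
	assert(child->parent->locked);
	child->parent->locked = false;
	child->refcount--;
}

/*
 * Take an extra reference when @child is a real dentry, then finish
 * the operation.  The net effect is that the parent is unlocked while
 * the caller still owns one reference to the returned dentry.
 */
static struct toy_dentry *toy_end_creating_keep(struct toy_dentry *child)
{
	if (!toy_is_err(child))
		child->refcount++;
	toy_end_creating(child);
	return child;
}
```

Because the helper returns its argument, a caller in this model can
write "return toy_end_creating_keep(ret);" in the same way that
ovl_create_temp() does in the diff below.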
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/cachefiles/namei.c | 3 +--
fs/overlayfs/dir.c | 8 ++------
fs/overlayfs/super.c | 11 +++--------
include/linux/namei.h | 22 ++++++++++++++++++++++
4 files changed, 28 insertions(+), 16 deletions(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 10f010dc9946..5c50293328f4 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -155,8 +155,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
/* Tell rmdir() it's not allowed to delete the subdir */
inode_lock(d_inode(subdir));
- dget(subdir);
- end_creating(subdir);
+ end_creating_keep(subdir);
if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 00dc797f2da7..cadbb47c6225 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -251,10 +251,7 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
if (IS_ERR(ret))
return ret;
ret = ovl_create_real(ofs, workdir, ret, attr);
- if (!IS_ERR(ret))
- dget(ret);
- end_creating(ret);
- return ret;
+ return end_creating_keep(ret);
}
static int ovl_set_opaque_xerr(struct dentry *dentry, struct dentry *upper,
@@ -364,8 +361,7 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
if (IS_ERR(newdentry))
return PTR_ERR(newdentry);
- dget(newdentry);
- end_creating(newdentry);
+ end_creating_keep(newdentry);
if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
!ovl_allow_offline_changes(ofs)) {
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 3acda985c8a3..7b8fc1cab6eb 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -319,8 +319,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
};
if (work->d_inode) {
- dget(work);
- end_creating(work);
+ end_creating_keep(work);
if (persist)
return work;
err = -EEXIST;
@@ -336,9 +335,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
}
work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
- if (!IS_ERR(work))
- dget(work);
- end_creating(work);
+ end_creating_keep(work);
err = PTR_ERR(work);
if (IS_ERR(work))
goto out_err;
@@ -630,9 +627,7 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
if (!child->d_inode)
child = ovl_create_real(ofs, parent, child,
OVL_CATTR(mode));
- if (!IS_ERR(child))
- dget(child);
- end_creating(child);
+ end_creating_keep(child);
}
dput(parent);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 0ef73d739a31..3d82c6a19197 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -125,6 +125,28 @@ static inline void end_creating(struct dentry *child)
end_dirop(child);
}
+/* end_creating_keep - finish action started with start_creating() and return result
+ * @child: dentry returned by start_creating() or vfs_mkdir()
+ *
+ * Unlock and return the child. This can be called after
+ * start_creating() whether that function succeeded or not,
+ * but it is not needed on failure.
+ *
+ * If vfs_mkdir() was called then the value returned from that function
+ * should be given for @child rather than the original dentry, as vfs_mkdir()
+ * may have provided a new dentry.
+ *
+ * Returns: @child, which may be a dentry or an error.
+ *
+ */
+static inline struct dentry *end_creating_keep(struct dentry *child)
+{
+ if (!IS_ERR(child))
+ dget(child);
+ end_dirop(child);
+ return child;
+}
+
/**
* end_removing - finish action started with start_removing
* @child: dentry returned by start_removing()
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v2 02/14] VFS: introduce start_dirop() and end_dirop()
2025-10-15 1:46 ` [PATCH v2 02/14] VFS: introduce start_dirop() and end_dirop() NeilBrown
@ 2025-10-19 9:56 ` Amir Goldstein
0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 9:56 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> The fact that directory operations (create,remove,rename) are protected
> by a lock on the parent is known widely throughout the kernel.
> In order to change this - to instead lock the target dentry - it is
> best to centralise this knowledge so it can be changed in one place.
>
> This patch introduces start_dirop() which is local to VFS code.
> It performs the required locking for create and remove. Rename
> will be handled separately.
>
> Various functions with names like start_creating() or start_removing_path(),
> some of which already exist, will export this functionality beyond the VFS.
>
> end_dirop() is the partner of start_dirop(). It drops the lock and
> releases the reference on the dentry.
> It *is* exported so that various end_creating etc functions can be inline.
>
> As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as
> that won't unlock when the dentry IS_ERR(). For now we need an explicit
> unlock when dentry IS_ERR(). I hope to change vfs_mkdir() to unlock
> when it drops a dentry so that explicit unlock can go away.
>
> end_dirop() can always be called on the result of start_dirop(), but not
> after vfs_mkdir(). After a vfs_mkdir() we still may need the explicit
> unlock as seen in end_creating_path().
>
> As well as adding start_dirop() and end_dirop()
> this patch uses them in:
> - simple_start_creating (which requires sharing lookup_noperm_common()
> with libfs.c)
> - start_removing_path / start_removing_user_path_at
> - filename_create / end_creating_path()
> - do_rmdir(), do_unlinkat()
>
> Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
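The pairing rule described in the commit message (start_dirop() leaves
the parent unlocked on failure, and end_dirop() does nothing for an error
pointer) can be sketched in userspace as follows. This is an illustrative
model only: struct dir, model_lookup(), the name-length limit and
demo_dirop() are assumptions invented for the sketch, and the real
start_dirop() takes a qstr and lookup flags rather than a C string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (assumption). */
#define MAX_ERRNO 4095
#define IS_ERR(p)  ((uintptr_t)(p) >= (uintptr_t)-MAX_ERRNO)
#define ERR_PTR(e) ((void *)(intptr_t)(e))
#define PTR_ERR(p) ((long)(intptr_t)(p))

struct dir { int lock_depth; };			/* 1 while the "inode lock" is held */
struct dentry { struct dir *parent; int refcount; };

static struct dentry the_child;			/* the only dentry in this toy model */

/* Hypothetical lookup: fails with -36 (ENAMETOOLONG) for names over 8 bytes. */
static struct dentry *model_lookup(struct dir *parent, const char *name)
{
	if (strlen(name) > 8)
		return ERR_PTR(-36);
	the_child.parent = parent;
	the_child.refcount++;
	return &the_child;
}

/* Model of start_dirop(): lock, look up, and unlock again on failure. */
static struct dentry *start_dirop(struct dir *parent, const char *name)
{
	struct dentry *d;

	parent->lock_depth++;
	d = model_lookup(parent, name);
	if (IS_ERR(d))
		parent->lock_depth--;
	return d;
}

/* Model of end_dirop(): a no-op for errors, else unlock and drop the ref. */
static void end_dirop(struct dentry *de)
{
	if (!IS_ERR(de)) {
		de->parent->lock_depth--;
		de->refcount--;
	}
}

/* Exercise the success and failure paths; returns 0 if the lock balances. */
int demo_dirop(void)
{
	struct dir p = { 0 };
	struct dentry *d;

	d = start_dirop(&p, "ok");
	assert(!IS_ERR(d) && p.lock_depth == 1);
	end_dirop(d);
	assert(p.lock_depth == 0);

	d = start_dirop(&p, "name-too-long");
	assert(IS_ERR(d) && p.lock_depth == 0);
	end_dirop(d);				/* safe: nothing to undo */
	assert(p.lock_depth == 0);
	return 0;
}
```

In this model a caller can always hand the result of start_dirop() to
end_dirop() without checking it first, which is what lets the converted
call sites drop their separate "unlock" labels.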
> ---
> fs/internal.h | 3 ++
> fs/libfs.c | 36 ++++++++---------
> fs/namei.c | 98 ++++++++++++++++++++++++++++++++++------------
> include/linux/fs.h | 2 +
> 4 files changed, 95 insertions(+), 44 deletions(-)
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 9b2b4d116880..d08d5e2235e9 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -67,6 +67,9 @@ int vfs_tmpfile(struct mnt_idmap *idmap,
> const struct path *parentpath,
> struct file *file, umode_t mode);
> struct dentry *d_hash_and_lookup(struct dentry *, struct qstr *);
> +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags);
> +int lookup_noperm_common(struct qstr *qname, struct dentry *base);
>
> /*
> * namespace.c
> diff --git a/fs/libfs.c b/fs/libfs.c
> index ce8c496a6940..02371f45ef7d 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -2289,27 +2289,25 @@ void stashed_dentry_prune(struct dentry *dentry)
> cmpxchg(stashed, dentry, NULL);
> }
>
> -/* parent must be held exclusive */
> +/**
> + * simple_start_creating - prepare to create a given name
> + * @parent: directory in which to prepare to create the name
> + * @name: the name to be created
> + *
> + * Required lock is taken and a lookup is performed prior to creating an
> + * object in a directory. No permission checking is performed.
> + *
> + * Returns: a negative dentry on which vfs_create() or similar may
> + * be attempted, or an error.
> + */
> struct dentry *simple_start_creating(struct dentry *parent, const char *name)
> {
> - struct dentry *dentry;
> - struct inode *dir = d_inode(parent);
> + struct qstr qname = QSTR(name);
> + int err;
>
> - inode_lock(dir);
> - if (unlikely(IS_DEADDIR(dir))) {
> - inode_unlock(dir);
> - return ERR_PTR(-ENOENT);
> - }
> - dentry = lookup_noperm(&QSTR(name), parent);
> - if (IS_ERR(dentry)) {
> - inode_unlock(dir);
> - return dentry;
> - }
> - if (dentry->d_inode) {
> - dput(dentry);
> - inode_unlock(dir);
> - return ERR_PTR(-EEXIST);
> - }
> - return dentry;
> + err = lookup_noperm_common(&qname, parent);
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, &qname, LOOKUP_CREATE | LOOKUP_EXCL);
> }
> EXPORT_SYMBOL(simple_start_creating);
> diff --git a/fs/namei.c b/fs/namei.c
> index 7377020a2cba..3618efd4bcaa 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2765,6 +2765,48 @@ static int filename_parentat(int dfd, struct filename *name,
> return __filename_parentat(dfd, name, flags, parent, last, type, NULL);
> }
>
> +/**
> + * start_dirop - begin a create or remove dirop, performing locking and lookup
> + * @parent: the dentry of the parent in which the operation will occur
> + * @name: a qstr holding the name within that parent
> + * @lookup_flags: intent and other lookup flags.
> + *
> + * The lookup is performed and necessary locks are taken so that, on success,
> + * the returned dentry can be operated on safely.
> + * The qstr must already have the hash value calculated.
> + *
> + * Returns: a locked dentry, or an error.
> + *
> + */
> +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags)
> +{
> + struct dentry *dentry;
> + struct inode *dir = d_inode(parent);
> +
> + inode_lock_nested(dir, I_MUTEX_PARENT);
> + dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
> + if (IS_ERR(dentry))
> + inode_unlock(dir);
> + return dentry;
> +}
> +
> +/**
> + * end_dirop - signal completion of a dirop
> + * @de: the dentry which was returned by start_dirop or similar.
> + *
> + * If the de is an error, nothing happens. Otherwise any lock taken to
> + * protect the dentry is dropped and the dentry itself is released (dput()).
> + */
> +void end_dirop(struct dentry *de)
> +{
> + if (!IS_ERR(de)) {
> + inode_unlock(de->d_parent->d_inode);
> + dput(de);
> + }
> +}
> +EXPORT_SYMBOL(end_dirop);
> +
> /* does lookup, returns the object with parent locked */
> static struct dentry *__start_removing_path(int dfd, struct filename *name,
> struct path *path)
> @@ -2781,10 +2823,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> return ERR_PTR(-EINVAL);
> /* don't fail immediately if it's r/o, at least try to report other errors */
> error = mnt_want_write(parent_path.mnt);
> - inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
> - d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
> + d = start_dirop(parent_path.dentry, &last, 0);
> if (IS_ERR(d))
> - goto unlock;
> + goto drop;
> if (error)
> goto fail;
> path->dentry = no_free_ptr(parent_path.dentry);
> @@ -2792,10 +2833,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> return d;
>
> fail:
> - dput(d);
> + end_dirop(d);
> d = ERR_PTR(error);
> -unlock:
> - inode_unlock(parent_path.dentry->d_inode);
> +drop:
> if (!error)
> mnt_drop_write(parent_path.mnt);
> return d;
> @@ -2910,7 +2950,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
> }
> EXPORT_SYMBOL(vfs_path_lookup);
>
> -static int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> +int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> {
> const char *name = qname->name;
> u32 len = qname->len;
> @@ -4223,21 +4263,18 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> */
> if (last.name[last.len] && !want_dir)
> create_flags &= ~LOOKUP_CREATE;
> - inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path->dentry,
> - reval_flag | create_flags);
> + dentry = start_dirop(path->dentry, &last, reval_flag | create_flags);
> if (IS_ERR(dentry))
> - goto unlock;
> + goto out_drop_write;
>
> if (unlikely(error))
> goto fail;
>
> return dentry;
> fail:
> - dput(dentry);
> + end_dirop(dentry);
> dentry = ERR_PTR(error);
> -unlock:
> - inode_unlock(path->dentry->d_inode);
> +out_drop_write:
> if (!error)
> mnt_drop_write(path->mnt);
> out:
> @@ -4256,11 +4293,26 @@ struct dentry *start_creating_path(int dfd, const char *pathname,
> }
> EXPORT_SYMBOL(start_creating_path);
>
> +/**
> + * end_creating_path - finish a code section started by start_creating_path()
> + * @path: the path instantiated by start_creating_path()
> + * @dentry: the dentry returned by start_creating_path()
> + *
> + * end_creating_path() will unlock any locks taken by start_creating_path()
> + * and drop any references that were taken. It should only be called
> + * if start_creating_path() returned a non-error.
> + * If vfs_mkdir() was called and it returned an error, that error *should*
> + * be passed to end_creating_path() together with the path.
> + */
> void end_creating_path(const struct path *path, struct dentry *dentry)
> {
> - if (!IS_ERR(dentry))
> - dput(dentry);
> - inode_unlock(path->dentry->d_inode);
> + if (IS_ERR(dentry))
> + /* The parent is still locked despite the error from
> + * vfs_mkdir() - must unlock it.
> + */
> + inode_unlock(path->dentry->d_inode);
> + else
> + end_dirop(dentry);
> mnt_drop_write(path->mnt);
> path_put(path);
> }
> @@ -4592,8 +4644,7 @@ int do_rmdir(int dfd, struct filename *name)
> if (error)
> goto exit2;
>
> - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> + dentry = start_dirop(path.dentry, &last, lookup_flags);
> error = PTR_ERR(dentry);
> if (IS_ERR(dentry))
> goto exit3;
> @@ -4602,9 +4653,8 @@ int do_rmdir(int dfd, struct filename *name)
> goto exit4;
> error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> exit4:
> - dput(dentry);
> + end_dirop(dentry);
> exit3:
> - inode_unlock(path.dentry->d_inode);
> mnt_drop_write(path.mnt);
> exit2:
> path_put(&path);
> @@ -4721,8 +4771,7 @@ int do_unlinkat(int dfd, struct filename *name)
> if (error)
> goto exit2;
> retry_deleg:
> - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> + dentry = start_dirop(path.dentry, &last, lookup_flags);
> error = PTR_ERR(dentry);
> if (!IS_ERR(dentry)) {
>
> @@ -4737,9 +4786,8 @@ int do_unlinkat(int dfd, struct filename *name)
> error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> dentry, &delegated_inode);
> exit3:
> - dput(dentry);
> + end_dirop(dentry);
> }
> - inode_unlock(path.dentry->d_inode);
> if (inode)
> iput(inode); /* truncate the inode here */
> inode = NULL;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c895146c1444..f4543612ef1e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3609,6 +3609,8 @@ extern void iterate_supers_type(struct file_system_type *,
> void filesystems_freeze(void);
> void filesystems_thaw(void);
>
> +void end_dirop(struct dentry *de);
> +
> extern int dcache_dir_open(struct inode *, struct file *);
> extern int dcache_dir_close(struct inode *, struct file *);
> extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 03/14] VFS: tidy up do_unlinkat()
2025-10-15 1:46 ` [PATCH v2 03/14] VFS: tidy up do_unlinkat() NeilBrown
@ 2025-10-19 10:02 ` Amir Goldstein
0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:02 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> The simplification of locking in the previous patch opens up some room
> for tidying up do_unlinkat()
>
> - change all "exit" labels to describe what will happen at the label.
> - always goto an exit label on an error - unwrap the "if (!IS_ERR())" branch.
> - Move the "slashes" handling inline, but mark it as unlikely()
> - simplify use of the "inode" variable - we no longer need to test for NULL.
>
> Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/namei.c | 55 ++++++++++++++++++++++++++----------------------------
> 1 file changed, 26 insertions(+), 29 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 3618efd4bcaa..9effaad115d9 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4755,65 +4755,62 @@ int do_unlinkat(int dfd, struct filename *name)
> struct path path;
> struct qstr last;
> int type;
> - struct inode *inode = NULL;
> + struct inode *inode;
> struct inode *delegated_inode = NULL;
> unsigned int lookup_flags = 0;
> retry:
> error = filename_parentat(dfd, name, lookup_flags, &path, &last, &type);
> if (error)
> - goto exit1;
> + goto exit_putname;
>
> error = -EISDIR;
> if (type != LAST_NORM)
> - goto exit2;
> + goto exit_path_put;
>
> error = mnt_want_write(path.mnt);
> if (error)
> - goto exit2;
> + goto exit_path_put;
> retry_deleg:
> dentry = start_dirop(path.dentry, &last, lookup_flags);
> error = PTR_ERR(dentry);
> - if (!IS_ERR(dentry)) {
> + if (IS_ERR(dentry))
> + goto exit_drop_write;
>
> - /* Why not before? Because we want correct error value */
> - if (last.name[last.len])
> - goto slashes;
> - inode = dentry->d_inode;
> - ihold(inode);
> - error = security_path_unlink(&path, dentry);
> - if (error)
> - goto exit3;
> - error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> - dentry, &delegated_inode);
> -exit3:
> + /* Why not before? Because we want correct error value */
> + if (unlikely(last.name[last.len])) {
> + if (d_is_dir(dentry))
> + error = -EISDIR;
> + else
> + error = -ENOTDIR;
> end_dirop(dentry);
> + goto exit_drop_write;
> }
> - if (inode)
> - iput(inode); /* truncate the inode here */
> - inode = NULL;
> + inode = dentry->d_inode;
> + ihold(inode);
> + error = security_path_unlink(&path, dentry);
> + if (error)
> + goto exit_end_dirop;
> + error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> + dentry, &delegated_inode);
> +exit_end_dirop:
> + end_dirop(dentry);
> + iput(inode); /* truncate the inode here */
> if (delegated_inode) {
> error = break_deleg_wait(&delegated_inode);
> if (!error)
> goto retry_deleg;
> }
> +exit_drop_write:
> mnt_drop_write(path.mnt);
> -exit2:
> +exit_path_put:
> path_put(&path);
> if (retry_estale(error, lookup_flags)) {
> lookup_flags |= LOOKUP_REVAL;
> - inode = NULL;
> goto retry;
> }
> -exit1:
> +exit_putname:
> putname(name);
> return error;
> -
> -slashes:
> - if (d_is_dir(dentry))
> - error = -EISDIR;
> - else
> - error = -ENOTDIR;
> - goto exit3;
> }
>
> SYSCALL_DEFINE3(unlinkat, int, dfd, const char __user *, pathname, int, flag)
> --
> 2.50.0.107.gf914562f5916.dirty
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
2025-10-15 1:46 ` [PATCH v2 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
@ 2025-10-19 10:10 ` Amir Goldstein
0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:10 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> start_creating() is similar to simple_start_creating() but is not so
> simple.
> It takes a qstr for the name, includes permission checking, and does NOT
> report an error if the name already exists, returning a positive dentry
> instead.
>
> This is currently used by nfsd, cachefiles, and overlayfs.
>
> end_creating() is called after the dentry has been used.
> end_creating() drops the reference to the dentry as it is generally no
> longer needed. This is exactly the first section of end_creating_path()
> so that function is changed to call the new end_creating()
>
> These calls help encapsulate locking rules so that directory locking can
> be changed.
>
> Occasionally this change means that the parent lock is held for a
> shorter period of time, for example in cachefiles_commit_tmpfile().
> As this function now unlocks after an unlink and before the following
> lookup, it is possible that the lookup could again find a positive
> dentry, so a while loop is introduced there.
>
> In overlayfs the ovl_lookup_temp() function has ovl_tempname()
> split out to be used in ovl_start_creating_temp(). The other use
> of ovl_lookup_temp() is preparing for a rename. When rename handling
> is updated, ovl_lookup_temp() will be removed.
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
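The behavioural difference between simple_start_creating() and
start_creating() noted in the commit message comes down to whether
LOOKUP_EXCL is passed alongside LOOKUP_CREATE. A small userspace sketch
of just that decision, with all names prefixed model_/MODEL_ to mark them
as hypothetical stand-ins (locking and permission checks are left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (assumption). */
#define MAX_ERRNO 4095
#define IS_ERR(p)  ((uintptr_t)(p) >= (uintptr_t)-MAX_ERRNO)
#define ERR_PTR(e) ((void *)(intptr_t)(e))
#define PTR_ERR(p) ((long)(intptr_t)(p))

/* Illustrative flag and errno values, not the kernel's definitions. */
#define MODEL_LOOKUP_CREATE 0x1u
#define MODEL_LOOKUP_EXCL   0x2u
#define MODEL_EEXIST        17

struct dentry { bool positive; };	/* positive == name already exists */

/* Toy lookup: an existing name is an error only when EXCL is requested. */
static struct dentry *model_lookup(struct dentry *found, unsigned int flags)
{
	if (found->positive && (flags & MODEL_LOOKUP_EXCL))
		return ERR_PTR(-MODEL_EEXIST);
	return found;
}

/* Like simple_start_creating(): an existing name yields -EEXIST. */
static struct dentry *model_simple_start_creating(struct dentry *found)
{
	return model_lookup(found, MODEL_LOOKUP_CREATE | MODEL_LOOKUP_EXCL);
}

/* Like start_creating(): an existing name yields the positive dentry. */
static struct dentry *model_start_creating(struct dentry *found)
{
	return model_lookup(found, MODEL_LOOKUP_CREATE);
}

/* Compare both helpers on an existing and a missing name; 0 on success. */
int demo_creating(void)
{
	struct dentry existing = { .positive = true };
	struct dentry missing = { .positive = false };

	assert(PTR_ERR(model_simple_start_creating(&existing)) == -MODEL_EEXIST);
	assert(model_start_creating(&existing) == &existing);
	assert(model_simple_start_creating(&missing) == &missing);
	assert(model_start_creating(&missing) == &missing);
	return 0;
}
```

Because the non-exclusive form can hand back a positive dentry, callers
such as nfsd and cachefiles still have to test for d_is_negative() or
d_really_is_negative() themselves before creating anything.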
> ---
> fs/cachefiles/namei.c | 41 ++++++++---------
> fs/namei.c | 35 ++++++++++++---
> fs/nfsd/nfs3proc.c | 14 +++---
> fs/nfsd/nfs4proc.c | 14 +++---
> fs/nfsd/nfs4recover.c | 16 +++----
> fs/nfsd/nfsproc.c | 11 +++--
> fs/nfsd/vfs.c | 52 +++++++++-------------
> fs/overlayfs/copy_up.c | 19 ++++----
> fs/overlayfs/dir.c | 96 +++++++++++++++++++++++-----------------
> fs/overlayfs/overlayfs.h | 8 ++++
> fs/overlayfs/super.c | 32 +++++++-------
> include/linux/namei.h | 33 ++++++++++++++
> 12 files changed, 213 insertions(+), 158 deletions(-)
>
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index d1edb2ac3837..0a136eb434da 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> _enter(",,%s", dirname);
>
> /* search the current directory for the element name */
> - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
>
> retry:
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - subdir = lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir);
> + subdir = start_creating(&nop_mnt_idmap, dir, &QSTR(dirname));
> else
> subdir = ERR_PTR(ret);
> trace_cachefiles_lookup(NULL, dir, subdir);
> @@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> trace_cachefiles_mkdir(dir, subdir);
>
> if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) {
> - dput(subdir);
> + end_creating(subdir, dir);
> goto retry;
> }
> ASSERT(d_backing_inode(subdir));
> @@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
>
> /* Tell rmdir() it's not allowed to delete the subdir */
> inode_lock(d_inode(subdir));
> - inode_unlock(d_inode(dir));
> + dget(subdir);
> + end_creating(subdir, dir);
>
> if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
> pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
> @@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> return ERR_PTR(-EBUSY);
>
> mkdir_error:
> - inode_unlock(d_inode(dir));
> - if (!IS_ERR(subdir))
> - dput(subdir);
> + end_creating(subdir, dir);
> pr_err("mkdir %s failed with error %d\n", dirname, ret);
> return ERR_PTR(ret);
>
> lookup_error:
> - inode_unlock(d_inode(dir));
> ret = PTR_ERR(subdir);
> pr_err("Lookup %s failed with error %d\n", dirname, ret);
> return ERR_PTR(ret);
> @@ -679,36 +676,41 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
>
> _enter(",%pD", object->file);
>
> - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> + dentry = start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name));
> else
> dentry = ERR_PTR(ret);
> if (IS_ERR(dentry)) {
> trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> cachefiles_trace_lookup_error);
> _debug("lookup fail %ld", PTR_ERR(dentry));
> - goto out_unlock;
> + goto out;
> }
>
> - if (!d_is_negative(dentry)) {
> + /*
> + * This loop will only execute more than once if some other thread
> + * races to create the object we are trying to create.
> + */
> + while (!d_is_negative(dentry)) {
> ret = cachefiles_unlink(volume->cache, object, fan, dentry,
> FSCACHE_OBJECT_IS_STALE);
> if (ret < 0)
> - goto out_dput;
> + goto out_end;
> +
> + end_creating(dentry, fan);
>
> - dput(dentry);
> ret = cachefiles_inject_read_error();
> if (ret == 0)
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan);
> + dentry = start_creating(&nop_mnt_idmap, fan,
> + &QSTR(object->d_name));
> else
> dentry = ERR_PTR(ret);
> if (IS_ERR(dentry)) {
> trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
> cachefiles_trace_lookup_error);
> _debug("lookup fail %ld", PTR_ERR(dentry));
> - goto out_unlock;
> + goto out;
> }
> }
>
> @@ -729,10 +731,9 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
> success = true;
> }
>
> -out_dput:
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(fan));
> +out_end:
> + end_creating(dentry, fan);
> +out:
> _leave(" = %u", success);
> return success;
> }
> diff --git a/fs/namei.c b/fs/namei.c
> index 9effaad115d9..9972b0257a4c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3221,6 +3221,33 @@ struct dentry *lookup_noperm_positive_unlocked(struct qstr *name,
> }
> EXPORT_SYMBOL(lookup_noperm_positive_unlocked);
>
> +/**
> + * start_creating - prepare to create a given name with permission checking
> + * @idmap: idmap of the mount
> + * @parent: directory in which to prepare to create the name
> + * @name: the name to be created
> + *
> + * Locks are taken and a lookup is performed prior to creating
> + * an object in a directory. Permission checking (MAY_EXEC) is performed
> + * against @idmap.
> + *
> + * If the name already exists, a positive dentry is returned, so
> + * behaviour is similar to O_CREAT without O_EXCL, which doesn't fail
> + * with -EEXIST.
> + *
> + * Returns: a negative or positive dentry, or an error.
> + */
> +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_one_common(idmap, name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, LOOKUP_CREATE);
> +}
> +EXPORT_SYMBOL(start_creating);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> @@ -4306,13 +4333,7 @@ EXPORT_SYMBOL(start_creating_path);
> */
> void end_creating_path(const struct path *path, struct dentry *dentry)
> {
> - if (IS_ERR(dentry))
> - /* The parent is still locked despite the error from
> - * vfs_mkdir() - must unlock it.
> - */
> - inode_unlock(path->dentry->d_inode);
> - else
> - end_dirop(dentry);
> + end_creating(dentry, path->dentry);
> mnt_drop_write(path->mnt);
> path_put(path);
> }
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index b6d03e1ef5f7..e2aac0def2cb 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (host_err)
> return nfserrno(host_err);
>
> - inode_lock_nested(inode, I_MUTEX_PARENT);
> -
> - child = lookup_one(&nop_mnt_idmap,
> - &QSTR_LEN(argp->name, argp->len),
> - parent);
> + child = start_creating(&nop_mnt_idmap, parent,
> + &QSTR_LEN(argp->name, argp->len));
> if (IS_ERR(child)) {
> status = nfserrno(PTR_ERR(child));
> - goto out;
> + goto out_write;
> }
>
> if (d_really_is_negative(child)) {
> @@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs);
>
> out:
> - inode_unlock(inode);
> - if (child && !IS_ERR(child))
> - dput(child);
> + end_creating(child, parent);
> +out_write:
> fh_drop_write(fhp);
> return status;
> }
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index e466cf52d7d7..b2c95e8e7c68 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -264,14 +264,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (is_create_with_attrs(open))
> nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs);
>
> - inode_lock_nested(inode, I_MUTEX_PARENT);
> -
> - child = lookup_one(&nop_mnt_idmap,
> - &QSTR_LEN(open->op_fname, open->op_fnamelen),
> - parent);
> + child = start_creating(&nop_mnt_idmap, parent,
> + &QSTR_LEN(open->op_fname, open->op_fnamelen));
> if (IS_ERR(child)) {
> status = nfserrno(PTR_ERR(child));
> - goto out;
> + goto out_write;
> }
>
> if (d_really_is_negative(child)) {
> @@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (attrs.na_aclerr)
> open->op_bmval[0] &= ~FATTR4_WORD0_ACL;
> out:
> - inode_unlock(inode);
> + end_creating(child, parent);
> nfsd_attrs_free(&attrs);
> - if (child && !IS_ERR(child))
> - dput(child);
> +out_write:
> fh_drop_write(fhp);
> return status;
> }
> diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
> index e2b9472e5c78..c247a7c3291c 100644
> --- a/fs/nfsd/nfs4recover.c
> +++ b/fs/nfsd/nfs4recover.c
> @@ -195,13 +195,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> goto out_creds;
>
> dir = nn->rec_file->f_path.dentry;
> - /* lock the parent */
> - inode_lock(d_inode(dir));
>
> - dentry = lookup_one(&nop_mnt_idmap, &QSTR(dname), dir);
> + dentry = start_creating(&nop_mnt_idmap, dir, &QSTR(dname));
> if (IS_ERR(dentry)) {
> status = PTR_ERR(dentry);
> - goto out_unlock;
> + goto out;
> }
> if (d_really_is_positive(dentry))
> /*
> @@ -212,15 +210,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp)
> * In the 4.0 case, we should never get here; but we may
> * as well be forgiving and just succeed silently.
> */
> - goto out_put;
> + goto out_end;
> dentry = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU);
> if (IS_ERR(dentry))
> status = PTR_ERR(dentry);
> -out_put:
> - if (!status)
> - dput(dentry);
> -out_unlock:
> - inode_unlock(d_inode(dir));
> +out_end:
> + end_creating(dentry, dir);
> +out:
> if (status == 0) {
> if (nn->in_grace)
> __nfsd4_create_reclaim_record_grace(clp, dname,
> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> index 8f71f5748c75..ee1b16e921fd 100644
> --- a/fs/nfsd/nfsproc.c
> +++ b/fs/nfsd/nfsproc.c
> @@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> goto done;
> }
>
> - inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT);
> - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len),
> - dirfhp->fh_dentry);
> + dchild = start_creating(&nop_mnt_idmap, dirfhp->fh_dentry,
> + &QSTR_LEN(argp->name, argp->len));
> if (IS_ERR(dchild)) {
> resp->status = nfserrno(PTR_ERR(dchild));
> - goto out_unlock;
> + goto out_write;
> }
> fh_init(newfhp, NFS_FHSIZE);
> resp->status = fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp);
> if (!resp->status && d_really_is_negative(dchild))
> resp->status = nfserr_noent;
> - dput(dchild);
> if (resp->status) {
> if (resp->status != nfserr_noent)
> goto out_unlock;
> @@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp)
> }
>
> out_unlock:
> - inode_unlock(dirfhp->fh_dentry->d_inode);
> + end_creating(dchild, dirfhp->fh_dentry);
> +out_write:
> fh_drop_write(dirfhp);
> done:
> fh_put(dirfhp);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 9cb20d4aeab1..4efd3688e081 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1521,7 +1521,7 @@ nfsd_check_ignore_resizing(struct iattr *iap)
> iap->ia_valid &= ~ATTR_SIZE;
> }
>
> -/* The parent directory should already be locked: */
> +/* The parent directory should already be locked - we will unlock */
> __be32
> nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
> struct nfsd_attrs *attrs,
> @@ -1587,8 +1587,9 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp,
> err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
>
> out:
> - if (!IS_ERR(dchild))
> - dput(dchild);
> + if (!err)
> + fh_fill_post_attrs(fhp);
> + end_creating(dchild, dentry);
> return err;
>
> out_nfserr:
> @@ -1626,28 +1627,26 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (host_err)
> return nfserrno(host_err);
>
> - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> - dchild = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + dchild = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> host_err = PTR_ERR(dchild);
> - if (IS_ERR(dchild)) {
> - err = nfserrno(host_err);
> - goto out_unlock;
> - }
> + if (IS_ERR(dchild))
> + return nfserrno(host_err);
> +
> err = fh_compose(resfhp, fhp->fh_export, dchild, fhp);
> /*
> * We unconditionally drop our ref to dchild as fh_compose will have
> * already grabbed its own ref for it.
> */
> - dput(dchild);
> if (err)
> goto out_unlock;
> err = fh_fill_pre_attrs(fhp);
> if (err != nfs_ok)
> goto out_unlock;
> err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp);
> - fh_fill_post_attrs(fhp);
> + return err;
> +
> out_unlock:
> - inode_unlock(dentry->d_inode);
> + end_creating(dchild, dentry);
> return err;
> }
>
> @@ -1733,11 +1732,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
>
> dentry = fhp->fh_dentry;
> - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT);
> - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry);
> + dnew = start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen));
> if (IS_ERR(dnew)) {
> err = nfserrno(PTR_ERR(dnew));
> - inode_unlock(dentry->d_inode);
> goto out_drop_write;
> }
> err = fh_fill_pre_attrs(fhp);
> @@ -1750,11 +1747,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp,
> nfsd_create_setattr(rqstp, fhp, resfhp, attrs);
> fh_fill_post_attrs(fhp);
> out_unlock:
> - inode_unlock(dentry->d_inode);
> + end_creating(dnew, dentry);
> if (!err)
> err = nfserrno(commit_metadata(fhp));
> - dput(dnew);
> - if (err==0) err = cerr;
> + if (!err)
> + err = cerr;
> out_drop_write:
> fh_drop_write(fhp);
> out:
> @@ -1809,32 +1806,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>
> ddir = ffhp->fh_dentry;
> dirp = d_inode(ddir);
> - inode_lock_nested(dirp, I_MUTEX_PARENT);
> + dnew = start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len));
>
> - dnew = lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir);
> if (IS_ERR(dnew)) {
> host_err = PTR_ERR(dnew);
> - goto out_unlock;
> + goto out_drop_write;
> }
>
> dold = tfhp->fh_dentry;
>
> err = nfserr_noent;
> if (d_really_is_negative(dold))
> - goto out_dput;
> + goto out_unlock;
> err = fh_fill_pre_attrs(ffhp);
> if (err != nfs_ok)
> - goto out_dput;
> + goto out_unlock;
> host_err = vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL);
> fh_fill_post_attrs(ffhp);
> - inode_unlock(dirp);
> +out_unlock:
> + end_creating(dnew, ddir);
> if (!host_err) {
> host_err = commit_metadata(ffhp);
> if (!host_err)
> host_err = commit_metadata(tfhp);
> }
>
> - dput(dnew);
> out_drop_write:
> fh_drop_write(tfhp);
> if (host_err == -EBUSY) {
> @@ -1849,12 +1845,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
> }
> out:
> return err != nfs_ok ? err : nfserrno(host_err);
> -
> -out_dput:
> - dput(dnew);
> -out_unlock:
> - inode_unlock(dirp);
> - goto out_drop_write;
> }
>
> static void
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index aac7e34f56c1..7a31ca9bdea2 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -613,9 +613,9 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> if (err)
> goto out;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> - upper = ovl_lookup_upper(ofs, c->dentry->d_name.name, upperdir,
> - c->dentry->d_name.len);
> + upper = ovl_start_creating_upper(ofs, upperdir,
> + &QSTR_LEN(c->dentry->d_name.name,
> + c->dentry->d_name.len));
> err = PTR_ERR(upper);
> if (!IS_ERR(upper)) {
> err = ovl_do_link(ofs, ovl_dentry_upper(c->dentry), udir, upper);
> @@ -626,9 +626,8 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c)
> ovl_dentry_set_upper_alias(c->dentry);
> ovl_dentry_update_reval(c->dentry, upper);
> }
> - dput(upper);
> + end_creating(upper, upperdir);
> }
> - inode_unlock(udir);
> if (err)
> goto out;
>
> @@ -894,16 +893,14 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c)
> if (err)
> goto out;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> -
> - upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
> - c->destname.len);
> + upper = ovl_start_creating_upper(ofs, c->destdir,
> + &QSTR_LEN(c->destname.name,
> + c->destname.len));
> err = PTR_ERR(upper);
> if (!IS_ERR(upper)) {
> err = ovl_do_link(ofs, temp, udir, upper);
> - dput(upper);
> + end_creating(upper, c->destdir);
> }
> - inode_unlock(udir);
>
> if (err)
> goto out;
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index a5e9ddf3023b..a8a24abee6b3 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -59,15 +59,21 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> return 0;
> }
>
> -struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> +#define OVL_TEMPNAME_SIZE 20
> +static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> {
> - struct dentry *temp;
> - char name[20];
> static atomic_t temp_id = ATOMIC_INIT(0);
>
> /* counter is allowed to wrap, since temp dentries are ephemeral */
> - snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id));
> + snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
> +}
> +
> +struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> +{
> + struct dentry *temp;
> + char name[OVL_TEMPNAME_SIZE];
>
> + ovl_tempname(name);
> temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
> if (!IS_ERR(temp) && temp->d_inode) {
> pr_err("workdir/%s already exists\n", name);
> @@ -78,45 +84,49 @@ struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> return temp;
> }
>
> +static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
> + struct dentry *workdir)
> +{
> + char name[OVL_TEMPNAME_SIZE];
> +
> + ovl_tempname(name);
> + return start_creating(ovl_upper_mnt_idmap(ofs), workdir,
> + &QSTR(name));
> +}
> +
> static struct dentry *ovl_whiteout(struct ovl_fs *ofs)
> {
> int err;
> - struct dentry *whiteout;
> + struct dentry *whiteout, *link;
> struct dentry *workdir = ofs->workdir;
> struct inode *wdir = workdir->d_inode;
>
> guard(mutex)(&ofs->whiteout_lock);
>
> if (!ofs->whiteout) {
> - inode_lock_nested(wdir, I_MUTEX_PARENT);
> - whiteout = ovl_lookup_temp(ofs, workdir);
> - if (!IS_ERR(whiteout)) {
> - err = ovl_do_whiteout(ofs, wdir, whiteout);
> - if (err) {
> - dput(whiteout);
> - whiteout = ERR_PTR(err);
> - }
> - }
> - inode_unlock(wdir);
> + whiteout = ovl_start_creating_temp(ofs, workdir);
> if (IS_ERR(whiteout))
> return whiteout;
> - ofs->whiteout = whiteout;
> + err = ovl_do_whiteout(ofs, wdir, whiteout);
> + if (!err)
> + ofs->whiteout = dget(whiteout);
> + end_creating(whiteout, workdir);
> + if (err)
> + return ERR_PTR(err);
> }
>
> if (!ofs->no_shared_whiteout) {
> - inode_lock_nested(wdir, I_MUTEX_PARENT);
> - whiteout = ovl_lookup_temp(ofs, workdir);
> - if (!IS_ERR(whiteout)) {
> - err = ovl_do_link(ofs, ofs->whiteout, wdir, whiteout);
> - if (err) {
> - dput(whiteout);
> - whiteout = ERR_PTR(err);
> - }
> - }
> - inode_unlock(wdir);
> - if (!IS_ERR(whiteout))
> - return whiteout;
> - if (PTR_ERR(whiteout) != -EMLINK) {
> + link = ovl_start_creating_temp(ofs, workdir);
> + if (IS_ERR(link))
> + return link;
> + err = ovl_do_link(ofs, ofs->whiteout, wdir, link);
> + if (!err)
> + whiteout = dget(link);
> + end_creating(link, workdir);
> + if (!err)
> + return whiteout;
> +
> + if (err != -EMLINK) {
> pr_warn("Failed to link whiteout - disabling whiteout inode sharing(nlink=%u, err=%lu)\n",
> ofs->whiteout->d_inode->i_nlink,
> PTR_ERR(whiteout));
> @@ -252,10 +262,13 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> struct ovl_cattr *attr)
> {
> struct dentry *ret;
> - inode_lock_nested(workdir->d_inode, I_MUTEX_PARENT);
> - ret = ovl_create_real(ofs, workdir,
> - ovl_lookup_temp(ofs, workdir), attr);
> - inode_unlock(workdir->d_inode);
> + ret = ovl_start_creating_temp(ofs, workdir);
> + if (IS_ERR(ret))
> + return ret;
> + ret = ovl_create_real(ofs, workdir, ret, attr);
> + if (!IS_ERR(ret))
> + dget(ret);
> + end_creating(ret, workdir);
> return ret;
> }
>
> @@ -354,18 +367,21 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
> {
> struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> struct dentry *upperdir = ovl_dentry_upper(dentry->d_parent);
> - struct inode *udir = upperdir->d_inode;
> struct dentry *newdentry;
> int err;
>
> - inode_lock_nested(udir, I_MUTEX_PARENT);
> - newdentry = ovl_create_real(ofs, upperdir,
> - ovl_lookup_upper(ofs, dentry->d_name.name,
> - upperdir, dentry->d_name.len),
> - attr);
> - inode_unlock(udir);
> + newdentry = ovl_start_creating_upper(ofs, upperdir,
> + &QSTR_LEN(dentry->d_name.name,
> + dentry->d_name.len));
> if (IS_ERR(newdentry))
> return PTR_ERR(newdentry);
> + newdentry = ovl_create_real(ofs, upperdir, newdentry, attr);
> + if (IS_ERR(newdentry)) {
> + end_creating(newdentry, upperdir);
> + return PTR_ERR(newdentry);
> + }
> + dget(newdentry);
> + end_creating(newdentry, upperdir);
>
> if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
> !ovl_allow_offline_changes(ofs)) {
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index c8fd5951fc5e..beeba96cfcb2 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -415,6 +415,14 @@ static inline struct dentry *ovl_lookup_upper_unlocked(struct ovl_fs *ofs,
> &QSTR_LEN(name, len), base);
> }
>
> +static inline struct dentry *ovl_start_creating_upper(struct ovl_fs *ofs,
> + struct dentry *parent,
> + struct qstr *name)
> +{
> + return start_creating(ovl_upper_mnt_idmap(ofs),
> + parent, name);
> +}
> +
> static inline bool ovl_open_flags_need_copy_up(int flags)
> {
> if (!flags)
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 43ee4c7296a7..6e0816c1147a 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -310,8 +310,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> bool retried = false;
>
> retry:
> - inode_lock_nested(dir, I_MUTEX_PARENT);
> - work = ovl_lookup_upper(ofs, name, ofs->workbasedir, strlen(name));
> + work = ovl_start_creating_upper(ofs, ofs->workbasedir, &QSTR(name));
>
> if (!IS_ERR(work)) {
> struct iattr attr = {
> @@ -320,14 +319,13 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> };
>
> if (work->d_inode) {
> + dget(work);
> + end_creating(work, ofs->workbasedir);
> + if (persist)
> + return work;
> err = -EEXIST;
> - inode_unlock(dir);
> if (retried)
> goto out_dput;
> -
> - if (persist)
> - return work;
> -
> retried = true;
> err = ovl_workdir_cleanup(ofs, ofs->workbasedir, mnt, work, 0);
> dput(work);
> @@ -338,7 +336,9 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> }
>
> work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> - inode_unlock(dir);
> + if (!IS_ERR(work))
> + dget(work);
> + end_creating(work, ofs->workbasedir);
> err = PTR_ERR(work);
> if (IS_ERR(work))
> goto out_err;
> @@ -376,7 +376,6 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> if (err)
> goto out_dput;
> } else {
> - inode_unlock(dir);
> err = PTR_ERR(work);
> goto out_err;
> }
> @@ -626,14 +625,17 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
> struct dentry *parent,
> const char *name, umode_t mode)
> {
> - size_t len = strlen(name);
> struct dentry *child;
>
> - inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> - child = ovl_lookup_upper(ofs, name, parent, len);
> - if (!IS_ERR(child) && !child->d_inode)
> - child = ovl_create_real(ofs, parent, child, OVL_CATTR(mode));
> - inode_unlock(parent->d_inode);
> + child = ovl_start_creating_upper(ofs, parent, &QSTR(name));
> + if (!IS_ERR(child)) {
> + if (!child->d_inode)
> + child = ovl_create_real(ofs, parent, child,
> + OVL_CATTR(mode));
> + if (!IS_ERR(child))
> + dget(child);
> + end_creating(child, parent);
> + }
> dput(parent);
>
> return child;
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index fed86221c69c..3f92c1a16878 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -88,6 +88,39 @@ struct dentry *lookup_one_positive_killable(struct mnt_idmap *idmap,
> struct qstr *name,
> struct dentry *base);
>
> +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> + struct qstr *name);
> +
> +/**
> + * end_creating - finish action started with start_creating
> + * @child: dentry returned by start_creating() or vfs_mkdir()
> + * @parent: dentry given to start_creating(),
> + *
> + * Unlock and release the child.
> + *
> + * Unlike end_dirop() this can only be called if start_creating() succeeded.
> + * It handles @child being an error as vfs_mkdir() might have converted the
> + * dentry to an error - in that case the parent still needs to be unlocked.
> + *
> + * If vfs_mkdir() was called then the value returned from that function
> + * should be given for @child rather than the original dentry, as vfs_mkdir()
> + * may have provided a new dentry. Even if vfs_mkdir() returns an error
> + * it must be given to end_creating().
> + *
> + * If vfs_mkdir() was not called, then @child will be a valid dentry and
> + * @parent will be ignored.
> + */
> +static inline void end_creating(struct dentry *child, struct dentry *parent)
> +{
> + if (IS_ERR(child))
> + /* The parent is still locked despite the error from
> + * vfs_mkdir() - must unlock it.
> + */
> + inode_unlock(parent->d_inode);
> + else
> + end_dirop(child);
> +}
> +
> extern int follow_down_one(struct path *);
> extern int follow_down(struct path *path, unsigned int flags);
> extern int follow_up(struct path *);
> --
> 2.50.0.107.gf914562f5916.dirty
>
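For anyone following the end_creating() contract quoted above, here is
a minimal userspace sketch of its two paths. It is not kernel code: the
model_* names, the lock flag and the refcount field are stand-ins
invented for illustration, and model_end_dirop() assumes that
end_dirop() unlocks the child's parent and drops the child reference,
which is what the kernel-doc comment suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
static inline void *model_err_ptr(long error)
{
	return (void *)error;
}

static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical model types: a directory with a lock flag, a child with a refcount. */
struct model_dir {
	int locked;
};

struct model_dentry {
	struct model_dir *parent;
	int refcount;
};

/* Assumed behaviour of end_dirop(): unlock the child's parent, drop the child ref. */
static inline void model_end_dirop(struct model_dentry *child)
{
	child->parent->locked = 0;
	child->refcount--;
}

/*
 * Mirrors the shape of end_creating(): when @child is an error pointer
 * there is no dentry to reach the parent through, so @parent is used to
 * unlock; otherwise @parent is ignored.
 */
static inline void model_end_creating(struct model_dentry *child,
				      struct model_dir *parent)
{
	if (model_is_err(child))
		parent->locked = 0;
	else
		model_end_dirop(child);
}

static inline void model_end_creating_demo(void)
{
	struct model_dir dir = { .locked = 1 };
	struct model_dentry child = { .parent = &dir, .refcount = 1 };

	/* Success path: lock and reference are both released. */
	model_end_creating(&child, &dir);
	assert(!dir.locked && child.refcount == 0);

	/* mkdir-style failure: only the parent lock is left to release. */
	dir.locked = 1;
	model_end_creating(model_err_ptr(-EIO), &dir);
	assert(!dir.locked);
}
```

This is also why callers in the quoted diff that want to keep the
dentry after end_creating() take an extra dget() first: in this model
the reference obtained at start time is always dropped on the success
path.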
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()
2025-10-15 1:46 ` [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
@ 2025-10-19 10:15 ` Amir Goldstein
2025-10-22 3:20 ` NeilBrown
2025-10-20 8:36 ` kernel test robot
1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:15 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> xfs, fuse, ipc/mqueue need variants of start_creating or start_removing
> which do not check permissions.
> This patch adds _noperm versions of these functions.
>
> Note that do_mq_open() was only calling mntget() so it could call
> path_put() - it didn't really need an extra reference on the mnt.
> Now it doesn't call mntget() and uses end_creating() which does
> the dput() half of path_put().
>
> Signed-off-by: NeilBrown <neil@brown.name>
I noticed that both Jeff and I had already given our Reviewed-by on v1,
but the tags are missing here. Has this patch changed in some fundamental
way since v1? A "changes since v1" section would really help when that happens.
Otherwise, feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/fuse/dir.c | 19 +++++++---------
> fs/namei.c | 48 ++++++++++++++++++++++++++++++++++++++++
> fs/xfs/scrub/orphanage.c | 11 ++++-----
> include/linux/namei.h | 2 ++
> ipc/mqueue.c | 31 +++++++++-----------------
> 5 files changed, 73 insertions(+), 38 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index ecaec0fea3a1..40ca94922349 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1397,27 +1397,25 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> if (!parent)
> return -ENOENT;
>
> - inode_lock_nested(parent, I_MUTEX_PARENT);
> if (!S_ISDIR(parent->i_mode))
> - goto unlock;
> + goto put_parent;
>
> err = -ENOENT;
> dir = d_find_alias(parent);
> if (!dir)
> - goto unlock;
> + goto put_parent;
>
> - name->hash = full_name_hash(dir, name->name, name->len);
> - entry = d_lookup(dir, name);
> + entry = start_removing_noperm(dir, name);
> dput(dir);
> - if (!entry)
> - goto unlock;
> + if (IS_ERR(entry))
> + goto put_parent;
>
> fuse_dir_changed(parent);
> if (!(flags & FUSE_EXPIRE_ONLY))
> d_invalidate(entry);
> fuse_invalidate_entry_cache(entry);
>
> - if (child_nodeid != 0 && d_really_is_positive(entry)) {
> + if (child_nodeid != 0) {
> inode_lock(d_inode(entry));
> if (get_node_id(d_inode(entry)) != child_nodeid) {
> err = -ENOENT;
> @@ -1445,10 +1443,9 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> } else {
> err = 0;
> }
> - dput(entry);
>
> - unlock:
> - inode_unlock(parent);
> + end_removing(entry);
> + put_parent:
> iput(parent);
> return err;
> }
> diff --git a/fs/namei.c b/fs/namei.c
> index ae833dfa277c..696e4b794416 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3275,6 +3275,54 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing);
>
> +/**
> + * start_creating_noperm - prepare to create a given name without permission checking
> + * @parent: directory in which to prepare to create the name
> + * @name: the name to be created
> + *
> + * Locks are taken and a lookup is performed prior to creating
> + * an object in a directory.
> + *
> + * If the name already exists, a positive dentry is returned.
> + *
> + * Returns: a negative or positive dentry, or an error.
> + */
> +struct dentry *start_creating_noperm(struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_noperm_common(name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, LOOKUP_CREATE);
> +}
> +EXPORT_SYMBOL(start_creating_noperm);
> +
> +/**
> + * start_removing_noperm - prepare to remove a given name without permission checking
> + * @parent: directory in which to find the name
> + * @name: the name to be removed
> + *
> + * Locks are taken and a lookup is performed prior to removing
> + * an object from a directory.
> + *
> + * If the name doesn't exist, an error is returned.
> + *
> + * end_removing() should be called when removal is complete, or aborted.
> + *
> + * Returns: a positive dentry, or an error.
> + */
> +struct dentry *start_removing_noperm(struct dentry *parent,
> + struct qstr *name)
> +{
> + int err = lookup_noperm_common(name, parent);
> +
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, name, 0);
> +}
> +EXPORT_SYMBOL(start_removing_noperm);
> +
> #ifdef CONFIG_UNIX98_PTYS
> int path_pts(struct path *path)
> {
> diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
> index 9c12cb844231..e732605924a1 100644
> --- a/fs/xfs/scrub/orphanage.c
> +++ b/fs/xfs/scrub/orphanage.c
> @@ -152,11 +152,10 @@ xrep_orphanage_create(
> }
>
> /* Try to find the orphanage directory. */
> - inode_lock_nested(root_inode, I_MUTEX_PARENT);
> - orphanage_dentry = lookup_noperm(&QSTR(ORPHANAGE), root_dentry);
> + orphanage_dentry = start_creating_noperm(root_dentry, &QSTR(ORPHANAGE));
> if (IS_ERR(orphanage_dentry)) {
> error = PTR_ERR(orphanage_dentry);
> - goto out_unlock_root;
> + goto out_dput_root;
> }
>
> /*
> @@ -170,7 +169,7 @@ xrep_orphanage_create(
> orphanage_dentry, 0750);
> error = PTR_ERR(orphanage_dentry);
> if (IS_ERR(orphanage_dentry))
> - goto out_unlock_root;
> + goto out_dput_orphanage;
> }
>
> /* Not a directory? Bail out. */
> @@ -200,9 +199,7 @@ xrep_orphanage_create(
> sc->orphanage_ilock_flags = 0;
>
> out_dput_orphanage:
> - dput(orphanage_dentry);
> -out_unlock_root:
> - inode_unlock(VFS_I(sc->mp->m_rootip));
> + end_creating(orphanage_dentry, root_dentry);
> out_dput_root:
> dput(root_dentry);
> out:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9ee76e88f3dd..688e157d6afc 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -92,6 +92,8 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> struct qstr *name);
> +struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> +struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
>
> /**
> * end_creating - finish action started with start_creating
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index 093551fe66a7..060e8e9c4f59 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -913,13 +913,11 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> goto out_putname;
>
> ro = mnt_want_write(mnt); /* we'll drop it in any case */
> - inode_lock(d_inode(root));
> - path.dentry = lookup_noperm(&QSTR(name->name), root);
> + path.dentry = start_creating_noperm(root, &QSTR(name->name));
> if (IS_ERR(path.dentry)) {
> error = PTR_ERR(path.dentry);
> goto out_putfd;
> }
> - path.mnt = mntget(mnt);
> error = prepare_open(path.dentry, oflag, ro, mode, name, attr);
> if (!error) {
> struct file *file = dentry_open(&path, oflag, current_cred());
> @@ -928,13 +926,12 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> else
> error = PTR_ERR(file);
> }
> - path_put(&path);
> out_putfd:
> if (error) {
> put_unused_fd(fd);
> fd = error;
> }
> - inode_unlock(d_inode(root));
> + end_creating(path.dentry, root);
> if (!ro)
> mnt_drop_write(mnt);
> out_putname:
> @@ -957,7 +954,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> int err;
> struct filename *name;
> struct dentry *dentry;
> - struct inode *inode = NULL;
> + struct inode *inode;
> struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
> struct vfsmount *mnt = ipc_ns->mq_mnt;
>
> @@ -969,26 +966,20 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> err = mnt_want_write(mnt);
> if (err)
> goto out_name;
> - inode_lock_nested(d_inode(mnt->mnt_root), I_MUTEX_PARENT);
> - dentry = lookup_noperm(&QSTR(name->name), mnt->mnt_root);
> + dentry = start_removing_noperm(mnt->mnt_root, &QSTR(name->name));
> if (IS_ERR(dentry)) {
> err = PTR_ERR(dentry);
> - goto out_unlock;
> + goto out_drop_write;
> }
>
> inode = d_inode(dentry);
> - if (!inode) {
> - err = -ENOENT;
> - } else {
> - ihold(inode);
> - err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> - dentry, NULL);
> - }
> - dput(dentry);
> -
> -out_unlock:
> - inode_unlock(d_inode(mnt->mnt_root));
> + ihold(inode);
> + err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> + dentry, NULL);
> + end_removing(dentry);
> iput(inode);
> +
> +out_drop_write:
> mnt_drop_write(mnt);
> out_name:
> putname(name);
> --
> 2.50.0.107.gf914562f5916.dirty
>
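The quoted patch drops the "if (!inode)" check in mq_unlink() and the
d_really_is_positive() test in fuse, relying on start_removing_noperm()
never returning a negative dentry. A rough userspace sketch of that
policy follows. It assumes the lookup underneath start_dirop() rejects
negative dentries unless a create flag is passed, and rejects positive
ones when an exclusive flag is passed. The MODEL_* flag values and the
function names are made up for illustration; they are not the kernel's
LOOKUP_* definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits only, not the real LOOKUP_* values. */
#define MODEL_LOOKUP_CREATE 0x1u
#define MODEL_LOOKUP_EXCL   0x2u

/*
 * Returns 0 when the caller would be handed a dentry, or a negative
 * errno when the lookup result is rejected for the given flags.
 */
static inline int model_lookup_policy(bool positive, unsigned int flags)
{
	if (!positive && !(flags & MODEL_LOOKUP_CREATE))
		return -ENOENT;
	if (positive && (flags & MODEL_LOOKUP_EXCL))
		return -EEXIST;
	return 0;
}

/* start_creating_noperm() passes the create flag: either kind of dentry is fine. */
static inline int model_start_creating(bool positive)
{
	return model_lookup_policy(positive, MODEL_LOOKUP_CREATE);
}

/* start_removing_noperm() passes no flags: the name must already exist. */
static inline int model_start_removing(bool positive)
{
	return model_lookup_policy(positive, 0);
}

static inline void model_lookup_demo(void)
{
	assert(model_start_creating(false) == 0);
	assert(model_start_creating(true) == 0);
	assert(model_start_removing(true) == 0);
	assert(model_start_removing(false) == -ENOENT);
}
```

Under that assumption a missing name surfaces as an error at start
time, so callers only need the IS_ERR() check that the patch already
has.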
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
2025-10-15 1:47 ` [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming() NeilBrown
@ 2025-10-19 10:25 ` Amir Goldstein
2025-10-19 10:33 ` Amir Goldstein
2025-10-22 3:35 ` NeilBrown
0 siblings, 2 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:25 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> start_renaming() combines name lookup and locking to prepare for rename.
> It is used when two names need to be looked up as in nfsd and overlayfs -
> cases where one or both dentries are already available will be handled
> separately.
>
> __start_renaming() avoids the inode_permission check and hash
> calculation and is suitable after filename_parentat() in do_renameat2().
> It subsumes quite a bit of code from that function.
>
> start_renaming() does calculate the hash and check X permission and is
> suitable elsewhere:
> - nfsd_rename()
> - ovl_rename()
>
> Signed-off-by: NeilBrown <neil@brown.name>
The review comments from v1 have not been addressed:
https://lore.kernel.org/linux-fsdevel/CAOQ4uxh+NcAv9v6NtVRrLCMYbpd0ajtvsd6c9-W2a7+vur0UJQ@mail.gmail.com/
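
For reference while reading the diff below, the ancestor ("trap")
checks that move from do_renameat2() and nfsd_rename() into
__start_renaming() can be summarised by this small userspace sketch.
The function name and the MODEL_RENAME_EXCHANGE value are hypothetical.
It also assumes that lock_rename() returns NULL when neither parent is
an ancestor of the other, so a NULL trap never matches a real dentry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bit only, not the UAPI RENAME_EXCHANGE definition. */
#define MODEL_RENAME_EXCHANGE 0x2u

/*
 * Returns 0 if the rename may proceed, or the errno chosen for an
 * ancestor conflict. Both dentry pointers are assumed to be non-NULL.
 */
static inline int model_rename_trap_check(const void *old_dentry,
					  const void *new_dentry,
					  const void *trap,
					  unsigned int flags)
{
	/* Source is an ancestor of the target. */
	if (old_dentry == trap)
		return -EINVAL;

	/* Target is an ancestor of the source. */
	if (new_dentry == trap)
		return (flags & MODEL_RENAME_EXCHANGE) ? -EINVAL : -ENOTEMPTY;

	return 0;
}

static inline void model_rename_trap_demo(void)
{
	int a, b;

	assert(model_rename_trap_check(&a, &b, NULL, 0) == 0);
	assert(model_rename_trap_check(&a, &b, &a, 0) == -EINVAL);
	assert(model_rename_trap_check(&a, &b, &b, 0) == -ENOTEMPTY);
	assert(model_rename_trap_check(&a, &b, &b,
				       MODEL_RENAME_EXCHANGE) == -EINVAL);
}
```
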
> ---
> fs/namei.c | 197 ++++++++++++++++++++++++++++-----------
> fs/nfsd/vfs.c | 73 +++++----------
> fs/overlayfs/dir.c | 72 ++++++--------
> fs/overlayfs/overlayfs.h | 14 +++
> include/linux/namei.h | 3 +
> 5 files changed, 214 insertions(+), 145 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 04d2819bd351..a2553df8f34e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3667,6 +3667,129 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
> }
> EXPORT_SYMBOL(unlock_rename);
>
> +/**
> + * __start_renaming - lookup and lock names for rename
> + * @rd: rename data containing parent and flags, and
> + * for receiving found dentries
> + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> + * LOOKUP_NO_SYMLINKS etc).
> + * @old_last: name of object in @rd.old_parent
> + * @new_last: name of object in @rd.new_parent
> + *
> + * Look up two names and ensure locks are in place for
> + * rename.
> + *
> + * On success the found dentries are stored in @rd.old_dentry,
> + * @rd.new_dentry. These references and the lock are dropped by
> + * end_renaming().
> + *
> + * The passed in qstrs must have the hash calculated, and no permission
> + * checking is performed.
> + *
> + * Returns: zero or an error.
> + */
> +static int
> +__start_renaming(struct renamedata *rd, int lookup_flags,
> + struct qstr *old_last, struct qstr *new_last)
> +{
> + struct dentry *trap;
> + struct dentry *d1, *d2;
> + int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> + int err;
> +
> + if (rd->flags & RENAME_EXCHANGE)
> + target_flags = 0;
> + if (rd->flags & RENAME_NOREPLACE)
> + target_flags |= LOOKUP_EXCL;
> +
> + trap = lock_rename(rd->old_parent, rd->new_parent);
> + if (IS_ERR(trap))
> + return PTR_ERR(trap);
> +
> + d1 = lookup_one_qstr_excl(old_last, rd->old_parent,
> + lookup_flags);
> + if (IS_ERR(d1))
> + goto out_unlock_1;
> +
> + d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
> + lookup_flags | target_flags);
> + if (IS_ERR(d2))
> + goto out_unlock_2;
> +
> + if (d1 == trap) {
> + /* source is an ancestor of target */
> + err = -EINVAL;
> + goto out_unlock_3;
> + }
> +
> + if (d2 == trap) {
> + /* target is an ancestor of source */
> + if (rd->flags & RENAME_EXCHANGE)
> + err = -EINVAL;
> + else
> + err = -ENOTEMPTY;
> + goto out_unlock_3;
> + }
> +
> + rd->old_dentry = d1;
> + rd->new_dentry = d2;
> + return 0;
> +
> +out_unlock_3:
> + dput(d2);
> + d2 = ERR_PTR(err);
> +out_unlock_2:
> + dput(d1);
> + d1 = d2;
> +out_unlock_1:
> + unlock_rename(rd->old_parent, rd->new_parent);
> + return PTR_ERR(d1);
> +}
> +
> +/**
> + * start_renaming - lookup and lock names for rename with permission checking
> + * @rd: rename data containing parent and flags, and
> + * for receiving found dentries
> + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> + * LOOKUP_NO_SYMLINKS etc).
> + * @old_last: name of object in @rd.old_parent
> + * @new_last: name of object in @rd.new_parent
> + *
> + * Look up two names and ensure locks are in place for
> + * rename.
> + *
> + * On success the found dentries are stored in @rd.old_dentry,
> + * @rd.new_dentry. These references and the lock are dropped by
> + * end_renaming().
> + *
> + * The passed in qstrs need not have the hash calculated, and basic
> + * eXecute permission checking is performed against @rd.mnt_idmap.
> + *
> + * Returns: zero or an error.
> + */
> +int start_renaming(struct renamedata *rd, int lookup_flags,
> + struct qstr *old_last, struct qstr *new_last)
> +{
> + int err;
> +
> + err = lookup_one_common(rd->mnt_idmap, old_last, rd->old_parent);
> + if (err)
> + return err;
> + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> + if (err)
> + return err;
> + return __start_renaming(rd, lookup_flags, old_last, new_last);
> +}
> +EXPORT_SYMBOL(start_renaming);
> +
> +void end_renaming(struct renamedata *rd)
> +{
> + unlock_rename(rd->old_parent, rd->new_parent);
> + dput(rd->old_dentry);
> + dput(rd->new_dentry);
> +}
> +EXPORT_SYMBOL(end_renaming);
> +
> /**
> * vfs_prepare_mode - prepare the mode to be used for a new inode
> * @idmap: idmap of the mount the inode was found from
> @@ -5504,14 +5627,11 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> struct filename *to, unsigned int flags)
> {
> struct renamedata rd;
> - struct dentry *old_dentry, *new_dentry;
> - struct dentry *trap;
> struct path old_path, new_path;
> struct qstr old_last, new_last;
> int old_type, new_type;
> struct inode *delegated_inode = NULL;
> - unsigned int lookup_flags = 0, target_flags =
> - LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> + unsigned int lookup_flags = 0;
> bool should_retry = false;
> int error = -EINVAL;
>
> @@ -5522,11 +5642,6 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> (flags & RENAME_EXCHANGE))
> goto put_names;
>
> - if (flags & RENAME_EXCHANGE)
> - target_flags = 0;
> - if (flags & RENAME_NOREPLACE)
> - target_flags |= LOOKUP_EXCL;
> -
> retry:
> error = filename_parentat(olddfd, from, lookup_flags, &old_path,
> &old_last, &old_type);
> @@ -5556,66 +5671,40 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> goto exit2;
>
> retry_deleg:
> - trap = lock_rename(new_path.dentry, old_path.dentry);
> - if (IS_ERR(trap)) {
> - error = PTR_ERR(trap);
> + rd.old_parent = old_path.dentry;
> + rd.mnt_idmap = mnt_idmap(old_path.mnt);
> + rd.new_parent = new_path.dentry;
> + rd.delegated_inode = &delegated_inode;
> + rd.flags = flags;
> +
> + error = __start_renaming(&rd, lookup_flags, &old_last, &new_last);
> + if (error)
> goto exit_lock_rename;
> - }
>
> - old_dentry = lookup_one_qstr_excl(&old_last, old_path.dentry,
> - lookup_flags);
> - error = PTR_ERR(old_dentry);
> - if (IS_ERR(old_dentry))
> - goto exit3;
> - new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
> - lookup_flags | target_flags);
> - error = PTR_ERR(new_dentry);
> - if (IS_ERR(new_dentry))
> - goto exit4;
> if (flags & RENAME_EXCHANGE) {
> - if (!d_is_dir(new_dentry)) {
> + if (!d_is_dir(rd.new_dentry)) {
> error = -ENOTDIR;
> if (new_last.name[new_last.len])
> - goto exit5;
> + goto exit_unlock;
> }
> }
> /* unless the source is a directory trailing slashes give -ENOTDIR */
> - if (!d_is_dir(old_dentry)) {
> + if (!d_is_dir(rd.old_dentry)) {
> error = -ENOTDIR;
> if (old_last.name[old_last.len])
> - goto exit5;
> + goto exit_unlock;
> if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
> - goto exit5;
> - }
> - /* source should not be ancestor of target */
> - error = -EINVAL;
> - if (old_dentry == trap)
> - goto exit5;
> - /* target should not be an ancestor of source */
> - if (!(flags & RENAME_EXCHANGE))
> - error = -ENOTEMPTY;
> - if (new_dentry == trap)
> - goto exit5;
> + goto exit_unlock;
> + }
>
> - error = security_path_rename(&old_path, old_dentry,
> - &new_path, new_dentry, flags);
> + error = security_path_rename(&old_path, rd.old_dentry,
> + &new_path, rd.new_dentry, flags);
> if (error)
> - goto exit5;
> + goto exit_unlock;
>
> - rd.old_parent = old_path.dentry;
> - rd.old_dentry = old_dentry;
> - rd.mnt_idmap = mnt_idmap(old_path.mnt);
> - rd.new_parent = new_path.dentry;
> - rd.new_dentry = new_dentry;
> - rd.delegated_inode = &delegated_inode;
> - rd.flags = flags;
> error = vfs_rename(&rd);
> -exit5:
> - dput(new_dentry);
> -exit4:
> - dput(old_dentry);
> -exit3:
> - unlock_rename(new_path.dentry, old_path.dentry);
> +exit_unlock:
> + end_renaming(&rd);
> exit_lock_rename:
> if (delegated_inode) {
> error = break_deleg_wait(&delegated_inode);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index cd64ffe12e0b..62109885d4db 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1885,11 +1885,12 @@ __be32
> nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> struct svc_fh *tfhp, char *tname, int tlen)
> {
> - struct dentry *fdentry, *tdentry, *odentry, *ndentry, *trap;
> + struct dentry *fdentry, *tdentry;
> int type = S_IFDIR;
> + struct renamedata rd = {};
> __be32 err;
> int host_err;
> - bool close_cached = false;
> + struct dentry *close_cached;
>
> trace_nfsd_vfs_rename(rqstp, ffhp, tfhp, fname, flen, tname, tlen);
>
> @@ -1915,15 +1916,22 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> goto out;
>
> retry:
> + close_cached = NULL;
> host_err = fh_want_write(ffhp);
> if (host_err) {
> err = nfserrno(host_err);
> goto out;
> }
>
> - trap = lock_rename(tdentry, fdentry);
> - if (IS_ERR(trap)) {
> - err = nfserr_xdev;
> + rd.mnt_idmap = &nop_mnt_idmap;
> + rd.old_parent = fdentry;
> + rd.new_parent = tdentry;
> +
> + host_err = start_renaming(&rd, 0, &QSTR_LEN(fname, flen),
> + &QSTR_LEN(tname, tlen));
> +
> + if (host_err) {
> + err = nfserrno(host_err);
> goto out_want_write;
> }
> err = fh_fill_pre_attrs(ffhp);
> @@ -1933,48 +1941,23 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> if (err != nfs_ok)
> goto out_unlock;
>
> - odentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), fdentry);
> - host_err = PTR_ERR(odentry);
> - if (IS_ERR(odentry))
> - goto out_nfserr;
> + type = d_inode(rd.old_dentry)->i_mode & S_IFMT;
> +
> + if (d_inode(rd.new_dentry))
> + type = d_inode(rd.new_dentry)->i_mode & S_IFMT;
>
> - host_err = -ENOENT;
> - if (d_really_is_negative(odentry))
> - goto out_dput_old;
> - host_err = -EINVAL;
> - if (odentry == trap)
> - goto out_dput_old;
> - type = d_inode(odentry)->i_mode & S_IFMT;
> -
> - ndentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(tname, tlen), tdentry);
> - host_err = PTR_ERR(ndentry);
> - if (IS_ERR(ndentry))
> - goto out_dput_old;
> - if (d_inode(ndentry))
> - type = d_inode(ndentry)->i_mode & S_IFMT;
> - host_err = -ENOTEMPTY;
> - if (ndentry == trap)
> - goto out_dput_new;
> -
> - if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
> - nfsd_has_cached_files(ndentry)) {
> - close_cached = true;
> - goto out_dput_old;
> + if ((rd.new_dentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
> + nfsd_has_cached_files(rd.new_dentry)) {
> + close_cached = dget(rd.new_dentry);
> + goto out_unlock;
> } else {
> - struct renamedata rd = {
> - .mnt_idmap = &nop_mnt_idmap,
> - .old_parent = fdentry,
> - .old_dentry = odentry,
> - .new_parent = tdentry,
> - .new_dentry = ndentry,
> - };
> int retries;
>
> for (retries = 1;;) {
> host_err = vfs_rename(&rd);
> if (host_err != -EAGAIN || !retries--)
> break;
> - if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry)))
> + if (!nfsd_wait_for_delegreturn(rqstp, d_inode(rd.old_dentry)))
> break;
> }
> if (!host_err) {
> @@ -1983,11 +1966,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> host_err = commit_metadata(ffhp);
> }
> }
> - out_dput_new:
> - dput(ndentry);
> - out_dput_old:
> - dput(odentry);
> - out_nfserr:
> if (host_err == -EBUSY) {
> /*
> * See RFC 8881 Section 18.26.4 para 1-3: NFSv4 RENAME
> @@ -2006,7 +1984,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> fh_fill_post_attrs(tfhp);
> }
> out_unlock:
> - unlock_rename(tdentry, fdentry);
> + end_renaming(&rd);
> out_want_write:
> fh_drop_write(ffhp);
>
> @@ -2017,9 +1995,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> * until this point and then reattempt the whole shebang.
> */
> if (close_cached) {
> - close_cached = false;
> - nfsd_close_cached_files(ndentry);
> - dput(ndentry);
> + nfsd_close_cached_files(close_cached);
> + dput(close_cached);
> goto retry;
> }
> out:
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index c8d0885ee5e0..ded86855e91c 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -1124,9 +1124,7 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> int err;
> struct dentry *old_upperdir;
> struct dentry *new_upperdir;
> - struct dentry *olddentry = NULL;
> - struct dentry *newdentry = NULL;
> - struct dentry *trap, *de;
> + struct renamedata rd = {};
> bool old_opaque;
> bool new_opaque;
> bool cleanup_whiteout = false;
> @@ -1233,29 +1231,21 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> }
> }
>
> - trap = lock_rename(new_upperdir, old_upperdir);
> - if (IS_ERR(trap)) {
> - err = PTR_ERR(trap);
> - goto out_revert_creds;
> - }
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = old_upperdir;
> + rd.new_parent = new_upperdir;
> + rd.flags = flags;
>
> - de = ovl_lookup_upper(ofs, old->d_name.name, old_upperdir,
> - old->d_name.len);
> - err = PTR_ERR(de);
> - if (IS_ERR(de))
> - goto out_unlock;
> - olddentry = de;
> + err = start_renaming(&rd, 0,
> + &QSTR_LEN(old->d_name.name, old->d_name.len),
> + &QSTR_LEN(new->d_name.name, new->d_name.len));
>
> - err = -ESTALE;
> - if (!ovl_matches_upper(old, olddentry))
> - goto out_unlock;
> + if (err)
> + goto out_revert_creds;
>
> - de = ovl_lookup_upper(ofs, new->d_name.name, new_upperdir,
> - new->d_name.len);
> - err = PTR_ERR(de);
> - if (IS_ERR(de))
> + err = -ESTALE;
> + if (!ovl_matches_upper(old, rd.old_dentry))
> goto out_unlock;
> - newdentry = de;
>
> old_opaque = ovl_dentry_is_opaque(old);
> new_opaque = ovl_dentry_is_opaque(new);
> @@ -1263,15 +1253,15 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> err = -ESTALE;
> if (d_inode(new) && ovl_dentry_upper(new)) {
> if (opaquedir) {
> - if (newdentry != opaquedir)
> + if (rd.new_dentry != opaquedir)
> goto out_unlock;
> } else {
> - if (!ovl_matches_upper(new, newdentry))
> + if (!ovl_matches_upper(new, rd.new_dentry))
> goto out_unlock;
> }
> } else {
> - if (!d_is_negative(newdentry)) {
> - if (!new_opaque || !ovl_upper_is_whiteout(ofs, newdentry))
> + if (!d_is_negative(rd.new_dentry)) {
> + if (!new_opaque || !ovl_upper_is_whiteout(ofs, rd.new_dentry))
> goto out_unlock;
> } else {
> if (flags & RENAME_EXCHANGE)
> @@ -1279,19 +1269,14 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> }
> }
>
> - if (olddentry == trap)
> - goto out_unlock;
> - if (newdentry == trap)
> - goto out_unlock;
> -
> - if (olddentry->d_inode == newdentry->d_inode)
> + if (rd.old_dentry->d_inode == rd.new_dentry->d_inode)
> goto out_unlock;
>
> err = 0;
> if (ovl_type_merge_or_lower(old))
> err = ovl_set_redirect(old, samedir);
> else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
> - err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
> + err = ovl_set_opaque_xerr(old, rd.old_dentry, -EXDEV);
> if (err)
> goto out_unlock;
>
> @@ -1299,19 +1284,22 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> err = ovl_set_redirect(new, samedir);
> else if (!overwrite && new_is_dir && !new_opaque &&
> ovl_type_merge(old->d_parent))
> - err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
> + err = ovl_set_opaque_xerr(new, rd.new_dentry, -EXDEV);
> if (err)
> goto out_unlock;
>
> - err = ovl_do_rename(ofs, old_upperdir, olddentry,
> - new_upperdir, newdentry, flags);
> - unlock_rename(new_upperdir, old_upperdir);
> + err = ovl_do_rename_rd(&rd);
> +
> + dget(rd.new_dentry);
> + end_renaming(&rd);
> +
> + if (!err && cleanup_whiteout) {
> + ovl_cleanup(ofs, old_upperdir, rd.new_dentry);
> + }
> + dput(rd.new_dentry);
> if (err)
> goto out_revert_creds;
>
> - if (cleanup_whiteout)
> - ovl_cleanup(ofs, old_upperdir, newdentry);
> -
> if (overwrite && d_inode(new)) {
> if (new_is_dir)
> clear_nlink(d_inode(new));
> @@ -1336,14 +1324,12 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> else
> ovl_drop_write(old);
> out:
> - dput(newdentry);
> - dput(olddentry);
> dput(opaquedir);
> ovl_cache_free(&list);
> return err;
>
> out_unlock:
> - unlock_rename(new_upperdir, old_upperdir);
> + end_renaming(&rd);
> goto out_revert_creds;
> }
>
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 49ad65f829dc..aecb527e0524 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -378,6 +378,20 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
> return err;
> }
>
> +static inline int ovl_do_rename_rd(struct renamedata *rd)
> +{
> + int err;
> +
> + pr_debug("rename(%pd2, %pd2, 0x%x)\n", rd->old_dentry, rd->new_dentry,
> + rd->flags);
> + err = vfs_rename(rd);
> + if (err) {
> + pr_debug("...rename(%pd2, %pd2, ...) = %i\n",
> + rd->old_dentry, rd->new_dentry, err);
> + }
> + return err;
> +}
> +
> static inline int ovl_do_whiteout(struct ovl_fs *ofs,
> struct inode *dir, struct dentry *dentry)
> {
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index e5cff89679df..19c3d8e336d5 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -156,6 +156,9 @@ extern int follow_up(struct path *);
> extern struct dentry *lock_rename(struct dentry *, struct dentry *);
> extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
> extern void unlock_rename(struct dentry *, struct dentry *);
> +int start_renaming(struct renamedata *rd, int lookup_flags,
> + struct qstr *old_last, struct qstr *new_last);
> +void end_renaming(struct renamedata *rd);
>
> /**
> * mode_strip_umask - handle vfs umask stripping
> --
> 2.50.0.107.gf914562f5916.dirty
>
* Re: [PATCH v2 10/14] VFS/ovl/smb: introduce start_renaming_dentry()
2025-10-15 1:47 ` [PATCH v2 10/14] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
@ 2025-10-19 10:31 ` Amir Goldstein
0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:31 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> Several callers perform a rename on a dentry they already have, and only
> require lookup for the target name. This includes smb/server and a few
> different places in overlayfs.
>
> start_renaming_dentry() performs the required lookup and takes the
> required lock using lock_rename_child().
>
> It is used in three places in overlayfs and in ksmbd_vfs_rename().
>
> In the ksmbd case, the parent of the source is not important - the
> source must be renamed from wherever it is. So start_renaming_dentry()
> allows rd->old_parent to be NULL and only checks it if it is non-NULL.
> On success rd->old_parent will be the parent of old_dentry with an extra
> reference taken. Other start_renaming functions also now take the extra
> reference and end_renaming() now drops this reference as well.
>
> ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are
> all removed as they are no longer needed.
>
> OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so
> that ovl_check_rename_whiteout() can access them.
>
> ovl_copy_up_workdir() now always cleans up on error.
>
> Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
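For readers following the reference counting described in the commit message, here is a minimal userspace sketch of the contract: start_renaming_dentry() leaves a reference on old_dentry, new_dentry and old_parent, and end_renaming() drops all three. This is a model only, under stated assumptions: the struct layouts, the model_* function names and the `target` argument (standing in for the result of the lookup in rd->new_parent) are hypothetical, and locking and the trap checks are left out.

```c
/* Userspace model only: these are NOT the kernel definitions. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dentry {
	int refcount;			/* stand-in for the real reference count */
	struct dentry *d_parent;
	int unhashed;			/* non-zero once the name has been removed */
};

struct renamedata {
	struct dentry *old_parent, *old_dentry;
	struct dentry *new_parent, *new_dentry;
};

static struct dentry *dget(struct dentry *d)
{
	if (d)
		d->refcount++;
	return d;
}

static void dput(struct dentry *d)
{
	if (d) {
		assert(d->refcount > 0);
		d->refcount--;
	}
}

/* Hypothetical model of start_renaming_dentry(); no locking is modelled. */
static int model_start_renaming_dentry(struct renamedata *rd,
					struct dentry *old_dentry,
					struct dentry *target)
{
	/* dentry was removed, or moved while an explicit parent was requested */
	if (old_dentry->unhashed ||
	    (rd->old_parent && rd->old_parent != old_dentry->d_parent))
		return -EINVAL;

	rd->old_dentry = dget(old_dentry);
	rd->new_dentry = dget(target);	/* models the reference a lookup returns */
	rd->old_parent = dget(old_dentry->d_parent);
	return 0;
}

/* Hypothetical model of end_renaming(): drops the three references. */
static void model_end_renaming(struct renamedata *rd)
{
	dput(rd->old_dentry);
	dput(rd->new_dentry);
	dput(rd->old_parent);
}

/* Returns 0 if every count is back where it started (ksmbd-style NULL parent). */
int renaming_refs_balance(void)
{
	struct dentry dir = { .refcount = 1 };
	struct dentry src = { .refcount = 1, .d_parent = &dir };
	struct dentry dst = { .refcount = 1, .d_parent = &dir };
	struct renamedata rd = { .old_parent = NULL, .new_parent = &dir };

	if (model_start_renaming_dentry(&rd, &src, &dst))
		return -1;
	/* old_parent was NULL on entry; it is now filled in and referenced */
	if (rd.old_parent != &dir || dir.refcount != 2)
		return -2;
	model_end_renaming(&rd);
	return (dir.refcount == 1 && src.refcount == 1 && dst.refcount == 1) ? 0 : -3;
}

/* Returns the error seen when an explicit old_parent does not match. */
int renaming_wrong_parent(void)
{
	struct dentry dir = { .refcount = 1 }, other = { .refcount = 1 };
	struct dentry src = { .refcount = 1, .d_parent = &dir };
	struct dentry dst = { .refcount = 1, .d_parent = &dir };
	struct renamedata rd = { .old_parent = &other, .new_parent = &dir };

	return model_start_renaming_dentry(&rd, &src, &dst);
}
```

In this model a caller that passes rd.old_parent = NULL gets the current parent filled in, while a caller that names the wrong parent gets -EINVAL and nothing to undo.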
> ---
> fs/namei.c | 108 ++++++++++++++++++++++++++++++++++++---
> fs/overlayfs/copy_up.c | 54 +++++++++-----------
> fs/overlayfs/dir.c | 19 +------
> fs/overlayfs/overlayfs.h | 8 +--
> fs/overlayfs/super.c | 22 ++++----
> fs/overlayfs/util.c | 11 ----
> fs/smb/server/vfs.c | 60 ++++------------------
> include/linux/namei.h | 2 +
> 8 files changed, 150 insertions(+), 134 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index a2553df8f34e..4e694b82e309 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3669,7 +3669,7 @@ EXPORT_SYMBOL(unlock_rename);
>
> /**
> * __start_renaming - lookup and lock names for rename
> - * @rd: rename data containing parent and flags, and
> + * @rd: rename data containing parents and flags, and
> * for receiving found dentries
> * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> * LOOKUP_NO_SYMLINKS etc).
> @@ -3680,8 +3680,8 @@ EXPORT_SYMBOL(unlock_rename);
> * rename.
> *
> * On success the found dentrys are stored in @rd.old_dentry,
> - * @rd.new_dentry. These references and the lock are dropped by
> - * end_renaming().
> + * @rd.new_dentry and an extra ref is taken on @rd.old_parent.
> + * These references and the lock are dropped by end_renaming().
> *
> * The passed in qstrs must have the hash calculated, and no permission
> * checking is performed.
> @@ -3733,6 +3733,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
>
> rd->old_dentry = d1;
> rd->new_dentry = d2;
> + dget(rd->old_parent);
> return 0;
>
> out_unlock_3:
> @@ -3748,7 +3749,7 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
>
> /**
> * start_renaming - lookup and lock names for rename with permission checking
> - * @rd: rename data containing parent and flags, and
> + * @rd: rename data containing parents and flags, and
> * for receiving found dentries
> * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> * LOOKUP_NO_SYMLINKS etc).
> @@ -3759,8 +3760,8 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
> * rename.
> *
> * On success the found dentrys are stored in @rd.old_dentry,
> - * @rd.new_dentry. These references and the lock are dropped by
> - * end_renaming().
> + * @rd.new_dentry. Also the refcount on @rd->old_parent is increased.
> + * These references and the lock are dropped by end_renaming().
> *
> * The passed in qstrs need not have the hash calculated, and basic
> * eXecute permission checking is performed against @rd.mnt_idmap.
> @@ -3782,11 +3783,106 @@ int start_renaming(struct renamedata *rd, int lookup_flags,
> }
> EXPORT_SYMBOL(start_renaming);
>
> +static int
> +__start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> + struct dentry *old_dentry, struct qstr *new_last)
> +{
> + struct dentry *trap;
> + struct dentry *d2;
> + int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> + int err;
> +
> + if (rd->flags & RENAME_EXCHANGE)
> + target_flags = 0;
> + if (rd->flags & RENAME_NOREPLACE)
> + target_flags |= LOOKUP_EXCL;
> +
> + /* Already have the dentry - need to be sure to lock the correct parent */
> + trap = lock_rename_child(old_dentry, rd->new_parent);
> + if (IS_ERR(trap))
> + return PTR_ERR(trap);
> + if (d_unhashed(old_dentry) ||
> + (rd->old_parent && rd->old_parent != old_dentry->d_parent)) {
> + /* dentry was removed, or moved and explicit parent requested */
> + err = -EINVAL;
> + goto out_unlock;
> + }
> +
> + d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
> + lookup_flags | target_flags);
> + err = PTR_ERR(d2);
> + if (IS_ERR(d2))
> + goto out_unlock;
> +
> + if (old_dentry == trap) {
> + /* source is an ancestor of target */
> + err = -EINVAL;
> + goto out_dput_d2;
> + }
> +
> + if (d2 == trap) {
> + /* target is an ancestor of source */
> + if (rd->flags & RENAME_EXCHANGE)
> + err = -EINVAL;
> + else
> + err = -ENOTEMPTY;
> + goto out_dput_d2;
> + }
> +
> + rd->old_dentry = dget(old_dentry);
> + rd->new_dentry = d2;
> + rd->old_parent = dget(old_dentry->d_parent);
> + return 0;
> +
> +out_dput_d2:
> + dput(d2);
> +out_unlock:
> + unlock_rename(old_dentry->d_parent, rd->new_parent);
> + return err;
> +}
> +
> +/**
> + * start_renaming_dentry - lookup and lock name for rename with permission checking
> + * @rd: rename data containing parents and flags, and
> + * for receiving found dentries
> + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> + * LOOKUP_NO_SYMLINKS etc).
> + * @old_dentry: dentry of name to move
> + * @new_last: name of target in @rd.new_parent
> + *
> + * Look up target name and ensure locks are in place for
> + * rename.
> + *
> + * On success the found dentry is stored in @rd.new_dentry and
> + * @rd.old_parent is confirmed to be the parent of @old_dentry. If it
> + * was originally %NULL, it is set. In either case a reference is taken
> + * so that end_renaming() can have a stable reference to unlock.
> + *
> + * References and the lock can be dropped with end_renaming()
> + *
> + * The passed in qstr need not have the hash calculated, and basic
> + * eXecute permission checking is performed against @rd.mnt_idmap.
> + *
> + * Returns: zero or an error.
> + */
> +int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> + struct dentry *old_dentry, struct qstr *new_last)
> +{
> + int err;
> +
> + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> + if (err)
> + return err;
> + return __start_renaming_dentry(rd, lookup_flags, old_dentry, new_last);
> +}
> +EXPORT_SYMBOL(start_renaming_dentry);
> +
> void end_renaming(struct renamedata *rd)
> {
> unlock_rename(rd->old_parent, rd->new_parent);
> dput(rd->old_dentry);
> dput(rd->new_dentry);
> + dput(rd->old_parent);
> }
> EXPORT_SYMBOL(end_renaming);
>
> diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
> index 7a31ca9bdea2..27014ada11c7 100644
> --- a/fs/overlayfs/copy_up.c
> +++ b/fs/overlayfs/copy_up.c
> @@ -523,8 +523,8 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> {
> struct ovl_fs *ofs = OVL_FS(dentry->d_sb);
> struct dentry *indexdir = ovl_indexdir(dentry->d_sb);
> - struct dentry *index = NULL;
> struct dentry *temp = NULL;
> + struct renamedata rd = {};
> struct qstr name = { };
> int err;
>
> @@ -556,17 +556,15 @@ static int ovl_create_index(struct dentry *dentry, const struct ovl_fh *fh,
> if (err)
> goto out;
>
> - err = ovl_parent_lock(indexdir, temp);
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = indexdir;
> + rd.new_parent = indexdir;
> + err = start_renaming_dentry(&rd, 0, temp, &name);
> if (err)
> goto out;
> - index = ovl_lookup_upper(ofs, name.name, indexdir, name.len);
> - if (IS_ERR(index)) {
> - err = PTR_ERR(index);
> - } else {
> - err = ovl_do_rename(ofs, indexdir, temp, indexdir, index, 0);
> - dput(index);
> - }
> - ovl_parent_unlock(indexdir);
> +
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
> out:
> if (err)
> ovl_cleanup(ofs, indexdir, temp);
> @@ -763,7 +761,8 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb);
> struct inode *inode;
> struct path path = { .mnt = ovl_upper_mnt(ofs) };
> - struct dentry *temp, *upper, *trap;
> + struct renamedata rd = {};
> + struct dentry *temp;
> struct ovl_cu_creds cc;
> int err;
> struct ovl_cattr cattr = {
> @@ -807,29 +806,24 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
> * ovl_copy_up_data(), so lock workdir and destdir and make sure that
> * temp wasn't moved before copy up completion or cleanup.
> */
> - trap = lock_rename(c->workdir, c->destdir);
> - if (trap || temp->d_parent != c->workdir) {
> - /* temp or workdir moved underneath us? abort without cleanup */
> - dput(temp);
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = c->workdir;
> + rd.new_parent = c->destdir;
> + rd.flags = 0;
> + err = start_renaming_dentry(&rd, 0, temp,
> + &QSTR_LEN(c->destname.name, c->destname.len));
> + if (err) {
> + /* temp or workdir moved underneath us? map to -EIO */
> err = -EIO;
> - if (!IS_ERR(trap))
> - unlock_rename(c->workdir, c->destdir);
> - goto out;
> }
> -
> - err = ovl_copy_up_metadata(c, temp);
> if (err)
> - goto cleanup;
> + goto cleanup_unlocked;
>
> - upper = ovl_lookup_upper(ofs, c->destname.name, c->destdir,
> - c->destname.len);
> - err = PTR_ERR(upper);
> - if (IS_ERR(upper))
> - goto cleanup;
> + err = ovl_copy_up_metadata(c, temp);
> + if (!err)
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
>
> - err = ovl_do_rename(ofs, c->workdir, temp, c->destdir, upper, 0);
> - unlock_rename(c->workdir, c->destdir);
> - dput(upper);
> if (err)
> goto cleanup_unlocked;
>
> @@ -850,8 +844,6 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c)
>
> return err;
>
> -cleanup:
> - unlock_rename(c->workdir, c->destdir);
> cleanup_unlocked:
> ovl_cleanup(ofs, c->workdir, temp);
> dput(temp);
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index ded86855e91c..6367cebdbd48 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -57,8 +57,7 @@ int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir,
> return 0;
> }
>
> -#define OVL_TEMPNAME_SIZE 20
> -static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> +void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> {
> static atomic_t temp_id = ATOMIC_INIT(0);
>
> @@ -66,22 +65,6 @@ static void ovl_tempname(char name[OVL_TEMPNAME_SIZE])
> snprintf(name, OVL_TEMPNAME_SIZE, "#%x", atomic_inc_return(&temp_id));
> }
>
> -struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir)
> -{
> - struct dentry *temp;
> - char name[OVL_TEMPNAME_SIZE];
> -
> - ovl_tempname(name);
> - temp = ovl_lookup_upper(ofs, name, workdir, strlen(name));
> - if (!IS_ERR(temp) && temp->d_inode) {
> - pr_err("workdir/%s already exists\n", name);
> - dput(temp);
> - temp = ERR_PTR(-EIO);
> - }
> -
> - return temp;
> -}
> -
> static struct dentry *ovl_start_creating_temp(struct ovl_fs *ofs,
> struct dentry *workdir)
> {
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index aecb527e0524..a9ecab16dba6 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -454,11 +454,6 @@ static inline bool ovl_open_flags_need_copy_up(int flags)
> }
>
> /* util.c */
> -int ovl_parent_lock(struct dentry *parent, struct dentry *child);
> -static inline void ovl_parent_unlock(struct dentry *parent)
> -{
> - inode_unlock(parent->d_inode);
> -}
> int ovl_get_write_access(struct dentry *dentry);
> void ovl_put_write_access(struct dentry *dentry);
> void ovl_start_write(struct dentry *dentry);
> @@ -895,7 +890,8 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs,
> struct dentry *parent, struct dentry *newdentry,
> struct ovl_cattr *attr);
> int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir, struct dentry *dentry);
> -struct dentry *ovl_lookup_temp(struct ovl_fs *ofs, struct dentry *workdir);
> +#define OVL_TEMPNAME_SIZE 20
> +void ovl_tempname(char name[OVL_TEMPNAME_SIZE]);
> struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> struct ovl_cattr *attr);
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 6e0816c1147a..a721ef2b90e8 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -566,9 +566,10 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
> {
> struct dentry *workdir = ofs->workdir;
> struct dentry *temp;
> - struct dentry *dest;
> struct dentry *whiteout;
> struct name_snapshot name;
> + struct renamedata rd = {};
> + char name2[OVL_TEMPNAME_SIZE];
> int err;
>
> temp = ovl_create_temp(ofs, workdir, OVL_CATTR(S_IFREG | 0));
> @@ -576,23 +577,21 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
> if (IS_ERR(temp))
> return err;
>
> - err = ovl_parent_lock(workdir, temp);
> + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> + rd.old_parent = workdir;
> + rd.new_parent = workdir;
> + rd.flags = RENAME_WHITEOUT;
> + ovl_tempname(name2);
> + err = start_renaming_dentry(&rd, 0, temp, &QSTR(name2));
> if (err) {
> dput(temp);
> return err;
> }
> - dest = ovl_lookup_temp(ofs, workdir);
> - err = PTR_ERR(dest);
> - if (IS_ERR(dest)) {
> - dput(temp);
> - ovl_parent_unlock(workdir);
> - return err;
> - }
>
> /* Name is inline and stable - using snapshot as a copy helper */
> take_dentry_name_snapshot(&name, temp);
> - err = ovl_do_rename(ofs, workdir, temp, workdir, dest, RENAME_WHITEOUT);
> - ovl_parent_unlock(workdir);
> + err = ovl_do_rename_rd(&rd);
> + end_renaming(&rd);
> if (err) {
> if (err == -EINVAL)
> err = 0;
> @@ -616,7 +615,6 @@ static int ovl_check_rename_whiteout(struct ovl_fs *ofs)
> ovl_cleanup(ofs, workdir, temp);
> release_dentry_name_snapshot(&name);
> dput(temp);
> - dput(dest);
>
> return err;
> }
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index f76672f2e686..46387aeb6be6 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -1548,14 +1548,3 @@ void ovl_copyattr(struct inode *inode)
> i_size_write(inode, i_size_read(realinode));
> spin_unlock(&inode->i_lock);
> }
> -
> -int ovl_parent_lock(struct dentry *parent, struct dentry *child)
> -{
> - inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> - if (!child ||
> - (!d_unhashed(child) && child->d_parent == parent))
> - return 0;
> -
> - inode_unlock(parent->d_inode);
> - return -EINVAL;
> -}
> diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
> index 7c4ddc43ab39..f54b5b0aaba2 100644
> --- a/fs/smb/server/vfs.c
> +++ b/fs/smb/server/vfs.c
> @@ -663,7 +663,6 @@ int ksmbd_vfs_link(struct ksmbd_work *work, const char *oldname,
> int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
> char *newname, int flags)
> {
> - struct dentry *old_parent, *new_dentry, *trap;
> struct dentry *old_child = old_path->dentry;
> struct path new_path;
> struct qstr new_last;
> @@ -673,7 +672,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
> struct ksmbd_file *parent_fp;
> int new_type;
> int err, lookup_flags = LOOKUP_NO_SYMLINKS;
> - int target_lookup_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
>
> if (ksmbd_override_fsids(work))
> return -ENOMEM;
> @@ -684,14 +682,6 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
> goto revert_fsids;
> }
>
> - /*
> - * explicitly handle file overwrite case, for compatibility with
> - * filesystems that may not support rename flags (e.g: fuse)
> - */
> - if (flags & RENAME_NOREPLACE)
> - target_lookup_flags |= LOOKUP_EXCL;
> - flags &= ~(RENAME_NOREPLACE);
> -
> retry:
> err = vfs_path_parent_lookup(to, lookup_flags | LOOKUP_BENEATH,
> &new_path, &new_last, &new_type,
> @@ -708,17 +698,14 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
> if (err)
> goto out2;
>
> - trap = lock_rename_child(old_child, new_path.dentry);
> - if (IS_ERR(trap)) {
> - err = PTR_ERR(trap);
> + rd.mnt_idmap = mnt_idmap(old_path->mnt);
> + rd.old_parent = NULL;
> + rd.new_parent = new_path.dentry;
> + rd.flags = flags;
> + rd.delegated_inode = NULL,
> + err = start_renaming_dentry(&rd, lookup_flags, old_child, &new_last);
> + if (err)
> goto out_drop_write;
> - }
> -
> - old_parent = dget(old_child->d_parent);
> - if (d_unhashed(old_child)) {
> - err = -EINVAL;
> - goto out3;
> - }
>
> parent_fp = ksmbd_lookup_fd_inode(old_child->d_parent);
> if (parent_fp) {
> @@ -731,44 +718,17 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const struct path *old_path,
> ksmbd_fd_put(work, parent_fp);
> }
>
> - new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
> - lookup_flags | target_lookup_flags);
> - if (IS_ERR(new_dentry)) {
> - err = PTR_ERR(new_dentry);
> - goto out3;
> - }
> -
> - if (d_is_symlink(new_dentry)) {
> + if (d_is_symlink(rd.new_dentry)) {
> err = -EACCES;
> - goto out4;
> - }
> -
> - if (old_child == trap) {
> - err = -EINVAL;
> - goto out4;
> - }
> -
> - if (new_dentry == trap) {
> - err = -ENOTEMPTY;
> - goto out4;
> + goto out3;
> }
>
> - rd.mnt_idmap = mnt_idmap(old_path->mnt),
> - rd.old_parent = old_parent,
> - rd.old_dentry = old_child,
> - rd.new_parent = new_path.dentry,
> - rd.new_dentry = new_dentry,
> - rd.flags = flags,
> - rd.delegated_inode = NULL,
> err = vfs_rename(&rd);
> if (err)
> ksmbd_debug(VFS, "vfs_rename failed err %d\n", err);
>
> -out4:
> - dput(new_dentry);
> out3:
> - dput(old_parent);
> - unlock_rename(old_parent, new_path.dentry);
> + end_renaming(&rd);
> out_drop_write:
> mnt_drop_write(old_path->mnt);
> out2:
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 19c3d8e336d5..f73001e3719a 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -158,6 +158,8 @@ extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
> extern void unlock_rename(struct dentry *, struct dentry *);
> int start_renaming(struct renamedata *rd, int lookup_flags,
> struct qstr *old_last, struct qstr *new_last);
> +int start_renaming_dentry(struct renamedata *rd, int lookup_flags,
> + struct dentry *old_dentry, struct qstr *new_last);
> void end_renaming(struct renamedata *rd);
>
> /**
> --
> 2.50.0.107.gf914562f5916.dirty
>
* Re: [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
2025-10-19 10:25 ` Amir Goldstein
@ 2025-10-19 10:33 ` Amir Goldstein
2025-10-21 13:25 ` Christian Brauner
2025-10-22 3:35 ` NeilBrown
1 sibling, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:33 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sun, Oct 19, 2025 at 12:25 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > start_renaming() combines name lookup and locking to prepare for rename.
> > It is used when two names need to be looked up as in nfsd and overlayfs -
> > cases where one or both dentrys are already available will be handled
> > separately.
> >
> > __start_renaming() avoids the inode_permission check and hash
> > calculation and is suitable after filename_parentat() in do_renameat2().
> > It subsumes quite a bit of code from that function.
> >
> > start_renaming() does calculate the hash and check X permission and is
> > suitable elsewhere:
> > - nfsd_rename()
> > - ovl_rename()
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
>
> Review comments from v1 not addressed:
> https://lore.kernel.org/linux-fsdevel/CAOQ4uxh+NcAv9v6NtVRrLCMYbpd0ajtvsd6c9-W2a7+vur0UJQ@mail.gmail.com/
>
Obviously, I am more attached to my comments on the overlayfs
changes. Since you have not replied to those, you might have missed them...
Thanks,
Amir.
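As an aside for readers of the quoted patch: the tail of __start_renaming() funnels all three failure paths through a single PTR_ERR(d1). A minimal userspace sketch of that unwinding follows. It is a sketch under stated assumptions: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified stand-ins for the kernel helpers, and `struct obj`, lookup(), put(), model_start() and check_unwind() are hypothetical names used only for illustration.

```c
/* Userspace model only: simplified stand-ins, not the kernel helpers. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct obj { int refs; };
static int lock_held;

/* Hypothetical lookup: returns a referenced object or an error pointer. */
static struct obj *lookup(struct obj *o, int fail, long err)
{
	if (fail)
		return ERR_PTR(err);
	o->refs++;
	return o;
}

static void put(struct obj *o) { o->refs--; }

/*
 * fail_at: 0 = success, 1 = first lookup fails, 2 = second lookup fails,
 * 3 = a later check (the "trap" test in the patch) fails.
 */
static int model_start(struct obj *a, struct obj *b, int fail_at)
{
	struct obj *d1, *d2;
	long err = 0;

	lock_held = 1;
	d1 = lookup(a, fail_at == 1, -ENOENT);
	if (IS_ERR(d1))
		goto out_unlock_1;
	d2 = lookup(b, fail_at == 2, -ENAMETOOLONG);
	if (IS_ERR(d2))
		goto out_unlock_2;
	if (fail_at == 3) {
		err = -EINVAL;
		goto out_unlock_3;
	}
	return 0;	/* caller now owns the lock and both references */

out_unlock_3:
	put(d2);
	d2 = ERR_PTR(err);	/* carry the error in the pointer */
out_unlock_2:
	put(d1);
	d1 = d2;		/* d2 is an error pointer on both paths here */
out_unlock_1:
	lock_held = 0;
	return (int)PTR_ERR(d1);
}

/* Returns the error code if everything was released, or 1 on a leak. */
int check_unwind(int fail_at)
{
	struct obj a = { 0 }, b = { 0 };
	int ret = model_start(&a, &b, fail_at);

	if (ret == 0)
		return (lock_held && a.refs == 1 && b.refs == 1) ? 0 : 1;
	return (!lock_held && a.refs == 0 && b.refs == 0) ? ret : 1;
}
```

Each label falls through to the next, so a later failure releases everything an earlier success acquired, and the error code travels in the pointer rather than in a separate variable.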
> > ---
> > fs/namei.c | 197 ++++++++++++++++++++++++++++-----------
> > fs/nfsd/vfs.c | 73 +++++----------
> > fs/overlayfs/dir.c | 72 ++++++--------
> > fs/overlayfs/overlayfs.h | 14 +++
> > include/linux/namei.h | 3 +
> > 5 files changed, 214 insertions(+), 145 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 04d2819bd351..a2553df8f34e 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3667,6 +3667,129 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
> > }
> > EXPORT_SYMBOL(unlock_rename);
> >
> > +/**
> > + * __start_renaming - lookup and lock names for rename
> > + * @rd: rename data containing parent and flags, and
> > + * for receiving found dentries
> > + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > + * LOOKUP_NO_SYMLINKS etc).
> > + * @old_last: name of object in @rd.old_parent
> > + * @new_last: name of object in @rd.new_parent
> > + *
> > + * Look up two names and ensure locks are in place for
> > + * rename.
> > + *
> > + * On success the found dentrys are stored in @rd.old_dentry,
> > + * @rd.new_dentry. These references and the lock are dropped by
> > + * end_renaming().
> > + *
> > + * The passed in qstrs must have the hash calculated, and no permission
> > + * checking is performed.
> > + *
> > + * Returns: zero or an error.
> > + */
> > +static int
> > +__start_renaming(struct renamedata *rd, int lookup_flags,
> > + struct qstr *old_last, struct qstr *new_last)
> > +{
> > + struct dentry *trap;
> > + struct dentry *d1, *d2;
> > + int target_flags = LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> > + int err;
> > +
> > + if (rd->flags & RENAME_EXCHANGE)
> > + target_flags = 0;
> > + if (rd->flags & RENAME_NOREPLACE)
> > + target_flags |= LOOKUP_EXCL;
> > +
> > + trap = lock_rename(rd->old_parent, rd->new_parent);
> > + if (IS_ERR(trap))
> > + return PTR_ERR(trap);
> > +
> > + d1 = lookup_one_qstr_excl(old_last, rd->old_parent,
> > + lookup_flags);
> > + if (IS_ERR(d1))
> > + goto out_unlock_1;
> > +
> > + d2 = lookup_one_qstr_excl(new_last, rd->new_parent,
> > + lookup_flags | target_flags);
> > + if (IS_ERR(d2))
> > + goto out_unlock_2;
> > +
> > + if (d1 == trap) {
> > + /* source is an ancestor of target */
> > + err = -EINVAL;
> > + goto out_unlock_3;
> > + }
> > +
> > + if (d2 == trap) {
> > + /* target is an ancestor of source */
> > + if (rd->flags & RENAME_EXCHANGE)
> > + err = -EINVAL;
> > + else
> > + err = -ENOTEMPTY;
> > + goto out_unlock_3;
> > + }
> > +
> > + rd->old_dentry = d1;
> > + rd->new_dentry = d2;
> > + return 0;
> > +
> > +out_unlock_3:
> > + dput(d2);
> > + d2 = ERR_PTR(err);
> > +out_unlock_2:
> > + dput(d1);
> > + d1 = d2;
> > +out_unlock_1:
> > + unlock_rename(rd->old_parent, rd->new_parent);
> > + return PTR_ERR(d1);
> > +}
> > +
> > +/**
> > + * start_renaming - lookup and lock names for rename with permission checking
> > + * @rd: rename data containing parent and flags, and
> > + * for receiving found dentries
> > + * @lookup_flags: extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,
> > + * LOOKUP_NO_SYMLINKS etc).
> > + * @old_last: name of object in @rd.old_parent
> > + * @new_last: name of object in @rd.new_parent
> > + *
> > + * Look up two names and ensure locks are in place for
> > + * rename.
> > + *
> > + * On success the found dentrys are stored in @rd.old_dentry,
> > + * @rd.new_dentry. These references and the lock are dropped by
> > + * end_renaming().
> > + *
> > + * The passed in qstrs need not have the hash calculated, and basic
> > + * eXecute permission checking is performed against @rd.mnt_idmap.
> > + *
> > + * Returns: zero or an error.
> > + */
> > +int start_renaming(struct renamedata *rd, int lookup_flags,
> > + struct qstr *old_last, struct qstr *new_last)
> > +{
> > + int err;
> > +
> > + err = lookup_one_common(rd->mnt_idmap, old_last, rd->old_parent);
> > + if (err)
> > + return err;
> > + err = lookup_one_common(rd->mnt_idmap, new_last, rd->new_parent);
> > + if (err)
> > + return err;
> > + return __start_renaming(rd, lookup_flags, old_last, new_last);
> > +}
> > +EXPORT_SYMBOL(start_renaming);
> > +
> > +void end_renaming(struct renamedata *rd)
> > +{
> > + unlock_rename(rd->old_parent, rd->new_parent);
> > + dput(rd->old_dentry);
> > + dput(rd->new_dentry);
> > +}
> > +EXPORT_SYMBOL(end_renaming);
> > +
> > /**
> > * vfs_prepare_mode - prepare the mode to be used for a new inode
> > * @idmap: idmap of the mount the inode was found from
> > @@ -5504,14 +5627,11 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> > struct filename *to, unsigned int flags)
> > {
> > struct renamedata rd;
> > - struct dentry *old_dentry, *new_dentry;
> > - struct dentry *trap;
> > struct path old_path, new_path;
> > struct qstr old_last, new_last;
> > int old_type, new_type;
> > struct inode *delegated_inode = NULL;
> > - unsigned int lookup_flags = 0, target_flags =
> > - LOOKUP_RENAME_TARGET | LOOKUP_CREATE;
> > + unsigned int lookup_flags = 0;
> > bool should_retry = false;
> > int error = -EINVAL;
> >
> > @@ -5522,11 +5642,6 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> > (flags & RENAME_EXCHANGE))
> > goto put_names;
> >
> > - if (flags & RENAME_EXCHANGE)
> > - target_flags = 0;
> > - if (flags & RENAME_NOREPLACE)
> > - target_flags |= LOOKUP_EXCL;
> > -
> > retry:
> > error = filename_parentat(olddfd, from, lookup_flags, &old_path,
> > &old_last, &old_type);
> > @@ -5556,66 +5671,40 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd,
> > goto exit2;
> >
> > retry_deleg:
> > - trap = lock_rename(new_path.dentry, old_path.dentry);
> > - if (IS_ERR(trap)) {
> > - error = PTR_ERR(trap);
> > + rd.old_parent = old_path.dentry;
> > + rd.mnt_idmap = mnt_idmap(old_path.mnt);
> > + rd.new_parent = new_path.dentry;
> > + rd.delegated_inode = &delegated_inode;
> > + rd.flags = flags;
> > +
> > + error = __start_renaming(&rd, lookup_flags, &old_last, &new_last);
> > + if (error)
> > goto exit_lock_rename;
> > - }
> >
> > - old_dentry = lookup_one_qstr_excl(&old_last, old_path.dentry,
> > - lookup_flags);
> > - error = PTR_ERR(old_dentry);
> > - if (IS_ERR(old_dentry))
> > - goto exit3;
> > - new_dentry = lookup_one_qstr_excl(&new_last, new_path.dentry,
> > - lookup_flags | target_flags);
> > - error = PTR_ERR(new_dentry);
> > - if (IS_ERR(new_dentry))
> > - goto exit4;
> > if (flags & RENAME_EXCHANGE) {
> > - if (!d_is_dir(new_dentry)) {
> > + if (!d_is_dir(rd.new_dentry)) {
> > error = -ENOTDIR;
> > if (new_last.name[new_last.len])
> > - goto exit5;
> > + goto exit_unlock;
> > }
> > }
> > /* unless the source is a directory trailing slashes give -ENOTDIR */
> > - if (!d_is_dir(old_dentry)) {
> > + if (!d_is_dir(rd.old_dentry)) {
> > error = -ENOTDIR;
> > if (old_last.name[old_last.len])
> > - goto exit5;
> > + goto exit_unlock;
> > if (!(flags & RENAME_EXCHANGE) && new_last.name[new_last.len])
> > - goto exit5;
> > - }
> > - /* source should not be ancestor of target */
> > - error = -EINVAL;
> > - if (old_dentry == trap)
> > - goto exit5;
> > - /* target should not be an ancestor of source */
> > - if (!(flags & RENAME_EXCHANGE))
> > - error = -ENOTEMPTY;
> > - if (new_dentry == trap)
> > - goto exit5;
> > + goto exit_unlock;
> > + }
> >
> > - error = security_path_rename(&old_path, old_dentry,
> > - &new_path, new_dentry, flags);
> > + error = security_path_rename(&old_path, rd.old_dentry,
> > + &new_path, rd.new_dentry, flags);
> > if (error)
> > - goto exit5;
> > + goto exit_unlock;
> >
> > - rd.old_parent = old_path.dentry;
> > - rd.old_dentry = old_dentry;
> > - rd.mnt_idmap = mnt_idmap(old_path.mnt);
> > - rd.new_parent = new_path.dentry;
> > - rd.new_dentry = new_dentry;
> > - rd.delegated_inode = &delegated_inode;
> > - rd.flags = flags;
> > error = vfs_rename(&rd);
> > -exit5:
> > - dput(new_dentry);
> > -exit4:
> > - dput(old_dentry);
> > -exit3:
> > - unlock_rename(new_path.dentry, old_path.dentry);
> > +exit_unlock:
> > + end_renaming(&rd);
> > exit_lock_rename:
> > if (delegated_inode) {
> > error = break_deleg_wait(&delegated_inode);
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index cd64ffe12e0b..62109885d4db 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1885,11 +1885,12 @@ __be32
> > nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> > struct svc_fh *tfhp, char *tname, int tlen)
> > {
> > - struct dentry *fdentry, *tdentry, *odentry, *ndentry, *trap;
> > + struct dentry *fdentry, *tdentry;
> > int type = S_IFDIR;
> > + struct renamedata rd = {};
> > __be32 err;
> > int host_err;
> > - bool close_cached = false;
> > + struct dentry *close_cached;
> >
> > trace_nfsd_vfs_rename(rqstp, ffhp, tfhp, fname, flen, tname, tlen);
> >
> > @@ -1915,15 +1916,22 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> > goto out;
> >
> > retry:
> > + close_cached = NULL;
> > host_err = fh_want_write(ffhp);
> > if (host_err) {
> > err = nfserrno(host_err);
> > goto out;
> > }
> >
> > - trap = lock_rename(tdentry, fdentry);
> > - if (IS_ERR(trap)) {
> > - err = nfserr_xdev;
> > + rd.mnt_idmap = &nop_mnt_idmap;
> > + rd.old_parent = fdentry;
> > + rd.new_parent = tdentry;
> > +
> > + host_err = start_renaming(&rd, 0, &QSTR_LEN(fname, flen),
> > + &QSTR_LEN(tname, tlen));
> > +
> > + if (host_err) {
> > + err = nfserrno(host_err);
> > goto out_want_write;
> > }
> > err = fh_fill_pre_attrs(ffhp);
> > @@ -1933,48 +1941,23 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> > if (err != nfs_ok)
> > goto out_unlock;
> >
> > - odentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), fdentry);
> > - host_err = PTR_ERR(odentry);
> > - if (IS_ERR(odentry))
> > - goto out_nfserr;
> > + type = d_inode(rd.old_dentry)->i_mode & S_IFMT;
> > +
> > + if (d_inode(rd.new_dentry))
> > + type = d_inode(rd.new_dentry)->i_mode & S_IFMT;
> >
> > - host_err = -ENOENT;
> > - if (d_really_is_negative(odentry))
> > - goto out_dput_old;
> > - host_err = -EINVAL;
> > - if (odentry == trap)
> > - goto out_dput_old;
> > - type = d_inode(odentry)->i_mode & S_IFMT;
> > -
> > - ndentry = lookup_one(&nop_mnt_idmap, &QSTR_LEN(tname, tlen), tdentry);
> > - host_err = PTR_ERR(ndentry);
> > - if (IS_ERR(ndentry))
> > - goto out_dput_old;
> > - if (d_inode(ndentry))
> > - type = d_inode(ndentry)->i_mode & S_IFMT;
> > - host_err = -ENOTEMPTY;
> > - if (ndentry == trap)
> > - goto out_dput_new;
> > -
> > - if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
> > - nfsd_has_cached_files(ndentry)) {
> > - close_cached = true;
> > - goto out_dput_old;
> > + if ((rd.new_dentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
> > + nfsd_has_cached_files(rd.new_dentry)) {
> > + close_cached = dget(rd.new_dentry);
> > + goto out_unlock;
> > } else {
> > - struct renamedata rd = {
> > - .mnt_idmap = &nop_mnt_idmap,
> > - .old_parent = fdentry,
> > - .old_dentry = odentry,
> > - .new_parent = tdentry,
> > - .new_dentry = ndentry,
> > - };
> > int retries;
> >
> > for (retries = 1;;) {
> > host_err = vfs_rename(&rd);
> > if (host_err != -EAGAIN || !retries--)
> > break;
> > - if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry)))
> > + if (!nfsd_wait_for_delegreturn(rqstp, d_inode(rd.old_dentry)))
> > break;
> > }
> > if (!host_err) {
> > @@ -1983,11 +1966,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> > host_err = commit_metadata(ffhp);
> > }
> > }
> > - out_dput_new:
> > - dput(ndentry);
> > - out_dput_old:
> > - dput(odentry);
> > - out_nfserr:
> > if (host_err == -EBUSY) {
> > /*
> > * See RFC 8881 Section 18.26.4 para 1-3: NFSv4 RENAME
> > @@ -2006,7 +1984,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> > fh_fill_post_attrs(tfhp);
> > }
> > out_unlock:
> > - unlock_rename(tdentry, fdentry);
> > + end_renaming(&rd);
> > out_want_write:
> > fh_drop_write(ffhp);
> >
> > @@ -2017,9 +1995,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
> > * until this point and then reattempt the whole shebang.
> > */
> > if (close_cached) {
> > - close_cached = false;
> > - nfsd_close_cached_files(ndentry);
> > - dput(ndentry);
> > + nfsd_close_cached_files(close_cached);
> > + dput(close_cached);
> > goto retry;
> > }
> > out:
> > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> > index c8d0885ee5e0..ded86855e91c 100644
> > --- a/fs/overlayfs/dir.c
> > +++ b/fs/overlayfs/dir.c
> > @@ -1124,9 +1124,7 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> > int err;
> > struct dentry *old_upperdir;
> > struct dentry *new_upperdir;
> > - struct dentry *olddentry = NULL;
> > - struct dentry *newdentry = NULL;
> > - struct dentry *trap, *de;
> > + struct renamedata rd = {};
> > bool old_opaque;
> > bool new_opaque;
> > bool cleanup_whiteout = false;
> > @@ -1233,29 +1231,21 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> > }
> > }
> >
> > - trap = lock_rename(new_upperdir, old_upperdir);
> > - if (IS_ERR(trap)) {
> > - err = PTR_ERR(trap);
> > - goto out_revert_creds;
> > - }
> > + rd.mnt_idmap = ovl_upper_mnt_idmap(ofs);
> > + rd.old_parent = old_upperdir;
> > + rd.new_parent = new_upperdir;
> > + rd.flags = flags;
> >
> > - de = ovl_lookup_upper(ofs, old->d_name.name, old_upperdir,
> > - old->d_name.len);
> > - err = PTR_ERR(de);
> > - if (IS_ERR(de))
> > - goto out_unlock;
> > - olddentry = de;
> > + err = start_renaming(&rd, 0,
> > + &QSTR_LEN(old->d_name.name, old->d_name.len),
> > + &QSTR_LEN(new->d_name.name, new->d_name.len));
> >
> > - err = -ESTALE;
> > - if (!ovl_matches_upper(old, olddentry))
> > - goto out_unlock;
> > + if (err)
> > + goto out_revert_creds;
> >
> > - de = ovl_lookup_upper(ofs, new->d_name.name, new_upperdir,
> > - new->d_name.len);
> > - err = PTR_ERR(de);
> > - if (IS_ERR(de))
> > + err = -ESTALE;
> > + if (!ovl_matches_upper(old, rd.old_dentry))
> > goto out_unlock;
> > - newdentry = de;
> >
> > old_opaque = ovl_dentry_is_opaque(old);
> > new_opaque = ovl_dentry_is_opaque(new);
> > @@ -1263,15 +1253,15 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> > err = -ESTALE;
> > if (d_inode(new) && ovl_dentry_upper(new)) {
> > if (opaquedir) {
> > - if (newdentry != opaquedir)
> > + if (rd.new_dentry != opaquedir)
> > goto out_unlock;
> > } else {
> > - if (!ovl_matches_upper(new, newdentry))
> > + if (!ovl_matches_upper(new, rd.new_dentry))
> > goto out_unlock;
> > }
> > } else {
> > - if (!d_is_negative(newdentry)) {
> > - if (!new_opaque || !ovl_upper_is_whiteout(ofs, newdentry))
> > + if (!d_is_negative(rd.new_dentry)) {
> > + if (!new_opaque || !ovl_upper_is_whiteout(ofs, rd.new_dentry))
> > goto out_unlock;
> > } else {
> > if (flags & RENAME_EXCHANGE)
> > @@ -1279,19 +1269,14 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> > }
> > }
> >
> > - if (olddentry == trap)
> > - goto out_unlock;
> > - if (newdentry == trap)
> > - goto out_unlock;
> > -
> > - if (olddentry->d_inode == newdentry->d_inode)
> > + if (rd.old_dentry->d_inode == rd.new_dentry->d_inode)
> > goto out_unlock;
> >
> > err = 0;
> > if (ovl_type_merge_or_lower(old))
> > err = ovl_set_redirect(old, samedir);
> > else if (is_dir && !old_opaque && ovl_type_merge(new->d_parent))
> > - err = ovl_set_opaque_xerr(old, olddentry, -EXDEV);
> > + err = ovl_set_opaque_xerr(old, rd.old_dentry, -EXDEV);
> > if (err)
> > goto out_unlock;
> >
> > @@ -1299,19 +1284,22 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> > err = ovl_set_redirect(new, samedir);
> > else if (!overwrite && new_is_dir && !new_opaque &&
> > ovl_type_merge(old->d_parent))
> > - err = ovl_set_opaque_xerr(new, newdentry, -EXDEV);
> > + err = ovl_set_opaque_xerr(new, rd.new_dentry, -EXDEV);
> > if (err)
> > goto out_unlock;
> >
> > - err = ovl_do_rename(ofs, old_upperdir, olddentry,
> > - new_upperdir, newdentry, flags);
> > - unlock_rename(new_upperdir, old_upperdir);
> > + err = ovl_do_rename_rd(&rd);
> > +
> > + dget(rd.new_dentry);
> > + end_renaming(&rd);
> > +
> > + if (!err && cleanup_whiteout) {
> > + ovl_cleanup(ofs, old_upperdir, rd.new_dentry);
> > + }
> > + dput(rd.new_dentry);
> > if (err)
> > goto out_revert_creds;
> >
> > - if (cleanup_whiteout)
> > - ovl_cleanup(ofs, old_upperdir, newdentry);
> > -
> > if (overwrite && d_inode(new)) {
> > if (new_is_dir)
> > clear_nlink(d_inode(new));
> > @@ -1336,14 +1324,12 @@ static int ovl_rename(struct mnt_idmap *idmap, struct inode *olddir,
> > else
> > ovl_drop_write(old);
> > out:
> > - dput(newdentry);
> > - dput(olddentry);
> > dput(opaquedir);
> > ovl_cache_free(&list);
> > return err;
> >
> > out_unlock:
> > - unlock_rename(new_upperdir, old_upperdir);
> > + end_renaming(&rd);
> > goto out_revert_creds;
> > }
> >
> > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> > index 49ad65f829dc..aecb527e0524 100644
> > --- a/fs/overlayfs/overlayfs.h
> > +++ b/fs/overlayfs/overlayfs.h
> > @@ -378,6 +378,20 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, struct dentry *olddir,
> > return err;
> > }
> >
> > +static inline int ovl_do_rename_rd(struct renamedata *rd)
> > +{
> > + int err;
> > +
> > + pr_debug("rename(%pd2, %pd2, 0x%x)\n", rd->old_dentry, rd->new_dentry,
> > + rd->flags);
> > + err = vfs_rename(rd);
> > + if (err) {
> > + pr_debug("...rename(%pd2, %pd2, ...) = %i\n",
> > + rd->old_dentry, rd->new_dentry, err);
> > + }
> > + return err;
> > +}
> > +
> > static inline int ovl_do_whiteout(struct ovl_fs *ofs,
> > struct inode *dir, struct dentry *dentry)
> > {
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index e5cff89679df..19c3d8e336d5 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -156,6 +156,9 @@ extern int follow_up(struct path *);
> > extern struct dentry *lock_rename(struct dentry *, struct dentry *);
> > extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
> > extern void unlock_rename(struct dentry *, struct dentry *);
> > +int start_renaming(struct renamedata *rd, int lookup_flags,
> > + struct qstr *old_last, struct qstr *new_last);
> > +void end_renaming(struct renamedata *rd);
> >
> > /**
> > * mode_strip_umask - handle vfs umask stripping
> > --
> > 2.50.0.107.gf914562f5916.dirty
> >
* Re: [PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs
2025-10-15 1:47 ` [PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs NeilBrown
@ 2025-10-19 10:38 ` Amir Goldstein
0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:38 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:49 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> This requires the addition of start_creating_dentry() which is given the
> dentry which has already been found, and asks for it to be locked and
> its parent validated.
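The checks described above can be illustrated with a small userspace model.
This is only a sketch: struct mock_dentry and mock_start_creating_dentry()
are hypothetical stand-ins invented for illustration, not kernel definitions.
It mirrors the order of checks in the start_creating_dentry() hunk further
down: lock the parent, reject a dead parent, a wrong parent, or an unhashed
child with -EINVAL, reject a positive child with -EEXIST, and otherwise take
a reference while leaving the parent locked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a dentry; not the kernel struct. */
struct mock_dentry {
	struct mock_dentry *parent;
	bool hashed;    /* models !d_unhashed() */
	bool positive;  /* models d_is_positive() */
	bool dead;      /* models IS_DEADDIR() on a directory */
	bool locked;    /* models the directory's inode lock */
	int refcount;
};

/*
 * Returns 0 with the parent locked and an extra reference on the child,
 * or a negative errno with the parent unlocked and no reference taken.
 */
static int mock_start_creating_dentry(struct mock_dentry *parent,
				      struct mock_dentry *child)
{
	assert(!parent->locked);	/* the model has no lock nesting */
	parent->locked = true;

	if (parent->dead || child->parent != parent || !child->hashed) {
		parent->locked = false;
		return -EINVAL;
	}
	if (child->positive) {
		parent->locked = false;
		return -EEXIST;
	}
	child->refcount++;
	return 0;
}
```

The point of the model is that every failure path leaves the parent
unlocked, so a caller only has to clean up after success.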
>
> Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/ecryptfs/inode.c | 153 ++++++++++++++++++++----------------------
> fs/namei.c | 33 +++++++++
> include/linux/namei.h | 2 +
> 3 files changed, 107 insertions(+), 81 deletions(-)
>
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index ed1394da8d6b..b3702105d236 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -24,18 +24,26 @@
> #include <linux/unaligned.h>
> #include "ecryptfs_kernel.h"
>
> -static int lock_parent(struct dentry *dentry,
> - struct dentry **lower_dentry,
> - struct inode **lower_dir)
> +static struct dentry *ecryptfs_start_creating_dentry(struct dentry *dentry)
> {
> - struct dentry *lower_dir_dentry;
> + struct dentry *parent = dget_parent(dentry->d_parent);
> + struct dentry *ret;
>
> - lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent);
> - *lower_dir = d_inode(lower_dir_dentry);
> - *lower_dentry = ecryptfs_dentry_to_lower(dentry);
> + ret = start_creating_dentry(ecryptfs_dentry_to_lower(parent),
> + ecryptfs_dentry_to_lower(dentry));
> + dput(parent);
> + return ret;
> +}
>
> - inode_lock_nested(*lower_dir, I_MUTEX_PARENT);
> - return (*lower_dentry)->d_parent == lower_dir_dentry ? 0 : -EINVAL;
> +static struct dentry *ecryptfs_start_removing_dentry(struct dentry *dentry)
> +{
> + struct dentry *parent = dget_parent(dentry->d_parent);
> + struct dentry *ret;
> +
> + ret = start_removing_dentry(ecryptfs_dentry_to_lower(parent),
> + ecryptfs_dentry_to_lower(dentry));
> + dput(parent);
> + return ret;
> }
>
> static int ecryptfs_inode_test(struct inode *inode, void *lower_inode)
> @@ -141,15 +149,12 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
> struct inode *lower_dir;
> int rc;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - dget(lower_dentry); // don't even try to make the lower negative
> - if (!rc) {
> - if (d_unhashed(lower_dentry))
> - rc = -EINVAL;
> - else
> - rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry,
> - NULL);
> - }
> + lower_dentry = ecryptfs_start_removing_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> +
> + lower_dir = lower_dentry->d_parent->d_inode;
> + rc = vfs_unlink(&nop_mnt_idmap, lower_dir, lower_dentry, NULL);
> if (rc) {
> printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
> goto out_unlock;
> @@ -158,8 +163,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
> set_nlink(inode, ecryptfs_inode_to_lower(inode)->i_nlink);
> inode_set_ctime_to_ts(inode, inode_get_ctime(dir));
> out_unlock:
> - dput(lower_dentry);
> - inode_unlock(lower_dir);
> + end_removing(lower_dentry);
> if (!rc)
> d_drop(dentry);
> return rc;
> @@ -186,10 +190,12 @@ ecryptfs_do_create(struct inode *directory_inode,
> struct inode *lower_dir;
> struct inode *inode;
>
> - rc = lock_parent(ecryptfs_dentry, &lower_dentry, &lower_dir);
> - if (!rc)
> - rc = vfs_create(&nop_mnt_idmap, lower_dir,
> - lower_dentry, mode, true);
> + lower_dentry = ecryptfs_start_creating_dentry(ecryptfs_dentry);
> + if (IS_ERR(lower_dentry))
> + return ERR_CAST(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> + rc = vfs_create(&nop_mnt_idmap, lower_dir,
> + lower_dentry, mode, true);
> if (rc) {
> printk(KERN_ERR "%s: Failure to create dentry in lower fs; "
> "rc = [%d]\n", __func__, rc);
> @@ -205,7 +211,7 @@ ecryptfs_do_create(struct inode *directory_inode,
> fsstack_copy_attr_times(directory_inode, lower_dir);
> fsstack_copy_inode_size(directory_inode, lower_dir);
> out_lock:
> - inode_unlock(lower_dir);
> + end_creating(lower_dentry, NULL);
> return inode;
> }
>
> @@ -433,10 +439,12 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
>
> file_size_save = i_size_read(d_inode(old_dentry));
> lower_old_dentry = ecryptfs_dentry_to_lower(old_dentry);
> - rc = lock_parent(new_dentry, &lower_new_dentry, &lower_dir);
> - if (!rc)
> - rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
> - lower_new_dentry, NULL);
> + lower_new_dentry = ecryptfs_start_creating_dentry(new_dentry);
> + if (IS_ERR(lower_new_dentry))
> + return PTR_ERR(lower_new_dentry);
> + lower_dir = lower_new_dentry->d_parent->d_inode;
> + rc = vfs_link(lower_old_dentry, &nop_mnt_idmap, lower_dir,
> + lower_new_dentry, NULL);
> if (rc || d_really_is_negative(lower_new_dentry))
> goto out_lock;
> rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
> @@ -448,7 +456,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
> ecryptfs_inode_to_lower(d_inode(old_dentry))->i_nlink);
> i_size_write(d_inode(new_dentry), file_size_save);
> out_lock:
> - inode_unlock(lower_dir);
> + end_creating(lower_new_dentry, NULL);
> return rc;
> }
>
> @@ -468,9 +476,11 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
> size_t encoded_symlen;
> struct ecryptfs_mount_crypt_stat *mount_crypt_stat = NULL;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - if (rc)
> - goto out_lock;
> + lower_dentry = ecryptfs_start_creating_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> +
> mount_crypt_stat = &ecryptfs_superblock_to_private(
> dir->i_sb)->mount_crypt_stat;
> rc = ecryptfs_encrypt_and_encode_filename(&encoded_symname,
> @@ -490,7 +500,7 @@ static int ecryptfs_symlink(struct mnt_idmap *idmap,
> fsstack_copy_attr_times(dir, lower_dir);
> fsstack_copy_inode_size(dir, lower_dir);
> out_lock:
> - inode_unlock(lower_dir);
> + end_creating(lower_dentry, NULL);
> if (d_really_is_negative(dentry))
> d_drop(dentry);
> return rc;
> @@ -501,12 +511,14 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> {
> int rc;
> struct dentry *lower_dentry;
> + struct dentry *lower_dir_dentry;
> struct inode *lower_dir;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - if (rc)
> - goto out;
> -
> + lower_dentry = ecryptfs_start_creating_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return lower_dentry;
> + lower_dir_dentry = dget(lower_dentry->d_parent);
> + lower_dir = lower_dir_dentry->d_inode;
> lower_dentry = vfs_mkdir(&nop_mnt_idmap, lower_dir,
> lower_dentry, mode);
> rc = PTR_ERR(lower_dentry);
> @@ -522,7 +534,7 @@ static struct dentry *ecryptfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
> fsstack_copy_inode_size(dir, lower_dir);
> set_nlink(dir, lower_dir->i_nlink);
> out:
> - inode_unlock(lower_dir);
> + end_creating(lower_dentry, lower_dir_dentry);
> if (d_really_is_negative(dentry))
> d_drop(dentry);
> return ERR_PTR(rc);
> @@ -534,21 +546,18 @@ static int ecryptfs_rmdir(struct inode *dir, struct dentry *dentry)
> struct inode *lower_dir;
> int rc;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - dget(lower_dentry); // don't even try to make the lower negative
> - if (!rc) {
> - if (d_unhashed(lower_dentry))
> - rc = -EINVAL;
> - else
> - rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
> - }
> + lower_dentry = ecryptfs_start_removing_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> +
> + rc = vfs_rmdir(&nop_mnt_idmap, lower_dir, lower_dentry);
> if (!rc) {
> clear_nlink(d_inode(dentry));
> fsstack_copy_attr_times(dir, lower_dir);
> set_nlink(dir, lower_dir->i_nlink);
> }
> - dput(lower_dentry);
> - inode_unlock(lower_dir);
> + end_removing(lower_dentry);
> if (!rc)
> d_drop(dentry);
> return rc;
> @@ -562,10 +571,12 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
> struct dentry *lower_dentry;
> struct inode *lower_dir;
>
> - rc = lock_parent(dentry, &lower_dentry, &lower_dir);
> - if (!rc)
> - rc = vfs_mknod(&nop_mnt_idmap, lower_dir,
> - lower_dentry, mode, dev);
> + lower_dentry = ecryptfs_start_creating_dentry(dentry);
> + if (IS_ERR(lower_dentry))
> + return PTR_ERR(lower_dentry);
> + lower_dir = lower_dentry->d_parent->d_inode;
> +
> + rc = vfs_mknod(&nop_mnt_idmap, lower_dir, lower_dentry, mode, dev);
> if (rc || d_really_is_negative(lower_dentry))
> goto out;
> rc = ecryptfs_interpose(lower_dentry, dentry, dir->i_sb);
> @@ -574,7 +585,7 @@ ecryptfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
> fsstack_copy_attr_times(dir, lower_dir);
> fsstack_copy_inode_size(dir, lower_dir);
> out:
> - inode_unlock(lower_dir);
> + end_creating(lower_dentry, NULL);
> if (d_really_is_negative(dentry))
> d_drop(dentry);
> return rc;
> @@ -590,7 +601,6 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
> struct dentry *lower_new_dentry;
> struct dentry *lower_old_dir_dentry;
> struct dentry *lower_new_dir_dentry;
> - struct dentry *trap;
> struct inode *target_inode;
> struct renamedata rd = {};
>
> @@ -605,31 +615,13 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
>
> target_inode = d_inode(new_dentry);
>
> - trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
> - if (IS_ERR(trap))
> - return PTR_ERR(trap);
> - dget(lower_new_dentry);
> - rc = -EINVAL;
> - if (lower_old_dentry->d_parent != lower_old_dir_dentry)
> - goto out_lock;
> - if (lower_new_dentry->d_parent != lower_new_dir_dentry)
> - goto out_lock;
> - if (d_unhashed(lower_old_dentry) || d_unhashed(lower_new_dentry))
> - goto out_lock;
> - /* source should not be ancestor of target */
> - if (trap == lower_old_dentry)
> - goto out_lock;
> - /* target should not be ancestor of source */
> - if (trap == lower_new_dentry) {
> - rc = -ENOTEMPTY;
> - goto out_lock;
> - }
> + rd.mnt_idmap = &nop_mnt_idmap;
> + rd.old_parent = lower_old_dir_dentry;
> + rd.new_parent = lower_new_dir_dentry;
> + rc = start_renaming_two_dentries(&rd, lower_old_dentry, lower_new_dentry);
> + if (rc)
> + return rc;
>
> - rd.mnt_idmap = &nop_mnt_idmap;
> - rd.old_parent = lower_old_dir_dentry;
> - rd.old_dentry = lower_old_dentry;
> - rd.new_parent = lower_new_dir_dentry;
> - rd.new_dentry = lower_new_dentry;
> rc = vfs_rename(&rd);
> if (rc)
> goto out_lock;
> @@ -640,8 +632,7 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
> if (new_dir != old_dir)
> fsstack_copy_attr_all(old_dir, d_inode(lower_old_dir_dentry));
> out_lock:
> - dput(lower_new_dentry);
> - unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
> + end_renaming(&rd);
> return rc;
> }
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 0a5261640ae5..91e484dbc239 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3397,6 +3397,39 @@ struct dentry *start_removing_noperm(struct dentry *parent,
> }
> EXPORT_SYMBOL(start_removing_noperm);
>
> +/**
> + * start_creating_dentry - prepare to create a given dentry
> + * @parent: directory in which the dentry should be created
> + * @child: the dentry to be created
> + *
> + * A lock is taken to protect the dentry against other dirops and
> + * the validity of the dentry is checked: correct parent and still hashed.
> + *
> + * If the dentry is valid and negative a reference is taken and
> + * returned. If not an error is returned.
> + *
> + * end_creating() should be called when creation is complete, or aborted.
> + *
> + * Returns: the valid dentry, or an error.
> + */
> +struct dentry *start_creating_dentry(struct dentry *parent,
> + struct dentry *child)
> +{
> + inode_lock_nested(parent->d_inode, I_MUTEX_PARENT);
> + if (unlikely(IS_DEADDIR(parent->d_inode) ||
> + child->d_parent != parent ||
> + d_unhashed(child))) {
> + inode_unlock(parent->d_inode);
> + return ERR_PTR(-EINVAL);
> + }
> + if (d_is_positive(child)) {
> + inode_unlock(parent->d_inode);
> + return ERR_PTR(-EEXIST);
> + }
> + return dget(child);
> +}
> +EXPORT_SYMBOL(start_creating_dentry);
> +
> /**
> * start_removing_dentry - prepare to remove a given dentry
> * @parent: directory from which dentry should be removed
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index a99ac8b7e24a..208aed1d6728 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -100,6 +100,8 @@ struct dentry *start_removing_killable(struct mnt_idmap *idmap,
> struct qstr *name);
> struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
> +struct dentry *start_creating_dentry(struct dentry *parent,
> + struct dentry *child);
> struct dentry *start_removing_dentry(struct dentry *parent,
> struct dentry *child);
>
> --
> 2.50.0.107.gf914562f5916.dirty
>
* Re: [PATCH v2 14/14] VFS: introduce end_creating_keep()
2025-10-15 1:47 ` [PATCH v2 14/14] VFS: introduce end_creating_keep() NeilBrown
@ 2025-10-19 10:39 ` Amir Goldstein
0 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:39 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:49 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> Occasionally the caller of end_creating() wants to keep using the dentry.
> Rather than requiring them to dget() the dentry (when not an error)
> before calling end_creating(), provide end_creating_keep() which does
> this.
>
> cachefiles and overlayfs make use of this.
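The reference counting involved can be modelled in userspace. This is a
sketch under stated assumptions: all mock_* names are hypothetical, and
mock_end_dirop() assumes that end_dirop() does nothing when handed an error
pointer, which is how the kernel-doc in the namei.h hunk below reads. The
model shows that end_creating_keep() leaves the caller holding exactly the
reference it had, with the parent unlocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_ERRNO 4095

/* Hypothetical userspace stand-in for a dentry; not the kernel struct. */
struct mock_dentry {
	int refcount;
	bool parent_locked;	/* models the parent's inode lock */
};

/* Models IS_ERR(): error codes live in the top 4095 pointer values. */
static bool mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MOCK_MAX_ERRNO;
}

/* Models ERR_PTR(). */
static struct mock_dentry *mock_err_ptr(long err)
{
	return (struct mock_dentry *)(intptr_t)err;
}

/* Assumed behaviour of end_dirop(): unlock and drop a ref, no-op on error. */
static void mock_end_dirop(struct mock_dentry *d)
{
	if (mock_is_err(d))
		return;
	assert(d->refcount > 0);
	d->parent_locked = false;
	d->refcount--;
}

/* Take an extra reference first so the dentry survives mock_end_dirop(). */
static struct mock_dentry *mock_end_creating_keep(struct mock_dentry *child)
{
	if (!mock_is_err(child))
		child->refcount++;
	mock_end_dirop(child);
	return child;
}
```

Because the error case is passed straight through, a caller can write
"return end_creating_keep(ret);" without testing IS_ERR() first, which is
the shape ovl_create_temp() takes in this patch.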
>
> Signed-off-by: NeilBrown <neil@brown.name>
Thanks for adding this cleanup patch!
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/cachefiles/namei.c | 3 +--
> fs/overlayfs/dir.c | 8 ++------
> fs/overlayfs/super.c | 11 +++--------
> include/linux/namei.h | 22 ++++++++++++++++++++++
> 4 files changed, 28 insertions(+), 16 deletions(-)
>
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 10f010dc9946..5c50293328f4 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -155,8 +155,7 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
>
> /* Tell rmdir() it's not allowed to delete the subdir */
> inode_lock(d_inode(subdir));
> - dget(subdir);
> - end_creating(subdir);
> + end_creating_keep(subdir);
>
> if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) {
> pr_notice("cachefiles: Inode already in use: %pd (B=%lx)\n",
> diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
> index 00dc797f2da7..cadbb47c6225 100644
> --- a/fs/overlayfs/dir.c
> +++ b/fs/overlayfs/dir.c
> @@ -251,10 +251,7 @@ struct dentry *ovl_create_temp(struct ovl_fs *ofs, struct dentry *workdir,
> if (IS_ERR(ret))
> return ret;
> ret = ovl_create_real(ofs, workdir, ret, attr);
> - if (!IS_ERR(ret))
> - dget(ret);
> - end_creating(ret);
> - return ret;
> + return end_creating_keep(ret);
> }
>
> static int ovl_set_opaque_xerr(struct dentry *dentry, struct dentry *upper,
> @@ -364,8 +361,7 @@ static int ovl_create_upper(struct dentry *dentry, struct inode *inode,
> if (IS_ERR(newdentry))
> return PTR_ERR(newdentry);
>
> - dget(newdentry);
> - end_creating(newdentry);
> + end_creating_keep(newdentry);
>
> if (ovl_type_merge(dentry->d_parent) && d_is_dir(newdentry) &&
> !ovl_allow_offline_changes(ofs)) {
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 3acda985c8a3..7b8fc1cab6eb 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -319,8 +319,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> };
>
> if (work->d_inode) {
> - dget(work);
> - end_creating(work);
> + end_creating_keep(work);
> if (persist)
> return work;
> err = -EEXIST;
> @@ -336,9 +335,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
> }
>
> work = ovl_do_mkdir(ofs, dir, work, attr.ia_mode);
> - if (!IS_ERR(work))
> - dget(work);
> - end_creating(work);
> + end_creating_keep(work);
> err = PTR_ERR(work);
> if (IS_ERR(work))
> goto out_err;
> @@ -630,9 +627,7 @@ static struct dentry *ovl_lookup_or_create(struct ovl_fs *ofs,
> if (!child->d_inode)
> child = ovl_create_real(ofs, parent, child,
> OVL_CATTR(mode));
> - if (!IS_ERR(child))
> - dget(child);
> - end_creating(child);
> + end_creating_keep(child);
> }
> dput(parent);
>
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 0ef73d739a31..3d82c6a19197 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -125,6 +125,28 @@ static inline void end_creating(struct dentry *child)
> end_dirop(child);
> }
>
> +/* end_creating_keep - finish action started with start_creating() and return result
> + * @child: dentry returned by start_creating() or vfs_mkdir()
> + *
> + * Unlock and return the child. This can be called after
> + * start_creating() whether that function succeeded or not,
> + * but it is not needed on failure.
> + *
> + * If vfs_mkdir() was called then the value returned from that function
> + * should be given for @child rather than the original dentry, as vfs_mkdir()
> + * may have provided a new dentry.
> + *
> + * Returns: @child, which may be a dentry or an error.
> + *
> + */
> +static inline struct dentry *end_creating_keep(struct dentry *child)
> +{
> + if (!IS_ERR(child))
> + dget(child);
> + end_dirop(child);
> + return child;
> +}
> +
> /**
> * end_removing - finish action started with start_removing
> * @child: dentry returned by start_removing()
> --
> 2.50.0.107.gf914562f5916.dirty
>
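The reference counting in end_creating_keep() can be modelled in userspace to make the ownership rule concrete. This is only a sketch: struct mock_dir, struct mock_dentry and every mock_* helper are hypothetical stand-ins, not kernel code, and the behaviour of mock_end_dirop() on an error pointer is an assumption. Only the order of operations (take a reference unless @child is an error, then end_dirop(), then return @child) comes from the patch quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; not the kernel's struct dentry. */
struct mock_dir { bool locked; };
struct mock_dentry { int refcount; struct mock_dir *parent; };

/* Mimic ERR_PTR()/IS_ERR(): the top 4095 addresses encode -errno. */
static void *mock_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Model of start_creating(): lock the parent and take a reference. */
static struct mock_dentry *mock_start_creating(struct mock_dentry *d)
{
	assert(!d->parent->locked);
	d->parent->locked = true;
	d->refcount++;
	return d;
}

/* Model of end_dirop(): unlock and drop the reference.
 * Assumption: an error pointer is a no-op. */
static void mock_end_dirop(struct mock_dentry *d)
{
	if (mock_is_err(d))
		return;
	d->parent->locked = false;
	d->refcount--;
}

/* Same order of operations as end_creating_keep() in the patch. */
static struct mock_dentry *mock_end_creating_keep(struct mock_dentry *child)
{
	if (!mock_is_err(child))
		child->refcount++;	/* the reference the caller keeps */
	mock_end_dirop(child);		/* unlock, drop the lookup reference */
	return child;
}
```

In this model the caller ends up with the parent unlocked and still holding one reference to the child, which is what the open-coded dget() followed by end_creating() did in the overlayfs callers.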
* Re: [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure.
2025-10-15 1:47 ` [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure NeilBrown
@ 2025-10-19 10:46 ` Amir Goldstein
2025-10-22 3:54 ` NeilBrown
0 siblings, 1 reply; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:46 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:49 AM NeilBrown <neilb@ownmail.net> wrote:
>
> From: NeilBrown <neil@brown.name>
>
> vfs_mkdir() already drops the reference to the dentry on failure but it
> leaves the parent locked.
> This complicates end_creating() which needs to unlock the parent even
> though the dentry is no longer available.
>
> If we change vfs_mkdir() to unlock on failure as well as releasing the
> dentry, we can remove the "parent" arg from end_creating() and simplify
> the rules for calling it.
Does this deserve a mention in filesystems/porting.rst?
I think the change of semantics in
c54b386969a58 VFS: Change vfs_mkdir() to return the dentry.
was also not recorded in porting.rst.
>
> Note that cachefiles_get_directory() can choose to substitute an error
> instead of actually calling vfs_mkdir(), for fault injection. In that
> case it needs to call end_creating(), just as vfs_mkdir() now does on
> error.
>
> Signed-off-by: NeilBrown <neil@brown.name>
This looks much better IMO.
With one nit below fixed, feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/btrfs/ioctl.c | 2 +-
> fs/cachefiles/namei.c | 14 ++++++++------
> fs/ecryptfs/inode.c | 8 ++++----
> fs/namei.c | 4 ++--
> fs/nfsd/nfs3proc.c | 2 +-
> fs/nfsd/nfs4proc.c | 2 +-
> fs/nfsd/nfs4recover.c | 2 +-
> fs/nfsd/nfsproc.c | 2 +-
> fs/nfsd/vfs.c | 8 ++++----
> fs/overlayfs/copy_up.c | 4 ++--
> fs/overlayfs/dir.c | 13 ++++++-------
> fs/overlayfs/super.c | 6 +++---
> fs/xfs/scrub/orphanage.c | 2 +-
> include/linux/namei.h | 28 +++++++++-------------------
> ipc/mqueue.c | 2 +-
> 15 files changed, 45 insertions(+), 54 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 4fbfdd8faf6a..90ef777eae25 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -935,7 +935,7 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
> out_up_read:
> up_read(&fs_info->subvol_sem);
> out_dput:
> - end_creating(dentry, parent);
> + end_creating(dentry);
> return ret;
> }
>
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index b97a40917a32..10f010dc9946 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -130,8 +130,10 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> ret = cachefiles_inject_write_error();
> if (ret == 0)
> subdir = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), subdir, 0700);
> - else
> + else {
> + end_creating(subdir);
> subdir = ERR_PTR(ret);
> + }
Please use braces on both branches: if {} else {}
Thanks,
Amir.
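For readers following the cachefiles hunk, here is a small userspace model of the contract described in the commit message: once vfs_mkdir() unlocks and drops the dentry on failure, a caller that substitutes an injected error instead of calling it has to do the same cleanup itself. All mock_* names and both structs are hypothetical stand-ins, not kernel code; the error values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; not kernel types. */
struct mock_dir { bool locked; };
struct mock_dentry { int refcount; struct mock_dir *parent; };

static void *mock_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

/* Model of end_creating(): unlock the parent and drop the reference. */
static void mock_end_creating(struct mock_dentry *d)
{
	if (mock_is_err(d))
		return;
	d->parent->locked = false;
	d->refcount--;
}

/* Model of the post-patch vfs_mkdir() contract: on failure the dentry
 * is released AND the parent is unlocked before the error is returned. */
static struct mock_dentry *mock_vfs_mkdir(struct mock_dentry *d, int err)
{
	if (err) {
		mock_end_creating(d);
		return mock_err_ptr(err);
	}
	return d;
}

/* Caller shaped like cachefiles_get_directory(), with braces on both
 * branches. The injected-error branch must clean up by hand because
 * mock_vfs_mkdir() never ran. */
static struct mock_dentry *mock_get_directory(struct mock_dentry *subdir,
					      int injected_err, int mkdir_err)
{
	if (injected_err == 0) {
		subdir = mock_vfs_mkdir(subdir, mkdir_err);
	} else {
		mock_end_creating(subdir);
		subdir = mock_err_ptr(injected_err);
	}
	return subdir;
}
```

Either failure path leaves the model in the same state (parent unlocked, lookup reference dropped), so whatever runs afterwards only needs to check for an error pointer.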
* Re: [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops.
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
` (13 preceding siblings ...)
2025-10-15 1:47 ` [PATCH v2 14/14] VFS: introduce end_creating_keep() NeilBrown
@ 2025-10-19 10:50 ` Amir Goldstein
14 siblings, 0 replies; 32+ messages in thread
From: Amir Goldstein @ 2025-10-19 10:50 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
>
> Here is a new series in response to review (thanks!).
>
> The series creates a number of interfaces that combine locking and lookup, or
> sometimes do the locking without lookup.
> After this series there are still a few places where non-VFS code knows
> about the locking rules. Places that call simple_start_creating()
> still have explicit unlock on the parent (I think). Al is doing work
> on those places so I'll wait until he is finished.
> Also there explicit locking one place in nfsd which is changed by an
> in-flight patch. That lands it can be updated to use these interfaces.
>
> The first patch here should have been part of the last patch of the
> previous series - sorry for leaving it out.
>
> I've combined the new interface with changes is various places to use
> the new interfaces. I think it is easier to reveiew the design that way.
> If necessary I can split these out to have separate patches for each place
> that new APIs are used if the general design is accepted.
>
Apart from the minor v1 review comments on patch 9 that were not
addressed, all looks good to me.
I could really use a "changed since v1" summary in this cover letter
and/or individual patches.
Please push the pdirops branch so I can run the overlayfs tests.
Thanks,
Amir.
>
> [PATCH v2 01/14] debugfs: rename end_creating() to
> [PATCH v2 02/14] VFS: introduce start_dirop() and end_dirop()
> [PATCH v2 03/14] VFS: tidy up do_unlinkat()
> [PATCH v2 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and
> [PATCH v2 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing()
> [PATCH v2 06/14] VFS: introduce start_creating_noperm() and
> [PATCH v2 07/14] VFS: introduce start_removing_dentry()
> [PATCH v2 08/14] VFS: add start_creating_killable() and
> [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and
> [PATCH v2 10/14] VFS/ovl/smb: introduce start_renaming_dentry()
> [PATCH v2 11/14] Add start_renaming_two_dentries()
> [PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs
> [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure.
> [PATCH v2 14/14] VFS: introduce end_creating_keep()
* Re: [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()
2025-10-15 1:46 ` [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
2025-10-19 10:15 ` Amir Goldstein
@ 2025-10-20 8:36 ` kernel test robot
1 sibling, 0 replies; 32+ messages in thread
From: kernel test robot @ 2025-10-20 8:36 UTC (permalink / raw)
To: NeilBrown
Cc: oe-lkp, lkp, linux-fsdevel, linux-xfs, linux-kernel,
Alexander Viro, Christian Brauner, Amir Goldstein, Jeff Layton,
Jan Kara, oliver.sang
Hello,
kernel test robot noticed "kernel_BUG_at_fs/open.c" on:
commit: dc62b71efff8093d50a9e1f7321cabcb76ff8447 ("[PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()")
url: https://github.com/intel-lab-lkp/linux/commits/NeilBrown/debugfs-rename-end_creating-to-debugfs_end_creating/20251015-095112
base: https://git.kernel.org/cgit/linux/kernel/git/driver-core/driver-core.git 3a8660878839faadb4f1a6dd72c3179c1df56787
patch link: https://lore.kernel.org/all/20251015014756.2073439-7-neilb@ownmail.net/
patch subject: [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()
in testcase: trinity
version:
with following parameters:
runtime: 300s
group: group-03
nr_groups: 5
config: x86_64-randconfig-074-20251018
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+------------------------------------------+------------+------------+
| | 04e655aedc | dc62b71eff |
+------------------------------------------+------------+------------+
| boot_successes | 9 | 0 |
| boot_failures | 0 | 9 |
| kernel_BUG_at_fs/open.c | 0 | 9 |
| Oops:invalid_opcode:#[##] | 0 | 9 |
| RIP:dentry_open | 0 | 9 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 9 |
+------------------------------------------+------------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202510201610.40b1a654-lkp@intel.com
[ 58.472072][ T3648] ------------[ cut here ]------------
[ 58.472990][ T3648] kernel BUG at fs/open.c:1116!
[ 58.479432][ T3648] Oops: invalid opcode: 0000 [#1]
[ 58.480255][ T3648] CPU: 0 UID: 192664024 PID: 3648 Comm: trinity-c2 Tainted: G T 6.18.0-rc1-00006-gdc62b71efff8 #1 PREEMPT
[ 58.482041][ T3648] Tainted: [T]=RANDSTRUCT
[ 58.482680][ T3648] RIP: 0010:dentry_open (fs/open.c:1116)
[ 58.483443][ T3648] Code: df 48 89 c3 48 89 c6 e8 90 fe ff ff 85 c0 74 0f 89 c5 48 89 df e8 82 92 00 00 48 63 c5 eb 03 48 89 d8 5b 5d c3 cc cc cc cc cc <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 66
All code
========
0: df 48 89 fisttps -0x77(%rax)
3: c3 ret
4: 48 89 c6 mov %rax,%rsi
7: e8 90 fe ff ff call 0xfffffffffffffe9c
c: 85 c0 test %eax,%eax
e: 74 0f je 0x1f
10: 89 c5 mov %eax,%ebp
12: 48 89 df mov %rbx,%rdi
15: e8 82 92 00 00 call 0x929c
1a: 48 63 c5 movslq %ebp,%rax
1d: eb 03 jmp 0x22
1f: 48 89 d8 mov %rbx,%rax
22: 5b pop %rbx
23: 5d pop %rbp
24: c3 ret
25: cc int3
26: cc int3
27: cc int3
28: cc int3
29: cc int3
2a:* 0f 0b ud2 <-- trapping instruction
2c: 66 66 66 66 66 66 2e data16 data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
33: 0f 1f 84 00 00 00 00
3a: 00
3b: 66 data16
3c: 66 data16
3d: 66 data16
3e: 66 data16
3f: 66 data16
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 66 66 66 66 66 66 2e data16 data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
9: 0f 1f 84 00 00 00 00
10: 00
11: 66 data16
12: 66 data16
13: 66 data16
14: 66 data16
15: 66 data16
[ 58.486214][ T3648] RSP: 0018:ffff88813b80fe20 EFLAGS: 00010246
[ 58.487088][ T3648] RAX: 0000000000000001 RBX: ffff888142398000 RCX: ffff888142354000
[ 58.488074][ T3648] RDX: ffff88813acdc000 RSI: 00000000fffffff9 RDI: ffff88813b80fe58
[ 58.489177][ T3648] RBP: 0000000000000213 R08: 0000000000000000 R09: 0000000000000000
[ 58.490214][ T3648] R10: ffff888141f59a90 R11: ffffffff81960fc9 R12: 0000000000000000
[ 58.491333][ T3648] R13: 00000000fffffff9 R14: ffff888102692798 R15: ffff888142354000
[ 58.492445][ T3648] FS: 00000000357bf880(0000) GS:0000000000000000(0000) knlGS:0000000000000000
[ 58.493720][ T3648] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 58.494658][ T3648] CR2: 00007ffffffff000 CR3: 0000000142163000 CR4: 00000000000406b0
[ 58.495806][ T3648] Call Trace:
[ 58.496319][ T3648] <TASK>
[ 58.496723][ T3648] do_mq_open (ipc/mqueue.c:923)
[ 58.497381][ T3648] __x64_sys_mq_open (ipc/mqueue.c:949 ipc/mqueue.c:942 ipc/mqueue.c:942)
[ 58.498090][ T3648] ? entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 58.498979][ T3648] do_syscall_64 (arch/x86/entry/syscall_64.c:?)
[ 58.499657][ T3648] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 58.500484][ T3648] RIP: 0033:0x463519
[ 58.501061][ T3648] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db 59 00 00 c3 66 2e 0f 1f 84 00 00 00 00
All code
========
0: 00 f3 add %dh,%bl
2: c3 ret
3: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
a: 00 00 00
d: 0f 1f 40 00 nopl 0x0(%rax)
11: 48 89 f8 mov %rdi,%rax
14: 48 89 f7 mov %rsi,%rdi
17: 48 89 d6 mov %rdx,%rsi
1a: 48 89 ca mov %rcx,%rdx
1d: 4d 89 c2 mov %r8,%r10
20: 4d 89 c8 mov %r9,%r8
23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 0f 83 db 59 00 00 jae 0x5a11
36: c3 ret
37: 66 data16
38: 2e cs
39: 0f .byte 0xf
3a: 1f (bad)
3b: 84 00 test %al,(%rax)
3d: 00 00 add %al,(%rax)
...
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 0f 83 db 59 00 00 jae 0x59e7
c: c3 ret
d: 66 data16
e: 2e cs
f: 0f .byte 0xf
10: 1f (bad)
11: 84 00 test %al,(%rax)
13: 00 00 add %al,(%rax)
...
[ 58.503916][ T3648] RSP: 002b:00007ffc376f2be8 EFLAGS: 00000246 ORIG_RAX: 00000000000000f0
[ 58.505150][ T3648] RAX: ffffffffffffffda RBX: 00000000000000f0 RCX: 0000000000463519
[ 58.506312][ T3648] RDX: 0000000000000030 RSI: fffffffffffffff9 RDI: 00007f9ad403e000
[ 58.507487][ T3648] RBP: 00007f9ad4949000 R08: 0000000030010000 R09: 0000000001000000
[ 58.508586][ T3648] R10: 00007f9ad403e008 R11: 0000000000000246 R12: 0000000000000002
[ 58.509732][ T3648] R13: 00007f9ad4949058 R14: 00000000357bf850 R15: 00007f9ad4949000
[ 58.510897][ T3648] </TASK>
[ 58.511374][ T3648] Modules linked in:
[ 58.512025][ T3648] ---[ end trace 0000000000000000 ]---
[ 58.524399][ T3648] RIP: 0010:dentry_open (fs/open.c:1116)
[ 58.527033][ T3648] Code: df 48 89 c3 48 89 c6 e8 90 fe ff ff 85 c0 74 0f 89 c5 48 89 df e8 82 92 00 00 48 63 c5 eb 03 48 89 d8 5b 5d c3 cc cc cc cc cc <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 66
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251020/202510201610.40b1a654-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
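The report does not say what triggered the BUG. One plausible reading, not confirmed anywhere in this thread, is that the ipc/mqueue.c hunk of patch 06 removes the "path.mnt = mntget(mnt)" assignment without replacing it, so do_mq_open() hands dentry_open() a path with no mount, and that the check at fs/open.c:1116 rejects exactly that. The userspace sketch below models only this assumed failure mode; every mock_* name is hypothetical and the function returns false where the kernel would BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vfsmount and struct path. */
struct mock_mount { int refs; };
struct mock_path { struct mock_mount *mnt; void *dentry; };

/* Stand-in for the check assumed to sit at fs/open.c:1116.
 * Returns false where the kernel would hit BUG(). */
static bool mock_dentry_open_ok(const struct mock_path *path)
{
	return path->mnt != NULL;
}

/* Caller in the shape of the patched do_mq_open(). set_mnt toggles
 * whether path.mnt is filled in before the open is attempted. */
static bool mock_do_mq_open(struct mock_mount *mnt, void *dentry, bool set_mnt)
{
	struct mock_path path = { NULL, dentry };

	assert(dentry != NULL);
	if (set_mnt)
		path.mnt = mnt;		/* borrowed pointer, no extra reference */
	return mock_dentry_open_ok(&path);
}
```

If this reading is right, the open succeeds in the model as soon as path.mnt is populated, with or without an extra mount reference; the actual fix is whatever the next revision of the series contains.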
* Re: [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
2025-10-19 10:33 ` Amir Goldstein
@ 2025-10-21 13:25 ` Christian Brauner
0 siblings, 0 replies; 32+ messages in thread
From: Christian Brauner @ 2025-10-21 13:25 UTC (permalink / raw)
To: Amir Goldstein
Cc: NeilBrown, Alexander Viro, Jeff Layton, Jan Kara, linux-fsdevel
On Sun, Oct 19, 2025 at 12:33:05PM +0200, Amir Goldstein wrote:
> On Sun, Oct 19, 2025 at 12:25 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
> > >
> > > From: NeilBrown <neil@brown.name>
> > >
> > > start_renaming() combines name lookup and locking to prepare for rename.
> > > It is used when two names need to be looked up as in nfsd and overlayfs -
> > > cases where one or both dentrys are already available will be handled
> > > separately.
> > >
> > > __start_renaming() avoids the inode_permission check and hash
> > > calculation and is suitable after filename_parentat() in do_renameat2().
> > > It subsumes quite a bit of code from that function.
> > >
> > > start_renaming() does calculate the hash and check X permission and is
> > > suitable elsewhere:
> > > - nfsd_rename()
> > > - ovl_rename()
> > >
> > > Signed-off-by: NeilBrown <neil@brown.name>
> >
> > Review comments from v1 not addressed:
> > https://lore.kernel.org/linux-fsdevel/CAOQ4uxh+NcAv9v6NtVRrLCMYbpd0ajtvsd6c9-W2a7+vur0UJQ@mail.gmail.com/
> >
>
> Obviously, I am more attached to my comments on the overlayfs
> changes. since you have not replied to those, you might have missed them...
I'll wait for a resend of this version then.
* Re: [PATCH v2 01/14] debugfs: rename end_creating() to debugfs_end_creating()
2025-10-15 1:46 ` [PATCH v2 01/14] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
@ 2025-10-21 13:26 ` Christian Brauner
0 siblings, 0 replies; 32+ messages in thread
From: Christian Brauner @ 2025-10-21 13:26 UTC (permalink / raw)
To: NeilBrown
Cc: Alexander Viro, Amir Goldstein, Jeff Layton, Jan Kara,
linux-fsdevel
On Wed, Oct 15, 2025 at 12:46:53PM +1100, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> By not using the generic end_creating() name here we are free to use it
> more globally for a more generic function.
> This should have been done when start_creating() was renamed.
>
> For consistency, also rename failed_creating().
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
Makes a lot of sense, thanks.
I'll spare slapping my RvBs onto everything because it'll carry my SoB
anyway.
* Re: [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()
2025-10-19 10:15 ` Amir Goldstein
@ 2025-10-22 3:20 ` NeilBrown
0 siblings, 0 replies; 32+ messages in thread
From: NeilBrown @ 2025-10-22 3:20 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sun, 19 Oct 2025, Amir Goldstein wrote:
> On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > xfs, fuse, ipc/mqueue need variants of start_creating or start_removing
> > which do not check permissions.
> > This patch adds _noperm versions of these functions.
> >
> > Note that do_mq_open() was only calling mntget() so it could call
> > path_put() - it didn't really need an extra reference on the mnt.
> > Now it doesn't call mntget() and uses end_creating() which does
> > the dput() half of path_put().
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
>
> I noticed that both Jeff and I had already given our RVB on v1
> and it's not here, so has this patch changed in some fundamental way since v1?
> I could really use a "changed since v1" section when that happens.
>
> Otherwise, feel free to add:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
I hadn't changed anything. I think I started looking at your
suggestions about documentation changes and the fact that mq_unlink()
passes d_inode(dentry->d_parent) to vfs_unlink() .... and got
distracted.
I've added both reviewed-bys (thanks) and changed mq_unlink() to pass
d_inode(mnt->mnt_root) (using the same dentry as was given to
lookup_noperm).
I've also fixed the bug kernel-test-robot reported.
Thanks,
NeilBrown
>
> > ---
> > fs/fuse/dir.c | 19 +++++++---------
> > fs/namei.c | 48 ++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/scrub/orphanage.c | 11 ++++-----
> > include/linux/namei.h | 2 ++
> > ipc/mqueue.c | 31 +++++++++-----------------
> > 5 files changed, 73 insertions(+), 38 deletions(-)
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index ecaec0fea3a1..40ca94922349 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -1397,27 +1397,25 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> > if (!parent)
> > return -ENOENT;
> >
> > - inode_lock_nested(parent, I_MUTEX_PARENT);
> > if (!S_ISDIR(parent->i_mode))
> > - goto unlock;
> > + goto put_parent;
> >
> > err = -ENOENT;
> > dir = d_find_alias(parent);
> > if (!dir)
> > - goto unlock;
> > + goto put_parent;
> >
> > - name->hash = full_name_hash(dir, name->name, name->len);
> > - entry = d_lookup(dir, name);
> > + entry = start_removing_noperm(dir, name);
> > dput(dir);
> > - if (!entry)
> > - goto unlock;
> > + if (IS_ERR(entry))
> > + goto put_parent;
> >
> > fuse_dir_changed(parent);
> > if (!(flags & FUSE_EXPIRE_ONLY))
> > d_invalidate(entry);
> > fuse_invalidate_entry_cache(entry);
> >
> > - if (child_nodeid != 0 && d_really_is_positive(entry)) {
> > + if (child_nodeid != 0) {
> > inode_lock(d_inode(entry));
> > if (get_node_id(d_inode(entry)) != child_nodeid) {
> > err = -ENOENT;
> > @@ -1445,10 +1443,9 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> > } else {
> > err = 0;
> > }
> > - dput(entry);
> >
> > - unlock:
> > - inode_unlock(parent);
> > + end_removing(entry);
> > + put_parent:
> > iput(parent);
> > return err;
> > }
> > diff --git a/fs/namei.c b/fs/namei.c
> > index ae833dfa277c..696e4b794416 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3275,6 +3275,54 @@ struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> > }
> > EXPORT_SYMBOL(start_removing);
> >
> > +/**
> > + * start_creating_noperm - prepare to create a given name without permission checking
> > + * @parent: directory in which to prepare to create the name
> > + * @name: the name to be created
> > + *
> > + * Locks are taken and a lookup in performed prior to creating
> > + * an object in a directory.
> > + *
> > + * If the name already exists, a positive dentry is returned.
> > + *
> > + * Returns: a negative or positive dentry, or an error.
> > + */
> > +struct dentry *start_creating_noperm(struct dentry *parent,
> > + struct qstr *name)
> > +{
> > + int err = lookup_noperm_common(name, parent);
> > +
> > + if (err)
> > + return ERR_PTR(err);
> > + return start_dirop(parent, name, LOOKUP_CREATE);
> > +}
> > +EXPORT_SYMBOL(start_creating_noperm);
> > +
> > +/**
> > + * start_removing_noperm - prepare to remove a given name without permission checking
> > + * @parent: directory in which to find the name
> > + * @name: the name to be removed
> > + *
> > + * Locks are taken and a lookup in performed prior to removing
> > + * an object from a directory.
> > + *
> > + * If the name doesn't exist, an error is returned.
> > + *
> > + * end_removing() should be called when removal is complete, or aborted.
> > + *
> > + * Returns: a positive dentry, or an error.
> > + */
> > +struct dentry *start_removing_noperm(struct dentry *parent,
> > + struct qstr *name)
> > +{
> > + int err = lookup_noperm_common(name, parent);
> > +
> > + if (err)
> > + return ERR_PTR(err);
> > + return start_dirop(parent, name, 0);
> > +}
> > +EXPORT_SYMBOL(start_removing_noperm);
> > +
> > #ifdef CONFIG_UNIX98_PTYS
> > int path_pts(struct path *path)
> > {
> > diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
> > index 9c12cb844231..e732605924a1 100644
> > --- a/fs/xfs/scrub/orphanage.c
> > +++ b/fs/xfs/scrub/orphanage.c
> > @@ -152,11 +152,10 @@ xrep_orphanage_create(
> > }
> >
> > /* Try to find the orphanage directory. */
> > - inode_lock_nested(root_inode, I_MUTEX_PARENT);
> > - orphanage_dentry = lookup_noperm(&QSTR(ORPHANAGE), root_dentry);
> > + orphanage_dentry = start_creating_noperm(root_dentry, &QSTR(ORPHANAGE));
> > if (IS_ERR(orphanage_dentry)) {
> > error = PTR_ERR(orphanage_dentry);
> > - goto out_unlock_root;
> > + goto out_dput_root;
> > }
> >
> > /*
> > @@ -170,7 +169,7 @@ xrep_orphanage_create(
> > orphanage_dentry, 0750);
> > error = PTR_ERR(orphanage_dentry);
> > if (IS_ERR(orphanage_dentry))
> > - goto out_unlock_root;
> > + goto out_dput_orphanage;
> > }
> >
> > /* Not a directory? Bail out. */
> > @@ -200,9 +199,7 @@ xrep_orphanage_create(
> > sc->orphanage_ilock_flags = 0;
> >
> > out_dput_orphanage:
> > - dput(orphanage_dentry);
> > -out_unlock_root:
> > - inode_unlock(VFS_I(sc->mp->m_rootip));
> > + end_creating(orphanage_dentry, root_dentry);
> > out_dput_root:
> > dput(root_dentry);
> > out:
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9ee76e88f3dd..688e157d6afc 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -92,6 +92,8 @@ struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *parent,
> > struct qstr *name);
> > struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *parent,
> > struct qstr *name);
> > +struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *name);
> > +struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *name);
> >
> > /**
> > * end_creating - finish action started with start_creating
> > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > index 093551fe66a7..060e8e9c4f59 100644
> > --- a/ipc/mqueue.c
> > +++ b/ipc/mqueue.c
> > @@ -913,13 +913,11 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> > goto out_putname;
> >
> > ro = mnt_want_write(mnt); /* we'll drop it in any case */
> > - inode_lock(d_inode(root));
> > - path.dentry = lookup_noperm(&QSTR(name->name), root);
> > + path.dentry = start_creating_noperm(root, &QSTR(name->name));
> > if (IS_ERR(path.dentry)) {
> > error = PTR_ERR(path.dentry);
> > goto out_putfd;
> > }
> > - path.mnt = mntget(mnt);
> > error = prepare_open(path.dentry, oflag, ro, mode, name, attr);
> > if (!error) {
> > struct file *file = dentry_open(&path, oflag, current_cred());
> > @@ -928,13 +926,12 @@ static int do_mq_open(const char __user *u_name, int oflag, umode_t mode,
> > else
> > error = PTR_ERR(file);
> > }
> > - path_put(&path);
> > out_putfd:
> > if (error) {
> > put_unused_fd(fd);
> > fd = error;
> > }
> > - inode_unlock(d_inode(root));
> > + end_creating(path.dentry, root);
> > if (!ro)
> > mnt_drop_write(mnt);
> > out_putname:
> > @@ -957,7 +954,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> > int err;
> > struct filename *name;
> > struct dentry *dentry;
> > - struct inode *inode = NULL;
> > + struct inode *inode;
> > struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
> > struct vfsmount *mnt = ipc_ns->mq_mnt;
> >
> > @@ -969,26 +966,20 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> > err = mnt_want_write(mnt);
> > if (err)
> > goto out_name;
> > - inode_lock_nested(d_inode(mnt->mnt_root), I_MUTEX_PARENT);
> > - dentry = lookup_noperm(&QSTR(name->name), mnt->mnt_root);
> > + dentry = start_removing_noperm(mnt->mnt_root, &QSTR(name->name));
> > if (IS_ERR(dentry)) {
> > err = PTR_ERR(dentry);
> > - goto out_unlock;
> > + goto out_drop_write;
> > }
> >
> > inode = d_inode(dentry);
> > - if (!inode) {
> > - err = -ENOENT;
> > - } else {
> > - ihold(inode);
> > - err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> > - dentry, NULL);
> > - }
> > - dput(dentry);
> > -
> > -out_unlock:
> > - inode_unlock(d_inode(mnt->mnt_root));
> > + ihold(inode);
> > + err = vfs_unlink(&nop_mnt_idmap, d_inode(dentry->d_parent),
> > + dentry, NULL);
> > + end_removing(dentry);
> > iput(inode);
> > +
> > +out_drop_write:
> > mnt_drop_write(mnt);
> > out_name:
> > putname(name);
> > --
> > 2.50.0.107.gf914562f5916.dirty
> >
>
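The kernel-doc in the quoted patch says start_creating_noperm() may return a negative or positive dentry, while start_removing_noperm() returns a positive dentry or an error. That difference is why the mq_unlink() and fuse hunks can drop their "is it positive?" checks. A userspace sketch of the distinction follows; struct mock_dentry and the mock_* helpers are hypothetical, locking is left out, and -ENOENT as the error for a missing name is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: ino == 0 models a negative dentry. */
struct mock_dentry { int ino; };

static void *mock_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static long mock_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Creating: hand back whatever the lookup found, positive or negative. */
static struct mock_dentry *mock_start_creating_noperm(struct mock_dentry *found)
{
	return found;
}

/* Removing: a negative result is turned into an error up front.
 * Assumption: the error is -ENOENT. */
static struct mock_dentry *mock_start_removing_noperm(struct mock_dentry *found)
{
	if (found->ino == 0)
		return mock_err_ptr(-ENOENT);
	return found;
}

/* Caller shaped like the patched mq_unlink(): a single error check,
 * no separate NULL-inode branch. */
static int mock_unlink(struct mock_dentry *found)
{
	struct mock_dentry *d = mock_start_removing_noperm(found);

	if (mock_is_err(d))
		return (int)mock_ptr_err(d);
	d->ino = 0;	/* stands in for vfs_unlink() */
	return 0;
}
```

With the negative case folded into the error return, the caller's success path can dereference the inode unconditionally, as the patched mq_unlink() does with ihold().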
* Re: [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
2025-10-19 10:25 ` Amir Goldstein
2025-10-19 10:33 ` Amir Goldstein
@ 2025-10-22 3:35 ` NeilBrown
1 sibling, 0 replies; 32+ messages in thread
From: NeilBrown @ 2025-10-22 3:35 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sun, 19 Oct 2025, Amir Goldstein wrote:
> On Wed, Oct 15, 2025 at 3:48 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > start_renaming() combines name lookup and locking to prepare for rename.
> > It is used when two names need to be looked up as in nfsd and overlayfs -
> > cases where one or both dentrys are already available will be handled
> > separately.
> >
> > __start_renaming() avoids the inode_permission check and hash
> > calculation and is suitable after filename_parentat() in do_renameat2().
> > It subsumes quite a bit of code from that function.
> >
> > start_renaming() does calculate the hash and check X permission and is
> > suitable elsewhere:
> > - nfsd_rename()
> > - ovl_rename()
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
>
> Review comments from v1 not addressed:
> https://lore.kernel.org/linux-fsdevel/CAOQ4uxh+NcAv9v6NtVRrLCMYbpd0ajtvsd6c9-W2a7+vur0UJQ@mail.gmail.com/
I do remember looking at those .... thanks for the reminder.
They all look good and sensible. I have made the appropriate changes.
Thanks,
NeilBrown
* Re: [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure.
2025-10-19 10:46 ` Amir Goldstein
@ 2025-10-22 3:54 ` NeilBrown
0 siblings, 0 replies; 32+ messages in thread
From: NeilBrown @ 2025-10-22 3:54 UTC (permalink / raw)
To: Amir Goldstein
Cc: Alexander Viro, Christian Brauner, Jeff Layton, Jan Kara,
linux-fsdevel
On Sun, 19 Oct 2025, Amir Goldstein wrote:
> On Wed, Oct 15, 2025 at 3:49 AM NeilBrown <neilb@ownmail.net> wrote:
> >
> > From: NeilBrown <neil@brown.name>
> >
> > vfs_mkdir() already drops the reference to the dentry on failure but it
> > leaves the parent locked.
> > This complicates end_creating() which needs to unlock the parent even
> > though the dentry is no longer available.
> >
> > If we change vfs_mkdir() to unlock on failure as well as releasing the
> > dentry, we can remove the "parent" arg from end_creating() and simplify
> > the rules for calling it.
>
> Does this deserve a mention in filesystems/porting.rst?
> I think the change of semantics in
> c54b386969a58 VFS: Change vfs_mkdir() to return the dentry.
> was also not recorded in porting.rst.
Yes, I think you are right. I've added that and addressed the nit
below.
Thanks,
NeilBrown
>
> >
> > Note that cachefiles_get_directory() can choose to substitute an error
> > instead of actually calling vfs_mkdir(), for fault injection. In that
> > case it needs to call end_creating(), just as vfs_mkdir() now does on
> > error.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
>
> This looks much better IMO.
>
> With one nit below fixed, feel free to add:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
>
> > ---
> > fs/btrfs/ioctl.c | 2 +-
> > fs/cachefiles/namei.c | 14 ++++++++------
> > fs/ecryptfs/inode.c | 8 ++++----
> > fs/namei.c | 4 ++--
> > fs/nfsd/nfs3proc.c | 2 +-
> > fs/nfsd/nfs4proc.c | 2 +-
> > fs/nfsd/nfs4recover.c | 2 +-
> > fs/nfsd/nfsproc.c | 2 +-
> > fs/nfsd/vfs.c | 8 ++++----
> > fs/overlayfs/copy_up.c | 4 ++--
> > fs/overlayfs/dir.c | 13 ++++++-------
> > fs/overlayfs/super.c | 6 +++---
> > fs/xfs/scrub/orphanage.c | 2 +-
> > include/linux/namei.h | 28 +++++++++-------------------
> > ipc/mqueue.c | 2 +-
> > 15 files changed, 45 insertions(+), 54 deletions(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 4fbfdd8faf6a..90ef777eae25 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -935,7 +935,7 @@ static noinline int btrfs_mksubvol(struct dentry *parent,
> > out_up_read:
> > up_read(&fs_info->subvol_sem);
> > out_dput:
> > - end_creating(dentry, parent);
> > + end_creating(dentry);
> > return ret;
> > }
> >
> > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> > index b97a40917a32..10f010dc9946 100644
> > --- a/fs/cachefiles/namei.c
> > +++ b/fs/cachefiles/namei.c
> > @@ -130,8 +130,10 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> > ret = cachefiles_inject_write_error();
> > if (ret == 0)
> > subdir = vfs_mkdir(&nop_mnt_idmap, d_inode(dir), subdir, 0700);
> > - else
> > + else {
> > + end_creating(subdir);
> > subdir = ERR_PTR(ret);
> > + }
>
> Please match if {} else {} parenthesis
>
> Thanks,
> Amir.
>
Thread overview: 32+ messages
2025-10-15 1:46 [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
2025-10-15 1:46 ` [PATCH v2 01/14] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
2025-10-21 13:26 ` Christian Brauner
2025-10-15 1:46 ` [PATCH v2 02/14] VFS: introduce start_dirop() and end_dirop() NeilBrown
2025-10-19 9:56 ` Amir Goldstein
2025-10-15 1:46 ` [PATCH v2 03/14] VFS: tidy up do_unlinkat() NeilBrown
2025-10-19 10:02 ` Amir Goldstein
2025-10-15 1:46 ` [PATCH v2 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
2025-10-19 10:10 ` Amir Goldstein
2025-10-15 1:46 ` [PATCH v2 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() NeilBrown
2025-10-15 1:46 ` [PATCH v2 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
2025-10-19 10:15 ` Amir Goldstein
2025-10-22 3:20 ` NeilBrown
2025-10-20 8:36 ` kernel test robot
2025-10-15 1:46 ` [PATCH v2 07/14] VFS: introduce start_removing_dentry() NeilBrown
2025-10-15 1:47 ` [PATCH v2 08/14] VFS: add start_creating_killable() and start_removing_killable() NeilBrown
2025-10-15 1:47 ` [PATCH v2 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming() NeilBrown
2025-10-19 10:25 ` Amir Goldstein
2025-10-19 10:33 ` Amir Goldstein
2025-10-21 13:25 ` Christian Brauner
2025-10-22 3:35 ` NeilBrown
2025-10-15 1:47 ` [PATCH v2 10/14] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
2025-10-19 10:31 ` Amir Goldstein
2025-10-15 1:47 ` [PATCH v2 11/14] Add start_renaming_two_dentries() NeilBrown
2025-10-15 1:47 ` [PATCH v2 12/14] ecryptfs: use new start_creating/start_removing APIs NeilBrown
2025-10-19 10:38 ` Amir Goldstein
2025-10-15 1:47 ` [PATCH v2 13/14] VFS: change vfs_mkdir() to unlock on failure NeilBrown
2025-10-19 10:46 ` Amir Goldstein
2025-10-22 3:54 ` NeilBrown
2025-10-15 1:47 ` [PATCH v2 14/14] VFS: introduce end_creating_keep() NeilBrown
2025-10-19 10:39 ` Amir Goldstein
2025-10-19 10:50 ` [PATCH v2 00/14] Create and use APIs to centralise locking for directory ops Amir Goldstein