From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neil@brown.name>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Amir Goldstein <amir73il@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, Chris Mason <clm@fb.com>,
David Sterba <dsterba@suse.com>,
David Howells <dhowells@redhat.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Danilo Krummrich <dakr@kernel.org>,
Tyler Hicks <code@tyhicks.com>,
Miklos Szeredi <miklos@szeredi.hu>,
Chuck Lever <chuck.lever@oracle.com>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>,
Namjae Jeon <linkinjeon@kernel.org>,
Steve French <smfrench@gmail.com>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Carlos Maiolino <cem@kernel.org>,
John Johansen <john.johansen@canonical.com>,
Paul Moore <paul@paul-moore.com>,
James Morris <jmorris@namei.org>,
"Serge E. Hallyn" <serge@hallyn.com>,
Stephen Smalley <stephen.smalley.work@gmail.com>,
Ondrej Mosnacek <omosnace@redhat.com>,
Mateusz Guzik <mjguzik@gmail.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Stefan Berger <stefanb@linux.ibm.com>,
"Darrick J. Wong" <djwong@kernel.org>,
linux-kernel@vger.kernel.org, netfs@lists.linux.dev,
ecryptfs@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-unionfs@vger.kernel.org, linux-cifs@vger.kernel.org,
linux-xfs@vger.kernel.org,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org
Subject: Re: [PATCH v5 02/14] VFS: introduce start_dirop() and end_dirop()
Date: Wed, 12 Nov 2025 09:46:27 -0500 [thread overview]
Message-ID: <32e65149e7678ac3cbc7f8dbed26429fd9c7ae78.camel@kernel.org> (raw)
In-Reply-To: <20251106005333.956321-3-neilb@ownmail.net>
On Thu, 2025-11-06 at 11:50 +1100, NeilBrown wrote:
> From: NeilBrown <neil@brown.name>
>
> The fact that directory operations (create,remove,rename) are protected
> by a lock on the parent is known widely throughout the kernel.
> In order to change this - to instead lock the target dentry - it is
> best to centralise this knowledge so it can be changed in one place.
>
> This patch introduces start_dirop() which is local to VFS code.
> It performs the required locking for create and remove. Rename
> will be handled separately.
>
> Various functions with names like start_creating() or start_removing_path(),
> some of which already exist, will export this functionality beyond the VFS.
>
> end_dirop() is the partner of start_dirop(). It drops the lock and
> releases the reference on the dentry.
> It *is* exported so that various end_creating etc functions can be inline.
>
> As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as
> that won't unlock when the dentry IS_ERR(). For now we need an explicit
> unlock when dentry IS_ERR(). I hope to change vfs_mkdir() to unlock
> when it drops a dentry so that explicit unlock can go away.
>
> end_dirop() can always be called on the result of start_dirop(), but not
> after vfs_mkdir(). After a vfs_mkdir() we still may need the explicit
> unlock as seen in end_creating_path().
>
> As well as adding start_dirop() and end_dirop()
> this patch uses them in:
> - simple_start_creating (which requires sharing lookup_noperm_common()
> with libfs.c)
> - start_removing_path / start_removing_user_path_at
> - filename_create / end_creating_path()
> - do_rmdir(), do_unlinkat()
>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> fs/internal.h | 3 ++
> fs/libfs.c | 36 ++++++++---------
> fs/namei.c | 98 ++++++++++++++++++++++++++++++++++------------
> include/linux/fs.h | 2 +
> 4 files changed, 95 insertions(+), 44 deletions(-)
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 9b2b4d116880..d08d5e2235e9 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -67,6 +67,9 @@ int vfs_tmpfile(struct mnt_idmap *idmap,
> const struct path *parentpath,
> struct file *file, umode_t mode);
> struct dentry *d_hash_and_lookup(struct dentry *, struct qstr *);
> +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags);
> +int lookup_noperm_common(struct qstr *qname, struct dentry *base);
>
> /*
> * namespace.c
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 1661dcb7d983..2d6657947abd 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -2290,27 +2290,25 @@ void stashed_dentry_prune(struct dentry *dentry)
> cmpxchg(stashed, dentry, NULL);
> }
>
> -/* parent must be held exclusive */
> +/**
> + * simple_start_creating - prepare to create a given name
> + * @parent: directory in which to prepare to create the name
> + * @name: the name to be created
> + *
> + * Required lock is taken and a lookup in performed prior to creating an
> + * object in a directory. No permission checking is performed.
> + *
> + * Returns: a negative dentry on which vfs_create() or similar may
> + * be attempted, or an error.
> + */
> struct dentry *simple_start_creating(struct dentry *parent, const char *name)
> {
> - struct dentry *dentry;
> - struct inode *dir = d_inode(parent);
> + struct qstr qname = QSTR(name);
> + int err;
>
> - inode_lock(dir);
> - if (unlikely(IS_DEADDIR(dir))) {
> - inode_unlock(dir);
> - return ERR_PTR(-ENOENT);
> - }
> - dentry = lookup_noperm(&QSTR(name), parent);
> - if (IS_ERR(dentry)) {
> - inode_unlock(dir);
> - return dentry;
> - }
> - if (dentry->d_inode) {
> - dput(dentry);
> - inode_unlock(dir);
> - return ERR_PTR(-EEXIST);
> - }
> - return dentry;
> + err = lookup_noperm_common(&qname, parent);
> + if (err)
> + return ERR_PTR(err);
> + return start_dirop(parent, &qname, LOOKUP_CREATE | LOOKUP_EXCL);
> }
> EXPORT_SYMBOL(simple_start_creating);
> diff --git a/fs/namei.c b/fs/namei.c
> index 39c4d52f5b54..231e1ffd4b8d 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2765,6 +2765,48 @@ static int filename_parentat(int dfd, struct filename *name,
> return __filename_parentat(dfd, name, flags, parent, last, type, NULL);
> }
>
> +/**
> + * start_dirop - begin a create or remove dirop, performing locking and lookup
> + * @parent: the dentry of the parent in which the operation will occur
> + * @name: a qstr holding the name within that parent
> + * @lookup_flags: intent and other lookup flags.
> + *
> + * The lookup is performed and necessary locks are taken so that, on success,
> + * the returned dentry can be operated on safely.
> + * The qstr must already have the hash value calculated.
> + *
> + * Returns: a locked dentry, or an error.
> + *
> + */
> +struct dentry *start_dirop(struct dentry *parent, struct qstr *name,
> + unsigned int lookup_flags)
> +{
> + struct dentry *dentry;
> + struct inode *dir = d_inode(parent);
> +
> + inode_lock_nested(dir, I_MUTEX_PARENT);
> + dentry = lookup_one_qstr_excl(name, parent, lookup_flags);
> + if (IS_ERR(dentry))
> + inode_unlock(dir);
> + return dentry;
> +}
> +
> +/**
> + * end_dirop - signal completion of a dirop
> + * @de: the dentry which was returned by start_dirop or similar.
> + *
> + * If the de is an error, nothing happens. Otherwise any lock taken to
> + * protect the dentry is dropped and the dentry itself is release (dput()).
> + */
> +void end_dirop(struct dentry *de)
> +{
> + if (!IS_ERR(de)) {
> + inode_unlock(de->d_parent->d_inode);
> + dput(de);
> + }
> +}
> +EXPORT_SYMBOL(end_dirop);
> +
> /* does lookup, returns the object with parent locked */
> static struct dentry *__start_removing_path(int dfd, struct filename *name,
> struct path *path)
> @@ -2781,10 +2823,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> return ERR_PTR(-EINVAL);
> /* don't fail immediately if it's r/o, at least try to report other errors */
> error = mnt_want_write(parent_path.mnt);
> - inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT);
> - d = lookup_one_qstr_excl(&last, parent_path.dentry, 0);
> + d = start_dirop(parent_path.dentry, &last, 0);
> if (IS_ERR(d))
> - goto unlock;
> + goto drop;
> if (error)
> goto fail;
> path->dentry = no_free_ptr(parent_path.dentry);
> @@ -2792,10 +2833,9 @@ static struct dentry *__start_removing_path(int dfd, struct filename *name,
> return d;
>
> fail:
> - dput(d);
> + end_dirop(d);
> d = ERR_PTR(error);
> -unlock:
> - inode_unlock(parent_path.dentry->d_inode);
> +drop:
> if (!error)
> mnt_drop_write(parent_path.mnt);
> return d;
> @@ -2910,7 +2950,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfsmount *mnt,
> }
> EXPORT_SYMBOL(vfs_path_lookup);
>
> -static int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> +int lookup_noperm_common(struct qstr *qname, struct dentry *base)
> {
> const char *name = qname->name;
> u32 len = qname->len;
> @@ -4223,21 +4263,18 @@ static struct dentry *filename_create(int dfd, struct filename *name,
> */
> if (last.name[last.len] && !want_dir)
> create_flags &= ~LOOKUP_CREATE;
> - inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path->dentry,
> - reval_flag | create_flags);
> + dentry = start_dirop(path->dentry, &last, reval_flag | create_flags);
> if (IS_ERR(dentry))
> - goto unlock;
> + goto out_drop_write;
>
> if (unlikely(error))
> goto fail;
>
> return dentry;
> fail:
> - dput(dentry);
> + end_dirop(dentry);
> dentry = ERR_PTR(error);
> -unlock:
> - inode_unlock(path->dentry->d_inode);
> +out_drop_write:
> if (!error)
> mnt_drop_write(path->mnt);
> out:
> @@ -4256,11 +4293,26 @@ struct dentry *start_creating_path(int dfd, const char *pathname,
> }
> EXPORT_SYMBOL(start_creating_path);
>
> +/**
> + * end_creating_path - finish a code section started by start_creating_path()
> + * @path: the path instantiated by start_creating_path()
> + * @dentry: the dentry returned by start_creating_path()
> + *
> + * end_creating_path() will unlock and locks taken by start_creating_path()
> + * and drop an references that were taken. It should only be called
> + * if start_creating_path() returned a non-error.
> + * If vfs_mkdir() was called and it returned an error, that error *should*
> + * be passed to end_creating_path() together with the path.
> + */
> void end_creating_path(const struct path *path, struct dentry *dentry)
> {
> - if (!IS_ERR(dentry))
> - dput(dentry);
> - inode_unlock(path->dentry->d_inode);
> + if (IS_ERR(dentry))
> + /* The parent is still locked despite the error from
> + * vfs_mkdir() - must unlock it.
> + */
> + inode_unlock(path->dentry->d_inode);
> + else
> + end_dirop(dentry);
> mnt_drop_write(path->mnt);
> path_put(path);
> }
> @@ -4592,8 +4644,7 @@ int do_rmdir(int dfd, struct filename *name)
> if (error)
> goto exit2;
>
> - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> + dentry = start_dirop(path.dentry, &last, lookup_flags);
> error = PTR_ERR(dentry);
> if (IS_ERR(dentry))
> goto exit3;
> @@ -4602,9 +4653,8 @@ int do_rmdir(int dfd, struct filename *name)
> goto exit4;
> error = vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry);
> exit4:
> - dput(dentry);
> + end_dirop(dentry);
> exit3:
> - inode_unlock(path.dentry->d_inode);
> mnt_drop_write(path.mnt);
> exit2:
> path_put(&path);
> @@ -4721,8 +4771,7 @@ int do_unlinkat(int dfd, struct filename *name)
> if (error)
> goto exit2;
> retry_deleg:
> - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT);
> - dentry = lookup_one_qstr_excl(&last, path.dentry, lookup_flags);
> + dentry = start_dirop(path.dentry, &last, lookup_flags);
> error = PTR_ERR(dentry);
> if (!IS_ERR(dentry)) {
>
> @@ -4737,9 +4786,8 @@ int do_unlinkat(int dfd, struct filename *name)
> error = vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode,
> dentry, &delegated_inode);
> exit3:
> - dput(dentry);
> + end_dirop(dentry);
> }
> - inode_unlock(path.dentry->d_inode);
> if (inode)
> iput(inode); /* truncate the inode here */
> inode = NULL;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 03e450dd5211..9e7556e79d19 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3196,6 +3196,8 @@ extern void iterate_supers_type(struct file_system_type *,
> void filesystems_freeze(void);
> void filesystems_thaw(void);
>
> +void end_dirop(struct dentry *de);
> +
> extern int dcache_dir_open(struct inode *, struct file *);
> extern int dcache_dir_close(struct inode *, struct file *);
> extern loff_t dcache_dir_lseek(struct file *, loff_t, int);
Reviewed-by: Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2025-11-12 14:46 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-06 0:50 [PATCH v5 00/14] Create and use APIs to centralise locking for directory ops NeilBrown
2025-11-06 0:50 ` [PATCH v5 01/14] debugfs: rename end_creating() to debugfs_end_creating() NeilBrown
2025-11-06 0:50 ` [PATCH v5 02/14] VFS: introduce start_dirop() and end_dirop() NeilBrown
2025-11-12 14:46 ` Jeff Layton [this message]
2025-11-06 0:50 ` [PATCH v5 03/14] VFS: tidy up do_unlinkat() NeilBrown
2025-11-12 14:47 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() NeilBrown
2025-11-06 0:50 ` [PATCH v5 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() NeilBrown
2025-11-12 14:51 ` Jeff Layton
2025-11-12 23:51 ` NeilBrown
2025-11-06 0:50 ` [PATCH v5 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm() NeilBrown
2025-11-06 0:50 ` [PATCH v5 07/14] VFS: introduce start_removing_dentry() NeilBrown
2025-11-06 1:56 ` Namjae Jeon
2025-11-12 14:58 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 08/14] VFS: add start_creating_killable() and start_removing_killable() NeilBrown
2025-11-12 15:01 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming() NeilBrown
2025-11-12 15:06 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 10/14] VFS/ovl/smb: introduce start_renaming_dentry() NeilBrown
2025-11-12 19:36 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 11/14] Add start_renaming_two_dentries() NeilBrown
2025-11-10 16:08 ` Stephen Smalley
2025-11-10 17:30 ` Stephen Smalley
2025-11-12 23:37 ` NeilBrown
2025-11-12 19:38 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 12/14] ecryptfs: use new start_creating/start_removing APIs NeilBrown
2025-11-12 19:41 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 13/14] VFS: change vfs_mkdir() to unlock on failure NeilBrown
2025-11-12 19:45 ` Jeff Layton
2025-11-06 0:50 ` [PATCH v5 14/14] VFS: introduce end_creating_keep() NeilBrown
2025-11-12 19:46 ` Jeff Layton
2025-11-06 9:02 ` [syzbot ci] Re: Create and use APIs to centralise locking for directory ops syzbot ci
2025-11-12 22:50 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=32e65149e7678ac3cbc7f8dbed26429fd9c7ae78.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=Dai.Ngo@oracle.com \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=clm@fb.com \
--cc=code@tyhicks.com \
--cc=dakr@kernel.org \
--cc=dhowells@redhat.com \
--cc=djwong@kernel.org \
--cc=dsterba@suse.com \
--cc=ecryptfs@vger.kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=jack@suse.cz \
--cc=jmorris@namei.org \
--cc=john.johansen@canonical.com \
--cc=linkinjeon@kernel.org \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=miklos@szeredi.hu \
--cc=mjguzik@gmail.com \
--cc=neil@brown.name \
--cc=netfs@lists.linux.dev \
--cc=okorniev@redhat.com \
--cc=omosnace@redhat.com \
--cc=paul@paul-moore.com \
--cc=rafael@kernel.org \
--cc=selinux@vger.kernel.org \
--cc=senozhatsky@chromium.org \
--cc=serge@hallyn.com \
--cc=smfrench@gmail.com \
--cc=stefanb@linux.ibm.com \
--cc=stephen.smalley.work@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).