From: NeilBrown <neilb@ownmail.net>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Jeff Layton <jlayton@kernel.org>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
Carlos Maiolino <cem@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Amir Goldstein <amir73il@gmail.com>,
Jan Harkes <jaharkes@cs.cmu.edu>, Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Howells <dhowells@redhat.com>,
Marc Dionne <marc.dionne@auristor.com>,
Steve French <sfrench@samba.org>,
Namjae Jeon <linkinjeon@kernel.org>,
Sungjong Seo <sj1557.seo@samsung.com>,
Yuezhang Mo <yuezhang.mo@sony.com>,
Andreas Hindborg <a.hindborg@kernel.org>,
Breno Leitao <leitao@debian.org>, "Theodore Ts'o" <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Ilya Dryomov <idryomov@gmail.com>,
Alex Markuze <amarkuze@redhat.com>,
Viacheslav Dubeyko <slava@dubeyko.com>,
Tyler Hicks <code@tyhicks.com>,
Andreas Gruenbacher <agruenba@redhat.com>,
Richard Weinberger <richard@nod.at>,
Anton Ivanov <anton.ivanov@cambridgegreys.com>,
Johannes Berg <johannes@sipsolutions.net>,
Jeremy Kerr <jk@ozlabs.org>, Ard Biesheuvel <ardb@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-unionfs@vger.kernel.org,
coda@cs.cmu.edu, linux-mm@kvack.org,
linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, ceph-devel@vger.kernel.org,
ecryptfs@vger.kernel.org, gfs2@lists.linux.dev,
linux-um@lists.infradead.org, linux-efi@vger.kernel.org
Subject: [PATCH RFC 00/53] lift lookup out of exclusive lock for dir ops
Date: Fri, 13 Mar 2026 08:11:47 +1100
Message-ID: <20260312214330.3885211-1-neilb@ownmail.net>
This patch set progresses my effort to improve concurrency of
directory operations and specifically to allow concurrent updates
in a given directory.
There are a bunch of VFS patches which introduce some new APIs and
improve existing ones, then a bunch of per-filesystem changes which
adjust to meet new needs, often using the new APIs, then a final bunch
of VFS patches which discard some APIs that are no longer wanted, and
one (the second last) which makes the big change. Some of the fs
patches don't depend on any preceding patch, and if maintainers wanted
to take those early I certainly wouldn't object! I've put a '*' next
to patches which I think can be taken at any time.
My longer-term goal involves pushing the parent-directory locking down
into filesystems (which can then discard it if it isn't needed) and using
exclusive dentry locking in the VFS for all directory operations other
than readdir - which by its nature needs shared locking and will
continue to use the directory lock.
The VFS already has exclusive dentry locking for the limited case of
lookup. Newly created dentries (when created by d_alloc_parallel()) are
exclusively locked using the DCACHE_PAR_LOOKUP bit. They remain
exclusively locked until they are hashed as negative or positive dentries,
or they are discarded.
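The exclusivity rule above can be pictured with a toy userspace model. This is a sketch only, under stated assumptions: toy_dir, toy_alloc_parallel(), toy_end_lookup() and the state names are hypothetical, not kernel APIs, and the TOY_MUST_WAIT result stands in for the point where the real d_alloc_parallel() would block on someone else's in-lookup dentry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
enum toy_state {
	TOY_IN_LOOKUP,		/* exclusively owned, like DCACHE_PAR_LOOKUP */
	TOY_HASHED_NEGATIVE,
	TOY_HASHED_POSITIVE,
	TOY_DISCARDED,
};

struct toy_dentry {
	char name[32];		/* toy limit: names up to 31 bytes */
	enum toy_state state;
};

#define TOY_DIR_SLOTS 16

struct toy_dir {
	struct toy_dentry slots[TOY_DIR_SLOTS];
	size_t nr;
};

enum toy_result { TOY_FOUND, TOY_CREATED, TOY_MUST_WAIT, TOY_FULL };

/*
 * Loosely follows the d_alloc_parallel() contract: return an existing
 * hashed entry, report that the caller would have to wait if someone else
 * owns an in-lookup entry for the name, or else create a new in-lookup
 * entry owned by the caller.
 */
static enum toy_result toy_alloc_parallel(struct toy_dir *dir, const char *name,
					  struct toy_dentry **out)
{
	for (size_t i = 0; i < dir->nr; i++) {
		struct toy_dentry *d = &dir->slots[i];

		if (d->state == TOY_DISCARDED || strcmp(d->name, name) != 0)
			continue;
		*out = d;
		return d->state == TOY_IN_LOOKUP ? TOY_MUST_WAIT : TOY_FOUND;
	}
	if (dir->nr == TOY_DIR_SLOTS)
		return TOY_FULL;
	struct toy_dentry *d = &dir->slots[dir->nr++];
	snprintf(d->name, sizeof(d->name), "%s", name);
	d->state = TOY_IN_LOOKUP;
	*out = d;
	return TOY_CREATED;
}

/* The owner ends exclusivity by hashing the entry or discarding it. */
static void toy_end_lookup(struct toy_dentry *d, enum toy_state final)
{
	assert(d->state == TOY_IN_LOOKUP);
	assert(final != TOY_IN_LOOKUP);
	d->state = final;
}
```

A second caller asking for the same name while the first still owns the in-lookup entry gets TOY_MUST_WAIT; once the owner calls toy_end_lookup() with TOY_HASHED_NEGATIVE, the same request returns TOY_FOUND with the same entry, and no second entry is ever created.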
DCACHE_PAR_LOOKUP currently depends on a shared parent lock to exclude
directory modifying operations. This patch set removes this dependency
so that d_alloc_parallel() can be called without locking and all
directory modifying operations receive either a hashed dentry or an
in-lookup dentry (currently they receive either a hashed or an unhashed
dentry, or sometimes, for atomic_open only, an in-lookup dentry).
The cases where a filesystem can receive an in-lookup dentry are:
- lookup. Currently can receive in-lookup or unhashed. After this patch set
  it always receives in-lookup.
- atomic_open. Currently can receive in-lookup or hashed-negative.
  This doesn't change with this patch set.
- rename. Currently can receive hashed or unhashed. After this patch set
  it can also receive in-lookup where previously it would receive unhashed.
  This is only for the target of a rename over NFS.
- link, mknod, mkdir, symlink. Currently receive hashed-negative, except for
  NFS, which notices the implied exclusive create and skips the lookup, so
  the filesystem can receive unhashed-negative for the operation.
There are two particular needs to be addressed before we can use d_alloc_parallel()
outside of the directory lock.
1/ d_alloc_parallel() effects a blocking lock so lock ordering is important.
If we are to take the directory lock *after* calling d_alloc_parallel() (and
still holding an in-lookup dentry, as happens at least when ->atomic_open
is called) then we must never call d_alloc_parallel() while holding the
directory lock, even a shared lock.
This particularly affects readdir as several filesystems prime the dcache
with readdir results and so use d_alloc_parallel() in the ->iterate_shared
handler, which will now have deadlock potential. To address this we
introduce d_alloc_noblock() which fails rather than blocking.
A few other cases of potential lock inversion exist. These are
addressed by dropping the directory lock when it is safe to do so
before calling d_alloc_parallel(). This requires the addition of
LOOKUP_SHARED so that ->lookup knows how the parent is locked. This
is ugly but is gone by the end of the series. After the locking is
rearranged in the second last patch, ->lookup is only ever called
with a shared lock.
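To make the ordering rule concrete, here is a toy lock-ranking checker in userspace C, loosely in the spirit of lockdep. It assumes just two lock classes ranked as proposed above (in-lookup dentry first, then the directory lock). All names are hypothetical, and the `available` argument to toy_trylock() is a modeling stand-in for "no other in-lookup dentry exists for this name"; it is not how d_alloc_noblock() is actually called.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical lock classes, ranked in the order proposed above:
 * an in-lookup dentry is taken before the directory lock.
 */
enum toy_lock_class { TOY_IN_LOOKUP_DENTRY = 1, TOY_DIR_LOCK = 2 };

struct toy_held {
	int count[TOY_DIR_LOCK + 1];	/* locks held, indexed by class */
	int violations;			/* blocking acquisitions made out of order */
};

/* True if a lock ranked after @c is currently held. */
static bool toy_holds_later(const struct toy_held *h, enum toy_lock_class c)
{
	for (int r = (int)c + 1; r <= TOY_DIR_LOCK; r++)
		if (h->count[r] > 0)
			return true;
	return false;
}

/* Blocking acquisition, e.g. waiting in d_alloc_parallel(). */
static void toy_lock(struct toy_held *h, enum toy_lock_class c)
{
	if (toy_holds_later(h, c))
		h->violations++;	/* could deadlock against an in-order task */
	h->count[c]++;
}

/*
 * Non-blocking attempt, modeling d_alloc_noblock(): it never waits, so it
 * cannot invert the order; it simply fails when the name is busy.
 */
static bool toy_trylock(struct toy_held *h, enum toy_lock_class c, bool available)
{
	if (!available)
		return false;
	h->count[c]++;
	return true;
}

static void toy_unlock(struct toy_held *h, enum toy_lock_class c)
{
	assert(h->count[c] > 0);
	h->count[c]--;
}
```

In this model an ->iterate_shared handler that holds the directory lock and then makes a blocking allocation records a violation. The same handler using the non-blocking variant records none (it may just skip priming that name), and neither does a path that drops the directory lock, blocks on the in-lookup dentry, and then retakes the directory lock.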
2/ As d_alloc_parallel() will be able to run without the directory lock,
holding that lock exclusively is not enough to protect some dcache
manipulations. In particular, several filesystems d_drop() a dentry
and (possibly) re-hash it. This will no longer be safe as
d_alloc_parallel() could run while the dentry was dropped, would find
that the name doesn't exist in the dcache, and would create a new dentry
leading to two uncoordinated dentries with the same name.
It will still be safe to d_drop() a dentry after the operation has
completed, whether in success or failure. But d_drop()ing before that
is best avoided. An early d_drop() that isn't followed by a rehash is
not clearly problematic for a filesystem which still uses parent locking
(as all do at present), but it is good to discourage that pattern now.
This is addressed, in part, by changing d_splice_alias() to be able to
instantiate any negative dentry, whether hashed, unhashed, or
in-lookup. This removes the need for d_drop() in most cases.
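The d_drop() hazard can be replayed in a toy userspace model. The sketch assumes a cache where only hashed entries are visible to lookups, and where toy_alloc(), standing in for an unlocked d_alloc_parallel(), reuses a visible entry or creates a new one. All names are hypothetical; there is no real concurrency here, the interleaving is simply written out in order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dcache; not kernel code. */
#define TOY_CACHE_SLOTS 8

struct toy_entry {
	const char *name;
	bool hashed;	/* visible to lookups */
	bool positive;	/* has an inode */
};

struct toy_cache {
	struct toy_entry e[TOY_CACHE_SLOTS];
	size_t nr;
};

/* Lookups only see hashed entries, as with the real dcache hash table. */
static struct toy_entry *toy_lookup_hashed(struct toy_cache *c, const char *name)
{
	for (size_t i = 0; i < c->nr; i++)
		if (c->e[i].hashed && strcmp(c->e[i].name, name) == 0)
			return &c->e[i];
	return NULL;
}

/*
 * Stand-in for d_alloc_parallel() running without the directory lock:
 * reuse a visible entry, otherwise create a new visible one.
 */
static struct toy_entry *toy_alloc(struct toy_cache *c, const char *name)
{
	struct toy_entry *e = toy_lookup_hashed(c, name);

	if (e)
		return e;
	assert(c->nr < TOY_CACHE_SLOTS);	/* toy table has fixed capacity */
	e = &c->e[c->nr++];
	e->name = name;
	e->hashed = true;
	e->positive = false;
	return e;
}

/* Count all entries with this name, hashed or not. */
static size_t toy_count_name(const struct toy_cache *c, const char *name)
{
	size_t n = 0;

	for (size_t i = 0; i < c->nr; i++)
		if (strcmp(c->e[i].name, name) == 0)
			n++;
	return n;
}
```

Replaying the old pattern (clear `hashed` on the original entry as an early d_drop() would, call toy_alloc() for the same name, then set `hashed` and `positive` again as a rehash would) leaves toy_count_name() at 2: two uncoordinated entries. Instantiating the hashed negative entry in place by only setting `positive`, as d_splice_alias() now allows, leaves it at 1.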
New APIs added are:
- d_alloc_noblock - see patch 05 for details
- d_duplicate - patch 06
Removed APIs:
- d_alloc
- d_rehash
- d_add
- lookup_one
- lookup_noperm
Changed APIs:
- d_alloc_parallel - no longer requires a wait_queue_head_t
- d_splice_alias - now works with in-lookup dentries
- d_alloc_name - now works with ->d_hash
d_alloc_name() should be used with d_make_persistent(). These calls don't
require VFS locking because the filesystems that use them don't permit
create/remove via VFS calls and provide their own locking to avoid
duplicate names.
d_splice_alias() should *always* be used:
- in ->lookup
- in ->iterate_shared for cache priming
- in ->atomic_open, possibly via a call to ->lookup
- in ->mkdir unless d_instantiate_new() can be used
- in ->link, ->symlink and ->mknod if ->lookup skips the lookup for
  LOOKUP_CREATE|LOOKUP_EXCL
Thanks for reading this far! I've been testing NFS but haven't tried
anything else yet. As well as the normal review of details I'd love to
know if I've missed any important consequences of the locking change.
It is a big conceptual change and there could easily be surprising
implications.
Thanks,
NeilBrown
[PATCH 01/53] VFS: fix various typos in documentation for
[PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup
[PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash
[PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel()
[PATCH 05/53] VFS: introduce d_alloc_noblock()
[PATCH 06/53] VFS: add d_duplicate()
[PATCH 07/53] VFS: Add LOOKUP_SHARED flag.
[PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in
*[PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from
[PATCH 10/53] nfs: use d_splice_alias() in nfs_link()
[PATCH 11/53] nfs: don't d_drop() before d_splice_alias()
[PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in
[PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
[PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename
[PATCH 15/53] nfs: use d_duplicate()
*[PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
*[PATCH 17/53] coda: don't d_drop() early.
[PATCH 18/53] shmem: use d_duplicate()
*[PATCH 19/53] afs: use d_time instead of d_fsdata
*[PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename
[PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode()
[PATCH 22/53] afs: use d_alloc_nonblock in afs_sillyrename()
[PATCH 23/53] afs: lookup_atsys to drop and reclaim lock.
[PATCH 24/53] afs: use d_duplicate()
*[PATCH 25/53] smb/client: use d_time to store a timestamp in dentry,
*[PATCH 26/53] smb/client: don't unhashed and rehash to prevent new
*[PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
[PATCH 28/53] smb/client: Use d_alloc_noblock() in
*[PATCH 29/53] exfat: simplify exfat_lookup()
*[PATCH 30/53] configfs: remove d_add() calls before
[PATCH 31/53] configfs: stop using d_add().
*[PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
*[PATCH 33/53] ext4: use on-stack dentries in
[PATCH 34/53] tracefs: stop using d_add().
[PATCH 35/53] cephfs: stop using d_add().
*[PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME
[PATCH 37/53] cephfs: Use d_alloc_noblock() in
[PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
[PATCH 39/53] ecryptfs: stop using d_add().
[PATCH 40/53] gfs2: stop using d_add().
[PATCH 41/53] libfs: stop using d_add().
[PATCH 42/53] fuse: don't d_drop() before d_splice_alias()
[PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link()
[PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in
[PATCH 45/53] efivarfs: use d_alloc_name()
[PATCH 46/53] Remove references to d_add() in documentation and
[PATCH 47/53] VFS: make d_alloc() local to VFS.
[PATCH 48/53] VFS: remove d_add()
[PATCH 49/53] VFS: remove d_rehash()
[PATCH 50/53] VFS: remove lookup_one() and lookup_noperm()
[PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl().
[PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock
[PATCH 53/53] VFS: remove LOOKUP_SHARED
Thread overview: 65+ messages
2026-03-12 21:11 NeilBrown [this message]
2026-03-12 21:11 ` [PATCH 01/53] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
2026-03-12 21:11 ` [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
2026-03-12 21:11 ` [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash NeilBrown
2026-03-12 21:11 ` [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel() NeilBrown
2026-03-12 21:11 ` [PATCH 05/53] VFS: introduce d_alloc_noblock() NeilBrown
2026-03-12 21:11 ` [PATCH 06/53] VFS: add d_duplicate() NeilBrown
2026-03-12 21:11 ` [PATCH 07/53] VFS: Add LOOKUP_SHARED flag NeilBrown
2026-03-12 21:11 ` [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
2026-03-12 21:11 ` [PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
2026-03-12 21:11 ` [PATCH 10/53] nfs: use d_splice_alias() in nfs_link() NeilBrown
2026-03-12 21:11 ` [PATCH 11/53] nfs: don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:11 ` [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
2026-03-12 21:12 ` [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache() NeilBrown
2026-03-12 21:12 ` [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename NeilBrown
2026-03-12 21:12 ` [PATCH 15/53] nfs: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir NeilBrown
2026-03-15 13:51 ` Amir Goldstein
2026-03-18 21:10 ` NeilBrown
2026-03-20 14:47 ` Amir Goldstein
2026-03-12 21:12 ` [PATCH 17/53] coda: don't d_drop() early NeilBrown
2026-03-12 21:12 ` [PATCH 18/53] shmem: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 19/53] afs: use d_time instead of d_fsdata NeilBrown
2026-03-12 21:12 ` [PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename NeilBrown
2026-03-12 21:12 ` [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode() NeilBrown
2026-03-12 21:12 ` [PATCH 22/53] afs: use d_alloc_nonblock in afs_sillyrename() NeilBrown
2026-03-12 21:12 ` [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock NeilBrown
2026-03-12 21:12 ` [PATCH 24/53] afs: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 25/53] smb/client: use d_time to store a timestamp in dentry, not d_fsdata NeilBrown
2026-03-12 21:12 ` [PATCH 26/53] smb/client: don't unhashed and rehash to prevent new opens NeilBrown
2026-03-12 21:12 ` [PATCH 27/53] smb/client: use d_splice_alias() in atomic_open NeilBrown
2026-03-12 21:12 ` [PATCH 28/53] smb/client: Use d_alloc_noblock() in cifs_prime_dcache() NeilBrown
2026-03-12 21:12 ` [PATCH 29/53] exfat: simplify exfat_lookup() NeilBrown
2026-03-12 21:12 ` [PATCH 30/53] configfs: remove d_add() calls before configfs_attach_group() NeilBrown
2026-03-12 21:12 ` [PATCH 31/53] configfs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link() NeilBrown
2026-03-17 10:00 ` Jan Kara
2026-03-17 20:27 ` [PATCH 32/53f] " NeilBrown
2026-03-18 17:47 ` Jan Kara
2026-03-12 21:12 ` [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal() NeilBrown
2026-03-17 9:37 ` Jan Kara
2026-03-12 21:12 ` [PATCH 34/53] tracefs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 35/53] cephfs: " NeilBrown
2026-03-12 21:12 ` [PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME handling in ceph_fill_trace() NeilBrown
2026-03-12 21:12 ` [PATCH 37/53] cephfs: Use d_alloc_noblock() in ceph_readdir_prepopulate() NeilBrown
2026-03-12 21:12 ` [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:12 ` [PATCH 39/53] ecryptfs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 40/53] gfs2: " NeilBrown
2026-03-12 21:12 ` [PATCH 41/53] libfs: " NeilBrown
2026-03-12 21:12 ` [PATCH 42/53] fuse: don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:12 ` [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link() NeilBrown
2026-03-12 21:12 ` [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in hostfs_mkdir() NeilBrown
2026-03-12 21:12 ` [PATCH 45/53] efivarfs: use d_alloc_name() NeilBrown
2026-03-12 21:12 ` [PATCH 46/53] Remove references to d_add() in documentation and comments NeilBrown
2026-03-12 21:12 ` [PATCH 47/53] VFS: make d_alloc() local to VFS NeilBrown
2026-03-12 21:12 ` [PATCH 48/53] VFS: remove d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 49/53] VFS: remove d_rehash() NeilBrown
2026-03-12 21:12 ` [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm() NeilBrown
2026-03-12 21:12 ` [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl() NeilBrown
2026-03-12 21:12 ` [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock NeilBrown
2026-03-12 21:12 ` [PATCH 53/53] VFS: remove LOOKUP_SHARED NeilBrown
2026-03-12 23:38 ` [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops Steven Rostedt
2026-03-13 0:18 ` NeilBrown
2026-03-12 23:46 ` Linus Torvalds
2026-03-13 0:09 ` NeilBrown