From: "John Stoffel" <john@stoffel.org>
To: NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Linus Torvalds <torvalds@linux-foundation.org>,
Daire Byrne <daire@dneg.com>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
Chuck Lever <chuck.lever@oracle.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH/RFC 00/10 v5] Improve scalability of directory operations
Date: Fri, 26 Aug 2022 10:42:00 -0400
Message-ID: <25352.56248.283092.213037@quad.stoffel.home>
In-Reply-To: <166147828344.25420.13834885828450967910.stgit@noble.brown>
>>>>> "NeilBrown" == NeilBrown <neilb@suse.de> writes:
NeilBrown> [I made up "v5" - I haven't been counting]
My first comments, but I'm not a serious developer...
NeilBrown> VFS currently holds an exclusive lock on the directory while making
NeilBrown> changes: add, remove, rename.
NeilBrown> When multiple threads make changes in the one directory, the contention
NeilBrown> can be noticeable.
NeilBrown> In the case of NFS with a high latency link, this can easily be
NeilBrown> demonstrated. NFS doesn't really need VFS locking as the server ensures
NeilBrown> correctness.
NeilBrown> Lustre uses a single(?) directory for object storage, and has patches
NeilBrown> for ext4 to support concurrent updates (Lustre accesses ext4 directly,
NeilBrown> not via the VFS).
NeilBrown> XFS (it is claimed) doesn't its own locking and doesn't need the VFS to
NeilBrown> help at all.
This sentence makes no sense to me... I assume you meant to say "...does
its own locking..."
NeilBrown> This patch series allows filesystems to request a shared lock on
NeilBrown> directories and provides serialisation on just the affected name, not the
NeilBrown> whole directory. It changes both the VFS and NFSD to use shared locks
NeilBrown> when appropriate, and changes NFS to request shared locks.
Are there any performance results? Why wouldn't we just use a shared
lock across all VFS-based filesystems?
NeilBrown> The central enabling feature is a new dentry flag DCACHE_PAR_UPDATE
NeilBrown> which acts as a bit-lock. The ->d_lock spinlock is taken to set/clear
NeilBrown> it, and wait_var_event() is used for waiting. This flag is set on all
NeilBrown> dentries that are part of a directory update, not just when a shared
NeilBrown> lock is taken.
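To check my own understanding of the bit-lock idea, here is a rough
userspace analogy, not the kernel code. I'm assuming a pthread mutex can
stand in for ->d_lock and a condition variable for the wait_var_event()
wait. The names struct entry, PAR_UPDATE, entry_lock_update() and
entry_unlock_update() are made up for illustration and do not come from
the patches.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical flag bit; stands in for DCACHE_PAR_UPDATE. */
#define PAR_UPDATE 0x1u

struct entry {
	pthread_mutex_t lock;   /* plays the role of ->d_lock */
	pthread_cond_t waitq;   /* plays the role of the wait_var_event() wait */
	unsigned int flags;     /* only touched while holding 'lock' */
	int updates;            /* protected by the PAR_UPDATE bit, not by 'lock' */
};

/* Take the bit-lock: wait until the bit is clear, then set it. */
static void entry_lock_update(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	while (e->flags & PAR_UPDATE)
		pthread_cond_wait(&e->waitq, &e->lock);
	e->flags |= PAR_UPDATE;
	pthread_mutex_unlock(&e->lock);
}

/* Drop the bit-lock: clear the bit and wake any waiters. */
static void entry_unlock_update(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	e->flags &= ~PAR_UPDATE;
	pthread_mutex_unlock(&e->lock);
	pthread_cond_broadcast(&e->waitq);
}

/* Each worker performs 10000 "updates" on the same entry. */
static void *worker(void *arg)
{
	struct entry *e = arg;

	for (int i = 0; i < 10000; i++) {
		entry_lock_update(e);
		e->updates++;   /* the short mutex is NOT held here */
		entry_unlock_update(e);
	}
	return NULL;
}

/* Run four workers against one entry and return the update count. */
static int run_demo(void)
{
	struct entry e = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.waitq = PTHREAD_COND_INITIALIZER,
		.flags = 0,
		.updates = 0,
	};
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, &e);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);

	assert(!(e.flags & PAR_UPDATE));   /* bit-lock released at the end */
	return e.updates;
}
```

If I have this right, the point is that the short lock is only held long
enough to flip the bit, while the update itself runs with just the bit
set, so updates to different names never wait on each other.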
NeilBrown> When a shared lock is taken we must use alloc_dentry_parallel() which
NeilBrown> needs a wq which must remain until the update is completed. To make use
NeilBrown> of concurrent create, kern_path_create() would need to be passed a wq.
NeilBrown> Rather than the churn required for that, we use exclusive locking when
NeilBrown> no wq is provided.
Is this a per-operation wq or a per-directory wq? Can there be issues
if someone does something silly like having 1,000 directories, all of
which have multiple processes making parallel changes?
Does it degrade gracefully if a wq can't be allocated?
NeilBrown> One interesting consequence of this is that silly-rename becomes a
NeilBrown> little more complex. As the directory may not be exclusively locked,
NeilBrown> the new silly-name needs to be locked (DCACHE_PAR_UPDATE) as well.
NeilBrown> A new LOOKUP_SILLY_RENAME is added which helps implement this using
NeilBrown> common code.
NeilBrown> While testing I found some odd behaviour that was caused by
NeilBrown> d_revalidate() racing with rename(). To resolve this I used
NeilBrown> DCACHE_PAR_UPDATE to ensure they cannot race any more.
NeilBrown> Testing, review, or other comments would be most welcome,