From: Al Viro <viro@zeniv.linux.org.uk>
To: NeilBrown <neilb@suse.de>
Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jeff Layton <jlayton@kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/19] VFS: introduce lookup_and_lock() and friends
Date: Wed, 12 Feb 2025 15:51:32 +0000
Message-ID: <20250212155132.GQ1977892@ZenIV>
In-Reply-To: <173933773664.22054.1727909798811618895@noble.neil.brown.name>

On Wed, Feb 12, 2025 at 04:22:16PM +1100, NeilBrown wrote:
> lookup_for_removal() etc would only be temporarily needed. Eventually
> (I hope) we would get to a place where all filesystems support all
> operations with only a shared lock. When we get there,
> lookup_for_remove() and lookup_for_create() would be identical again.
>
> And the difference wouldn't be that one takes a shared lock and the
> other takes an exclusive lock. It would be that one takes a shared or
> exclusive lock based on flag X stored somewhere (inode, inode_operations,
> ...) while the other takes a shared or exclusive lock based on flag Y.
>
> It would be nice to be able to accelerate that and push the locking down
> into the filesystems call at once as Linus suggested last time:
>
> https://lore.kernel.org/all/CAHk-=whz69y=98udgGB5ujH6bapYuapwfHS2esWaFrKEoi9-Ow@mail.gmail.com/
>
> That would require either adding a new rwsem to each inode, possibly in
> the filesystem-private part of the inode, or changing VFS to not lock
> the inode at all. The first would be unwelcome by fs developers I
> expect, the second would be a serious challenge. I started thinking
> about it and quickly decided I had enough challenges already.

I think it's the wrong way to go.

Your "in-update" state does make sense, but it doesn't go far enough
and it's not really about parallel anything - it's simply "this
dentry is nailed down <here> with <this> name for now".

And _that_ is really useful, provided that it's reliable. What we
need to avoid is d_drop()/d_rehash() windows, when that "operated
upon" dentry ceases to be visible.

Currently we can do that, provided that the parent is held exclusive.
Any lookup will hit a dcache miss and proceed to the lookup_slow()
path, which will block on the attempt to get the parent shared.

As soon as you switch to holding the parent shared, that pattern
becomes a source of problems: a lookup that takes the parent shared
is no longer held off while the dentry is unhashed.
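
To make the difference concrete, here is a toy userspace model of the
window. It is a sketch only, not kernel code: toy_rwsem, toy_dir,
toy_lookup() and toy_probe_window() are made-up names, the "rwsem" is
a pair of counters with try-lock semantics, and the interleaving is
simulated in a single thread. It assumes nothing beyond what is
described above: a lookup that misses in the dcache has to take the
parent shared before it can do anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for ->i_rwsem: try-lock semantics only, no sleeping. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static bool toy_try_read(struct toy_rwsem *s)
{
	if (s->writer)
		return false;	/* a real reader would sleep here */
	s->readers++;
	return true;
}

static bool toy_try_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}

/* Hypothetical model: one parent directory with one child dentry. */
struct toy_dir {
	struct toy_rwsem i_rwsem;	/* the parent's lock */
	bool child_hashed;		/* visible to a dcache lookup? */
};

enum toy_result { TOY_HIT, TOY_WOULD_BLOCK, TOY_RACED };

/* A lookup: dcache hit if hashed, otherwise the slow path needs the
 * parent shared. */
static enum toy_result toy_lookup(struct toy_dir *dir)
{
	if (dir->child_hashed)
		return TOY_HIT;
	if (!toy_try_read(&dir->i_rwsem))
		return TOY_WOULD_BLOCK;	/* held off until the unlock */
	dir->i_rwsem.readers--;		/* got in during the window */
	return TOY_RACED;
}

/* What does a lookup see in the middle of a drop/rehash window, with
 * the updater holding the parent exclusive or shared? */
static enum toy_result toy_probe_window(bool exclusive)
{
	struct toy_dir dir = { .child_hashed = true };
	enum toy_result seen;
	bool locked;

	locked = exclusive ? toy_try_write(&dir.i_rwsem)
			   : toy_try_read(&dir.i_rwsem);
	assert(locked);

	dir.child_hashed = false;	/* models d_drop() */
	seen = toy_lookup(&dir);	/* lookup lands in the window */
	dir.child_hashed = true;	/* models d_rehash() */
	return seen;
}
```

In this model toy_probe_window(true) reports TOY_WOULD_BLOCK, which is
the current behaviour with the parent held exclusive, while
toy_probe_window(false) reports TOY_RACED: the lookup proceeds while
the dentry is invisible.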

And if we deal with that, there's not much reason to nest this
dentry lock inside ->i_rwsem. Then ->i_rwsem would become easy
to push inside the methods.

Right now the fundamental problem with your locking is that you
get dentry locks sandwiched between ->i_rwsem on parents and that
on children. We can try to be clever with how we acquire them
(have ->d_parent rechecked before going to sleep, etc.), but
that's rather brittle.

_IF_ we push them outside of ->i_rwsem, the role of ->i_rwsem
would shrink to protecting (1) the directory internal representation,
(2) emptiness checks and (3) link counts.

What goes away is "we are holding it exclusive, so anything that
comes here with a dcache miss won't get around to doing anything
until we unlock".