From: Al Viro <viro@zeniv.linux.org.uk>
To: NeilBrown <neilb@suse.de>
Cc: Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jeff Layton <jlayton@kernel.org>,
Dave Chinner <david@fromorbit.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/19 v7?] RFC: Allow concurrent and async changes in a directory
Date: Sun, 9 Feb 2025 23:33:41 +0000
Message-ID: <20250209233341.GX1977892@ZenIV>
In-Reply-To: <20250206054504.2950516-1-neilb@suse.de>
On Thu, Feb 06, 2025 at 04:42:37PM +1100, NeilBrown wrote:
> The idea behind the async support is to eventually connect this to
> io_uring so that one process can launch several concurrent directory
> operations. I have not looked deeply into io_uring and cannot be
> certain that the interface I've provided will be able to be used. I
> would welcome any advice on that matter, though I hope to find time to
> explore myself. For now if any _async op returns -EINPROGRESS we simply
> wait for the callback to indicate completion.
OK, after looking through that and playing around with the locking
scheme of yours:
Separating directory rwsem for reads/modifications from locking of
individual dentries may be feasible, but it needs to be a lot more
careful about the states it sleeps in. Your current variant is rife
with deadlocks; for the "wait on dentry itself" it's probably possible
to avoid, with some care; for "wait on parent" it's really not an option.
Quite a bit of headache comes from the fact that NFS et al. are playing
silly buggers with "OK, we see that lookup is for <operation>; skip it,
the call of actual method will do the right thing". The trouble is,
d_lookup_done() of not-really-looked-up is fine under exclusive lock on
parent, but only because there won't be d_alloc_parallel() on the same
name until we drop that exclusive lock.
Your scheme, OTOH, has hard dependency upon those suckers staying visible
to d_alloc_parallel() until the actual operation is done. Which means
that this code, including the methods, is exposed to in-lookup dentries.
What's more, similar dependency is there for dentries getting unhashed
between the lookup and the end of operation - something which NFS
cheerfully violates. If method's argument gets hit with d_drop() and
d_rehash(), there's a window where it won't be found in dcache, leaving
no indication that it's being operated upon. Currently we are fine -
exclusive lock on parent means that on dcache miss we try to grab
the parent shared and repeat dcache lookup when we get that.
Your variant does not have such exclusion - parent is held shared and
child dentry involved is not there to be found during d_drop()/d_rehash()
window.
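To make the d_drop()/d_rehash() window concrete, here is a toy userspace
model of it. This is a sketch only, not kernel code: every toy_* name, the
"in_update" flag and the fixed-size slot array are assumptions made up for
illustration, and the interleaving is simulated sequentially rather than
with real threads. It only shows that a lookup landing inside the window
gets a miss and so has no way to notice that an operation is in progress.

```c
/* Toy userspace model; NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_dentry {
	const char *name;
	bool hashed;     /* visible to lookups? */
	bool in_update;  /* an operation is in progress on this entry */
};

#define TOY_NR 4

struct toy_dir {
	struct toy_dentry *slots[TOY_NR];
};

/* A lookup only ever sees hashed entries. */
static struct toy_dentry *toy_lookup(struct toy_dir *dir, const char *name)
{
	for (size_t i = 0; i < TOY_NR; i++) {
		struct toy_dentry *d = dir->slots[i];

		if (d && d->hashed && strcmp(d->name, name) == 0)
			return d;
	}
	return NULL;
}

static void toy_drop(struct toy_dentry *d)
{
	d->hashed = false;
}

static void toy_rehash(struct toy_dentry *d)
{
	assert(!d->hashed);
	d->hashed = true;
}

/*
 * Simulate: party A operates on "foo" while holding the parent shared;
 * party B looks "foo" up in the middle of A's operation.
 * Returns true if B can tell that the entry is being operated upon.
 */
static bool toy_second_lookup_sees_update(bool method_drops_and_rehashes)
{
	struct toy_dentry d = { .name = "foo", .hashed = true, .in_update = false };
	struct toy_dir dir = { .slots = { &d } };
	struct toy_dentry *seen;
	bool noticed;

	d.in_update = true;              /* A starts its operation */
	if (method_drops_and_rehashes)
		toy_drop(&d);            /* window opens */

	/* Parent is only held shared, so nothing stops B from looking now. */
	seen = toy_lookup(&dir, "foo");
	noticed = seen != NULL && seen->in_update;

	if (method_drops_and_rehashes)
		toy_rehash(&d);          /* window closes */
	d.in_update = false;             /* A is done */

	return noticed;
}
```

Calling toy_second_lookup_sees_update(false) models a method that leaves
the entry hashed throughout, so the second lookup finds it and sees the
flag; toy_second_lookup_sees_update(true) models the drop/rehash case,
where the second lookup misses. With an exclusive lock on the parent the
second lookup could not run inside the window at all, which the toy model
does not attempt to show.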
IOW, your in-update state might make sense, but not in the way it's done
at the moment - it's too brittle.
And the part about async tree topology modifications is bloody insane,
IMO. I won't believe that to be feasible until I see the algorithm and
proof of correctness; preferably _before_ the actual code.