linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, "Tobin C. Harding" <me@tobin.cc>
Subject: Re: [RFC] simplifying fast_dput(), dentry_kill() et.al.
Date: Mon, 6 Nov 2023 05:53:53 +0000	[thread overview]
Message-ID: <20231106055353.GT1957730@ZenIV> (raw)
In-Reply-To: <20231105195416.GA2771969@ZenIV>

On Sun, Nov 05, 2023 at 07:54:16PM +0000, Al Viro wrote:
> On Tue, Oct 31, 2023 at 12:18:48AM +0000, Al Viro wrote:
> > On Mon, Oct 30, 2023 at 12:18:28PM -1000, Linus Torvalds wrote:
> > > On Mon, 30 Oct 2023 at 11:53, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > After fixing a couple of brainos, it seems to work.
> > > 
> > > This all makes me unnaturally nervous, probably because it;s overly
> > > subtle, and I have lost the context for some of the rules.
> > 
> > A bit of context: I started to look at the possibility of refcount overflows.
> > Writing the current rules for dentry refcounting and lifetime down was the
> > obvious first step, and that immediately turned into an awful mess.
> > 
> > It is overly subtle.
> 
> 	Another piece of too subtle shite: ordering of ->d_iput() of child
> and __dentry_kill() of parent.  As it is, in some cases it is possible for
> the latter to happen before the former.  It is *not* possible in the cases
> when in-tree ->d_iput() instances actually look at the parent (all of those
> are due to sillyrename stuff), but the proof is convoluted and very brittle.
> 
> 	The origin of that mess is in the interaction of shrink_dcache_for_umount()
> with shrink_dentry_list().  What we want to avoid is a directory looking like
> it's busy since shrink_dcache_for_umount() doesn't see any children to account
> for positive refcount of parent.  The kinda-sorta solution we use is to decrement
> the parent's refcount *before* __dentry_kill() of child and put said parent
> into a shrink list.  That makes shrink_dcache_for_umount() do the right thing,
> but it's possible to end up with parent freed before the child is done with;
> scenario is non-obvious, and rather hard to hit, but it's not impossible.

D'oh...  We actually don't need to worry about eviction on memory pressure at that
point; unregister_shrinker() is done early enough to prevent that.

So shrink_dcache_for_umount() does not need to worry about shrink lists use
in prune_dcache_sb().

Use in namespace_unlock() is guaranteed that all dentries involved either
have a matching mount in the list of mounts to be dropped (and thus protected
from simultaneous fs shutdown) or have a matching mount pinned by the caller.

Use in mntput_no_expire() is guaranteed the same - all dentries involved are
on superblock of mount we are going to drop after the call of shrink_dentry_list().

All other users also either have an active reference to superblock or are done
by ->kill_sb() synchronously (and thus can't race with shrink_dcache_for_umount())
or are done async, but flushed and/or waited for by foofs_kill_sb() before
they get to shrink_dcache_for_umount().

IOW, I'd been too paranoid in "Teach shrink_dcache_parent() to cope with
mixed-filesystem shrink lists" - the real requirements are milder; in-tree
users didn't need these games with parent.  Dcache side of Tobin's Slab
Movable Objects patches needed those, though...

AFAICS, there are 3 options:
	1) leave the current weirdness with ->d_iput() on child vs __dentry_kill()
on parent.  Document the requirement to ->d_iput() (and ->d_release()) to cope
with that, promise that in case of sillyrename the ordering will be there and
write down the proof of that.  No code changes, rather revolting docs to
write, trouble waiting to happen in ->d_iput().
	2) require that shrink_dentry_list() should never overlap with
shrink_dcache_for_umount() on any of the filesystems represented in the
shrink list, guarantee that parent won't get to __dentry_kill() before
the child gets through __dentry_kill() completely and accept that resurrecting
SMO stuff will require more work.  Smallish patch, tolerable docs, probably
the best option at the moment.
	3) bite the bullet and get shrink_dentry_list() to coexist with
shrink_dcache_for_umount(), with sane ordering of ->d_iput() vs. parent's
__dentry_kill().  Doable, but AFAICS it will take a counter of children
currently being killed in the parent dentry.  shrink_dentry_list() would
bump that on parent, __dentry_kill() the victim, then relock the parent
and decrement that counter along with the main refcount.  That would allow
the shrink_dcache_for_umount() to cope with that crap.  No requirements
for shrink_dentry_kill() callers that way, sane environment for ->d_iput(),
no obstacles for SMO stuff.  OTOH, we need to get space for additional
counter in struct dentry; again, doable (->d_subdirs/->d_child can be
converted to hlist, saving us a pointer in each dentry), but... I'd
leave that option alone until something that needs it would show up
(e.g. if/when Tobin resurrects his patchset).

	My preference would be (2) for the coming cycle + prototype of
a patch doing (3) on top of that for the future.

Completely untested diff for (2) (on top of #work.dcache, sans the
documentation update) below:

diff --git a/fs/dcache.c b/fs/dcache.c
index ccf41c5ee804..c978207f3fc4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1086,10 +1086,27 @@ void d_prune_aliases(struct inode *inode)
 }
 EXPORT_SYMBOL(d_prune_aliases);
 
+static inline void shrink_kill(struct dentry *victim, struct list_head *list)
+{
+	struct dentry *parent = victim->d_parent;
+
+	__dentry_kill(victim);
+
+	if (parent == victim || lockref_put_or_lock(&parent->d_lockref))
+		return;
+
+	if (!WARN_ON_ONCE(parent->d_lockref.count != 1)) {
+		parent->d_lockref.count = 0;
+		to_shrink_list(parent, list);
+	}
+	spin_unlock(&parent->d_lock);
+}
+
+
 void shrink_dentry_list(struct list_head *list)
 {
 	while (!list_empty(list)) {
-		struct dentry *dentry, *parent;
+		struct dentry *dentry;
 
 		dentry = list_entry(list->prev, struct dentry, d_lru);
 		spin_lock(&dentry->d_lock);
@@ -1106,10 +1123,7 @@ void shrink_dentry_list(struct list_head *list)
 		}
 		rcu_read_unlock();
 		d_shrink_del(dentry);
-		parent = dentry->d_parent;
-		if (parent != dentry && !--parent->d_lockref.count)
-			to_shrink_list(parent, list);
-		__dentry_kill(dentry);
+		shrink_kill(dentry, list);
 	}
 }
 
@@ -1537,19 +1551,14 @@ void shrink_dcache_parent(struct dentry *parent)
 			break;
 		data.victim = NULL;
 		d_walk(parent, &data, select_collect2);
-		if (data.victim) {
-			struct dentry *parent;
+		if (data.victim) { // rcu_read_lock held - see select_collect2()
 			spin_lock(&data.victim->d_lock);
 			if (!lock_for_kill(data.victim)) {
 				spin_unlock(&data.victim->d_lock);
 				rcu_read_unlock();
 			} else {
 				rcu_read_unlock();
-				parent = data.victim->d_parent;
-				if (parent != data.victim &&
-				    !--parent->d_lockref.count)
-					to_shrink_list(parent, &data.dispose);
-				__dentry_kill(data.victim);
+				shrink_kill(data.victim, &data.dispose);
 			}
 		}
 		if (!list_empty(&data.dispose))

  parent reply	other threads:[~2023-11-06  5:54 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30  0:37 [RFC] simplifying fast_dput(), dentry_kill() et.al Al Viro
2023-10-30 21:53 ` Al Viro
2023-10-30 22:18   ` Linus Torvalds
2023-10-31  0:18     ` Al Viro
2023-10-31  1:53       ` Al Viro
2023-10-31  6:12         ` Al Viro
2023-11-01  6:18           ` Al Viro
2023-11-01  6:20           ` [PATCH 01/15] fast_dput(): having ->d_delete() is not reason to delay refcount decrement Al Viro
2023-11-01  6:20             ` [PATCH 02/15] fast_dput(): handle underflows gracefully Al Viro
2023-11-01  6:20             ` [PATCH 03/15] fast_dput(): new rules for refcount Al Viro
2023-11-01  6:20             ` [PATCH 04/15] __dput_to_list(): do decrement of refcount in the caller Al Viro
2023-11-01  6:20             ` [PATCH 05/15] retain_dentry(): lift decrement of ->d_count into callers Al Viro
2023-11-01  6:20             ` [PATCH 06/15] __dentry_kill(): get consistent rules for ->d_count Al Viro
2023-11-01  6:20             ` [PATCH 07/15] dentry_kill(): don't bother with retain_dentry() on slow path Al Viro
2023-11-01  6:20             ` [PATCH 08/15] Call retain_dentry() with refcount 0 Al Viro
2023-11-01  6:20             ` [PATCH 09/15] fold the call of retain_dentry() into fast_dput() Al Viro
2023-11-01  8:45               ` Al Viro
2023-11-01 17:30                 ` Linus Torvalds
2023-11-01 18:19                   ` Al Viro
2023-11-10  4:20                     ` lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput()) Al Viro
2023-11-10  5:57                       ` Linus Torvalds
2023-11-10  6:22                         ` Linus Torvalds
2023-11-22  6:29                           ` Guo Ren
2023-11-10  8:19                         ` Al Viro
2023-11-22  7:19                         ` Guo Ren
2023-11-22 17:20                           ` Linus Torvalds
2023-11-22 17:52                             ` Linus Torvalds
2023-11-22 18:05                               ` Linus Torvalds
2023-11-22 19:11                               ` Linus Torvalds
2023-11-29  7:14                                 ` Guo Ren
2023-11-29 12:25                                 ` Guo Ren
2023-11-29 14:42                                   ` Linus Torvalds
2023-11-26 16:39                             ` Guo Ren
2023-11-26 16:51                               ` Linus Torvalds
2023-11-30 10:00                                 ` Guo Ren
2023-12-01  1:09                                   ` Linus Torvalds
2023-12-01  3:36                                     ` Guo Ren
2023-12-01  5:15                                       ` Linus Torvalds
2023-12-01  7:31                                         ` Guo Ren
2023-11-26 16:51                               ` Guo Ren
2023-11-26 17:06                               ` Linus Torvalds
2023-11-26 17:59                                 ` Linus Torvalds
2023-11-29  9:52                                 ` Guo Ren
2023-11-01  6:20             ` [PATCH 10/15] don't try to cut corners in shrink_lock_dentry() Al Viro
2023-11-01  6:21             ` [PATCH 11/15] fold dentry_kill() into dput() Al Viro
2023-11-01  6:21             ` [PATCH 12/15] get rid of __dget() Al Viro
2023-11-01  6:21             ` [PATCH 13/15] shrink_dentry_list(): no need to check that dentry refcount is marked dead Al Viro
2023-11-01  6:21             ` [PATCH 14/15] to_shrink_list(): call only if refcount is 0 Al Viro
2023-11-01  6:21             ` [PATCH 15/15] switch select_collect{,2}() to use of to_shrink_list() Al Viro
2023-11-01  2:22       ` [RFC] simplifying fast_dput(), dentry_kill() et.al Al Viro
2023-11-01 14:29         ` Benjamin Coddington
2023-11-05 19:54       ` Al Viro
2023-11-05 21:59         ` Al Viro
2023-11-06  5:53         ` Al Viro [this message]
2023-11-07  2:08           ` Al Viro
2023-11-09  6:19             ` [RFC][PATCHSET v2] " Al Viro
2023-11-09  6:20               ` [PATCH 01/22] struct dentry: get rid of randomize_layout idiocy Al Viro
2023-11-09  6:20                 ` [PATCH 02/22] switch nfsd_client_rmdir() to use of simple_recursive_removal() Al Viro
2023-11-09 13:42                   ` Christian Brauner
2023-11-09 14:01                   ` Chuck Lever
2023-11-09 18:47                     ` Al Viro
2023-11-09 18:50                       ` Chuck Lever III
2023-11-09  6:20                 ` [PATCH 03/22] coda_flag_children(): cope with dentries turning negative Al Viro
2023-11-09 13:43                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 04/22] dentry: switch the lists of children to hlist Al Viro
2023-11-09 13:48                   ` Christian Brauner
2023-11-09 19:32                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 05/22] centralize killing dentry from shrink list Al Viro
2023-11-09 13:49                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 06/22] get rid of __dget() Al Viro
2023-11-09 13:50                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 07/22] shrink_dentry_list(): no need to check that dentry refcount is marked dead Al Viro
2023-11-09 13:53                   ` Christian Brauner
2023-11-09 20:28                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 08/22] fast_dput(): having ->d_delete() is not reason to delay refcount decrement Al Viro
2023-11-09 13:58                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 09/22] fast_dput(): handle underflows gracefully Al Viro
2023-11-09 14:46                   ` Christian Brauner
2023-11-09 20:39                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 10/22] fast_dput(): new rules for refcount Al Viro
2023-11-09 14:54                   ` Christian Brauner
2023-11-09 20:52                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 11/22] __dput_to_list(): do decrement of refcount in the callers Al Viro
2023-11-09 15:21                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 12/22] Make retain_dentry() neutral with respect to refcounting Al Viro
2023-11-09 15:22                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 13/22] __dentry_kill(): get consistent rules for victim's refcount Al Viro
2023-11-09 15:27                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 14/22] dentry_kill(): don't bother with retain_dentry() on slow path Al Viro
2023-11-09 15:53                   ` Christian Brauner
2023-11-09 21:29                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 15/22] Call retain_dentry() with refcount 0 Al Viro
2023-11-09 16:09                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 16/22] fold the call of retain_dentry() into fast_dput() Al Viro
2023-11-09 16:17                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 17/22] don't try to cut corners in shrink_lock_dentry() Al Viro
2023-11-09 17:20                   ` Christian Brauner
2023-11-09 21:45                     ` Al Viro
2023-11-10  9:07                       ` Christian Brauner
2023-11-09 17:39                   ` Linus Torvalds
2023-11-09 18:11                     ` Linus Torvalds
2023-11-09 18:20                     ` Al Viro
2023-11-09  6:20                 ` [PATCH 18/22] fold dentry_kill() into dput() Al Viro
2023-11-09 17:22                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 19/22] to_shrink_list(): call only if refcount is 0 Al Viro
2023-11-09 17:29                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 20/22] switch select_collect{,2}() to use of to_shrink_list() Al Viro
2023-11-09 17:31                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 21/22] d_prune_aliases(): use a shrink list Al Viro
2023-11-09 17:33                   ` Christian Brauner
2023-11-09  6:20                 ` [PATCH 22/22] __dentry_kill(): new locking scheme Al Viro
2023-11-10 13:34                   ` Christian Brauner
2023-11-09 13:33                 ` [PATCH 01/22] struct dentry: get rid of randomize_layout idiocy Christian Brauner
2023-10-31  2:25     ` [RFC] simplifying fast_dput(), dentry_kill() et.al Gao Xiang
2023-10-31  2:29       ` Gao Xiang
2023-10-31  3:02       ` Linus Torvalds
2023-10-31  3:13         ` Gao Xiang
2023-10-31  3:26         ` Al Viro
2023-10-31  3:41           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231106055353.GT1957730@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=me@tobin.cc \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).