From: Al Viro <viro@zeniv.linux.org.uk>
To: "Tobin C. Harding" <me@tobin.cc>
Cc: "Tobin C. Harding" <tobin@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>,
	Alexander Viro <viro@ftp.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Christopher Lameter <cl@linux.com>,
	Matthew Wilcox <willy@infradead.org>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Andreas Dilger <adilger@dilger.ca>,
	Waiman Long <longman@redhat.com>, Tycho Andersen <tycho@tycho.ws>,
	Theodore Ts'o <tytso@mit.edu>, Andi Kleen <ak@linux.intel.com>,
	David Chinner <david@fromorbit.com>,
	Nick Piggin <npiggin@gmail.com>, Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hughd@google.com>, Jonathan Corbet <corbet@lwn.net>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects
Date: Sat, 29 Jun 2019 05:08:44 +0100
Message-ID: <20190629040844.GS17978@ZenIV.linux.org.uk>
In-Reply-To: <20190411210200.GH2217@ZenIV.linux.org.uk>

On Thu, Apr 11, 2019 at 10:02:00PM +0100, Al Viro wrote:

> Aaaarrgghhh...  No, we can't.  Look: we get one candidate dentry in the isolate
> phase.  We put it into a shrink list.  umount(2) comes and calls
> shrink_dcache_for_umount(), which calls shrink_dcache_parent(root).
> In the meanwhile, shrink_dentry_list() is run and does __dentry_kill() on
> that one dentry.  Fine, it's gone - before shrink_dcache_parent() even
> sees it.  Now shrink_dentry_list() holds a reference to its parent and
> is about to drop it in
>                 dentry = parent;
>                 while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
>                         dentry = dentry_kill(dentry);
> And dropped it will be, but... shrink_dcache_parent() has finished the
> scan without finding *anything* with zero refcount: the thing that used
> to be on the shrink list was already gone before shrink_dcache_parent()
> got there, and the reference to the parent had not been dropped yet.  So
> shrink_dcache_for_umount() plows past shrink_dcache_parent(), walks the
> tree and complains loudly about "busy" dentries (the parent we hadn't
> finished dropping), and then we proceed with filesystem shutdown.
> Meanwhile, dentry_kill() finally gets to killing the parent and
> triggers an unexpected late call of ->d_iput() on a filesystem that
> is already far enough into shutdown to have destroyed the
> data structures needed for that sucker.
> 
> The reason we don't hit that problem with the regular memory shrinker is
> this:
>                 unregister_shrinker(&s->s_shrink);
>                 fs->kill_sb(s);
> in deactivate_locked_super().  IOW, shrinker for this fs is gone
> before we get around to shutdown.  And so are all normal sources
> of dentry eviction for that fs.
> 
> Your earlier variants all suffer the same problem - picking a page
> shared by dentries from several superblocks can run into trouble
> if it overlaps with umount of one of those.
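To make the window concrete, here is a deterministic, single-threaded replay of the interleaving described above. It is a toy userspace model, not kernel code: struct race_dentry, scan_found() and replay() are hypothetical names. It assumes (our reading of fs/dcache.c at the time) that the shrink_dcache_parent() scan counts dentries that are unused or already on a shrink list as work still to do, and sees everything else with a nonzero count as busy. The parent starts with a single reference held on behalf of its child; for contrast, replay(true) queues the parent before the child is killed, as in the change proposed below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one parent/child pair; not kernel code. */
struct race_dentry {
	int count;       /* models d_lockref.count */
	bool on_shrink;  /* models DCACHE_SHRINK_LIST */
	bool dead;       /* set once "__dentry_kill()" has run */
};

/*
 * Models what a shrink_dcache_parent()-style scan treats as work left:
 * unused dentries, plus dentries already sitting on some shrink list.
 * Anything else with count > 0 looks "busy" and is skipped.
 */
static int scan_found(const struct race_dentry *d, size_t n)
{
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (d[i].dead)
			continue;
		if (d[i].on_shrink || d[i].count == 0)
			found++;
	}
	return found;
}

/* d[0] is the parent (pinned once by its child), d[1] the queued child. */
static int replay(bool queue_parent_first)
{
	struct race_dentry d[2] = {
		{ .count = 1 },
		{ .count = 0, .on_shrink = true },
	};

	if (queue_parent_first) {
		/* proposed order: hand the parent reference to the list first */
		if (--d[0].count == 0)
			d[0].on_shrink = true;
	}

	d[1].on_shrink = false;         /* d_shrink_del(child) */
	d[1].dead = true;               /* __dentry_kill(child) */

	/* umount's scan runs in this window */
	int found = scan_found(d, 2);

	if (!queue_parent_first)
		d[0].count--;           /* current order: parent dropped only now */
	return found;
}
```

In this model replay(false) reports no remaining work while the parent's count is still 1, which is the "busy" dentry shrink_dcache_for_umount() complains about; replay(true) leaves the parent visible on a shrink list, so the scan would still have something to wait for.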

FWIW, I think I see a kinda-sorta sane solution.  Namely, add

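/* Called with dentry->d_lock held; drops one reference. */
static void __dput_to_list(struct dentry *dentry, struct list_head *list)
{
	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
		/* let the owner of the list it's on deal with it */
		--dentry->d_lockref.count;
	} else {
		/* ->d_lru doubles as the shrink list linkage, so leave the LRU */
		if (dentry->d_flags & DCACHE_LRU_LIST)
			d_lru_del(dentry);
		/* last reference gone: queue it on the caller's list */
		if (!--dentry->d_lockref.count)
			d_shrink_add(dentry, list);
	}
}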

and have shrink_dentry_list() do this at the end of the loop:
		d_shrink_del(dentry);
		parent = dentry->d_parent;
		/* both dentry and parent are locked at that point */
		if (parent != dentry) {
			/*
			 * We need to prune ancestors too. This is necessary to
			 * prevent quadratic behavior of shrink_dcache_parent(),
			 * but is also expected to be beneficial in reducing
			 * dentry cache fragmentation.
			 */
			__dput_to_list(parent, list);
		}
		__dentry_kill(dentry);
	}

instead of
                d_shrink_del(dentry);
                parent = dentry->d_parent;
                __dentry_kill(dentry);
                if (parent == dentry)
                        continue;
                /*
                 * We need to prune ancestors too. This is necessary to prevent
                 * quadratic behavior of shrink_dcache_parent(), but is also
                 * expected to be beneficial in reducing dentry cache
                 * fragmentation.
                 */
                dentry = parent;
                while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
                        dentry = dentry_kill(dentry);
        }
we have there now.  Linus, do you see any problems with that change?  AFAICS,
that should avoid the problem described above.  Moreover, it seems to allow
a fun API addition:

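void dput_to_list(struct dentry *dentry, struct list_head *list)
{
	rcu_read_lock();
	if (likely(fast_dput(dentry))) {
		/* reference dropped, nothing else to do */
		rcu_read_unlock();
		return;
	}
	rcu_read_unlock();
	/* slow path: fast_dput() returned with ->d_lock held */
	if (!retain_dentry(dentry))
		/* not worth keeping: queue it instead of killing it here */
		__dput_to_list(dentry, list);
	spin_unlock(&dentry->d_lock);
}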

allowing one to take an empty list, do a bunch of dput_to_list() calls (under
spinlocks, etc.), then, once we are in better locking conditions, call
shrink_dentry_list() to take them all out.  I can see applications for that
in e.g. fs/namespace.c: quite a few kludges with ->mnt_ex_mountpoint would be
killable that way, and there would be a chance to transfer the contribution
to ->d_count of a mountpoint from struct mount to struct mountpoint (i.e. make
any number of mounts on the same mountpoint dentry contribute only 1 to its
->d_count, not the number of such mounts).
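A toy single-threaded model may help check the bookkeeping of the proposed scheme: toy_dput_to_list() follows __dput_to_list() above, toy_shrink_list() follows the new loop body (queue the parent, then kill the child), and a caller can batch several toy_dput_to_list() calls before draining the list, as dput_to_list() would allow. All toy_* names are hypothetical; locking, RCU, fast_dput(), retain_dentry() and the real shrink_lock_dentry() checks are not modelled, and a dentry that regained a reference is simply skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for DCACHE_SHRINK_LIST and DCACHE_LRU_LIST. */
#define TOY_SHRINK_LIST 0x1u
#define TOY_LRU_LIST    0x2u

struct toy_dentry {
	struct toy_dentry *parent;  /* the root points to itself, like ->d_parent */
	struct toy_dentry *next;    /* linkage on a toy shrink list */
	unsigned int flags;
	int count;                  /* stands in for ->d_lockref.count */
	bool killed;                /* set where __dentry_kill() would run */
};

struct toy_list {
	struct toy_dentry *head;
};

/* Models d_shrink_add(): mark the dentry and push it onto the list. */
static void toy_shrink_add(struct toy_dentry *d, struct toy_list *l)
{
	d->flags |= TOY_SHRINK_LIST;
	d->next = l->head;
	l->head = d;
}

/* Models __dput_to_list(): drop one reference, queueing d if now unused. */
static void toy_dput_to_list(struct toy_dentry *d, struct toy_list *l)
{
	assert(d->count > 0);
	if (d->flags & TOY_SHRINK_LIST) {
		/* already on someone's shrink list: its owner deals with it */
		--d->count;
	} else {
		d->flags &= ~TOY_LRU_LIST;      /* models d_lru_del() */
		if (!--d->count)
			toy_shrink_add(d, l);
	}
}

/*
 * Models the proposed shrink_dentry_list() loop: each victim's parent
 * reference is handed to the list *before* the victim is killed.
 * Returns the number of dentries killed.
 */
static int toy_shrink_list(struct toy_list *l)
{
	int killed = 0;

	while (l->head) {
		struct toy_dentry *d = l->head;

		l->head = d->next;
		d->flags &= ~TOY_SHRINK_LIST;   /* models d_shrink_del() */
		if (d->count)                   /* in use again: leave it alone */
			continue;
		if (d->parent != d)
			toy_dput_to_list(d->parent, l);
		d->killed = true;               /* models __dentry_kill() */
		killed++;
	}
	return killed;
}
```

For a root <- a <- b chain where b holds one external reference and root holds one more besides a's, a single toy_dput_to_list(&b, &l) followed by toy_shrink_list(&l) kills b and then a, while root, still referenced, is left alone.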

Thread overview: 28+ messages
2019-04-11  1:34 [RFC PATCH v3 00/15] Slab Movable Objects (SMO) Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 01/15] slub: Add isolate() and migrate() methods Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 02/15] tools/vm/slabinfo: Add support for -C and -M options Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 03/15] slub: Sort slab cache list Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 04/15] slub: Slab defrag core Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 05/15] tools/vm/slabinfo: Add remote node defrag ratio output Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 06/15] tools/vm/slabinfo: Add defrag_used_ratio output Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 07/15] tools/testing/slab: Add object migration test module Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 08/15] tools/testing/slab: Add object migration test suite Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 09/15] xarray: Implement migration function for objects Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 10/15] tools/testing/slab: Add XArray movable objects tests Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 11/15] slub: Enable moving objects to/from specific nodes Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 12/15] slub: Enable balancing slabs across nodes Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 13/15] dcache: Provide a dentry constructor Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects Tobin C. Harding
2019-04-11  2:33   ` Al Viro
2019-04-11  2:48     ` Tobin C. Harding
2019-04-11  4:47       ` Al Viro
2019-04-11  5:05         ` Tobin C. Harding
2019-04-11 20:01         ` Al Viro
2019-04-11 21:02         ` Al Viro
2019-06-29  4:08           ` Al Viro [this message]
2019-06-29  4:38             ` shrink_dentry_list() logics change (was Re: [RFC PATCH v3 14/15] dcache: Implement partial shrink via Slab Movable Objects) Al Viro
2019-06-29 19:06               ` Al Viro
2019-06-29 22:29                 ` Al Viro
2019-06-29 22:34                   ` Al Viro
2019-07-01  9:26                 ` Tobin C. Harding
2019-04-11  1:34 ` [RFC PATCH v3 15/15] dcache: Add CONFIG_DCACHE_SMO Tobin C. Harding
