linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andi Kleen <andi@firstfloor.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Greg Thelen <gthelen@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Hugh Dickins <hughd@google.com>, Jan Kara <jack@suse.cz>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan.kim@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michel Lespinasse <walken@google.com>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Roman Gushchin <klamm@yandex-team.ru>,
	Ozgun Erdogan <ozgun@citusdata.com>,
	Metin Doslu <metin@citusdata.com>,
	Vlastimil Babka <vbabka@suse.cz>, Tejun Heo <tj@kernel.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 0/8] mm: thrash detection-based file cache sizing v5
Date: Wed, 16 Oct 2013 18:31:04 -0400	[thread overview]
Message-ID: <20131016223104.GA738@cmpxchg.org> (raw)
In-Reply-To: <20131016022606.GD4446@dastard>

On Wed, Oct 16, 2013 at 01:26:06PM +1100, Dave Chinner wrote:
> On Tue, Oct 15, 2013 at 10:05:26PM -0400, Rik van Riel wrote:
> > On 10/15/2013 07:41 PM, Dave Chinner wrote:
> > > On Tue, Oct 15, 2013 at 01:41:28PM -0400, Johannes Weiner wrote:
> > 
> > >> I'm not forgetting about them, I just track them very coarsely by
> > >> linking up address spaces and then lazily enforce their upper limit
> > >> when memory is tight by using the shrinker callback.  The assumption
> > >> was that actually scanning them is such a rare event that we trade the
> > >> rare computational costs for smaller memory consumption most of the
> > >> time.
> > > 
> > > Sure, I understand the tradeoff that you made. But there's nothing
> > > worse than a system that slows down unpredictably because of some
> > > magic threshold in some subsystem has been crossed and
> > > computationally expensive operations kick in.
> > 
> > The shadow shrinker should remove the radix nodes with
> > the oldest shadow entries first, so true LRU should actually
> > work for the radix tree nodes.
> > 
> > Actually, since we only care about the age of the youngest
> > shadow entry in each radix tree node, FIFO will be the same
> > as LRU for that list.
> > 
> > That means the shrinker can always just take the radix tree
> > nodes off the end.
> 
> Right, but it can't necessarily free the node as it may still have
> pointers to pages in it. In that case, it would have to simply
> rotate the page to the end of the LRU again.
> 
> Unless, of course, we kept track of the number of exceptional
> entries in a node and didn't add it to the reclaim list until there
> were no non-expceptional entries in the node....

I think that's doable.  As long as there is an actual page in the
node, no memory is going to be freed anyway.  And we have a full int
per node with only 64 slots, we can easily split the counter.

> > >> But it
> > >> looks like tracking radix tree nodes with a list and backpointers to
> > >> the mapping object for the lock etc. will be a major pain in the ass.
> > > 
> > > Perhaps so - it may not work out when we get down to the fine
> > > details...
> > 
> > I suspect that a combination of lifetime rules (inode cannot
> > disappear until all the radix tree nodes) and using RCU free
> > for the radix tree nodes, and the inodes might do the trick.
> > 
> > That would mean that, while holding the rcu read lock, the
> > back pointer from a radix tree node to the inode will always
> > point to valid memory.
> 
> Yes, that is what I was thinking...
> 
> > That allows the shrinker to lock the inode, and verify that
> > the inode is still valid, before it attempts to rcu free the
> > radix tree node with shadow entries.
> 
> Lock the mapping, not the inode. The radix tree is protected by the
> mapping_lock, not an inode lock. i.e. I'd hope that this can all b
> contained within the struct address_space and not require any
> knowledge of inodes or inode lifecycles at all.

Agreed, we can point to struct address_space and invalidate it by
setting mapping->host to NULL or so during the RCU grace period.

Also, the parent pointer is in a union with the two-word rcu_head, so
we get the address space backpointer for free.

The struct list_head for the FIFO, as you said, we can get for free as
well (at least on slab).

The FIFO lists themselves can be trimmed by a shrinker, I think.  You
were concerned about wind-up but if the nodes are not in excess and
->count just returns 0 until we are actually prepared to shrink
objects, then there shouldn't be any windup, right?

I don't see a natural threshold for "excess" yet, though, because
there is no relationship between where radix_tree_node is allocated
and which zones the contained shadow entries point to, so it's hard to
estimate how worthwile any given radix_tree_node is to a node.  Maybe
tying it to an event might be better, like reclaim pressure, swapping
etc.

> > It also means that locking only needs to be in the inode,
> > and on the LRU list for shadow radix tree nodes.
> > 
> > Does that sound sane?
> > 
> > Am I overlooking something?
> 
> It's pretty much along the same lines of what I was thinking, but
> lets see what Johannes thinks.

That sounds great to me.  I just have looked at this code for so long
that I'm getting blind towards it, so I really appreciate your input.

  reply	other threads:[~2013-10-16 22:31 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-10 21:46 [patch 0/8] mm: thrash detection-based file cache sizing v5 Johannes Weiner
2013-10-10 21:46 ` [patch 1/8] fs: cachefiles: use add_to_page_cache_lru() Johannes Weiner
2013-10-10 21:46 ` [patch 2/8] lib: radix-tree: radix_tree_delete_item() Johannes Weiner
2013-10-10 21:46 ` [patch 3/8] mm: shmem: save one radix tree lookup when truncating swapped pages Johannes Weiner
2013-10-10 21:46 ` [patch 4/8] mm: filemap: move radix tree hole searching here Johannes Weiner
2013-10-10 21:46 ` [patch 5/8] mm + fs: prepare for non-page entries in page cache radix trees Johannes Weiner
2013-10-10 21:47 ` [patch 6/8] mm + fs: store shadow entries in page cache Johannes Weiner
2013-10-10 21:47 ` [patch 7/8] mm: thrash detection-based file cache sizing Johannes Weiner
2013-10-10 21:47 ` [patch 8/8] mm: workingset: keep shadow entries in check Johannes Weiner
2013-10-11  0:39 ` [patch 0/8] mm: thrash detection-based file cache sizing v5 Dave Chinner
2013-10-14 21:42   ` Johannes Weiner
2013-10-15  1:41     ` Dave Chinner
2013-10-15 10:27       ` Jan Kara
2013-10-15 17:41       ` Johannes Weiner
2013-10-15 23:41         ` Dave Chinner
2013-10-16  2:05           ` Rik van Riel
2013-10-16  2:26             ` Dave Chinner
2013-10-16 22:31               ` Johannes Weiner [this message]
2013-10-21 12:10                 ` Dave Chinner
2013-10-21  9:26 ` Vlastimil Babka
2013-10-22  9:35   ` Johannes Weiner
2013-11-14 15:56   ` Rik van Riel
2013-11-12 10:30 ` Bob Liu
2013-11-14 15:53   ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131016223104.GA738@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=david@fromorbit.com \
    --cc=gthelen@google.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=klamm@yandex-team.ru \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=metin@citusdata.com \
    --cc=mgorman@suse.de \
    --cc=minchan.kim@gmail.com \
    --cc=ozgun@citusdata.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sjenning@linux.vnet.ibm.com \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).