From: Dave Chinner <david@fromorbit.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andi Kleen <andi@firstfloor.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Greg Thelen <gthelen@google.com>, Hugh Dickins <hughd@google.com>,
	Jan Kara <jack@suse.cz>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Mel Gorman <mgorman@suse.de>, Metin Doslu <metin@citusdata.com>,
	Michel Lespinasse <walken@google.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Ozgun Erdogan <ozgun@citusdata.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>,
	Roman Gushchin <klamm@yandex-team.ru>,
	Ryan Mallon <rmallon@gmail.com>, Tejun Heo <tj@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 9/9] mm: keep page cache radix tree nodes in check
Date: Tue, 3 Dec 2013 09:10:52 +1100
Message-ID: <20131202221052.GT8803@dastard>
In-Reply-To: <1386012108-21006-10-git-send-email-hannes@cmpxchg.org>

On Mon, Dec 02, 2013 at 02:21:48PM -0500, Johannes Weiner wrote:
> Previously, page cache radix tree nodes were freed after reclaim
> emptied out their page pointers.  But now reclaim stores shadow
> entries in their place, which are only reclaimed when the inodes
> themselves are reclaimed.  This is problematic for bigger files that
> are still in use after they have a significant amount of their cache
> reclaimed, without any of those pages actually refaulting.  The shadow
> entries will just sit there and waste memory.  In the worst case, the
> shadow entries will accumulate until the machine runs out of memory.
> 
> To get this under control, the VM will track radix tree nodes
> exclusively containing shadow entries on a per-NUMA node list.
> Per-NUMA rather than global because we expect the radix tree nodes
> themselves to be allocated node-locally and we want to reduce
> cross-node references of otherwise independent cache workloads.  A
> simple shrinker will then reclaim these nodes on memory pressure.
> 
> A few things need to be stored in the radix tree node to implement the
> shadow node LRU and allow tree deletions coming from the list:
> 
> 1. There is no index available that would describe the reverse path
>    from the node up to the tree root, which is needed to perform a
>    deletion.  To solve this, encode in each node its offset inside the
>    parent.  This can be stored in the unused upper bits of the same
>    member that stores the node's height at no extra space cost.
> 
> 2. The number of shadow entries needs to be counted in addition to the
>    regular entries, to quickly detect when the node is ready to go to
>    the shadow node LRU list.  The current entry count is an unsigned
>    int but the maximum number of entries is 64, so a shadow counter
>    can easily be stored in the unused upper bits.
> 
> 3. Tree modification needs tree lock and tree root, which are located
>    in the address space, so store an address_space backpointer in the
>    node.  The parent pointer of the node is in a union with the 2-word
>    rcu_head, so the backpointer comes at no extra cost as well.
> 
> 4. The node needs to be linked to an LRU list, which requires a list
>    head inside the node.  This does increase the size of the node, but
>    it does not change the number of objects that fit into a slab page.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
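
For readers following point 2 of the quoted description, here is a minimal userspace sketch of packing a shadow counter into the unused upper bits of the entry count. The names (COUNT_SHIFT, node_pages(), page_to_shadow() and so on) and the 7-bit split are assumptions made for illustration, not necessarily what the patch itself uses:

```c
#include <assert.h>

/*
 * Hypothetical layout: a node has at most 64 slots, so 7 bits are
 * enough for the regular entry count (0..64).  The remaining upper
 * bits of the same unsigned int hold the shadow entry count.
 */
#define COUNT_SHIFT	7u
#define COUNT_MASK	((1u << COUNT_SHIFT) - 1)

static unsigned int node_pages(unsigned int count)
{
	return count & COUNT_MASK;	/* low bits: page pointers */
}

static unsigned int node_shadows(unsigned int count)
{
	return count >> COUNT_SHIFT;	/* high bits: shadow entries */
}

static unsigned int page_added(unsigned int count)
{
	return count + 1;
}

/* Reclaim replaces a page pointer with a shadow entry in the same slot. */
static unsigned int page_to_shadow(unsigned int count)
{
	assert(node_pages(count) > 0);
	return count - 1 + (1u << COUNT_SHIFT);
}

/* A node is a candidate for the shadow node LRU once no pages remain. */
static int node_only_shadows(unsigned int count)
{
	return node_pages(count) == 0 && node_shadows(count) > 0;
}
```

With this layout both counters are updated with a single add on one word, and the "ready for the shadow node LRU" test is a mask and a shift.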

Mostly looks OK, though there is no need to expose the internals of
list_lru_add/del. The reason for the different return values was so
that the isolate callback could simply use list_del_init() and not
have to worry about the internal accounting. The callback can drop
the lock, and the walk can do the accounting after the lock is
regained, because that won't result in the count of objects going
negative and triggering warnings.

Hence I think that all we need to do is add a new isolate return
value "LRU_REMOVED_RETRY" and add it to list_lru_walk_node() like
so:

 		switch (ret) {
+		case LRU_REMOVED_RETRY:
+			/*
+			 * object was removed from the list so we need to
+			 * account for it just like LRU_REMOVED hence the
+			 * fallthrough.  However, the list lock was also
+			 * dropped so we need to restart the list walk.
+			 */
 		case LRU_REMOVED:
 			if (--nlru->nr_items == 0)
 				node_clear(nid, lru->active_nodes);
 			WARN_ON_ONCE(nlru->nr_items < 0);
 			isolated++;
+			if (ret == LRU_REMOVED_RETRY)
+				goto restart;
 			break;
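
To make the control flow concrete, here is a small userspace model of that walk. It is only a sketch: struct lru, struct item, lru_walk() and the drops_lock flag are made up for illustration and there is no real locking, but the switch mirrors the hunk above. The callback only unlinks the item, the walker does all the accounting, and LRU_REMOVED_RETRY restarts from the list head because the saved next pointer cannot be trusted once the lock has been dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; names echo the kernel's but this is not kernel code. */
enum lru_status { LRU_REMOVED, LRU_REMOVED_RETRY };

struct item {
	struct item *prev, *next;
	int drops_lock;		/* hypothetical: isolating this item drops the lock */
};

struct lru {
	struct item head;	/* sentinel of a circular doubly linked list */
	long nr_items;
};

static void lru_init(struct lru *l)
{
	l->head.prev = l->head.next = &l->head;
	l->nr_items = 0;
}

static void lru_add(struct lru *l, struct item *it)
{
	it->prev = l->head.prev;
	it->next = &l->head;
	l->head.prev->next = it;
	l->head.prev = it;
	l->nr_items++;
}

/* Isolate callback: unlinks the item and never touches nr_items. */
static enum lru_status isolate(struct item *it)
{
	/* open-coded equivalent of list_del_init() */
	it->prev->next = it->next;
	it->next->prev = it->prev;
	it->prev = it->next = it;
	return it->drops_lock ? LRU_REMOVED_RETRY : LRU_REMOVED;
}

/* Walk at most nr_to_walk items; returns the number isolated. */
static long lru_walk(struct lru *l, long nr_to_walk, long *restarts)
{
	long isolated = 0;
	struct item *it, *next;

restart:
	for (it = l->head.next; it != &l->head; it = next) {
		enum lru_status ret;

		next = it->next;	/* saved before the callback unlinks 'it' */
		if (nr_to_walk-- <= 0)
			break;

		ret = isolate(it);
		switch (ret) {
		case LRU_REMOVED_RETRY:
			/* fall through: accounted exactly like LRU_REMOVED */
		case LRU_REMOVED:
			l->nr_items--;
			assert(l->nr_items >= 0);	/* stands in for WARN_ON_ONCE() */
			isolated++;
			if (ret == LRU_REMOVED_RETRY) {
				/* lock was dropped, so 'next' may be stale */
				(*restarts)++;
				goto restart;
			}
			break;
		}
	}
	return isolated;
}
```

Adding three items with drops_lock set on the middle one and walking the list isolates all three with exactly one restart, and nr_items never goes negative even though the callback does no accounting of its own.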

> +static unsigned long scan_shadow_nodes(struct shrinker *shrinker,
> +				       struct shrink_control *sc)
> +{
> +	unsigned long nr_reclaimed = 0;
> +
> +	list_lru_walk_node(&workingset_shadow_nodes, sc->nid,
> +			   shadow_lru_isolate, &nr_reclaimed, &sc->nr_to_scan);
> +
> +	return nr_reclaimed;
> +}

Do we need to check against GFP_NOFS here? I don't think so, but I just
wanted to check...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


