From: Dave Chinner <david@fromorbit.com>
To: Glauber Costa <glommer@openvz.org>
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Greg Thelen <gthelen@google.com>,
kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko <mhocko@suse.cz>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-fsdevel@vger.kernel.org, Dave Chinner <dchinner@redhat.com>,
Glauber Costa <glommer@parallels.com>
Subject: Re: [PATCH v6 12/31] fs: convert inode and dentry shrinking to be node aware
Date: Tue, 14 May 2013 19:52:00 +1000
Message-ID: <20130514095200.GI29466@dastard>
In-Reply-To: <1368382432-25462-13-git-send-email-glommer@openvz.org>
On Sun, May 12, 2013 at 10:13:33PM +0400, Glauber Costa wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Now that the shrinker is passing a nodemask in the scan control
> structure, we can pass this to the generic LRU list code to
> isolate reclaim to the lists on matching nodes.
>
> This requires a small amount of refactoring of the LRU list API,
> which might be best split out into a separate patch.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Acked-by: Mel Gorman <mgorman@suse.de>

What I see at this point is that the superblock shrinkers appear to
be way too aggressive. As soon as memory pressure hits, the slab
caches are getting emptied down to almost nothing. I'm testing on a
4 node fake-numa system here, so my initial suspicion was that the
amount of reclaim had simply been multiplied by 4.

Peak performance is good; it's just that there is no stable steady
state - the caches are either filling at maximum rate or being
emptied at maximum rate - and that implies that the shrinkers are
doing more work than they need to.

I think this is the point at which we need to ensure that the
dentry/inode caches and the page cache are properly balanced, so
I'm going to see what happens when I tweak a few numbers, e.g. what
effect tweaking shrink.seeks has on the rate of change of the
cache sizes.....

Changing the seek count doesn't change the fact that it's
fundamentally unstable. More investigation needed. Trace points:
kswapd0-632 mm_shrink_slab_start: objects to shrink 945211
    gfp_flags GFP_KERNEL pgs_scanned 32 lru_pgs 7046
    cache items 600456 delta 1363 total_scan 300228

Bingo. We've got windup!

objects to shrink = shrinker->nr_in_batch
                  = 945211
                  = large amount of deferred work
cache items       = max_pass
                  = current cache size
                  = 600456
total_scan        = 300228
                  = (cache items) / 2
delta             = 1363
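The delta value in that trace can be reproduced from the other traced fields. The sketch below assumes the 3.9-era shrink_slab() arithmetic, delta = (4 * pgs_scanned / seeks) * max_pass / (lru_pgs + 1), recalled rather than taken from this patchset; shrink_delta() is a hypothetical helper name. With pgs_scanned 32, lru_pgs 7046 and max_pass 600456 it gives 1363 for seeks = 8 and 5453 for DEFAULT_SEEKS (2), so the traced run appears to be using a tweaked seek count of 8. That is an inference from the arithmetic, not something the trace states.

```c
#include <stdint.h>

/*
 * Hypothetical helper modelling the (assumed) 3.9-era shrink_slab()
 * delta calculation: ask the slab cache to shed roughly the same
 * fraction of its objects as the page LRU just had scanned, scaled
 * down by the shrinker's seek cost.
 */
static uint64_t shrink_delta(unsigned long pgs_scanned,
			     unsigned long lru_pgs,
			     unsigned long max_pass,
			     unsigned int seeks)
{
	/* Integer division happens first, as in the kernel expression. */
	uint64_t delta = (4ULL * pgs_scanned) / seeks;

	delta *= max_pass;
	/* +1 avoids dividing by zero when the LRU is empty. */
	return delta / ((uint64_t)lru_pgs + 1);
}
```

Either way, delta is a few hundred times smaller (roughly 170x to 700x) than the 945211 objects of deferred work it gets added to.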
And this code:

	/*
	 * We need to avoid excessive windup on filesystem shrinkers
	 * due to large numbers of GFP_NOFS allocations causing the
	 * shrinkers to return -1 all the time. This results in a large
	 * nr being built up so when a shrink that can do some work
	 * comes along it empties the entire cache due to nr >>>
	 * max_pass. This is bad for sustaining a working set in
	 * memory.
	 *
	 * Hence only allow the shrinker to scan the entire cache when
	 * a large delta change is calculated directly.
	 */
	if (delta < max_pass / 4)
		total_scan = min(total_scan, max_pass / 2);

This has obviously triggered.
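A simplified model of how the deferred count and the fresh delta combine into total_scan makes the effect of that clamp easier to see. It assumes the 3.9-era ordering (add the deferred work, apply the max_pass / 2 clamp, then cap at 2 * max_pass) and leaves out the kernel's other sanity checks; shrink_total_scan() is a hypothetical name.

```c
/*
 * Hypothetical, simplified model of the total_scan calculation:
 * nr_deferred is the windup taken from shrinker->nr_in_batch,
 * delta is the freshly computed pressure, max_pass the cache size.
 */
static long shrink_total_scan(long nr_deferred, long delta, long max_pass)
{
	long total_scan = nr_deferred + delta;

	/* Small fresh delta: don't let deferred work empty the cache. */
	if (delta < max_pass / 4 && total_scan > max_pass / 2)
		total_scan = max_pass / 2;

	/* Never scan more than twice the cache size in one call. */
	if (total_scan > max_pass * 2)
		total_scan = max_pass * 2;

	return total_scan;
}
```

shrink_total_scan(945211, 1363, 600456) returns 300228, matching the trace, and so does shrink_total_scan(945211, 5453, 600456). Once the deferred work dwarfs max_pass, the clamp rather than the delta decides total_scan, which fits with changing the seek count making no difference.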
So, it would seem to me that we have a relatively small amount of
incremental memory pressure, but an awful lot of deferred work. When
I see this:
kswapd0-632 1210443.469309: mm_shrink_slab_start: cache items 600456 delta 1363 total_scan 300228
kswapd3-635 1210443.510311: mm_shrink_slab_start: cache items 514885 delta 1250 total_scan 101025
kswapd1-633 1210443.517440: mm_shrink_slab_start: cache items 613824 delta 1357 total_scan 97727
kswapd2-634 1210443.527026: mm_shrink_slab_start: cache items 568610 delta 1331 total_scan 259185
kswapd3-635 1210443.573165: mm_shrink_slab_start: cache items 486408 delta 1277 total_scan 243204
kswapd1-633 1210443.697012: mm_shrink_slab_start: cache items 550827 delta 1224 total_scan 82231
in the space of 230ms, I can see why the caches are getting
completely emptied: the kswapds are making multiple, large-scale scan
passes over the caches. It looks like our problem is an impedance
mismatch: a global windup counter, but per-node cache scan calculations.
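Totting up those six trace lines shows the scale of it. The helper below is a hypothetical sketch that parses the trimmed trace format quoted above (the full tracepoint output, as in the first trace, carries more fields and would need a different format string) and sums total_scan over the window.

```c
#include <stdio.h>

/* Running totals over mm_shrink_slab_start lines (hypothetical type). */
struct scan_tally {
	long total_scan;	/* sum of total_scan over matched lines */
	double first_ts;	/* timestamp of first matched line (s) */
	double last_ts;		/* timestamp of last matched line (s) */
	int nr_lines;		/* number of lines that matched */
};

/* Sum total_scan across trace lines; non-matching lines are skipped. */
static struct scan_tally tally_trace(const char *const *lines, int n)
{
	struct scan_tally t = { 0, 0.0, 0.0, 0 };

	for (int i = 0; i < n; i++) {
		char task[64];
		double ts;
		long items, delta, scan;

		if (sscanf(lines[i],
			   "%63s %lf: mm_shrink_slab_start: "
			   "cache items %ld delta %ld total_scan %ld",
			   task, &ts, &items, &delta, &scan) != 5)
			continue;
		if (t.nr_lines == 0)
			t.first_ts = ts;
		t.last_ts = ts;
		t.total_scan += scan;
		t.nr_lines++;
	}
	return t;
}
```

For the six lines above it reports 1083600 objects of scan work requested across the four kswapds between timestamps 1210443.469309 and 1210443.697012, i.e. in about 228ms.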
So, that's the mess we really need to clean up before going much
further with this patchset. We need stable behaviour from the
shrinkers - I'll look into this a bit deeper tomorrow.
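To make the global-counter/per-node-scan mismatch concrete, here is a hypothetical model, not code from this patchset: each of four nodes defers some amount of work, and we compare the scan target one node gets when all nodes feed a single global deferred counter against the target it gets from a counter of its own. node_scan_target() and clamp_scan() are illustrative names, and the clamp repeats the simplified 3.9-era logic assumed earlier.

```c
#include <stdbool.h>

#define NR_NODES 4

/* Simplified windup clamp (assumed 3.9-era behaviour). */
static long clamp_scan(long deferred, long delta, long max_pass)
{
	long total_scan = deferred + delta;

	if (delta < max_pass / 4 && total_scan > max_pass / 2)
		total_scan = max_pass / 2;
	if (total_scan > max_pass * 2)
		total_scan = max_pass * 2;
	return total_scan;
}

/*
 * Scan target for node nid. With a global counter the node inherits
 * every node's deferred work; with per-node counters only its own.
 * Either way the result is sized against node nid's cache alone.
 */
static long node_scan_target(const long deferred[NR_NODES], int nid,
			     long delta, long node_max_pass, bool global)
{
	long nr = 0;

	if (global) {
		for (int i = 0; i < NR_NODES; i++)
			nr += deferred[i];
	} else {
		nr = deferred[nid];
	}
	return clamp_scan(nr, delta, node_max_pass);
}
```

With illustrative (not measured) numbers of 100000 deferred objects per node, a delta of 1300 and 600000 objects per node, node 0 is asked to scan 300000 objects (the max_pass / 2 clamp) via the global counter, against 101300 from its own counter. Keeping the deferred count per node is one way to make the windup proportional to the cache it is applied to; this is only a sketch of the idea.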
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com