From: Mel Gorman <mgorman@suse.de>
To: Glauber Costa <glommer@openvz.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
cgroups@vger.kernel.org, kamezawa.hiroyu@jp.fujitsu.com,
Dave Chinner <david@fromorbit.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.cz>,
hughd@google.com, Greg Thelen <gthelen@google.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v5 00/31] kmemcg shrinkers
Date: Thu, 9 May 2013 11:55:23 +0100 [thread overview]
Message-ID: <20130509105519.GQ11497@suse.de> (raw)
In-Reply-To: <1368079608-5611-1-git-send-email-glommer@openvz.org>
On Thu, May 09, 2013 at 10:06:17AM +0400, Glauber Costa wrote:
> [ Sending again, forgot to CC fsdevel. Shame on me ]
> To Mel
> ======
>
I'm surprised Dave Chinner is not on the cc. He may or may not see it
on fsdevel.

> Mel, I have identified the overly aggressive behavior you noticed as a bug
> in the at-least-one-pass patch, which would ask the shrinkers to scan the full
> batch even when total_scan < batch. They would do their best to comply, and
> eventually succeed. I also went further and made that the behavior of direct
> reclaim only: the only case that really matters for memcg, and one in which
> we could argue that we are more or less desperate for small squeezes in memory.
> Thank you very much for spotting this.
>
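The bug described above can be sketched in plain userspace C. This is an illustration, not the mm/vmscan.c code: `toy_scan()` and `shrink_pass()` are hypothetical names, the cache is just a counter, and the batch of 128 used below mirrors the kernel's SHRINK_BATCH default. With `clamp` false, the forced pass asks for a full batch even when `total_scan < batch`; with `clamp` true it asks only for `total_scan`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a shrinker's scan callback: frees up to
 * nr_to_scan objects from a cache holding *nr_objects objects and
 * returns how many were actually freed. */
static unsigned long toy_scan(unsigned long *nr_objects,
			      unsigned long nr_to_scan)
{
	unsigned long freed = nr_to_scan < *nr_objects ? nr_to_scan : *nr_objects;

	*nr_objects -= freed;
	return freed;
}

/* Hypothetical shrink loop with an "at least one pass" rule.
 * clamp == false reproduces the reported bug: a full batch is requested
 * even when total_scan < batch. clamp == true requests only total_scan.
 * batch must be non-zero. */
static unsigned long shrink_pass(unsigned long *nr_objects,
				 unsigned long total_scan,
				 unsigned long batch, bool clamp)
{
	unsigned long freed = 0;

	/* The forced single pass for requests smaller than one batch. */
	if (total_scan > 0 && total_scan < batch)
		return toy_scan(nr_objects, clamp ? total_scan : batch);

	/* Normal case: scan in whole batches; a partial remainder is left
	 * unscanned here (the real code defers it to a later call). */
	while (total_scan >= batch) {
		freed += toy_scan(nr_objects, batch);
		total_scan -= batch;
	}
	return freed;
}
```

For example, with 1000 cached objects, total_scan = 10 and batch = 128, the unclamped pass frees 128 objects while the clamped one frees 10.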
I haven't seen the relevant code yet, but in general I do not think it is
a good idea for direct reclaim to potentially reclaim all slab objects like
this. Direct reclaim does not necessarily mean the system is desperate
for small amounts of memory. Let's take a few examples where it would be
a poor decision to reclaim all the slab pages within direct reclaim.

1. Direct reclaim triggers because kswapd is stalled writing pages for
   memcg (see the code near the comment "memcg doesn't have any dirty pages
   throttling"). A memcg dirtying its limit of pages may cause a lot of
   direct reclaim and, with it, all the slab pages being dumped.
2. Direct reclaim triggers because kswapd is writing pages out to swap.
   As with memcg above, kswapd failing to make forward progress triggers
   direct reclaim, which then potentially reclaims all slab.
3. Direct reclaim triggers because kswapd waits on congestion as there
   are too many pages under writeback. In this case, a large amount of
   writing to slow storage such as USB could result in all slab being
   reclaimed.
4. The system has been up a long time, memory is fragmented, and the page
   allocator enters direct reclaim/compaction to allocate THPs. It would
   be very unfortunate if allocating a THP reclaimed all the slabs.

All of that is potentially bad and likely to make Dave put on his cranky
pants. I would much prefer that direct reclaim and kswapd treat slab
similarly and not ask the shrinkers to do a full scan unless the
alternative is an OOM kill.
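The alternative proposed above can be sketched as one policy shared by kswapd and direct reclaim. For illustration only, this assumes "the alternative is an OOM kill" is approximated by reclaim having reached priority 0; in mm/vmscan.c the reclaim priority counts down from DEF_PRIORITY (12) toward 0 as reclaim becomes more aggressive. The enum and both functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical reclaim contexts, for illustration only. */
enum reclaim_ctx { RECLAIM_KSWAPD, RECLAIM_DIRECT };

/* Hypothetical policy: the context is deliberately ignored so kswapd and
 * direct reclaim follow the same rule. A full batch is forced only at
 * priority 0, used here as a stand-in for "next step is the OOM killer". */
static bool force_full_slab_pass(enum reclaim_ctx ctx, int priority)
{
	(void)ctx;
	return priority == 0;
}

/* Number of objects to request from a shrinker, given the computed
 * total_scan and the shrinker's batch size. */
static unsigned long slab_scan_target(enum reclaim_ctx ctx, int priority,
				      unsigned long total_scan,
				      unsigned long batch)
{
	if (total_scan < batch && force_full_slab_pass(ctx, priority))
		return batch;
	return total_scan;
}
```

Under this rule, a THP allocation entering direct reclaim at the default priority asks only for the computed `total_scan`, exactly as kswapd would.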

> Running postmark on the final result (at least on my 2-node box) shows
> something a lot saner. We are still stealing more inodes than before, but
> by around 15%. Since the correct balance is somewhat heuristic anyway, I
> personally think this is acceptable. But I am waiting to hear from you on
> this matter. Meanwhile, I am investigating further to try to pinpoint where
> exactly this comes from. It might be either because of the new node-aware
> behavior or because of the increased calculation precision in the first patch.
>
I'm going to defer to Dave as to whether that increased level of slab
reclaim is acceptable or not.
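The "increased calculation precision" mentioned above can be illustrated under an assumption the thread does not confirm: that the first patch changes how an object count is scaled by sysctl_vfs_cache_pressure (default 100) from dividing by 100 before multiplying to a remainder-preserving calculation. Both function names are hypothetical.

```c
/* Divide first: counts below 100 collapse to 0, and every count loses
 * its last two decimal digits before scaling. */
static unsigned long scale_divide_first(unsigned long objects,
					unsigned long pressure)
{
	return (objects / 100) * pressure;
}

/* Split into quotient and remainder so the remainder is not discarded
 * (the same idea as the kernel's mult_frac() macro), without forming
 * objects * pressure directly. */
static unsigned long scale_precise(unsigned long objects,
				   unsigned long pressure)
{
	unsigned long quot = objects / 100;
	unsigned long rem = objects % 100;

	return quot * pressure + (rem * pressure) / 100;
}
```

With the default pressure of 100, a superblock with 99 unused objects reports 0 under the first form and 99 under the second, so small caches that were previously invisible to reclaim would plausibly start being shrunk.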

> In particular, I haven't done anything about your comment regarding the
> MAX_NODES array. After the memcg patches are applied, fixing this is a lot
> easier, because memcg already moves from a static MAX_NODES array to a
> dynamic one. I wanted, however, to keep the noise down in something that I
> expect to be merged soon. I would suggest merging a patch that fixes that
> on top of the series, instead of in the middle, if you really think it
> matters. I, of course, commit to doing this in that case.
>
I think fixing it on top would be reasonable, assuming the other memcg people
are happy with the memcg parts of the series. I didn't get a chance to look
at them last time and focused more on the API and per-node list changes.
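The MAX_NODES concern can be pictured with two layouts for a per-node LRU counter: one sized at compile time by the maximum node count, the other allocated at runtime for the number of nodes actually possible on the machine. This is a hypothetical userspace sketch (the `toy_*` names, the 1024 bound and the use of calloc() are assumptions), not the list_lru code in the series; in the kernel the compile-time bound is MAX_NUMNODES and the runtime count is nr_node_ids.

```c
#include <stdlib.h>

#define TOY_MAX_NUMNODES 1024	/* hypothetical compile-time upper bound */

/* Static layout: every LRU pays for the maximum possible node count. */
struct toy_lru_static {
	unsigned long nr_items[TOY_MAX_NUMNODES];
};

/* Dynamic layout: sized at runtime from the number of possible nodes. */
struct toy_lru_dynamic {
	int nr_nodes;
	unsigned long *nr_items;
};

/* Allocate one zeroed counter per possible node; returns 0 on success. */
static int toy_lru_dynamic_init(struct toy_lru_dynamic *lru, int nr_node_ids)
{
	lru->nr_items = calloc((size_t)nr_node_ids, sizeof(*lru->nr_items));
	if (!lru->nr_items)
		return -1;
	lru->nr_nodes = nr_node_ids;
	return 0;
}

static void toy_lru_dynamic_destroy(struct toy_lru_dynamic *lru)
{
	free(lru->nr_items);
	lru->nr_items = NULL;
	lru->nr_nodes = 0;
}
```

On a 2-node box the dynamic layout allocates two counters, whereas the static one always reserves TOY_MAX_NUMNODES of them per LRU, which adds up once an LRU is duplicated for every memcg.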
--
Mel Gorman
SUSE Labs