From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>,
Vinayak Menon <vinmenon@codeaurora.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
hannes@cmpxchg.org, vdavydov@parallels.com, mhocko@suse.cz,
mgorman@suse.de, minchan@kernel.org
Subject: Re: [PATCH v2] mm: vmscan: fix the page state calculation in too_many_isolated
Date: Thu, 12 Feb 2015 17:19:10 +0100 [thread overview]
Message-ID: <54DCD27E.3060304@suse.cz> (raw)
In-Reply-To: <20150211141433.2a46f06668e5a64c21f94375@linux-foundation.org>
On 02/11/2015 11:14 PM, Andrew Morton wrote:
>
> Did we end up deciding to merge this, or is
> http://ozlabs.org/~akpm/mmots/broken-out/vmstat-do-not-use-deferrable-delayed-work-for-vmstat_update.patch
> a sufficient fix?
I think Michal wanted to have the general vmstat worker fix from elsewhere in
the thread tested, if it solves the problem by itself, without this patch.
> From: Vinayak Menon <vinmenon@codeaurora.org>
> Subject: mm: vmscan: fix the page state calculation in too_many_isolated
>
> It is observed that sometimes multiple tasks get blocked for long in the
> congestion_wait loop below, in shrink_inactive_list. This is because of
> vm_stat values not being synced.
>
> (__schedule) from [<c0a03328>]
> (schedule_timeout) from [<c0a04940>]
> (io_schedule_timeout) from [<c01d585c>]
> (congestion_wait) from [<c01cc9d8>]
> (shrink_inactive_list) from [<c01cd034>]
> (shrink_zone) from [<c01cdd08>]
> (try_to_free_pages) from [<c01c442c>]
> (__alloc_pages_nodemask) from [<c01f1884>]
> (new_slab) from [<c09fcf60>]
> (__slab_alloc) from [<c01f1a6c>]
>
> In one such instance, zone_page_state(zone, NR_ISOLATED_FILE) had returned
> 14, zone_page_state(zone, NR_INACTIVE_FILE) returned 92, and GFP_IOFS was
> set, and this resulted in too_many_isolated returning true. But one of
> the CPU's pageset vm_stat_diff had NR_ISOLATED_FILE as "-14". So the
> actual isolated count was zero. As there weren't any more updates to
> NR_ISOLATED_FILE and vmstat_update deffered work had not been scheduled
> yet, 7 tasks were spinning in the congestion wait loop for around 4
> seconds, in the direct reclaim path.
>
> This patch uses zone_page_state_snapshot instead, but restricts its usage
> to avoid performance penalty.
>
>
> The vmstat sync interval is HZ (sysctl_stat_interval), but since the
> vmstat_work is declared as a deferrable work, the timer trigger can be
> deferred to the next non-defferable timer expiry on the CPU which is in
> idle. This results in the vmstat syncing on an idle CPU being delayed by
> seconds. May be in most cases this behavior is fine, except in cases like
> this.
>
> [akpm@linux-foundation.org: move zone_page_state_snapshot() fallback logic into too_many_isolated()]
> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov@parallels.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Michal Hocko <mhocko@suse.cz>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/vmscan.c | 51 +++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 34 insertions(+), 17 deletions(-)
>
> diff -puN mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-fix-the-page-state-calculation-in-too_many_isolated
> +++ a/mm/vmscan.c
> @@ -1363,6 +1363,32 @@ int isolate_lru_page(struct page *page)
> return ret;
> }
>
> +static int __too_many_isolated(struct zone *zone, int file,
> + struct scan_control *sc, int safe)
> +{
> + unsigned long inactive, isolated;
> +
> + if (safe) {
> + inactive = zone_page_state_snapshot(zone,
> + NR_INACTIVE_ANON + 2 * file);
> + isolated = zone_page_state_snapshot(zone,
> + NR_ISOLATED_ANON + file);
> + } else {
> + inactive = zone_page_state(zone, NR_INACTIVE_ANON + 2 * file);
> + isolated = zone_page_state(zone, NR_ISOLATED_ANON + file);
> + }
> +
> + /*
> + * GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages, so they
> + * won't get blocked by normal direct-reclaimers, forming a circular
> + * deadlock.
> + */
> + if ((sc->gfp_mask & GFP_IOFS) == GFP_IOFS)
> + inactive >>= 3;
> +
> + return isolated > inactive;
> +}
> +
> /*
> * A direct reclaimer may isolate SWAP_CLUSTER_MAX pages from the LRU list and
> * then get resheduled. When there are massive number of tasks doing page
> @@ -1371,33 +1397,24 @@ int isolate_lru_page(struct page *page)
> * unnecessary swapping, thrashing and OOM.
> */
> static int too_many_isolated(struct zone *zone, int file,
> - struct scan_control *sc)
> + struct scan_control *sc)
> {
> - unsigned long inactive, isolated;
> -
> if (current_is_kswapd())
> return 0;
>
> if (!global_reclaim(sc))
> return 0;
>
> - if (file) {
> - inactive = zone_page_state(zone, NR_INACTIVE_FILE);
> - isolated = zone_page_state(zone, NR_ISOLATED_FILE);
> - } else {
> - inactive = zone_page_state(zone, NR_INACTIVE_ANON);
> - isolated = zone_page_state(zone, NR_ISOLATED_ANON);
> - }
> -
> /*
> - * GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages, so they
> - * won't get blocked by normal direct-reclaimers, forming a circular
> - * deadlock.
> + * __too_many_isolated(safe=0) is fast but inaccurate, because it
> + * doesn't account for the vm_stat_diff[] counters. So if it looks
> + * like too_many_isolated() is about to return true, fall back to the
> + * slower, more accurate zone_page_state_snapshot().
> */
> - if ((sc->gfp_mask & GFP_IOFS) == GFP_IOFS)
> - inactive >>= 3;
> + if (unlikely(__too_many_isolated(zone, file, sc, 0)))
> + return __too_many_isolated(zone, file, sc, 1);
>
> - return isolated > inactive;
> + return 0;
> }
>
> static noinline_for_stack void
> _
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
prev parent reply other threads:[~2015-02-12 16:19 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-14 11:36 [PATCH v2] mm: vmscan: fix the page state calculation in too_many_isolated Vinayak Menon
2015-01-14 16:50 ` Michal Hocko
2015-01-15 17:24 ` Vinayak Menon
2015-01-16 15:49 ` Michal Hocko
2015-01-16 17:57 ` Michal Hocko
2015-01-16 19:17 ` Christoph Lameter
2015-01-17 15:18 ` Vinayak Menon
2015-01-17 19:48 ` Christoph Lameter
2015-01-19 4:27 ` Vinayak Menon
2015-01-21 14:39 ` Michal Hocko
2015-01-22 15:16 ` Vlastimil Babka
2015-01-22 16:11 ` Christoph Lameter
2015-01-26 17:46 ` Michal Hocko
2015-01-26 18:35 ` Christoph Lameter
2015-01-27 10:52 ` Michal Hocko
2015-01-27 16:59 ` Christoph Lameter
2015-01-30 15:28 ` Michal Hocko
2015-01-26 17:28 ` Michal Hocko
2015-01-26 18:35 ` Christoph Lameter
2015-01-26 22:11 ` Andrew Morton
2015-01-27 10:41 ` Michal Hocko
2015-01-27 10:33 ` Vinayak Menon
2015-01-27 10:45 ` Michal Hocko
2015-01-29 17:32 ` Christoph Lameter
2015-01-30 15:27 ` Michal Hocko
2015-01-16 1:17 ` Andrew Morton
2015-01-16 5:10 ` Vinayak Menon
2015-01-17 16:29 ` Vinayak Menon
2015-02-11 22:14 ` Andrew Morton
2015-02-12 16:19 ` Vlastimil Babka [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54DCD27E.3060304@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@suse.cz \
--cc=minchan@kernel.org \
--cc=vdavydov@parallels.com \
--cc=vinmenon@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).