From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Shantanu Goel <sgoel01@yahoo.com>, Chris Mason <clm@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Vlastimil Babka <vbabka@suse.cz>,
LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 1/3] mm, vmscan: fix zone balance check in prepare_kswapd_sleep
Date: Wed, 22 Feb 2017 16:00:36 +0900 [thread overview]
Message-ID: <20170222070036.GA17962@bbox> (raw)
In-Reply-To: <20170215092247.15989-2-mgorman@techsingularity.net>
Hi,
On Wed, Feb 15, 2017 at 09:22:45AM +0000, Mel Gorman wrote:
> From: Shantanu Goel <sgoel01@yahoo.com>
>
> The check in prepare_kswapd_sleep needs to match the one in balance_pgdat
> since the latter will return as soon as any one of the zones in the
> classzone is above the watermark. This is specially important for higher
> order allocations since balance_pgdat will typically reset the order to
> zero relying on compaction to create the higher order pages. Without this
> patch, prepare_kswapd_sleep fails to wake up kcompactd since the zone
> balance check fails.
>
> On 4.9.7 kswapd is failing to wake up kcompactd due to a mismatch in the
> zone balance check between balance_pgdat() and prepare_kswapd_sleep().
> balance_pgdat() returns as soon as a single zone satisfies the allocation
> but prepare_kswapd_sleep() requires all zones to do +the same. This causes
> prepare_kswapd_sleep() to never succeed except in the order == 0 case and
> consequently, wakeup_kcompactd() is never called. On my machine prior to
> apply this patch, the state of compaction from /proc/vmstat looked this
> way after a day and a half +of uptime:
>
> compact_migrate_scanned 240496
> compact_free_scanned 76238632
> compact_isolated 123472
> compact_stall 1791
> compact_fail 29
> compact_success 1762
> compact_daemon_wake 0
>
> After applying the patch and about 10 hours of uptime the state looks
> like this:
>
> compact_migrate_scanned 59927299
> compact_free_scanned 2021075136
> compact_isolated 640926
> compact_stall 4
> compact_fail 2
> compact_success 2
> compact_daemon_wake 5160
>
> Further notes from Mel that motivated him to pick this patch up and
> resend it;
>
> It was observed for the simoop workload (pressures the VM similar to HADOOP)
> that kswapd was failing to keep ahead of direct reclaim. The investigation
> noted that there was a need to rationalise kswapd decisions to reclaim
> with kswapd decisions to sleep. With this patch on a 2-socket box, there
> was a 43% reduction in direct reclaim scanning.
>
> However, the impact otherwise is extremely negative. Kswapd reclaim
> efficiency dropped from 98% to 76%. simoop has three latency-related
> metrics for read, write and allocation (an anonymous mmap and fault).
>
> 4.10.0-rc7 4.10.0-rc7
> mmots-20170209 fixcheck-v1
> Amean p50-Read 22325202.49 ( 0.00%) 20026926.55 ( 10.29%)
> Amean p95-Read 26102988.80 ( 0.00%) 27023360.00 ( -3.53%)
> Amean p99-Read 30935176.53 ( 0.00%) 30994432.00 ( -0.19%)
> Amean p50-Write 976.44 ( 0.00%) 1905.28 (-95.12%)
> Amean p95-Write 15471.29 ( 0.00%) 36210.09 (-134.05%)
> Amean p99-Write 35108.62 ( 0.00%) 479494.96 (-1265.75%)
> Amean p50-Allocation 76382.61 ( 0.00%) 87603.20 (-14.69%)
> Amean p95-Allocation 127777.39 ( 0.00%) 244491.38 (-91.34%)
> Amean p99-Allocation 187937.39 ( 0.00%) 1745237.33 (-828.63%)
>
> There are also more allocation stalls. One of the largest impacts was due
> to pages written back from kswapd context rising from 0 pages to 4516642
> pages during the hour the workload ran for. By and large, the patch has very
> bad behaviour but easily missed as the impact on a UMA machine is negligible.
>
> This patch is included with the data in case a bisection leads to this area.
> This patch is also a pre-requisite for the rest of the series.
>
> Signed-off-by: Shantanu Goel <sgoel01@yahoo.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Hmm, I don't understand why we should bind wakeup_kcompactd to kswapd's
short sleep point where every eligible zones are balanced.
What's the correlation between them?
Can't we wake up kcompactd once we found a zone has enough free pages
above high watermark like this?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 26c3b405ef34..f4f0ad0e9ede 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3346,13 +3346,6 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
* that pages and compaction may succeed so reset the cache.
*/
reset_isolation_suitable(pgdat);
-
- /*
- * We have freed the memory, now we should compact it to make
- * allocation of the requested order possible.
- */
- wakeup_kcompactd(pgdat, alloc_order, classzone_idx);
-
remaining = schedule_timeout(HZ/10);
/*
@@ -3451,6 +3444,14 @@ static int kswapd(void *p)
bool ret;
kswapd_try_sleep:
+ /*
+ * We have freed the memory, now we should compact it to make
+ * allocation of the requested order possible.
+ */
+ if (alloc_order > 0 && zone_balanced(zone, reclaim_order,
+ classzone_idx))
+ wakeup_kcompactd(pgdat, alloc_order, classzone_idx);
+
kswapd_try_to_sleep(pgdat, alloc_order, reclaim_order,
classzone_idx);
--
2.7.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Shantanu Goel <sgoel01@yahoo.com>, Chris Mason <clm@fb.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Vlastimil Babka <vbabka@suse.cz>,
LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 1/3] mm, vmscan: fix zone balance check in prepare_kswapd_sleep
Date: Wed, 22 Feb 2017 16:00:36 +0900 [thread overview]
Message-ID: <20170222070036.GA17962@bbox> (raw)
In-Reply-To: <20170215092247.15989-2-mgorman@techsingularity.net>
Hi,
On Wed, Feb 15, 2017 at 09:22:45AM +0000, Mel Gorman wrote:
> From: Shantanu Goel <sgoel01@yahoo.com>
>
> The check in prepare_kswapd_sleep needs to match the one in balance_pgdat
> since the latter will return as soon as any one of the zones in the
> classzone is above the watermark. This is specially important for higher
> order allocations since balance_pgdat will typically reset the order to
> zero relying on compaction to create the higher order pages. Without this
> patch, prepare_kswapd_sleep fails to wake up kcompactd since the zone
> balance check fails.
>
> On 4.9.7 kswapd is failing to wake up kcompactd due to a mismatch in the
> zone balance check between balance_pgdat() and prepare_kswapd_sleep().
> balance_pgdat() returns as soon as a single zone satisfies the allocation
> but prepare_kswapd_sleep() requires all zones to do +the same. This causes
> prepare_kswapd_sleep() to never succeed except in the order == 0 case and
> consequently, wakeup_kcompactd() is never called. On my machine prior to
> apply this patch, the state of compaction from /proc/vmstat looked this
> way after a day and a half +of uptime:
>
> compact_migrate_scanned 240496
> compact_free_scanned 76238632
> compact_isolated 123472
> compact_stall 1791
> compact_fail 29
> compact_success 1762
> compact_daemon_wake 0
>
> After applying the patch and about 10 hours of uptime the state looks
> like this:
>
> compact_migrate_scanned 59927299
> compact_free_scanned 2021075136
> compact_isolated 640926
> compact_stall 4
> compact_fail 2
> compact_success 2
> compact_daemon_wake 5160
>
> Further notes from Mel that motivated him to pick this patch up and
> resend it;
>
> It was observed for the simoop workload (pressures the VM similar to HADOOP)
> that kswapd was failing to keep ahead of direct reclaim. The investigation
> noted that there was a need to rationalise kswapd decisions to reclaim
> with kswapd decisions to sleep. With this patch on a 2-socket box, there
> was a 43% reduction in direct reclaim scanning.
>
> However, the impact otherwise is extremely negative. Kswapd reclaim
> efficiency dropped from 98% to 76%. simoop has three latency-related
> metrics for read, write and allocation (an anonymous mmap and fault).
>
> 4.10.0-rc7 4.10.0-rc7
> mmots-20170209 fixcheck-v1
> Amean p50-Read 22325202.49 ( 0.00%) 20026926.55 ( 10.29%)
> Amean p95-Read 26102988.80 ( 0.00%) 27023360.00 ( -3.53%)
> Amean p99-Read 30935176.53 ( 0.00%) 30994432.00 ( -0.19%)
> Amean p50-Write 976.44 ( 0.00%) 1905.28 (-95.12%)
> Amean p95-Write 15471.29 ( 0.00%) 36210.09 (-134.05%)
> Amean p99-Write 35108.62 ( 0.00%) 479494.96 (-1265.75%)
> Amean p50-Allocation 76382.61 ( 0.00%) 87603.20 (-14.69%)
> Amean p95-Allocation 127777.39 ( 0.00%) 244491.38 (-91.34%)
> Amean p99-Allocation 187937.39 ( 0.00%) 1745237.33 (-828.63%)
>
> There are also more allocation stalls. One of the largest impacts was due
> to pages written back from kswapd context rising from 0 pages to 4516642
> pages during the hour the workload ran for. By and large, the patch has very
> bad behaviour but easily missed as the impact on a UMA machine is negligible.
>
> This patch is included with the data in case a bisection leads to this area.
> This patch is also a pre-requisite for the rest of the series.
>
> Signed-off-by: Shantanu Goel <sgoel01@yahoo.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Hmm, I don't understand why we should bind wakeup_kcompactd to kswapd's
short sleep point where every eligible zones are balanced.
What's the correlation between them?
Can't we wake up kcompactd once we found a zone has enough free pages
above high watermark like this?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 26c3b405ef34..f4f0ad0e9ede 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3346,13 +3346,6 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
* that pages and compaction may succeed so reset the cache.
*/
reset_isolation_suitable(pgdat);
-
- /*
- * We have freed the memory, now we should compact it to make
- * allocation of the requested order possible.
- */
- wakeup_kcompactd(pgdat, alloc_order, classzone_idx);
-
remaining = schedule_timeout(HZ/10);
/*
@@ -3451,6 +3444,14 @@ static int kswapd(void *p)
bool ret;
kswapd_try_sleep:
+ /*
+ * We have freed the memory, now we should compact it to make
+ * allocation of the requested order possible.
+ */
+ if (alloc_order > 0 && zone_balanced(zone, reclaim_order,
+ classzone_idx))
+ wakeup_kcompactd(pgdat, alloc_order, classzone_idx);
+
kswapd_try_to_sleep(pgdat, alloc_order, reclaim_order,
classzone_idx);
--
2.7.4
next prev parent reply other threads:[~2017-02-22 7:00 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-15 9:22 [PATCH 0/3] Reduce amount of time kswapd sleeps prematurely Mel Gorman
2017-02-15 9:22 ` Mel Gorman
2017-02-15 9:22 ` [PATCH 1/3] mm, vmscan: fix zone balance check in prepare_kswapd_sleep Mel Gorman
2017-02-15 9:22 ` Mel Gorman
2017-02-16 2:50 ` Hillf Danton
2017-02-16 2:50 ` Hillf Danton
2017-02-22 7:00 ` Minchan Kim [this message]
2017-02-22 7:00 ` Minchan Kim
2017-02-23 15:05 ` Mel Gorman
2017-02-23 15:05 ` Mel Gorman
2017-02-24 1:17 ` Minchan Kim
2017-02-24 1:17 ` Minchan Kim
2017-02-24 9:11 ` Mel Gorman
2017-02-24 9:11 ` Mel Gorman
2017-02-27 6:16 ` Minchan Kim
2017-02-27 6:16 ` Minchan Kim
2017-02-15 9:22 ` [PATCH 2/3] mm, vmscan: Only clear pgdat congested/dirty/writeback state when balanced Mel Gorman
2017-02-15 9:22 ` Mel Gorman
2017-02-15 9:22 ` [PATCH 3/3] mm, vmscan: Prevent kswapd sleeping prematurely due to mismatched classzone_idx Mel Gorman
2017-02-15 9:22 ` Mel Gorman
2017-02-16 6:23 ` Hillf Danton
2017-02-16 6:23 ` Hillf Danton
2017-02-16 8:10 ` Mel Gorman
2017-02-16 8:10 ` Mel Gorman
2017-02-16 8:21 ` Hillf Danton
2017-02-16 8:21 ` Hillf Danton
2017-02-16 9:32 ` Mel Gorman
2017-02-16 9:32 ` Mel Gorman
2017-02-20 16:34 ` Vlastimil Babka
2017-02-20 16:34 ` Vlastimil Babka
2017-02-21 4:10 ` Hillf Danton
2017-02-21 4:10 ` Hillf Danton
2017-02-20 16:42 ` Vlastimil Babka
2017-02-20 16:42 ` Vlastimil Babka
2017-02-23 15:01 ` Mel Gorman
2017-02-23 15:01 ` Mel Gorman
2017-03-01 9:04 ` Vlastimil Babka
2017-03-01 9:04 ` Vlastimil Babka
2017-02-15 20:30 ` [PATCH 0/3] Reduce amount of time kswapd sleeps prematurely Andrew Morton
2017-02-15 20:30 ` Andrew Morton
2017-02-15 21:29 ` Mel Gorman
2017-02-15 21:29 ` Mel Gorman
2017-02-15 21:56 ` Andrew Morton
2017-02-15 21:56 ` Andrew Morton
2017-02-15 22:15 ` Mel Gorman
2017-02-15 22:15 ` Mel Gorman
2017-02-15 22:00 ` Vlastimil Babka
2017-02-15 22:00 ` Vlastimil Babka
-- strict thread matches above, loose matches on Subject: below --
2017-03-09 7:56 [PATCH 0/3] Reduce amount of time kswapd sleeps prematurely v2 Mel Gorman
2017-03-09 7:56 ` [PATCH 1/3] mm, vmscan: fix zone balance check in prepare_kswapd_sleep Mel Gorman
2017-03-09 7:56 ` Mel Gorman
2017-03-10 9:06 ` Vlastimil Babka
2017-03-10 9:06 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170222070036.GA17962@bbox \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=clm@fb.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=sgoel01@yahoo.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.