* [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
@ 2011-06-24 13:43 Mel Gorman
2011-06-24 13:43 ` [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely Mel Gorman
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Mel Gorman @ 2011-06-24 13:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Mel Gorman
During allocator-intensive workloads, kswapd will be woken frequently,
causing free memory to oscillate between the high and min watermarks.
This is expected behaviour. Unfortunately, if the highest zone is
small, a problem occurs.
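To make the expected behaviour above concrete, here is a toy single-zone model. It is not kernel code. Allocations drain free pages, kswapd is woken once free pages fall below the low watermark, and it reclaims in batches until the high watermark is met again. The struct, the watermark values and the batch sizes are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-zone model; all values are in pages. */
struct toy_wm_zone {
	long free;
	long wmark_min;
	long wmark_low;
	long wmark_high;
	bool kswapd_awake;
};

/*
 * One step of the model. Try to allocate alloc_pages without dropping
 * below the min watermark, wake kswapd if free pages fell below the low
 * watermark, and let an awake kswapd reclaim reclaim_batch pages. kswapd
 * goes back to sleep once the zone is at or above the high watermark.
 * Returns true if the allocation succeeded.
 */
static bool toy_wm_step(struct toy_wm_zone *z, long alloc_pages,
			long reclaim_batch)
{
	bool allocated = false;

	assert(z->wmark_min <= z->wmark_low && z->wmark_low <= z->wmark_high);

	if (z->free - alloc_pages >= z->wmark_min) {
		z->free -= alloc_pages;
		allocated = true;
	}

	/* Falling below the low watermark wakes kswapd. */
	if (z->free < z->wmark_low)
		z->kswapd_awake = true;

	if (z->kswapd_awake) {
		z->free += reclaim_batch;
		/* Balanced again: kswapd may sleep. */
		if (z->free >= z->wmark_high)
			z->kswapd_awake = false;
	}
	return allocated;
}
```

With free = 100, min/low/high = 40/50/60, 15-page allocations and 30-page reclaim batches, free pages repeatedly dip to the min watermark and are topped back up to 70, and kswapd sleeps between bursts. That is the healthy pattern; the bug addressed by this series is kswapd failing to go back to sleep.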
This seems to happen most with recent Sandy Bridge laptops, but that is
probably a coincidence: some of these laptops just happen to have
a small Normal zone. The reproduction case is almost always copying
large files, during which kswapd pegs at 100% CPU until the file is
deleted or the cache is dropped.
The problem is mostly down to sleeping_prematurely() keeping kswapd
awake when the highest zone is small and unreclaimable. It is compounded
by the fact that we shrink slabs even when not shrinking zones, causing
a lot of time to be spent in shrinkers and a lot of memory to be
reclaimed.
Patch 1 corrects sleeping_prematurely() to check only the zones up to
and including classzone_idx instead of all zones.
Patch 2 avoids shrinking slab when we are not shrinking a zone.
Patch 3 notes that sleeping_prematurely() is checking lower zones against
a high classzone, which is not what allocators or balance_pgdat()
do, leading to an artificial belief that kswapd should still be
awake.
Patch 4 notes that when balance_pgdat() gives up on a high zone, the
decision is not communicated to sleeping_prematurely().
This problem affects 3.0-rc4 and 2.6.38.8 for certain and is expected
to affect 2.6.39 as well. If accepted, the patches need to go to -stable
to be picked up by distros. This series is against 3.0-rc4. I've cc'd
people who reported similar problems recently to see if they still
suffer from the problem and if this fixes it.
mm/vmscan.c | 57 ++++++++++++++++++++++++++++++++++-----------------------
1 files changed, 34 insertions(+), 23 deletions(-)
--
1.7.3.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
2011-06-24 13:43 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
@ 2011-06-24 13:43 ` Mel Gorman
2011-06-24 13:43 ` [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone Mel Gorman
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2011-06-24 13:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Mel Gorman
During allocator-intensive workloads, kswapd will be woken frequently,
causing free memory to oscillate between the high and min watermarks.
This is expected behaviour.
A problem occurs if the highest zone is small. balance_pgdat()
only considers unreclaimable zones when priority is DEF_PRIORITY,
but sleeping_prematurely() considers all zones. It's possible for this
sequence to occur:
1. kswapd wakes up and enters balance_pgdat()
2. At DEF_PRIORITY, it marks the highest zone unreclaimable
3. At DEF_PRIORITY-1, it ignores the highest zone when setting end_zone
4. At DEF_PRIORITY-1, it calls shrink_slab(), freeing memory from the
   highest zone and clearing all_unreclaimable. The highest zone
   is still unbalanced
5. kswapd returns and calls sleeping_prematurely()
6. sleeping_prematurely() looks at *all* zones, not just the ones
   being considered by balance_pgdat(). The highest small zone
   has all_unreclaimable cleared but the zone is not
   balanced. all_zones_ok is false, so kswapd stays awake
This patch corrects the behaviour of sleeping_prematurely() to check
only the zones that balance_pgdat() checked.
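The effect of the loop-bound change can be sketched with a heavily simplified, order-0 model. All `toy_` names are hypothetical. Each zone is reduced to populated/balanced flags, where "balanced" stands in for zone_watermark_ok_safe(), and the all_unreclaimable and high-order handling of the real sleeping_prematurely() is left out. In the scenario above, the small highest zone has all_unreclaimable cleared but is still unbalanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct zone. */
struct toy_sp_zone {
	bool populated;
	bool balanced;	/* stands in for zone_watermark_ok_safe() */
};

/*
 * Returns true if kswapd should stay awake ("sleeping prematurely").
 * check_all_zones selects the pre-patch loop bound (nr_zones); otherwise
 * only zones 0..classzone_idx are examined, i.e. the zones that
 * balance_pgdat() actually worked on.
 */
static bool toy_sleeping_prematurely(const struct toy_sp_zone *zones,
				     int nr_zones, int classzone_idx,
				     bool check_all_zones)
{
	int last = check_all_zones ? nr_zones - 1 : classzone_idx;
	int i;

	assert(classzone_idx >= 0 && classzone_idx < nr_zones);
	for (i = 0; i <= last; i++) {
		if (!zones[i].populated)
			continue;
		if (!zones[i].balanced)
			return true;	/* all_zones_ok would be false */
	}
	return false;
}
```

With DMA and DMA32 balanced and a small, unbalanced Normal zone at index 2, a classzone_idx of 1 keeps kswapd awake under the old bound but lets it sleep under the new one.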
Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ff834e..841e3bf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2323,7 +2323,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
return true;
/* Check the watermark levels */
- for (i = 0; i < pgdat->nr_zones; i++) {
+ for (i = 0; i <= classzone_idx; i++) {
struct zone *zone = pgdat->node_zones + i;
if (!populated_zone(zone))
* [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone
2011-06-24 13:43 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
2011-06-24 13:43 ` [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely Mel Gorman
@ 2011-06-24 13:43 ` Mel Gorman
2011-06-24 13:59 ` Mel Gorman
2011-06-24 13:43 ` [PATCH 3/4] mm: vmscan: Evaluate the watermarks against the correct classzone Mel Gorman
2011-06-24 13:43 ` [PATCH 4/4] mm: vmscan: Only read new_classzone_idx from pgdat when reclaiming successfully Mel Gorman
3 siblings, 1 reply; 16+ messages in thread
From: Mel Gorman @ 2011-06-24 13:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Mel Gorman
During allocator-intensive workloads, kswapd will be woken frequently,
causing free memory to oscillate between the high and min watermarks.
This is expected behaviour.
When kswapd applies pressure to zones during node balancing, it checks
if the zone is above a high+balance_gap threshold. If it is, it does
not apply pressure to the zone, but it still unconditionally shrinks
slab on a global basis, which is excessive. In the event kswapd is
being kept awake due to a small, unreclaimable high zone, it skips zone
shrinking but still calls shrink_slab().
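A counting sketch of the change, with hypothetical `toy_` names and shrink_zone()/shrink_slab() reduced to counters: when every zone already passes the high+balance_gap check, the pre-patch pass still shrinks slab once per zone visited, while the patched pass shrinks nothing.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_pass_counts {
	int zone_shrinks;
	int slab_shrinks;
};

/*
 * One pass over zones 0..end_zone. zone_ok[i] stands in for the
 * high_wmark + balance_gap check. With gate_slab == false, slab is
 * shrunk for every zone visited (pre-patch behaviour); with
 * gate_slab == true, it is only shrunk when the zone itself was shrunk.
 */
static struct toy_pass_counts toy_balance_pass(const bool *zone_ok,
					       int end_zone, bool gate_slab)
{
	struct toy_pass_counts c = { 0, 0 };
	int i;

	for (i = 0; i <= end_zone; i++) {
		bool shrink_this_zone = !zone_ok[i];

		if (shrink_this_zone)
			c.zone_shrinks++;
		if (shrink_this_zone || !gate_slab)
			c.slab_shrinks++;
	}
	return c;
}
```

On a machine where kswapd keeps looping, the pre-patch behaviour turns every iteration into slab shrinking, which matches the excessive shrink_slab calls reported later in this thread.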
Once pressure has been applied, the check for whether the zone is
unreclaimable is made before the check that decides whether
all_unreclaimable should be set. Missing the unreclaimable state like
this can cause has_under_min_watermark_zone to be set because of an
unreclaimable zone, preventing kswapd from backing off in
congestion_wait().
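The ordering problem can be reduced to a single zone visit. This is a sketch with hypothetical `toy_` names. "reclaimable" stands in for `nr_slab != 0 || zone_reclaimable(zone)`, "below_min" stands in for failing the min watermark check, and the writepage logic in between is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone state for one visit in balance_pgdat(). */
struct toy_visit_zone {
	bool all_unreclaimable;
	bool reclaimable;	/* stands in for nr_slab != 0 || zone_reclaimable() */
	bool below_min;		/* stands in for failing the min watermark check */
};

/*
 * Returns whether this visit would set has_under_min_watermark_zone.
 * Pre-patch, the all_unreclaimable test runs before the zone can be
 * marked; patched, it runs after, so a zone marked on this visit is
 * skipped before the watermark check.
 */
static bool toy_visit(struct toy_visit_zone *z, bool patched)
{
	if (!patched && z->all_unreclaimable)
		return false;

	if (!z->reclaimable)
		z->all_unreclaimable = true;

	if (patched && z->all_unreclaimable)
		return false;

	return z->below_min;
}
```

In both orders the zone ends up marked unreclaimable; only the pre-patch order lets it also flag has_under_min_watermark_zone on the same visit.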
Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 21 ++++++++++++---------
1 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 841e3bf..38665ec 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2509,16 +2509,16 @@ loop_again:
high_wmark_pages(zone) + balance_gap,
end_zone, 0))
shrink_zone(priority, zone, &sc);
- reclaim_state->reclaimed_slab = 0;
- nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
- sc.nr_reclaimed += reclaim_state->reclaimed_slab;
- total_scanned += sc.nr_scanned;
- if (zone->all_unreclaimable)
- continue;
- if (nr_slab == 0 &&
- !zone_reclaimable(zone))
- zone->all_unreclaimable = 1;
+ reclaim_state->reclaimed_slab = 0;
+ nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
+ sc.nr_reclaimed += reclaim_state->reclaimed_slab;
+ total_scanned += sc.nr_scanned;
+
+ if (nr_slab == 0 && !zone_reclaimable(zone))
+ zone->all_unreclaimable = 1;
+ }
+
/*
* If we've done a decent amount of scanning and
* the reclaim ratio is low, start doing writepage
@@ -2528,6 +2528,9 @@ loop_again:
total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
sc.may_writepage = 1;
+ if (zone->all_unreclaimable)
+ continue;
+
if (!zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone), end_zone, 0)) {
all_zones_ok = 0;
* [PATCH 3/4] mm: vmscan: Evaluate the watermarks against the correct classzone
2011-06-24 13:43 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
2011-06-24 13:43 ` [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely Mel Gorman
2011-06-24 13:43 ` [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone Mel Gorman
@ 2011-06-24 13:43 ` Mel Gorman
2011-06-24 13:43 ` [PATCH 4/4] mm: vmscan: Only read new_classzone_idx from pgdat when reclaiming successfully Mel Gorman
3 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2011-06-24 13:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Mel Gorman
When deciding if kswapd is sleeping prematurely, the classzone is
taken into account, but this is different from what balance_pgdat() and
the allocator are doing. Specifically, the DMA zone will be checked
based on the classzone used when waking kswapd, which could be for a
GFP_KERNEL or GFP_HIGHMEM request. The lowmem reserve limit kicks in,
the watermark is not met, and kswapd thinks it's sleeping prematurely,
keeping kswapd awake in error.
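The effect can be sketched with a simplified, order-0 version of the zone_watermark_ok() test with alloc flags ignored. Following the kernel's convention, lowmem_reserve[j] is the number of pages a lower zone keeps back from allocations whose classzone is j. Checking the DMA zone against a high classzone therefore adds a large reserve to the watermark, while checking it against its own index (as this patch does by passing `i`) adds none. The zone layout, page counts and reserve values below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_DMA, TOY_NORMAL, TOY_HIGHMEM, TOY_NR_ZONES };

/* Hypothetical zone with only the fields this check needs. */
struct toy_res_zone {
	long free_pages;
	long high_wmark;
	/* Pages held back from allocations whose classzone is index j. */
	long lowmem_reserve[TOY_NR_ZONES];
};

/* Simplified order-0 watermark check: free must exceed mark + reserve. */
static bool toy_watermark_ok(const struct toy_res_zone *z, long mark,
			     int classzone_idx)
{
	assert(classzone_idx >= 0 && classzone_idx < TOY_NR_ZONES);
	return z->free_pages > mark + z->lowmem_reserve[classzone_idx];
}
```

A DMA zone with 1000 free pages and a high watermark of 100 passes when checked against its own index. It fails when checked against the HIGHMEM classzone, because a 3500-page reserve is added, even though nothing about the zone itself has changed.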
Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 38665ec..d859111 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2341,7 +2341,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
}
if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
- classzone_idx, 0))
+ i, 0))
all_zones_ok = false;
else
balanced += zone->present_pages;
* [PATCH 4/4] mm: vmscan: Only read new_classzone_idx from pgdat when reclaiming successfully
2011-06-24 13:43 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
` (2 preceding siblings ...)
2011-06-24 13:43 ` [PATCH 3/4] mm: vmscan: Evaluate the watermarks against the correct classzone Mel Gorman
@ 2011-06-24 13:43 ` Mel Gorman
3 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2011-06-24 13:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Mel Gorman
During allocator-intensive workloads, kswapd will be woken frequently,
causing free memory to oscillate between the high and min watermarks.
This is expected behaviour. Unfortunately, if the highest zone is
small, a problem occurs.
When balance_pgdat() returns, it may be at a lower classzone_idx than
it started at because the highest zone was unreclaimable. Before checking
if it should go to sleep, though, it checks pgdat->classzone_idx, which,
when there is no other activity, will be MAX_NR_ZONES-1. It interprets
this as having been woken up while reclaiming, skips scheduling and
reclaims again. As there is no useful reclaim work to do, it enters
a loop of shrinking slab and can consume a lot of CPU for a long period
of time until the highest zone becomes reclaimable.
There are two problems here. 1) If the returned classzone or order is
lower, it'll continue reclaiming without scheduling. 2) If the highest
zone was marked unreclaimable but balance_pgdat() returns immediately
at DEF_PRIORITY, the new lower classzone is not communicated back to
kswapd() for sleeping.
This patch does two related things. First, if end_zone is
unreclaimable, this information is communicated back. Second, if
the classzone or order was reduced due to failing to reclaim, new
information is not read from pgdat and instead an attempt is made to go
to sleep. Because of this, it is also necessary that pgdat->classzone_idx
be initialised each time to pgdat->nr_zones - 1 to avoid re-reads
being interpreted as wakeups.
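The two changes described above can be sketched in isolation. The first helper mirrors the end_zone adjustment added to balance_pgdat(). The second restates the patched decision at the top of the kswapd() loop as a pure function. The `toy_` types are hypothetical stand-ins for pg_data_t and kswapd()'s local variables, and orders are plain ints. The sleep path itself (kswapd_try_to_sleep() and the re-read of pgdat after waking) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Part 1: end_zone handling in balance_pgdat(). When the zone at
 * end_zone is all_unreclaimable, end_zone is lowered so the reduced
 * classzone can be reported back to kswapd().
 */
static int toy_end_zone_after_pass(const bool *all_unreclaimable,
				   int end_zone)
{
	int i;

	for (i = 0; i <= end_zone; i++) {
		if (all_unreclaimable[i]) {
			if (end_zone && end_zone == i)
				end_zone--;
			continue;
		}
		/* ... reclaim and watermark checks would happen here ... */
	}
	return end_zone;
}

/* Hypothetical subset of pg_data_t used by the decision below. */
struct toy_pgdat {
	int kswapd_max_order;
	int classzone_idx;
	int nr_zones;
};

/* Values kept across iterations of the kswapd() loop. */
struct toy_kswapd {
	int order, new_order;
	int classzone_idx, new_classzone_idx;
};

/*
 * Part 2: the patched loop head in kswapd(). Returns true if kswapd
 * would go on to kswapd_try_to_sleep(), false if it would re-enter
 * balance_pgdat() straight away with the new request.
 */
static bool toy_kswapd_should_try_sleep(struct toy_kswapd *k,
					struct toy_pgdat *pgdat)
{
	/* Only read pgdat if the last balance was not cut short. */
	if (k->classzone_idx >= k->new_classzone_idx &&
	    k->order == k->new_order) {
		k->new_order = pgdat->kswapd_max_order;
		k->new_classzone_idx = pgdat->classzone_idx;
		pgdat->kswapd_max_order = 0;
		pgdat->classzone_idx = pgdat->nr_zones - 1;
	}

	if (k->order < k->new_order ||
	    k->classzone_idx > k->new_classzone_idx) {
		/* Larger order or tighter zone constraint requested. */
		k->order = k->new_order;
		k->classzone_idx = k->new_classzone_idx;
		return false;
	}
	return true;
}
```

When a balance was cut short (classzone_idx dropped from 2 to 1), pgdat is not re-read and kswapd heads for sleep. When a balance succeeded and a request with a tighter zone constraint is pending, kswapd picks it up and reclaims again.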
Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 34 +++++++++++++++++++++-------------
1 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d859111..9297195 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2448,7 +2448,6 @@ loop_again:
if (!zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone), 0, 0)) {
end_zone = i;
- *classzone_idx = i;
break;
}
}
@@ -2528,8 +2527,11 @@ loop_again:
total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
sc.may_writepage = 1;
- if (zone->all_unreclaimable)
+ if (zone->all_unreclaimable) {
+ if (end_zone && end_zone == i)
+ end_zone--;
continue;
+ }
if (!zone_watermark_ok_safe(zone, order,
high_wmark_pages(zone), end_zone, 0)) {
@@ -2709,8 +2711,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
*/
static int kswapd(void *p)
{
- unsigned long order;
- int classzone_idx;
+ unsigned long order, new_order;
+ int classzone_idx, new_classzone_idx;
pg_data_t *pgdat = (pg_data_t*)p;
struct task_struct *tsk = current;
@@ -2740,17 +2742,23 @@ static int kswapd(void *p)
tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
set_freezable();
- order = 0;
- classzone_idx = MAX_NR_ZONES - 1;
+ order = new_order = 0;
+ classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
for ( ; ; ) {
- unsigned long new_order;
- int new_classzone_idx;
int ret;
- new_order = pgdat->kswapd_max_order;
- new_classzone_idx = pgdat->classzone_idx;
- pgdat->kswapd_max_order = 0;
- pgdat->classzone_idx = MAX_NR_ZONES - 1;
+ /*
+ * If the last balance_pgdat was unsuccessful it's unlikely a
+ * new request of a similar or harder type will succeed soon
+ * so consider going to sleep on the basis we reclaimed at
+ */
+ if (classzone_idx >= new_classzone_idx && order == new_order) {
+ new_order = pgdat->kswapd_max_order;
+ new_classzone_idx = pgdat->classzone_idx;
+ pgdat->kswapd_max_order = 0;
+ pgdat->classzone_idx = pgdat->nr_zones - 1;
+ }
+
if (order < new_order || classzone_idx > new_classzone_idx) {
/*
* Don't sleep if someone wants a larger 'order'
@@ -2763,7 +2771,7 @@ static int kswapd(void *p)
order = pgdat->kswapd_max_order;
classzone_idx = pgdat->classzone_idx;
pgdat->kswapd_max_order = 0;
- pgdat->classzone_idx = MAX_NR_ZONES - 1;
+ pgdat->classzone_idx = pgdat->nr_zones - 1;
}
ret = try_to_freeze();
* Re: [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone
2011-06-24 13:43 ` [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone Mel Gorman
@ 2011-06-24 13:59 ` Mel Gorman
0 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2011-06-24 13:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel
On Fri, Jun 24, 2011 at 02:43:16PM +0100, Mel Gorman wrote:
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour.
>
Bah, I accidentally exported a branch with a build error in this
patch. Will resend shortly.
--
Mel Gorman
SUSE Labs
* [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
@ 2011-06-24 14:44 Mel Gorman
2011-06-25 14:23 ` Andrew Lutomirski
2011-07-21 15:37 ` Minchan Kim
0 siblings, 2 replies; 16+ messages in thread
From: Mel Gorman @ 2011-06-24 14:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel, Mel Gorman
(Built this time and passed a basic sniff-test.)
During allocator-intensive workloads, kswapd will be woken frequently,
causing free memory to oscillate between the high and min watermarks.
This is expected behaviour. Unfortunately, if the highest zone is
small, a problem occurs.
This seems to happen most with recent Sandy Bridge laptops, but that is
probably a coincidence: some of these laptops just happen to have
a small Normal zone. The reproduction case is almost always copying
large files, during which kswapd pegs at 100% CPU until the file is
deleted or the cache is dropped.
The problem is mostly down to sleeping_prematurely() keeping kswapd
awake when the highest zone is small and unreclaimable. It is compounded
by the fact that we shrink slabs even when not shrinking zones, causing
a lot of time to be spent in shrinkers and a lot of memory to be
reclaimed.
Patch 1 corrects sleeping_prematurely() to check only the zones up to
and including classzone_idx instead of all zones.
Patch 2 avoids shrinking slab when we are not shrinking a zone.
Patch 3 notes that sleeping_prematurely() is checking lower zones against
a high classzone, which is not what allocators or balance_pgdat()
do, leading to an artificial belief that kswapd should still be
awake.
Patch 4 notes that when balance_pgdat() gives up on a high zone, the
decision is not communicated to sleeping_prematurely().
This problem affects 2.6.38.8 for certain and is expected to affect
2.6.39 and 3.0-rc4 as well. If accepted, the patches need to go to
-stable to be picked up by distros. This series is against 3.0-rc4.
I've cc'd people who reported similar problems recently to see if they
still suffer from the problem and if this fixes it.
mm/vmscan.c | 59 +++++++++++++++++++++++++++++++++++------------------------
1 files changed, 35 insertions(+), 24 deletions(-)
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-06-24 14:44 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
@ 2011-06-25 14:23 ` Andrew Lutomirski
2011-07-21 15:37 ` Minchan Kim
1 sibling, 0 replies; 16+ messages in thread
From: Andrew Lutomirski @ 2011-06-25 14:23 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Pádraig Brady, James Bottomley, Colin King,
Minchan Kim, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel
On Fri, Jun 24, 2011 at 8:44 AM, Mel Gorman <mgorman@suse.de> wrote:
> (Built this time and passed a basic sniff-test.)
>
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour. Unfortunately, if the highest zone is
> small, a problem occurs.
>
[...]
I've been running these for a couple of days with no problems, although I
haven't been trying to reproduce the problem. (Well, no problems
related to memory management.)
I suspect that my pet unnecessary-OOM-kill bug is still around, but
that's probably not related, especially since I can trigger it if I
stick 8 GB of RAM in this laptop.
Thanks,
Andy
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-06-24 14:44 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
2011-06-25 14:23 ` Andrew Lutomirski
@ 2011-07-21 15:37 ` Minchan Kim
2011-07-21 16:09 ` Mel Gorman
1 sibling, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2011-07-21 15:37 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Pádraig Brady, James Bottomley, Colin King,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel
On Fri, Jun 24, 2011 at 03:44:53PM +0100, Mel Gorman wrote:
> (Built this time and passed a basic sniff-test.)
>
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour. Unfortunately, if the highest zone is
> small, a problem occurs.
>
> [...]
>
Good!
This patch solved the problem.
But there is still a mystery.
In the log, we could see excessive shrink_slab calls.
And as you know, we merged a patch which adds cond_resched() at the end of
shrink_slab(). So other tasks should get the CPU and we should not see
kswapd at 100% CPU, I think.
Do you have any idea about this?
--
Kind regards,
Minchan Kim
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-07-21 15:37 ` Minchan Kim
@ 2011-07-21 16:09 ` Mel Gorman
2011-07-21 16:24 ` Minchan Kim
0 siblings, 1 reply; 16+ messages in thread
From: Mel Gorman @ 2011-07-21 16:09 UTC (permalink / raw)
To: Minchan Kim
Cc: Andrew Morton, Pádraig Brady, James Bottomley, Colin King,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel
On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
> On Fri, Jun 24, 2011 at 03:44:53PM +0100, Mel Gorman wrote:
> > (Built this time and passed a basic sniff-test.)
> >
> > During allocator-intensive workloads, kswapd will be woken frequently
> > causing free memory to oscillate between the high and min watermark.
> > This is expected behaviour. Unfortunately, if the highest zone is
> > small, a problem occurs.
> >
> > [...]
> >
>
> Good!
> This patch solved the problem.
> But there is still a mystery.
>
> In the log, we could see excessive shrink_slab calls.
Yes, because shrink_slab() was called on each loop through
balance_pgdat() even if the zone was balanced.
> And as you know, we merged a patch which adds cond_resched() at the end of
> shrink_slab(). So other tasks should get the CPU and we should not see
> kswapd at 100% CPU, I think.
>
cond_resched() is not a substitute for going to sleep.
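A toy single-CPU model of the point above. It is hypothetical and deliberately not how CFS actually picks tasks. cond_resched() lets other runnable tasks have the CPU, but a kswapd that never sleeps stays on the runqueue and absorbs every tick nobody else wants, so it can still show up as a CPU hog. Here the other task is simply given strict preference and wants the CPU on every Nth tick.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cpu_usage {
	int kswapd_ticks;
	int other_ticks;
	int idle_ticks;
};

/*
 * Simulate `ticks` scheduler ticks on one CPU. The other task wants the
 * CPU on every other_busy_every-th tick and always wins; otherwise the
 * CPU goes to kswapd if kswapd is runnable, or idles. A kswapd that
 * only calls cond_resched() never leaves the runqueue; one that sleeps
 * is off it for the whole run.
 */
static struct toy_cpu_usage toy_simulate(int ticks, int other_busy_every,
					 bool kswapd_sleeps)
{
	struct toy_cpu_usage u = { 0, 0, 0 };
	int t;

	assert(other_busy_every > 0);
	for (t = 0; t < ticks; t++) {
		bool other_runnable = (t % other_busy_every) == 0;
		bool kswapd_runnable = !kswapd_sleeps;

		if (other_runnable)
			u.other_ticks++;
		else if (kswapd_runnable)
			u.kswapd_ticks++;
		else
			u.idle_ticks++;
	}
	return u;
}
```

With 100 ticks and the other task busy every 4th tick, the yielding kswapd is charged 75 ticks while the other task still gets all 25 it asked for. The sleeping kswapd is charged none. Other tasks do get the CPU, but kswapd still burns whatever is left over.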
--
Mel Gorman
SUSE Labs
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-07-21 16:09 ` Mel Gorman
@ 2011-07-21 16:24 ` Minchan Kim
2011-07-21 16:36 ` Andrew Lutomirski
0 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2011-07-21 16:24 UTC (permalink / raw)
To: Mel Gorman
Cc: Andrew Morton, Pádraig Brady, James Bottomley, Colin King,
Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
linux-kernel
On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
> > On Fri, Jun 24, 2011 at 03:44:53PM +0100, Mel Gorman wrote:
> > > (Built this time and passed a basic sniff-test.)
> > >
> > > During allocator-intensive workloads, kswapd will be woken frequently
> > > causing free memory to oscillate between the high and min watermark.
> > > This is expected behaviour. Unfortunately, if the highest zone is
> > > small, a problem occurs.
> > >
> > > [...]
> > >
> >
> > Good!
> > This patch solved the problem.
> > But there is still a mystery.
> >
> > In the log, we could see excessive shrink_slab calls.
>
> Yes, because shrink_slab() was called on each loop through
> balance_pgdat() even if the zone was balanced.
>
>
> > And as you know, we merged a patch which adds cond_resched() at the end of
> > shrink_slab(). So other tasks should get the CPU and we should not see
> > kswapd at 100% CPU, I think.
> >
>
> cond_resched() is not a substitute for going to sleep.
Of course, it's not equivalent to sleeping, but other tasks should get the CPU and consume their time slices.
So we should never see kswapd consuming 100% CPU.
No?
--
Kind regards,
Minchan Kim
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-07-21 16:24 ` Minchan Kim
@ 2011-07-21 16:36 ` Andrew Lutomirski
2011-07-21 16:42 ` Minchan Kim
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Lutomirski @ 2011-07-21 16:36 UTC (permalink / raw)
To: Minchan Kim
Cc: Mel Gorman, Andrew Morton, Pádraig Brady, James Bottomley,
Colin King, Rik van Riel, Johannes Weiner, linux-mm, linux-kernel
On Thu, Jul 21, 2011 at 12:24 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
>> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
>> > Good!
>> > This patch solved the problem.
>> > But there is still a mystery.
>> >
>> > In log, we could see excessive shrink_slab calls.
>>
>> Yes, because shrink_slab() was called on each loop through
>> balance_pgdat() even if the zone was balanced.
>>
>>
>> > And as you know, we had merged patch which adds cond_resched where last of the function
>> > in shrink_slab. So other task should get the CPU and we should not see
>> > 100% CPU of kswapd, I think.
>> >
>>
>> cond_resched() is not a substitute for going to sleep.
>
> Of course, it's not equal with sleep but other task should get CPU and consume their time slice
> So we should never see 100% CPU consumption of kswapd.
> No?
If the rest of the system is idle, then kswapd will happily use 100%
CPU. (Or on a multi-core system, kswapd will use close to 100% of one
CPU even if another task is using the other one. This is bad enough
on a desktop, but on a laptop you start to notice when your battery
dies.)
--Andy
>
>>
>> --
>> Mel Gorman
>> SUSE Labs
>
> --
> Kind regards,
> Minchan Kim
>
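
A rough userspace analogy (not kernel code) for Andy's point that yielding is not the same as sleeping: sched_yield() stands in for cond_resched(), and a 1 ms nanosleep() stands in for kswapd really going back to sleep. Both yielding calls return almost immediately when nothing else is runnable, so on an idle CPU the yielding loop keeps burning cycles. The helper names clock_ms() and cpu_ms_for_loop() are hypothetical, and this does not model kswapd itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <time.h>

/* Read a clock and return its value in milliseconds. */
static double clock_ms(clockid_t clock)
{
	struct timespec ts;

	clock_gettime(clock, &ts);
	return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/*
 * Loop for wall_ms milliseconds of wall-clock time and return the CPU
 * time this thread consumed, in milliseconds.
 *
 * sleep_between == false: call sched_yield() each iteration. Like
 *   cond_resched(), this lets other runnable tasks in but returns
 *   straight away when nothing else wants the CPU.
 * sleep_between == true: sleep 1 ms each iteration, which is closer to
 *   kswapd actually going back to sleep.
 */
static double cpu_ms_for_loop(double wall_ms, bool sleep_between)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
	double wall_start = clock_ms(CLOCK_MONOTONIC);
	double cpu_start = clock_ms(CLOCK_THREAD_CPUTIME_ID);

	while (clock_ms(CLOCK_MONOTONIC) - wall_start < wall_ms) {
		if (sleep_between)
			nanosleep(&one_ms, NULL);
		else
			sched_yield();
	}
	return clock_ms(CLOCK_THREAD_CPUTIME_ID) - cpu_start;
}
```

On an otherwise idle machine, cpu_ms_for_loop(100.0, false) should report CPU time close to the 100 ms wall-clock budget, while cpu_ms_for_loop(100.0, true) should report only a small fraction of it; exact numbers depend on the machine and its load.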
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-07-21 16:36 ` Andrew Lutomirski
@ 2011-07-21 16:42 ` Minchan Kim
2011-07-21 16:58 ` Andrew Lutomirski
0 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2011-07-21 16:42 UTC (permalink / raw)
To: Andrew Lutomirski
Cc: Mel Gorman, Andrew Morton, Pádraig Brady, James Bottomley,
Colin King, Rik van Riel, Johannes Weiner, linux-mm, linux-kernel
On Thu, Jul 21, 2011 at 12:36:11PM -0400, Andrew Lutomirski wrote:
> On Thu, Jul 21, 2011 at 12:24 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> > On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
> >> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
> >> > Good!
> >> > This patch solved the problem.
> >> > But there is still a mystery.
> >> >
> >> > In log, we could see excessive shrink_slab calls.
> >>
> >> Yes, because shrink_slab() was called on each loop through
> >> balance_pgdat() even if the zone was balanced.
> >>
> >>
> >> > And as you know, we had merged patch which adds cond_resched where last of the function
> >> > in shrink_slab. So other task should get the CPU and we should not see
> >> > 100% CPU of kswapd, I think.
> >> >
> >>
> >> cond_resched() is not a substitute for going to sleep.
> >
> > Of course, it's not equal with sleep but other task should get CPU and consume their time slice
> > So we should never see 100% CPU consumption of kswapd.
> > No?
>
> If the rest of the system is idle, then kswapd will happily use 100%
> CPU. (Or on a multi-core system, kswapd will use close to 100% of one

Of course. But at least we have a test program running, so I don't think the system is idle.

> CPU even if another task is using the other one. This is bad enough
> on a desktop, but on a laptop you start to notice when your battery
> dies.)

Of course it's bad. :)
What I want to know is exactly what causes the 100% CPU usage.
It might not really be 100%; we might be using the word loosely.
>
> --Andy
>
> >
> >>
> >> --
> >> Mel Gorman
> >> SUSE Labs
> >
> > --
> > Kind regards,
> > Minchan Kim
> >
--
Kind regards,
Minchan Kim
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-07-21 16:42 ` Minchan Kim
@ 2011-07-21 16:58 ` Andrew Lutomirski
2011-07-22 0:30 ` Minchan Kim
0 siblings, 1 reply; 16+ messages in thread
From: Andrew Lutomirski @ 2011-07-21 16:58 UTC (permalink / raw)
To: Minchan Kim
Cc: Mel Gorman, Andrew Morton, Pádraig Brady, James Bottomley,
Colin King, Rik van Riel, Johannes Weiner, linux-mm, linux-kernel
On Thu, Jul 21, 2011 at 12:42 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Thu, Jul 21, 2011 at 12:36:11PM -0400, Andrew Lutomirski wrote:
>> On Thu, Jul 21, 2011 at 12:24 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>> > On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
>> >> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
>> >> > Good!
>> >> > This patch solved the problem.
>> >> > But there is still a mystery.
>> >> >
>> >> > In log, we could see excessive shrink_slab calls.
>> >>
>> >> Yes, because shrink_slab() was called on each loop through
>> >> balance_pgdat() even if the zone was balanced.
>> >>
>> >>
>> >> > And as you know, we had merged patch which adds cond_resched where last of the function
>> >> > in shrink_slab. So other task should get the CPU and we should not see
>> >> > 100% CPU of kswapd, I think.
>> >> >
>> >>
>> >> cond_resched() is not a substitute for going to sleep.
>> >
>> > Of course, it's not equal with sleep but other task should get CPU and consume their time slice
>> > So we should never see 100% CPU consumption of kswapd.
>> > No?
>>
>> If the rest of the system is idle, then kswapd will happily use 100%
>> CPU. (Or on a multi-core system, kswapd will use close to 100% of one
>
> Of course. But at least, we have a test program and I think it's not idle.
The test program I used was 'top', which is pretty close to idle.
>
>> CPU even if another task is using the other one. This is bad enough
>> on a desktop, but on a laptop you start to notice when your battery
>> dies.)
>
> Of course it's bad. :)
> What I want to know is just what's exact cause of 100% CPU usage.
> It might be not 100% but we might use the word sloppily.
>
Well, if you want to be pedantic, my laptop can, in theory, demonstrate
true 100% CPU usage. Trigger the bug, suspend every other thread, and
listen to the laptop fan spin and feel the laptop get hot. (The fan
is controlled by the EC and takes no CPU.)
In practice, the usage was close enough to 100% that it got rounded.
The cond_resched was enough to at least make the system responsive
instead of the hard freeze I used to get.
--Andy
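
Since the question is how close to a literal 100% kswapd really got, here is a minimal sketch of roughly how a tool like top derives a per-task %CPU figure: the change in utime + stime over a sampling interval, assuming the Linux /proc/<pid>/stat layout documented in proc(5). stat_cpu_ticks() and cpu_percent() are hypothetical helper names; top's own implementation is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return utime + stime (fields 14 and 15 of /proc/<pid>/stat, see
 * proc(5)) in clock ticks, or -1 if the line cannot be parsed.
 */
static long long stat_cpu_ticks(const char *line)
{
	/* Field 2 (comm) is parenthesised and may contain spaces or ')',
	 * so start parsing after the last ')'. */
	const char *p = strrchr(line, ')');
	unsigned long long utime, stime;

	if (p == NULL)
		return -1;
	/* Skip fields 3..13: state, ppid, pgrp, session, tty_nr, tpgid,
	 * flags, minflt, cminflt, majflt, cmajflt. */
	if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %llu %llu",
		   &utime, &stime) != 2)
		return -1;
	return (long long)(utime + stime);
}

/* Percentage of one CPU used between two samples taken wall_seconds apart. */
static double cpu_percent(long long ticks_before, long long ticks_after,
			  double wall_seconds, long clk_tck)
{
	return 100.0 * (double)(ticks_after - ticks_before) /
	       (double)clk_tck / wall_seconds;
}
```

In real use, clk_tck would come from sysconf(_SC_CLK_TCK) and the two samples from reading the kswapd thread's /proc/<pid>/stat some interval apart. For example, 99 ticks consumed over 1.0 s with clk_tck = 100 gives 99%; a display that rounds to whole percent would show 99.6% as 100%.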
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-07-21 16:58 ` Andrew Lutomirski
@ 2011-07-22 0:30 ` Minchan Kim
2011-07-22 13:21 ` Andrew Lutomirski
0 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2011-07-22 0:30 UTC (permalink / raw)
To: Andrew Lutomirski
Cc: Mel Gorman, Andrew Morton, Pádraig Brady, James Bottomley,
Colin King, Rik van Riel, Johannes Weiner, linux-mm, linux-kernel
On Fri, Jul 22, 2011 at 1:58 AM, Andrew Lutomirski <luto@mit.edu> wrote:
> On Thu, Jul 21, 2011 at 12:42 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>> On Thu, Jul 21, 2011 at 12:36:11PM -0400, Andrew Lutomirski wrote:
>>> On Thu, Jul 21, 2011 at 12:24 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>> > On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
>>> >> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
>>> >> > Good!
>>> >> > This patch solved the problem.
>>> >> > But there is still a mystery.
>>> >> >
>>> >> > In log, we could see excessive shrink_slab calls.
>>> >>
>>> >> Yes, because shrink_slab() was called on each loop through
>>> >> balance_pgdat() even if the zone was balanced.
>>> >>
>>> >>
>>> >> > And as you know, we had merged patch which adds cond_resched where last of the function
>>> >> > in shrink_slab. So other task should get the CPU and we should not see
>>> >> > 100% CPU of kswapd, I think.
>>> >> >
>>> >>
>>> >> cond_resched() is not a substitute for going to sleep.
>>> >
>>> > Of course, it's not equal with sleep but other task should get CPU and consume their time slice
>>> > So we should never see 100% CPU consumption of kswapd.
>>> > No?
>>>
>>> If the rest of the system is idle, then kswapd will happily use 100%
>>> CPU. (Or on a multi-core system, kswapd will use close to 100% of one
>>
>> Of course. But at least, we have a test program and I think it's not idle.
>
> The test program I used was 'top', which is pretty close to idle.
>
>>
>>> CPU even if another task is using the other one. This is bad enough
>>> on a desktop, but on a laptop you start to notice when your battery
>>> dies.)
>>
>> Of course it's bad. :)
>> What I want to know is just what's exact cause of 100% CPU usage.
>> It might be not 100% but we might use the word sloppily.
>>
>
> Well, if you want to be pedantic, my laptop can, in theory, demonstrate
> true 100% CPU usage. Trigger the bug, suspend every other thread, and
> listen to the laptop fan spin and feel the laptop get hot. (The fan
> is controlled by the EC and takes no CPU.)
>
> In practice, the usage was close enough to 100% that it got rounded.
>
> The cond_resched was enough to at least make the system responsive
> instead of the hard freeze I used to get.
I don't want to be pedantic. :)
When I heard "100% CPU usage", I assumed kswapd never yielded the
CPU and just spun on it. But from your example (i.e., cond_resched
makes the system responsive), that's not the case: kswapd was just
using most of the CPU time, not literally 100%. It seems I was too
fixated on the word; sorry about that.
--
Kind regards,
Minchan Kim
* Re: [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
2011-07-22 0:30 ` Minchan Kim
@ 2011-07-22 13:21 ` Andrew Lutomirski
0 siblings, 0 replies; 16+ messages in thread
From: Andrew Lutomirski @ 2011-07-22 13:21 UTC (permalink / raw)
To: Minchan Kim
Cc: Mel Gorman, Andrew Morton, Pádraig Brady, James Bottomley,
Colin King, Rik van Riel, Johannes Weiner, linux-mm, linux-kernel
On Thu, Jul 21, 2011 at 8:30 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
> On Fri, Jul 22, 2011 at 1:58 AM, Andrew Lutomirski <luto@mit.edu> wrote:
>> On Thu, Jul 21, 2011 at 12:42 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>> On Thu, Jul 21, 2011 at 12:36:11PM -0400, Andrew Lutomirski wrote:
>>>> On Thu, Jul 21, 2011 at 12:24 PM, Minchan Kim <minchan.kim@gmail.com> wrote:
>>>> > On Thu, Jul 21, 2011 at 05:09:59PM +0100, Mel Gorman wrote:
>>>> >> On Fri, Jul 22, 2011 at 12:37:22AM +0900, Minchan Kim wrote:
>>>> >> > Good!
>>>> >> > This patch solved the problem.
>>>> >> > But there is still a mystery.
>>>> >> >
>>>> >> > In log, we could see excessive shrink_slab calls.
>>>> >>
>>>> >> Yes, because shrink_slab() was called on each loop through
>>>> >> balance_pgdat() even if the zone was balanced.
>>>> >>
>>>> >>
>>>> >> > And as you know, we had merged patch which adds cond_resched where last of the function
>>>> >> > in shrink_slab. So other task should get the CPU and we should not see
>>>> >> > 100% CPU of kswapd, I think.
>>>> >> >
>>>> >>
>>>> >> cond_resched() is not a substitute for going to sleep.
>>>> >
>>>> > Of course, it's not equal with sleep but other task should get CPU and consume their time slice
>>>> > So we should never see 100% CPU consumption of kswapd.
>>>> > No?
>>>>
>>>> If the rest of the system is idle, then kswapd will happily use 100%
>>>> CPU. (Or on a multi-core system, kswapd will use close to 100% of one
>>>
>>> Of course. But at least, we have a test program and I think it's not idle.
>>
>> The test program I used was 'top', which is pretty close to idle.
>>
>>>
>>>> CPU even if another task is using the other one. This is bad enough
>>>> on a desktop, but on a laptop you start to notice when your battery
>>>> dies.)
>>>
>>> Of course it's bad. :)
>>> What I want to know is just what's exact cause of 100% CPU usage.
>>> It might be not 100% but we might use the word sloppily.
>>>
>>
>> Well, if you want to be pedantic, my laptop can, in theory, demonstrate
>> true 100% CPU usage. Trigger the bug, suspend every other thread, and
>> listen to the laptop fan spin and feel the laptop get hot. (The fan
>> is controlled by the EC and takes no CPU.)
>>
>> In practice, the usage was close enough to 100% that it got rounded.
>>
>> The cond_resched was enough to at least make the system responsive
>> instead of the hard freeze I used to get.
>
> I don't want to be pedantic. :)
> What I have a thought about 100% CPU usage was that it doesn't yield
> CPU and spins on the CPU but as I heard your example(ie, cond_resched
> makes the system responsive), it's not the case. It was just to use
> most of time in kswapd, not 100%. It seems I was paranoid about the
> word, sorry for that.
Ah, sorry. I must have been unclear in my original email.
In 2.6.39, it made my system unresponsive. With your cond_resched and
pgdat_balanced fixes, it just made kswapd eat all available CPU, but
the system still worked.
--Andy
Thread overview: 16+ messages
2011-06-24 13:43 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
2011-06-24 13:43 ` [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely Mel Gorman
2011-06-24 13:43 ` [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone Mel Gorman
2011-06-24 13:59 ` Mel Gorman
2011-06-24 13:43 ` [PATCH 3/4] mm: vmscan: Evaluate the watermarks against the correct classzone Mel Gorman
2011-06-24 13:43 ` [PATCH 4/4] mm: vmscan: Only read new_classzone_idx from pgdat when reclaiming successfully Mel Gorman
-- strict thread matches above, loose matches on Subject: below --
2011-06-24 14:44 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
2011-06-25 14:23 ` Andrew Lutomirski
2011-07-21 15:37 ` Minchan Kim
2011-07-21 16:09 ` Mel Gorman
2011-07-21 16:24 ` Minchan Kim
2011-07-21 16:36 ` Andrew Lutomirski
2011-07-21 16:42 ` Minchan Kim
2011-07-21 16:58 ` Andrew Lutomirski
2011-07-22 0:30 ` Minchan Kim
2011-07-22 13:21 ` Andrew Lutomirski