* [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
From: Mel Gorman @ 2011-06-24 13:43 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel, Mel Gorman

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.
This is expected behaviour. Unfortunately, if the highest zone is
small, a problem occurs.

This seems to happen most with recent sandybridge laptops but it's
probably a coincidence as some of these laptops just happen to have a
small Normal zone. The reproduction case is almost always copying
large files, during which kswapd pegs at 100% CPU until the file is
deleted or cache is dropped.

The problem is mostly down to sleeping_prematurely() keeping kswapd
awake when the highest zone is small and unreclaimable. It is
compounded by the fact we shrink slabs even when not shrinking zones,
causing a lot of time to be spent in shrinkers and a lot of memory to
be reclaimed.

Patch 1 corrects sleeping_prematurely to check the zones matching
	the classzone_idx instead of all zones.

Patch 2 avoids shrinking slab when we are not shrinking a zone.

Patch 3 notes that sleeping_prematurely is checking lower zones against
	a high classzone which is not what allocators or balance_pgdat()
	is doing, leading to an artificial belief that kswapd should be
	still awake.

Patch 4 notes that when balance_pgdat() gives up on a high zone that the
	decision is not communicated to sleeping_prematurely().

This problem affects 3.0-rc4 and 2.6.38.8 for certain and is expected
to affect 2.6.39 as well. If accepted, they need to go to -stable to
be picked up by distros. This series is against 3.0-rc4. I've cc'd
people that reported similar problems recently to see if they still
suffer from the problem and if this fixes it.

 mm/vmscan.c |   57 ++++++++++++++++++++++++++++++++++-----------------------
 1 files changed, 34 insertions(+), 23 deletions(-)

-- 
1.7.3.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
From: Mel Gorman @ 2011-06-24 13:43 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel, Mel Gorman

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.
This is expected behaviour.

A problem occurs if the highest zone is small.  balance_pgdat()
only considers unreclaimable zones when priority is DEF_PRIORITY
but sleeping_prematurely considers all zones. It's possible for this
sequence to occur

  1. kswapd wakes up and enters balance_pgdat()
  2. At DEF_PRIORITY, marks highest zone unreclaimable
  3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
  4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
     highest zone, clearing all_unreclaimable. Highest zone
     is still unbalanced
  5. kswapd returns and calls sleeping_prematurely
  6. sleeping_prematurely looks at *all* zones, not just the ones
     being considered by balance_pgdat. The highest small zone
     has all_unreclaimable cleared but the zone is not
     balanced. all_zones_ok is false so kswapd stays awake

This patch corrects the behaviour of sleeping_prematurely to check
the zones balance_pgdat() checked.

Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ff834e..841e3bf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2323,7 +2323,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 		return true;
 
 	/* Check the watermark levels */
-	for (i = 0; i < pgdat->nr_zones; i++) {
+	for (i = 0; i <= classzone_idx; i++) {
 		struct zone *zone = pgdat->node_zones + i;
 
 		if (!populated_zone(zone))
-- 
1.7.3.4

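The effect of the loop-bound change in patch 1 can be sketched outside
the kernel. What follows is a toy model under stated assumptions, not
the code in mm/vmscan.c: struct sketch_zone and sketch_zones_ok() are
hypothetical names, the watermark test is reduced to a single
comparison, and zones flagged all_unreclaimable are simply skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct zone. */
struct sketch_zone {
	bool populated;            /* zone has any pages at all */
	bool all_unreclaimable;    /* reclaim has given up on this zone */
	unsigned long free_pages;  /* current free pages */
	unsigned long high_wmark;  /* high watermark, in pages */
};

/*
 * Return true if every populated, reclaimable zone in [0, last_idx]
 * is at or above its high watermark. Passing nr_zones - 1 models the
 * old loop bound; passing classzone_idx models the patched one.
 */
static bool sketch_zones_ok(const struct sketch_zone *zones, int last_idx)
{
	assert(zones != NULL && last_idx >= 0);

	for (int i = 0; i <= last_idx; i++) {
		const struct sketch_zone *z = &zones[i];

		if (!z->populated || z->all_unreclaimable)
			continue;
		if (z->free_pages < z->high_wmark)
			return false;
	}
	return true;
}
```

With three zones where only the small highest zone is below its
watermark, sketch_zones_ok(zones, 2) reports "not balanced" while
sketch_zones_ok(zones, 1) reports "balanced", which is the difference
between kswapd staying awake and being allowed to sleep in step 6.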
* [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone
From: Mel Gorman @ 2011-06-24 13:43 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel, Mel Gorman

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.
This is expected behaviour.

When kswapd applies pressure to zones during node balancing, it checks
if the zone is above a high+balance_gap threshold. If it is, it does
not apply pressure but it unconditionally shrinks slab on a global
basis which is excessive. In the event kswapd is being kept awake due
to a high small unreclaimable zone, it skips zone shrinking but still
calls shrink_slab().

Once pressure has been applied, the check for the zone being
unreclaimable is made before the check of whether all_unreclaimable
should be set. Missing the unreclaimable state this way can cause
has_under_min_watermark_zone to be set due to an unreclaimable zone,
preventing kswapd backing off on congestion_wait().

Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |   21 ++++++++++++---------
 1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 841e3bf..38665ec 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2509,16 +2509,16 @@ loop_again:
 					high_wmark_pages(zone) + balance_gap,
 					end_zone, 0))
 				shrink_zone(priority, zone, &sc);
-			reclaim_state->reclaimed_slab = 0;
-			nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
-			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
-			total_scanned += sc.nr_scanned;
 
-			if (zone->all_unreclaimable)
-				continue;
-			if (nr_slab == 0 &&
-			    !zone_reclaimable(zone))
-				zone->all_unreclaimable = 1;
+				reclaim_state->reclaimed_slab = 0;
+				nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
+				sc.nr_reclaimed += reclaim_state->reclaimed_slab;
+				total_scanned += sc.nr_scanned;
+
+				if (nr_slab == 0 && !zone_reclaimable(zone))
+					zone->all_unreclaimable = 1;
+			}
+
 			/*
 			 * If we've done a decent amount of scanning and
 			 * the reclaim ratio is low, start doing writepage
@@ -2528,6 +2528,9 @@ loop_again:
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
 				sc.may_writepage = 1;
 
+			if (zone->all_unreclaimable)
+				continue;
+
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), end_zone, 0)) {
 				all_zones_ok = 0;
-- 
1.7.3.4

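The intent of patch 2 (slab pressure only accompanies zone pressure)
can be shown with a small counting model. This is a sketch under
assumptions, not the kernel's control flow: struct pressure_counts,
zone_needs_pressure() and apply_pressure() are hypothetical names, the
lowmem reserve is ignored, and the "patched" flag merely selects
between the behaviour described before and after the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters standing in for calls to the two shrinkers. */
struct pressure_counts {
	unsigned int zone_shrinks;  /* models calls to shrink_zone() */
	unsigned int slab_shrinks;  /* models calls to shrink_slab() */
};

/* A zone needs pressure when it is not above high watermark + balance gap. */
static bool zone_needs_pressure(unsigned long free_pages,
				unsigned long high_wmark,
				unsigned long balance_gap)
{
	return free_pages <= high_wmark + balance_gap;
}

/*
 * Model one pass over a zone. Unpatched: slab is shrunk unconditionally.
 * Patched: slab is shrunk only when the zone itself is shrunk.
 */
static void apply_pressure(unsigned long free_pages,
			   unsigned long high_wmark,
			   unsigned long balance_gap,
			   bool patched,
			   struct pressure_counts *counts)
{
	assert(counts != NULL);

	bool shrink = zone_needs_pressure(free_pages, high_wmark, balance_gap);

	if (shrink)
		counts->zone_shrinks++;
	if (shrink || !patched)
		counts->slab_shrinks++;
}
```

For a zone comfortably above the threshold, the unpatched path still
bumps slab_shrinks on every pass while the patched path does nothing,
which is the wasted shrinker time the description refers to.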
* Re: [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone
From: Mel Gorman @ 2011-06-24 13:59 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel

On Fri, Jun 24, 2011 at 02:43:16PM +0100, Mel Gorman wrote:
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour.
>

Bah, I accidentally exported a branch with a build error in this
patch. Will resend shortly.

-- 
Mel Gorman
SUSE Labs

* [PATCH 3/4] mm: vmscan: Evaluate the watermarks against the correct classzone
From: Mel Gorman @ 2011-06-24 13:43 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel, Mel Gorman

When deciding if kswapd is sleeping prematurely, the classzone is
taken into account but this is different to what balance_pgdat() and
the allocator are doing. Specifically, the DMA zone will be checked
based on the classzone used when waking kswapd which could be for a
GFP_KERNEL or GFP_HIGHMEM request. The lowmem reserve limit kicks in,
the watermark is not met and kswapd thinks it's sleeping prematurely,
keeping kswapd awake in error.

Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 38665ec..d859111 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2341,7 +2341,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 		}
 
 		if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
-							classzone_idx, 0))
+							i, 0))
 			all_zones_ok = false;
 		else
 			balanced += zone->present_pages;
-- 
1.7.3.4

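Why the classzone argument matters in patch 3 can be illustrated with
the arithmetic of the watermark test. The sketch below assumes an
order-0 check only and uses hypothetical names (struct wm_zone,
wm_zone_ok(), WM_NR_ZONES); it keeps just the part of the comparison
where the lowmem reserve for the given classzone is added to the
watermark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WM_NR_ZONES 3  /* hypothetical: e.g. DMA, DMA32, Normal */

/* Hypothetical, simplified stand-in for the fields the check reads. */
struct wm_zone {
	unsigned long free_pages;
	unsigned long high_wmark;
	/* pages held back from allocations targeting each higher classzone */
	unsigned long lowmem_reserve[WM_NR_ZONES];
};

/*
 * Order-0 watermark test: the zone passes only if its free pages exceed
 * the watermark plus the reserve protecting it from classzone_idx.
 */
static bool wm_zone_ok(const struct wm_zone *z, int classzone_idx)
{
	assert(z != NULL);
	assert(classzone_idx >= 0 && classzone_idx < WM_NR_ZONES);

	return z->free_pages > z->high_wmark + z->lowmem_reserve[classzone_idx];
}
```

A low zone with 3000 free pages, a high watermark of 100 and a reserve
of 4000 pages against the highest classzone fails the test when checked
with classzone_idx = 2 but passes with its own index 0. Checking each
zone against its own index, as the patch does with "i", avoids the
false "unbalanced" result.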
* [PATCH 4/4] mm: vmscan: Only read new_classzone_idx from pgdat when reclaiming successfully
From: Mel Gorman @ 2011-06-24 13:43 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel, Mel Gorman

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.
This is expected behaviour. Unfortunately, if the highest zone is
small, a problem occurs.

When balance_pgdat() returns, it may be at a lower classzone_idx than
it started because the highest zone was unreclaimable. Before checking
if it should go to sleep though, it checks pgdat->classzone_idx which
when there is no other activity will be MAX_NR_ZONES-1. It interprets
this as it has been woken up while reclaiming, skips scheduling and
reclaims again. As there is no useful reclaim work to do, it enters
into a loop of shrinking slab consuming loads of CPU until the highest
zone becomes reclaimable for a long period of time.

There are two problems here. 1) If the returned classzone or order is
lower, it'll continue reclaiming without scheduling. 2) If the highest
zone was marked unreclaimable but balance_pgdat() returns immediately
at DEF_PRIORITY, the new lower classzone is not communicated back to
kswapd() for sleeping.

This patch does two things that are related. First, if the end_zone is
unreclaimable, this information is communicated back. Second, if the
classzone or order was reduced due to failing to reclaim, new
information is not read from pgdat and instead an attempt is made to go
to sleep. Due to this, it is also necessary that pgdat->classzone_idx
be initialised each time to pgdat->nr_zones - 1 to avoid re-reads
being interpreted as wakeups.

Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |   34 +++++++++++++++++++++-------------
 1 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d859111..9297195 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2448,7 +2448,6 @@ loop_again:
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
-				*classzone_idx = i;
 				break;
 			}
 		}
@@ -2528,8 +2527,11 @@ loop_again:
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
 				sc.may_writepage = 1;
 
-			if (zone->all_unreclaimable)
+			if (zone->all_unreclaimable) {
+				if (end_zone && end_zone == i)
+					end_zone--;
 				continue;
+			}
 
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), end_zone, 0)) {
@@ -2709,8 +2711,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
  */
 static int kswapd(void *p)
 {
-	unsigned long order;
-	int classzone_idx;
+	unsigned long order, new_order;
+	int classzone_idx, new_classzone_idx;
 	pg_data_t *pgdat = (pg_data_t*)p;
 	struct task_struct *tsk = current;
 
@@ -2740,17 +2742,23 @@ static int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
-	order = 0;
-	classzone_idx = MAX_NR_ZONES - 1;
+	order = new_order = 0;
+	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
 	for ( ; ; ) {
-		unsigned long new_order;
-		int new_classzone_idx;
 		int ret;
 
-		new_order = pgdat->kswapd_max_order;
-		new_classzone_idx = pgdat->classzone_idx;
-		pgdat->kswapd_max_order = 0;
-		pgdat->classzone_idx = MAX_NR_ZONES - 1;
+		/*
+		 * If the last balance_pgdat was unsuccessful it's unlikely a
+		 * new request of a similar or harder type will succeed soon
+		 * so consider going to sleep on the basis we reclaimed at
+		 */
+		if (classzone_idx >= new_classzone_idx && order == new_order) {
+			new_order = pgdat->kswapd_max_order;
+			new_classzone_idx = pgdat->classzone_idx;
+			pgdat->kswapd_max_order = 0;
+			pgdat->classzone_idx = pgdat->nr_zones - 1;
+		}
+
 		if (order < new_order || classzone_idx > new_classzone_idx) {
 			/*
 			 * Don't sleep if someone wants a larger 'order'
@@ -2763,7 +2771,7 @@ static int kswapd(void *p)
 			order = pgdat->kswapd_max_order;
 			classzone_idx = pgdat->classzone_idx;
 			pgdat->kswapd_max_order = 0;
-			pgdat->classzone_idx = MAX_NR_ZONES - 1;
+			pgdat->classzone_idx = pgdat->nr_zones - 1;
 		}
 
 		ret = try_to_freeze();
-- 
1.7.3.4

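The new guard in kswapd() from patch 4 can be isolated as a small
function to make the decision easier to follow. This is a sketch, not
the kernel loop: struct req_pgdat, struct reclaim_req and
read_new_request() are hypothetical names, and only the condition and
the four assignments are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the per-node fields kswapd() reads and resets. */
struct req_pgdat {
	int kswapd_max_order;  /* largest order requested by wakers */
	int classzone_idx;     /* classzone requested by wakers */
	int nr_zones;          /* number of zones on this node */
};

/* Hypothetical pair describing one reclaim request. */
struct reclaim_req {
	int order;
	int classzone_idx;
};

/*
 * Mirror of the patched guard: the pending request is read from pgdat only
 * if the last balance pass finished at the classzone and order that were
 * asked for. Returns true when the pending request was consumed.
 */
static bool read_new_request(struct req_pgdat *pgdat,
			     const struct reclaim_req *done,
			     struct reclaim_req *next)
{
	assert(pgdat != NULL && done != NULL && next != NULL);

	if (done->classzone_idx >= next->classzone_idx &&
	    done->order == next->order) {
		next->order = pgdat->kswapd_max_order;
		next->classzone_idx = pgdat->classzone_idx;
		pgdat->kswapd_max_order = 0;
		pgdat->classzone_idx = pgdat->nr_zones - 1;
		return true;
	}
	return false;
}
```

When the last pass fell back to a lower classzone than requested, the
function returns false and leaves pgdat untouched, so the caller goes on
to attempt sleep instead of treating the idle default as a new wakeup.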
* [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small
From: Mel Gorman @ 2011-06-24 14:44 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel, Mel Gorman

(Built this time and passed a basic sniff-test.)

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.
This is expected behaviour. Unfortunately, if the highest zone is
small, a problem occurs.

This seems to happen most with recent sandybridge laptops but it's
probably a coincidence as some of these laptops just happen to have a
small Normal zone. The reproduction case is almost always copying
large files, during which kswapd pegs at 100% CPU until the file is
deleted or cache is dropped.

The problem is mostly down to sleeping_prematurely() keeping kswapd
awake when the highest zone is small and unreclaimable. It is
compounded by the fact we shrink slabs even when not shrinking zones,
causing a lot of time to be spent in shrinkers and a lot of memory to
be reclaimed.

Patch 1 corrects sleeping_prematurely to check the zones matching
	the classzone_idx instead of all zones.

Patch 2 avoids shrinking slab when we are not shrinking a zone.

Patch 3 notes that sleeping_prematurely is checking lower zones against
	a high classzone which is not what allocators or balance_pgdat()
	is doing, leading to an artificial belief that kswapd should be
	still awake.

Patch 4 notes that when balance_pgdat() gives up on a high zone that the
	decision is not communicated to sleeping_prematurely().

This problem affects 2.6.38.8 for certain and is expected to affect
2.6.39 and 3.0-rc4 as well. If accepted, they need to go to -stable
to be picked up by distros and this series is against 3.0-rc4. I've
cc'd people that reported similar problems recently to see if they
still suffer from the problem and if this fixes it.

 mm/vmscan.c |   59 +++++++++++++++++++++++++++++++++++------------------------
 1 files changed, 35 insertions(+), 24 deletions(-)

-- 
1.7.3.4

* [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
From: Mel Gorman @ 2011-06-24 14:44 UTC
  To: Andrew Morton
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel, Mel Gorman

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark.
This is expected behaviour.

A problem occurs if the highest zone is small.  balance_pgdat()
only considers unreclaimable zones when priority is DEF_PRIORITY
but sleeping_prematurely considers all zones. It's possible for this
sequence to occur

  1. kswapd wakes up and enters balance_pgdat()
  2. At DEF_PRIORITY, marks highest zone unreclaimable
  3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
  4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
     highest zone, clearing all_unreclaimable. Highest zone
     is still unbalanced
  5. kswapd returns and calls sleeping_prematurely
  6. sleeping_prematurely looks at *all* zones, not just the ones
     being considered by balance_pgdat. The highest small zone
     has all_unreclaimable cleared but the zone is not
     balanced. all_zones_ok is false so kswapd stays awake

This patch corrects the behaviour of sleeping_prematurely to check
the zones balance_pgdat() checked.

Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ff834e..841e3bf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2323,7 +2323,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 		return true;
 
 	/* Check the watermark levels */
-	for (i = 0; i < pgdat->nr_zones; i++) {
+	for (i = 0; i <= classzone_idx; i++) {
 		struct zone *zone = pgdat->node_zones + i;
 
 		if (!populated_zone(zone))
-- 
1.7.3.4

* Re: [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
From: Rik van Riel @ 2011-06-25 21:33 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Pádraig Brady, James Bottomley, Colin King,
	Minchan Kim, Andrew Lutomirski, Johannes Weiner, linux-mm,
	linux-kernel

On 06/24/2011 10:44 AM, Mel Gorman wrote:
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour.
>
> A problem occurs if the highest zone is small.  balance_pgdat()
> only considers unreclaimable zones when priority is DEF_PRIORITY
> but sleeping_prematurely considers all zones. It's possible for this
> sequence to occur
>
>    1. kswapd wakes up and enters balance_pgdat()
>    2. At DEF_PRIORITY, marks highest zone unreclaimable
>    3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
>    4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
>       highest zone, clearing all_unreclaimable. Highest zone
>       is still unbalanced
>    5. kswapd returns and calls sleeping_prematurely
>    6. sleeping_prematurely looks at *all* zones, not just the ones
>       being considered by balance_pgdat. The highest small zone
>       has all_unreclaimable cleared but the zone is not
>       balanced. all_zones_ok is false so kswapd stays awake
>
> This patch corrects the behaviour of sleeping_prematurely to check
> the zones balance_pgdat() checked.
>
> Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

* Re: [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
From: Minchan Kim @ 2011-06-27 6:10 UTC
  To: Mel Gorman
  Cc: Andrew Morton, Pádraig Brady, James Bottomley, Colin King,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel

On Fri, Jun 24, 2011 at 11:44 PM, Mel Gorman <mgorman@suse.de> wrote:
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour.
>
> A problem occurs if the highest zone is small.  balance_pgdat()
> only considers unreclaimable zones when priority is DEF_PRIORITY
> but sleeping_prematurely considers all zones. It's possible for this
> sequence to occur
>
>  1. kswapd wakes up and enters balance_pgdat()
>  2. At DEF_PRIORITY, marks highest zone unreclaimable
>  3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
>  4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
>     highest zone, clearing all_unreclaimable. Highest zone
>     is still unbalanced
>  5. kswapd returns and calls sleeping_prematurely
>  6. sleeping_prematurely looks at *all* zones, not just the ones
>     being considered by balance_pgdat. The highest small zone
>     has all_unreclaimable cleared but the zone is not
>     balanced. all_zones_ok is false so kswapd stays awake
>
> This patch corrects the behaviour of sleeping_prematurely to check
> the zones balance_pgdat() checked.
>
> Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

-- 
Kind regards,
Minchan Kim

* Re: [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
From: Andrew Morton @ 2011-06-28 21:49 UTC
  To: Mel Gorman
  Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel

On Fri, 24 Jun 2011 15:44:54 +0100
Mel Gorman <mgorman@suse.de> wrote:

> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour.
>
> A problem occurs if the highest zone is small.  balance_pgdat()
> only considers unreclaimable zones when priority is DEF_PRIORITY
> but sleeping_prematurely considers all zones. It's possible for this
> sequence to occur
>
>   1. kswapd wakes up and enters balance_pgdat()
>   2. At DEF_PRIORITY, marks highest zone unreclaimable
>   3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
>   4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
>      highest zone, clearing all_unreclaimable. Highest zone
>      is still unbalanced
>   5. kswapd returns and calls sleeping_prematurely
>   6. sleeping_prematurely looks at *all* zones, not just the ones
>      being considered by balance_pgdat. The highest small zone
>      has all_unreclaimable cleared but the zone is not
>      balanced. all_zones_ok is false so kswapd stays awake
>
> This patch corrects the behaviour of sleeping_prematurely to check
> the zones balance_pgdat() checked.

But kswapd is making progress: it's reclaiming slab.  Eventually that
won't work any more and all_unreclaimable will not be cleared and the
condition will fix itself up?

btw,

	if (!sleeping_prematurely(...))
		sleep();

hurts my brain.  My brain would prefer

	if (kswapd_should_sleep(...))
		sleep();

no?

> Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>

But what were the before-and-after observations?  I don't understand
how this can cause a permanent cpuchew by kswapd.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2323,7 +2323,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>  		return true;
>
>  	/* Check the watermark levels */
> -	for (i = 0; i < pgdat->nr_zones; i++) {
> +	for (i = 0; i <= classzone_idx; i++) {
>  		struct zone *zone = pgdat->node_zones + i;
>
>  		if (!populated_zone(zone))

The patch looks sensible.

* Re: [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
From: Pádraig Brady @ 2011-06-29 10:57 UTC
  To: Andrew Morton
  Cc: Mel Gorman, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel

On 28/06/11 22:49, Andrew Morton wrote:
> On Fri, 24 Jun 2011 15:44:54 +0100
> Mel Gorman <mgorman@suse.de> wrote:
>
>> During allocator-intensive workloads, kswapd will be woken frequently
>> causing free memory to oscillate between the high and min watermark.
>> This is expected behaviour.
>>
>> A problem occurs if the highest zone is small.  balance_pgdat()
>> only considers unreclaimable zones when priority is DEF_PRIORITY
>> but sleeping_prematurely considers all zones. It's possible for this
>> sequence to occur
>>
>>   1. kswapd wakes up and enters balance_pgdat()
>>   2. At DEF_PRIORITY, marks highest zone unreclaimable
>>   3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
>>   4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
>>      highest zone, clearing all_unreclaimable. Highest zone
>>      is still unbalanced
>>   5. kswapd returns and calls sleeping_prematurely
>>   6. sleeping_prematurely looks at *all* zones, not just the ones
>>      being considered by balance_pgdat. The highest small zone
>>      has all_unreclaimable cleared but the zone is not
>>      balanced. all_zones_ok is false so kswapd stays awake
>>
>> This patch corrects the behaviour of sleeping_prematurely to check
>> the zones balance_pgdat() checked.
>
> But kswapd is making progress: it's reclaiming slab.  Eventually that
> won't work any more and all_unreclaimable will not be cleared and the
> condition will fix itself up?
>
> btw,
>
> 	if (!sleeping_prematurely(...))
> 		sleep();
>
> hurts my brain.  My brain would prefer
>
> 	if (kswapd_should_sleep(...))
> 		sleep();
>
> no?
>
>> Reported-and-tested-by: Padraig Brady <P@draigBrady.com>
>
> But what were the before-and-after observations?  I don't understand
> how this can cause a permanent cpuchew by kswapd.

Context:
  http://marc.info/?t=130865025500001&r=1&w=2
  https://bugzilla.redhat.com/show_bug.cgi?id=712019

Summary:
  This will spin kswapd0 on my SNB laptop with 3GB RAM
  (with small normal zone):

    dd bs=1M count=3000 if=/dev/zero of=spin.test

  Basically once a certain amount of data is cached,
  kswapd0 will start spinning, until the data is
  removed from cache (by `rm spin.test` for example).

cheers,
Padraig.

* Re: [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
  2011-06-28 21:49   ` Andrew Morton
  2011-06-29 10:57     ` Pádraig Brady
@ 2011-06-30  9:39     ` Mel Gorman
  1 sibling, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2011-06-30  9:39 UTC
To: Andrew Morton
Cc: Pádraig Brady, James Bottomley, Colin King, Minchan Kim,
	Andrew Lutomirski, Rik van Riel, Johannes Weiner, linux-mm,
	linux-kernel

On Tue, Jun 28, 2011 at 02:49:00PM -0700, Andrew Morton wrote:
> On Fri, 24 Jun 2011 15:44:54 +0100
> Mel Gorman <mgorman@suse.de> wrote:
>
> > During allocator-intensive workloads, kswapd will be woken frequently
> > causing free memory to oscillate between the high and min watermark.
> > This is expected behaviour.
> >
> > A problem occurs if the highest zone is small.  balance_pgdat()
> > only considers unreclaimable zones when priority is DEF_PRIORITY
> > but sleeping_prematurely considers all zones. It's possible for this
> > sequence to occur
> >
> >   1. kswapd wakes up and enters balance_pgdat()
> >   2. At DEF_PRIORITY, marks highest zone unreclaimable
> >   3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
> >   4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
> >      highest zone, clearing all_unreclaimable. Highest zone
> >      is still unbalanced
> >   5. kswapd returns and calls sleeping_prematurely
> >   6. sleeping_prematurely looks at *all* zones, not just the ones
> >      being considered by balance_pgdat. The highest small zone
> >      has all_unreclaimable cleared but the zone is not
> >      balanced. all_zones_ok is false so kswapd stays awake
> >
> > This patch corrects the behaviour of sleeping_prematurely to check
> > the zones balance_pgdat() checked.
>
> But kswapd is making progress: it's reclaiming slab.  Eventually that
> won't work any more and all_unreclaimable will not be cleared and the
> condition will fix itself up?
>

It might, but at that point we've dumped as much slab as we can, which
is very aggressive, and there is no guarantee the condition is fixed
up. For example, if fork is happening often enough (due to terminal
usage, say), there may be just enough allocation requests satisfied
from the highest zone to clear all_unreclaimable during exit.

> btw,
>
> 	if (!sleeping_prematurely(...))
> 		sleep();
>
> hurts my brain.  My brain would prefer
>
> 	if (kswapd_should_sleep(...))
> 		sleep();
>
> no?
>

kswapd_try_to_sleep -> should_sleep feels like it would hurt too. I
prefer the sleeping_prematurely name because it indicates what
condition we are checking but I'm biased and generally suck at naming.

> > Reported-and-tested-by: Padraig Brady <P@draigBrady.com>
>
> But what were the before-and-after observations?  I don't understand
> how this can cause a permanent cpuchew by kswapd.
>

Padraig has reported on his before-and-after observations. On its own,
this patch doesn't entirely fix his problem because all the patches are
required but I felt that a rolled-up patch would be too hard to review.

-- 
Mel Gorman
SUSE Labs
* Re: [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
  2011-06-24 14:44 ` [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely Mel Gorman
                     ` (2 preceding siblings ...)
  2011-06-28 21:49   ` Andrew Morton
@ 2011-06-30  2:23   ` KOSAKI Motohiro
  3 siblings, 0 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2011-06-30  2:23 UTC
To: mgorman
Cc: akpm, P, James.Bottomley, colin.king, minchan.kim, luto, riel,
	hannes, linux-mm, linux-kernel

(2011/06/24 23:44), Mel Gorman wrote:
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour.
>
> A problem occurs if the highest zone is small.  balance_pgdat()
> only considers unreclaimable zones when priority is DEF_PRIORITY
> but sleeping_prematurely considers all zones. It's possible for this
> sequence to occur
>
>   1. kswapd wakes up and enters balance_pgdat()
>   2. At DEF_PRIORITY, marks highest zone unreclaimable
>   3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
>   4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
>      highest zone, clearing all_unreclaimable. Highest zone
>      is still unbalanced
>   5. kswapd returns and calls sleeping_prematurely
>   6. sleeping_prematurely looks at *all* zones, not just the ones
>      being considered by balance_pgdat. The highest small zone
>      has all_unreclaimable cleared but the zone is not
>      balanced. all_zones_ok is false so kswapd stays awake
>
> This patch corrects the behaviour of sleeping_prematurely to check
> the zones balance_pgdat() checked.
>
> Reported-and-tested-by: Pádraig Brady <P@draigBrady.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/vmscan.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8ff834e..841e3bf 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2323,7 +2323,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>  		return true;
>
>  	/* Check the watermark levels */
> -	for (i = 0; i < pgdat->nr_zones; i++) {
> +	for (i = 0; i <= classzone_idx; i++) {
>  		struct zone *zone = pgdat->node_zones + i;
>
>  		if (!populated_zone(zone))

Sorry for the delay.

	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
end of thread, other threads:[~2011-06-30  9:39 UTC | newest]

Thread overview: 13+ messages
2011-06-24 13:43 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
2011-06-24 13:43 ` [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely Mel Gorman
2011-06-24 13:43 ` [PATCH 2/4] mm: vmscan: Do not apply pressure to slab if we are not applying pressure to zone Mel Gorman
2011-06-24 13:59   ` Mel Gorman
2011-06-24 13:43 ` [PATCH 3/4] mm: vmscan: Evaluate the watermarks against the correct classzone Mel Gorman
2011-06-24 13:43 ` [PATCH 4/4] mm: vmscan: Only read new_classzone_idx from pgdat when reclaiming successfully Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2011-06-24 14:44 [PATCH 0/4] Stop kswapd consuming 100% CPU when highest zone is small Mel Gorman
2011-06-24 14:44 ` [PATCH 1/4] mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely Mel Gorman
2011-06-25 21:33   ` Rik van Riel
2011-06-27  6:10   ` Minchan Kim
2011-06-28 21:49   ` Andrew Morton
2011-06-29 10:57     ` Pádraig Brady
2011-06-30  9:39     ` Mel Gorman
2011-06-30  2:23   ` KOSAKI Motohiro