linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: do not sleep in balance_pgdat if there's no i/o congestion
@ 2012-12-19 23:17 Zlatko Calusic
  2012-12-19 23:25 ` Zlatko Calusic
  2012-12-20 11:12 ` Mel Gorman
  0 siblings, 2 replies; 16+ messages in thread
From: Zlatko Calusic @ 2012-12-19 23:17 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, Mel Gorman, Hugh Dickins
  Cc: linux-mm, Linux Kernel Mailing List

On a 4GB RAM machine, where Normal zone is much smaller than
DMA32 zone, the Normal zone gets fragmented in time. This requires
relatively more pressure in balance_pgdat to get the zone above the
required watermark. Unfortunately, the congestion_wait() call in there
slows it down for a completely wrong reason, expecting that there's
a lot of writeback/swapout, even when there's none (much more common).
After a few days, when fragmentation progresses, this flawed logic
translates to a very high CPU iowait times, even though there's no
I/O congestion at all. If THP is enabled, the problem occurs sooner,
but I was able to see it even on !THP kernels, just by giving it a bit
more time to occur.

The proper way to deal with this is to not wait, unless there's
congestion. Thanks to Mel Gorman, we already have the function that
perfectly fits the job. The patch was tested on a machine which
nicely revealed the problem after only 1 day of uptime, and it's been
working great.
---
 mm/vmscan.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b7ed376..4588d1d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2546,7 +2546,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
 static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 							int *classzone_idx)
 {
-	int all_zones_ok;
+	struct zone *unbalanced_zone;
 	unsigned long balanced;
 	int i;
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
@@ -2580,7 +2580,7 @@ loop_again:
 		unsigned long lru_pages = 0;
 		int has_under_min_watermark_zone = 0;
 
-		all_zones_ok = 1;
+		unbalanced_zone = NULL;
 		balanced = 0;
 
 		/*
@@ -2719,7 +2719,7 @@ loop_again:
 			}
 
 			if (!zone_balanced(zone, testorder, 0, end_zone)) {
-				all_zones_ok = 0;
+				unbalanced_zone = zone;
 				/*
 				 * We are still under min water mark.  This
 				 * means that we have a GFP_ATOMIC allocation
@@ -2752,7 +2752,7 @@ loop_again:
 				pfmemalloc_watermark_ok(pgdat))
 			wake_up(&pgdat->pfmemalloc_wait);
 
-		if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
+		if (!unbalanced_zone || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
 			break;		/* kswapd: all done */
 		/*
 		 * OK, kswapd is getting into trouble.  Take a nap, then take
@@ -2762,7 +2762,7 @@ loop_again:
 			if (has_under_min_watermark_zone)
 				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
 			else
-				congestion_wait(BLK_RW_ASYNC, HZ/10);
+				wait_iff_congested(unbalanced_zone, BLK_RW_ASYNC, HZ/10);
 		}
 
 		/*
@@ -2781,7 +2781,7 @@ out:
 	 * high-order: Balanced zones must make up at least 25% of the node
 	 *             for the node to be balanced
 	 */
-	if (!(all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))) {
+	if (unbalanced_zone && (!order || !pgdat_balanced(pgdat, balanced, *classzone_idx))) {
 		cond_resched();
 
 		try_to_freeze();
-- 
1.7.10.4

-- 
Zlatko

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-12-31  0:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-19 23:17 [PATCH] mm: do not sleep in balance_pgdat if there's no i/o congestion Zlatko Calusic
2012-12-19 23:25 ` Zlatko Calusic
2012-12-21 11:51   ` Hillf Danton
2012-12-27 15:42     ` Zlatko Calusic
2012-12-29  7:25       ` Hillf Danton
2012-12-29 12:11         ` Zlatko Calusic
2012-12-20 11:12 ` Mel Gorman
2012-12-20 20:58   ` Andrew Morton
2012-12-22 18:54     ` [PATCH] mm: modify pgdat_balanced() so that it also handles order=0 Zlatko Calusic
2012-12-23 14:12       ` [PATCH v2] " Zlatko Calusic
2012-12-26 15:07         ` [PATCH] mm: avoid calling pgdat_balanced() needlessly Zlatko Calusic
2012-12-28  2:16           ` [PATCH] mm: fix null pointer dereference in wait_iff_congested() Zlatko Calusic
2012-12-28  2:49             ` Minchan Kim
2012-12-28 13:29               ` Zlatko Calusic
2012-12-31  0:50                 ` Minchan Kim
2012-12-29  8:45             ` Sedat Dilek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).