public inbox for linux-kernel@vger.kernel.org
* [VM PATCH 2.6.8-rc1] Prevent excessive scanning of lower zone
@ 2004-07-23  1:40 Shantanu Goel
  2004-07-23  5:07 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Shantanu Goel @ 2004-07-23  1:40 UTC (permalink / raw)
  To: akpm; +Cc: Kernel

[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]

Hi Andrew,

I emailed this a few weeks back to the list but it
seems to have gotten lost...

The default page scanner limits the number of pages reclaimed to
SWAP_CLUSTER_MAX in shrink_zone(), which puts greater stress on
the lower zones (DMA in my case): kswapd() is unable to keep up
with allocations, so more memory ends up being allocated from
the lower zone.  While running my normal workstation load, I
noticed the kernel was paging more than I expected, since the
amount of mapped memory was only about 30% (swappiness = 60).

To demonstrate this, I have attached the difference in
/proc/vmstat when running kernbench in optimal mode (-j16) on
stock 2.6.8-rc1 and with the patch applied.  The patch modifies
kswapd() to keep scanning until free_pages is greater than
pages_high.  In both cases swappiness is 60.  The machine has
2x2.0GHz Xeons with HT enabled and memory manually limited to
256MB.  Notice that the DMA zone is scanned more than 4 times as
often in the stock kernel, and there are about 80,000 more
pgsteals from the DMA zone compared to the patched kernel.

Also, in try_to_free_pages() I changed the check to test
total_reclaimed instead of sc.nr_reclaimed.  I'm not sure
whether that was an oversight or something else was intended
that I've failed to grasp...


Thanks,
Shantanu




[-- Attachment #2: kb-2.6.8-rc1 --]
[-- Type: application/octet-stream, Size: 1898 bytes --]

4 cpus found
Cleaning source tree...
Making defconfig...
Making dep if needed...
Caching kernel source in ram...
Half load is 2 jobs, changing to 3 as a kernel compile won't guarantee 2 jobs

Performing 5 runs of
make -j 16

Warmup optimal load run...
Optimal load -j16 run number 1...
Optimal load -j16 run number 2...
Optimal load -j16 run number 3...
Optimal load -j16 run number 4...
Optimal load -j16 run number 5...

Average Optimal -j 16 Load Run:
Elapsed Time 216.838
User Time 687.224
System Time 65.028
Percent CPU 346.8
Context Switches 59434.4
Sleeps 49148

----
pgfault                       32319468
pgfree                        20602514
pgalloc_normal                19978414
pgrefill_normal               5721256
pgpgout                       2527928
pgpgin                        1793728
pgdeactivate                  1303311
pgrefill_dma                  762328
pgalloc_dma                   625244
pgscan_kswapd_normal          599511
pgsteal_normal                513219
pswpout                       509043
pgrotated                     505673
pgscan_direct_normal          464871
pgactivate                    414917
pgscan_kswapd_dma             373563
kswapd_steal                  342353
slabs_scanned                 267429
pswpin                        221967
pgscan_direct_dma             173806
pgsteal_dma                   126828
pgmajfault                    91201
allocstall                    6402
nr_dirty                      6343
pageoutrun                    2264
kswapd_inodesteal             1517
nr_slab                       920
pginodesteal                  849
nr_page_table_pages           3
pgsteal_high                  0
pgalloc_high                  0
pgscan_kswapd_high            0
pgscan_direct_high            0
pgrefill_high                 0
nr_unstable                   0
nr_writeback                  0
nr_mapped                     -5900

[-- Attachment #3: kb-2.6.8-rc1-vmfix --]
[-- Type: application/octet-stream, Size: 1896 bytes --]

4 cpus found
Cleaning source tree...
Making defconfig...
Making dep if needed...
Caching kernel source in ram...
Half load is 2 jobs, changing to 3 as a kernel compile won't guarantee 2 jobs

Performing 5 runs of
make -j 16

Warmup optimal load run...
Optimal load -j16 run number 1...
Optimal load -j16 run number 2...
Optimal load -j16 run number 3...
Optimal load -j16 run number 4...
Optimal load -j16 run number 5...

Average Optimal -j 16 Load Run:
Elapsed Time 212.566
User Time 686.288
System Time 65.142
Percent CPU 353.2
Context Switches 60876
Sleeps 46904.4

----
pgfault                       32269769
pgfree                        20590460
pgalloc_normal                20054105
pgrefill_normal               6019997
pgpgout                       2231720
pgpgin                        1767904
pgdeactivate                  1286259
pgscan_kswapd_normal          635877
pgscan_direct_normal          596409
pgsteal_normal                592167
pgalloc_dma                   538116
pgrefill_dma                  502863
pswpout                       437400
pgrotated                     435325
pgactivate                    409417
kswapd_steal                  306860
slabs_scanned                 258712
pswpin                        226396
pgscan_direct_dma             81351
pgmajfault                    77259
pgsteal_dma                   44217
pgscan_kswapd_dma             42343
allocstall                    7589
nr_dirty                      5777
pageoutrun                    2692
nr_slab                       1060
kswapd_inodesteal             1005
pginodesteal                  522
nr_page_table_pages           3
pgsteal_high                  0
pgalloc_high                  0
pgscan_kswapd_high            0
pgscan_direct_high            0
pgrefill_high                 0
nr_unstable                   0
nr_writeback                  0
nr_mapped                     -6043

[-- Attachment #4: vm.patch --]
[-- Type: application/octet-stream, Size: 1783 bytes --]

--- .orig/mm/vmscan.c	2004-07-22 20:32:01.658676745 -0400
+++ 2.6.8-rc1-vmfix/mm/vmscan.c	2004-07-19 00:52:02.000000000 -0400
@@ -798,6 +798,7 @@
 static void
 shrink_zone(struct zone *zone, struct scan_control *sc)
 {
+	int nr_pages;
 	unsigned long nr_active;
 	unsigned long nr_inactive;
 
@@ -819,7 +820,7 @@
 	else
 		nr_inactive = 0;
 
-	sc->nr_to_reclaim = SWAP_CLUSTER_MAX;
+	nr_pages = sc->nr_to_reclaim;
 
 	while (nr_active || nr_inactive) {
 		if (nr_active) {
@@ -834,7 +835,9 @@
 					(unsigned long)SWAP_CLUSTER_MAX);
 			nr_inactive -= sc->nr_to_scan;
 			shrink_cache(zone, sc);
-			if (sc->nr_to_reclaim <= 0)
+			if (nr_pages == 0 && zone->free_pages > zone->pages_high)
+				break;
+			if (nr_pages && sc->nr_to_reclaim <= 0)
 				break;
 		}
 	}
@@ -871,6 +874,7 @@
 		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
 			continue;	/* Let kswapd poll it */
 
+		sc->nr_to_reclaim = SWAP_CLUSTER_MAX;
 		shrink_zone(zone, sc);
 	}
 }
@@ -917,12 +921,12 @@
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			reclaim_state->reclaimed_slab = 0;
 		}
-		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX) {
+		total_scanned += sc.nr_scanned;
+		total_reclaimed += sc.nr_reclaimed;
+		if (total_reclaimed >= SWAP_CLUSTER_MAX) {
 			ret = 1;
 			goto out;
 		}
-		total_scanned += sc.nr_scanned;
-		total_reclaimed += sc.nr_reclaimed;
 
 		/*
 		 * Try to write back as many pages as we just scanned.  This
@@ -1039,7 +1043,9 @@
 			if (nr_pages == 0) {	/* Not software suspend */
 				if (zone->free_pages <= zone->pages_high)
 					all_zones_ok = 0;
-			}
+				sc.nr_to_reclaim = 0;
+			} else
+				sc.nr_to_reclaim = to_free - total_reclaimed;
 			zone->temp_priority = priority;
 			if (zone->prev_priority > priority)
 				zone->prev_priority = priority;



Thread overview: 6+ messages
2004-07-23  1:40 [VM PATCH 2.6.8-rc1] Prevent excessive scanning of lower zone Shantanu Goel
2004-07-23  5:07 ` Andrew Morton
2004-07-23  2:26   ` Shantanu Goel
2004-07-23  3:43   ` Nick Piggin
2004-08-05  0:02   ` Shantanu Goel
2004-08-06  4:38     ` Nick Piggin
