public inbox for linux-ia64@vger.kernel.org
* Variation in measure_migration_cost() with scheduler-cache-hot-autodetect.patch in -mm
From: Chen, Kenneth W @ 2005-06-22  3:19 UTC
  To: 'Ingo Molnar', linux-kernel, linux-ia64

I'm consistently getting a smaller than expected cache migration cost
as measured by Ingo's scheduler-cache-hot-autodetect.patch currently
in the -mm tree.  In this patch, the memory used to calibrate migration
cost is obtained by a vmalloc() call.  Would it make sense to use
__get_free_pages() instead?  I did the following experiments on a
variety of machines I have access to:

                              migration cost     migration cost
                              with vmalloc mem   with __get_free_pages
3.0GHz Xeon, 8MB cache         6.23 ms           6.32 ms
3.4GHz Xeon, 2MB cache         1.62 ms           2.00 ms
1.6GHz Itanium2, 9MB cache     9.2  ms          10.2  ms
1.4GHz Itanium2, 4MB cache     4.2  ms           4.4  ms

Why the discrepancy?  Possible cache coloring issue?
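
(Editor's aside, not part of the original mail: one way to picture the
coloring question.  In a physically-indexed cache, each physical page maps
to a fixed group of cache sets, its "color".  Pages from __get_free_pages()
are physically contiguous, so they cycle through all colors evenly;
vmalloc() pages sit at arbitrary physical addresses, so colors repeat
unevenly and some cache sets get over-subscribed while others go unused.
A minimal sketch; the 8MB/8-way/4K geometry is an assumption for
illustration only:)

	#define CACHE_SIZE	(8 * 1024 * 1024)	/* assumed: 8MB cache */
	#define CACHE_WAYS	8			/* assumed: 8-way     */
	#define PAGE_SZ		4096			/* assumed: 4K pages  */
	/* pages per cache way == number of distinct page colors */
	#define NR_COLORS	(CACHE_SIZE / CACHE_WAYS / PAGE_SZ)

	static unsigned int page_color(unsigned long phys_addr)
	{
		/* consecutive physical pages take consecutive colors */
		return (phys_addr / PAGE_SZ) % NR_COLORS;
	}

(An unevenly colored buffer keeps less unique data resident in the cache,
which would shrink the effective working set and thus the measured
migration cost, consistent with the lower vmalloc numbers above.)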



--- linux-2.6.12/kernel/sched.c.orig	2005-06-21 19:42:45.067876790 -0700
+++ linux-2.6.12/kernel/sched.c	2005-06-21 19:43:42.580571398 -0700
@@ -5420,7 +5420,7 @@ measure_cost(int cpu1, int cpu2, void *c
 __init static unsigned long long measure_migration_cost(int cpu1, int cpu2)
 {
 	unsigned long long max_cost = 0, fluct = 0, avg_fluct = 0;
-	unsigned int max_size, size, size_found = 0;
+	unsigned int order, max_size, size, size_found = 0;
 	long long cost = 0, prev_cost;
 	void *cache;
 
@@ -5448,7 +5448,10 @@ __init static unsigned long long measure
 	/*
 	 * Allocate the working set:
 	 */
-	cache = vmalloc(max_size);
+	for (order = 0; PAGE_SIZE << order < max_size; order++)
+		;
+	cache = (void *) __get_free_pages(GFP_KERNEL, order);
+
 	if (!cache) {
 		printk("could not vmalloc %d bytes for cache!\n", 2*max_size);
 		return 1000000; // return 1 msec on very small boxen
@@ -5508,7 +5511,7 @@ __init static unsigned long long measure
 		printk("[%d][%d] working set size found: %d, cost: %Ld\n",
 			cpu1, cpu2, size_found, max_cost);
 
-	vfree(cache);
+	free_pages((unsigned long) cache, order);
 
 	/*
 	 * A task is considered 'cache cold' if at least 2 times
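
(Editor's note: the open-coded order loop in the hunk above computes the
smallest order for which PAGE_SIZE << order >= max_size, which is what the
kernel's get_order() helper from <asm/page.h> returns for nonzero sizes.
A sketch of the equivalence; order_for() is a hypothetical name used only
here:)

	/* smallest order such that (PAGE_SIZE << order) >= size;
	 * equivalent to get_order(size) for size > 0 */
	static inline unsigned int order_for(unsigned long size)
	{
		unsigned int order = 0;

		while ((PAGE_SIZE << order) < size)
			order++;
		return order;
	}

(So the allocation in the patch could presumably also be written as:

	cache = (void *) __get_free_pages(GFP_KERNEL, get_order(max_size));

which avoids the open-coded loop.)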



* RE: Variation in measure_migration_cost() with scheduler-cache-hot-autodetect.patch in -mm
From: Luck, Tony @ 2005-06-22  4:53 UTC
  To: Chen, Kenneth W, Ingo Molnar, linux-kernel, linux-ia64

Nitpick: 

> 		printk("could not vmalloc %d bytes for cache!\n", 2*max_size);

You should change this printk() message to not say "vmalloc".
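
For example (editor's suggestion, not text from the thread), the message
could simply avoid naming the allocator:

	printk("could not allocate %d bytes for cache!\n", 2*max_size);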

-Tony


* Re: Variation in measure_migration_cost() with scheduler-cache-hot-autodetect.patch in -mm
From: Ingo Molnar @ 2005-06-22  7:14 UTC
  To: Chen, Kenneth W; +Cc: linux-kernel, linux-ia64


* Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:

> I'm consistently getting a smaller than expected cache migration cost
> as measured by Ingo's scheduler-cache-hot-autodetect.patch currently
> in the -mm tree.  In this patch, the memory used to calibrate migration
> cost is obtained by a vmalloc() call.  Would it make sense to use
> __get_free_pages() instead?  I did the following experiments on a
> variety of machines I have access to:
> 
>                               migration cost     migration cost
>                               with vmalloc mem   with __get_free_pages
> 3.0GHz Xeon, 8MB cache         6.23 ms           6.32 ms
> 3.4GHz Xeon, 2MB cache         1.62 ms           2.00 ms
> 1.6GHz Itanium2, 9MB cache     9.2  ms          10.2  ms
> 1.4GHz Itanium2, 4MB cache     4.2  ms           4.4  ms
> 
> Why the discrepancy?  Possible cache coloring issue?

probably coloring effects, yes. Another reason could be that
touch_cache() touches 6 separate areas of memory, which, combined with
the stack, gives a minimum of 7 hugepage TLB entries. How many are there
on these Xeons? If there are, say, 4 of them, then we could be thrashing
these TLB entries. There are many more 4K TLBs. To reduce the number of
TLB entries utilized, could you change touch_cache() to do something like:

        unsigned long size = __size/sizeof(long), chunk1 = size/2;
        unsigned long *cache = __cache;
        int i;

        for (i = 0; i < size/4; i += 8) {
                switch (i % 4) {
                        case 0: cache[i]++;
                        case 1: cache[size-1-i]++;
                        case 2: cache[chunk1-i]++;
                        case 3: cache[chunk1+i]++;
                }
        }
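
(Editor's note on the sketch above: since i advances in steps of 8,
i % 4 is always 0, and the cases have no break statements, so every
iteration enters case 0 and falls through all four increments.  If the
intent was to cycle through the four regions one at a time, a variant
such as this would do it; this is the editor's reading of the intent,
not code from the thread:)

        for (i = 0; i < size/4; i += 8) {
                switch ((i / 8) % 4) {
                        case 0: cache[i]++;         break;
                        case 1: cache[size-1-i]++;  break;
                        case 2: cache[chunk1-i]++;  break;
                        case 3: cache[chunk1+i]++;  break;
                }
        }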

does this change the migration-cost values? Btw., how did you determine 
the value of the 'ideal' migration cost? Was this based on the database 
benchmark measurements?

There are a couple of reasons vmalloc() is better than gfp(): 1) it has 
no size limit in the measured range, and 2) it more accurately mimics 
migration costs of userspace apps, which typically have most of their 
cache-footprint in paged memory, not in hugepage memory.
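
(Editor's illustration of point 2, with the caveat that direct-map page
sizes are architecture-specific: __get_free_pages() memory sits in the
kernel's direct mapping, which x86 and ia64 kernels of this era typically
cover with large/hugepage TLB entries, while vmalloc() memory is mapped
through 4K PTEs, much like a user process's heap:)

	/* physically contiguous, in the kernel direct map (usually
	 * backed by large TLB entries), size capped by MAX_ORDER: */
	cache = (void *) __get_free_pages(GFP_KERNEL, order);
	free_pages((unsigned long) cache, order);

	/* virtually contiguous, physically scattered, mapped with 4K
	 * PTEs, so closer to user-space cache/TLB behavior, and not
	 * limited by MAX_ORDER: */
	cache = vmalloc(max_size);
	vfree(cache);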

	Ingo


* RE: Variation in measure_migration_cost() with scheduler-cache-hot-autodetect.patch in -mm
From: Chen, Kenneth W @ 2005-06-23  1:41 UTC
  To: 'Ingo Molnar'; +Cc: linux-kernel, linux-ia64

Ingo Molnar wrote on Wednesday, June 22, 2005 12:15 AM
> probably coloring effects, yes. Another reason could be that
> touch_cache() touches 6 separate areas of memory, which, combined with
> the stack, gives a minimum of 7 hugepage TLB entries. How many are there
> on these Xeons? If there are, say, 4 of them, then we could be thrashing
> these TLB entries. There are many more 4K TLBs. To reduce the number of
> TLB entries utilized, could you change touch_cache() to do something like:
> 
>         unsigned long size = __size/sizeof(long), chunk1 = size/2;
>         unsigned long *cache = __cache;
>         int i;
> 
>         for (i = 0; i < size/4; i += 8) {
>                 switch (i % 4) {
>                         case 0: cache[i]++;
>                         case 1: cache[size-1-i]++;
>                         case 2: cache[chunk1-i]++;
>                         case 3: cache[chunk1+i]++;
>                 }
>         }
> 
> does this change the migration-cost values?

Yes, it does.  On one processor it goes down, but it goes up on another,
so I'm not sure I completely understand the behavior.

                        vmalloc'ed     __get_free_pages
3.0GHz Xeon, 8MB          6.46 ms          7.05 ms
3.4GHz Xeon, 2MB          0.93 ms          1.22 ms
1.6GHz ia64, 9MB          9.72 ms         10.06 ms

What I'm really after, though, is to have the parameter close to an
experimentally determined optimal value.  With either touch_cache()
algorithm, __get_free_pages appears to come closer to that value.


> Btw., how did you determine the value of the 'ideal' migration cost?
> Was this based on the database benchmark measurements?

Yes, it is based on my favorite "industry standard transaction processing
database" benchmark (I probably should use a shorter name, like OLTP).



Thread overview: 4 messages
2005-06-22  3:19 Variation in measure_migration_cost() with scheduler-cache-hot-autodetect.patch in -mm  Chen, Kenneth W
2005-06-22  4:53 ` Luck, Tony
2005-06-22  7:14 ` Ingo Molnar
2005-06-23  1:41 ` Chen, Kenneth W