public inbox for linux-ia64@vger.kernel.org
* [PATCH] - Cacheline align jiffies_64
@ 2004-12-06 19:32 Jack Steiner
  2004-12-06 21:03 ` Grant Grundler
  2004-12-06 22:01 ` Jack Steiner
  0 siblings, 2 replies; 3+ messages in thread
From: Jack Steiner @ 2004-12-06 19:32 UTC (permalink / raw)
  To: linux-ia64


Is there any reason jiffies_64 should not be cacheline aligned?


On large systems, system overhead on cpu 0 is higher than on other
cpus. On a completely idle 512p system, the average amount of system time 
on cpu 0 is 2.4%  and .15% on cpu 1-511.

A second interesting data point is that if I run a busy-loop
program on cpus 1-511, the system overhead on cpu 0 drops 
significantly.


I moved the timekeeper to cpu 1. The excessive system time moved
to cpu 1 and the system time on cpu 0 dropped to .2%.



Further investigation showed that the problem was caused by false
sharing of the cacheline containing jiffies_64. On the kernel that
I was running, both jiffies_64 & pal_halt share the same cacheline.
Idle cpus are frequently accessing pal_halt. Minor kernel
changes (including some of the debugging code that I used to find the 
problem :-(  ) can cause variables to move & change the false sharing - the
symptoms of the problem can change or disappear.
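
To illustrate what "share the same cacheline" means here, the check below
masks off the low address bits and compares the resulting line base
addresses. It is only a sketch: the 128-byte line size and the helper names
(line_base, same_cacheline) are assumptions made for the example and are not
taken from the kernel or from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache line size in bytes. This value is an assumption for
 * illustration; use the line size of the machine being examined. */
#define ASSUMED_LINE_SIZE 128u

/* Base address of the cache line that contains addr.
 * line_size must be a nonzero power of two. */
static uintptr_t line_base(uintptr_t addr, uintptr_t line_size)
{
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	return addr & ~(line_size - 1);
}

/* True if both addresses fall within the same cache line. */
static bool same_cacheline(uintptr_t a, uintptr_t b, uintptr_t line_size)
{
	return line_base(a, line_size) == line_base(b, line_size);
}
```

Feeding the addresses of two symbols (for example, as listed in System.map)
to same_cacheline() indicates whether a write to one can invalidate the line
that holds the other.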

The following shows system time (in percent) on cpus 0-3 before & after the fix:

	           cpu0   cpu1   cpu2   cpu3
	OLD        0.23   2.71   0.15   0.15
	ALIGNED    0.22   0.75   0.16   0.15




Signed-off-by: Jack Steiner <steiner@sgi.com>

Cacheline-align jiffies_64 to prevent unexpected false sharing of its cacheline.



Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c	2004-11-30 20:30:11.000000000 -0600
+++ linux/arch/ia64/kernel/time.c	2004-12-06 13:17:45.170451297 -0600
@@ -32,7 +32,7 @@
 
 extern unsigned long wall_jiffies;
 
-u64 jiffies_64 = INITIAL_JIFFIES;
+u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
 
 EXPORT_SYMBOL(jiffies_64);
 






-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.




* Re: [PATCH] - Cacheline align jiffies_64
  2004-12-06 19:32 [PATCH] - Cacheline align jiffies_64 Jack Steiner
@ 2004-12-06 21:03 ` Grant Grundler
  2004-12-06 22:01 ` Jack Steiner
  1 sibling, 0 replies; 3+ messages in thread
From: Grant Grundler @ 2004-12-06 21:03 UTC (permalink / raw)
  To: linux-ia64

On Mon, Dec 06, 2004 at 01:32:32PM -0600, Jack Steiner wrote:
> On large systems, system overhead on cpu 0 is higher than on other
> cpus. On a completely idle 512p system, the average amount of system time 
> on cpu 0 is 2.4%  and .15% on cpu 1-511.

Jack,
Not to trivialize the problem, but I found it amusing that someone
has time to "optimize" the idle loop. :^)

I realize the symptom is an effect that is only easily measured
on an idle system...but it's amusing, none the less. :^)

I'd hope there is a better way to measure temporal locality
of what's in a cacheline with q-tools. But I only know how
to determine cacheline utilization. ie look up the cache line
aligned address in System.map and then use Data EAR to get hard data
as described (briefly) here:
   http://iou.parisc-linux.org/ols2004/www/4_Measuring_Cache_line_Miss.html

probably need a few more bits to isolate per CPU behaviors but I'm
pretty sure pfmon/q-tools can do that.  The "temporal locality"
is the bit I haven't seen any solution for. (Well, maybe a simulator
is the right tool to do that; I don't know).

thanks,
grant


* Re: [PATCH] - Cacheline align jiffies_64
  2004-12-06 19:32 [PATCH] - Cacheline align jiffies_64 Jack Steiner
  2004-12-06 21:03 ` Grant Grundler
@ 2004-12-06 22:01 ` Jack Steiner
  1 sibling, 0 replies; 3+ messages in thread
From: Jack Steiner @ 2004-12-06 22:01 UTC (permalink / raw)
  To: linux-ia64

On Mon, Dec 06, 2004 at 01:03:10PM -0800, Grant Grundler wrote:
> On Mon, Dec 06, 2004 at 01:32:32PM -0600, Jack Steiner wrote:
> > On large systems, system overhead on cpu 0 is higher than on other
> > cpus. On a completely idle 512p system, the average amount of system time 
> > on cpu 0 is 2.4%  and .15% on cpu 1-511.
> 
> Jack,
> Not to trivialize the problem, but I found it amusing that someone
> has time to "optimize" the idle loop. :^)

I wasn't clear. 

The problem is that idle cpus interfere with cpu 0 (timekeeper) doing
real work. It doesn't matter what is running on cpu 0 (idle or busy). When
a timer tick occurs on cpu 0, it takes a long time to obtain exclusive
ownership of the line that contains jiffies.

The problem is real. The investigation was triggered by a user app
that ran ~2.5% faster on cpu 1 than on cpu 0. 


Although the specific case that I ran into could be solved by moving pal_halt, 
I don't think that is a good solution. Jiffies can be a
fairly hot variable. If it is falsely shared with another variable that
is frequently written, it could significantly impact performance.

Moving jiffies to a private cacheline seems like a better solution.
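
As a rough userspace analogue of giving a variable a private cacheline, the
sketch below uses C11 alignas to align and pad a counter to one full line. It
does not reproduce how the kernel's __cacheline_aligned_in_smp is defined.
The 128-byte line size, the struct name and the variable names are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache line size in bytes (an assumption, not from the patch). */
#define ASSUMED_LINE_SIZE 128

/* A counter that starts on a line boundary. Because the struct's alignment
 * is one full line, its size is rounded up to one line as well, so no other
 * object can be placed in the same line. */
struct private_line_counter {
	alignas(ASSUMED_LINE_SIZE) uint64_t value;
};

/* Compile-time check that the struct occupies exactly one assumed line. */
static_assert(sizeof(struct private_line_counter) == ASSUMED_LINE_SIZE,
	      "counter must fill exactly one assumed cache line");

/* Hypothetical stand-ins: one frequently written, one frequently read. */
static struct private_line_counter demo_ticks;
static struct private_line_counter demo_flag;
```

The cost is memory: each such variable takes a whole line, which is usually
acceptable only for a small number of hot variables.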


> 
> I realize the symptom is an effect that is only easily measured
> on an idle system...but it's amusing, none the less. :^)
> 
> I'd hope there is a better way to measure temporal locality
> of what's in a cacheline with q-tools. But I only know how
> to determine cacheline utilization. ie look up the cache line
> aligned address in System.map and then use Data EAR to get hard data
> as described (briefly) here:
>    http://iou.parisc-linux.org/ols2004/www/4_Measuring_Cache_line_Miss.html

I'll experiment with this tool. I wonder if there are other hot
cache lines caused by false sharing.


-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.


