public inbox for linux-kernel@vger.kernel.org
* HPL benchmarking linux kernel for shared memory performance.
@ 2008-03-16  5:38 Allan Menezes
  2008-03-16 10:42 ` Marcin Slusarz
  2008-03-16 18:13 ` Roger Heflin
  0 siblings, 2 replies; 5+ messages in thread
From: Allan Menezes @ 2008-03-16  5:38 UTC
  To: linux-kernel

Hi,
    I have narrowed down the kernel version in which this begins to happen. I am
benchmarking a single node with HPL, Open MPI 1.3 beta and GotoBLAS v1.24
for personal, noncommercial reasons.
I tried a single node with the command $ mpirun -np 1 ./xhpl
and with kernel 2.6.23.14 I get over 38 Gflops,
but with kernel 2.6.23.15, compiled with the same .config, I get 7.xx
Gflops, which is about 1/5th that of the other kernel.
Keep in mind that only the kernels (from kernel.org) have changed, not the
hardware or anything else; everything else is the same, yet the performance
drops to 1/5th.
There is no network involved in this single-node quad-core Intel test,
just shared memory.
So the shared-memory or SMP performance of the newer kernels is far
worse than that of kernels up to 2.6.23.14!
Even with 2.6.25-rc5 the performance is degraded!
Can you please help me find out why this is occurring? Please advise!
My setup: quad-core Intel Q6600, stably overclocked to 2.88 GHz, with 6 GB of
DDR2-800 dual-channel RAM.
Allan Menezes


* Re: HPL benchmarking linux kernel for shared memory performance.
  2008-03-16  5:38 HPL benchmarking linux kernel for shared memory performance Allan Menezes
@ 2008-03-16 10:42 ` Marcin Slusarz
  2008-03-16 18:13 ` Roger Heflin
  1 sibling, 0 replies; 5+ messages in thread
From: Marcin Slusarz @ 2008-03-16 10:42 UTC
  To: Allan Menezes; +Cc: linux-kernel, Greg Kroah-Hartman

On Sun, Mar 16, 2008 at 01:38:51AM -0400, Allan Menezes wrote:
> Hi,
>    I have narrowed down the kernel version in which this begins to happen. I am
> benchmarking a single node with HPL, Open MPI 1.3 beta and GotoBLAS v1.24
> for personal, noncommercial reasons.
> I tried a single node with the command $ mpirun -np 1 ./xhpl
> and with kernel 2.6.23.14 I get over 38 Gflops,
> but with kernel 2.6.23.15, compiled with the same .config, I get 7.xx
> Gflops, which is about 1/5th that of the other kernel.
> Keep in mind that only the kernels (from kernel.org) have changed, not the
> hardware or anything else; everything else is the same, yet the performance
> drops to 1/5th.
> There is no network involved in this single-node quad-core Intel test,
> just shared memory.
> So the shared-memory or SMP performance of the newer kernels is far
> worse than that of kernels up to 2.6.23.14!
> Even with 2.6.25-rc5 the performance is degraded!
> Can you please help me find out why this is occurring? Please advise!
> My setup: quad-core Intel Q6600, stably overclocked to 2.88 GHz, with 6 GB of
> DDR2-800 dual-channel RAM.
> Allan Menezes

If you want to speed up bug resolution, you can bisect it down to a single patch
with git (search for "git bisect") on this tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.23.y.git

PS: please post your .config and dmesg output.

Marcin
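
A minimal sketch of how each bisect step could be classified, assuming the usual
HPL result line layout (T/V, N, NB, P, Q, Time, Gflops). The function names and
the 20 Gflops threshold are hypothetical; the threshold is simply a value
between the 7.xx and 38 Gflops figures reported above. Each step still means
building and booting the kernel that git checks out, running xhpl, and then
telling git the result with "git bisect good" or "git bisect bad".

```python
import re

# Assumed layout of an HPL result line: T/V  N  NB  P  Q  Time  Gflops
RESULT_LINE = re.compile(
    r"^W[RC]\S+\s+\d+\s+\d+\s+\d+\s+\d+\s+[\d.]+\s+([\d.]+(?:[eE][+-]?\d+)?)\s*$"
)


def best_gflops(hpl_output):
    """Return the highest Gflops value found in the text of an HPL run."""
    rates = []
    for line in hpl_output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            rates.append(float(match.group(1)))
    if not rates:
        raise ValueError("no HPL result line found")
    return max(rates)


def verdict(gflops, threshold=20.0):
    """Map a measurement to the word to pass to `git bisect` (hypothetical threshold)."""
    return "good" if gflops >= threshold else "bad"
```

With the xhpl output saved to a file, verdict(best_gflops(text)) gives the word
to hand to git for the kernel that was just tested.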


* Re: HPL benchmarking linux kernel for shared memory performance.
  2008-03-16  5:38 HPL benchmarking linux kernel for shared memory performance Allan Menezes
  2008-03-16 10:42 ` Marcin Slusarz
@ 2008-03-16 18:13 ` Roger Heflin
  1 sibling, 0 replies; 5+ messages in thread
From: Roger Heflin @ 2008-03-16 18:13 UTC
  To: Allan Menezes; +Cc: linux-kernel

Allan Menezes wrote:
> Hi,
>    I have narrowed down the kernel version in which this begins to happen. I am
> benchmarking a single node with HPL, Open MPI 1.3 beta and GotoBLAS v1.24
> for personal, noncommercial reasons.
> I tried a single node with the command $ mpirun -np 1 ./xhpl
> and with kernel 2.6.23.14 I get over 38 Gflops,
> but with kernel 2.6.23.15, compiled with the same .config, I get 7.xx
> Gflops, which is about 1/5th that of the other kernel.
> Keep in mind that only the kernels (from kernel.org) have changed, not the
> hardware or anything else; everything else is the same, yet the performance
> drops to 1/5th.
> There is no network involved in this single-node quad-core Intel test,
> just shared memory.
> So the shared-memory or SMP performance of the newer kernels is far
> worse than that of kernels up to 2.6.23.14!
> Even with 2.6.25-rc5 the performance is degraded!
> Can you please help me find out why this is occurring? Please advise!
> My setup: quad-core Intel Q6600, stably overclocked to 2.88 GHz, with 6 GB of
> DDR2-800 dual-channel RAM.
> Allan Menezes

Make sure that it is actually using multiple processes or threads. If it were
only using a single thread the number could be that low, and any of a number of
really trivial things being wrong could cause it to use only one thread.
"top -H" will show threads.

Is the HPL executable identical? (The same executable should work just fine so
long as the underlying HW is the same, so there is no need to recompile/relink
xhpl.)

And you might also try Intel's MKL with HPL and see if that is better or worse
than GotoBLAS (this probably won't affect the problem, but it would be good to
know).

Also, as the other person mentioned, use git bisect to figure out which patch
did it. If you can identify one patch, someone can probably figure it out.

                            Roger
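
A minimal sketch of the same check that "top -H" makes visible, assuming a
Linux system: the kernel exposes one directory per thread under
/proc/<pid>/task. The function names and the proc_root parameter are
hypothetical and exist only for illustration.

```python
import os


def thread_ids(pid, proc_root="/proc"):
    """List the thread IDs of a process from /proc/<pid>/task (Linux only)."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def looks_single_threaded(pid, proc_root="/proc"):
    """True when the process currently has exactly one thread."""
    return len(thread_ids(pid, proc_root)) == 1
```

Calling looks_single_threaded() with the PID of a running xhpl process shows
whether the BLAS library has actually spawned worker threads.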


* HPL benchmarking linux kernel for shared memory performance.
@ 2008-03-22  4:05 Allan Menezes
  2008-03-22 10:04 ` Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Allan Menezes @ 2008-03-22  4:05 UTC
  To: linux-kernel

Hi,
   Sorry to have bothered you all, but there is no degradation of
performance with HPL and kernels up to 2.6.25-rc6. It was my silly error!
The motherboards I use are two Asus P5E-VM DO and three Asus P5K-VM with
Intel Q6600 quad-core CPUs with G0 stepping. Overclocking them slightly, I now
get 165 GFlops for my 5-node quad-core cluster
with GotoBLAS v1.24, HPL and open-source Open MPI v1.3 alpha.
My error was that CPUID LIMIT was enabled in the BIOS. This option is for older
Windows systems and prevents the CPUID instruction at boot-up from
filling the register EAX with values above 0x03.
So I was seeing only 2 cores for a quad core in cat /proc/cpuinfo on all
nodes, which gave me lousy performance for every kernel except some like
2.6.23.13 and 2.6.23.14, since I was oversubscribing each node with 4
processes, and top -H confirmed that.
Now I have disabled CPUID LIMIT on each node and I see all 4 processors on
each node with cat /proc/cpuinfo for kernel 2.6.25-rc6, and I get
38.44 - 38.9 Gflops per node. Of course I am stably overclocking each
Q6600 by approx. 480 MHz to get that; otherwise, for the Kentsfield Q6600
from Intel at 2.4 GHz, the peak is 38.4 GFlops (reference: paper by Dr. Jack
Dongarra et al.).
The BIOS message is misleading: it says "Disable for Windows XP", and
since I am running Linux I enabled it, until I Googled for CPUID LIMIT,
found out what it does, and experimented!
So, for your info, disable CPUID LIMIT in the BIOS when running Linux on
quad-core or larger systems! It's meant for old OSes like Win98 etc.
Thank you, and sorry for the bother; it was my error and not a fault
in the kernels. This is FYI only.
Cheers,
Allan Menezes
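
The peak figure quoted above can be reproduced with a short calculation. This is
a sketch that assumes 4 double-precision floating-point operations per cycle per
core, which is the value consistent with the 38.4 GFlops peak quoted for four
cores at 2.4 GHz; the function and variable names are hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=4):
    """Theoretical peak in GFlops: cores x clock (GHz) x flops per cycle (assumed 4)."""
    return cores * clock_ghz * flops_per_cycle


stock = peak_gflops(4, 2.40)        # Q6600 at its stock 2.4 GHz clock
overclocked = peak_gflops(4, 2.88)  # 2.4 GHz plus the approx. 480 MHz overclock

# Measured per-node range reported in the message above.
measured = (38.44, 38.9)
efficiency = [m / overclocked for m in measured]
```

Under that assumption the overclocked peak is 46.08 GFlops per node, so the
measured 38.44 to 38.9 Gflops corresponds to roughly 83 to 84 percent of peak.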


* Re: HPL benchmarking linux kernel for shared memory performance.
  2008-03-22  4:05 Allan Menezes
@ 2008-03-22 10:04 ` Andi Kleen
  0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2008-03-22 10:04 UTC
  To: Allan Menezes; +Cc: linux-kernel

Allan Menezes <amenezes007@sympatico.ca> writes:

> from filling the register EAX with values above 0x03
> So I was seeing only 2 cores for a quad core in cat /proc/cpuinfo on
> all nodes giving me lousy performance for every kernel except some
> like 2.6.23.13 and 2.6.23.14.

You say some older kernels behaved better than newer ones in the
two-core oversubscribed configuration? That would still be an issue that
should be investigated.  After all, Linux should run well even when
oversubscribed on two cores.

-Andi
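
A minimal sketch of detecting the oversubscribed configuration discussed in this
thread, assuming the x86 /proc/cpuinfo format in which every logical CPU has a
"processor : N" entry. The function names are hypothetical.

```python
def count_processors(cpuinfo_text):
    """Count the 'processor : N' entries in the text of /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def is_oversubscribed(ranks_per_node, cpuinfo_text):
    """True when more MPI ranks are started on a node than the kernel sees CPUs."""
    return ranks_per_node > count_processors(cpuinfo_text)
```

Reading /proc/cpuinfo on each node and calling is_oversubscribed(4, text) would
have flagged the case reported above, where only 2 of the 4 cores were visible.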

