* HPL benchmarking linux kernel for shared memory performance.
From: Allan Menezes @ 2008-03-16 5:38 UTC
To: linux-kernel
Hi,
I have narrowed down the kernel version in which this begins to happen. I am
benchmarking a single node with HPL, Open MPI 1.3 beta and GotoBLAS v1.24
for personal, noncommercial reasons.
I ran a single node with the command $ mpirun -np 1 ./xhpl
and with kernel 2.6.23.14 I get over 38 Gflops,
but with kernel 2.6.23.15, compiled with the same .config, I get 7.xx
Gflops, which is about 1/5th of the other kernel's result.
Keep in mind that only the kernels (from kernel.org) have changed, not the
hardware or anything else. Everything else is the same, yet the performance
drops to 1/5th.
There is no network involved in this single-node quad-core Intel test,
just shared memory.
So the shared memory or SMP performance of the newer kernels is far
worse than that of kernels up to 2.6.23.14!
Even with 2.6.25-rc5 the performance is degraded!
Can you please help me find why this is occurring? Please advise!
My setup: Intel Q6600 quad core overclocked stably to 2.88 GHz, 6 GB of
DDR2 800 MHz dual-channel RAM.
Allan Menezes
* Re: HPL benchmarking linux kernel for shared memory performance.
From: Marcin Slusarz @ 2008-03-16 10:42 UTC
To: Allan Menezes; +Cc: linux-kernel, Greg Kroah-Hartman
If you want to speed up bug resolution, you can bisect it down to a single patch with git
(search for "git bisect") on this tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.23.y.git
PS: please post your .config and dmesg output.
Marcin
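While bisecting a kernel regression like this, each candidate kernel has to be built, booted and benchmarked by hand, and the outcome reported back with `git bisect good` or `git bisect bad`. As a rough sketch, the Python helper below pulls the Gflops figure out of HPL's text output and turns it into a verdict. The function names, the 20 Gflops threshold (picked only because it sits between the 38 and 7.xx Gflops figures reported in this thread) and the sample result line are assumptions for illustration, not part of the original report.

```python
import re

# An HPL result line has the columns T/V, N, NB, P, Q, Time, Gflops, e.g.
# WR00L2L2       20000   128     1     1             140.35              3.800e+01
RESULT_LINE = re.compile(
    r"^W[RC]\S+\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.eE+-]+)\s*$"
)


def best_gflops(hpl_output: str) -> float:
    """Return the highest Gflops value found in HPL's text output."""
    results = []
    for line in hpl_output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            results.append(float(match.group(6)))
    if not results:
        raise ValueError("no HPL result line found")
    return max(results)


def bisect_verdict(gflops: float, threshold: float = 20.0) -> str:
    """Map a benchmark result to the word to pass to `git bisect`.

    The threshold is a hypothetical cut-off between the fast and slow results.
    """
    return "good" if gflops >= threshold else "bad"


# Illustrative sample line, not a measured result.
sample = "WR00L2L2       20000   128     1     1             140.35              3.800e+01"
print("git bisect", bisect_verdict(best_gflops(sample)))
```

After booting each kernel that git checks out, the xhpl output would be fed to `best_gflops` and the printed command run in the kernel tree to move the bisection forward.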
* Re: HPL benchmarking linux kernel for shared memory performance.
From: Roger Heflin @ 2008-03-16 18:13 UTC
To: Allan Menezes; +Cc: linux-kernel
Make sure that it is actually using multiple processes or threads. If it were
only using a single thread, the number could be that low, and any number of really
trivial things going wrong could cause it to use only one thread. "top -H"
will show threads.
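As a scripted alternative to watching "top -H", the thread count of a running process can be read from the `Threads:` field of `/proc/<pid>/status` on Linux. The sketch below is a minimal example under that assumption; the function names and the sample status text are hypothetical.

```python
def threads_from_status(status_text: str) -> int:
    """Extract the thread count from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no Threads: field found")


def thread_count(pid: int) -> int:
    """Read /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return threads_from_status(status_file.read())


# Illustrative status excerpt for a process running four threads.
sample_status = "Name:\txhpl\nState:\tR (running)\nThreads:\t4\n"
print(threads_from_status(sample_status))
```

Calling `thread_count` with the PID of the running xhpl process would show whether the BLAS library actually started more than one thread.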
Is the HPL executable identical? (The same executable should work just fine so
long as the underlying HW is the same, so there is no need to recompile/relink xhpl.)
You might also try Intel's MKL with HPL and see whether it is better or worse
than GotoBLAS (this probably won't affect the problem, but it would be good to
know how the two compare).
Also, as the other person mentioned, use git bisect to figure out which patch
did it. If you can identify one patch, someone can probably figure it out.
Roger
* HPL benchmarking linux kernel for shared memory performance.
From: Allan Menezes @ 2008-03-22 4:05 UTC
To: linux-kernel
Hi,
Sorry to have bothered you all, but there is no degradation of
performance with HPL on kernels up to 2.6.25-rc6. It was my silly error!
The motherboards I use are two Asus P5E-VM DO and three Asus P5K-VM boards with
Intel Q6600 quad-core CPUs (G0 stepping). Overclocking them slightly, I now
get 165 Gflops for my 5-node quad-core cluster
with GotoBLAS v1.24, HPL and the open-source Open MPI v1.3 alpha.
My error was that CPUID LIMIT was enabled in the BIOS. That option is for older
Windows systems and prevents the cpuid instruction at boot-up from
filling the register EAX with values above 0x03.
So I was seeing only 2 cores for a quad core in cat /proc/cpuinfo on all
nodes, which gave me lousy performance for every kernel except some like
2.6.23.13 and 2.6.23.14, since I was oversubscribing each node with 4
processes, and top -H confirmed that.
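The check described above (comparing the CPUs listed in /proc/cpuinfo with the number of processes launched per node) can be sketched in Python as follows. This is a minimal illustration that assumes the x86 layout of /proc/cpuinfo, with one `processor : N` line per logical CPU; the function names and the sample text are hypothetical.

```python
def logical_cpus(cpuinfo_text: str) -> int:
    """Count the 'processor' entries in the text of /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def is_oversubscribed(processes: int, cpus: int) -> bool:
    """True when more processes are started than CPUs are visible."""
    return processes > cpus


# Illustrative excerpt resembling a node where only 2 of 4 cores are visible.
sample = (
    "processor\t: 0\nmodel name\t: Intel(R) Core(TM)2 Quad CPU\n\n"
    "processor\t: 1\nmodel name\t: Intel(R) Core(TM)2 Quad CPU\n"
)
cpus = logical_cpus(sample)
print(cpus, is_oversubscribed(4, cpus))
```

On a real node the text would come from reading /proc/cpuinfo, and a count below the expected four cores would point at a firmware setting such as the one described here.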
Now I have disabled CPUID LIMIT on each node and I see all 4 processors on
each node with cat /proc/cpuinfo under kernel 2.6.25-rc6, and I get
38.44 - 38.9 Gflops per node. Of course, I am stably overclocking each
Q6600 by approx. 480 MHz to get that; otherwise, for the Kentsfield Q6600
at its stock 2.4 GHz, the peak is 38.4 Gflops (reference: paper by Dr. Jack
Dongarra et al.).
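The 38.4 Gflops peak quoted above follows from cores x clock x floating-point operations per cycle. The short calculation below reproduces it and relates the measured figures to the overclocked peak. It assumes 4 double-precision operations per cycle per core for this CPU generation and a 2.88 GHz overclock (2.4 GHz plus the roughly 480 MHz mentioned); the function name is hypothetical.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int = 4) -> float:
    """Theoretical peak: cores x clock (GHz) x flops per cycle per core.

    flops_per_cycle=4 is an assumption about the Q6600's double-precision
    throughput, not a figure taken from the thread.
    """
    return cores * ghz * flops_per_cycle


stock_peak = peak_gflops(4, 2.4)          # stock Q6600
overclocked_peak = peak_gflops(4, 2.88)   # 2.4 GHz + ~480 MHz
cluster_peak = 5 * overclocked_peak       # five overclocked nodes

print(f"stock peak:       {stock_peak:.2f} Gflops")
print(f"overclocked peak: {overclocked_peak:.2f} Gflops")
print(f"node efficiency:  {38.44 / overclocked_peak:.1%} to {38.9 / overclocked_peak:.1%}")
print(f"cluster efficiency at 165 Gflops: {165 / cluster_peak:.1%}")
```

Under these assumptions the per-node results correspond to a bit over 83% of the overclocked peak, and the 165 Gflops cluster figure to roughly 72%.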
The BIOS message is misleading: it says "Disable for Windows XP", and
since I am running Linux I had it enabled until I Googled for CPUID LIMIT,
found out what it does and experimented!
So, for your info: disable CPUID LIMIT in the BIOS when running Linux on
quad-core or larger systems! It's meant for old OSes like Win98 etc.
Thank you, and sorry for the bother; it was my error and not a fault
in the kernels. This is FYI only.
Cheers,
Allan Menezes
* Re: HPL benchmarking linux kernel for shared memory performance.
From: Andi Kleen @ 2008-03-22 10:04 UTC
To: Allan Menezes; +Cc: linux-kernel
Allan Menezes <amenezes007@sympatico.ca> writes:
> from filling the register EAX with values above 0x03
> So I was seeing only 2 cores for a quad core in cat /proc/cpuinfo on
> all nodes giving me lousy performance for every kernel except some
> like 2.6.23.13 and 2.6.23.14.
You say some older kernels behaved better than newer ones in the two-core
oversubscribed configuration? That would still be an issue that should
be investigated. After all, Linux should run well even when
oversubscribed on two cores.
-Andi