* Re: weird speed problem
From: Yaroslav Halchenko @ 2003-11-23 8:52 UTC
To: linux-ia64
Hi. I'm currently running Debian unstable with the prepackaged 2.4.20 SMP
kernel, but the problem first appeared earlier under Red Hat, so I thought
maybe you could give me a good guess as to why this might happen.
I have a really weird problem with estimating how fast my Itanium is.
I have a really simple/stupid program: a loop which repeats an iterative
approximation of PI 1000 times.
First I compiled it under the shipped Red Hat... it ran in 12 sec, and I was
happy because it took the same amount of time on a 64-bit AMD Opteron...
Then I played with how fast ia32 emulation is (the same program compiled
for ia32 took about 2-3 minutes). But then at some point after I recompiled
it back to ia64, it started running almost as slow as the ia32 build,
though file says
a.out: ELF 64-bit LSB executable, IA-64 (Intel 64 bit architecture)
version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared
libs), not stripped
I reinstalled Linux, this time with Debian... The same story: that damn
program, which runs fast on all other architectures, runs stupidly slow as
a 64-bit binary on Itanium...
What could I have screwed up by checking ia32 emulation so 'thoroughly'??
What can I check? I've checked /proc/acpi/cpu but didn't find anything
useful/weird there...
P.S. bogomips run from the command line reports just ~400, while
/proc/cpuinfo shows 1400...
WHAT is WRONG??? Please advise.
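The program itself was not posted to the list. A minimal sketch of the kind of benchmark described above, assuming a Leibniz-series approximation of PI repeated a fixed number of times. The series choice, the function names, and the sizes are all assumptions, and Python is used only for brevity; the original was a compiled binary.

```python
import math
import time


def approximate_pi(terms):
    """Approximate PI with the Leibniz series: 4 * sum((-1)^k / (2k + 1))."""
    total = 0.0
    sign = 1.0
    for k in range(terms):
        total += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * total


def run_benchmark(repeats, terms):
    """Run the approximation `repeats` times; return (last result, seconds)."""
    start = time.perf_counter()
    result = 0.0
    for _ in range(repeats):
        result = approximate_pi(terms)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sizes, kept small so the demo finishes quickly.
    value, seconds = run_benchmark(repeats=10, terms=100_000)
    print(f"pi ~= {value:.6f} (error {abs(value - math.pi):.2e}), {seconds:.3f} s")
```

Timing the same build several times on an otherwise idle machine gives a baseline to compare against before and after any change.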
Yaroslav Halchenko
Keep in touch: (yoh@|www.)onerussian.com
ICQ#: 60653192
Linux User [175555]
Key http://www.onerussian.com/gpg-yoh.asc
GPG fingerprint 3BB6 E124 0643 A615 6F00 6854 8D11 4563 75C0 24C8
* RE: weird speed problem
From: Luck, Tony @ 2003-11-24 16:52 UTC
To: linux-ia64
> First I've compiled it under shipped red hat... it ran 12 sec, I was
> happy because it took the same amount of time for 64bit amd opteron...
> Then I played with how fast ia32 emulation is (took it about 2-3 minutes
> for the same but ia32 bit compiled program). But then at some point
> after I recompiled it back to ia64 it started running almost as slow as
> ia32
One _potential_ reason why it ran fast one time, and slow on other runs
is the lack of page colouring in Linux. If the working set of your test
program is some large percentage of the cache size, then you can hit this
on any architecture, not just ia64.
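To make the page colouring point concrete, here is a rough sketch under a simplified model: a physically indexed, set-associative cache where a page's colour is its frame number modulo the number of pages that fit in one way. The cache geometry, the function names, and the random frame assignment are illustrative assumptions, not measurements of any real machine.

```python
import random
from collections import Counter


def num_colours(cache_bytes, ways, page_bytes):
    """Number of page colours: pages that fit in one way of the cache."""
    return max(1, cache_bytes // (ways * page_bytes))


def page_colour(frame, cache_bytes, ways, page_bytes):
    """Colour of a physical page frame number."""
    return frame % num_colours(cache_bytes, ways, page_bytes)


def overcommitted_colours(frames, cache_bytes, ways, page_bytes):
    """Count colours holding more pages than the cache has ways."""
    per_colour = Counter(
        page_colour(f, cache_bytes, ways, page_bytes) for f in frames
    )
    return sum(1 for count in per_colour.values() if count > ways)


if __name__ == "__main__":
    # Hypothetical geometry: 1 MiB, 4-way cache with 4 KiB pages (64 colours).
    cache, ways, page = 1 << 20, 4, 4096
    working_set_pages = 256  # 1 MiB working set, same size as the cache

    contiguous = range(working_set_pages)
    scattered = random.Random(0).sample(range(1 << 20), working_set_pages)

    print("contiguous:", overcommitted_colours(contiguous, cache, ways, page))
    print("scattered: ", overcommitted_colours(scattered, cache, ways, page))
```

In this model a contiguous working set spreads evenly over the colours, while frames handed out without regard to colour can pile onto a few colours and evict each other. Which frames a process gets differs from run to run, which is why timings can differ too.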
-Tony Luck
* RE: weird speed problem
From: Luck, Tony @ 2003-11-24 20:00 UTC
To: linux-ia64
> One _potential_ reason why it ran fast one time, and slow on other runs
> is the lack of page colouring in Linux. If the working set of your test
> program is some large percentage of the cache size, then you can hit this
> on any architecture, not just ia64.
Following up to myself (in response to private e-mail requesting
clarification).
You may be able to determine whether this is your problem
by using Stephane's "pfmon" tool to count cache misses at
various levels of the cache hierarchy, and comparing these
numbers from run to run. If you see wildly varying numbers,
and your system is idle apart from the test program, then
lack of cache colouring is probably the issue.
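The run-to-run comparison can be sketched as follows. The counts have to be collected separately (for example with pfmon); the sample numbers, the function names, and the 0.5 threshold below are all hypothetical.

```python
import statistics


def relative_spread(counts):
    """(max - min) / mean of a list of per-run event counts."""
    mean = statistics.mean(counts)
    if mean == 0:
        return 0.0
    return (max(counts) - min(counts)) / mean


def looks_unstable(counts, threshold=0.5):
    """True when run-to-run spread exceeds an arbitrary threshold."""
    return relative_spread(counts) > threshold


if __name__ == "__main__":
    # Hypothetical miss counts from five runs of the same binary.
    l2_misses = [1_200_000, 1_150_000, 9_800_000, 1_300_000, 7_400_000]
    print(f"spread: {relative_spread(l2_misses):.2f}")
    print("wildly varying" if looks_unstable(l2_misses) else "stable")
```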
You may be able to work around the lack of cache colouring by
using hugetlbfs to allocate memory, since you are guaranteed
to get contiguous physical memory that will line up neatly
in your cache. Though this may be overkill (since the hugetlbfs
page size is generally much bigger than your cache size, you'll
be forced to allocate far more memory than your application
needs).
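A quick way to see the size of that overhead, as a sketch: round the working set up to whole huge pages. The 1 MiB working set and the 256 MiB huge page size are hypothetical numbers; the real size is the "Hugepagesize:" line in /proc/meminfo, and the helper names are made up for illustration.

```python
def parse_hugepage_bytes(meminfo_text):
    """Extract the huge page size in bytes from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            fields = line.split()
            return int(fields[1]) * 1024  # value is reported in kB
    return None


def hugepage_allocation(need_bytes, hugepage_bytes):
    """Return (pages, allocated bytes, unused bytes) for a request."""
    pages = -(-need_bytes // hugepage_bytes)  # ceiling division
    allocated = pages * hugepage_bytes
    return pages, allocated, allocated - need_bytes


if __name__ == "__main__":
    # Hypothetical numbers: a 1 MiB working set and a 256 MiB huge page.
    sample = "MemTotal: 4000000 kB\nHugepagesize:   262144 kB\n"
    huge = parse_hugepage_bytes(sample)
    pages, allocated, unused = hugepage_allocation(1 << 20, huge)
    print(f"{pages} huge page(s), {allocated >> 20} MiB allocated, "
          f"{unused >> 20} MiB unused")
```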
-Tony
* RE: weird speed problem
From: David Mosberger @ 2003-11-24 21:21 UTC
To: linux-ia64
>>>>> On Mon, 24 Nov 2003 12:00:09 -0800, "Luck, Tony" <tony.luck@intel.com> said:
Tony> You may be able to determine whether this is your problem by
Tony> using Stephane's "pfmon" tool to count cache misses at various
Tony> levels of the cache hierarchy, and comparing these numbers
Tony> from run to run. If you see wildly varying numbers, and your
Tony> system is idle apart from the test program, then lack of cache
Tony> colouring is probably the issue.
Sounds like a good suggestion to me, but before doing that, I'd
recommend collecting a simple profile, just to see if anything
obvious is going wrong (like unaligned accesses, lots of fpswa faults,
or similar).
Hans's qprof tool might come in handy for that:
http://www.hpl.hp.com/research/linux/qprof/
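On the fpswa faults mentioned above: one common trigger on ia64 is arithmetic on denormal (subnormal) values, which is handed to the software assist handler. Whether that applies to the program in this thread is an assumption; nothing here confirms it. A small sketch (helper names are made up) that checks whether a decaying term, as an iterative series might produce, ever becomes subnormal:

```python
import sys


def is_subnormal(x):
    """True for nonzero doubles smaller in magnitude than the smallest normal."""
    return x != 0.0 and abs(x) < sys.float_info.min


def first_subnormal_step(start, factor, max_steps):
    """Multiply `start` by `factor` repeatedly; return the first step whose
    result is subnormal, or None if that never happens within max_steps."""
    value = start
    for step in range(1, max_steps + 1):
        value *= factor
        if is_subnormal(value):
            return step
    return None


if __name__ == "__main__":
    # Hypothetical decaying term: halve 1.0 until it drops below the
    # smallest normal double.
    print(first_subnormal_step(1.0, 0.5, 2000))
```

If the real program hits values like these in its inner loop, a profile should show time going to the fault handler rather than to the loop itself.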
Also, I have a not-yet-released tool which can collect call-counts
(similar to gprof, but without recompilation). I hope to release it
sometime next week or shortly thereafter, but if someone screams
loudly enough, I might consider making a quick but totally unsupported
snapshot of what I have.
--david
* Re: weird speed problem
From: Yaroslav Halchenko @ 2003-11-24 21:44 UTC
To: linux-ia64
Thank you guys for the ideas and tools!
I will look into them...
For now I've just run the nbench tools to estimate performance... the
results are not that ridiculously bad, so I probably messed something up
in that small program of mine...
If you're interested, here are some comparison results, relative to the
nbench baseline:
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
---------------------------
Itanium (HP rx2600 server), dual CPU, each reporting:
vendor : GenuineIntel
arch : IA-64
family : Itanium 2
model : 0
revision : 7
archrev : 0
features : branchlong
cpu number : 0
cpu regs : 4
cpu MHz : 900.000000
itc MHz : 900.000000
BogoMIPS : 1346.37
results:
MEMORY INDEX : 3.124
INTEGER INDEX : 5.144
FLOATING-POINT INDEX: 7.387
Opteron, dual CPU, each reporting:
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 240
stepping : 1
cpu MHz : 1403.219
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall mmxext lm 3dnowext
3dnow
bogomips : 2804.94
results in 32bits:
MEMORY INDEX : 9.101
INTEGER INDEX : 7.426
FLOATING-POINT INDEX: 14.897
results in 64bits:
MEMORY INDEX : 9.839
INTEGER INDEX : 9.880
FLOATING-POINT INDEX: 13.895
which is a bit confusing as far as floating point on the Opterons goes: the
32-bit build scores higher than the 64-bit one....
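Since all the indices are relative to the same K6/233 baseline, they can be divided directly. A short sketch that computes the ratios from the numbers above (the dictionary keys and the function name are just labels chosen here):

```python
# Index values copied from the nbench results above.
RESULTS = {
    "itanium2_900": {"memory": 3.124, "integer": 5.144, "float": 7.387},
    "opteron_32bit": {"memory": 9.101, "integer": 7.426, "float": 14.897},
    "opteron_64bit": {"memory": 9.839, "integer": 9.880, "float": 13.895},
}


def ratios(numerator, denominator):
    """Per-index ratio of two result sets."""
    return {
        name: RESULTS[numerator][name] / RESULTS[denominator][name]
        for name in RESULTS[numerator]
    }


if __name__ == "__main__":
    for name, value in ratios("opteron_64bit", "itanium2_900").items():
        print(f"opteron 64-bit / itanium {name}: {value:.2f}")
    for name, value in ratios("opteron_64bit", "opteron_32bit").items():
        print(f"opteron 64-bit / 32-bit {name}: {value:.2f}")
```

Note that the two machines run at different clock speeds (900 MHz versus about 1400 MHz), so these ratios compare the boxes as configured, not the architectures clock for clock.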