* RE: weird speed problem
2003-11-23 8:52 weird speed problem Yaroslav Halchenko
@ 2003-11-24 16:52 ` Luck, Tony
2003-11-24 20:00 ` Luck, Tony
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Luck, Tony @ 2003-11-24 16:52 UTC
To: linux-ia64
> First I compiled it under the shipped Red Hat... it ran in 12 sec, and I was
> happy because it took the same amount of time on a 64-bit AMD Opteron...
> Then I played with how fast ia32 emulation is (it took about 2-3 minutes
> for the same program compiled for ia32). But then at some point,
> after I recompiled it back for ia64, it started running almost
> as slowly as the ia32 version
One _potential_ reason why it ran fast one time, and slow on other runs
is the lack of page colouring in Linux. If the working set of your test
program is some large percentage of the cache size, then you can hit this
on any architecture, not just ia64.
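To make the page-colouring point concrete, here is a minimal Python sketch of a simple model, assuming a physically indexed cache whose size, associativity and page size are made-up illustrative values (not the geometry of any particular Itanium 2 or Opteron). In this model a page's colour is its physical frame number modulo the number of colours; if more live pages share a colour than the cache has ways, they evict each other even though the total working set fits in the cache. Because frames are handed out without regard to colour, the conflict count can change from run to run.

```python
import random
from collections import Counter

# Illustrative cache geometry (assumed values, not a specific CPU).
PAGE_SIZE = 16 * 1024       # bytes per page
CACHE_SIZE = 1024 * 1024    # bytes in a physically indexed cache
WAYS = 4                    # set associativity


def num_colours(cache_size, page_size, ways):
    """Number of page colours: groups of pages that compete for the same cache sets."""
    return cache_size // (page_size * ways)


def conflicting_pages(pfns, colours, ways):
    """Count pages beyond their colour's capacity (at most `ways` pages per colour fit)."""
    per_colour = Counter(pfn % colours for pfn in pfns)
    return sum(max(0, n - ways) for n in per_colour.values())


if __name__ == "__main__":
    colours = num_colours(CACHE_SIZE, PAGE_SIZE, WAYS)
    pages = CACHE_SIZE // PAGE_SIZE  # working set exactly the size of the cache
    # Physically contiguous frames spread evenly over all colours.
    print("contiguous:", conflicting_pages(range(pages), colours, WAYS))
    # Frames picked at random from a large pool, as an uncoloured allocator might return them.
    for seed in range(3):
        pfns = random.Random(seed).sample(range(1 << 20), pages)
        print("run", seed, "conflicting pages:", conflicting_pages(pfns, colours, WAYS))
```

Each seeded "run" stands in for a fresh execution of the test program that happened to receive different physical frames.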
-Tony Luck
* RE: weird speed problem
From: Luck, Tony @ 2003-11-24 20:00 UTC
To: linux-ia64
> One _potential_ reason why it ran fast one time, and slow on other runs
> is the lack of page colouring in Linux. If the working set of your test
> program is some large percentage of the cache size, then you can hit this
> on any architecture, not just ia64.
Following up to myself (in response to private e-mail requesting
clarification).
You may be able to determine whether this is your problem
by using Stephane's "pfmon" tool to count cache misses at
various levels of the cache hierarchy, and comparing these
numbers from run to run. If you see wildly varying numbers,
and your system is idle apart from the test program, then
lack of cache colouring is probably the issue.
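Once you have per-run miss totals for each cache level (copied from pfmon's output; pfmon's own event names and options are deliberately not reproduced here), judging whether they vary "wildly" is simple arithmetic. A minimal Python sketch, assuming the coefficient of variation (sample standard deviation divided by mean) as the spread measure and an arbitrary 10% cut-off; the level labels and counts in the usage lines are hypothetical placeholders, not measurements.

```python
import statistics


def coefficient_of_variation(counts):
    """Sample standard deviation divided by the mean (unit-free run-to-run spread)."""
    if len(counts) < 2:
        raise ValueError("need counts from at least two runs")
    mean = statistics.mean(counts)
    return statistics.stdev(counts) / mean if mean else 0.0


def unstable_levels(misses_per_level, threshold=0.10):
    """Return cache levels whose miss counts spread more than `threshold` across runs.

    misses_per_level maps a label (e.g. "L2") to one miss count per run.
    The 0.10 default is an arbitrary illustrative cut-off.
    """
    return sorted(level for level, counts in misses_per_level.items()
                  if coefficient_of_variation(counts) > threshold)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    runs = {
        "L2": [1_000_000, 1_020_000, 990_000],
        "L3": [50_000, 400_000, 60_000],
    }
    print(unstable_levels(runs))
```

As the text says, this only means something if the machine is otherwise idle, so that the spread comes from the test program itself.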
You may be able to work around the lack of cache colouring by
using hugetlbfs to allocate memory, since you are guaranteed
to get contiguous physical memory that will line up neatly
in your cache. This may be overkill, though: since the hugetlbfs
page size is generally much bigger than your cache size, you'll
be forced to allocate far more memory than your application
needs.
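Both halves of that trade-off can be put into numbers. A minimal Python sketch, assuming a base page size, huge page size and colour count that you supply yourself (the 16 KiB, 256 MiB and 16-colour figures in the usage lines are illustrative assumptions, not values read from any kernel): it rounds a working set up to whole huge pages, and checks that base pages taken from one physically contiguous region cycle through every colour, so no colour holds more than one page above any other.

```python
from collections import Counter


def hugepage_footprint(working_set, hugepage_size):
    """Bytes actually reserved when `working_set` is rounded up to whole huge pages."""
    if working_set <= 0 or hugepage_size <= 0:
        raise ValueError("sizes must be positive")
    pages = -(-working_set // hugepage_size)  # ceiling division
    return pages * hugepage_size


def colour_imbalance(start_pfn, n_pages, colours):
    """Max minus min pages per colour for n_pages physically contiguous frames."""
    if n_pages <= 0:
        return 0
    per_colour = Counter((start_pfn + i) % colours for i in range(n_pages))
    return max(per_colour.values()) - min(per_colour.get(c, 0) for c in range(colours))


if __name__ == "__main__":
    KiB, MiB = 1024, 1024 * 1024
    working_set = 3 * MiB
    reserved = hugepage_footprint(working_set, 256 * MiB)  # assumed huge page size
    print("reserved MiB:", reserved // MiB, "unused MiB:", (reserved - working_set) // MiB)
    # 3 MiB of 16 KiB base pages (assumed) inside one huge page, 16 colours (assumed).
    print("colour imbalance:", colour_imbalance(0, working_set // (16 * KiB), 16))
```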
-Tony
* RE: weird speed problem
From: David Mosberger @ 2003-11-24 21:21 UTC
To: linux-ia64
>>>>> On Mon, 24 Nov 2003 12:00:09 -0800, "Luck, Tony" <tony.luck@intel.com> said:
Tony> You may be able to determine whether this is your problem by
Tony> using Stephane's "pfmon" tool to count cache misses at various
Tony> levels of the cache hierarchy, and comparing these numbers
Tony> from run to run. If you see wildly varying numbers, and your
Tony> system is idle apart from the test program, then lack of cache
Tony> colouring is probably the issue.
Sounds like a good suggestion to me, but before doing that, I'd
recommend collecting a simple profile, just to see whether anything
obvious is going wrong (like unaligned accesses, lots of fpswa
(floating-point software assist) faults, or similar).
Hans's qprof tool might come in handy for that:
http://www.hpl.hp.com/research/linux/qprof/
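For readers unfamiliar with what a "simple profile" involves: a sampling profiler periodically records the program counter, and a flat profile is those samples attributed to the function whose address range contains them. A minimal Python sketch of that attribution step only, assuming a symbol table given as (start address, name) pairs where each function extends to the next start; the addresses and function names are hypothetical, and this is not a description of how qprof itself is implemented.

```python
import bisect
from collections import Counter


def flat_profile(samples, symbols):
    """Attribute sampled program-counter values to functions.

    samples: iterable of sampled instruction addresses.
    symbols: iterable of (start_address, function_name); each function is assumed
             to extend up to the next start address.
    Addresses below the lowest start are counted as "<unknown>".
    """
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    hits = Counter()
    for pc in samples:
        i = bisect.bisect_right(starts, pc) - 1
        hits[ordered[i][1] if i >= 0 else "<unknown>"] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical symbol table and samples, for illustration only.
    symbols = [(0x4000, "main"), (0x4200, "compute"), (0x4800, "copy_rows")]
    samples = [0x4010, 0x4250, 0x4300, 0x4310, 0x4900, 0x1000]
    for name, count in flat_profile(samples, symbols).most_common():
        print(f"{count:3d}  {name}")
```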
Also, I have a not-yet-released tool which can collect call-counts
(similar to gprof, but without recompilation). I hope to release it
sometime next week or shortly thereafter, but if someone screams
loudly enough, I might consider making a quick but totally unsupported
snapshot of what I have.
--david
* Re: weird speed problem
From: Yaroslav Halchenko @ 2003-11-24 21:44 UTC
To: linux-ia64
Thank you guys for ideas and tools!
I will look into them...
For now I've just run the nbench tools to estimate performance... The results
are not that ridiculously bad, so I probably messed something up in
my small program...
If you're interested, here are some comparison results (indices relative to the nbench baseline):
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
---------------------------
Itanium 2 (HP rx2600 server), dual-processor:
vendor : GenuineIntel
arch : IA-64
family : Itanium 2
model : 0
revision : 7
archrev : 0
features : branchlong
cpu number : 0
cpu regs : 4
cpu MHz : 900.000000
itc MHz : 900.000000
BogoMIPS : 1346.37
results:
MEMORY INDEX : 3.124
INTEGER INDEX : 5.144
FLOATING-POINT INDEX: 7.387
Opteron (dual-processor):
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 240
stepping : 1
cpu MHz : 1403.219
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall mmxext lm 3dnowext 3dnow
bogomips : 2804.94
Results in 32-bit mode:
MEMORY INDEX : 9.101
INTEGER INDEX : 7.426
FLOATING-POINT INDEX: 14.897
Results in 64-bit mode:
MEMORY INDEX : 9.839
INTEGER INDEX : 9.880
FLOATING-POINT INDEX: 13.895
which is a bit confusing as far as floating point on the Opteron goes...
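Putting the indices side by side as ratios makes the comparison easier to read. A small Python sketch that only divides the index values reported above (nbench indices are relative to the baseline machine, so a higher index means faster); it adds no new measurements. The ratios show, for example, that the Opteron's 64-bit floating-point index is slightly lower than its 32-bit one, while its memory and integer indices go up.

```python
# nbench index values as reported in the message above.
ITANIUM2 = {"memory": 3.124, "integer": 5.144, "floating-point": 7.387}
OPTERON_32 = {"memory": 9.101, "integer": 7.426, "floating-point": 14.897}
OPTERON_64 = {"memory": 9.839, "integer": 9.880, "floating-point": 13.895}


def index_ratios(a, b):
    """Per-index ratio a/b; values above 1.0 mean `a` scored higher than `b`."""
    return {name: a[name] / b[name] for name in a}


if __name__ == "__main__":
    for label, a, b in [
        ("Opteron 64-bit vs Itanium 2", OPTERON_64, ITANIUM2),
        ("Opteron 64-bit vs Opteron 32-bit", OPTERON_64, OPTERON_32),
    ]:
        ratios = index_ratios(a, b)
        print(label + ": " + ", ".join(f"{k} {v:.2f}" for k, v in ratios.items()))
```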
Yaroslav Halchenko
Keep in touch: (yoh@|www.)onerussian.com
ICQ#: 60653192
Linux User [175555]
Key http://www.onerussian.com/gpg-yoh.asc
GPG fingerprint 3BB6 E124 0643 A615 6F00 6854 8D11 4563 75C0 24C8