* [Linux-ia64] Re: strange performance behaviour with floats
@ 2003-02-24 1:45 Keith Owens
2003-02-24 1:50 ` David Mosberger
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Keith Owens @ 2003-02-24 1:45 UTC (permalink / raw)
To: linux-ia64
On Fri, 21 Feb 2003 18:30:32 -0800,
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>1-cycle loops are
>never optimal on McKinley (that's why the Linux bogomips comes out at
>1438 instead of 2000, for example), though I don't know the exact
>micro-architectural details that cause this.
--- include/asm-ia64/delay.h
+++ include/asm-ia64/delay.h
@@ -71,7 +71,7 @@
__asm__ __volatile__("mov %0=ar.lc;;" : "=r"(saved_ar_lc));
__asm__ __volatile__("mov ar.lc=%0;;" :: "r"(loops - 1));
- __asm__ __volatile__("1:\tbr.cloop.sptk.few 1b;;");
+ __asm__ __volatile__("1:\tnop 0;nop 0;nop 0;br.cloop.sptk.few 1b;;");
__asm__ __volatile__("mov ar.lc=%0" :: "r"(saved_ar_lc));
}
generated a two bundle loop as you suggested, but BogoMIPS went down,
not up.
Original code, one bundle br.cloop:
CPU 0: base freq=200.000MHz, ITC ratio=9/2, ITC freq=900.000MHz
Calibrating delay loop... 1347.52 BogoMIPS
Modified code, two bundle br.cloop:
CPU 0: base freq=200.000MHz, ITC ratio=9/2, ITC freq=900.000MHz
Calibrating delay loop... 898.68 BogoMIPS
processor : 0
vendor : GenuineIntel
arch : IA-64
family : Itanium 2
model : 0
revision : 6
archrev : 0
features : branchlong
cpu number : 0
cpu regs : 4
cpu MHz : 900.000000
itc MHz : 900.000000
* [Linux-ia64] Re: strange performance behaviour with floats
From: David Mosberger @ 2003-02-24 1:50 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 24 Feb 2003 12:45:10 +1100, Keith Owens <kaos@sgi.com> said:
Keith> generated a two bundle loop as you suggested, but BogoMIPS went down,
Keith> not up.
Note: 2 bundle != 2 cycle, but even ignoring that: what did you
expect? BogoMIPS counts 2 instructions per loop iteration no matter
how many instructions are being executed. Perhaps you can get the
compiler to unroll the loop. Then you'd see a higher BogoMIPS.
--david
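A rough check of the numbers in this exchange, offered as a sketch and not taken from the kernel source: if BogoMIPS counts 2 instructions per loop iteration, as stated above, then the cycles spent per iteration follow from the clock rate and the reported figure. The function name cycles_per_iteration is hypothetical, and the calculation assumes the delay loop runs at the 900 MHz clock shown in the cpuinfo-style output of the first message.

```c
/*
 * Cycles per delay-loop iteration implied by a BogoMIPS figure.
 * Assumption: BogoMIPS credits exactly 2 instructions per iteration,
 * so (bogomips / 2) is the number of iterations per microsecond,
 * and cpu_mhz is the number of cycles per microsecond.
 */
static double cycles_per_iteration(double cpu_mhz, double bogomips)
{
    double iterations_per_usec = bogomips / 2.0;

    return cpu_mhz / iterations_per_usec;
}
```

With the 900 MHz clock, 1347.52 BogoMIPS works out to about 1.34 cycles per iteration for the one-bundle loop, and 898.68 BogoMIPS to about 2.00 cycles for the two-bundle loop. The padded loop takes two cycles but is still credited with only 2 instructions, which is why the figure dropped.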
* [Linux-ia64] Re: strange performance behaviour with floats
From: Keith Owens @ 2003-02-24 2:01 UTC (permalink / raw)
To: linux-ia64
On Sun, 23 Feb 2003 17:50:41 -0800,
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>>>>>> On Mon, 24 Feb 2003 12:45:10 +1100, Keith Owens <kaos@sgi.com> said:
>
> Keith> generated a two bundle loop as you suggested, but BogoMIPS went down,
> Keith> not up.
>
>Note: 2 bundle != 2 cycle, but even ignoring that: what did you
>expect? BogoMIPS counts 2 instructions per loop iteration no matter
>how many instructions are being executed. Perhaps you can get the
>compiler to unroll the loop. Then you'd see a higher BogoMIPS.
Which loop needs unrolling? __delay generates
2d0: 11 00 00 00 01 00 [MIB] nop.m 0x0
2d6: 00 70 04 55 00 00 mov.i ar.lc=r14
2dc: 00 00 00 20 nop.b 0x0;;
2e0: 11 00 00 00 01 00 [MIB] nop.m 0x0
2e6: 00 00 00 02 00 a0 nop.i 0x0
2ec: 00 00 00 40 br.cloop.sptk.few 2e0 <calibrate_delay+0x100>;;
br.cloop is already a single bundle loop.
* [Linux-ia64] Re: strange performance behaviour with floats
From: David Mosberger @ 2003-02-24 2:16 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 24 Feb 2003 13:01:10 +1100, Keith Owens <kaos@sgi.com> said:
Keith> Which loop needs unrolling? __delay generates
Keith> 2d0: 11 00 00 00 01 00 [MIB] nop.m 0x0
Keith> 2d6: 00 70 04 55 00 00 mov.i ar.lc=r14
Keith> 2dc: 00 00 00 20 nop.b 0x0;;
Keith> 2e0: 11 00 00 00 01 00 [MIB] nop.m 0x0
Keith> 2e6: 00 00 00 02 00 a0 nop.i 0x0
Keith> 2ec: 00 00 00 40 br.cloop.sptk.few 2e0 <calibrate_delay+0x100>;;
Keith> br.cloop is already a single bundle loop.
You're toying with me, right? ;-)
Let me say this again: you _don't_ want a single-cycle loop. You want
a 2-cycle loop that gets twice the work done as a 1-cycle loop. That
is, you'd want to decrement the loop counter by 2, compare it against
zero, and branch if it's not zero yet, all the while making sure you
get a 2-cycle loop.
--david
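The loop shape described above can be sketched in portable C. This is only an illustration of the counting, not the kernel's __delay() and not ia64 assembly: the name delay_by_two is hypothetical, and whether a compiler schedules the body as a 2-cycle loop on McKinley is not something this sketch can guarantee. It returns the iteration count so the halving is visible; a real delay loop would need inline asm or a volatile counter to keep the compiler from removing it.

```c
/*
 * Count down by 2 per iteration, so each pass through the loop
 * accounts for two units of the requested delay.
 * Assumption: loops is less than ULONG_MAX, so rounding up to an
 * even value cannot wrap around.
 */
static unsigned long delay_by_two(unsigned long loops)
{
    /* Round odd counts up so the counter lands exactly on zero. */
    unsigned long remaining = loops + (loops & 1UL);
    unsigned long iterations = 0;

    while (remaining != 0) {
        remaining -= 2;   /* decrement the loop counter by 2 */
        iterations++;     /* compare against zero and branch */
    }
    return iterations;
}
```

A request for 10 loops runs 5 iterations and a request for 7 runs 4, so if each iteration costs two cycles, the loop gets through one unit of delay per cycle on average.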
* [Linux-ia64] Re: strange performance behaviour with floats
From: Keith Owens @ 2003-02-24 2:25 UTC (permalink / raw)
To: linux-ia64
On Sun, 23 Feb 2003 18:16:57 -0800,
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>Let me say this again: you _don't_ want a single-cycle loop. You want
>a 2-cycle loop that gets twice the work done as a 1-cycle loop. That
>is, you'd want to decrement the loop counter by 2, compare it against
>zero, and branch if it's not zero yet, all the while making sure you
>get a 2-cycle loop.
Now I see what you are getting at, sorry for the confusion.
Of course, the fact that udelay() does not use bogomips at all makes
the whole question irrelevant. We now return you to our regularly
scheduled delay loop :)
* [Linux-ia64] Re: strange performance behaviour with floats
From: David Mosberger @ 2003-02-24 19:12 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 24 Feb 2003 13:25:12 +1100, Keith Owens <kaos@sgi.com> said:
Keith> On Sun, 23 Feb 2003 18:16:57 -0800,
Keith> David Mosberger <davidm@napali.hpl.hp.com> wrote:
>> Let me say this again: you _don't_ want a single-cycle loop. You want
>> a 2-cycle loop that gets twice the work done as a 1-cycle loop. That
>> is, you'd want to decrement the loop counter by 2, compare it against
>> zero, and branch if it's not zero yet, all the while making sure you
>> get a 2-cycle loop.
Keith> Now I see what you are getting at, sorry for the confusion.
No problem. ;-)
Keith> Of course, the fact that udelay() does not use bogomips at all makes
Keith> the whole question irrelevant. We now return you to our regularly
Keith> scheduled delay loop :)
True, but the code gcc generates for the BogoMIPS loops basically
looks identical to the one for udelay(), so the comments still apply.
--david