linux-arm-kernel.lists.infradead.org archive mirror
* [RFC] ARM: lib: delay-loop: Add align directive to fix BogoMIPS calculation
From: Fabio Estevam @ 2013-11-22 11:53 UTC
  To: linux-arm-kernel

From: Fabio Estevam <fabio.estevam@freescale.com>

Currently, an mx53 (Cortex-A8) running at 1 GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that when __loop_delay starts at an address ending in 0x0 or
0x8 (i.e. 8-byte aligned), its two loop instructions execute in one clock cycle
(1 clock/loop), while at addresses ending in 0x4 or 0xc the loop takes 3 clocks
to run twice (1.5 clocks/loop).

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr

0000003c <__loop_delay>:
  3c:	e2500001 	subs	r0, r0, #1
  40:	8afffffe 	bhi	3c <__loop_delay>
  44:	e1a0f00e 	mov	pc, lr

After adding a '.align 3' directive before __loop_delay (2^3 = 8-byte alignment), the function moves to 0x40:

00000010 <__loop_const_udelay>:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 <__loop_udelay-0x8>
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr
  3c:	e320f000 	nop	{0}

00000040 <__loop_delay>:
  40:	e2500001 	subs	r0, r0, #1
  44:	8afffffe 	bhi	40 <__loop_delay>
  48:	e1a0f00e 	mov	pc, lr
  4c:	e320f000 	nop	{0}

With this change, the mx53 now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz, after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (Cortex-A9) and mx27 (ARM926), both of which show the same
BogoMIPS value before and after this patch.

Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
---
 arch/arm/lib/delay-loop.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/lib/delay-loop.S b/arch/arm/lib/delay-loop.S
index 36b668d..5e5673b8 100644
--- a/arch/arm/lib/delay-loop.S
+++ b/arch/arm/lib/delay-loop.S
@@ -40,6 +40,7 @@ ENTRY(__loop_const_udelay)			@ 0 <= r0 <= 0x7fffff06
 /*
  * loops = r0 * HZ * loops_per_jiffy / 1000000
  */
+		.align 3
 
 @ Delay routine
 ENTRY(__loop_delay)
-- 
1.8.1.2

From: Fabio Estevam @ 2013-11-29 11:02 UTC
  To: linux-arm-kernel

Hi Russell,

On Fri, Nov 22, 2013 at 9:53 AM, Fabio Estevam <festevam@gmail.com> wrote:
> [...]

Any comments on this, please?

Regards,

Fabio Estevam

From: Russell King - ARM Linux @ 2013-11-30 11:49 UTC
  To: linux-arm-kernel

On Fri, Nov 29, 2013 at 09:02:05AM -0200, Fabio Estevam wrote:
> Hi Russell,
> 
> On Fri, Nov 22, 2013 at 9:53 AM, Fabio Estevam <festevam@gmail.com> wrote:
> > [...]
> 
> Any comments on this, please?

Any chance that you could run hackbench, and build the kernel with
-falign-functions=32, comparing the kernel without and with this
option?

If alignment has as much effect as the above suggests, the results
may be interesting.

As far as this patch is concerned, I'm happy with it, please put it in
the patch system, thanks.

Thanks.

From: Fabio Estevam @ 2013-11-30 14:31 UTC
  To: linux-arm-kernel

On Sat, Nov 30, 2013 at 9:49 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> Any chance that you could run hackbench, and build the kernel with
> -falign-functions=32, comparing the kernel without and with this
> option?
>
> If alignment has as much effect as the above suggests, the results
> may be interesting.

Yes, that would be interesting to know.

I will try to run this test next week, or maybe Tom Evans could try it
sooner if he has a chance.

>
> As far as this patch is concerned, I'm happy with it, please put it in
> the patch system, thanks.

Sent it as 7907/1.

Regards,

Fabio Estevam
