linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
To: linux-arm-kernel@lists.infradead.org
Subject: udelay() broken for SMP cores?
Date: Sat, 8 Jan 2011 23:24:27 +0000	[thread overview]
Message-ID: <20110108232427.GA19891@n2100.arm.linux.org.uk> (raw)
In-Reply-To: <20100422001417.GF27575@shareable.org>

On Thu, Apr 22, 2010 at 01:14:17AM +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > Ok, since you seem to have a clear idea how to convert this into a double
> > nested loop, try converting it:
> > 
> > 						@ 0 <= r0 <= 0x7fffff06
> >                 ldr     r2, .LC0 (loops_per_jiffy)
> >                 ldr     r2, [r2]                @ max = 0x01ffffff
> >                 mov     r0, r0, lsr #14         @ max = 0x0001ffff
> >                 mov     r2, r2, lsr #10         @ max = 0x00007fff
> >                 mul     r0, r2, r0              @ max = 2^32-1
> >                 movs    r0, r0, lsr #6
> >                 moveq   pc, lr
> > 1:              subs    r0, r0, #1
> >                 bhi     1b
> >                 mov     pc, lr
> > 
> > into two loops without losing the precision - note that the multiply
> > is part of a 'dividing by multiply+shift' technique.
> 
> 	ldr	r2, loops_per_jiffy
> 	ldr	r3, microseconds_per_jiffy
>         mov     r4, r2
> 1:      subs    r4, r4, r3
>         bhi     1b
>         subs    r0, r0, #1
>         add     r4, r4, r2
>         bhi     1b
>         mov     pc, lr
> 
> Goodnight :)

I thought I'd dig this out and give it a go - but it has problems.  Let's
say usec_per_jiffy is 10000.  Initially, loops_per_jiffy is 1<<12 or 4096
at boot.

If udelay() is used prior to calibration (it is - see things like OMAP/8250
console drivers which use udelay(1)), the initial loops_per_jiffy value
will be used.

So, r0 = 10000.  r3 = 10000.  r2 = 4096.

	mov	r4, r2		@ r4 := 4096
1:	subs	r4, r4, r3	@ r4 -= 10000 := -5904
	bhi	1b		@ not taken
	subs	r0, r0, #1	@ r0 -= 1 := 9999
	add	r4, r4, r2	@ r4 += 4096 := -1808 (or 4294965488)
	bhi	1b		@ taken

That's the first iteration.  The next iteration:

1:	subs	r4, r4, r3	@ r4 -= 10000 := 4294955488
	bhi	1b		@ taken
1:	subs	r4, r4, r3	@ r4 -= 10000 := 4294945488
	bhi	1b		@ taken
... which means we have about 429493 loops to go ...

So this becomes an extremely slow loop - it works when loops_per_jiffy >
usec_per_jiffy.

Even with a value of 8192 (the first tried lpj in the calibration loop),
things eventually go wrong - r4 on each iteration goes -1808, -3616, ..
-9040 and then we're into the problem above - and this will be the case
for anyone with HZ=100.

So, this solution has undesirable behaviours... and this is what I've
come up with - we manually increase the lpj and decrease the required
delay by a power of two until we meet the necessary preconditions
(lpj >= usec/jiffy).

.LC0:		.long	loops_per_jiffy
		.long	(1000000 + (HZ / 2))/HZ

ENTRY(__delay)
                ldr     r3, .LC0 + 4		@ usec/jiffy
                mov     r2, r0
                mov     r0, r3
                b       2f
ENTRY(__udelay)
                ldr     r2, .LC0
                ldr     r3, .LC0 + 4		@ usec/jiffy
                ldr     r2, [r2]		@ lpj
2:              cmp     r2, r3
                movcc   r2, r2, lsl #1
                movcc   r0, r0, lsr #1
                bcc     2b
                mov     ip, r2
1:              subs    ip, ip, r3
                addls   ip, ip, r2
                sublss  r0, r0, #2
                bhi     1b
                mov     pc, lr

Note that I've also tweaked the loop a little to make the cycle count
(in theory) around the loop the same no matter what it does.

This way, I get the same lpj calibration value as the old way - which is
good as with the old way, we were calibrating just this loop:

1:		subs	r0, r0, #1
		bhi	1b
		mov	pc, lr

where r0 = lpj and the target delay time was 1 jiffy.

Now, what sparked this off was:

> > We could go to ns delays, but then we have a big problem - the cost of
> > calculating the number of loops starts to become significant compared to
> > the delays - and that's a quality of implementation factor.  In fact,
> > the existing cost has always been significant for short delays for
> > slower (sub-100MHz) ARMs.
> 
> I'm surprised it makes much difference to, say, 20MHz ARMs because you
> could structure it as a nested loop, the inner one executed once per
> microsecond and calibrated to 1us.  The delays don't have to be super
> accurate.

With this nested loop approach we can't go to ns resolution.
nsec_per_jiffy would be 10000000, and with an initial loops_per_jiffy
of 4096 or 8192, this would be extremely bad.

That said, I do think your approach has merit - especially as we're
now seeing CPUs in the 2000 BogoMips range, and our existing solution
goes bad at 3355 BogoMips.  As the board I have is something like 8
months old we've probably got what, 10 months left according to Moore's
law?

  reply	other threads:[~2011-01-08 23:24 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-21  2:19 udelay() broken for SMP cores? Saravana Kannan
2010-04-21  4:56 ` Shilimkar, Santosh
2010-04-21  6:43   ` Saravana Kannan
2010-04-21  7:22     ` Russell King - ARM Linux
2010-04-21  9:39       ` skannan at codeaurora.org
2010-04-21  9:50         ` Russell King - ARM Linux
2010-04-21  9:58           ` Gilles Chanteperdrix
2010-04-21 10:00           ` Jamie Lokier
2010-04-21 19:29             ` Russell King - ARM Linux
2010-04-21 19:52               ` Jamie Lokier
2010-04-21 20:21                 ` Russell King - ARM Linux
2010-04-21 20:47                   ` Jamie Lokier
2010-04-21 20:57                     ` Russell King - ARM Linux
2010-04-22  0:14                       ` Jamie Lokier
2011-01-08 23:24                         ` Russell King - ARM Linux [this message]
2010-04-21 10:31           ` skannan at codeaurora.org
2010-04-21 19:33             ` Russell King - ARM Linux
2010-04-21 23:47               ` Saravana Kannan
2010-04-23  9:00           ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110108232427.GA19891@n2100.arm.linux.org.uk \
    --to=linux@arm.linux.org.uk \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).