linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nomadik: prevent sched_clock() wraparound
@ 2010-11-16  9:11 Linus Walleij
  2010-11-16 20:53 ` john stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Walleij @ 2010-11-16  9:11 UTC (permalink / raw)
  To: linux-arm-kernel

The current implementation of sched_clock() for the Nomadik
family is based on the clock source that will wrap around without
any compensation. Currently on the Ux500 after 1030 seconds.
Utilize cnt32_to_63 to expand the sched_clock() counter to 63
bits and introduce a keepwarm() timer to assure that sched clock
and this cnt32_to_63 is called atleast once every half period.

Cc: Alessandro Rubini <rubini@unipv.it>
Cc: Colin Cross <ccross@google.com>
Cc: Rabin Vincent <rabin.vincent@stericsson.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
---
In this version of the patch I have taken the approach of making
sched_clock() based on the clock source timer, reusing the mult
and div settings from the clock source but expanding the counter
to 63 bits before doing so since sched_clock() need to be
continous.

This way I believe I take most of the mailing list feedback from
LKML into account.

This should probably go into the next rc or so since it's a very
real bug.
---
 arch/arm/plat-nomadik/timer.c |   64 ++++++++++++++++++++++++++++++++++------
 1 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/arch/arm/plat-nomadik/timer.c b/arch/arm/plat-nomadik/timer.c
index aedf9c1..69272e8 100644
--- a/arch/arm/plat-nomadik/timer.c
+++ b/arch/arm/plat-nomadik/timer.c
@@ -3,6 +3,7 @@
  *
  * Copyright (C) 2008 STMicroelectronics
  * Copyright (C) 2010 Alessandro Rubini
+ * Copyright (C) 2010 Linus Walleij
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2, as
@@ -16,11 +17,13 @@
 #include <linux/clk.h>
 #include <linux/jiffies.h>
 #include <linux/err.h>
+#include <linux/cnt32_to_63.h>
+#include <linux/timer.h>
 #include <asm/mach/time.h>
 
 #include <plat/mtu.h>
 
-void __iomem *mtu_base; /* ssigned by machine code */
+void __iomem *mtu_base; /* Assigned by machine code */
 
 /*
  * Kernel assumes that sched_clock can be called early
@@ -48,16 +51,55 @@ static struct clocksource nmdk_clksrc = {
 /*
  * Override the global weak sched_clock symbol with this
  * local implementation which uses the clocksource to get some
- * better resolution when scheduling the kernel. We accept that
- * this wraps around for now, since it is just a relative time
- * stamp. (Inspired by OMAP implementation.)
+ * better resolution when scheduling the kernel.
+ *
+ * Because the hardware timer period may be quite short
+ * (32.3 secs on the 133 MHz MTU timer selection on ux500)
+ * and because cnt32_to_63() needs to be called at least once per
+ * half period to work properly, a kernel keepwarm() timer is set up
+ * to ensure this requirement is always met.
  */
-unsigned long long notrace sched_clock(void)
+static struct timer_list cnt32_to_63_keepwarm_timer;
+
+unsigned long long sched_clock(void)
+{
+	unsigned long long v;
+
+	if (unlikely(!mtu_base))
+		return 0;
+
+	v = cnt32_to_63(-readl(mtu_base + MTU_VAL(0)));
+	/* The highest bit is not valid */
+	v &= 0x7FFFFFFFFFFFFFFFLLU;
+	return clocksource_cyc2ns(v, nmdk_clksrc.mult, nmdk_clksrc.shift);
+}
+
+/* Just kick sched_clock every so often */
+static void cnt32_to_63_keepwarm(unsigned long data)
 {
-	return clocksource_cyc2ns(nmdk_clksrc.read(
-				  &nmdk_clksrc),
-				  nmdk_clksrc.mult,
-				  nmdk_clksrc.shift);
+	mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
+	(void) sched_clock();
+}
+
+/*
+ * Set up a timer to keep sched_clock():s 32_to_63 algorithm warm
+ * once in half a 32bit timer wrap interval.
+ */
+static void __init nmdk_sched_clock_init(unsigned long rate)
+{
+	u32 v;
+	unsigned long delta;
+
+	/* Formula: seconds per wrap = (2^32) / f */
+	v = 0xFFFFFFFFUL / rate;
+	/* We want half of the wrap time to keep cnt32_to_63 warm */
+	v /= 2;
+	pr_debug("sched_clock: prescaled timer rate: %lu, "
+		 "initialize keepwarm timer every %d seconds\n", rate, v);
+	/* Convert seconds to jiffies */
+	delta = msecs_to_jiffies(v*1000);
+	setup_timer(&cnt32_to_63_keepwarm_timer, cnt32_to_63_keepwarm, delta);
+	mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + delta));
 }
 
 /* Clockevent device: use one-shot mode */
@@ -161,13 +203,15 @@ void __init nmdk_timer_init(void)
 	writel(0, mtu_base + MTU_BGLR(0));
 	writel(cr | MTU_CRn_ENA, mtu_base + MTU_CR(0));
 
-	/* Now the scheduling clock is ready */
+	/* Now the clock source is ready */
 	nmdk_clksrc.read = nmdk_read_timer;
 
 	if (clocksource_register(&nmdk_clksrc))
 		pr_err("timer: failed to initialize clock source %s\n",
 		       nmdk_clksrc.name);
 
+	nmdk_sched_clock_init(rate);
+
 	/* Timer 1 is used for events */
 
 	clockevents_calc_mult_shift(&nmdk_clkevt, rate, MTU_MIN_RANGE);
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-16  9:11 [PATCH] nomadik: prevent sched_clock() wraparound Linus Walleij
@ 2010-11-16 20:53 ` john stultz
  2010-11-16 22:15   ` Linus Walleij
  0 siblings, 1 reply; 9+ messages in thread
From: john stultz @ 2010-11-16 20:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 2010-11-16 at 10:11 +0100, Linus Walleij wrote:
> +unsigned long long sched_clock(void)
> +{
> +	unsigned long long v;
> +
> +	if (unlikely(!mtu_base))
> +		return 0;
> +
> +	v = cnt32_to_63(-readl(mtu_base + MTU_VAL(0)));
> +	/* The highest bit is not valid */
> +	v &= 0x7FFFFFFFFFFFFFFFLLU;
> +	return clocksource_cyc2ns(v, nmdk_clksrc.mult, nmdk_clksrc.shift);


I suspect this will still break. 

The cycle value being passed as v is likely to be large, and the
clocksource mult and shift are calculated to be as large as possible
without overflowing when given only a few seconds worth of cycles.

So its very likely that after a few seconds of running (or even less,
with the 32_to_63 conversion), the cyc2ns function will overflow, as
v*mult will be greater then 64bits.

So I'd advise *not* using the mult/shift pair calculated for
timekeeping, and possibly not using the clocksource_cyc2ns function (as
I'm likely to make that private to timekeeping at somepoint in the
future). Instead calculate a mult/shift pair using a smaller shift
value, so the overflow point will be much more rare.

For instance, on x86, the TSC based sched_clock uses a shift value of
10, but the TSC clocksource uses a shift value of ~30.

> +}
> +
> +/* Just kick sched_clock every so often */
> +static void cnt32_to_63_keepwarm(unsigned long data)
>  {
> -	return clocksource_cyc2ns(nmdk_clksrc.read(
> -				  &nmdk_clksrc),
> -				  nmdk_clksrc.mult,
> -				  nmdk_clksrc.shift);
> +	mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
> +	(void) sched_clock();
> +}

Also, just FYI: This can be problematic on PREEMPT_RT systems, where
timers can be delayed by high priority processes hogging the cpu.

thanks
-john

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-16 20:53 ` john stultz
@ 2010-11-16 22:15   ` Linus Walleij
  2010-11-16 22:31     ` Russell King - ARM Linux
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Walleij @ 2010-11-16 22:15 UTC (permalink / raw)
  To: linux-arm-kernel

2010/11/16 john stultz <johnstul@us.ibm.com>:
> On Tue, 2010-11-16 at 10:11 +0100, Linus Walleij wrote:

> The cycle value being passed as v is likely to be large, and the
> clocksource mult and shift are calculated to be as large as possible
> without overflowing when given only a few seconds worth of cycles.
>
> So its very likely that after a few seconds of running (or even less,
> with the 32_to_63 conversion), the cyc2ns function will overflow, as
> v*mult will be greater then 64bits.

Darn you're right of course.

I'll attempt to use
clocks_calc_mult_shift(&mult, &shift, rate, NSEC_PER_SEC, 3600*24*365);

For getting a somewhat more proper mult+shift for sched_clock().

>> + ? ? mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
>> + ? ? (void) sched_clock();
>
> Also, just FYI: This can be problematic on PREEMPT_RT systems, where
> timers can be delayed by high priority processes hogging the cpu.

Hm is there a proper remedy for this? The watchdog in kernel/watchdog.c
creates a HRtimer based thread, SCHED_FIFO and prio MAX_RT_PRIO-1
is that what you should do for the ultimate sched_clock()
cnt32_to_63 guard?

In that case this is ripe for breaking out into a separate helper
somewhere, like a callable
setup_cnt32_to_63_guard_timer(max_rate, sched_clock);

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-16 22:15   ` Linus Walleij
@ 2010-11-16 22:31     ` Russell King - ARM Linux
  2010-11-17  8:34       ` Linus Walleij
  0 siblings, 1 reply; 9+ messages in thread
From: Russell King - ARM Linux @ 2010-11-16 22:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 16, 2010 at 11:15:02PM +0100, Linus Walleij wrote:
> 2010/11/16 john stultz <johnstul@us.ibm.com>:
> > On Tue, 2010-11-16 at 10:11 +0100, Linus Walleij wrote:
> 
> > The cycle value being passed as v is likely to be large, and the
> > clocksource mult and shift are calculated to be as large as possible
> > without overflowing when given only a few seconds worth of cycles.
> >
> > So its very likely that after a few seconds of running (or even less,
> > with the 32_to_63 conversion), the cyc2ns function will overflow, as
> > v*mult will be greater then 64bits.
> 
> Darn you're right of course.
> 
> I'll attempt to use
> clocks_calc_mult_shift(&mult, &shift, rate, NSEC_PER_SEC, 3600*24*365);
> 
> For getting a somewhat more proper mult+shift for sched_clock().

What kind of mult are you expecting?  Let's look at the code again:

> +     /* The highest bit is not valid */
> +     v &= 0x7FFFFFFFFFFFFFFFLLU;
> +     return clocksource_cyc2ns(v, nmdk_clksrc.mult, nmdk_clksrc.shift);

v is 63-bit.  Any multiply greater than two will result in an overflow,
which means the best you can achieve with this is basically a divide by
a power of two.  So you've lost accuracy in the factor conversion.

You might be better off limiting the size of 'v' to a smaller number of
bits, and then use clocksource_cyc2ns() with a multiplier which guarantees
that it won't overflow 64-bit math, but putting up with it wrapping more
often than a 63-bit value would give you.

I think you have a trade-off to make here, between time between wraps
and conversion accuracy.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-16 22:31     ` Russell King - ARM Linux
@ 2010-11-17  8:34       ` Linus Walleij
  2010-11-17  8:38         ` Russell King - ARM Linux
  2010-11-17  9:26         ` Linus Walleij
  0 siblings, 2 replies; 9+ messages in thread
From: Linus Walleij @ 2010-11-17  8:34 UTC (permalink / raw)
  To: linux-arm-kernel

2010/11/16 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Tue, Nov 16, 2010 at 11:15:02PM +0100, Linus Walleij wrote:

>> + ? ? /* The highest bit is not valid */
>> + ? ? v &= 0x7FFFFFFFFFFFFFFFLLU;
>> + ? ? return clocksource_cyc2ns(v, nmdk_clksrc.mult, nmdk_clksrc.shift);
>
> v is 63-bit. ?Any multiply greater than two will result in an overflow,
> which means the best you can achieve with this is basically a divide by
> a power of two. ?So you've lost accuracy in the factor conversion.

Hm! Then in this code in plat-orion/timer.c:

#define TCLK2NS_SCALE_FACTOR 8
static unsigned long tclk2ns_scale;

unsigned long long sched_clock(void)
{
        unsigned long long v = cnt32_to_63(0xffffffff - readl(TIMER0_VAL));
        return (v * tclk2ns_scale) >> TCLK2NS_SCALE_FACTOR;
}

This is some basic mult and shift with new names. But tclk2ns_scale
cannot be greater than 2, still it's calculated like this:

unsigned long long v;
unsigned long data;

v = NSEC_PER_SEC;
v <<= TCLK2NS_SCALE_FACTOR;
v += tclk/2;
do_div(v, tclk);
/*
 * We want an even value to automatically clear the top bit
 * returned by cnt32_to_63() without an additional run time
 * instruction. So if the LSB is 1 then round it up.
 */
if (v & 1)
        v++;
tclk2ns_scale = v;

Seems like it will never fly for long, the long long will overflow
pretty soon, though later than@th 32bit boundary.

And the Tegra code in mach-tegra/timer.c:

static cycle_t tegra_clocksource_read(struct clocksource *cs)
{
        return cnt32_to_63(timer_readl(TIMERUS_CNTR_1US));
}

This just won't fly for long either, the mult for the clocksource is
usually something in the order of 20.

The right way is likely to shift *first* and *then* multiply to get
the proper lossy conversion.

> I think you have a trade-off to make here, between time between wraps
> and conversion accuracy.

Yeah I'll make a try. Looking at the above code it seems we have
some cleanup to do once I settle on something.

Linus Walleij

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-17  8:34       ` Linus Walleij
@ 2010-11-17  8:38         ` Russell King - ARM Linux
  2010-11-17 15:15           ` Nicolas Pitre
  2010-11-17  9:26         ` Linus Walleij
  1 sibling, 1 reply; 9+ messages in thread
From: Russell King - ARM Linux @ 2010-11-17  8:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 17, 2010 at 09:34:56AM +0100, Linus Walleij wrote:
> And the Tegra code in mach-tegra/timer.c:
> 
> static cycle_t tegra_clocksource_read(struct clocksource *cs)
> {
>         return cnt32_to_63(timer_readl(TIMERUS_CNTR_1US));
> }
> 
> This just won't fly for long either, the mult for the clocksource is
> usually something in the order of 20.

I thought John/Thomas said that using cnt32_to_63 in there wasn't a good
idea - sounds like it needs to be killed.  Colin?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-17  8:34       ` Linus Walleij
  2010-11-17  8:38         ` Russell King - ARM Linux
@ 2010-11-17  9:26         ` Linus Walleij
  2010-11-17 15:04           ` Nicolas Pitre
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Walleij @ 2010-11-17  9:26 UTC (permalink / raw)
  To: linux-arm-kernel

2010/11/17 Linus Walleij <linus.ml.walleij@gmail.com>:

> Hm! Then in this code in plat-orion/timer.c:
>
> #define TCLK2NS_SCALE_FACTOR 8
> static unsigned long tclk2ns_scale;
>
> unsigned long long sched_clock(void)
> {
> ? ? ? ?unsigned long long v = cnt32_to_63(0xffffffff - readl(TIMER0_VAL));
> ? ? ? ?return (v * tclk2ns_scale) >> TCLK2NS_SCALE_FACTOR;
> }

Looking at this again it seems the Orion is implicitly
loosing bits - so this code will work and multiply out some of the
63 bits so it just wraps around earlier.

Would be nice to have these kind of things explicit though,
it's a bit hard to determine when this counter will wrap.

Linus Walleij

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-17  9:26         ` Linus Walleij
@ 2010-11-17 15:04           ` Nicolas Pitre
  0 siblings, 0 replies; 9+ messages in thread
From: Nicolas Pitre @ 2010-11-17 15:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 17 Nov 2010, Linus Walleij wrote:

> 2010/11/17 Linus Walleij <linus.ml.walleij@gmail.com>:
> 
> > Hm! Then in this code in plat-orion/timer.c:
> >
> > #define TCLK2NS_SCALE_FACTOR 8
> > static unsigned long tclk2ns_scale;
> >
> > unsigned long long sched_clock(void)
> > {
> > ? ? ? ?unsigned long long v = cnt32_to_63(0xffffffff - readl(TIMER0_VAL));
> > ? ? ? ?return (v * tclk2ns_scale) >> TCLK2NS_SCALE_FACTOR;
> > }
> 
> Looking at this again it seems the Orion is implicitly
> loosing bits - so this code will work and multiply out some of the
> 63 bits so it just wraps around earlier.

Exact.  And that's what the comment above it says:

 * Orion's sched_clock implementation. It has a resolution of
 * at least 7.5ns (133MHz TCLK) and a maximum value of 834 days.

The 834 days comes from the fact that the final shift preserves only 56 
bits from the result.

On PXA this is even worse:

 * This is PXA's sched_clock implementation. This has a resolution
 * of at least 308 ns and a maximum value of 208 days.

But again, in the context of sched_clock() this is fine.  We want 
sched_clock() to be fast, and having a glitch during that jiffy when the 
clock wraps back every 208 days is perfectly reasonable.

> Would be nice to have these kind of things explicit though,
> it's a bit hard to determine when this counter will wrap.

It is all commented, at least in those implementations I've written 
myself.


Nicolas

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] nomadik: prevent sched_clock() wraparound
  2010-11-17  8:38         ` Russell King - ARM Linux
@ 2010-11-17 15:15           ` Nicolas Pitre
  0 siblings, 0 replies; 9+ messages in thread
From: Nicolas Pitre @ 2010-11-17 15:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 17 Nov 2010, Russell King - ARM Linux wrote:

> On Wed, Nov 17, 2010 at 09:34:56AM +0100, Linus Walleij wrote:
> > And the Tegra code in mach-tegra/timer.c:
> > 
> > static cycle_t tegra_clocksource_read(struct clocksource *cs)
> > {
> >         return cnt32_to_63(timer_readl(TIMERUS_CNTR_1US));
> > }
> > 
> > This just won't fly for long either, the mult for the clocksource is
> > usually something in the order of 20.
> 
> I thought John/Thomas said that using cnt32_to_63 in there wasn't a good
> idea - sounds like it needs to be killed.  Colin?

Exact.  The cnt32_to_63() should not be used for clock source as this is 
lying to the clock source core where there is already all the code 
needed to deal with limited hardware timer ranges with all the required 
accuracy.

Furthermore, the above usage of cnt32_to_63() is buggy.  It is 
explicitly mentioned in the cnt32_to_63() documentation that the top bit 
(the 64th one) is undefined and must be cleared by the caller as it 
_does_ contain a random value to the caller.  This is not done by 
cnt32_to_63() directly because its returned value often has to be 
multiplied by a scaling factor and therefore the 64th bit masking can be 
done implicitly by that multiplication, saving on a runtime instruction 
(remember this was designed for sched_clock() usage and the goal there 
is to be fast).


Nicolas

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-11-17 15:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-16  9:11 [PATCH] nomadik: prevent sched_clock() wraparound Linus Walleij
2010-11-16 20:53 ` john stultz
2010-11-16 22:15   ` Linus Walleij
2010-11-16 22:31     ` Russell King - ARM Linux
2010-11-17  8:34       ` Linus Walleij
2010-11-17  8:38         ` Russell King - ARM Linux
2010-11-17 15:15           ` Nicolas Pitre
2010-11-17  9:26         ` Linus Walleij
2010-11-17 15:04           ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).