* [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible
@ 2014-04-30 12:23 Sebastian Andrzej Siewior
2014-04-30 12:48 ` Will Deacon
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-30 12:23 UTC (permalink / raw)
To: linux-arm-kernel
On ARM one has to call register_current_timer_delay() in order to use
the (quick) timer based calibrarion instead of the a little slower loop
based delay.
The timer function specified in register_current_timer_delay() is also
used by read_current_timer() which would otherwise return -ENXIO.
And read_current_timer() timer is used by get_cycles() which in turn is
used by random_get_entropy(). That means each sub-architecture returned
here 0 if register_current_timer_delay() was no performed.
The parameters for for sched_clock_register() and
register_current_timer_delay() are mostly the same. Instead of calling
register_current_timer_delay() each time just after (or before)
sched_clock_register() I thought it might easier by doing it once at
sched_clock time since the parameter are the same. That means each ARM sub
arch that working sched-clock would also have a working random_get_entropy()
implementation.
Any comments?
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
arch/arm/include/asm/delay.h | 1 +
kernel/time/sched_clock.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index dff714d..0ae3ff9 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -64,6 +64,7 @@ extern void __loop_const_udelay(unsigned long);
/* Delay-loop timer registration. */
#define ARCH_HAS_READ_CURRENT_TIMER
+#define ARCH_HAS_CURRENT_TIMER_DELAY
extern void register_current_timer_delay(const struct delay_timer *timer);
#endif /* __ASSEMBLY__ */
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 0abb364..dc09496 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -17,6 +17,7 @@
#include <linux/sched_clock.h>
#include <linux/seqlock.h>
#include <linux/bitops.h>
+#include <linux/delay.h>
struct clock_data {
ktime_t wrap_kt;
@@ -112,6 +113,24 @@ static enum hrtimer_restart sched_clock_poll(struct hrtimer *hrt)
hrtimer_forward_now(hrt, cd.wrap_kt);
return HRTIMER_RESTART;
}
+#ifdef ARCH_HAS_CURRENT_TIMER_DELAY
+static struct delay_timer generic_timer;
+
+static void __init do_register_current_timer_delay(u64 (*read)(void),
+ unsigned long rate)
+{
+ if (generic_timer.read_current_timer)
+ return;
+ generic_timer.read_current_timer = (unsigned long (*)(void))read;
+ generic_timer.freq = rate;
+ register_current_timer_delay(&generic_timer);
+}
+
+#else
+
+static inline void do_register_current_timer_delay(u64 (*read)(void),
+ unsigned long rate) {}
+#endif
void __init sched_clock_register(u64 (*read)(void), int bits,
unsigned long rate)
@@ -162,6 +181,7 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
enable_sched_clock_irqtime();
pr_debug("Registered %pF as sched_clock source\n", read);
+ do_register_current_timer_delay(read, rate);
}
void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate)
--
1.9.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible
2014-04-30 12:23 [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible Sebastian Andrzej Siewior
@ 2014-04-30 12:48 ` Will Deacon
2014-04-30 13:01 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2014-04-30 12:48 UTC (permalink / raw)
To: linux-arm-kernel
Hi Sebastian,
On Wed, Apr 30, 2014 at 01:23:34PM +0100, Sebastian Andrzej Siewior wrote:
> On ARM one has to call register_current_timer_delay() in order to use
> the (quick) timer based calibrarion instead of the a little slower loop
> based delay.
> The timer function specified in register_current_timer_delay() is also
> used by read_current_timer() which would otherwise return -ENXIO.
> And read_current_timer() timer is used by get_cycles() which in turn is
> used by random_get_entropy(). That means each sub-architecture returned
> here 0 if register_current_timer_delay() was no performed.
>
> The parameters for for sched_clock_register() and
> register_current_timer_delay() are mostly the same. Instead of calling
> register_current_timer_delay() each time just after (or before)
> sched_clock_register() I thought it might easier by doing it once at
> sched_clock time since the parameter are the same. That means each ARM sub
> arch that working sched-clock would also have a working random_get_entropy()
> implementation.
>
> Any comments?
As long as sched_clock is guaranteed to be a fixed frequency, always-on
clocksource then this could work, but it removes the flexibility of having
a separate delay clock and sched clock (is this useful?).
Looking at your patch, I noticed that we need to extend the
register_current_timer_delay function to deal with clocks that aren't as
wide as cycle_t, otherwise we don't delay() for long enough when the clock
overflows (this is potentially already an issue for architected timers <
64-bit). Could you cook a patch for that please?
Will
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible
2014-04-30 12:48 ` Will Deacon
@ 2014-04-30 13:01 ` Sebastian Andrzej Siewior
2014-04-30 13:26 ` Will Deacon
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-30 13:01 UTC (permalink / raw)
To: linux-arm-kernel
On 04/30/2014 02:48 PM, Will Deacon wrote:
> Hi Sebastian,
Hi Will,
> As long as sched_clock is guaranteed to be a fixed frequency, always-on
> clocksource then this could work, but it removes the flexibility of having
> a separate delay clock and sched clock (is this useful?).
> Looking at your patch, I noticed that we need to extend the
> register_current_timer_delay function to deal with clocks that aren't as
> wide as cycle_t, otherwise we don't delay() for long enough when the clock
> overflows (this is potentially already an issue for architected timers <
> 64-bit). Could you cook a patch for that please?
Sure, I would change the type from long to u64 and fix all users. Would
that be okay for you?
>
> Will
Sebastian
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible
2014-04-30 13:01 ` Sebastian Andrzej Siewior
@ 2014-04-30 13:26 ` Will Deacon
2014-04-30 16:56 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2014-04-30 13:26 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Apr 30, 2014 at 02:01:32PM +0100, Sebastian Andrzej Siewior wrote:
> On 04/30/2014 02:48 PM, Will Deacon wrote:
> > Hi Sebastian,
>
> Hi Will,
>
> > As long as sched_clock is guaranteed to be a fixed frequency, always-on
> > clocksource then this could work, but it removes the flexibility of having
> > a separate delay clock and sched clock (is this useful?).
>
>
>
> > Looking at your patch, I noticed that we need to extend the
> > register_current_timer_delay function to deal with clocks that aren't as
> > wide as cycle_t, otherwise we don't delay() for long enough when the clock
> > overflows (this is potentially already an issue for architected timers <
> > 64-bit). Could you cook a patch for that please?
>
> Sure, I would change the type from long to u64 and fix all users. Would
> that be okay for you?
I don't think that's the problem I was referring to. What I mean is that a
clocksource might overflow at any number of bits, so the delay calculation
needs to take this into account when it does:
while ((get_cycles() - start) < cycles)
because a premature overflow from get_cycles() will cause us to return
early. The solution is to mask the result of the subtraction before the
comparison to match the width of the clock.
Will
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible
2014-04-30 13:26 ` Will Deacon
@ 2014-04-30 16:56 ` Sebastian Andrzej Siewior
2014-05-02 16:50 ` Will Deacon
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-04-30 16:56 UTC (permalink / raw)
To: linux-arm-kernel
* Will Deacon | 2014-04-30 14:26:28 [+0100]:
Hi Will,
>I don't think that's the problem I was referring to. What I mean is that a
>clocksource might overflow at any number of bits, so the delay calculation
>needs to take this into account when it does:
>
> while ((get_cycles() - start) < cycles)
>
>because a premature overflow from get_cycles() will cause us to return
>early. The solution is to mask the result of the subtraction before the
>comparison to match the width of the clock.
So I got this:
diff --git a/arch/arm/include/asm/delay.h b/arch/arm/include/asm/delay.h
index dff714d..49c2e93 100644
--- a/arch/arm/include/asm/delay.h
+++ b/arch/arm/include/asm/delay.h
@@ -18,6 +18,7 @@
struct delay_timer {
unsigned long (*read_current_timer)(void);
unsigned long freq;
+ unsigned int bits;
};
extern struct arm_delay_ops {
@@ -25,6 +26,7 @@ extern struct arm_delay_ops {
void (*const_udelay)(unsigned long);
void (*udelay)(unsigned long);
unsigned long ticks_per_jiffy;
+ u32 mask;
} arm_delay_ops;
#define __delay(n) arm_delay_ops.delay(n)
diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
index 5306de3..dd32cc9 100644
--- a/arch/arm/lib/delay.c
+++ b/arch/arm/lib/delay.c
@@ -50,8 +50,9 @@ EXPORT_SYMBOL_GPL(read_current_timer);
static void __timer_delay(unsigned long cycles)
{
cycles_t start = get_cycles();
+ cycles_t mask = arm_delay_ops.mask;
- while ((get_cycles() - start) < cycles)
+ while (((get_cycles() - start) & mask) < cycles)
cpu_relax();
}
@@ -69,6 +70,10 @@ static void __timer_udelay(unsigned long usecs)
void __init register_current_timer_delay(const struct delay_timer *timer)
{
+ if (timer->bits > 32) {
+ pr_err("timer delay bits are limited to 32bit.\n");
+ return;
+ }
if (!delay_calibrated) {
pr_info("Switching to timer-based delay loop\n");
delay_timer = timer;
@@ -79,6 +84,7 @@ void __init register_current_timer_delay(const struct delay_timer *timer)
arm_delay_ops.delay = __timer_delay;
arm_delay_ops.const_udelay = __timer_const_udelay;
arm_delay_ops.udelay = __timer_udelay;
+ arm_delay_ops.mask = (1ULL << timer->bits) - 1;
delay_calibrated = true;
} else {
diff --git a/arch/arm/mach-imx/time.c b/arch/arm/mach-imx/time.c
index 65222ea..7ee80f5 100644
--- a/arch/arm/mach-imx/time.c
+++ b/arch/arm/mach-imx/time.c
@@ -131,6 +131,7 @@ static int __init mxc_clocksource_init(struct clk *timer_clk)
imx_delay_timer.read_current_timer = &imx_read_current_timer;
imx_delay_timer.freq = c;
+ imx_delay_timer.bits = 32;
register_current_timer_delay(&imx_delay_timer);
sched_clock_reg = reg;
diff --git a/drivers/clocksource/nomadik-mtu.c b/drivers/clocksource/nomadik-mtu.c
index a709cfa..aec6a61 100644
--- a/drivers/clocksource/nomadik-mtu.c
+++ b/drivers/clocksource/nomadik-mtu.c
@@ -241,6 +241,7 @@ static void __init nmdk_timer_init(void __iomem *base, int irq,
mtu_delay_timer.read_current_timer = &nmdk_timer_read_current_timer;
mtu_delay_timer.freq = rate;
+ mtu_delay_timer.bits = 32;
register_current_timer_delay(&mtu_delay_timer);
}
diff --git a/drivers/clocksource/timer-u300.c b/drivers/clocksource/timer-u300.c
index 5dcf756..39633aa 100644
--- a/drivers/clocksource/timer-u300.c
+++ b/drivers/clocksource/timer-u300.c
@@ -389,6 +389,7 @@ static void __init u300_timer_init_of(struct device_node *np)
u300_delay_timer.read_current_timer = &u300_read_current_timer;
u300_delay_timer.freq = rate;
+ u300_delay_timer.bits = 32;
register_current_timer_delay(&u300_delay_timer);
/*
Is this what you had in mind? If so, there is one user of
register_current_timer_delay() which I didn't convert. That is
arch_timer_delay_timer_register(). It returns arch_counter_get_cntvct()
which seems to return an u64 (which is truncated to 32bit). However
arch_counter_register() registers the clocksource with 56bits. So this
does not look too good, right?
The other thing I noticed is
|arch/arm/include/asm/timex.h:typedef unsigned long cycles_t;
This works for clocksource because timekeeping is using
|include/linux/clocksource.h:typedef u64 cycle_t;
instead.
Do I assume correct, that the arch_timer is really providing a number
wider than 32bit? Shouldn't I then promote cycles_t to 64bit if that
timer is active? Unless you have better suggestions of course :)
>Will
Sebastian
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible
2014-04-30 16:56 ` Sebastian Andrzej Siewior
@ 2014-05-02 16:50 ` Will Deacon
0 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2014-05-02 16:50 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Apr 30, 2014 at 05:56:53PM +0100, Sebastian Andrzej Siewior wrote:
> * Will Deacon | 2014-04-30 14:26:28 [+0100]:
> >I don't think that's the problem I was referring to. What I mean is that a
> >clocksource might overflow at any number of bits, so the delay calculation
> >needs to take this into account when it does:
> >
> > while ((get_cycles() - start) < cycles)
> >
> >because a premature overflow from get_cycles() will cause us to return
> >early. The solution is to mask the result of the subtraction before the
> >comparison to match the width of the clock.
>
> So I got this:
[...]
> Is this what you had in mind? If so, there is one user of
> register_current_timer_delay() which I didn't convert. That is
> arch_timer_delay_timer_register(). It returns arch_counter_get_cntvct()
> which seems to return an u64 (which is truncated to 32bit). However
> arch_counter_register() registers the clocksource with 56bits. So this
> does not look too good, right?
That should be fine, I think there's only an issue if you can overflow
twice during a single delay operation, so the thing would need to be
ticking at quite a frequency for that to happen!
> The other thing I noticed is
> |arch/arm/include/asm/timex.h:typedef unsigned long cycles_t;
>
> This works for clocksource because timekeeping is using
> |include/linux/clocksource.h:typedef u64 cycle_t;
>
> instead.
> Do I assume correct, that the arch_timer is really providing a number
> wider than 32bit? Shouldn't I then promote cycles_t to 64bit if that
> timer is active? Unless you have better suggestions of course :)
The architected timer is guaranteed to be at least 56 bits wide, but I
think we can safely truncate delay sources to 32-bit.
So actually, we only have a problem if people want to register delay clocks
smaller than 32-bit. Maybe it's simpler to enforce at least 32-bit precision
and don't bother with the registration if the clock is smaller than that?
You could use sizeof(cycles_t) to parameterise that.
Will
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-05-02 16:50 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-30 12:23 [RFC PATCH] sched_clock: also call register_current_timer_delay() if possible Sebastian Andrzej Siewior
2014-04-30 12:48 ` Will Deacon
2014-04-30 13:01 ` Sebastian Andrzej Siewior
2014-04-30 13:26 ` Will Deacon
2014-04-30 16:56 ` Sebastian Andrzej Siewior
2014-05-02 16:50 ` Will Deacon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).