From: js@sig21.net (Johannes Stezenbach)
To: linux-arm-kernel@lists.infradead.org
Subject: Q: sched_clock() vs. clocksource, how to implement correctly
Date: Sat, 24 Apr 2010 22:50:11 +0200 [thread overview]
Message-ID: <20100424205011.GA21632@sig21.net> (raw)
In-Reply-To: <20100423182933.0df45b53@mschwide.boeblingen.de.ibm.com>
On Fri, Apr 23, 2010 at 06:29:33PM +0200, Martin Schwidefsky wrote:
> On Fri, 23 Apr 2010 17:09:17 +0200 > Johannes Stezenbach <js@sig21.net> wrote:
> >
> > - Isn't sched_clock() supposed to be extended to 64bit so
> > that it practically never wraps?
> > (old implementations use cnt32_to_63())
>
> Yes, sched_clock() is supposed to return a monotonic timestamp.
OK, searching LXR for sched_clock, I think serveral
sched_clock() implementations across architectures
have the wrapping issue.
BTW, wrapping sched_clock() also causes printk() timestamps to
wrap, so the problem should be easy to spot.
> > - What would be the effect on scheduling when sched_clock() wraps?
>
> It would confuse the process accounting and the scheduling I guess.
That's a bit vague ;-/
Let's try that: Could it cause processes calling e.g. nanosleep()
to sleep much longer than intended (e.g. until sched_clock() wraps
again)? Or can it only cause a momentary scheduler hickup,
i.e. the scheduler might make one wrong scheduling decision
per sched_clock() wrap? Given that there are serveral platforms
with wrapping sched_clock() I guess the latter.
> > - Is struct timecounter + struct cyclecounter + timecounter_read()
> > designated way to implement sched_clock() with a 32bit hw counter?
> >
> > arch/microblaze/kernel/timer.c seems to be the only user
> > of timecounter/cyclecounter in arch/, but I don't get what it does.
> >
> > Or is it better to stick with cnt32_to_63()?
>
> cnt32_to_63() is a way to extend the 32 bit clocksource to a useable
> sched_clock() provider. One way or the other you have to extend the 32
> bits of the 1MHz counter to something bigger. The obvious way is to
> count the number of wraps and use this number as the upper 32 bits
> just like cnt32_to_63() does. You just have to make sure that the
> clocksource is read frequently enough to get all the wraps. A non-idle
> cpu will tick with HZ frequency and the clock will be read often
> enough, for an idle cpu the maximum sleep time needs to be limited.
Somehow I got the impression that cnt32_to_63() is old-style,
and using clocksource_cyc2ns() is new-style. Essentially I wanted to
avoid the do_div() in sched_clock() (like e.g. in
arm/mach-versatile/core.c), and just using clocksource_cyc2ns()
looked simple and elegant. Bummer that it's wrong.
It seems that arch/arm/plat-orion/time.c is about the only
one which gets it completely right according to your description.
I'll quote it here for discussion:
/*
* Orion's sched_clock implementation. It has a resolution of
* at least 7.5ns (133MHz TCLK) and a maximum value of 834 days.
*
* Because the hardware timer period is quite short (21 secs if
* 200MHz TCLK) and because cnt32_to_63() needs to be called at
* least once per half period to work properly, a kernel timer is
* set up to ensure this requirement is always met.
*/
#define TCLK2NS_SCALE_FACTOR 8
static unsigned long tclk2ns_scale;
unsigned long long sched_clock(void)
{
unsigned long long v = cnt32_to_63(0xffffffff - readl(TIMER0_VAL));
return (v * tclk2ns_scale) >> TCLK2NS_SCALE_FACTOR;
}
static struct timer_list cnt32_to_63_keepwarm_timer;
static void cnt32_to_63_keepwarm(unsigned long data)
{
mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
(void) sched_clock();
}
static void __init setup_sched_clock(unsigned long tclk)
{
unsigned long long v;
unsigned long data;
v = NSEC_PER_SEC;
v <<= TCLK2NS_SCALE_FACTOR;
v += tclk/2;
do_div(v, tclk);
/*
* We want an even value to automatically clear the top bit
* returned by cnt32_to_63() without an additional run time
* instruction. So if the LSB is 1 then round it up.
*/
if (v & 1)
v++;
tclk2ns_scale = v;
data = (0xffffffffUL / tclk / 2 - 2) * HZ;
setup_timer(&cnt32_to_63_keepwarm_timer, cnt32_to_63_keepwarm, data);
mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
}
BTW, even though this uses TCLK2NS_SCALE_FACTOR of 8, the same file
uses a shift vaue of 20 for the orion_clksrc...
Thanks,
Johannes
next prev parent reply other threads:[~2010-04-24 20:50 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-23 15:09 Q: sched_clock() vs. clocksource, how to implement correctly Johannes Stezenbach
2010-04-23 16:29 ` Martin Schwidefsky
2010-04-24 20:50 ` Johannes Stezenbach [this message]
2010-04-25 14:36 ` Peter Zijlstra
2010-04-26 13:18 ` Johannes Stezenbach
2010-04-26 23:48 ` Daniel Walker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100424205011.GA21632@sig21.net \
--to=js@sig21.net \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).