Q: sched_clock() vs. clocksource, how to implement correctly

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: js@sig21.net (Johannes Stezenbach)
To: linux-arm-kernel@lists.infradead.org
Subject: Q: sched_clock() vs. clocksource, how to implement correctly
Date: Sat, 24 Apr 2010 22:50:11 +0200	[thread overview]
Message-ID: <20100424205011.GA21632@sig21.net> (raw)
In-Reply-To: <20100423182933.0df45b53@mschwide.boeblingen.de.ibm.com>

On Fri, Apr 23, 2010 at 06:29:33PM +0200, Martin Schwidefsky wrote:
> On Fri, 23 Apr 2010 17:09:17 +0200 > Johannes Stezenbach <js@sig21.net> wrote:
> > 
> > - Isn't sched_clock() supposed to be extended to 64bit so
> >   that it practically never wraps?
> >   (old implementations use cnt32_to_63())
> 
> Yes, sched_clock() is supposed to return a monotonic timestamp.

OK, searching LXR for sched_clock, I think serveral
sched_clock() implementations across architectures
have the wrapping issue.

BTW, wrapping sched_clock() also causes printk() timestamps to
wrap, so the problem should be easy to spot.

> > - What would be the effect on scheduling when sched_clock() wraps?
> 
> It would confuse the process accounting and the scheduling I guess.

That's a bit vague ;-/
Let's try that: Could it cause processes calling e.g. nanosleep()
to sleep much longer than intended (e.g. until sched_clock() wraps
again)?   Or can it only cause a momentary scheduler hickup,
i.e. the scheduler might make one wrong scheduling decision
per sched_clock() wrap?  Given that there are serveral platforms
with wrapping sched_clock() I guess the latter.

> > - Is struct timecounter + struct cyclecounter + timecounter_read()
> >   designated way to implement sched_clock() with a 32bit hw counter?
> > 
> >   arch/microblaze/kernel/timer.c seems to be the only user
> >   of timecounter/cyclecounter in arch/, but I don't get what it does.
> > 
> >   Or is it better to stick with cnt32_to_63()?
> 
> cnt32_to_63() is a way to extend the 32 bit clocksource to a useable
> sched_clock() provider. One way or the other you have to extend the 32
> bits of the 1MHz counter to something bigger. The obvious way is to
> count the number of wraps and use this number as the upper 32 bits
> just like cnt32_to_63() does. You just have to make sure that the
> clocksource is read frequently enough to get all the wraps. A non-idle
> cpu will tick with HZ frequency and the clock will be read often
> enough, for an idle cpu the maximum sleep time needs to be limited.

Somehow I got the impression that cnt32_to_63() is old-style,
and using clocksource_cyc2ns() is new-style.  Essentially I wanted to
avoid the do_div() in sched_clock() (like e.g. in
arm/mach-versatile/core.c), and just using clocksource_cyc2ns()
looked simple and elegant.  Bummer that it's wrong.

It seems that arch/arm/plat-orion/time.c is about the only
one which gets it completely right according to your description.
I'll quote it here for discussion:

/*
 * Orion's sched_clock implementation. It has a resolution of
 * at least 7.5ns (133MHz TCLK) and a maximum value of 834 days.
 *
 * Because the hardware timer period is quite short (21 secs if
 * 200MHz TCLK) and because cnt32_to_63() needs to be called at
 * least once per half period to work properly, a kernel timer is
 * set up to ensure this requirement is always met.
 */
#define TCLK2NS_SCALE_FACTOR 8

static unsigned long tclk2ns_scale;

unsigned long long sched_clock(void)
{
	unsigned long long v = cnt32_to_63(0xffffffff - readl(TIMER0_VAL));
	return (v * tclk2ns_scale) >> TCLK2NS_SCALE_FACTOR;
}

static struct timer_list cnt32_to_63_keepwarm_timer;

static void cnt32_to_63_keepwarm(unsigned long data)
{
	mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
	(void) sched_clock();
}

static void __init setup_sched_clock(unsigned long tclk)
{
	unsigned long long v;
	unsigned long data;

	v = NSEC_PER_SEC;
	v <<= TCLK2NS_SCALE_FACTOR;
	v += tclk/2;
	do_div(v, tclk);
	/*
	 * We want an even value to automatically clear the top bit
	 * returned by cnt32_to_63() without an additional run time
	 * instruction. So if the LSB is 1 then round it up.
	 */
	if (v & 1)
		v++;
	tclk2ns_scale = v;

	data = (0xffffffffUL / tclk / 2 - 2) * HZ;
	setup_timer(&cnt32_to_63_keepwarm_timer, cnt32_to_63_keepwarm, data);
	mod_timer(&cnt32_to_63_keepwarm_timer, round_jiffies(jiffies + data));
}

BTW, even though this uses TCLK2NS_SCALE_FACTOR of 8, the same file
uses a shift vaue of 20 for the orion_clksrc...

Thanks,
Johannes

next prev parent reply	other threads:[~2010-04-24 20:50 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-23 15:09 Q: sched_clock() vs. clocksource, how to implement correctly Johannes Stezenbach
2010-04-23 16:29 ` Martin Schwidefsky
2010-04-24 20:50   ` Johannes Stezenbach [this message]
2010-04-25 14:36     ` Peter Zijlstra
2010-04-26 13:18     ` Johannes Stezenbach
2010-04-26 23:48   ` Daniel Walker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100424205011.GA21632@sig21.net \
    --to=js@sig21.net \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).