public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* CONFIG_PRINTK_TIME woes
@ 2005-08-18 18:47 Luck, Tony
  2005-08-21  9:13 ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Luck, Tony @ 2005-08-18 18:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jason Uhlenkott

It has been pointed out to me that ia64 doesn't boot
with CONFIG_PRINTK_TIME=y.  The issue is the call to
sched_clock() ... which on ia64 accesses some per-cpu
data to adjust for possible variations in processor
speed between different cpus.  Since the per-cpu page
is not set up for the first few printk() calls, we die.

Is this an issue on any other architecture?  Most versions
of sched_clock() seem to just scale jiffies into nanoseconds,
so I guess they don't.  s390, sparc64, x86 and x86_64 all
have more sophisticated versions but they don't appear to me
to have limitations on how early they might be called.

Possible solutions:

1) Fix ia64 version of sched_clock() to check whether
the per-cpu page hase been initialized before using it.

I don't like this solution as it will make sched_clock()
slower for its primary uses in kernel/sched.c ... none of
which can be called before the per-cpu area is initialized.


2) Add some test to the printk() path along the lines of:

  t = (can_call_sched_clock) ? sched_clock() : 0;

This is somewhat ugly ... perhaps too unpleasant for words in
that can_call_sched_clock is a per-cpu concept!

3) Make the printk() code get the time in some other way.

Using sched_clock() here seems wrong.  The ia64 version
has a big fat comment about not comparing the results of
two calls from different cpus ... but since printk() calls
can come from any cpu ... it seems likely that a user who
turns on PRINTK_TIME might subtract timestamps from two
lines to see how long it was between the messages.  This
could have significant error on a system that has been up
for a long time.

But I don't have a better suggestion from existing code.  I
assume that sched_clock() was picked as it is both fast and
lockless.

-Tony


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-18 18:47 CONFIG_PRINTK_TIME woes Luck, Tony
@ 2005-08-21  9:13 ` Andrew Morton
  2005-08-21  9:16   ` Andrew Morton
  2005-08-23  7:18   ` Tony Lindgren
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2005-08-21  9:13 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-kernel, jasonuhl

"Luck, Tony" <tony.luck@intel.com> wrote:
>
> It has been pointed out to me that ia64 doesn't boot
> with CONFIG_PRINTK_TIME=y.  The issue is the call to
> sched_clock() ... which on ia64 accesses some per-cpu
> data to adjust for possible variations in processor
> speed between different cpus.  Since the per-cpu page
> is not set up for the first few printk() calls, we die.
> 
> Is this an issue on any other architecture?  Most versions
> of sched_clock() seem to just scale jiffies into nanoseconds,
> so I guess they don't.  s390, sparc64, x86 and x86_64 all
> have more sophisticated versions but they don't appear to me
> to have limitations on how early they might be called.
> 
> Possible solutions:
> 
> 1) Fix ia64 version of sched_clock() to check whether
> the per-cpu page hase been initialized before using it.
> 
> I don't like this solution as it will make sched_clock()
> slower for its primary uses in kernel/sched.c ... none of
> which can be called before the per-cpu area is initialized.
> 
> 
> 2) Add some test to the printk() path along the lines of:
> 
>   t = (can_call_sched_clock) ? sched_clock() : 0;
> 
> This is somewhat ugly ... perhaps too unpleasant for words in
> that can_call_sched_clock is a per-cpu concept!
> 
> 3) Make the printk() code get the time in some other way.
> 
> Using sched_clock() here seems wrong.  The ia64 version
> has a big fat comment about not comparing the results of
> two calls from different cpus ... but since printk() calls
> can come from any cpu ... it seems likely that a user who
> turns on PRINTK_TIME might subtract timestamps from two
> lines to see how long it was between the messages.  This
> could have significant error on a system that has been up
> for a long time.
> 
> But I don't have a better suggestion from existing code.
> I assume that sched_clock() was picked as it is both fast and
> lockless.

Yes, using sched_clock() there is a bit abusive.

It's not terribly performance-sensitive in printk().

How about we give each arch a printk_clock()?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21  9:13 ` Andrew Morton
@ 2005-08-21  9:16   ` Andrew Morton
  2005-08-21  9:27     ` Nick Piggin
                       ` (2 more replies)
  2005-08-23  7:18   ` Tony Lindgren
  1 sibling, 3 replies; 24+ messages in thread
From: Andrew Morton @ 2005-08-21  9:16 UTC (permalink / raw)
  To: tony.luck, linux-kernel, jasonuhl

Andrew Morton <akpm@osdl.org> wrote:
>
> How about we give each arch a printk_clock()?

Which might be as simple as this..


--- devel/kernel/printk.c~printk_clock	2005-08-21 02:14:05.000000000 -0700
+++ devel-akpm/kernel/printk.c	2005-08-21 02:15:14.000000000 -0700
@@ -488,6 +488,11 @@ static int __init printk_time_setup(char
 
 __setup("time", printk_time_setup);
 
+__attribute__((weak)) unsigned long long printk_clock(void)
+{
+	return sched_clock();
+}
+
 /*
  * This is printk.  It can be called from any context.  We want it to work.
  * 
@@ -558,7 +563,7 @@ asmlinkage int vprintk(const char *fmt, 
 					loglev_char = default_message_loglevel
 						+ '0';
 				}
-				t = sched_clock();
+				t = printk_clock();
 				nanosec_rem = do_div(t, 1000000000);
 				tlen = sprintf(tbuf,
 						"<%c>[%5lu.%06lu] ",
_


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21  9:16   ` Andrew Morton
@ 2005-08-21  9:27     ` Nick Piggin
  2005-08-21  9:32       ` Andrew Morton
  2005-08-22 17:42     ` tony.luck
  2005-08-24  0:36     ` Tim Bird
  2 siblings, 1 reply; 24+ messages in thread
From: Nick Piggin @ 2005-08-21  9:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: tony.luck, linux-kernel, jasonuhl

Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> 
>>How about we give each arch a printk_clock()?
> 
> 
> Which might be as simple as this..
> 
> 

sched_clock() shouldn't really be taken outside kernel/sched.c,
especially for things like this.

It actually has some fundamental problems even in its current
use in the scheduler (which need to be fixed). But basically it
is a very nasty interface with a rather tenuous relationship to
time.

Why not use something like do_gettimeofday? (or I'm sure one
of our time keepers can suggest the right thing to use).

Nick

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21  9:27     ` Nick Piggin
@ 2005-08-21  9:32       ` Andrew Morton
  2005-08-21 11:25         ` Nick Piggin
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2005-08-21  9:32 UTC (permalink / raw)
  To: Nick Piggin; +Cc: tony.luck, linux-kernel, jasonuhl

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> Andrew Morton wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > 
> >>How about we give each arch a printk_clock()?
> > 
> > 
> > Which might be as simple as this..
> > 
> > 
> 
> sched_clock() shouldn't really be taken outside kernel/sched.c,
> especially for things like this.
> 
> It actually has some fundamental problems even in its current
> use in the scheduler (which need to be fixed). But basically it
> is a very nasty interface with a rather tenuous relationship to
> time.

yup.

> Why not use something like do_gettimeofday? (or I'm sure one
> of our time keepers can suggest the right thing to use).

do_gettimeofday() takes locks, so a) we can't do printk from inside it and
b) if you do a printk-from-interupt and the interrupted code was running
do_gettimeofday(), deadlock.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21  9:32       ` Andrew Morton
@ 2005-08-21 11:25         ` Nick Piggin
  2005-08-21 18:01           ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Nick Piggin @ 2005-08-21 11:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: tony.luck, linux-kernel, jasonuhl

Andrew Morton wrote:

> 
> yup.
> 
> 
>>Why not use something like do_gettimeofday? (or I'm sure one
>>of our time keepers can suggest the right thing to use).
> 
> 
> do_gettimeofday() takes locks, so a) we can't do printk from inside it and

Dang, yeah maybe this is the showstopper.

> b) if you do a printk-from-interupt and the interrupted code was running
> do_gettimeofday(), deadlock.
> 

What about just using jiffies, then?

Really, sched_clock() is very broken for this (I know you're
not arguing against that).

It can go backwards when called twice from the same CPU, and the
number returned by one CPU need have no correlation with that
returned by another.

However, I understand you probably just want something quick and
dirty for 2.6.13 and would be happy just if it isn't more broken
than before ;)

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21 11:25         ` Nick Piggin
@ 2005-08-21 18:01           ` Andrew Morton
  2005-08-24  0:04             ` Tim Bird
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2005-08-21 18:01 UTC (permalink / raw)
  To: Nick Piggin; +Cc: tony.luck, linux-kernel, jasonuhl

Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> Andrew Morton wrote:
> 
> > 
> > yup.
> > 
> > 
> >>Why not use something like do_gettimeofday? (or I'm sure one
> >>of our time keepers can suggest the right thing to use).
> > 
> > 
> > do_gettimeofday() takes locks, so a) we can't do printk from inside it and
> 
> Dang, yeah maybe this is the showstopper.
> 
> > b) if you do a printk-from-interupt and the interrupted code was running
> > do_gettimeofday(), deadlock.
> > 
> 
> What about just using jiffies, then?
> 
> Really, sched_clock() is very broken for this (I know you're
> not arguing against that).
>
> It can go backwards when called twice from the same CPU, and the
> number returned by one CPU need have no correlation with that
> returned by another.

jiffies wouldn't have sufficient resolution for this application.  Bear in
mind that this is just a debugging thing - it's better to have good
resolution with occasional theoretical weirdness than to have poor
resolution plus super-consistency, IMO.

> However, I understand you probably just want something quick and
> dirty for 2.6.13 and would be happy just if it isn't more broken
> than before ;)

We're OK for 2.6.13, I think.  ia64 people will quickly learn to not turn
the option on.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: CONFIG_PRINTK_TIME woes
  2005-08-21  9:16   ` Andrew Morton
  2005-08-21  9:27     ` Nick Piggin
@ 2005-08-22 17:42     ` tony.luck
  2005-08-22 17:50       ` Andrew Morton
  2005-08-22 20:20       ` David S. Miller
  2005-08-24  0:36     ` Tim Bird
  2 siblings, 2 replies; 24+ messages in thread
From: tony.luck @ 2005-08-22 17:42 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, jasonuhl

Andrew Morton wrote:
>jiffies wouldn't have sufficient resolution for this application.  Bear in
>mind that this is just a debugging thing - it's better to have good
>resolution with occasional theoretical weirdness than to have poor
>resolution plus super-consistency, IMO.

The majority of architectures currently use a sched_clock() that
just scales jiffies (but that may just mean they haven't gotten around
to implementing anything better yet).

But how much resolution we need is a very good question, as it will
affect our choice of clock.

In many cases I've been presented with a console log which has a
few warning messages, and then an OOPS ... the initial question
I would like to answer is were the warnings printed "just before"
the oops ... or days earlier.  For these cases having a 1 second
resolution would be a huge bonus over no time information at all.
Jiffies would be good enough for almost all cases (IMO).

At the other extreme ... the current use of sched_clock() with
potentially nano-second resolution is way over the top.  Logging
to a serial console at 115200 a typical line from printk will take
2-4 milli-seconds to print ... so there would seem to be little
benefit from a sub-millisecond resolution (in fact at 250HZ jiffies
are on the ragged edge of being good enough).

If that isn't sufficient ... it should be possible to make a cut-down,
lockless version of do_gettimeofday that meets Andrew's suggestion
of good resolution with occasional theoretical weirdness.  But before
we go there ... I'd like to hear whether there are usage models that
really need better resolution than jiffies can provide?

-Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 17:42     ` tony.luck
@ 2005-08-22 17:50       ` Andrew Morton
  2005-08-22 20:52         ` tony.luck
  2005-08-22 20:20       ` David S. Miller
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2005-08-22 17:50 UTC (permalink / raw)
  To: tony.luck; +Cc: linux-kernel, jasonuhl

tony.luck@intel.com wrote:
>
> At the other extreme ... the current use of sched_clock() with
>  potentially nano-second resolution is way over the top.  Logging
>  to a serial console at 115200 a typical line from printk will take
>  2-4 milli-seconds to print ... so there would seem to be little
>  benefit from a sub-millisecond resolution (in fact at 250HZ jiffies
>  are on the ragged edge of being good enough).
> 
>  If that isn't sufficient ... it should be possible to make a cut-down,
>  lockless version of do_gettimeofday that meets Andrew's suggestion
>  of good resolution with occasional theoretical weirdness.  But before
>  we go there ... I'd like to hear whether there are usage models that
>  really need better resolution than jiffies can provide?

I think so.  Say you're debugging or performance tuning filesystem requests
and I/O completions, etc.  You disable the console with `dmesg -n', run the
test then do `dmesg -s 1000000 > foo'.  Having somewhat-sub-millisecond
timestamping in the resulting trace is required.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 17:42     ` tony.luck
  2005-08-22 17:50       ` Andrew Morton
@ 2005-08-22 20:20       ` David S. Miller
  2005-08-22 20:33         ` Jason Uhlenkott
  1 sibling, 1 reply; 24+ messages in thread
From: David S. Miller @ 2005-08-22 20:20 UTC (permalink / raw)
  To: tony.luck; +Cc: akpm, linux-kernel, jasonuhl

From: tony.luck@intel.com
Date: Mon, 22 Aug 2005 10:42:22 -0700

> At the other extreme ... the current use of sched_clock() with
> potentially nano-second resolution is way over the top.

Not really, when I'm debugging TCP events over gigabit
these timestamps are exceptionally handy.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 20:20       ` David S. Miller
@ 2005-08-22 20:33         ` Jason Uhlenkott
  2005-08-22 20:42           ` David S. Miller
  2005-08-22 21:10           ` tony.luck
  0 siblings, 2 replies; 24+ messages in thread
From: Jason Uhlenkott @ 2005-08-22 20:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: tony.luck, akpm, linux-kernel

On Mon, Aug 22, 2005 at 01:20:52PM -0700, David S. Miller wrote:
> From: tony.luck@intel.com
> Date: Mon, 22 Aug 2005 10:42:22 -0700
> 
> > At the other extreme ... the current use of sched_clock() with
> > potentially nano-second resolution is way over the top.
> 
> Not really, when I'm debugging TCP events over gigabit
> these timestamps are exceptionally handy.

Yes, but how many of those figures are really significant?  I strongly
suspect that the overhead of printk() is high enough, even when we're
just spewing to the dmesg buffer and not the console, that we have a
lot more precision than accuracy at nanosecond resolution.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 20:33         ` Jason Uhlenkott
@ 2005-08-22 20:42           ` David S. Miller
  2005-08-22 21:15             ` Andrew Morton
  2005-08-23 15:01             ` Nick Piggin
  2005-08-22 21:10           ` tony.luck
  1 sibling, 2 replies; 24+ messages in thread
From: David S. Miller @ 2005-08-22 20:42 UTC (permalink / raw)
  To: jasonuhl; +Cc: tony.luck, akpm, linux-kernel

From: Jason Uhlenkott <jasonuhl@sgi.com>
Date: Mon, 22 Aug 2005 13:33:06 -0700

> On Mon, Aug 22, 2005 at 01:20:52PM -0700, David S. Miller wrote:
> > Not really, when I'm debugging TCP events over gigabit
> > these timestamps are exceptionally handy.
> 
> Yes, but how many of those figures are really significant?  I strongly
> suspect that the overhead of printk() is high enough, even when we're
> just spewing to the dmesg buffer and not the console, that we have a
> lot more precision than accuracy at nanosecond resolution.

I turn off VC logging, and I turn off disk sync'ing, so it goes
straight to the page cache.

I really do need sub-microsecond timings when I put a lot of
printk tracing into the stack.

This is a useful feature, please do not labotomize it just because
it's difficult to implement on ia64.  Just make a
"printk_get_timestamp_because_ia64_sucks()" interface or something
like that :-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 17:50       ` Andrew Morton
@ 2005-08-22 20:52         ` tony.luck
  0 siblings, 0 replies; 24+ messages in thread
From: tony.luck @ 2005-08-22 20:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, jasonuhl

>>  we go there ... I'd like to hear whether there are usage models that
>>  really need better resolution than jiffies can provide?
>
>I think so.  Say you're debugging or performance tuning filesystem requests
>and I/O completions, etc.  You disable the console with `dmesg -n', run the
>test then do `dmesg -s 1000000 > foo'.  Having somewhat-sub-millisecond
>timestamping in the resulting trace is required.

That sounds like using a hammer to pound in screws ... it works,
but it might be a lot better to go find a screwdriver.

Couldn't you use kprobes to collect timestamps of interesting events
in your filesystem instead of splashing printk() all over the place?

But perhaps this is heresy, real kernel programmers do all their
debugging with printk() :-)

-Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 20:33         ` Jason Uhlenkott
  2005-08-22 20:42           ` David S. Miller
@ 2005-08-22 21:10           ` tony.luck
  1 sibling, 0 replies; 24+ messages in thread
From: tony.luck @ 2005-08-22 21:10 UTC (permalink / raw)
  To: jasonuhl, David S. Miller; +Cc: akpm, linux-kernel

>I turn off VC logging, and I turn off disk sync'ing, so it goes
>straight to the page cache.
>
>I really do need sub-microsecond timings when I put a lot of
>printk tracing into the stack.

Right now you only have microsecond timing. Although printk() gets
"nanosecond" resolution from sched_clock() it throws away the low
order bits when it outputs the ".%06lu ... nanosec_rem/1000"

>This is a useful feature, please do not labotomize it just because
>it's difficult to implement on ia64.  Just make a
>"printk_get_timestamp_because_ia64_sucks()" interface or something
>like that :-)

I'm just trying to assess how much work is needed at the moment. Both
you and Andrew have come up with scenarios where higher than jiffies
resolution is needed ... so it seems unlikely that I'll be allowed
to do much damage here.  Andrew's proposed:

 __attribute__((weak)) unsigned long long printk_clock(void)

patch would provide the hooks I need.

-Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 20:42           ` David S. Miller
@ 2005-08-22 21:15             ` Andrew Morton
  2005-08-22 22:33               ` tony.luck
  2005-08-23 15:01             ` Nick Piggin
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2005-08-22 21:15 UTC (permalink / raw)
  To: David S. Miller; +Cc: jasonuhl, tony.luck, linux-kernel

"David S. Miller" <davem@davemloft.net> wrote:
>
> I really do need sub-microsecond timings when I put a lot of
> printk tracing into the stack.

How fast is printk?  I haven't looked.

ie: if you do back-to-back printk's, what's the timestamp increment?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 21:15             ` Andrew Morton
@ 2005-08-22 22:33               ` tony.luck
  2005-08-22 22:38                 ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: tony.luck @ 2005-08-22 22:33 UTC (permalink / raw)
  To: Andrew Morton, David S. Miller; +Cc: jasonuhl, linux-kernel

>How fast is printk?  I haven't looked.
>
>ie: if you do back-to-back printk's, what's the timestamp increment?

On ia64 it looks like about 4-5 usec increment for back-to-back
printk (with no serial console configured, and dmesg -n to turn
off messages to the VGA console).

-Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 22:33               ` tony.luck
@ 2005-08-22 22:38                 ` Andrew Morton
  2005-08-22 23:27                   ` tony.luck
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2005-08-22 22:38 UTC (permalink / raw)
  To: tony.luck; +Cc: davem, jasonuhl, linux-kernel

tony.luck@intel.com wrote:
>
> >How fast is printk?  I haven't looked.
> >
> >ie: if you do back-to-back printk's, what's the timestamp increment?
> 
> On ia64 it looks like about 4-5 usec increment for back-to-back
> printk (with no serial console configured, and dmesg -n to turn
> off messages to the VGA console).
> 

Ah, thanks.  Presumably it'll be considerably longer with %d's and %s's in
there.  But still, ~10 usecs is good resolution for I/O operations.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 22:38                 ` Andrew Morton
@ 2005-08-22 23:27                   ` tony.luck
  2005-08-23  1:52                     ` Paul Mackerras
  0 siblings, 1 reply; 24+ messages in thread
From: tony.luck @ 2005-08-22 23:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: davem, jasonuhl, linux-kernel

>Ah, thanks.  Presumably it'll be considerably longer with %d's and %s's in
>there.  But still, ~10 usecs is good resolution for I/O operations.

The variation in times from one call to the next seems to be
greater than the time to evaluate 4 "%d" arguments.

So we are back to how to get a timestamp in printk().

Earlier I said that it would be possible to provide a simplified
do_gettimeofday() call that met the no locks requirement.  I still
think this is possible, but most architectures would only get
jiffie resolution from this (only ia64, sparc64 and HPET users
have time interpolators registered).

-Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 23:27                   ` tony.luck
@ 2005-08-23  1:52                     ` Paul Mackerras
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Mackerras @ 2005-08-23  1:52 UTC (permalink / raw)
  To: tony.luck; +Cc: Andrew Morton, davem, jasonuhl, linux-kernel

tony.luck@intel.com writes:

> Earlier I said that it would be possible to provide a simplified
> do_gettimeofday() call that met the no locks requirement.  I still
> think this is possible, but most architectures would only get
> jiffie resolution from this (only ia64, sparc64 and HPET users
> have time interpolators registered).

ppc64 has a lockless do_gettimeofday() with microsecond resolution.

Paul.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21  9:13 ` Andrew Morton
  2005-08-21  9:16   ` Andrew Morton
@ 2005-08-23  7:18   ` Tony Lindgren
  1 sibling, 0 replies; 24+ messages in thread
From: Tony Lindgren @ 2005-08-23  7:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Luck, Tony, linux-kernel, jasonuhl

* Andrew Morton <akpm@osdl.org> [050821 02:15]:
> "Luck, Tony" <tony.luck@intel.com> wrote:
> >
> > It has been pointed out to me that ia64 doesn't boot
> > with CONFIG_PRINTK_TIME=y.  The issue is the call to
> > sched_clock() ... which on ia64 accesses some per-cpu
> > data to adjust for possible variations in processor
> > speed between different cpus.  Since the per-cpu page
> > is not set up for the first few printk() calls, we die.
> > 
> > Is this an issue on any other architecture?  Most versions
> > of sched_clock() seem to just scale jiffies into nanoseconds,
> > so I guess they don't.  s390, sparc64, x86 and x86_64 all
> > have more sophisticated versions but they don't appear to me
> > to have limitations on how early they might be called.

CONFIG_PRINTK_TIME also has a problem on at least ARM OMAP where
the IO mapping to read the clock may not be initialized when
sched_clock() is called for the first time.

I'd hate to have to test for something for CONFIG_PRINTK_TIME
every time sched_clock() is being called.

The quick fix would seem to be to only allow CONFIG_PRINTK_TIME
from kernel cmdline to make it happen a bit later. So basically
make int printk_time = 0 until command line is evaluated.

Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: CONFIG_PRINTK_TIME woes
@ 2005-08-23 14:07 Luck, Tony
  0 siblings, 0 replies; 24+ messages in thread
From: Luck, Tony @ 2005-08-23 14:07 UTC (permalink / raw)
  To: Tony Lindgren, Andrew Morton; +Cc: linux-kernel, jasonuhl

>I'd hate to have to test for something for CONFIG_PRINTK_TIME
>every time sched_clock() is being called.

Me too.

>The quick fix would seem to be to only allow CONFIG_PRINTK_TIME
>from kernel cmdline to make it happen a bit later. So basically
>make int printk_time = 0 until command line is evaluated.

Good thought, but this won't work for ia64 in the hot-plug cpu case.
There are a couple of printk() calls by new cpus as they boot before
they have set-up their per-cpu areas.  So there is no global state
that can be checked to decide whether it is safe for printk() to
call sched_clock().

-Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-22 20:42           ` David S. Miller
  2005-08-22 21:15             ` Andrew Morton
@ 2005-08-23 15:01             ` Nick Piggin
  1 sibling, 0 replies; 24+ messages in thread
From: Nick Piggin @ 2005-08-23 15:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: jasonuhl, tony.luck, akpm, linux-kernel

David S. Miller wrote:

> This is a useful feature, please do not labotomize it just because
> it's difficult to implement on ia64.  Just make a
> "printk_get_timestamp_because_ia64_sucks()" interface or something
> like that :-)

I was a bit unclear when I raised this issue. It is not just an
ia64 problem.

The sched_clock() interface is allowed to return wildly different
values depending on which CPU it is called from, and currently
has fundamental problems at least on i386 where it can go fowards
and backwards arbitrary amounts of time (due to frequency scaling,
if I understand correctly), and also needn't be exactly nanoseconds
at the best of times.

The interface is like this so it can be per-cpu and lockless and
as fast as possible for the scheduler heuristics (which aren't too
picky).

I just don't want its usage spreading outside kernel/sched.c if we
can help it. Pragmatically it sounds like the best thing we have
for printk at this time, however I hope we can come up with
something slightly more appropriate even if it ends up being slower.

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21 18:01           ` Andrew Morton
@ 2005-08-24  0:04             ` Tim Bird
  0 siblings, 0 replies; 24+ messages in thread
From: Tim Bird @ 2005-08-24  0:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, tony.luck, linux-kernel, jasonuhl

Andrew Morton wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>
>>What about just using jiffies, then?
>>
>>Really, sched_clock() is very broken for this (I know you're
>>not arguing against that).
>>
>>It can go backwards when called twice from the same CPU, and the
>>number returned by one CPU need have no correlation with that
>>returned by another.
> 
> jiffies wouldn't have sufficient resolution for this application.  Bear
> in
> mind that this is just a debugging thing - it's better to have good
> resolution with occasional theoretical weirdness than to have poor
> resolution plus super-consistency, IMO.

Andrew's assessment is correct for my usage.

I use this for detailed info on bootup times, for tuning embedded
configurations.  Someone has already noted that the current code
truncates the value to microseconds (so a nanosecond-capable clock
interface is overkill for the current code).  Microseconds seems
to be a pretty useful precision for the work I'm doing.

Note that on many embedded platforms, a jiffy is 10 milliseconds,
which is far too low resolution for my purposes.  Also, not
to be totally egocentric, but most embedded platforms don't
have SMP (currently), so the SMP weirdness has never bothered
me. Even on SMP systems, the bootup code, which is what
I'm measuring, is mostly UP.

Just my 2 cents.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: CONFIG_PRINTK_TIME woes
  2005-08-21  9:16   ` Andrew Morton
  2005-08-21  9:27     ` Nick Piggin
  2005-08-22 17:42     ` tony.luck
@ 2005-08-24  0:36     ` Tim Bird
  2 siblings, 0 replies; 24+ messages in thread
From: Tim Bird @ 2005-08-24  0:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: tony.luck, linux-kernel, jasonuhl

Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> 
>>How about we give each arch a printk_clock()?
> 
> 
> Which might be as simple as this..
> 
> 
> --- devel/kernel/printk.c~printk_clock	2005-08-21 02:14:05.000000000
> -0700
> +++ devel-akpm/kernel/printk.c	2005-08-21 02:15:14.000000000 -0700
> @@ -488,6 +488,11 @@ static int __init printk_time_setup(char
>  
>  __setup("time", printk_time_setup);
>  
> +__attribute__((weak)) unsigned long long printk_clock(void)
> +{
> +	return sched_clock();
> +}
> +

It might be good to call it "instrumentation_clock", or "tracing_clock"
or something similar.  Other tracing systems have similar requirements
and I think this would be a generally useful thing to have.

At CELF, we investigated different in-kernel timing interfaces for
tracing, and came up with a proposal for our own API.  See
http://tree.celinuxforum.org/pubwiki/moin.cgi/InstrumentationAPI
This was over a year ago and the information is somewhat dated.
We ended up not pursuing this, and just using sched_clock().

I've been advocating that CE vendors beef up their sched_clock()
implementations to be better than jiffy resolution, but if this
is an abuse, I'd rather steer developers to provide platform
support for an API that is dedicated to this purpose.

It sounds like the requirements are:
 - 64-bit value, sub-microsecond resolution
   - although for tracing we prefer to store only 32-bits
   and microseconds seems to give a decent balance between
   precision and rollover avoidance.
 - lockless
 - (for me) available VERY early
 (On some platforms, we have the firmware set up the clock so
 it can be read immediately from start_kernel() on)
 - (not sure how important is timestamp coherency between
 processors)

My apologies in advance for the following code, which is
sure to make your eyes bleed ;-), but here's some code
from a tracing system that supports several different
embedded boards.  This is from kft.c:

(I'm NOT advocating adopting this code - I'm just posting
for reference for the API it provides, and to show a couple
of different implementation approaches.  This particular code
uses an unsigned long (assumed to be 32-bits), and tries
to defer overflows with funky math.)

+/* control whether a generic or custom clock routine is used */
+#if !defined(CONFIG_MIPS) && !defined(CONFIG_SH)
+#define GENERIC_KFTREADCLOCK 1
+#endif
+
+#ifdef GENERIC_KFTREADCLOCK
+/*
+ * Define a genefic kft_readclock routine.
+ * This should work well enough for platforms where sched_clock()
+ * gives good (sub-microsecond) precision.
+ *
+ * There are valid reasons to use other routines, including:
+ *  - when using kft for boot timings
+ *    - on most platforms, sched_clock() does not work correctly until
+ *    after time_init()
+ *  - reduced overhead for obtaining a microsecond value
+ *    (This may be incorrect, since at most this adds one
+ *    64-bit-by-32-bit divide, in addition to the shift that
+ *    is inside sched_clock(). KFT does enough other stuff
+ *    that this one divide is probably not a major factor
+ *    in KFT overhead.)
+ */
+static inline unsigned long __noinstrument kft_readclock(void)
+{
+	unsigned long long t;
+	
+	t = sched_clock();
+	/* convert to microseconds */
+	do_div(t,1000);
+	return (unsigned long)t;
+}
+
+static inline unsigned long __noinstrument kft_clock_to_usecs(unsigned long clock)
+{
+	return clock;
+}
+
+#endif /* GENERIC_KFTREADCLOCK - non-MIPS, non-SH */
+
+#ifndef GENERIC_KFTREADCLOCK
+/*
+ * Use arch-specific kft_readclock() and kft_clock_to_usecs() routines
+ *
+ * First - define some platform-specific constants
+ *
+ * !! If using a non-generic KFT readclock, you need
+ * to set the following constants for your machine!!
+ *
+ * CLOCK_FREQ is a hardcoded value for the frequency of
+ * whatever clock you are using for kft_readclock()
+ * It would be nice to use a probed clock freq (cpu_hz)
+ * here, but it  isn't set early enough for some boot
+ * measurements.
+ * Hint: for x86, boot once and look at /proc/cpuinfo
+ *
+ * CLOCK_SHIFT is used to bring the clock frequency into
+ * a manageable range.  For my 3 GHz machine, I decided
+ * to divide the cpu cycle clock by 8. This throws
+ * away some clock precision, but makes some of the
+ * other math faster and helps us stay in 32 bits.
+ */
+
+#ifdef CONFIG_X86_TSC
+// Tim's old laptop
+//#define CLOCK_FREQ 645206000ULL
+// Tim's HP desktop
+#define CLOCK_FREQ 2992332000ULL
+#define CLOCK_SHIFT	3
+#endif /* CONFIG_X86_TSC */
+
+#ifdef CONFIG_PPC32
+// Ebony board
+#define CLOCK_FREQ 400000000ULL
+#define CLOCK_SHIFT	3
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_CPU_SH4
+#define CLOCK_FREQ 15000000ULL	/* =P/4 */
+#define CLOCK_SHIFT	0
+#endif /* CONFIG_CPU_SH4 */
+
+#ifdef CONFIG_MIPS
+/* tx4938 */
+#define CLOCK_FREQ ( 300000000ULL / 2 )
+#define CLOCK_SHIFT	3
+#endif /* CONFIG_MIPS */
+
+
+
+#ifdef CONFIG_X86_TSC
+#include <asm/time.h>	/* for rdtscll macro */
+static inline unsigned long kft_readclock(void)
+{
+	unsigned long long ticks;
+
+	rdtscll(ticks);
+	return (unsigned long)((ticks>>CLOCK_SHIFT) & 0xffffffff);
+}
+#endif /* CONFIG_X86_TSC */
+
+
+#ifdef CONFIG_PPC32
+#include <asm/time.h>	/* for get_tbu macro */
+/* copied from sched_clock for ppc */
+static inline unsigned long kft_readclock(void)
+{
+	unsigned long lo, hi, hi2;
+	unsigned long long ticks;
+
+	do {
+		hi = get_tbu();
+		lo = get_tbl();
+		hi2 = get_tbu();
+	} while (hi2 != hi);
+	ticks = ((unsigned long long) hi << 32) | lo;
+	return (unsigned long)((ticks>>CLOCK_SHIFT) & 0xffffffff);
+}
+#endif /* CONFIG_PPC32 */
+
+#ifdef CONFIG_CPU_SH4
+/*
+ * In advance, start Timer Unit4(TMU4)
+ * ex.
+ *  *TMU4_TCR = 0x0000;
+ *  *TMU4_TCOR = 0;
+ *  *TMU4_TCNT = 0;
+ *  *TMU_TSTR2 = (*TMU_TSTR2|0x02);
+ */
+#define TMU4_TCNT	((volatile unsigned long *)0xFE100018)
+
+static inline unsigned long kft_readclock(void)
+{
+	return (-(*TMU4_TCNT))>>CLOCK_SHIFT;
+}
+#endif /* CONFIG_CPU_SH4 */
+
+#ifdef CONFIG_MIPS
+static inline unsigned long kft_readclock(void)
+{
+	return (unsigned long)read_c0_count();
+}
+#endif /* CONFIG_MIPS */
+
+/*
+ * Now define a generic routine to convert from clock tics to usecs.
+ *
+ * This weird scaling factor makes it possible to use shifts and a
+ * single 32-bit divide, instead of more expensive math,
+ * for the conversion to microseconds.
+ */
+#define CLOCK_SCALE ((((CLOCK_FREQ*1024*1024)/1000000))>>CLOCK_SHIFT)
+
+static inline unsigned long kft_clock_to_usecs(unsigned long clock)
+{
+	/* math to stay in 32 bits. Try to avoid over and underflows */
+	if (clock<4096)
+		return (clock<<20)/CLOCK_SCALE;
+	if (clock<(4096<<5))
+		return (clock<<15)/(CLOCK_SCALE>>5);
+	if (clock<(4096<<10))
+		return (clock<<10)/(CLOCK_SCALE>>10);
+	if (clock<(4096<<15))
+		return (clock<<5)/(CLOCK_SCALE>>15);
+	else
+		return clock/(CLOCK_SCALE>>20);
+}
+
+#endif /* not GENERIC_KFT_READCLOCK */
+
+#if CONFIG_KFT_CLOCK_SCALE
+
+extern void set_cyc2ns_scale(unsigned long cpu_mhz);
+
+/*
+ * Do whatever is required to prepare for calling sched_clock very
+ * early in the boot sequence.
+ */
+extern void setup_early_kft_clock(void)
+{
+	set_cyc2ns_scale(CONFIG_KFT_CLOCK_SCALE);
+}
+#endif /* CONFIG_KFT_CLOCK_SCALE */
+

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2005-08-24  0:36 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-18 18:47 CONFIG_PRINTK_TIME woes Luck, Tony
2005-08-21  9:13 ` Andrew Morton
2005-08-21  9:16   ` Andrew Morton
2005-08-21  9:27     ` Nick Piggin
2005-08-21  9:32       ` Andrew Morton
2005-08-21 11:25         ` Nick Piggin
2005-08-21 18:01           ` Andrew Morton
2005-08-24  0:04             ` Tim Bird
2005-08-22 17:42     ` tony.luck
2005-08-22 17:50       ` Andrew Morton
2005-08-22 20:52         ` tony.luck
2005-08-22 20:20       ` David S. Miller
2005-08-22 20:33         ` Jason Uhlenkott
2005-08-22 20:42           ` David S. Miller
2005-08-22 21:15             ` Andrew Morton
2005-08-22 22:33               ` tony.luck
2005-08-22 22:38                 ` Andrew Morton
2005-08-22 23:27                   ` tony.luck
2005-08-23  1:52                     ` Paul Mackerras
2005-08-23 15:01             ` Nick Piggin
2005-08-22 21:10           ` tony.luck
2005-08-24  0:36     ` Tim Bird
2005-08-23  7:18   ` Tony Lindgren
  -- strict thread matches above, loose matches on Subject: below --
2005-08-23 14:07 Luck, Tony

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox