* CONFIG_PRINTK_TIME woes @ 2005-08-18 18:47 Luck, Tony 2005-08-21 9:13 ` Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: Luck, Tony @ 2005-08-18 18:47 UTC (permalink / raw) To: linux-kernel; +Cc: Jason Uhlenkott It has been pointed out to me that ia64 doesn't boot with CONFIG_PRINTK_TIME=y. The issue is the call to sched_clock() ... which on ia64 accesses some per-cpu data to adjust for possible variations in processor speed between different cpus. Since the per-cpu page is not set up for the first few printk() calls, we die. Is this an issue on any other architecture? Most versions of sched_clock() seem to just scale jiffies into nanoseconds, so I guess they don't. s390, sparc64, x86 and x86_64 all have more sophisticated versions but they don't appear to me to have limitations on how early they might be called. Possible solutions: 1) Fix ia64 version of sched_clock() to check whether the per-cpu page hase been initialized before using it. I don't like this solution as it will make sched_clock() slower for its primary uses in kernel/sched.c ... none of which can be called before the per-cpu area is initialized. 2) Add some test to the printk() path along the lines of: t = (can_call_sched_clock) ? sched_clock() : 0; This is somewhat ugly ... perhaps too unpleasant for words in that can_call_sched_clock is a per-cpu concept! 3) Make the printk() code get the time in some other way. Using sched_clock() here seems wrong. The ia64 version has a big fat comment about not comparing the results of two calls from different cpus ... but since printk() calls can come from any cpu ... it seems likely that a user who turns on PRINTK_TIME might subtract timestamps from two lines to see how long it was between the messages. This could have significant error on a system that has been up for a long time. But I don't have a better suggestion from existing code. I assume that sched_clock() was picked as it is both fast and lockless. -Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-18 18:47 CONFIG_PRINTK_TIME woes Luck, Tony @ 2005-08-21 9:13 ` Andrew Morton 2005-08-21 9:16 ` Andrew Morton 2005-08-23 7:18 ` Tony Lindgren 0 siblings, 2 replies; 24+ messages in thread From: Andrew Morton @ 2005-08-21 9:13 UTC (permalink / raw) To: Luck, Tony; +Cc: linux-kernel, jasonuhl "Luck, Tony" <tony.luck@intel.com> wrote: > > It has been pointed out to me that ia64 doesn't boot > with CONFIG_PRINTK_TIME=y. The issue is the call to > sched_clock() ... which on ia64 accesses some per-cpu > data to adjust for possible variations in processor > speed between different cpus. Since the per-cpu page > is not set up for the first few printk() calls, we die. > > Is this an issue on any other architecture? Most versions > of sched_clock() seem to just scale jiffies into nanoseconds, > so I guess they don't. s390, sparc64, x86 and x86_64 all > have more sophisticated versions but they don't appear to me > to have limitations on how early they might be called. > > Possible solutions: > > 1) Fix ia64 version of sched_clock() to check whether > the per-cpu page hase been initialized before using it. > > I don't like this solution as it will make sched_clock() > slower for its primary uses in kernel/sched.c ... none of > which can be called before the per-cpu area is initialized. > > > 2) Add some test to the printk() path along the lines of: > > t = (can_call_sched_clock) ? sched_clock() : 0; > > This is somewhat ugly ... perhaps too unpleasant for words in > that can_call_sched_clock is a per-cpu concept! > > 3) Make the printk() code get the time in some other way. > > Using sched_clock() here seems wrong. The ia64 version > has a big fat comment about not comparing the results of > two calls from different cpus ... but since printk() calls > can come from any cpu ... it seems likely that a user who > turns on PRINTK_TIME might subtract timestamps from two > lines to see how long it was between the messages. This > could have significant error on a system that has been up > for a long time. > > But I don't have a better suggestion from existing code. > I assume that sched_clock() was picked as it is both fast and > lockless. Yes, using sched_clock() there is a bit abusive. It's not terribly performance-sensitive in printk(). How about we give each arch a printk_clock()? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 9:13 ` Andrew Morton @ 2005-08-21 9:16 ` Andrew Morton 2005-08-21 9:27 ` Nick Piggin ` (2 more replies) 2005-08-23 7:18 ` Tony Lindgren 1 sibling, 3 replies; 24+ messages in thread From: Andrew Morton @ 2005-08-21 9:16 UTC (permalink / raw) To: tony.luck, linux-kernel, jasonuhl Andrew Morton <akpm@osdl.org> wrote: > > How about we give each arch a printk_clock()? Which might be as simple as this.. --- devel/kernel/printk.c~printk_clock 2005-08-21 02:14:05.000000000 -0700 +++ devel-akpm/kernel/printk.c 2005-08-21 02:15:14.000000000 -0700 @@ -488,6 +488,11 @@ static int __init printk_time_setup(char __setup("time", printk_time_setup); +__attribute__((weak)) unsigned long long printk_clock(void) +{ + return sched_clock(); +} + /* * This is printk. It can be called from any context. We want it to work. * @@ -558,7 +563,7 @@ asmlinkage int vprintk(const char *fmt, loglev_char = default_message_loglevel + '0'; } - t = sched_clock(); + t = printk_clock(); nanosec_rem = do_div(t, 1000000000); tlen = sprintf(tbuf, "<%c>[%5lu.%06lu] ", _ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 9:16 ` Andrew Morton @ 2005-08-21 9:27 ` Nick Piggin 2005-08-21 9:32 ` Andrew Morton 2005-08-22 17:42 ` tony.luck 2005-08-24 0:36 ` Tim Bird 2 siblings, 1 reply; 24+ messages in thread From: Nick Piggin @ 2005-08-21 9:27 UTC (permalink / raw) To: Andrew Morton; +Cc: tony.luck, linux-kernel, jasonuhl Andrew Morton wrote: > Andrew Morton <akpm@osdl.org> wrote: > >>How about we give each arch a printk_clock()? > > > Which might be as simple as this.. > > sched_clock() shouldn't really be taken outside kernel/sched.c, especially for things like this. It actually has some fundamental problems even in its current use in the scheduler (which need to be fixed). But basically it is a very nasty interface with a rather tenuous relationship to time. Why not use something like do_gettimeofday? (or I'm sure one of our time keepers can suggest the right thing to use). Nick -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 9:27 ` Nick Piggin @ 2005-08-21 9:32 ` Andrew Morton 2005-08-21 11:25 ` Nick Piggin 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2005-08-21 9:32 UTC (permalink / raw) To: Nick Piggin; +Cc: tony.luck, linux-kernel, jasonuhl Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > Andrew Morton wrote: > > Andrew Morton <akpm@osdl.org> wrote: > > > >>How about we give each arch a printk_clock()? > > > > > > Which might be as simple as this.. > > > > > > sched_clock() shouldn't really be taken outside kernel/sched.c, > especially for things like this. > > It actually has some fundamental problems even in its current > use in the scheduler (which need to be fixed). But basically it > is a very nasty interface with a rather tenuous relationship to > time. yup. > Why not use something like do_gettimeofday? (or I'm sure one > of our time keepers can suggest the right thing to use). do_gettimeofday() takes locks, so a) we can't do printk from inside it and b) if you do a printk-from-interupt and the interrupted code was running do_gettimeofday(), deadlock. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 9:32 ` Andrew Morton @ 2005-08-21 11:25 ` Nick Piggin 2005-08-21 18:01 ` Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: Nick Piggin @ 2005-08-21 11:25 UTC (permalink / raw) To: Andrew Morton; +Cc: tony.luck, linux-kernel, jasonuhl Andrew Morton wrote: > > yup. > > >>Why not use something like do_gettimeofday? (or I'm sure one >>of our time keepers can suggest the right thing to use). > > > do_gettimeofday() takes locks, so a) we can't do printk from inside it and Dang, yeah maybe this is the showstopper. > b) if you do a printk-from-interupt and the interrupted code was running > do_gettimeofday(), deadlock. > What about just using jiffies, then? Really, sched_clock() is very broken for this (I know you're not arguing against that). It can go backwards when called twice from the same CPU, and the number returned by one CPU need have no correlation with that returned by another. However, I understand you probably just want something quick and dirty for 2.6.13 and would be happy just if it isn't more broken than before ;) -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 11:25 ` Nick Piggin @ 2005-08-21 18:01 ` Andrew Morton 2005-08-24 0:04 ` Tim Bird 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2005-08-21 18:01 UTC (permalink / raw) To: Nick Piggin; +Cc: tony.luck, linux-kernel, jasonuhl Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > Andrew Morton wrote: > > > > > yup. > > > > > >>Why not use something like do_gettimeofday? (or I'm sure one > >>of our time keepers can suggest the right thing to use). > > > > > > do_gettimeofday() takes locks, so a) we can't do printk from inside it and > > Dang, yeah maybe this is the showstopper. > > > b) if you do a printk-from-interupt and the interrupted code was running > > do_gettimeofday(), deadlock. > > > > What about just using jiffies, then? > > Really, sched_clock() is very broken for this (I know you're > not arguing against that). > > It can go backwards when called twice from the same CPU, and the > number returned by one CPU need have no correlation with that > returned by another. jiffies wouldn't have sufficient resolution for this application. Bear in mind that this is just a debugging thing - it's better to have good resolution with occasional theoretical weirdness than to have poor resolution plus super-consistency, IMO. > However, I understand you probably just want something quick and > dirty for 2.6.13 and would be happy just if it isn't more broken > than before ;) We're OK for 2.6.13, I think. ia64 people will quickly learn to not turn the option on. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 18:01 ` Andrew Morton @ 2005-08-24 0:04 ` Tim Bird 0 siblings, 0 replies; 24+ messages in thread From: Tim Bird @ 2005-08-24 0:04 UTC (permalink / raw) To: Andrew Morton; +Cc: Nick Piggin, tony.luck, linux-kernel, jasonuhl Andrew Morton wrote: > Nick Piggin <nickpiggin@yahoo.com.au> wrote: >> >>What about just using jiffies, then? >> >>Really, sched_clock() is very broken for this (I know you're >>not arguing against that). >> >>It can go backwards when called twice from the same CPU, and the >>number returned by one CPU need have no correlation with that >>returned by another. > > jiffies wouldn't have sufficient resolution for this application. Bear > in > mind that this is just a debugging thing - it's better to have good > resolution with occasional theoretical weirdness than to have poor > resolution plus super-consistency, IMO. Andrew's assessment is correct for my usage. I use this for detailed info on bootup times, for tuning embedded configurations. Someone has already noted that the current code truncates the value to microseconds (so a nanosecond-capable clock interface is overkill for the current code). Microseconds seems to be a pretty useful precision for the work I'm doing. Note that on many embedded platforms, a jiffy is 10 milliseconds, which is far too low resolution for my purposes. Also, not to be totally egocentric, but most embedded platforms don't have SMP (currently), so the SMP weirdness has never bothered me. Even on SMP systems, the bootup code, which is what I'm measuring, is mostly UP. Just my 2 cents. ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: CONFIG_PRINTK_TIME woes 2005-08-21 9:16 ` Andrew Morton 2005-08-21 9:27 ` Nick Piggin @ 2005-08-22 17:42 ` tony.luck 2005-08-22 17:50 ` Andrew Morton 2005-08-22 20:20 ` David S. Miller 2005-08-24 0:36 ` Tim Bird 2 siblings, 2 replies; 24+ messages in thread From: tony.luck @ 2005-08-22 17:42 UTC (permalink / raw) To: akpm; +Cc: linux-kernel, jasonuhl Andrew Morton wrote: >jiffies wouldn't have sufficient resolution for this application. Bear in >mind that this is just a debugging thing - it's better to have good >resolution with occasional theoretical weirdness than to have poor >resolution plus super-consistency, IMO. The majority of architectures currently use a sched_clock() that just scales jiffies (but that may just mean they haven't gotten around to implementing anything better yet). But how much resolution we need is a very good question, as it will affect our choice of clock. In many cases I've been presented with a console log which has a few warning messages, and then an OOPS ... the initial question I would like to answer is were the warnings printed "just before" the oops ... or days earlier. For these cases having a 1 second resolution would be a huge bonus over no time information at all. Jiffies would be good enough for almost all cases (IMO). At the other extreme ... the current use of sched_clock() with potentially nano-second resolution is way over the top. Logging to a serial console at 115200 a typical line from printk will take 2-4 milli-seconds to print ... so there would seem to be little benefit from a sub-millisecond resolution (in fact at 250HZ jiffies are on the ragged edge of being good enough). If that isn't sufficient ... it should be possible to make a cut-down, lockless version of do_gettimeofday that meets Andrew's suggestion of good resolution with occasional theoretical weirdness. But before we go there ... I'd like to hear whether there are usage models that really need better resolution than jiffies can provide? -Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 17:42 ` tony.luck @ 2005-08-22 17:50 ` Andrew Morton 2005-08-22 20:52 ` tony.luck 2005-08-22 20:20 ` David S. Miller 1 sibling, 1 reply; 24+ messages in thread From: Andrew Morton @ 2005-08-22 17:50 UTC (permalink / raw) To: tony.luck; +Cc: linux-kernel, jasonuhl tony.luck@intel.com wrote: > > At the other extreme ... the current use of sched_clock() with > potentially nano-second resolution is way over the top. Logging > to a serial console at 115200 a typical line from printk will take > 2-4 milli-seconds to print ... so there would seem to be little > benefit from a sub-millisecond resolution (in fact at 250HZ jiffies > are on the ragged edge of being good enough). > > If that isn't sufficient ... it should be possible to make a cut-down, > lockless version of do_gettimeofday that meets Andrew's suggestion > of good resolution with occasional theoretical weirdness. But before > we go there ... I'd like to hear whether there are usage models that > really need better resolution than jiffies can provide? I think so. Say you're debugging or performance tuning filesystem requests and I/O completions, etc. You disable the console with `dmesg -n', run the test then do `dmesg -s 1000000 > foo'. Having somewhat-sub-millisecond timestamping in the resulting trace is required. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 17:50 ` Andrew Morton @ 2005-08-22 20:52 ` tony.luck 0 siblings, 0 replies; 24+ messages in thread From: tony.luck @ 2005-08-22 20:52 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, jasonuhl >> we go there ... I'd like to hear whether there are usage models that >> really need better resolution than jiffies can provide? > >I think so. Say you're debugging or performance tuning filesystem requests >and I/O completions, etc. You disable the console with `dmesg -n', run the >test then do `dmesg -s 1000000 > foo'. Having somewhat-sub-millisecond >timestamping in the resulting trace is required. That sounds like using a hammer to pound in screws ... it works, but it might be a lot better to go find a screwdriver. Couldn't you use kprobes to collect timestamps of interesting events in your filesystem instead of splashing printk() all over the place? But perhaps this is heresy, real kernel programmers do all their debugging with printk() :-) -Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 17:42 ` tony.luck 2005-08-22 17:50 ` Andrew Morton @ 2005-08-22 20:20 ` David S. Miller 2005-08-22 20:33 ` Jason Uhlenkott 1 sibling, 1 reply; 24+ messages in thread From: David S. Miller @ 2005-08-22 20:20 UTC (permalink / raw) To: tony.luck; +Cc: akpm, linux-kernel, jasonuhl From: tony.luck@intel.com Date: Mon, 22 Aug 2005 10:42:22 -0700 > At the other extreme ... the current use of sched_clock() with > potentially nano-second resolution is way over the top. Not really, when I'm debugging TCP events over gigabit these timestamps are exceptionally handy. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 20:20 ` David S. Miller @ 2005-08-22 20:33 ` Jason Uhlenkott 2005-08-22 20:42 ` David S. Miller 2005-08-22 21:10 ` tony.luck 0 siblings, 2 replies; 24+ messages in thread From: Jason Uhlenkott @ 2005-08-22 20:33 UTC (permalink / raw) To: David S. Miller; +Cc: tony.luck, akpm, linux-kernel On Mon, Aug 22, 2005 at 01:20:52PM -0700, David S. Miller wrote: > From: tony.luck@intel.com > Date: Mon, 22 Aug 2005 10:42:22 -0700 > > > At the other extreme ... the current use of sched_clock() with > > potentially nano-second resolution is way over the top. > > Not really, when I'm debugging TCP events over gigabit > these timestamps are exceptionally handy. Yes, but how many of those figures are really significant? I strongly suspect that the overhead of printk() is high enough, even when we're just spewing to the dmesg buffer and not the console, that we have a lot more precision than accuracy at nanosecond resolution. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 20:33 ` Jason Uhlenkott @ 2005-08-22 20:42 ` David S. Miller 2005-08-22 21:15 ` Andrew Morton 2005-08-23 15:01 ` Nick Piggin 2005-08-22 21:10 ` tony.luck 1 sibling, 2 replies; 24+ messages in thread From: David S. Miller @ 2005-08-22 20:42 UTC (permalink / raw) To: jasonuhl; +Cc: tony.luck, akpm, linux-kernel From: Jason Uhlenkott <jasonuhl@sgi.com> Date: Mon, 22 Aug 2005 13:33:06 -0700 > On Mon, Aug 22, 2005 at 01:20:52PM -0700, David S. Miller wrote: > > Not really, when I'm debugging TCP events over gigabit > > these timestamps are exceptionally handy. > > Yes, but how many of those figures are really significant? I strongly > suspect that the overhead of printk() is high enough, even when we're > just spewing to the dmesg buffer and not the console, that we have a > lot more precision than accuracy at nanosecond resolution. I turn off VC logging, and I turn off disk sync'ing, so it goes straight to the page cache. I really do need sub-microsecond timings when I put a lot of printk tracing into the stack. This is a useful feature, please do not labotomize it just because it's difficult to implement on ia64. Just make a "printk_get_timestamp_because_ia64_sucks()" interface or something like that :-) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 20:42 ` David S. Miller @ 2005-08-22 21:15 ` Andrew Morton 2005-08-22 22:33 ` tony.luck 2005-08-23 15:01 ` Nick Piggin 1 sibling, 1 reply; 24+ messages in thread From: Andrew Morton @ 2005-08-22 21:15 UTC (permalink / raw) To: David S. Miller; +Cc: jasonuhl, tony.luck, linux-kernel "David S. Miller" <davem@davemloft.net> wrote: > > I really do need sub-microsecond timings when I put a lot of > printk tracing into the stack. How fast is printk? I haven't looked. ie: if you do back-to-back printk's, what's the timestamp increment? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 21:15 ` Andrew Morton @ 2005-08-22 22:33 ` tony.luck 2005-08-22 22:38 ` Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: tony.luck @ 2005-08-22 22:33 UTC (permalink / raw) To: Andrew Morton, David S. Miller; +Cc: jasonuhl, linux-kernel >How fast is printk? I haven't looked. > >ie: if you do back-to-back printk's, what's the timestamp increment? On ia64 it looks like about 4-5 usec increment for back-to-back printk (with no serial console configured, and dmesg -n to turn off messages to the VGA console). -Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 22:33 ` tony.luck @ 2005-08-22 22:38 ` Andrew Morton 2005-08-22 23:27 ` tony.luck 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2005-08-22 22:38 UTC (permalink / raw) To: tony.luck; +Cc: davem, jasonuhl, linux-kernel tony.luck@intel.com wrote: > > >How fast is printk? I haven't looked. > > > >ie: if you do back-to-back printk's, what's the timestamp increment? > > On ia64 it looks like about 4-5 usec increment for back-to-back > printk (with no serial console configured, and dmesg -n to turn > off messages to the VGA console). > Ah, thanks. Presumably it'll be considerably longer with %d's and %s's in there. But still, ~10 usecs is good resolution for I/O operations. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 22:38 ` Andrew Morton @ 2005-08-22 23:27 ` tony.luck 2005-08-23 1:52 ` Paul Mackerras 0 siblings, 1 reply; 24+ messages in thread From: tony.luck @ 2005-08-22 23:27 UTC (permalink / raw) To: Andrew Morton; +Cc: davem, jasonuhl, linux-kernel >Ah, thanks. Presumably it'll be considerably longer with %d's and %s's in >there. But still, ~10 usecs is good resolution for I/O operations. The variation in times from one call to the next seems to be greater than the time to evaluate 4 "%d" arguments. So we are back to how to get a timestamp in printk(). Earlier I said that it would be possible to provide a simplified do_gettimeofday() call that met the no locks requirement. I still think this is possible, but most architectures would only get jiffie resolution from this (only ia64, sparc64 and HPET users have time interpolators registered). -Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 23:27 ` tony.luck @ 2005-08-23 1:52 ` Paul Mackerras 0 siblings, 0 replies; 24+ messages in thread From: Paul Mackerras @ 2005-08-23 1:52 UTC (permalink / raw) To: tony.luck; +Cc: Andrew Morton, davem, jasonuhl, linux-kernel tony.luck@intel.com writes: > Earlier I said that it would be possible to provide a simplified > do_gettimeofday() call that met the no locks requirement. I still > think this is possible, but most architectures would only get > jiffie resolution from this (only ia64, sparc64 and HPET users > have time interpolators registered). ppc64 has a lockless do_gettimeofday() with microsecond resolution. Paul. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 20:42 ` David S. Miller 2005-08-22 21:15 ` Andrew Morton @ 2005-08-23 15:01 ` Nick Piggin 1 sibling, 0 replies; 24+ messages in thread From: Nick Piggin @ 2005-08-23 15:01 UTC (permalink / raw) To: David S. Miller; +Cc: jasonuhl, tony.luck, akpm, linux-kernel David S. Miller wrote: > This is a useful feature, please do not labotomize it just because > it's difficult to implement on ia64. Just make a > "printk_get_timestamp_because_ia64_sucks()" interface or something > like that :-) I was a bit unclear when I raised this issue. It is not just an ia64 problem. The sched_clock() interface is allowed to return wildly different values depending on which CPU it is called from, and currently has fundamental problems at least on i386 where it can go fowards and backwards arbitrary amounts of time (due to frequency scaling, if I understand correctly), and also needn't be exactly nanoseconds at the best of times. The interface is like this so it can be per-cpu and lockless and as fast as possible for the scheduler heuristics (which aren't too picky). I just don't want its usage spreading outside kernel/sched.c if we can help it. Pragmatically it sounds like the best thing we have for printk at this time, however I hope we can come up with something slightly more appropriate even if it ends up being slower. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-22 20:33 ` Jason Uhlenkott 2005-08-22 20:42 ` David S. Miller @ 2005-08-22 21:10 ` tony.luck 1 sibling, 0 replies; 24+ messages in thread From: tony.luck @ 2005-08-22 21:10 UTC (permalink / raw) To: jasonuhl, David S. Miller; +Cc: akpm, linux-kernel >I turn off VC logging, and I turn off disk sync'ing, so it goes >straight to the page cache. > >I really do need sub-microsecond timings when I put a lot of >printk tracing into the stack. Right now you only have microsecond timing. Although printk() gets "nanosecond" resolution from sched_clock() it throws away the low order bits when it outputs the ".%06lu ... nanosec_rem/1000" >This is a useful feature, please do not labotomize it just because >it's difficult to implement on ia64. Just make a >"printk_get_timestamp_because_ia64_sucks()" interface or something >like that :-) I'm just trying to assess how much work is needed at the moment. Both you and Andrew have come up with scenarios where higher than jiffies resolution is needed ... so it seems unlikely that I'll be allowed to do much damage here. Andrew's proposed: __attribute__((weak)) unsigned long long printk_clock(void) patch would provide the hooks I need. -Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 9:16 ` Andrew Morton 2005-08-21 9:27 ` Nick Piggin 2005-08-22 17:42 ` tony.luck @ 2005-08-24 0:36 ` Tim Bird 2 siblings, 0 replies; 24+ messages in thread From: Tim Bird @ 2005-08-24 0:36 UTC (permalink / raw) To: Andrew Morton; +Cc: tony.luck, linux-kernel, jasonuhl Andrew Morton wrote: > Andrew Morton <akpm@osdl.org> wrote: > >>How about we give each arch a printk_clock()? > > > Which might be as simple as this.. > > > --- devel/kernel/printk.c~printk_clock 2005-08-21 02:14:05.000000000 > -0700 > +++ devel-akpm/kernel/printk.c 2005-08-21 02:15:14.000000000 -0700 > @@ -488,6 +488,11 @@ static int __init printk_time_setup(char > > __setup("time", printk_time_setup); > > +__attribute__((weak)) unsigned long long printk_clock(void) > +{ > + return sched_clock(); > +} > + It might be good to call it "instrumentation_clock", or "tracing_clock" or something similar. Other tracing systems have similar requirements and I think this would be a generally useful thing to have. At CELF, we investigated different in-kernel timing interfaces for tracing, and came up with a proposal for our own API. See http://tree.celinuxforum.org/pubwiki/moin.cgi/InstrumentationAPI This was over a year ago and the information is somewhat dated. We ended up not pursuing this, and just using sched_clock(). I've been advocating that CE vendors beef up their sched_clock() implementations to be better than jiffy resolution, but if this is an abuse, I'd rather steer developers to provide platform support for an API that is dedicated to this purpose. It sounds like the requirements are: - 64-bit value, sub-microsecond resolution - although for tracing we prefer to store only 32-bits and microseconds seems to give a decent balance between precision and rollover avoidance. - lockless - (for me) available VERY early (On some platforms, we have the firmware set up the clock so it can be read immediately from start_kernel() on) - (not sure how important is timestamp coherency between processors) My apologies in advance for the following code, which is sure to make your eyes bleed ;-), but here's some code from a tracing system that supports several different embedded boards. This is from kft.c: (I'm NOT advocating adopting this code - I'm just posting for reference for the API it provides, and to show a couple of different implementation approaches. This particular code uses an unsigned long (assumed to be 32-bits), and tries to defer overflows with funky math.) +/* control whether a generic or custom clock routine is used */ +#if !defined(CONFIG_MIPS) && !defined(CONFIG_SH) +#define GENERIC_KFTREADCLOCK 1 +#endif + +#ifdef GENERIC_KFTREADCLOCK +/* + * Define a genefic kft_readclock routine. + * This should work well enough for platforms where sched_clock() + * gives good (sub-microsecond) precision. + * + * There are valid reasons to use other routines, including: + * - when using kft for boot timings + * - on most platforms, sched_clock() does not work correctly until + * after time_init() + * - reduced overhead for obtaining a microsecond value + * (This may be incorrect, since at most this adds one + * 64-bit-by-32-bit divide, in addition to the shift that + * is inside sched_clock(). KFT does enough other stuff + * that this one divide is probably not a major factor + * in KFT overhead.) + */ +static inline unsigned long __noinstrument kft_readclock(void) +{ + unsigned long long t; + + t = sched_clock(); + /* convert to microseconds */ + do_div(t,1000); + return (unsigned long)t; +} + +static inline unsigned long __noinstrument kft_clock_to_usecs(unsigned long clock) +{ + return clock; +} + +#endif /* GENERIC_KFTREADCLOCK - non-MIPS, non-SH */ + +#ifndef GENERIC_KFTREADCLOCK +/* + * Use arch-specific kft_readclock() and kft_clock_to_usecs() routines + * + * First - define some platform-specific constants + * + * !! If using a non-generic KFT readclock, you need + * to set the following constants for your machine!! + * + * CLOCK_FREQ is a hardcoded value for the frequency of + * whatever clock you are using for kft_readclock() + * It would be nice to use a probed clock freq (cpu_hz) + * here, but it isn't set early enough for some boot + * measurements. + * Hint: for x86, boot once and look at /proc/cpuinfo + * + * CLOCK_SHIFT is used to bring the clock frequency into + * a manageable range. For my 3 GHz machine, I decided + * to divide the cpu cycle clock by 8. This throws + * away some clock precision, but makes some of the + * other math faster and helps us stay in 32 bits. + */ + +#ifdef CONFIG_X86_TSC +// Tim's old laptop +//#define CLOCK_FREQ 645206000ULL +// Tim's HP desktop +#define CLOCK_FREQ 2992332000ULL +#define CLOCK_SHIFT 3 +#endif /* CONFIG_X86_TSC */ + +#ifdef CONFIG_PPC32 +// Ebony board +#define CLOCK_FREQ 400000000ULL +#define CLOCK_SHIFT 3 +#endif /* CONFIG_PPC32 */ + +#ifdef CONFIG_CPU_SH4 +#define CLOCK_FREQ 15000000ULL /* =P/4 */ +#define CLOCK_SHIFT 0 +#endif /* CONFIG_CPU_SH4 */ + +#ifdef CONFIG_MIPS +/* tx4938 */ +#define CLOCK_FREQ ( 300000000ULL / 2 ) +#define CLOCK_SHIFT 3 +#endif /* CONFIG_MIPS */ + + + +#ifdef CONFIG_X86_TSC +#include <asm/time.h> /* for rdtscll macro */ +static inline unsigned long kft_readclock(void) +{ + unsigned long long ticks; + + rdtscll(ticks); + return (unsigned long)((ticks>>CLOCK_SHIFT) & 0xffffffff); +} +#endif /* CONFIG_X86_TSC */ + + +#ifdef CONFIG_PPC32 +#include <asm/time.h> /* for get_tbu macro */ +/* copied from sched_clock for ppc */ +static inline unsigned long kft_readclock(void) +{ + unsigned long lo, hi, hi2; + unsigned long long ticks; + + do { + hi = get_tbu(); + lo = get_tbl(); + hi2 = get_tbu(); + } while (hi2 != hi); + ticks = ((unsigned long long) hi << 32) | lo; + return (unsigned long)((ticks>>CLOCK_SHIFT) & 0xffffffff); +} +#endif /* CONFIG_PPC32 */ + +#ifdef CONFIG_CPU_SH4 +/* + * In advance, start Timer Unit4(TMU4) + * ex. + * *TMU4_TCR = 0x0000; + * *TMU4_TCOR = 0; + * *TMU4_TCNT = 0; + * *TMU_TSTR2 = (*TMU_TSTR2|0x02); + */ +#define TMU4_TCNT ((volatile unsigned long *)0xFE100018) + +static inline unsigned long kft_readclock(void) +{ + return (-(*TMU4_TCNT))>>CLOCK_SHIFT; +} +#endif /* CONFIG_CPU_SH4 */ + +#ifdef CONFIG_MIPS +static inline unsigned long kft_readclock(void) +{ + return (unsigned long)read_c0_count(); +} +#endif /* CONFIG_MIPS */ + +/* + * Now define a generic routine to convert from clock tics to usecs. + * + * This weird scaling factor makes it possible to use shifts and a + * single 32-bit divide, instead of more expensive math, + * for the conversion to microseconds. + */ +#define CLOCK_SCALE ((((CLOCK_FREQ*1024*1024)/1000000))>>CLOCK_SHIFT) + +static inline unsigned long kft_clock_to_usecs(unsigned long clock) +{ + /* math to stay in 32 bits. Try to avoid over and underflows */ + if (clock<4096) + return (clock<<20)/CLOCK_SCALE; + if (clock<(4096<<5)) + return (clock<<15)/(CLOCK_SCALE>>5); + if (clock<(4096<<10)) + return (clock<<10)/(CLOCK_SCALE>>10); + if (clock<(4096<<15)) + return (clock<<5)/(CLOCK_SCALE>>15); + else + return clock/(CLOCK_SCALE>>20); +} + +#endif /* not GENERIC_KFT_READCLOCK */ + +#if CONFIG_KFT_CLOCK_SCALE + +extern void set_cyc2ns_scale(unsigned long cpu_mhz); + +/* + * Do whatever is required to prepare for calling sched_clock very + * early in the boot sequence. + */ +extern void setup_early_kft_clock(void) +{ + set_cyc2ns_scale(CONFIG_KFT_CLOCK_SCALE); +} +#endif /* CONFIG_KFT_CLOCK_SCALE */ + ============================= Tim Bird Architecture Group Chair, CE Linux Forum Senior Staff Engineer, Sony Electronics ============================= ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: CONFIG_PRINTK_TIME woes 2005-08-21 9:13 ` Andrew Morton 2005-08-21 9:16 ` Andrew Morton @ 2005-08-23 7:18 ` Tony Lindgren 1 sibling, 0 replies; 24+ messages in thread From: Tony Lindgren @ 2005-08-23 7:18 UTC (permalink / raw) To: Andrew Morton; +Cc: Luck, Tony, linux-kernel, jasonuhl * Andrew Morton <akpm@osdl.org> [050821 02:15]: > "Luck, Tony" <tony.luck@intel.com> wrote: > > > > It has been pointed out to me that ia64 doesn't boot > > with CONFIG_PRINTK_TIME=y. The issue is the call to > > sched_clock() ... which on ia64 accesses some per-cpu > > data to adjust for possible variations in processor > > speed between different cpus. Since the per-cpu page > > is not set up for the first few printk() calls, we die. > > > > Is this an issue on any other architecture? Most versions > > of sched_clock() seem to just scale jiffies into nanoseconds, > > so I guess they don't. s390, sparc64, x86 and x86_64 all > > have more sophisticated versions but they don't appear to me > > to have limitations on how early they might be called. CONFIG_PRINTK_TIME also has a problem on at least ARM OMAP where the IO mapping to read the clock may not be initialized when sched_clock() is called for the first time. I'd hate to have to test for something for CONFIG_PRINTK_TIME every time sched_clock() is being called. The quick fix would seem to be to only allow CONFIG_PRINTK_TIME from kernel cmdline to make it happen a bit later. So basically make int printk_time = 0 until command line is evaluated. Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: CONFIG_PRINTK_TIME woes @ 2005-08-23 14:07 Luck, Tony 0 siblings, 0 replies; 24+ messages in thread From: Luck, Tony @ 2005-08-23 14:07 UTC (permalink / raw) To: Tony Lindgren, Andrew Morton; +Cc: linux-kernel, jasonuhl >I'd hate to have to test for something for CONFIG_PRINTK_TIME >every time sched_clock() is being called. Me too. >The quick fix would seem to be to only allow CONFIG_PRINTK_TIME >from kernel cmdline to make it happen a bit later. So basically >make int printk_time = 0 until command line is evaluated. Good thought, but this won't work for ia64 in the hot-plug cpu case. There are a couple of printk() calls by new cpus as they boot before they have set-up their per-cpu areas. So there is no global state that can be checked to decide whether it is safe for printk() to call sched_clock(). -Tony ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2005-08-24 0:36 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-08-18 18:47 CONFIG_PRINTK_TIME woes Luck, Tony 2005-08-21 9:13 ` Andrew Morton 2005-08-21 9:16 ` Andrew Morton 2005-08-21 9:27 ` Nick Piggin 2005-08-21 9:32 ` Andrew Morton 2005-08-21 11:25 ` Nick Piggin 2005-08-21 18:01 ` Andrew Morton 2005-08-24 0:04 ` Tim Bird 2005-08-22 17:42 ` tony.luck 2005-08-22 17:50 ` Andrew Morton 2005-08-22 20:52 ` tony.luck 2005-08-22 20:20 ` David S. Miller 2005-08-22 20:33 ` Jason Uhlenkott 2005-08-22 20:42 ` David S. Miller 2005-08-22 21:15 ` Andrew Morton 2005-08-22 22:33 ` tony.luck 2005-08-22 22:38 ` Andrew Morton 2005-08-22 23:27 ` tony.luck 2005-08-23 1:52 ` Paul Mackerras 2005-08-23 15:01 ` Nick Piggin 2005-08-22 21:10 ` tony.luck 2005-08-24 0:36 ` Tim Bird 2005-08-23 7:18 ` Tony Lindgren -- strict thread matches above, loose matches on Subject: below -- 2005-08-23 14:07 Luck, Tony
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox