* sched_clock() uses are broken
@ 2006-05-02 13:29 Russell King
2006-05-02 14:21 ` Nick Piggin
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Russell King @ 2006-05-02 13:29 UTC (permalink / raw)
To: Linux Kernel List
Hi,
Someone just asked a question about sched_clock() on the ARM list, asking
whether there was any way to find the range of values returned by this
function.
My initial thought was "why would you want that", but on considering the
problem a bit further, I could see why the question arises. In looking
deeper, I discovered a rather sad state of affairs with sched_clock().
The kernel assumes that the following will work:
unsigned long long t0, t1, time_passed_ns;
t0 = sched_clock();
/* do something */
t1 = sched_clock();
time_passed_ns = t1 - t0;
This is all very well if sched_clock() returns values filling the
entire range of an unsigned long long - two complement maths will
naturally return the time difference in the case where t1 < t0.
However, this is not the case. On x86 with TSC, it returns a 54 bit
number. This means that when t1 < t0, time_passed_ns becomes a very
large number which no longer represents the amount of time.
All uses in kernel/sched.c seem to be aflicted by this problem.
There are several solutions to this - the most obvious being that we
need a function which returns the nanosecond difference between two
sched_clock() return values, and this function needs to know how to
handle the case where sched_clock() has wrapped.
IOW:
t0 = sched_clock();
/* do something */
t1 = sched_clock();
time_passed = sched_clock_diff(t1, t0);
Comments?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: sched_clock() uses are broken 2006-05-02 13:29 sched_clock() uses are broken Russell King @ 2006-05-02 14:21 ` Nick Piggin 2006-05-02 16:43 ` Andi Kleen 2006-05-04 20:02 ` Florian Paul Schmidt 2 siblings, 0 replies; 41+ messages in thread From: Nick Piggin @ 2006-05-02 14:21 UTC (permalink / raw) To: Russell King; +Cc: Linux Kernel List, john stultz Russell King wrote: > There are several solutions to this - the most obvious being that we > need a function which returns the nanosecond difference between two > sched_clock() return values, and this function needs to know how to > handle the case where sched_clock() has wrapped. > > IOW: > > t0 = sched_clock(); > /* do something */ > t1 = sched_clock(); > > time_passed = sched_clock_diff(t1, t0); > > Comments? > There is another problem John pointed out to me: sched_clock (at least on i386) can do tsc frequency scaling on the raw tsc value. I'm not sure if this is still a problem (I'm not aware that it has been fixed), however it would mean that between two sched_clock()s, the values returned can be basically completely arbitrary. What is needed is something like: t0 = get_cycles_unsynchronized(); t1 = get_cycles_unsynchronized(); ns = cycles_to_ns(t1, t0); Where unsynchronized means not synchronized between CPUs. This would still cause the `ns' value to be skewed if a frequency change occured between t0 and t1, however at least it should be within some realistic range (something like ns +/- ns * max freq / min freq). -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 13:29 sched_clock() uses are broken Russell King 2006-05-02 14:21 ` Nick Piggin @ 2006-05-02 16:43 ` Andi Kleen 2006-05-02 16:50 ` Russell King 2006-05-02 16:54 ` Christopher Friesen 2006-05-04 20:02 ` Florian Paul Schmidt 2 siblings, 2 replies; 41+ messages in thread From: Andi Kleen @ 2006-05-02 16:43 UTC (permalink / raw) To: Russell King; +Cc: linux-kernel Russell King <rmk+lkml@arm.linux.org.uk> writes: > > However, this is not the case. On x86 with TSC, it returns a 54 bit > number. This means that when t1 < t0, time_passed_ns becomes a very > large number which no longer represents the amount of time. Good point. For a 1Ghz system this would happen every ~0.57 years. The problem is there is AFAIK no non destructive[1] way to find out how many bits the TSC has Destructive would be to overwrite it with -1 and see how many stick. > All uses in kernel/sched.c seem to be aflicted by this problem. > > There are several solutions to this - the most obvious being that we > need a function which returns the nanosecond difference between two > sched_clock() return values, and this function needs to know how to > handle the case where sched_clock() has wrapped. Ok it can be done with a simple test. > > IOW: > > t0 = sched_clock(); > /* do something */ > t1 = sched_clock(); > > time_passed = sched_clock_diff(t1, t0); > > Comments? Agreed it's a problem, but probably a small one. At worst you'll get a small scheduling hickup every half year, which should be hardly that big an issue. Might chose to just ignore it with a big fat comment? -Andi ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 16:43 ` Andi Kleen @ 2006-05-02 16:50 ` Russell King 2006-05-02 17:01 ` Andi Kleen 2006-05-02 17:15 ` Nicolas Pitre 2006-05-02 16:54 ` Christopher Friesen 1 sibling, 2 replies; 41+ messages in thread From: Russell King @ 2006-05-02 16:50 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel On Tue, May 02, 2006 at 06:43:45PM +0200, Andi Kleen wrote: > Russell King <rmk+lkml@arm.linux.org.uk> writes: > > > > However, this is not the case. On x86 with TSC, it returns a 54 bit > > number. This means that when t1 < t0, time_passed_ns becomes a very > > large number which no longer represents the amount of time. > > Good point. For a 1Ghz system this would happen every ~0.57 years. > > The problem is there is AFAIK no non destructive[1] way to find out how > many bits the TSC has > > Destructive would be to overwrite it with -1 and see how many stick. > > > All uses in kernel/sched.c seem to be aflicted by this problem. > > > > There are several solutions to this - the most obvious being that we > > need a function which returns the nanosecond difference between two > > sched_clock() return values, and this function needs to know how to > > handle the case where sched_clock() has wrapped. > > Ok it can be done with a simple test. > > > > > IOW: > > > > t0 = sched_clock(); > > /* do something */ > > t1 = sched_clock(); > > > > time_passed = sched_clock_diff(t1, t0); > > > > Comments? > > Agreed it's a problem, but probably a small one. At worst you'll get > a small scheduling hickup every half year, which should be hardly > that big an issue. > > Might chose to just ignore it with a big fat comment? You're right assuming you have a 64-bit TSC, but ARM has at best a 32-bit cycle counter which rolls over about every 179 seconds - with gives a range of values from sched_clock from 0 to 178956970625 or 0x29AAAAAA81. That's rather more of a problem than having it happen every 208 days. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 16:50 ` Russell King @ 2006-05-02 17:01 ` Andi Kleen 2006-05-02 17:18 ` Nicolas Pitre 2006-05-02 17:15 ` Nicolas Pitre 1 sibling, 1 reply; 41+ messages in thread From: Andi Kleen @ 2006-05-02 17:01 UTC (permalink / raw) To: Russell King; +Cc: linux-kernel On Tuesday 02 May 2006 18:50, Russell King wrote: > You're right assuming you have a 64-bit TSC, but ARM has at best a > 32-bit cycle counter which rolls over about every 179 seconds - with > gives a range of values from sched_clock from 0 to 178956970625 or > 0x29AAAAAA81. > > That's rather more of a problem than having it happen every 208 days. Ok but you know it's always 32bit right? You can fix it up then with your proposal of a sched_diff() The problem would be fixing it up with a unknown number of bits. -Andi ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 17:01 ` Andi Kleen @ 2006-05-02 17:18 ` Nicolas Pitre 2006-05-02 18:55 ` Russell King 0 siblings, 1 reply; 41+ messages in thread From: Nicolas Pitre @ 2006-05-02 17:18 UTC (permalink / raw) To: Andi Kleen; +Cc: Russell King, linux-kernel On Tue, 2 May 2006, Andi Kleen wrote: > On Tuesday 02 May 2006 18:50, Russell King wrote: > > > You're right assuming you have a 64-bit TSC, but ARM has at best a > > 32-bit cycle counter which rolls over about every 179 seconds - with > > gives a range of values from sched_clock from 0 to 178956970625 or > > 0x29AAAAAA81. > > > > That's rather more of a problem than having it happen every 208 days. > > Ok but you know it's always 32bit right? You can fix it up then > with your proposal of a sched_diff() > > The problem would be fixing it up with a unknown number of bits. Just shift it left so you know you always have the most significant bits valid. The sched_diff() would take care of scaling it back to nanosecs. Nicolas ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 17:18 ` Nicolas Pitre @ 2006-05-02 18:55 ` Russell King 2006-05-02 19:05 ` Nicolas Pitre 0 siblings, 1 reply; 41+ messages in thread From: Russell King @ 2006-05-02 18:55 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Andi Kleen, linux-kernel On Tue, May 02, 2006 at 01:18:25PM -0400, Nicolas Pitre wrote: > On Tue, 2 May 2006, Andi Kleen wrote: > > > On Tuesday 02 May 2006 18:50, Russell King wrote: > > > > > You're right assuming you have a 64-bit TSC, but ARM has at best a > > > 32-bit cycle counter which rolls over about every 179 seconds - with > > > gives a range of values from sched_clock from 0 to 178956970625 or > > > 0x29AAAAAA81. > > > > > > That's rather more of a problem than having it happen every 208 days. > > > > Ok but you know it's always 32bit right? You can fix it up then > > with your proposal of a sched_diff() > > > > The problem would be fixing it up with a unknown number of bits. > > Just shift it left so you know you always have the most significant bits > valid. The sched_diff() would take care of scaling it back to nanosecs. sched_clock is currently defined to return nanoseconds so this isn't a possibility. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 18:55 ` Russell King @ 2006-05-02 19:05 ` Nicolas Pitre 2006-05-02 19:08 ` Russell King 0 siblings, 1 reply; 41+ messages in thread From: Nicolas Pitre @ 2006-05-02 19:05 UTC (permalink / raw) To: Russell King; +Cc: Andi Kleen, linux-kernel [-- Attachment #1: Type: TEXT/PLAIN, Size: 1225 bytes --] On Tue, 2 May 2006, Russell King wrote: > On Tue, May 02, 2006 at 01:18:25PM -0400, Nicolas Pitre wrote: > > On Tue, 2 May 2006, Andi Kleen wrote: > > > > > On Tuesday 02 May 2006 18:50, Russell King wrote: > > > > > > > You're right assuming you have a 64-bit TSC, but ARM has at best a > > > > 32-bit cycle counter which rolls over about every 179 seconds - with > > > > gives a range of values from sched_clock from 0 to 178956970625 or > > > > 0x29AAAAAA81. > > > > > > > > That's rather more of a problem than having it happen every 208 days. > > > > > > Ok but you know it's always 32bit right? You can fix it up then > > > with your proposal of a sched_diff() > > > > > > The problem would be fixing it up with a unknown number of bits. > > > > Just shift it left so you know you always have the most significant bits > > valid. The sched_diff() would take care of scaling it back to nanosecs. > > sched_clock is currently defined to return nanoseconds so this isn't > a possibility. If we're discussing the addition of a sched_clock_diff(), why whouldn't shed_clock() return anything it wants in that context? It could be redefined to have a return value meaningful only to shed_clock_diff()é Nicolas ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 19:05 ` Nicolas Pitre @ 2006-05-02 19:08 ` Russell King 2006-05-02 19:23 ` Nicolas Pitre 0 siblings, 1 reply; 41+ messages in thread From: Russell King @ 2006-05-02 19:08 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Andi Kleen, linux-kernel On Tue, May 02, 2006 at 03:05:22PM -0400, Nicolas Pitre wrote: > On Tue, 2 May 2006, Russell King wrote: > > On Tue, May 02, 2006 at 01:18:25PM -0400, Nicolas Pitre wrote: > > > On Tue, 2 May 2006, Andi Kleen wrote: > > > > > > > On Tuesday 02 May 2006 18:50, Russell King wrote: > > > > > > > > > You're right assuming you have a 64-bit TSC, but ARM has at best a > > > > > 32-bit cycle counter which rolls over about every 179 seconds - with > > > > > gives a range of values from sched_clock from 0 to 178956970625 or > > > > > 0x29AAAAAA81. > > > > > > > > > > That's rather more of a problem than having it happen every 208 days. > > > > > > > > Ok but you know it's always 32bit right? You can fix it up then > > > > with your proposal of a sched_diff() > > > > > > > > The problem would be fixing it up with a unknown number of bits. > > > > > > Just shift it left so you know you always have the most significant bits > > > valid. The sched_diff() would take care of scaling it back to nanosecs. > > > > sched_clock is currently defined to return nanoseconds so this isn't > > a possibility. > > If we're discussing the addition of a sched_clock_diff(), why whouldn't > shed_clock() return anything it wants in that context? It could be > redefined to have a return value meaningful only to shed_clock_diff()? If we're talking about doing that, we need to replace sched_clock() to ensure that we all users are aware that it has changed. I did think about that for my original fix proposal, but stepped back because that's a bigger change - and is something for post-2.6.17. The smallest fix (suitable for -rc kernels) is as I detailed. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 19:08 ` Russell King @ 2006-05-02 19:23 ` Nicolas Pitre 2006-05-02 21:35 ` Russell King 0 siblings, 1 reply; 41+ messages in thread From: Nicolas Pitre @ 2006-05-02 19:23 UTC (permalink / raw) To: Russell King; +Cc: Andi Kleen, linux-kernel On Tue, 2 May 2006, Russell King wrote: > On Tue, May 02, 2006 at 03:05:22PM -0400, Nicolas Pitre wrote: > > If we're discussing the addition of a sched_clock_diff(), why whouldn't > > shed_clock() return anything it wants in that context? It could be > > redefined to have a return value meaningful only to shed_clock_diff()? > > If we're talking about doing that, we need to replace sched_clock() > to ensure that we all users are aware that it has changed. > > I did think about that for my original fix proposal, but stepped back > because that's a bigger change - and is something for post-2.6.17. > The smallest fix (suitable for -rc kernels) is as I detailed. Oh agreed. Nicolas ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 19:23 ` Nicolas Pitre @ 2006-05-02 21:35 ` Russell King 0 siblings, 0 replies; 41+ messages in thread From: Russell King @ 2006-05-02 21:35 UTC (permalink / raw) To: Nicolas Pitre, Ingo Molnar; +Cc: Andi Kleen, linux-kernel On Tue, May 02, 2006 at 03:23:01PM -0400, Nicolas Pitre wrote: > On Tue, 2 May 2006, Russell King wrote: > > > On Tue, May 02, 2006 at 03:05:22PM -0400, Nicolas Pitre wrote: > > > If we're discussing the addition of a sched_clock_diff(), why whouldn't > > > shed_clock() return anything it wants in that context? It could be > > > redefined to have a return value meaningful only to shed_clock_diff()? > > > > If we're talking about doing that, we need to replace sched_clock() > > to ensure that we all users are aware that it has changed. > > > > I did think about that for my original fix proposal, but stepped back > > because that's a bigger change - and is something for post-2.6.17. > > The smallest fix (suitable for -rc kernels) is as I detailed. > > Oh agreed. (Added Ingo - don't know if you saw my original message.) Actually, I'm not entirely convinced that the smallest fix is going to be the best solution - it looks too complex. Note the code change in update_cpu_clock and sched_current_time - arguably both are buggy as they stand today when the values wrap. Comments anyone? diff --git a/include/linux/sched.h b/include/linux/sched.h --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -988,6 +988,11 @@ static inline int set_cpus_allowed(task_ extern unsigned long long sched_clock(void); extern unsigned long long current_sched_time(const task_t *current_task); +static inline unsigned long long sched_clock_diff(unsigned long long t1, unsigned long long t0) +{ + return t1 - t0; +} + /* sched_exec is called by processes performing an exec */ #ifdef CONFIG_SMP extern void sched_exec(void); diff --git a/kernel/sched.c b/kernel/sched.c --- a/kernel/sched.c +++ b/kernel/sched.c @@ -731,7 +731,7 @@ static inline void __activate_idle_task( static int recalc_task_prio(task_t *p, unsigned long long now) { /* Caller must always ensure 'now >= p->timestamp' */ - unsigned long long __sleep_time = now - p->timestamp; + unsigned long long __sleep_time = sched_clock_diff(now, p->timestamp); unsigned long sleep_time; if (batch_task(p)) @@ -807,7 +807,7 @@ static void activate_task(task_t *p, run if (!local) { /* Compensate for drifting sched_clock */ runqueue_t *this_rq = this_rq(); - now = (now - this_rq->timestamp_last_tick) + now = sched_clock_diff(now, this_rq->timestamp_last_tick) + rq->timestamp_last_tick; } #endif @@ -2512,8 +2512,10 @@ EXPORT_PER_CPU_SYMBOL(kstat); static inline void update_cpu_clock(task_t *p, runqueue_t *rq, unsigned long long now) { - unsigned long long last = max(p->timestamp, rq->timestamp_last_tick); - p->sched_time += now - last; + unsigned long long d1, d2; + d1 = sched_clock_diff(now, p->timestamp); + d2 = sched_clock_diff(now, rq->timestamp_last_tick); + p->sched_time += min(d1, d2); } /* @@ -2522,11 +2524,13 @@ static inline void update_cpu_clock(task */ unsigned long long current_sched_time(const task_t *tsk) { - unsigned long long ns; + unsigned long long now, d1, d2, ns; unsigned long flags; local_irq_save(flags); - ns = max(tsk->timestamp, task_rq(tsk)->timestamp_last_tick); - ns = tsk->sched_time + (sched_clock() - ns); + now = sched_clock(); + d1 = sched_clock_diff(now, tsk->timestamp); + d2 = sched_clock_diff(now, task_rq(tsk)->timestamp_last_tick); + ns = tsk->sched_time + min(d1, d2); local_irq_restore(flags); return ns; } @@ -2925,7 +2929,7 @@ asmlinkage void __sched schedule(void) runqueue_t *rq; prio_array_t *array; struct list_head *queue; - unsigned long long now; + unsigned long long now, sleep; unsigned long run_time; int cpu, idx, new_prio; @@ -2960,11 +2964,10 @@ need_resched_nonpreemptible: schedstat_inc(rq, sched_cnt); now = sched_clock(); - if (likely((long long)(now - prev->timestamp) < NS_MAX_SLEEP_AVG)) { - run_time = now - prev->timestamp; - if (unlikely((long long)(now - prev->timestamp) < 0)) - run_time = 0; - } else + sleep = sched_clock_diff(now, prev->timestamp); + if (likely(sleep < NS_MAX_SLEEP_AVG)) + run_time = sleep; + else run_time = NS_MAX_SLEEP_AVG; /* @@ -3039,8 +3042,8 @@ go_idle: next = list_entry(queue->next, task_t, run_list); if (!rt_task(next) && interactive_sleep(next->sleep_type)) { - unsigned long long delta = now - next->timestamp; - if (unlikely((long long)(now - next->timestamp) < 0)) + unsigned long long delta = sched_clock_diff(now, next->timestamp); + if (unlikely((long long)delta < 0)) delta = 0; if (next->sleep_type == SLEEP_INTERACTIVE) -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 16:50 ` Russell King 2006-05-02 17:01 ` Andi Kleen @ 2006-05-02 17:15 ` Nicolas Pitre 2006-05-04 3:50 ` George Anzinger 1 sibling, 1 reply; 41+ messages in thread From: Nicolas Pitre @ 2006-05-02 17:15 UTC (permalink / raw) To: Russell King; +Cc: Andi Kleen, lkml On Tue, 2 May 2006, Russell King wrote: > On Tue, May 02, 2006 at 06:43:45PM +0200, Andi Kleen wrote: > > Russell King <rmk+lkml@arm.linux.org.uk> writes: > > > > > > However, this is not the case. On x86 with TSC, it returns a 54 bit > > > number. This means that when t1 < t0, time_passed_ns becomes a very > > > large number which no longer represents the amount of time. > > > > Good point. For a 1Ghz system this would happen every ~0.57 years. > > > > The problem is there is AFAIK no non destructive[1] way to find out how > > many bits the TSC has > > > > Destructive would be to overwrite it with -1 and see how many stick. > > > > > All uses in kernel/sched.c seem to be aflicted by this problem. > > > > > > There are several solutions to this - the most obvious being that we > > > need a function which returns the nanosecond difference between two > > > sched_clock() return values, and this function needs to know how to > > > handle the case where sched_clock() has wrapped. > > > > Ok it can be done with a simple test. Better yet the sched_clock() implementation just needs to return a value shifted left so the wrap around always happens on 64 bits and the difference between two consecutive samples is always right. > > > > > > IOW: > > > > > > t0 = sched_clock(); > > > /* do something */ > > > t1 = sched_clock(); > > > > > > time_passed = sched_clock_diff(t1, t0); > > > > > > Comments? > > > > Agreed it's a problem, but probably a small one. At worst you'll get > > a small scheduling hickup every half year, which should be hardly > > that big an issue. ... on x86 that is. > > Might chose to just ignore it with a big fat comment? > > You're right assuming you have a 64-bit TSC, but ARM has at best a > 32-bit cycle counter which rolls over about every 179 seconds - with > gives a range of values from sched_clock from 0 to 178956970625 or > 0x29AAAAAA81. > > That's rather more of a problem than having it happen every 208 days. Yet that counter isn't necessarily nanosecond based. So rescaling the returned value to nanosecs requires expensive divisions which could be done only once within sched_clock_diff() instead of twice as often in each sched_clock() calls. Nicolas ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 17:15 ` Nicolas Pitre @ 2006-05-04 3:50 ` George Anzinger 2006-05-04 14:18 ` Nicolas Pitre 0 siblings, 1 reply; 41+ messages in thread From: George Anzinger @ 2006-05-04 3:50 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Russell King, Andi Kleen, lkml Nicolas Pitre wrote: > On Tue, 2 May 2006, Russell King wrote: > > >>On Tue, May 02, 2006 at 06:43:45PM +0200, Andi Kleen wrote: >> >>>Russell King <rmk+lkml@arm.linux.org.uk> writes: >>> >>>>However, this is not the case. On x86 with TSC, it returns a 54 bit >>>>number. This means that when t1 < t0, time_passed_ns becomes a very >>>>large number which no longer represents the amount of time. >>> >>>Good point. For a 1Ghz system this would happen every ~0.57 years. >>> >>>The problem is there is AFAIK no non destructive[1] way to find out how >>>many bits the TSC has >>> >>>Destructive would be to overwrite it with -1 and see how many stick. >>> >>> >>>>All uses in kernel/sched.c seem to be aflicted by this problem. >>>> >>>>There are several solutions to this - the most obvious being that we >>>>need a function which returns the nanosecond difference between two >>>>sched_clock() return values, and this function needs to know how to >>>>handle the case where sched_clock() has wrapped. >>> >>>Ok it can be done with a simple test. > > > Better yet the sched_clock() implementation just needs to return a value > shifted left so the wrap around always happens on 64 bits and the > difference between two consecutive samples is always right. > > >>>>IOW: >>>> >>>> t0 = sched_clock(); >>>> /* do something */ >>>> t1 = sched_clock(); >>>> >>>> time_passed = sched_clock_diff(t1, t0); >>>> >>>>Comments? >>> >>>Agreed it's a problem, but probably a small one. At worst you'll get >>>a small scheduling hickup every half year, which should be hardly >>>that big an issue. > > > ... on x86 that is. > > >>>Might chose to just ignore it with a big fat comment? >> >>You're right assuming you have a 64-bit TSC, but ARM has at best a >>32-bit cycle counter which rolls over about every 179 seconds - with >>gives a range of values from sched_clock from 0 to 178956970625 or >>0x29AAAAAA81. >> >>That's rather more of a problem than having it happen every 208 days. > > > Yet that counter isn't necessarily nanosecond based. So rescaling the > returned value to nanosecs requires expensive divisions which could be > done only once within sched_clock_diff() instead of twice as often in > each sched_clock() calls. Oh phooey!! Scaling can be done with a mpy and a shift. See the new clock code where the TSC (or what ever) is scaled to ns. -- George Anzinger george@wildturkeyranch.net HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-04 3:50 ` George Anzinger @ 2006-05-04 14:18 ` Nicolas Pitre 0 siblings, 0 replies; 41+ messages in thread From: Nicolas Pitre @ 2006-05-04 14:18 UTC (permalink / raw) To: George Anzinger; +Cc: Russell King, Andi Kleen, lkml On Wed, 3 May 2006, George Anzinger wrote: > Nicolas Pitre wrote: > > Yet that counter isn't necessarily nanosecond based. So rescaling the > > returned value to nanosecs requires expensive divisions which could be done > > only once within sched_clock_diff() instead of twice as often in each > > sched_clock() calls. > > Oh phooey!! Scaling can be done with a mpy and a shift. See the new clock > code where the TSC (or what ever) is scaled to ns. I know. And if you want to preserve more than 32 bits of precision you need 4 mpy_and_add insns with a shift on ARM at least. But the point remains that it is more efficient to do it once rather than twice or more. Nicolas ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 16:43 ` Andi Kleen 2006-05-02 16:50 ` Russell King @ 2006-05-02 16:54 ` Christopher Friesen 2006-05-02 16:59 ` Andi Kleen 1 sibling, 1 reply; 41+ messages in thread From: Christopher Friesen @ 2006-05-02 16:54 UTC (permalink / raw) To: Andi Kleen; +Cc: Russell King, linux-kernel Andi Kleen wrote: > Agreed it's a problem, but probably a small one. At worst you'll get > a small scheduling hickup every half year, which should be hardly > that big an issue. Presumably this would be bad for soft-realtime embedded things. Set-top boxes, etc. Chris ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 16:54 ` Christopher Friesen @ 2006-05-02 16:59 ` Andi Kleen 2006-05-02 17:07 ` Nick Piggin 0 siblings, 1 reply; 41+ messages in thread From: Andi Kleen @ 2006-05-02 16:59 UTC (permalink / raw) To: Christopher Friesen; +Cc: Russell King, linux-kernel On Tuesday 02 May 2006 18:54, Christopher Friesen wrote: > Andi Kleen wrote: > > > Agreed it's a problem, but probably a small one. At worst you'll get > > a small scheduling hickup every half year, which should be hardly > > that big an issue. > > Presumably this would be bad for soft-realtime embedded things. Set-top > boxes, etc. SCHED_RR/FIFO are not affected. AFAIK it's only used by the interactivity detector in the normal scheduler. Worst case that happens is that a process is not detected to be interactive when it should once, which gives it only a small penalty. On the next time slice everything will be ok again. -Andi ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 16:59 ` Andi Kleen @ 2006-05-02 17:07 ` Nick Piggin 2006-05-03 7:09 ` Mike Galbraith 0 siblings, 1 reply; 41+ messages in thread From: Nick Piggin @ 2006-05-02 17:07 UTC (permalink / raw) To: Andi Kleen; +Cc: Christopher Friesen, Russell King, linux-kernel Andi Kleen wrote: > On Tuesday 02 May 2006 18:54, Christopher Friesen wrote: > >>Andi Kleen wrote: >> >> >>>Agreed it's a problem, but probably a small one. At worst you'll get >>>a small scheduling hickup every half year, which should be hardly >>>that big an issue. >> >>Presumably this would be bad for soft-realtime embedded things. Set-top >>boxes, etc. > > > SCHED_RR/FIFO are not affected. AFAIK it's only used by the interactivity > detector in the normal scheduler. Worst case that happens is that a > process is not detected to be interactive when it should once, which > gives it only a small penalty. On the next time slice everything will be ok again. Other problem is that some people didn't RTFM and have started trying to use it for precise accounting :( -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 17:07 ` Nick Piggin @ 2006-05-03 7:09 ` Mike Galbraith 2006-05-03 7:40 ` Andi Kleen 2006-05-07 12:33 ` Nick Piggin 0 siblings, 2 replies; 41+ messages in thread From: Mike Galbraith @ 2006-05-03 7:09 UTC (permalink / raw) To: Nick Piggin; +Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel On Wed, 2006-05-03 at 03:07 +1000, Nick Piggin wrote: > Other problem is that some people didn't RTFM and have started trying to > use it for precise accounting :( Are you talking about me perchance? I don't really care about precision _that_ much, though I certainly do want to tighten timeslice accounting. Given that most people are going to end up using the pm_timer anyway, I don't see the point of even having a sched_clock(). If it's jiffy resolution, it's useless. If it's wildly inaccurate (as it is in the SMP case, monotonicity issues aside) it's more than useless. -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-03 7:09 ` Mike Galbraith @ 2006-05-03 7:40 ` Andi Kleen 2006-05-03 9:11 ` Mike Galbraith 2006-05-07 12:33 ` Nick Piggin 1 sibling, 1 reply; 41+ messages in thread From: Andi Kleen @ 2006-05-03 7:40 UTC (permalink / raw) To: Mike Galbraith Cc: Nick Piggin, Christopher Friesen, Russell King, linux-kernel On Wednesday 03 May 2006 09:09, Mike Galbraith wrote: > Given that most people are going to end up using the pm_timer anyway, I > don't see the point of even having a sched_clock(). If it's jiffy > resolution, it's useless. If it's wildly inaccurate (as it is in the > SMP case, monotonicity issues aside) it's more than useless. For sched_clock TSC is always used and it's fine - sched_clock doesn't require the guarantees that make TSC often useless otherwise e.g. the scheduler is designed to only use it on one CPU and can also tolerate variations scaling with frequency. -Andi ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-03 7:40 ` Andi Kleen @ 2006-05-03 9:11 ` Mike Galbraith 2006-05-03 9:16 ` Andi Kleen 0 siblings, 1 reply; 41+ messages in thread From: Mike Galbraith @ 2006-05-03 9:11 UTC (permalink / raw) To: Andi Kleen; +Cc: Nick Piggin, Christopher Friesen, Russell King, linux-kernel On Wed, 2006-05-03 at 09:40 +0200, Andi Kleen wrote: > On Wednesday 03 May 2006 09:09, Mike Galbraith wrote: > > > Given that most people are going to end up using the pm_timer anyway, I > > don't see the point of even having a sched_clock(). If it's jiffy > > resolution, it's useless. If it's wildly inaccurate (as it is in the > > SMP case, monotonicity issues aside) it's more than useless. > > For sched_clock TSC is always used and it's fine - sched_clock > doesn't require the guarantees that make TSC often useless otherwise Regrettable, that's not true. no command line now: 4294742814000000 X:6906->timestamp: 4294742813000000 now: 4294743815000000 konqueror:7409->timestamp: 4294743814000000 now: 4294744816000000 kicker:7352->timestamp: 4294744815000000 now: 4294745817000000 konsole:7363->timestamp: 4294745815000000 now: 4294746818000000 konqueror:7409->timestamp: 4294746817000000 now: 4294747819000000 kmix:7388->timestamp: 4294747818000000 now: 4294748820000000 kmix:7388->timestamp: 4294748818000000 now: 4294749821000000 konsole:7363->timestamp: 4294749820000000 command line clock=tsc now: 124079605551 gconfd-2:7372->timestamp: 124079563934 now: 125079899394 swapper:0->timestamp: 125077929715 now: 126080194639 swapper:0->timestamp: 126077228724 now: 127080489088 swapper:0->timestamp: 127077510347 now: 128080784525 swapper:0->timestamp: 128080615408 now: 129081079685 swapper:0->timestamp: 129080104338 now: 130081375553 evolution:7440->timestamp: 130080376731 If the (expletive deleted) pm timer is enabled in your .config (it is enabled by default and shall forever remain enabled for all who don't know the incantation to make pm timer option appear in acpi menu) it is used unless specifically overridden, which defeats the original purpose for ever having a high resolution sched_clock(). -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-03 9:11 ` Mike Galbraith @ 2006-05-03 9:16 ` Andi Kleen 2006-05-03 9:31 ` Mike Galbraith 0 siblings, 1 reply; 41+ messages in thread From: Andi Kleen @ 2006-05-03 9:16 UTC (permalink / raw) To: Mike Galbraith Cc: Nick Piggin, Christopher Friesen, Russell King, linux-kernel On Wednesday 03 May 2006 11:11, Mike Galbraith wrote: > On Wed, 2006-05-03 at 09:40 +0200, Andi Kleen wrote: > > On Wednesday 03 May 2006 09:09, Mike Galbraith wrote: > > > > > Given that most people are going to end up using the pm_timer anyway, I > > > don't see the point of even having a sched_clock(). If it's jiffy > > > resolution, it's useless. If it's wildly inaccurate (as it is in the > > > SMP case, monotonicity issues aside) it's more than useless. > > > > For sched_clock TSC is always used and it's fine - sched_clock > > doesn't require the guarantees that make TSC often useless otherwise > > Regrettable, that's not true. Hmm, maybe I'm thinking too much x86-64. At least on x86-64 it's true. I don't see a big reason to not do this on i386 either, except on systems that truly don't have a TSC (386/486) Ok i suppose if you don't want cruft you can always go to 64bit @) -Andi ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-03 9:16 ` Andi Kleen @ 2006-05-03 9:31 ` Mike Galbraith 0 siblings, 0 replies; 41+ messages in thread From: Mike Galbraith @ 2006-05-03 9:31 UTC (permalink / raw) To: Andi Kleen; +Cc: Nick Piggin, Christopher Friesen, Russell King, linux-kernel On Wed, 2006-05-03 at 11:16 +0200, Andi Kleen wrote: > On Wednesday 03 May 2006 11:11, Mike Galbraith wrote: > > On Wed, 2006-05-03 at 09:40 +0200, Andi Kleen wrote: > > > On Wednesday 03 May 2006 09:09, Mike Galbraith wrote: > > > > > > > Given that most people are going to end up using the pm_timer anyway, I > > > > don't see the point of even having a sched_clock(). If it's jiffy > > > > resolution, it's useless. If it's wildly inaccurate (as it is in the > > > > SMP case, monotonicity issues aside) it's more than useless. > > > > > > For sched_clock TSC is always used and it's fine - sched_clock > > > doesn't require the guarantees that make TSC often useless otherwise > > > > Regrettable, that's not true. > > Hmm, maybe I'm thinking too much x86-64. At least on x86-64 it's true. > > I don't see a big reason to not do this on i386 either, except > on systems that truly don't have a TSC (386/486) It should be this way on any system that has a half way functional high resolution source. Without it, the starvation scenario which sched_clock() was invented to solve returns. Making that the default wasn't (um um um) the most brilliant selection among available options. > Ok i suppose if you don't want cruft you can always go to 64bit @) Unemployed guys can't buy new toys without wives getting all grumpy ;-) -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-03 7:09 ` Mike Galbraith 2006-05-03 7:40 ` Andi Kleen @ 2006-05-07 12:33 ` Nick Piggin 2006-05-07 12:43 ` Russell King 2006-05-07 17:32 ` Mike Galbraith 1 sibling, 2 replies; 41+ messages in thread From: Nick Piggin @ 2006-05-07 12:33 UTC (permalink / raw) To: Mike Galbraith Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel Mike Galbraith wrote: > On Wed, 2006-05-03 at 03:07 +1000, Nick Piggin wrote: > >>Other problem is that some people didn't RTFM and have started trying to >>use it for precise accounting :( > > > Are you talking about me perchance? I don't really care about precision > _that_ much, though I certainly do want to tighten timeslice accounting. No, sched_clock is fine to be used in CPU scheduling choices, which are heuristic anyway (although strictly speaking, even using it for timeslicing within a single CPU could cause slight unfairness). I'm talking about the update_cpu_clock() / task_struct->sched_time stuff. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 12:33 ` Nick Piggin @ 2006-05-07 12:43 ` Russell King 2006-05-07 12:56 ` Nick Piggin 2006-05-07 17:37 ` Mike Galbraith 2006-05-07 17:32 ` Mike Galbraith 1 sibling, 2 replies; 41+ messages in thread From: Russell King @ 2006-05-07 12:43 UTC (permalink / raw) To: Nick Piggin; +Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel On Sun, May 07, 2006 at 10:33:41PM +1000, Nick Piggin wrote: > Mike Galbraith wrote: > >On Wed, 2006-05-03 at 03:07 +1000, Nick Piggin wrote: > > > >>Other problem is that some people didn't RTFM and have started trying to > >>use it for precise accounting :( > > > > > >Are you talking about me perchance? I don't really care about precision > >_that_ much, though I certainly do want to tighten timeslice accounting. > > No, sched_clock is fine to be used in CPU scheduling choices, which are > heuristic anyway (although strictly speaking, even using it for timeslicing > within a single CPU could cause slight unfairness). Except maybe if it rolls over every 178 seconds, which is my original point. Maybe someone could comment on my initial patch sent 5 days ago? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 12:43 ` Russell King @ 2006-05-07 12:56 ` Nick Piggin 2006-05-07 13:00 ` Nick Piggin 2006-05-07 17:37 ` Mike Galbraith 1 sibling, 1 reply; 41+ messages in thread From: Nick Piggin @ 2006-05-07 12:56 UTC (permalink / raw) To: Russell King Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel Russell King wrote: > On Sun, May 07, 2006 at 10:33:41PM +1000, Nick Piggin wrote: > >>No, sched_clock is fine to be used in CPU scheduling choices, which are >>heuristic anyway (although strictly speaking, even using it for timeslicing >>within a single CPU could cause slight unfairness). > > > Except maybe if it rolls over every 178 seconds, which is my original > point. Maybe someone could comment on my initial patch sent 5 days > ago? Well yes that's true. I meant the "sched_clock interface as defined". Now there are obviously issues (including the one you raised) that makes the sched_clock interface unreasonable to implement. I stand by my first reply to your comment WRT the API. Seems like most of the rest of the debate was unrelated or concerning implementation details. kernel/sched.c patches implementing the new API would get an ack from me. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 12:56 ` Nick Piggin @ 2006-05-07 13:00 ` Nick Piggin 2006-05-07 13:18 ` Russell King 0 siblings, 1 reply; 41+ messages in thread From: Nick Piggin @ 2006-05-07 13:00 UTC (permalink / raw) To: Russell King Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel Nick Piggin wrote: > I stand by my first reply to your comment WRT the API. Actually, on rereading, it seems like I was a bit confused about your proposal. I don't think you specified anyway the units returned by your new sched_clock(). So it is identical to my "corrected" interface :\ -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 13:00 ` Nick Piggin @ 2006-05-07 13:18 ` Russell King 2006-05-07 13:30 ` Nick Piggin 0 siblings, 1 reply; 41+ messages in thread From: Russell King @ 2006-05-07 13:18 UTC (permalink / raw) To: Nick Piggin; +Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel On Sun, May 07, 2006 at 11:00:29PM +1000, Nick Piggin wrote: > Nick Piggin wrote: > > >I stand by my first reply to your comment WRT the API. > > Actually, on rereading, it seems like I was a bit confused about > your proposal. I don't think you specified anyway the units > returned by your new sched_clock(). So it is identical to my > "corrected" interface :\ Okay, so that presumably means we have to either stick with what we currently have, or go the whole hog and re-implement the sched_clock() support? IOW, my patch on 2nd May isn't of any use as it currently stands? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 13:18 ` Russell King @ 2006-05-07 13:30 ` Nick Piggin 2006-05-07 13:55 ` Russell King 2006-05-07 16:53 ` Russell King 0 siblings, 2 replies; 41+ messages in thread From: Nick Piggin @ 2006-05-07 13:30 UTC (permalink / raw) To: Russell King Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel Russell King wrote: > On Sun, May 07, 2006 at 11:00:29PM +1000, Nick Piggin wrote: > >>Nick Piggin wrote: >> >> >>>I stand by my first reply to your comment WRT the API. >> >>Actually, on rereading, it seems like I was a bit confused about >>your proposal. I don't think you specified anyway the units >>returned by your new sched_clock(). So it is identical to my >>"corrected" interface :\ > > > Okay, so that presumably means we have to either stick with what we > currently have, or go the whole hog and re-implement the sched_clock() > support? > > IOW, my patch on 2nd May isn't of any use as it currently stands? IMO it would probably be best to try to re implement it in one go. It shouldn't have spread too far out of kernel/sched.c, and the arch code should mostly be implementable in terms of their sched_clock(). Mundane but not difficult. Making arch code actually try to do the right thing may require a bit more thinking, to handle both the variable time counter issue and your time counter wrap problem. That wouldn't be your problem though, outside arch/arm/ -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 13:30 ` Nick Piggin @ 2006-05-07 13:55 ` Russell King 2006-05-07 14:04 ` Nick Piggin 2006-05-07 16:53 ` Russell King 1 sibling, 1 reply; 41+ messages in thread From: Russell King @ 2006-05-07 13:55 UTC (permalink / raw) To: Nick Piggin; +Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel On Sun, May 07, 2006 at 11:30:15PM +1000, Nick Piggin wrote: > Russell King wrote: > >On Sun, May 07, 2006 at 11:00:29PM +1000, Nick Piggin wrote: > > > >>Nick Piggin wrote: > >> > >> > >>>I stand by my first reply to your comment WRT the API. > >> > >>Actually, on rereading, it seems like I was a bit confused about > >>your proposal. I don't think you specified anyway the units > >>returned by your new sched_clock(). So it is identical to my > >>"corrected" interface :\ > > > > > >Okay, so that presumably means we have to either stick with what we > >currently have, or go the whole hog and re-implement the sched_clock() > >support? > > > >IOW, my patch on 2nd May isn't of any use as it currently stands? > > IMO it would probably be best to try to re implement it in one go. > It shouldn't have spread too far out of kernel/sched.c, and the arch > code should mostly be implementable in terms of their sched_clock(). > Mundane but not difficult. > > Making arch code actually try to do the right thing may require a > bit more thinking, to handle both the variable time counter issue > and your time counter wrap problem. That wouldn't be your problem > though, outside arch/arm/ Also, any comments on update_cpu_clock() and current_sched_time() both appearing to be buggy, or am I barking up the wrong tree with those? -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 13:55 ` Russell King @ 2006-05-07 14:04 ` Nick Piggin 2006-05-07 16:03 ` Andi Kleen 0 siblings, 1 reply; 41+ messages in thread From: Nick Piggin @ 2006-05-07 14:04 UTC (permalink / raw) To: Russell King Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel Russell King wrote: > Also, any comments on update_cpu_clock() and current_sched_time() both > appearing to be buggy, or am I barking up the wrong tree with those? Can't remember off the top of my head who put those in, sorry. Aside from the fact that they appear to be fundamentally buggy anyway because we don't require exact ns intervals from sched_clock() at the best of times, I think your wrapping fix for them looks correct. Thanks, Nick -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 14:04 ` Nick Piggin @ 2006-05-07 16:03 ` Andi Kleen 0 siblings, 0 replies; 41+ messages in thread From: Andi Kleen @ 2006-05-07 16:03 UTC (permalink / raw) To: Nick Piggin Cc: Russell King, Mike Galbraith, Christopher Friesen, linux-kernel On Sunday 07 May 2006 16:04, Nick Piggin wrote: > Russell King wrote: > > > Also, any comments on update_cpu_clock() and current_sched_time() both > > appearing to be buggy, or am I barking up the wrong tree with those? > > Can't remember off the top of my head who put those in, sorry. > > Aside from the fact that they appear to be fundamentally buggy anyway > because we don't require exact ns intervals from sched_clock() at the > best of times, I think your wrapping fix for them looks correct. I must say I always hated them because adding code to the context switch for such an obscure POSIX bu^w"feature" is just a bad idea. Maybe it would be best to remove it again instead of adding the scaling needed to make it actually work (since we got along with such a broken implementation for so long I would guess that nobody uses these timers anyways) -Andi ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 13:30 ` Nick Piggin 2006-05-07 13:55 ` Russell King @ 2006-05-07 16:53 ` Russell King 2006-05-07 17:52 ` Mike Galbraith 1 sibling, 1 reply; 41+ messages in thread From: Russell King @ 2006-05-07 16:53 UTC (permalink / raw) To: Nick Piggin; +Cc: Mike Galbraith, Andi Kleen, Christopher Friesen, linux-kernel On Sun, May 07, 2006 at 11:30:15PM +1000, Nick Piggin wrote: > Russell King wrote: > >On Sun, May 07, 2006 at 11:00:29PM +1000, Nick Piggin wrote: > > > >>Nick Piggin wrote: > >> > >> > >>>I stand by my first reply to your comment WRT the API. > >> > >>Actually, on rereading, it seems like I was a bit confused about > >>your proposal. I don't think you specified anyway the units > >>returned by your new sched_clock(). So it is identical to my > >>"corrected" interface :\ > > > > > >Okay, so that presumably means we have to either stick with what we > >currently have, or go the whole hog and re-implement the sched_clock() > >support? > > > >IOW, my patch on 2nd May isn't of any use as it currently stands? > > IMO it would probably be best to try to re implement it in one go. > It shouldn't have spread too far out of kernel/sched.c, and the arch > code should mostly be implementable in terms of their sched_clock(). > Mundane but not difficult. Having looked at this several times over the last couple of days, I've come to the conclusion that I'm not the right person to fix this problem. I've tried several methods of converting the code, but every time I remain unconvinced that the changes are provably correct as far as not missing something, so I end up throwing the changes away and starting again. Yes, I admit defeat. Maybe someone who cares about this stuff[1] (or who sees the problem) should look into it. [1] - eg, the original poster on linux-arm-kernel who indirectly pointed out the sched_clock() issue. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 16:53 ` Russell King @ 2006-05-07 17:52 ` Mike Galbraith 0 siblings, 0 replies; 41+ messages in thread From: Mike Galbraith @ 2006-05-07 17:52 UTC (permalink / raw) To: Russell King; +Cc: Nick Piggin, Andi Kleen, Christopher Friesen, linux-kernel On Sun, 2006-05-07 at 17:53 +0100, Russell King wrote: > On Sun, May 07, 2006 at 11:30:15PM +1000, Nick Piggin wrote: > > Russell King wrote: > > >On Sun, May 07, 2006 at 11:00:29PM +1000, Nick Piggin wrote: > > > > > >>Nick Piggin wrote: > > >> > > >> > > >>>I stand by my first reply to your comment WRT the API. > > >> > > >>Actually, on rereading, it seems like I was a bit confused about > > >>your proposal. I don't think you specified anyway the units > > >>returned by your new sched_clock(). So it is identical to my > > >>"corrected" interface :\ > > > > > > > > >Okay, so that presumably means we have to either stick with what we > > >currently have, or go the whole hog and re-implement the sched_clock() > > >support? > > > > > >IOW, my patch on 2nd May isn't of any use as it currently stands? > > > > IMO it would probably be best to try to re implement it in one go. > > It shouldn't have spread too far out of kernel/sched.c, and the arch > > code should mostly be implementable in terms of their sched_clock(). > > Mundane but not difficult. > > Having looked at this several times over the last couple of days, I've > come to the conclusion that I'm not the right person to fix this problem. > I've tried several methods of converting the code, but every time I > remain unconvinced that the changes are provably correct as far as not > missing something, so I end up throwing the changes away and starting > again. > > Yes, I admit defeat. Ah, you feel my pain. I look forward to the continuous nanosecond clock that the time guys are currently talking about, and will hopefully _not_ decide is not needed. -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 12:43 ` Russell King 2006-05-07 12:56 ` Nick Piggin @ 2006-05-07 17:37 ` Mike Galbraith 1 sibling, 0 replies; 41+ messages in thread From: Mike Galbraith @ 2006-05-07 17:37 UTC (permalink / raw) To: Russell King; +Cc: Nick Piggin, Andi Kleen, Christopher Friesen, linux-kernel On Sun, 2006-05-07 at 13:43 +0100, Russell King wrote: > On Sun, May 07, 2006 at 10:33:41PM +1000, Nick Piggin wrote: > > Mike Galbraith wrote: > > >On Wed, 2006-05-03 at 03:07 +1000, Nick Piggin wrote: > > > > > >>Other problem is that some people didn't RTFM and have started trying to > > >>use it for precise accounting :( > > > > > > > > >Are you talking about me perchance? I don't really care about precision > > >_that_ much, though I certainly do want to tighten timeslice accounting. > > > > No, sched_clock is fine to be used in CPU scheduling choices, which are > > heuristic anyway (although strictly speaking, even using it for timeslicing > > within a single CPU could cause slight unfairness). > > Except maybe if it rolls over every 178 seconds, which is my original > point. Maybe someone could comment on my initial patch sent 5 days > ago? Simply ignore the wrap... unless you have a scenario where the wrap event itself is significant event. -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 12:33 ` Nick Piggin 2006-05-07 12:43 ` Russell King @ 2006-05-07 17:32 ` Mike Galbraith 2006-05-08 4:14 ` Mike Galbraith 1 sibling, 1 reply; 41+ messages in thread From: Mike Galbraith @ 2006-05-07 17:32 UTC (permalink / raw) To: Nick Piggin; +Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel On Sun, 2006-05-07 at 22:33 +1000, Nick Piggin wrote: > Mike Galbraith wrote: > > On Wed, 2006-05-03 at 03:07 +1000, Nick Piggin wrote: > > > >>Other problem is that some people didn't RTFM and have started trying to > >>use it for precise accounting :( > > > > > > Are you talking about me perchance? I don't really care about precision > > _that_ much, though I certainly do want to tighten timeslice accounting. > > No, sched_clock is fine to be used in CPU scheduling choices, which are > heuristic anyway (although strictly speaking, even using it for timeslicing > within a single CPU could cause slight unfairness). > > I'm talking about the update_cpu_clock() / task_struct->sched_time stuff. Oh. I kinda sorta agree. If the continuous nanosecond thing happens, it'll make things _much_ easier. I actually have exactly one testcase that actually requires nanoseconds as things stand, and that one I've worked around in the past. The rest of my desire to increase accuracy stems from instrumenting timeslice enforcement. Statistical at best, and truly sad at worst. -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-07 17:32 ` Mike Galbraith @ 2006-05-08 4:14 ` Mike Galbraith 2006-05-08 4:37 ` Mike Galbraith 0 siblings, 1 reply; 41+ messages in thread From: Mike Galbraith @ 2006-05-08 4:14 UTC (permalink / raw) To: Nick Piggin; +Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel On Sun, 2006-05-07 at 19:32 +0200, Mike Galbraith wrote: > On Sun, 2006-05-07 at 22:33 +1000, Nick Piggin wrote: > > Mike Galbraith wrote: > > > On Wed, 2006-05-03 at 03:07 +1000, Nick Piggin wrote: > > > > > >>Other problem is that some people didn't RTFM and have started trying to > > >>use it for precise accounting :( > > > > > > > > > Are you talking about me perchance? I don't really care about precision > > > _that_ much, though I certainly do want to tighten timeslice accounting. > > > > No, sched_clock is fine to be used in CPU scheduling choices, which are > > heuristic anyway (although strictly speaking, even using it for timeslicing > > within a single CPU could cause slight unfairness). > > > > I'm talking about the update_cpu_clock() / task_struct->sched_time stuff. > > Oh. I kinda sorta agree. Here's part of the reason why. The below is from update_cpu_clock() when now <= last. I print now, last_tick, timestamp, and a total timewarp count. This is a UP kernel, CPU is 3GHz P4 running flat out, no clockmod (though it is compiled in). now: 28781601231 tick: 28781610060 stamp: 28781484480 total: 111 now: 39653939737 tick: 39653953110 stamp: 39652958475 total: 112 now: 48382789863 tick: 48382802652 stamp: 48382788437 total: 113 now: 64846190533 tick: 64846203307 stamp: 64846185356 total: 114 now: 87418287278 tick: 87418298913 stamp: 87417324842 total: 115 now: 88679398808 tick: 88679409618 stamp: 88679395865 total: 116 now: 97397256650 tick: 97397270613 stamp: 97397254208 total: 117 now: 108467457132 tick: 108467470627 stamp: 108467410356 total: 118 now: 217186858841 tick: 217186868844 stamp: 217186858049 total: 119 now: 267909122394 tick: 267909132306 stamp: 267909120976 total: 120 now: 432481173047 tick: 432481183477 stamp: 432481169893 total: 121 now: 435389124209 tick: 435389134985 stamp: 435389107999 total: 122 now: 437341748522 tick: 437341757801 stamp: 437341714053 total: 124 now: 472829745453 tick: 472829755435 stamp: 472829743083 total: 125 Most happen at boot, but they continue to happen occasionally. Even tossing these in the bit bucket, I've given up on timeslice accuracy, because the numbers just don't add up right. This and the pm timer thing make sched_clock() fairly annoying. -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-08 4:14 ` Mike Galbraith @ 2006-05-08 4:37 ` Mike Galbraith 2006-05-08 4:46 ` Nick Piggin 0 siblings, 1 reply; 41+ messages in thread From: Mike Galbraith @ 2006-05-08 4:37 UTC (permalink / raw) To: Nick Piggin; +Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel On Mon, 2006-05-08 at 06:15 +0200, Mike Galbraith wrote: > On Sun, 2006-05-07 at 19:32 +0200, Mike Galbraith wrote: > > On Sun, 2006-05-07 at 22:33 +1000, Nick Piggin wrote: > > > Mike Galbraith wrote: > > > > On Wed, 2006-05-03 at 03:07 +1000, Nick Piggin wrote: > > > > > > > >>Other problem is that some people didn't RTFM and have started trying to > > > >>use it for precise accounting :( > > > > > > > > > > > > Are you talking about me perchance? I don't really care about precision > > > > _that_ much, though I certainly do want to tighten timeslice accounting. > > > > > > No, sched_clock is fine to be used in CPU scheduling choices, which are > > > heuristic anyway (although strictly speaking, even using it for timeslicing > > > within a single CPU could cause slight unfairness). > > > > > > I'm talking about the update_cpu_clock() / task_struct->sched_time stuff. > > > > Oh. I kinda sorta agree. > Sorry for yet another reply, but running the old starvation testcase that caused sched_clock() to be born in the first place tickled my funny-bone. With that running and hitting 300k context switches... now: 2100508962835 tick: 2100508972067 stamp: 2100508961220 total: 2906 now: 2101531243883 tick: 2101531251877 stamp: 2101531238543 total: 2924 now: 2102695422392 tick: 2102695431699 stamp: 2102695418265 total: 2940 Accounting? Not :) -Mike ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-08 4:37 ` Mike Galbraith @ 2006-05-08 4:46 ` Nick Piggin 2006-05-08 5:24 ` Mike Galbraith 0 siblings, 1 reply; 41+ messages in thread From: Nick Piggin @ 2006-05-08 4:46 UTC (permalink / raw) To: Mike Galbraith Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel Mike Galbraith wrote: >Sorry for yet another reply, but running the old starvation testcase >that caused sched_clock() to be born in the first place tickled my >funny-bone. With that running and hitting 300k context switches... > >now: 2100508962835 tick: 2100508972067 stamp: 2100508961220 total: 2906 >now: 2101531243883 tick: 2101531251877 stamp: 2101531238543 total: 2924 >now: 2102695422392 tick: 2102695431699 stamp: 2102695418265 total: 2940 > >Accounting? Not :) > Yeah I agree with Andi that this accounting stuff is probably done for some POSIX conformance that doesn't matter. Actually it is worse than that because if anyone _really_ did need it, then they'll get a horrible surprise when their system mysteriously fails in production. It should either get ripped out, or perhaps converted to use jiffies until a sane high resolution, low overhead scheme is developed (if ever). And that would exclude something that does this accounting in fastpaths for the 99.99% of processes that never use it. -- Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-08 4:46 ` Nick Piggin @ 2006-05-08 5:24 ` Mike Galbraith 2006-05-08 5:30 ` Nick Piggin 0 siblings, 1 reply; 41+ messages in thread From: Mike Galbraith @ 2006-05-08 5:24 UTC (permalink / raw) To: Nick Piggin; +Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel On Mon, 2006-05-08 at 14:46 +1000, Nick Piggin wrote: > It should either get ripped out, or perhaps converted to use jiffies > until a sane high resolution, low overhead scheme is developed (if > ever). And that would exclude something that does this accounting in > fastpaths for the 99.99% of processes that never use it. The accounting is really light compared to the interactivity part. That doesn't need to be in the fast path, and in my tree it isn't. FWIW (0), yy tree is missing every last shred of the interactivity code, and not missing it one bit. [root@Homer]:> diffstat xx sched.c | 480 +++++++++++++++++++++++++++++----------------------------------- 1 files changed, 219 insertions(+), 261 deletions(-) And that's with full throttling, and absolute starvation proofing. Ho hum. Back to work on my never-going-anywhere-but-fun tree :) later, -Mike (shutting the hell up now;) ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-08 5:24 ` Mike Galbraith @ 2006-05-08 5:30 ` Nick Piggin 0 siblings, 0 replies; 41+ messages in thread From: Nick Piggin @ 2006-05-08 5:30 UTC (permalink / raw) To: Mike Galbraith Cc: Andi Kleen, Christopher Friesen, Russell King, linux-kernel Mike Galbraith wrote: >On Mon, 2006-05-08 at 14:46 +1000, Nick Piggin wrote: > >>It should either get ripped out, or perhaps converted to use jiffies >>until a sane high resolution, low overhead scheme is developed (if >>ever). And that would exclude something that does this accounting in >>fastpaths for the 99.99% of processes that never use it. >> > >The accounting is really light compared to the interactivity part. That >doesn't need to be in the fast path, and in my tree it isn't. > And it is worse than useless if it is wrong. > >FWIW (0), yy tree is missing every last shred of the interactivity code, >and not missing it one bit. > Join the club ;) > >[root@Homer]:> diffstat xx > sched.c | 480 >+++++++++++++++++++++++++++++----------------------------------- > 1 files changed, 219 insertions(+), 261 deletions(-) > >And that's with full throttling, and absolute starvation proofing. > >Ho hum. Back to work on my never-going-anywhere-but-fun tree :) > Join the club ;) Nah, the interactivity stuff isn't too bad (as a function of bug reports to lkml). One day when there is nothing else wrong with the kernel, it will probably end up getting reworked. No need to stir it up now though, really. -- Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: sched_clock() uses are broken 2006-05-02 13:29 sched_clock() uses are broken Russell King 2006-05-02 14:21 ` Nick Piggin 2006-05-02 16:43 ` Andi Kleen @ 2006-05-04 20:02 ` Florian Paul Schmidt 2 siblings, 0 replies; 41+ messages in thread From: Florian Paul Schmidt @ 2006-05-04 20:02 UTC (permalink / raw) To: Russell King; +Cc: Linux Kernel List On Tue, 2 May 2006 14:29:53 +0100 Russell King <rmk+lkml@arm.linux.org.uk> wrote: > However, this is not the case. On x86 with TSC, it returns a 54 bit > number. This means that when t1 < t0, time_passed_ns becomes a very > large number which no longer represents the amount of time. Hi, is this specific to the sched_clock() or does the rdtsc on these systems only use 54 bit? Flo -- Palimm Palimm! http://tapas.affenbande.org ^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2006-05-08 5:30 UTC | newest] Thread overview: 41+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-05-02 13:29 sched_clock() uses are broken Russell King 2006-05-02 14:21 ` Nick Piggin 2006-05-02 16:43 ` Andi Kleen 2006-05-02 16:50 ` Russell King 2006-05-02 17:01 ` Andi Kleen 2006-05-02 17:18 ` Nicolas Pitre 2006-05-02 18:55 ` Russell King 2006-05-02 19:05 ` Nicolas Pitre 2006-05-02 19:08 ` Russell King 2006-05-02 19:23 ` Nicolas Pitre 2006-05-02 21:35 ` Russell King 2006-05-02 17:15 ` Nicolas Pitre 2006-05-04 3:50 ` George Anzinger 2006-05-04 14:18 ` Nicolas Pitre 2006-05-02 16:54 ` Christopher Friesen 2006-05-02 16:59 ` Andi Kleen 2006-05-02 17:07 ` Nick Piggin 2006-05-03 7:09 ` Mike Galbraith 2006-05-03 7:40 ` Andi Kleen 2006-05-03 9:11 ` Mike Galbraith 2006-05-03 9:16 ` Andi Kleen 2006-05-03 9:31 ` Mike Galbraith 2006-05-07 12:33 ` Nick Piggin 2006-05-07 12:43 ` Russell King 2006-05-07 12:56 ` Nick Piggin 2006-05-07 13:00 ` Nick Piggin 2006-05-07 13:18 ` Russell King 2006-05-07 13:30 ` Nick Piggin 2006-05-07 13:55 ` Russell King 2006-05-07 14:04 ` Nick Piggin 2006-05-07 16:03 ` Andi Kleen 2006-05-07 16:53 ` Russell King 2006-05-07 17:52 ` Mike Galbraith 2006-05-07 17:37 ` Mike Galbraith 2006-05-07 17:32 ` Mike Galbraith 2006-05-08 4:14 ` Mike Galbraith 2006-05-08 4:37 ` Mike Galbraith 2006-05-08 4:46 ` Nick Piggin 2006-05-08 5:24 ` Mike Galbraith 2006-05-08 5:30 ` Nick Piggin 2006-05-04 20:02 ` Florian Paul Schmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox