* [PATCH] sched: avoid unnecessary overflow in sched_clock
@ 2011-11-15 21:59 Salman Qazi
2011-11-15 22:12 ` Salman Qazi
0 siblings, 1 reply; 9+ messages in thread
From: Salman Qazi @ 2011-11-15 21:59 UTC (permalink / raw)
To: John Stultz, Salman Qazi, LKML, Peter Zijlstra
After hundreds of days of uptime, the __cycles_2_ns calculation in
sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing
any precision.
We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.
---
arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index fa7b917..431793e 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -32,6 +32,22 @@ extern int no_timer_check;
* (mathieu.desnoyers@polymtl.ca)
*
* -johnstul@us.ibm.com "math is hard, lets go shopping!"
+ *
+ * In:
+ *
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * Although we may still have enough bits to store the value of ns,
+ * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
+ * leading to an incorrect result.
+ *
+ * To avoid this, we can decompose 'cycles' into quotient and remainder
+ * of division by SC. Then,
+ *
+ * ns = (quot * SC + rem) * cyc2ns_scale / SC
+ * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
+ *
+ * - sqazi@google.com
*/
DECLARE_PER_CPU(unsigned long, cyc2ns);
@@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
+ unsigned long long quot;
+ unsigned long long rem;
int cpu = smp_processor_id();
unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
- ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
+ quot = (cyc >> CYC2NS_SCALE_FACTOR);
+ rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
+ ns += quot * per_cpu(cyc2ns, cpu) +
+ ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
return ns;
}
^ permalink raw reply related [flat|nested] 9+ messages in thread
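[Editorial note: the arithmetic in the patch above can be modeled outside the kernel. The following standalone Python sketch is illustrative only — the 3 GHz clock rate and the derived cyc2ns value are assumptions, not taken from the patch. It emulates the 64-bit wraparound of the original expression and checks that the quotient/remainder form loses no precision:]

```python
SC = 10                                   # CYC2NS_SCALE_FACTOR in the patch
U64 = (1 << 64) - 1                       # mask emulating 64-bit wraparound
cyc2ns = (1_000_000 << SC) // 3_000_000   # per-CPU scale for an assumed 3 GHz TSC

def cycles_2_ns_old(cyc):
    # Original code: the product cyc * cyc2ns is computed in 64 bits
    # and silently wraps once cyc is large enough.
    return ((cyc * cyc2ns) & U64) >> SC

def cycles_2_ns_new(cyc):
    # Patched code: split cyc into quotient and remainder of division
    # by 2**SC, so for realistic TSC values neither partial product
    # exceeds 64 bits.
    quot = cyc >> SC
    rem = cyc & ((1 << SC) - 1)
    return quot * cyc2ns + ((rem * cyc2ns) >> SC)

# TSC value after ~300 days of uptime at the assumed 3 GHz
cyc = 3_000_000_000 * 86_400 * 300

exact = (cyc * cyc2ns) >> SC              # Python ints never overflow
assert cycles_2_ns_new(cyc) == exact      # decomposition is exact
assert cycles_2_ns_old(cyc) != exact      # 64-bit product has already wrapped
```

The decomposition is exact because quot * 2**SC * cyc2ns is a multiple of 2**SC, so shifting it right by SC drops no bits.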
* [PATCH] sched: avoid unnecessary overflow in sched_clock
2011-11-15 21:59 [PATCH] sched: avoid unnecessary overflow in sched_clock Salman Qazi
@ 2011-11-15 22:12 ` Salman Qazi
2011-11-15 23:02 ` john stultz
2011-11-18 23:45 ` [tip:sched/urgent] sched, x86: Avoid " tip-bot for Salman Qazi
0 siblings, 2 replies; 9+ messages in thread
From: Salman Qazi @ 2011-11-15 22:12 UTC (permalink / raw)
To: John Stultz, Salman Qazi, Ingo Molnar, LKML, Peter Zijlstra
(Added the missing signed-off-by line)
After hundreds of days of uptime, the __cycles_2_ns calculation in
sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing
any precision.
We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.
Signed-off-by: Salman Qazi <sqazi@google.com>
---
arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index fa7b917..431793e 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -32,6 +32,22 @@ extern int no_timer_check;
* (mathieu.desnoyers@polymtl.ca)
*
* -johnstul@us.ibm.com "math is hard, lets go shopping!"
+ *
+ * In:
+ *
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * Although we may still have enough bits to store the value of ns,
+ * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
+ * leading to an incorrect result.
+ *
+ * To avoid this, we can decompose 'cycles' into quotient and remainder
+ * of division by SC. Then,
+ *
+ * ns = (quot * SC + rem) * cyc2ns_scale / SC
+ * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
+ *
+ * - sqazi@google.com
*/
DECLARE_PER_CPU(unsigned long, cyc2ns);
@@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
+ unsigned long long quot;
+ unsigned long long rem;
int cpu = smp_processor_id();
unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
- ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
+ quot = (cyc >> CYC2NS_SCALE_FACTOR);
+ rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
+ ns += quot * per_cpu(cyc2ns, cpu) +
+ ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
return ns;
}
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] sched: avoid unnecessary overflow in sched_clock
2011-11-15 22:12 ` Salman Qazi
@ 2011-11-15 23:02 ` john stultz
2011-11-16 0:07 ` Paul Turner
2011-11-16 6:41 ` Mike Galbraith
2011-11-18 23:45 ` [tip:sched/urgent] sched, x86: Avoid " tip-bot for Salman Qazi
1 sibling, 2 replies; 9+ messages in thread
From: john stultz @ 2011-11-15 23:02 UTC (permalink / raw)
To: Salman Qazi; +Cc: Ingo Molnar, LKML, Peter Zijlstra
On Tue, 2011-11-15 at 14:12 -0800, Salman Qazi wrote:
> (Added the missing signed-off-by line)
>
> After hundreds of days of uptime, the __cycles_2_ns calculation in
> sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
> the final value to become zero. We can solve this without losing
> any precision.
>
> We can decompose TSC into quotient and remainder of division by the
> scale factor, and then use this to convert TSC into nanoseconds.
>
> Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
> ---
> arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
> 1 files changed, 22 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
> index fa7b917..431793e 100644
> --- a/arch/x86/include/asm/timer.h
> +++ b/arch/x86/include/asm/timer.h
> @@ -32,6 +32,22 @@ extern int no_timer_check;
> * (mathieu.desnoyers@polymtl.ca)
> *
> * -johnstul@us.ibm.com "math is hard, lets go shopping!"
> + *
> + * In:
> + *
> + * ns = cycles * cyc2ns_scale / SC
> + *
> + * Although we may still have enough bits to store the value of ns,
> + * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
> + * leading to an incorrect result.
> + *
> + * To avoid this, we can decompose 'cycles' into quotient and remainder
> + * of division by SC. Then,
> + *
> + * ns = (quot * SC + rem) * cyc2ns_scale / SC
> + * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
> + *
> + * - sqazi@google.com
> */
>
> DECLARE_PER_CPU(unsigned long, cyc2ns);
> @@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
>
> static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
> {
> + unsigned long long quot;
> + unsigned long long rem;
> int cpu = smp_processor_id();
> unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
> - ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
> + quot = (cyc >> CYC2NS_SCALE_FACTOR);
> + rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
> + ns += quot * per_cpu(cyc2ns, cpu) +
> + ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
> return ns;
> }
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched: avoid unnecessary overflow in sched_clock
2011-11-15 23:02 ` john stultz
@ 2011-11-16 0:07 ` Paul Turner
2011-11-16 6:41 ` Mike Galbraith
1 sibling, 0 replies; 9+ messages in thread
From: Paul Turner @ 2011-11-16 0:07 UTC (permalink / raw)
To: linux-kernel; +Cc: Salman Qazi, Ingo Molnar, LKML, Peter Zijlstra
On 11/15/2011 03:02 PM, john stultz wrote:
> On Tue, 2011-11-15 at 14:12 -0800, Salman Qazi wrote:
>> (Added the missing signed-off-by line)
>>
>> After hundreds of days of uptime, the __cycles_2_ns calculation in
>> sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
>> the final value to become zero. We can solve this without losing
>> any precision.
>>
>> We can decompose TSC into quotient and remainder of division by the
>> scale factor, and then use this to convert TSC into nanoseconds.
>>
>> Signed-off-by: Salman Qazi <sqazi@google.com>
>
> Acked-by: John Stultz <johnstul@us.ibm.com>
>
Reviewed-by: Paul Turner <pjt@google.com>
>
>> ---
>> arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
>> 1 files changed, 22 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
>> index fa7b917..431793e 100644
>> --- a/arch/x86/include/asm/timer.h
>> +++ b/arch/x86/include/asm/timer.h
>> @@ -32,6 +32,22 @@ extern int no_timer_check;
>> * (mathieu.desnoyers@polymtl.ca)
>> *
>> * -johnstul@us.ibm.com "math is hard, lets go shopping!"
>> + *
>> + * In:
>> + *
>> + * ns = cycles * cyc2ns_scale / SC
>> + *
>> + * Although we may still have enough bits to store the value of ns,
>> + * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
>> + * leading to an incorrect result.
>> + *
>> + * To avoid this, we can decompose 'cycles' into quotient and remainder
>> + * of division by SC. Then,
>> + *
>> + * ns = (quot * SC + rem) * cyc2ns_scale / SC
>> + * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
>> + *
>> + * - sqazi@google.com
>> */
>>
>> DECLARE_PER_CPU(unsigned long, cyc2ns);
>> @@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
>>
>> static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
>> {
>> + unsigned long long quot;
>> + unsigned long long rem;
>> int cpu = smp_processor_id();
>> unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
>> - ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
>> + quot = (cyc >> CYC2NS_SCALE_FACTOR);
>> + rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
>> + ns += quot * per_cpu(cyc2ns, cpu) +
>> + ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
>> return ns;
>> }
>>
>>
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched: avoid unnecessary overflow in sched_clock
2011-11-15 23:02 ` john stultz
2011-11-16 0:07 ` Paul Turner
@ 2011-11-16 6:41 ` Mike Galbraith
2011-11-16 7:08 ` Paul Turner
1 sibling, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2011-11-16 6:41 UTC (permalink / raw)
To: john stultz; +Cc: Salman Qazi, Ingo Molnar, LKML, Peter Zijlstra
On Tue, 2011-11-15 at 15:02 -0800, john stultz wrote:
> On Tue, 2011-11-15 at 14:12 -0800, Salman Qazi wrote:
> > (Added the missing signed-off-by line)
> >
> > After hundreds of days of uptime, the __cycles_2_ns calculation in
> > sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
> > the final value to become zero. We can solve this without losing
> > any precision.
> >
> > We can decompose TSC into quotient and remainder of division by the
> > scale factor, and then use this to convert TSC into nanoseconds.
> >
> > Signed-off-by: Salman Qazi <sqazi@google.com>
>
> Acked-by: John Stultz <johnstul@us.ibm.com>
This wants a stable tag, no?
-Mike
> > ---
> > arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
> > 1 files changed, 22 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
> > index fa7b917..431793e 100644
> > --- a/arch/x86/include/asm/timer.h
> > +++ b/arch/x86/include/asm/timer.h
> > @@ -32,6 +32,22 @@ extern int no_timer_check;
> > * (mathieu.desnoyers@polymtl.ca)
> > *
> > * -johnstul@us.ibm.com "math is hard, lets go shopping!"
> > + *
> > + * In:
> > + *
> > + * ns = cycles * cyc2ns_scale / SC
> > + *
> > + * Although we may still have enough bits to store the value of ns,
> > + * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
> > + * leading to an incorrect result.
> > + *
> > + * To avoid this, we can decompose 'cycles' into quotient and remainder
> > + * of division by SC. Then,
> > + *
> > + * ns = (quot * SC + rem) * cyc2ns_scale / SC
> > + * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
> > + *
> > + * - sqazi@google.com
> > */
> >
> > DECLARE_PER_CPU(unsigned long, cyc2ns);
> > @@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
> >
> > static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
> > {
> > + unsigned long long quot;
> > + unsigned long long rem;
> > int cpu = smp_processor_id();
> > unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
> > - ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
> > + quot = (cyc >> CYC2NS_SCALE_FACTOR);
> > + rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
> > + ns += quot * per_cpu(cyc2ns, cpu) +
> > + ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
> > return ns;
> > }
> >
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched: avoid unnecessary overflow in sched_clock
2011-11-16 6:41 ` Mike Galbraith
@ 2011-11-16 7:08 ` Paul Turner
2011-11-16 8:56 ` Peter Zijlstra
0 siblings, 1 reply; 9+ messages in thread
From: Paul Turner @ 2011-11-16 7:08 UTC (permalink / raw)
To: Mike Galbraith
Cc: john stultz, Salman Qazi, Ingo Molnar, LKML, Peter Zijlstra
On 11/15/2011 10:41 PM, Mike Galbraith wrote:
> On Tue, 2011-11-15 at 15:02 -0800, john stultz wrote:
>> On Tue, 2011-11-15 at 14:12 -0800, Salman Qazi wrote:
>>> (Added the missing signed-off-by line)
>>>
>>> After hundreds of days of uptime, the __cycles_2_ns calculation in
>>> sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
>>> the final value to become zero. We can solve this without losing
>>> any precision.
>>>
>>> We can decompose TSC into quotient and remainder of division by the
>>> scale factor, and then use this to convert TSC into nanoseconds.
>>>
>>> Signed-off-by: Salman Qazi <sqazi@google.com>
>>
>> Acked-by: John Stultz <johnstul@us.ibm.com>
>
> This wants a stable tag, no?
>
> -Mike
>
Probably a good idea -- This especially sucks rocks in the sched_clock_stable==1
case; resulting in it coming straight back out of sched_clock_cpu() and trashing
rq->clock.
- Paul
>>> ---
>>> arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
>>> 1 files changed, 22 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
>>> index fa7b917..431793e 100644
>>> --- a/arch/x86/include/asm/timer.h
>>> +++ b/arch/x86/include/asm/timer.h
>>> @@ -32,6 +32,22 @@ extern int no_timer_check;
>>> * (mathieu.desnoyers@polymtl.ca)
>>> *
>>> * -johnstul@us.ibm.com "math is hard, lets go shopping!"
>>> + *
>>> + * In:
>>> + *
>>> + * ns = cycles * cyc2ns_scale / SC
>>> + *
>>> + * Although we may still have enough bits to store the value of ns,
>>> + * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
>>> + * leading to an incorrect result.
>>> + *
>>> + * To avoid this, we can decompose 'cycles' into quotient and remainder
>>> + * of division by SC. Then,
>>> + *
>>> + * ns = (quot * SC + rem) * cyc2ns_scale / SC
>>> + * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
>>> + *
>>> + * - sqazi@google.com
>>> */
>>>
>>> DECLARE_PER_CPU(unsigned long, cyc2ns);
>>> @@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
>>>
>>> static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
>>> {
>>> + unsigned long long quot;
>>> + unsigned long long rem;
>>> int cpu = smp_processor_id();
>>> unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
>>> - ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
>>> + quot = (cyc >> CYC2NS_SCALE_FACTOR);
>>> + rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
>>> + ns += quot * per_cpu(cyc2ns, cpu) +
>>> + ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
>>> return ns;
>>> }
>>>
>>>
>>
>>
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched: avoid unnecessary overflow in sched_clock
2011-11-16 7:08 ` Paul Turner
@ 2011-11-16 8:56 ` Peter Zijlstra
2011-11-16 20:55 ` Salman Qazi
0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2011-11-16 8:56 UTC (permalink / raw)
To: Paul Turner; +Cc: Mike Galbraith, john stultz, Salman Qazi, Ingo Molnar, LKML
On Tue, 2011-11-15 at 23:08 -0800, Paul Turner wrote:
> Probably a good idea -- This especially sucks rocks in the sched_clock_stable==1
> case; resulting in it coming straight back out of sched_clock_cpu() and trashing
> rq->clock.
There's another problem there in that TSC will wrap too soon and things
will blow up in your face that way.
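[Editorial note: the timescales involved can be estimated with back-of-the-envelope arithmetic. The 3 GHz clock rate below is an assumption for illustration: the unpatched 64-bit product wraps after roughly 200 days of uptime, whereas the 64-bit TSC itself would not wrap for nearly two centuries:]

```python
SC = 10                                # CYC2NS_SCALE_FACTOR
khz = 3_000_000                        # assumed 3 GHz TSC; illustrative only
cyc2ns = (1_000_000 << SC) // khz     # ~341 for this assumed clock rate

# Days of uptime until cyc * cyc2ns no longer fits in 64 bits (the bug fixed here):
days_product = ((1 << 64) // cyc2ns) // (khz * 1000) // 86_400

# Years until the 64-bit TSC itself wraps:
years_tsc = (1 << 64) // (khz * 1000) // (86_400 * 365)

assert days_product == 208             # "hundreds of days", as the changelog says
assert years_tsc == 194                # TSC wrap is not the pressing problem
```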
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] sched: avoid unnecessary overflow in sched_clock
2011-11-16 8:56 ` Peter Zijlstra
@ 2011-11-16 20:55 ` Salman Qazi
0 siblings, 0 replies; 9+ messages in thread
From: Salman Qazi @ 2011-11-16 20:55 UTC (permalink / raw)
To: Peter Zijlstra, john stultz, Mike Galbraith, LKML, Ingo Molnar,
Paul Turner
(Added Reviewed-by/Acked-by tags)
After hundreds of days of uptime, the __cycles_2_ns calculation in
sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing
any precision.
We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.
Reviewed-by: Paul Turner <pjt@google.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Salman Qazi <sqazi@google.com>
---
arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index fa7b917..431793e 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -32,6 +32,22 @@ extern int no_timer_check;
* (mathieu.desnoyers@polymtl.ca)
*
* -johnstul@us.ibm.com "math is hard, lets go shopping!"
+ *
+ * In:
+ *
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * Although we may still have enough bits to store the value of ns,
+ * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
+ * leading to an incorrect result.
+ *
+ * To avoid this, we can decompose 'cycles' into quotient and remainder
+ * of division by SC. Then,
+ *
+ * ns = (quot * SC + rem) * cyc2ns_scale / SC
+ * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
+ *
+ * - sqazi@google.com
*/
DECLARE_PER_CPU(unsigned long, cyc2ns);
@@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
+ unsigned long long quot;
+ unsigned long long rem;
int cpu = smp_processor_id();
unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
- ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
+ quot = (cyc >> CYC2NS_SCALE_FACTOR);
+ rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
+ ns += quot * per_cpu(cyc2ns, cpu) +
+ ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
return ns;
}
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [tip:sched/urgent] sched, x86: Avoid unnecessary overflow in sched_clock
2011-11-15 22:12 ` Salman Qazi
2011-11-15 23:02 ` john stultz
@ 2011-11-18 23:45 ` tip-bot for Salman Qazi
1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Salman Qazi @ 2011-11-18 23:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, johnstul, pjt, sqazi,
tglx, mingo
Commit-ID: 4cecf6d401a01d054afc1e5f605bcbfe553cb9b9
Gitweb: http://git.kernel.org/tip/4cecf6d401a01d054afc1e5f605bcbfe553cb9b9
Author: Salman Qazi <sqazi@google.com>
AuthorDate: Tue, 15 Nov 2011 14:12:06 -0800
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 16 Nov 2011 19:51:25 +0100
sched, x86: Avoid unnecessary overflow in sched_clock
(Added the missing signed-off-by line)
After hundreds of days of uptime, the __cycles_2_ns calculation in
sched_clock overflows: cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing
any precision.
We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.
Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111115221121.7262.88871.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/include/asm/timer.h | 23 ++++++++++++++++++++++-
1 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index fa7b917..431793e 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -32,6 +32,22 @@ extern int no_timer_check;
* (mathieu.desnoyers@polymtl.ca)
*
* -johnstul@us.ibm.com "math is hard, lets go shopping!"
+ *
+ * In:
+ *
+ * ns = cycles * cyc2ns_scale / SC
+ *
+ * Although we may still have enough bits to store the value of ns,
+ * in some cases, we may not have enough bits to store cycles * cyc2ns_scale,
+ * leading to an incorrect result.
+ *
+ * To avoid this, we can decompose 'cycles' into quotient and remainder
+ * of division by SC. Then,
+ *
+ * ns = (quot * SC + rem) * cyc2ns_scale / SC
+ * = quot * cyc2ns_scale + (rem * cyc2ns_scale) / SC
+ *
+ * - sqazi@google.com
*/
DECLARE_PER_CPU(unsigned long, cyc2ns);
@@ -41,9 +57,14 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);
static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
+ unsigned long long quot;
+ unsigned long long rem;
int cpu = smp_processor_id();
unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
- ns += cyc * per_cpu(cyc2ns, cpu) >> CYC2NS_SCALE_FACTOR;
+ quot = (cyc >> CYC2NS_SCALE_FACTOR);
+ rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
+ ns += quot * per_cpu(cyc2ns, cpu) +
+ ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
return ns;
}
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-11-18 23:45 UTC | newest]
Thread overview: 9+ messages
-- links below jump to the message on this page --
2011-11-15 21:59 [PATCH] sched: avoid unnecessary overflow in sched_clock Salman Qazi
2011-11-15 22:12 ` Salman Qazi
2011-11-15 23:02 ` john stultz
2011-11-16 0:07 ` Paul Turner
2011-11-16 6:41 ` Mike Galbraith
2011-11-16 7:08 ` Paul Turner
2011-11-16 8:56 ` Peter Zijlstra
2011-11-16 20:55 ` Salman Qazi
2011-11-18 23:45 ` [tip:sched/urgent] sched, x86: Avoid " tip-bot for Salman Qazi