[PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
@ 2010-12-21  1:21 Nicolas Pitre
  2010-12-21 10:51 ` Russell King - ARM Linux
  2011-01-09  3:21 ` Nicolas Pitre
  0 siblings, 2 replies; 13+ messages in thread
From: Nicolas Pitre @ 2010-12-21  1:21 UTC (permalink / raw)
  To: linux-arm-kernel

The minsec argument to clocks_calc_mult_shift() [ which appears to be
misnamed by the way ] is used to clamp the magnitude of the mult factor
so that the given range won't overflow a 64 bit result.  The mult factor
is itself already limited to a maximum of 32 bits.  Given that the largest
clock tick range we may have is also 32 bits, there is no way our usage of
any computed mult factor would ever overflow 64 bits.

The choice of 60 seconds here for minsec is rather arbitrary.  If the
mult factor wasn't already limited to 32 bits, this value of 60 would
create overflows for clocks which take more than 60 seconds to wrap.

And for clocks which take less than 60 seconds to wrap then we do lose
precision as the mult factor is made smaller (fewer bits) to be adjusted
to a range which is larger than what we actually use.  This currently
affects clocks faster than 71MHz.

We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
be computed in all cases.  But let's be formal and provide the actual
range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
users are affected as they all are using clocks < 71MHz.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>

diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index 2cdcc92..f7a0f93 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -30,11 +30,13 @@ void __init init_sched_clock(struct clock_data *cd, void (*update)(void),
 	unsigned long r, w;
 	u64 res, wrap;
 	char r_unit;
+	u32 mask = (1ULL << clock_bits) - 1;

 	sched_clock_update_fn = update;

 	/* calculate the mult/shift to convert counter ticks to ns. */
-	clocks_calc_mult_shift(&cd->mult, &cd->shift, rate, NSEC_PER_SEC, 60);
+	clocks_calc_mult_shift(&cd->mult, &cd->shift, rate, NSEC_PER_SEC,
+			       mask/rate);

 	r = rate;
 	if (r >= 4000000) {
@@ -46,7 +48,7 @@ void __init init_sched_clock(struct clock_data *cd, void (*update)(void),
 	}

 	/* calculate how many ns until we wrap */
-	wrap = cyc_to_ns((1ULL << clock_bits) - 1, cd->mult, cd->shift);
+	wrap = cyc_to_ns(mask, cd->mult, cd->shift);
 	do_div(wrap, NSEC_PER_MSEC);
 	w = wrap;

@@ -54,6 +56,7 @@ void __init init_sched_clock(struct clock_data *cd, void (*update)(void),
 	res = cyc_to_ns(1ULL, cd->mult, cd->shift);
 	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lums\n",
 		clock_bits, r, r_unit, res, w);
+	pr_debug("sched_clock: mult=0x%08x shift=%u)\n", cd->mult, cd->shift);

 	/*
 	 * Start the timer to keep sched_clock() properly updated and

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2010-12-21  1:21 [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks Nicolas Pitre
@ 2010-12-21 10:51 ` Russell King - ARM Linux
  2010-12-30  1:31   ` Nicolas Pitre
  2011-01-09  3:21 ` Nicolas Pitre
  1 sibling, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux @ 2010-12-21 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote:
> 
> The minsec argument to clocks_calc_mult_shift() [ which appears to be
> misnamed by the way ] is used to clamp the magnitude of the mult factor
> so that the given range won't overflow a 64 bit result.  The mult factor
> is itself already limited to a maximum of 32 bits.  Given that the largest
> clock tick range we may have is also 32 bits, there is no way our usage of
> any computed mult factor would ever overflow 64 bits.
> 
> The choice of 60 seconds here for minsec is rather arbitrary.  If the
> mult factor wasn't already limited to 32 bits, this value of 60 would
> create overflows for clocks which take more than 60 seconds to wrap.

60 seconds was arbitary, chosen from the selection of clock rates which
I had available at the time (the highest of which was 24MHz).

> And for clocks which take less than 60 seconds to wrap then we do lose
> precision as the mult factor is made smaller (fewer bits) to be adjusted
> to a range which is larger than what we actually use.  This currently
> affects clocks faster than 71MHz.
> 
> We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> be computed in all cases.  But let's be formal and provide the actual
> range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> users are affected as they all are using clocks < 71MHz.

Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead
of assuming 5 seconds?  It also has access to the mask and rate.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2010-12-21 10:51 ` Russell King - ARM Linux
@ 2010-12-30  1:31   ` Nicolas Pitre
  2011-01-03  0:37     ` Russell King - ARM Linux
  0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2010-12-30  1:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 21 Dec 2010, Russell King - ARM Linux wrote:

> On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote:
> > 
> > The minsec argument to clocks_calc_mult_shift() [ which appears to be
> > misnamed by the way ] is used to clamp the magnitude of the mult factor
> > so that the given range won't overflow a 64 bit result.  The mult factor
> > is itself already limited to a maximum of 32 bits.  Given that the largest
> > clock tick range we may have is also 32 bits, there is no way our usage of
> > any computed mult factor would ever overflow 64 bits.
> > 
> > The choice of 60 seconds here for minsec is rather arbitrary.  If the
> > mult factor wasn't already limited to 32 bits, this value of 60 would
> > create overflows for clocks which take more than 60 seconds to wrap.
> 
> 60 seconds was arbitary, chosen from the selection of clock rates which
> I had available at the time (the highest of which was 24MHz).

Fair enough.  It is just not universally optimal given the different 
clock ranges this code now covers.

> > And for clocks which take less than 60 seconds to wrap then we do lose
> > precision as the mult factor is made smaller (fewer bits) to be adjusted
> > to a range which is larger than what we actually use.  This currently
> > affects clocks faster than 71MHz.
> > 
> > We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> > be computed in all cases.  But let's be formal and provide the actual
> > range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> > users are affected as they all are using clocks < 71MHz.
> 
> Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead
> of assuming 5 seconds?  It also has access to the mask and rate.

There is a comment to that effect right above the 
clocks_calc_mult_shift() indicating that this is a known issue in that 
case already.  

And looking at that code, it appears that there is a balance to be made 
between cs->max_idle_ns and cs->mult, the former currently being 
determined by the later.  But if cs->mult is maximized to 32 bits, that 
leaves only 31 bits for cs->max_idle_ns which corresponds to approx 2 
seconds only.

So this is not clear to me what would be the optimal mult/shift values 
in the __clocksource_updatefreq_scale() context, while this is rather 
obvious in the init_sched_clock() context.  Hence this patch.


Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2010-12-30  1:31   ` Nicolas Pitre
@ 2011-01-03  0:37     ` Russell King - ARM Linux
  2011-01-03  1:21       ` Nicolas Pitre
  2011-01-03 19:47       ` john stultz
  0 siblings, 2 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2011-01-03  0:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 29, 2010 at 08:31:58PM -0500, Nicolas Pitre wrote:
> On Tue, 21 Dec 2010, Russell King - ARM Linux wrote:
> 
> > On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote:
> > > 
> > > The minsec argument to clocks_calc_mult_shift() [ which appears to be
> > > misnamed by the way ] is used to clamp the magnitude of the mult factor
> > > so that the given range won't overflow a 64 bit result.  The mult factor
> > > is itself already limited to a maximum of 32 bits.  Given that the largest
> > > clock tick range we may have is also 32 bits, there is no way our usage of
> > > any computed mult factor would ever overflow 64 bits.
> > > 
> > > The choice of 60 seconds here for minsec is rather arbitrary.  If the
> > > mult factor wasn't already limited to 32 bits, this value of 60 would
> > > create overflows for clocks which take more than 60 seconds to wrap.
> > 
> > 60 seconds was arbitary, chosen from the selection of clock rates which
> > I had available at the time (the highest of which was 24MHz).
> 
> Fair enough.  It is just not universally optimal given the different 
> clock ranges this code now covers.
> 
> > > And for clocks which take less than 60 seconds to wrap then we do lose
> > > precision as the mult factor is made smaller (fewer bits) to be adjusted
> > > to a range which is larger than what we actually use.  This currently
> > > affects clocks faster than 71MHz.
> > > 
> > > We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> > > be computed in all cases.  But let's be formal and provide the actual
> > > range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> > > users are affected as they all are using clocks < 71MHz.
> > 
> > Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead
> > of assuming 5 seconds?  It also has access to the mask and rate.
> 
> There is a comment to that effect right above the 
> clocks_calc_mult_shift() indicating that this is a known issue in that 
> case already.  
> 
> And looking at that code, it appears that there is a balance to be made 
> between cs->max_idle_ns and cs->mult, the former currently being 
> determined by the later.  But if cs->mult is maximized to 32 bits, that 
> leaves only 31 bits for cs->max_idle_ns which corresponds to approx 2 
> seconds only.
> 
> So this is not clear to me what would be the optimal mult/shift values 
> in the __clocksource_updatefreq_scale() context, while this is rather 
> obvious in the init_sched_clock() context.  Hence this patch.

As clocksources are about precision (I believe it has been stated so in
the past) it seems that fudging the shift and multiplier to get an
extended period out of the clock is not the correct approach.

Maybe Thomas or John can shed some light on this?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-03  0:37     ` Russell King - ARM Linux
@ 2011-01-03  1:21       ` Nicolas Pitre
  2011-01-03 19:56         ` john stultz
  2011-01-03 19:47       ` john stultz
  1 sibling, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2011-01-03  1:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 3 Jan 2011, Russell King - ARM Linux wrote:

> On Wed, Dec 29, 2010 at 08:31:58PM -0500, Nicolas Pitre wrote:
> > On Tue, 21 Dec 2010, Russell King - ARM Linux wrote:
> > 
> > > On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote:
> > > > 
> > > > The minsec argument to clocks_calc_mult_shift() [ which appears to be
> > > > misnamed by the way ] is used to clamp the magnitude of the mult factor
> > > > so that the given range won't overflow a 64 bit result.  The mult factor
> > > > is itself already limited to a maximum of 32 bits.  Given that the largest
> > > > clock tick range we may have is also 32 bits, there is no way our usage of
> > > > any computed mult factor would ever overflow 64 bits.
> > > > 
> > > > The choice of 60 seconds here for minsec is rather arbitrary.  If the
> > > > mult factor wasn't already limited to 32 bits, this value of 60 would
> > > > create overflows for clocks which take more than 60 seconds to wrap.
> > > 
> > > 60 seconds was arbitary, chosen from the selection of clock rates which
> > > I had available at the time (the highest of which was 24MHz).
> > 
> > Fair enough.  It is just not universally optimal given the different 
> > clock ranges this code now covers.
> > 
> > > > And for clocks which take less than 60 seconds to wrap then we do lose
> > > > precision as the mult factor is made smaller (fewer bits) to be adjusted
> > > > to a range which is larger than what we actually use.  This currently
> > > > affects clocks faster than 71MHz.
> > > > 
> > > > We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> > > > be computed in all cases.  But let's be formal and provide the actual
> > > > range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> > > > users are affected as they all are using clocks < 71MHz.
> > > 
> > > Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead
> > > of assuming 5 seconds?  It also has access to the mask and rate.
> > 
> > There is a comment to that effect right above the 
> > clocks_calc_mult_shift() indicating that this is a known issue in that 
> > case already.  
> > 
> > And looking at that code, it appears that there is a balance to be made 
> > between cs->max_idle_ns and cs->mult, the former currently being 
> > determined by the later.  But if cs->mult is maximized to 32 bits, that 
> > leaves only 31 bits for cs->max_idle_ns which corresponds to approx 2 
> > seconds only.
> > 
> > So this is not clear to me what would be the optimal mult/shift values 
> > in the __clocksource_updatefreq_scale() context, while this is rather 
> > obvious in the init_sched_clock() context.  Hence this patch.
> 
> As clocksources are about precision (I believe it has been stated so in
> the past) it seems that fudging the shift and multiplier to get an
> extended period out of the clock is not the correct approach.
> 
> Maybe Thomas or John can shed some light on this?

Sure.  My current understanding is that the core might be trying to play 
it safe as the mult and shift factors used to be provided by each clock 
source drivers, and they were often the result of some guessing from the 
authors of those drivers.  Now that those factors are programmatically 
determined via the clocksource_register_khz interface (is this the case 
for them all?) then it should be possible to tighten that code a bit and 
make it optimal.

Since this is somewhat separate from your sched_clock infrastructure, I 
think that my patch is relevant regardless of what happens with the 
__clocksource_updatefreq_scale() code and should be applied 
nevertheless.


Nicolas



> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-03  0:37     ` Russell King - ARM Linux
  2011-01-03  1:21       ` Nicolas Pitre
@ 2011-01-03 19:47       ` john stultz
  2011-01-09 10:52         ` Russell King - ARM Linux
  1 sibling, 1 reply; 13+ messages in thread
From: john stultz @ 2011-01-03 19:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 2011-01-03 at 00:37 +0000, Russell King - ARM Linux wrote:
> On Wed, Dec 29, 2010 at 08:31:58PM -0500, Nicolas Pitre wrote:
> > On Tue, 21 Dec 2010, Russell King - ARM Linux wrote:
> > 
> > > On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote:
> > > > 
> > > > The minsec argument to clocks_calc_mult_shift() [ which appears to be
> > > > misnamed by the way ] is used to clamp the magnitude of the mult factor
> > > > so that the given range won't overflow a 64 bit result.  The mult factor
> > > > is itself already limited to a maximum of 32 bits.  Given that the largest
> > > > clock tick range we may have is also 32 bits, there is no way our usage of
> > > > any computed mult factor would ever overflow 64 bits.
> > > > 
> > > > The choice of 60 seconds here for minsec is rather arbitrary.  If the
> > > > mult factor wasn't already limited to 32 bits, this value of 60 would
> > > > create overflows for clocks which take more than 60 seconds to wrap.
> > > 
> > > 60 seconds was arbitary, chosen from the selection of clock rates which
> > > I had available at the time (the highest of which was 24MHz).
> > 
> > Fair enough.  It is just not universally optimal given the different 
> > clock ranges this code now covers.
> > 
> > > > And for clocks which take less than 60 seconds to wrap then we do lose
> > > > precision as the mult factor is made smaller (fewer bits) to be adjusted
> > > > to a range which is larger than what we actually use.  This currently
> > > > affects clocks faster than 71MHz.
> > > > 
> > > > We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> > > > be computed in all cases.  But let's be formal and provide the actual
> > > > range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> > > > users are affected as they all are using clocks < 71MHz.
> > > 
> > > Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead
> > > of assuming 5 seconds?  It also has access to the mask and rate.
> > 
> > There is a comment to that effect right above the 
> > clocks_calc_mult_shift() indicating that this is a known issue in that 
> > case already.  
> > 
> > And looking at that code, it appears that there is a balance to be made 
> > between cs->max_idle_ns and cs->mult, the former currently being 
> > determined by the later.  But if cs->mult is maximized to 32 bits, that 
> > leaves only 31 bits for cs->max_idle_ns which corresponds to approx 2 
> > seconds only.
> > 
> > So this is not clear to me what would be the optimal mult/shift values 
> > in the __clocksource_updatefreq_scale() context, while this is rather 
> > obvious in the init_sched_clock() context.  Hence this patch.
> 
> As clocksources are about precision (I believe it has been stated so in
> the past) it seems that fudging the shift and multiplier to get an
> extended period out of the clock is not the correct approach.
> 
> Maybe Thomas or John can shed some light on this?

Right, so there's conflicting goals of long idle times and fine-grained
time accuracy that pull the clocksource mult/shift pair calculation in
different directions.

Originally, the advice was pick a mult/shift pair that allows for a few
seconds of time to accrue before we overflow. However this was a little
vague and as the number of clocksources grew, and the desired idle times
were increasing,  I started to see that the lack of consistency was
going to give us troubles.

So the clocksource_register_hz/khz interfaces try to consolidate that
decision into one place, utilizing the MAX_UPDATE_LENGTH value
(currently 5 seconds) as the max idle time. Clocksource driver writers
don't have to worry about making the correct decision, the kernel should
do it for you.

Now, at some point I expect folks to get grumpy about the max idle time
being limited to 5 seconds, and we'll have to either push it out for
everyone (at the cost of less accurate NTP steering), or make it a
architecture specific config option.

Now, for sched_clock, there are a different set of expectations with
regards to accuracy and expected idle times, and we'll probably need a
similar consolidation effort to make sure the mult/shift calculations
are correct and the resulting limits are taken into account by the
scheduler when going into NOHZ mode.

thanks
-john

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-03  1:21       ` Nicolas Pitre
@ 2011-01-03 19:56         ` john stultz
  0 siblings, 0 replies; 13+ messages in thread
From: john stultz @ 2011-01-03 19:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 2011-01-02 at 20:21 -0500, Nicolas Pitre wrote:
> On Mon, 3 Jan 2011, Russell King - ARM Linux wrote:
> 
> > On Wed, Dec 29, 2010 at 08:31:58PM -0500, Nicolas Pitre wrote:
> > > On Tue, 21 Dec 2010, Russell King - ARM Linux wrote:
> > > 
> > > > On Mon, Dec 20, 2010 at 08:21:41PM -0500, Nicolas Pitre wrote:
> > > > > 
> > > > > The minsec argument to clocks_calc_mult_shift() [ which appears to be
> > > > > misnamed by the way ] is used to clamp the magnitude of the mult factor
> > > > > so that the given range won't overflow a 64 bit result.  The mult factor
> > > > > is itself already limited to a maximum of 32 bits.  Given that the largest
> > > > > clock tick range we may have is also 32 bits, there is no way our usage of
> > > > > any computed mult factor would ever overflow 64 bits.
> > > > > 
> > > > > The choice of 60 seconds here for minsec is rather arbitrary.  If the
> > > > > mult factor wasn't already limited to 32 bits, this value of 60 would
> > > > > create overflows for clocks which take more than 60 seconds to wrap.
> > > > 
> > > > 60 seconds was arbitary, chosen from the selection of clock rates which
> > > > I had available at the time (the highest of which was 24MHz).
> > > 
> > > Fair enough.  It is just not universally optimal given the different 
> > > clock ranges this code now covers.
> > > 
> > > > > And for clocks which take less than 60 seconds to wrap then we do lose
> > > > > precision as the mult factor is made smaller (fewer bits) to be adjusted
> > > > > to a range which is larger than what we actually use.  This currently
> > > > > affects clocks faster than 71MHz.
> > > > > 
> > > > > We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> > > > > be computed in all cases.  But let's be formal and provide the actual
> > > > > range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> > > > > users are affected as they all are using clocks < 71MHz.
> > > > 
> > > > Maybe __clocksource_updatefreq_scale() needs a similar fix too, instead
> > > > of assuming 5 seconds?  It also has access to the mask and rate.
> > > 
> > > There is a comment to that effect right above the 
> > > clocks_calc_mult_shift() indicating that this is a known issue in that 
> > > case already.  
> > > 
> > > And looking at that code, it appears that there is a balance to be made 
> > > between cs->max_idle_ns and cs->mult, the former currently being 
> > > determined by the later.  But if cs->mult is maximized to 32 bits, that 
> > > leaves only 31 bits for cs->max_idle_ns which corresponds to approx 2 
> > > seconds only.
> > > 
> > > So this is not clear to me what would be the optimal mult/shift values 
> > > in the __clocksource_updatefreq_scale() context, while this is rather 
> > > obvious in the init_sched_clock() context.  Hence this patch.
> > 
> > As clocksources are about precision (I believe it has been stated so in
> > the past) it seems that fudging the shift and multiplier to get an
> > extended period out of the clock is not the correct approach.
> > 
> > Maybe Thomas or John can shed some light on this?
> 
> Sure.  My current understanding is that the core might be trying to play 
> it safe as the mult and shift factors used to be provided by each clock 
> source drivers, and they were often the result of some guessing from the 
> authors of those drivers.  Now that those factors are programmatically 
> determined via the clocksource_register_khz interface (is this the case 
> for them all?) then it should be possible to tighten that code a bit and 
> make it optimal.

Its not yet the case for all clocksources, I've had some trouble getting
arch maintainers to pick up the conversion patches I've sent out so far
(Russell jumped in and did the arm conversion for me, which lightens my
load dramatically, so thanks again for that!). 

And yes, the consolidation was done as I was seeing a lot of mult/shift
copy-pasta in clocksources (which is my fault, as the advice on how to
calculate it was poor), and as idle time lengths are growing I saw a
potential hazard of breaking platforms as idle lengths grow, or users
being frustrated that one platform might seemingly use more power then
another due to the mult/shift pair needlessly limiting it.

So the consolidation is mainly just to get the expectations unified and
make it so changes don't have to be done on a clocksource by clocksource
basis.

thanks
-john

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2010-12-21  1:21 [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks Nicolas Pitre
  2010-12-21 10:51 ` Russell King - ARM Linux
@ 2011-01-09  3:21 ` Nicolas Pitre
  1 sibling, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2011-01-09  3:21 UTC (permalink / raw)
  To: linux-arm-kernel


Ping.

I think this patch is needed independently of what happens to 
__clocksource_updatefreq_scale().  We must cover at least the full hw 
clock range because of update_sched_clock().  But no more than the full 
hw clock range otherwise we lose precision in the mult factor.  Without 
this patch we lose two bits of precision on Kirkwood.

On Mon, 20 Dec 2010, Nicolas Pitre wrote:

> 
> The minsec argument to clocks_calc_mult_shift() [ which appears to be
> misnamed by the way ] is used to clamp the magnitude of the mult factor
> so that the given range won't overflow a 64 bit result.  The mult factor
> is itself already limited to a maximum of 32 bits.  Given that the largest
> clock tick range we may have is also 32 bits, there is no way our usage of
> any computed mult factor would ever overflow 64 bits.
> 
> The choice of 60 seconds here for minsec is rather arbitrary.  If the
> mult factor wasn't already limited to 32 bits, this value of 60 would
> create overflows for clocks which take more than 60 seconds to wrap.
> 
> And for clocks which take less than 60 seconds to wrap then we do lose
> precision as the mult factor is made smaller (fewer bits) to be adjusted
> to a range which is larger than what we actually use.  This currently
> affects clocks faster than 71MHz.
> 
> We could use minsec = 0 instead of 60 and a good 32-bit mult factor would
> be computed in all cases.  But let's be formal and provide the actual
> range to clocks_calc_mult_shift(). None of the cyc_to_fixed_sched_clock()
> users are affected as they all are using clocks < 71MHz.
> 
> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> 
> diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
> index 2cdcc92..f7a0f93 100644
> --- a/arch/arm/kernel/sched_clock.c
> +++ b/arch/arm/kernel/sched_clock.c
> @@ -30,11 +30,13 @@ void __init init_sched_clock(struct clock_data *cd, void (*update)(void),
>  	unsigned long r, w;
>  	u64 res, wrap;
>  	char r_unit;
> +	u32 mask = (1ULL << clock_bits) - 1;
>  
>  	sched_clock_update_fn = update;
>  
>  	/* calculate the mult/shift to convert counter ticks to ns. */
> -	clocks_calc_mult_shift(&cd->mult, &cd->shift, rate, NSEC_PER_SEC, 60);
> +	clocks_calc_mult_shift(&cd->mult, &cd->shift, rate, NSEC_PER_SEC,
> +			       mask/rate);
>  
>  	r = rate;
>  	if (r >= 4000000) {
> @@ -46,7 +48,7 @@ void __init init_sched_clock(struct clock_data *cd, void (*update)(void),
>  	}
>  
>  	/* calculate how many ns until we wrap */
> -	wrap = cyc_to_ns((1ULL << clock_bits) - 1, cd->mult, cd->shift);
> +	wrap = cyc_to_ns(mask, cd->mult, cd->shift);
>  	do_div(wrap, NSEC_PER_MSEC);
>  	w = wrap;
>  
> @@ -54,6 +56,7 @@ void __init init_sched_clock(struct clock_data *cd, void (*update)(void),
>  	res = cyc_to_ns(1ULL, cd->mult, cd->shift);
>  	pr_info("sched_clock: %u bits at %lu%cHz, resolution %lluns, wraps every %lums\n",
>  		clock_bits, r, r_unit, res, w);
> +	pr_debug("sched_clock: mult=0x%08x shift=%u)\n", cd->mult, cd->shift);
>  
>  	/*
>  	 * Start the timer to keep sched_clock() properly updated and
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-03 19:47       ` john stultz
@ 2011-01-09 10:52         ` Russell King - ARM Linux
  2011-01-10  3:55           ` Nicolas Pitre
  0 siblings, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux @ 2011-01-09 10:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 03, 2011 at 11:47:29AM -0800, john stultz wrote:
> Now, for sched_clock, there are a different set of expectations with
> regards to accuracy and expected idle times, and we'll probably need a
> similar consolidation effort to make sure the mult/shift calculations
> are correct and the resulting limits are taken into account by the
> scheduler when going into NOHZ mode.

However, it's exactly the same concerns wrt idle time.  If you want
a 100% accurate sched_clock() and you're using the same counter
register for both sched_clock() and clocksource, then you might as
well have a 100% accurate clocksource too (it's essentially the same
conversion with the same upper bound.)

With a 32-bit counter at 200MHz, theoretically you have a wrap time of
slightly less than 21.5s, but with a 5ns accuracy (actually 5ns).

The existing sched_clock() code comes out with:

sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956ms
Versatile: shift = 26 mult = 2796202667
sched_clock: 32 bits at 3686kHz, resolution 271ns, wraps every 1165084ms
SA11x0: shift = 23 mult = 2275555556
sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 4294967ms
Tegra: shift = 22 mult = 4194304000
sched_clock: 32 bits at 32kHz, resolution 30517ns, wraps every 131071999ms
OMAP: shift = 17 mult = 4000000000
sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 21474ms
Orion: shift = 27 mult = 671088640

Reducing down the minsec from 60 to 5 gives:

sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956ms
Versatile: shift = 26 mult = 2796202667
sched_clock: 32 bits at 3686kHz, resolution 271ns, wraps every 1165084ms
SA11x0: shift = 23 mult = 2275555556
sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 4294967ms
Tegra: shift = 22 mult = 4194304000
sched_clock: 32 bits at 32kHz, resolution 30517ns, wraps every 131071999ms
OMAP: shift = 17 mult = 4000000000
sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 21474ms
Orion: shift = 29 mult = 2684354560

Note that the resolution and wrap periods are calculated using the chosen
constants.  The constants for "Orion" do change, but it produces no visible
effect on the outcome - we still achieve the same resolution and the same
wrap period.  Let's just check that with bc:

1 * 671088640 / 2^27
5.00000000000000000000
1 * 2684354560 / 2^29
5.00000000000000000000

Let's look at 183MHz, which is a value I've randomly picked to be obscure:

minsec=60
sched_clock: 32 bits at 183MHz, resolution 5ns, wraps every 23469ms
Orion: shift = 27 mult = 733430208
minsec=5
sched_clock: 32 bits at 183MHz, resolution 5ns, wraps every 23469ms
Orion: shift = 29 mult = 2933720831

1 * 733430208 / 2^27
5.46448087692260742187
1 * 2933720831 / 2^29
5.46448087505996227264

The difference between is 1.00000000034086406226 - so about 34 parts
per trillion. (34 * 10^-12)

Now, a Caesium fountain frequency standard may have an accuracy of
approx. 1 part in 10^-14.  Rubidium frequency standards are around
1 part in 10^-12.

A standard crystal oscillator is around 1 part in 10^-6 to 10^-7.  If
you really care about accuracy, you might use an ovened crystal
oscillator (OXCO) which'll get you to around 1 part in 10^-7..10^-9,
still well short of the calculation inaccuracy.  You wouldn't use an
OXCO in a battery operated device though due to power consumption.

We're generally don't have a Caesium or Rubidium frequency standard, not
even a OXCO providing the clock source for the counter, so the accuracy
of the counters clock is much more significant than the conversion
factors by a factor of about one million.

What I'm saying is that there becomes a time where it really doesn't
matter if the conversion isn't accurate, provided it's accurate enough,
and it would appear to be accurate enough.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-09 10:52         ` Russell King - ARM Linux
@ 2011-01-10  3:55           ` Nicolas Pitre
  2011-01-10 10:51             ` Russell King - ARM Linux
  0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2011-01-10  3:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 9 Jan 2011, Russell King - ARM Linux wrote:

> Now, a Caesium fountain frequency standard may have an accuracy of
> approx. 1 part in 10^-14.  Rubidium frequency standards are around
> 1 part in 10^-12.
> 
> A standard crystal oscillator is around 1 part in 10^-6 to 10^-7.  If
> you really care about accuracy, you might use an ovened crystal
> oscillator (OXCO) which'll get you to around 1 part in 10^-7..10^-9,
> still well short of the calculation inaccuracy.  You wouldn't use an
> OXCO in a battery operated device though due to power consumption.
> 
> We're generally don't have a Caesium or Rubidium frequency standard, not
> even a OXCO providing the clock source for the counter, so the accuracy
> of the counters clock is much more significant than the conversion
> factors by a factor of about one million.
> 
> What I'm saying is that there becomes a time where it really doesn't
> matter if the conversion isn't accurate, provided it's accurate enough,
> and it would appear to be accurate enough.

It sure is, thanks for the demonstration.

However this begs the question about the actual meaning of the value for 
the minsec argument to clocks_calc_mult_shift() (which IMHO should be 
renamed to maxsec instead).  In the ARM sched_clock code the value of 60 
is totally arbitrary and may happen to be good enough, but a value of 0 
would also be totally arbitrary and also work fine.  But at least a 0 
value wouldn't imply any false meaning.  And in the case of the 
sched_clock support code, we know the value we need: 90% 
of the actual hardware clock period, so using that would at least make 
the code self consistent even if in practice this doesn't change the 
final results.

Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-10  3:55           ` Nicolas Pitre
@ 2011-01-10 10:51             ` Russell King - ARM Linux
  2011-01-10 13:53               ` Nicolas Pitre
  0 siblings, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux @ 2011-01-10 10:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Jan 09, 2011 at 10:55:04PM -0500, Nicolas Pitre wrote:
> However this begs the question about the actual meaning of the value for 
> the minsec argument to clocks_calc_mult_shift() (which IMHO should be 
> renamed to maxsec instead).  In the ARM sched_clock code the value of 60 
> is totally arbitrary and may happen to be good enough, but a value of 0 
> would also be totally arbitrary and also work fine.  But at least a 0 
> value wouldn't imply any false meaning.  And in the case of the 
> sched_clock support code, we know the value we need: 90% 
> of the actual hardware clock period, so using that would at least make 
> the code self consistent even if in practice this doesn't change the 
> final results.

Actually, minsec is utterly wrong.

minsec is there to clamp the conversion from the N-bit cyclecounter to
a 64-bit nanosecond value to ensure that there isn't a 64-bit overflow
within the 'minsec' period.

With a 32-bit or smaller cyclecounter, as 32-bit x 32-bit can never
overflow a 64-bit destination, so if anything zero should be passed in
this case.

If larger than 32-bit, then a value may be needed to clamp it.  However,
	wrap = (1 << bits) / freq
	ns = (mult * cnt) >> shift
	mult = (NSEC_PER_SEC (a 30-bit number) << 32) / freq (32-bit max)

A 33-bit counter would need a 32-bit multiplier to wrap-around within
the 64-bit maths.  The frequency which produces a 32-bit multiplier is
2GHz, which gives a wrap period of 4.29s.  Above this frequency, the
64-bit math can't overflow as the multiplier becomes smaller.  Below
this frequency, counter wrap periods get longer and the multiplier
becomes larger up to 32 bits - and this is where the 64-bit math problem
starts.

A 33-bit counter with a 1.8GHz clock gives a multiplier of 2386092942
(0x8E38E38E).  Such a multiplier wraps 64-bit maths@7730941133 and
it takes the counter 4.29s to get there.

1 << bits / freq gives a counter wrap period of 4.77s, which is
over-estimating the 64-bit math wrap.

I question whether using 1 << bits / freq is valid for minsec - does
there exist a frequency where an integer minsec is larger than required.
Luckily the maths is safe as it'll produce a smaller mult.

However, I question whether using 1 << bits / freq is any arbitary than
a 60 or 0 value - it's certainly mathematically the wrong wrap period.

So, in summary I'd suggest using a value of 0 for sched_clock() if we're
going to change it - we don't accept more than 32-bits from the counter
at present, so the whole minsec thing really isn't needed to prevent
wrap.  If we ever allow more than 32-bits then yes it will.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-10 10:51             ` Russell King - ARM Linux
@ 2011-01-10 13:53               ` Nicolas Pitre
  2011-01-11 16:54                 ` Russell King - ARM Linux
  0 siblings, 1 reply; 13+ messages in thread
From: Nicolas Pitre @ 2011-01-10 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 10 Jan 2011, Russell King - ARM Linux wrote:

> With a 32-bit or smaller cyclecounter, as 32-bit x 32-bit can never
> overflow a 64-bit destination, so if anything zero should be passed in
> this case.

That's the alternative suggestion I made as well.


Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks
  2011-01-10 13:53               ` Nicolas Pitre
@ 2011-01-11 16:54                 ` Russell King - ARM Linux
  0 siblings, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux @ 2011-01-11 16:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 10, 2011 at 08:53:14AM -0500, Nicolas Pitre wrote:
> On Mon, 10 Jan 2011, Russell King - ARM Linux wrote:
> 
> > With a 32-bit or smaller cyclecounter, as 32-bit x 32-bit can never
> > overflow a 64-bit destination, so if anything zero should be passed in
> > this case.
> 
> That's the alternative suggestion I made as well.

I've just committed a patch which makes it zero.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-01-11 16:54 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-21  1:21 [PATCH] ARM: sched_clock: improve mult/shift accuracy with high frequency clocks Nicolas Pitre
2010-12-21 10:51 ` Russell King - ARM Linux
2010-12-30  1:31   ` Nicolas Pitre
2011-01-03  0:37     ` Russell King - ARM Linux
2011-01-03  1:21       ` Nicolas Pitre
2011-01-03 19:56         ` john stultz
2011-01-03 19:47       ` john stultz
2011-01-09 10:52         ` Russell King - ARM Linux
2011-01-10  3:55           ` Nicolas Pitre
2011-01-10 10:51             ` Russell King - ARM Linux
2011-01-10 13:53               ` Nicolas Pitre
2011-01-11 16:54                 ` Russell King - ARM Linux
2011-01-09  3:21 ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).