public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance
@ 2009-05-08  0:30 john stultz
  2009-05-08 12:15 ` Ingo Molnar
  2009-05-08 12:19 ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance George Spelvin
  0 siblings, 2 replies; 8+ messages in thread
From: john stultz @ 2009-05-08  0:30 UTC (permalink / raw)
  To: George Spelvin
  Cc: ulrich.windl, linux-kernel, tglx, Clark Williams, zippel,
	Ingo Molnar

All, 

Despite recent tweaking, TSC calibration variance is still biting users
who care about keeping close sync with NTP servers over reboots.

Here's a recent example:
http://lkml.indiana.edu/hypermail/linux/kernel/0905.0/02061.html

The problem is, each reboot, we have to calibrate the TSC, and any
error, regardless of how small, in the calibrated freq has to be
corrected for by NTP. Assuming the error is within 500ppm NTP can
correct this, but until it finds the proper correction value for the new
TSC freq, users may see time offsets from the NTP server.

In my experience, its fairly easy to see 100khz variance from reboot to
reboot with 2.6.30-rc.

While I think its worth trying to improve the calibration further, there
will likely be a trade-off between very accurate calibration and fast
boot times.

To mitigate this, I wanted to provide a tsc_khz= boot option. This would
allow users to set the tsc_khz value at boot-up, assuming they are
within 1Mhz of the calibrated value (to protect against bad values).
Once the tsc_khz value is set in grub, the box will always boot with the
same value, so the NTP drift value prior to reboot will still be correct
after rebooting.

Thanks to George Spelvin for the idea:
	http://lkml.indiana.edu/hypermail/linux/kernel/0905.0/02807.html


Thoughts or feedback?

Signed-off-by: John Stultz <johnstul@us.ibm.com>

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 90b3924..1f9ee19 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2396,6 +2396,13 @@ and is between 256 and 4096 characters. It is defined in the file
 			Used to enable high-resolution timer mode on older
 			hardware, and in virtualized environment.
 
+	tsc_khz=	[x86] Set the TSC freq value.
+			Format: <khz>
+			Used to avoid TSC calibration error.
+			The specified tsc_khz value must be accurate to the
+			calibrated tsc value by 1MHz. Otherwise the calibrated
+			value will be used.
+
 	turbografx.map[2|3]=	[HW,JOY]
 			TurboGraFX parallel port interface
 			Format:
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index d57de05..93fc4c0 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -825,6 +825,16 @@ static void __init init_tsc_clocksource(void)
 	clocksource_register(&clocksource_tsc);
 }
 
+unsigned long tsc_khz_specified;
+static int __init tsc_khz_specified_setup(char *str)
+{
+	tsc_khz_specified = simple_strtoul(str, NULL, 0);
+	return 1;
+}
+
+__setup("tsc_khz=", tsc_khz_specified_setup);
+
+
 void __init tsc_init(void)
 {
 	u64 lpj;
@@ -834,6 +844,20 @@ void __init tsc_init(void)
 		return;
 
 	tsc_khz = calibrate_tsc();
+
+	/*
+	 * If the calibrated TSC freq and user specified
+	 * TSC freq are close enough, pick the what the
+	 * user told us.
+	 */
+	if (tsc_khz_specified && ((tsc_khz+(tsc_khz/2))/1000 ==
+			(tsc_khz_specified+(tsc_khz_specified/2))/1000)) {
+		printk(KERN_INFO "Using user defined TSC freq: %lu.%03lu MHz\n",
+			tsc_khz_specified/1000,
+			tsc_khz_specified%1000);
+		tsc_khz = tsc_khz_specified;
+	}
+
 	cpu_khz = tsc_khz;
 
 	if (!tsc_khz) {




^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance
  2009-05-08  0:30 [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance john stultz
@ 2009-05-08 12:15 ` Ingo Molnar
  2009-05-08 15:12   ` George Spelvin
  2009-05-08 19:06   ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibrationvariance john stultz
  2009-05-08 12:19 ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance George Spelvin
  1 sibling, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2009-05-08 12:15 UTC (permalink / raw)
  To: john stultz
  Cc: George Spelvin, ulrich.windl, linux-kernel, tglx, Clark Williams,
	zippel, Linus Torvalds


* john stultz <johnstul@us.ibm.com> wrote:

> All, 
> 
> Despite recent tweaking, TSC calibration variance is still biting 
> users who care about keeping close sync with NTP servers over 
> reboots.
> 
> Here's a recent example: 
> http://lkml.indiana.edu/hypermail/linux/kernel/0905.0/02061.html
> 
> The problem is, each reboot, we have to calibrate the TSC, and any 
> error, regardless of how small, in the calibrated freq has to be 
> corrected for by NTP. Assuming the error is within 500ppm NTP can 
> correct this, but until it finds the proper correction value for 
> the new TSC freq, users may see time offsets from the NTP server.
> 
> In my experience, its fairly easy to see 100khz variance from 
> reboot to reboot with 2.6.30-rc.
> 
> While I think its worth trying to improve the calibration further, 
> there will likely be a trade-off between very accurate calibration 
> and fast boot times.
> 
> To mitigate this, I wanted to provide a tsc_khz= boot option. This 
> would allow users to set the tsc_khz value at boot-up, assuming 
> they are within 1Mhz of the calibrated value (to protect against 
> bad values). Once the tsc_khz value is set in grub, the box will 
> always boot with the same value, so the NTP drift value prior to 
> reboot will still be correct after rebooting.
> 
> Thanks to George Spelvin for the idea:
> 	http://lkml.indiana.edu/hypermail/linux/kernel/0905.0/02807.html
> 
> Thoughts or feedback?
> 
> Signed-off-by: John Stultz <johnstul@us.ibm.com>

Wouldnt it be a lot more flexible to have a sysctl for this, which 
would be set before ntpd is started? (or which would be set by ntpd)

The mechanism and semantics would be similar: we would _not_ expose 
cpu_khz directly, we'd have a boot_cpu_khz value saved for sure, and 
we'd allow the sysctl to set the cpu_khz to within 1MHz of cpu_khz - 
and we'd re-scale the timer irq and other calibrated values 
accordingly.

Alternatively, a much simpler method: why doesnt ntpd save its own 
notion of cpu_khz once it has reached stability, and reads cpu_khz 
(from /proc/cpuinfo) during bootup and re-scales its initial offset 
and phase shift accordingly, compensating for that noise? (if it's 
within 1MHz)

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance
  2009-05-08  0:30 [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance john stultz
  2009-05-08 12:15 ` Ingo Molnar
@ 2009-05-08 12:19 ` George Spelvin
  2009-05-08 15:27   ` George Spelvin
  2009-05-08 19:02   ` john stultz
  1 sibling, 2 replies; 8+ messages in thread
From: George Spelvin @ 2009-05-08 12:19 UTC (permalink / raw)
  To: johnstul, linux; +Cc: linux-kernel, mingo, tglx, ulrich.windl, williams, zippel

> To mitigate this, I wanted to provide a tsc_khz= boot option. This would
> allow users to set the tsc_khz value at boot-up, assuming they are
> within 1Mhz of the calibrated value (to protect against bad values).
> Once the tsc_khz value is set in grub, the box will always boot with the
> same value, so the NTP drift value prior to reboot will still be correct
> after rebooting.

A run-time adjustable would be more convenient, but this is simple and works.

The 1 MHz tolerance, however, isn't implemented right.  I can't quote
figure out what you were trying to do; was that (x + (x/2))/1000 trying
to round the /1000 properly?  That should be (x + 1000/2)/1000 in that
case, which I normally write as (x/500 + 1)/2;

But in any case, no equality comparison can possibly work; there's
always a case where the measured value rounds to k and the correct
value rounds to k+1.  Also, 1 MHz is a pretty wide tolerance on
sub-GHz processors.  I'd suggest the following:

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index d57de05..d7ab640 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -825,15 +825,42 @@ static void __init init_tsc_clocksource(void)
 	clocksource_register(&clocksource_tsc);
 }
 
+unsigned long tsc_khz_specified;
+static int __init tsc_khz_specified_setup(char *str)
+{
+	tsc_khz_specified = simple_strtoul(str, NULL, 0);
+	return 1;
+}
+
+__setup("tsc_khz=", tsc_khz_specified_setup);
+
+
 void __init tsc_init(void)
 {
 	u64 lpj;
 	int cpu;
+	long difference;
 
 	if (!cpu_has_tsc)
 		return;
 
 	tsc_khz = calibrate_tsc();
+
+	/*
*	 * If the calibrated TSC freq and user specified TSC freq
*	 * are close enough, pick the what the user told us.  Reject
*	 * obviously bogus values to make the option safe to use.
+	 */
+	difference = tsc_khz - tsc_khz_specified;
+	if (difference < 0)
+		difference = -difference;
+	if (difference <= tsc_khz >> 10) {	/* 1/1024 = 976 ppm */
+		printk(KERN_INFO "Using user defined TSC freq: %lu.%03lu MHz\n",
+			tsc_khz_specified/1000,
+			tsc_khz_specified%1000);
+		tsc_khz = tsc_khz_specified;
+	}
+
 	cpu_khz = tsc_khz;
 
 	if (!tsc_khz) {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance
  2009-05-08 12:15 ` Ingo Molnar
@ 2009-05-08 15:12   ` George Spelvin
  2009-05-08 19:06   ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibrationvariance john stultz
  1 sibling, 0 replies; 8+ messages in thread
From: George Spelvin @ 2009-05-08 15:12 UTC (permalink / raw)
  To: johnstul, mingo
  Cc: linux-kernel, linux, tglx, torvalds, ulrich.windl, williams,
	zippel

> Wouldnt it be a lot more flexible to have a sysctl for this, which 
> would be set before ntpd is started? (or which would be set by ntpd)

I thought so, but a boot-time option is sufficient and simpler.

> The mechanism and semantics would be similar: we would _not_ expose 
> cpu_khz directly, we'd have a boot_cpu_khz value saved for sure, and 
> we'd allow the sysctl to set the cpu_khz to within 1MHz of cpu_khz - 
> and we'd re-scale the timer irq and other calibrated values 
> accordingly.

Yes, that would be nicer.

> Alternatively, a much simpler method: why doesnt ntpd save its own 
> notion of cpu_khz once it has reached stability, and reads cpu_khz 
> (from /proc/cpuinfo) during bootup and re-scales its initial offset 
> and phase shift accordingly, compensating for that noise? (if it's 
> within 1MHz)

The problem is, it's only an issue if the TSC is used as the system
time base, which is a kernel detail that a) NTP really doesn't want to
be bothered with, and b) can change at run time.

It's possible, but it feels like a kludge.  It's such a Linux-specific
detail that it seems cleaner to fix it at the source.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance
  2009-05-08 12:19 ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance George Spelvin
@ 2009-05-08 15:27   ` George Spelvin
  2009-05-08 19:02   ` john stultz
  1 sibling, 0 replies; 8+ messages in thread
From: George Spelvin @ 2009-05-08 15:27 UTC (permalink / raw)
  To: johnstul, linux; +Cc: linux-kernel, mingo, tglx, ulrich.windl, williams, zippel

> I'd suggest the following:

I should add the documentation patch that goes with it:
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 90b3924..6747881 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2391,10 +2391,20 @@ and is between 256 and 4096 characters. It is defined in the file
 
 	tsc=		Disable clocksource-must-verify flag for TSC.
 			Format: <string>
-			[x86] reliable: mark tsc clocksource as reliable, this
-			disables clocksource verification at runtime.
-			Used to enable high-resolution timer mode on older
-			hardware, and in virtualized environment.
+			[x86] if value is "reliable", mark tsc clocksource
+			as reliable.  This disables clocksource verification
+			at runtime.  Used to enable high-resolution timer mode
+			on older hardware, and in virtualized environment.
+
+	tsc_khz=	[x86] Set the TSC freq value.
+			Format: <khz>
+			Used to avoid TSC calibration error.  The TSC frequency
+			is re-measured each boot, and changes by a few parts
+			per hundred thousand each time.  This is noticeable to
+			time synchronization applications like NTP.
+			This option makes it consistent across boots.
+			If the specified value differs from the measured one
+			by more than 0.1%, it is ignored.
 
 	turbografx.map[2|3]=	[HW,JOY]
 			TurboGraFX parallel port interface

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration  variance
  2009-05-08 12:19 ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance George Spelvin
  2009-05-08 15:27   ` George Spelvin
@ 2009-05-08 19:02   ` john stultz
  2009-05-11  6:54     ` George Spelvin
  1 sibling, 1 reply; 8+ messages in thread
From: john stultz @ 2009-05-08 19:02 UTC (permalink / raw)
  To: George Spelvin; +Cc: linux-kernel, mingo, tglx, ulrich.windl, williams, zippel

On Fri, 2009-05-08 at 08:19 -0400, George Spelvin wrote:
> > To mitigate this, I wanted to provide a tsc_khz= boot option. This would
> > allow users to set the tsc_khz value at boot-up, assuming they are
> > within 1Mhz of the calibrated value (to protect against bad values).
> > Once the tsc_khz value is set in grub, the box will always boot with the
> > same value, so the NTP drift value prior to reboot will still be correct
> > after rebooting.
> 
> A run-time adjustable would be more convenient, but this is simple and works.
> 
> The 1 MHz tolerance, however, isn't implemented right.  I can't quote
> figure out what you were trying to do; was that (x + (x/2))/1000 trying
> to round the /1000 properly?  That should be (x + 1000/2)/1000 in that
> case, which I normally write as (x/500 + 1)/2;
> 
> But in any case, no equality comparison can possibly work; there's
> always a case where the measured value rounds to k and the correct
> value rounds to k+1.  Also, 1 MHz is a pretty wide tolerance on
> sub-GHz processors.  I'd suggest the following:

Ah. Indeed you're right. Thanks for catching this.
Your version is much better.

-john

> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index d57de05..d7ab640 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -825,15 +825,42 @@ static void __init init_tsc_clocksource(void)
>  	clocksource_register(&clocksource_tsc);
>  }
> 
> +unsigned long tsc_khz_specified;
> +static int __init tsc_khz_specified_setup(char *str)
> +{
> +	tsc_khz_specified = simple_strtoul(str, NULL, 0);
> +	return 1;
> +}
> +
> +__setup("tsc_khz=", tsc_khz_specified_setup);
> +
> +
>  void __init tsc_init(void)
>  {
>  	u64 lpj;
>  	int cpu;
> +	long difference;
> 
>  	if (!cpu_has_tsc)
>  		return;
> 
>  	tsc_khz = calibrate_tsc();
> +
> +	/*
> *	 * If the calibrated TSC freq and user specified TSC freq
> *	 * are close enough, pick the what the user told us.  Reject
> *	 * obviously bogus values to make the option safe to use.
> +	 */
> +	difference = tsc_khz - tsc_khz_specified;
> +	if (difference < 0)
> +		difference = -difference;
> +	if (difference <= tsc_khz >> 10) {	/* 1/1024 = 976 ppm */
> +		printk(KERN_INFO "Using user defined TSC freq: %lu.%03lu MHz\n",
> +			tsc_khz_specified/1000,
> +			tsc_khz_specified%1000);
> +		tsc_khz = tsc_khz_specified;
> +	}
> +
>  	cpu_khz = tsc_khz;
> 
>  	if (!tsc_khz) {


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] tsc_khz= boot option to avoid TSC  calibrationvariance
  2009-05-08 12:15 ` Ingo Molnar
  2009-05-08 15:12   ` George Spelvin
@ 2009-05-08 19:06   ` john stultz
  1 sibling, 0 replies; 8+ messages in thread
From: john stultz @ 2009-05-08 19:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: George Spelvin, ulrich.windl, linux-kernel, tglx, Clark Williams,
	zippel, Linus Torvalds

On Fri, 2009-05-08 at 14:15 +0200, Ingo Molnar wrote:
> * john stultz <johnstul@us.ibm.com> wrote:
> 
> > All, 
> > 
> > Despite recent tweaking, TSC calibration variance is still biting 
> > users who care about keeping close sync with NTP servers over 
> > reboots.
> > 
> > Here's a recent example: 
> > http://lkml.indiana.edu/hypermail/linux/kernel/0905.0/02061.html
> > 
> > The problem is, each reboot, we have to calibrate the TSC, and any 
> > error, regardless of how small, in the calibrated freq has to be 
> > corrected for by NTP. Assuming the error is within 500ppm NTP can 
> > correct this, but until it finds the proper correction value for 
> > the new TSC freq, users may see time offsets from the NTP server.
> > 
> > In my experience, its fairly easy to see 100khz variance from 
> > reboot to reboot with 2.6.30-rc.
> > 
> > While I think its worth trying to improve the calibration further, 
> > there will likely be a trade-off between very accurate calibration 
> > and fast boot times.
> > 
> > To mitigate this, I wanted to provide a tsc_khz= boot option. This 
> > would allow users to set the tsc_khz value at boot-up, assuming 
> > they are within 1Mhz of the calibrated value (to protect against 
> > bad values). Once the tsc_khz value is set in grub, the box will 
> > always boot with the same value, so the NTP drift value prior to 
> > reboot will still be correct after rebooting.
> > 
> > Thanks to George Spelvin for the idea:
> > 	http://lkml.indiana.edu/hypermail/linux/kernel/0905.0/02807.html
> > 
> > Thoughts or feedback?
> > 
> > Signed-off-by: John Stultz <johnstul@us.ibm.com>
> 
> Wouldnt it be a lot more flexible to have a sysctl for this, which 
> would be set before ntpd is started? (or which would be set by ntpd)

The difficulty is that once we've initialized the TSC clocksource and
its in use, its difficult to just override the existing freq. This is
partially why we disqualify the TSC if it changes freq from power
management.

> The mechanism and semantics would be similar: we would _not_ expose 
> cpu_khz directly, we'd have a boot_cpu_khz value saved for sure, and 
> we'd allow the sysctl to set the cpu_khz to within 1MHz of cpu_khz - 
> and we'd re-scale the timer irq and other calibrated values 
> accordingly.
> 
> Alternatively, a much simpler method: why doesnt ntpd save its own 
> notion of cpu_khz once it has reached stability, and reads cpu_khz 
> (from /proc/cpuinfo) during bootup and re-scales its initial offset 
> and phase shift accordingly, compensating for that noise? (if it's 
> within 1MHz)

Yea, I'm with George on this. It would be a very linux-specific cludge
(really even further, linux-x86 specific). Further it would require NTP
to figure out which clocksource is being used (as ACPI PM and HPET don't
have this calibration error) before applying. 

Seems poor to push the problem to NTP when the in-kernel calibration
code is really the cause of the issue.

thanks
-john



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance
  2009-05-08 19:02   ` john stultz
@ 2009-05-11  6:54     ` George Spelvin
  0 siblings, 0 replies; 8+ messages in thread
From: George Spelvin @ 2009-05-11  6:54 UTC (permalink / raw)
  To: johnstul, linux; +Cc: linux-kernel, mingo, tglx, ulrich.windl, williams, zippel

Just as a data point, I have this patch running on two PPS-synced
NTP servers with apparent success.  After thinking through the sign
conventions carefully, (if NTP reports a that it is applying a positive
clock frequency adjustment, the TSC frequency should be reduced by that
amount), I now have the NTP frequency offset within a couple PPM of 0.

Thanks for the patch!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-05-11  6:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-08  0:30 [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance john stultz
2009-05-08 12:15 ` Ingo Molnar
2009-05-08 15:12   ` George Spelvin
2009-05-08 19:06   ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibrationvariance john stultz
2009-05-08 12:19 ` [RFC][PATCH] tsc_khz= boot option to avoid TSC calibration variance George Spelvin
2009-05-08 15:27   ` George Spelvin
2009-05-08 19:02   ` john stultz
2009-05-11  6:54     ` George Spelvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox