* [PATCH] i386: fix TSC clock source calibration error
@ 2007-10-16 14:50 Dave Johnson
2007-10-16 15:11 ` Chuck Ebbert
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Dave Johnson @ 2007-10-16 14:50 UTC (permalink / raw)
To: linux-kernel
From: Dave Johnson <djohnson@sw.starentnetworks.com>
I ran into this problem on a system that was unable to obtain NTP sync
because the clock was running very slow (over 10000ppm slow). ntpd had
declared all of its peers 'reject' with 'peer_dist' reason.
On investigation, the tsc_khz variable was significantly incorrect
causing xtime to run slow. After a reboot tsc_khz was correct so I
did a reboot test to see how often the problem occurred:
Test was done on a 2000 Mhz Xeon system. Of 689 reboots, 8 of them
had unacceptable tsc_khz values (>500ppm):
range of tsc_khz # of boots % of boots
----------------- ---------- ----------
< 1999750 0 0.000%
1999750 - 1999800 21 3.048%
1999800 - 1999850 166 24.128%
1999850 - 1999900 241 35.029%
1999900 - 1999950 211 30.669%
1999950 - 2000000 42 6.105%
2000000 - 2000000 0 0.000%
2000050 - 2000100 0 0.000%
[...]
2000100 - 2015000 1 0.145% << BAD
2015000 - 2030000 6 0.872% << BAD
2030000 - 2045000 1 0.145% << BAD
2045000 < 0 0.000%
The worst boot was 2032.577 Mhz, over 1.5% off!
It appears that on rare occasions, mach_countup() is taking longer to
complete than necessary.
I suspect that this is caused by the CPU taking a periodic SMI
interrupt right at the end of the 30ms calibration loop. This would
cause the loop to delay while the SMI BIOS hander runs. The resulting
TSC value is beyond what it actually should be resulting in a higher
tsc_khz.
The below patch makes native_calculate_cpu_khz() take the best
(shortest duration, lowest khz) run of it's 3 calibration loops. If a
SMI goes off causing a bad result (long duration, higher khz) it will
be discarded.
With the patch applied, 300 boots of the same system produce good
results:
range of tsc_khz # of boots % of boots
----------------- ---------- ----------
< 1999750 0 0.000%
1999750 - 1999800 30 10.000%
1999800 - 1999850 166 55.333%
1999850 - 1999900 89 29.667%
1999900 - 1999950 15 5.000%
1999950 < 0 0.000%
Problem was found and tested against 2.6.18. Patch is against 2.6.22.
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
===== arch/i386/kernel/tsc.c 1.27 vs edited =====
--- 1.27/arch/i386/kernel/tsc.c 2007-05-02 13:27:18 -04:00
+++ edited/arch/i386/kernel/tsc.c 2007-10-15 16:31:04 -04:00
@@ -122,7 +122,7 @@
{
unsigned long long start, end;
unsigned long count;
- u64 delta64;
+ u64 delta64 = (u64)ULLONG_MAX;
int i;
unsigned long flags;
@@ -134,6 +134,7 @@
rdtscll(start);
mach_countup(&count);
rdtscll(end);
+ delta64 = min(delta64, (end - start));
}
/*
* Error: ECTCNEVERSET
@@ -143,8 +144,6 @@
*/
if (count <= 1)
goto err;
-
- delta64 = end - start;
/* cpu freq too fast: */
if (delta64 > (1ULL<<32))
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] i386: fix TSC clock source calibration error
2007-10-16 14:50 [PATCH] i386: fix TSC clock source calibration error Dave Johnson
@ 2007-10-16 15:11 ` Chuck Ebbert
2007-10-18 8:57 ` Ingo Molnar
2007-10-18 20:43 ` [PATCH] i386: fix TSC clock source calibration error Daniel Walker
2 siblings, 0 replies; 9+ messages in thread
From: Chuck Ebbert @ 2007-10-16 15:11 UTC (permalink / raw)
To: Dave Johnson; +Cc: linux-kernel, Thomas Gleixner
On 10/16/2007 10:50 AM, Dave Johnson wrote:
> From: Dave Johnson <djohnson@sw.starentnetworks.com>
>
> I ran into this problem on a system that was unable to obtain NTP sync
> because the clock was running very slow (over 10000ppm slow). ntpd had
> declared all of its peers 'reject' with 'peer_dist' reason.
>
> On investigation, the tsc_khz variable was significantly incorrect
> causing xtime to run slow. After a reboot tsc_khz was correct so I
> did a reboot test to see how often the problem occurred:
>
> Test was done on a 2000 Mhz Xeon system. Of 689 reboots, 8 of them
> had unacceptable tsc_khz values (>500ppm):
>
> range of tsc_khz # of boots % of boots
> ----------------- ---------- ----------
> < 1999750 0 0.000%
> 1999750 - 1999800 21 3.048%
> 1999800 - 1999850 166 24.128%
> 1999850 - 1999900 241 35.029%
> 1999900 - 1999950 211 30.669%
> 1999950 - 2000000 42 6.105%
> 2000000 - 2000000 0 0.000%
> 2000050 - 2000100 0 0.000%
> [...]
> 2000100 - 2015000 1 0.145% << BAD
> 2015000 - 2030000 6 0.872% << BAD
> 2030000 - 2045000 1 0.145% << BAD
> 2045000 < 0 0.000%
>
> The worst boot was 2032.577 Mhz, over 1.5% off!
>
> It appears that on rare occasions, mach_countup() is taking longer to
> complete than necessary.
>
> I suspect that this is caused by the CPU taking a periodic SMI
> interrupt right at the end of the 30ms calibration loop. This would
> cause the loop to delay while the SMI BIOS hander runs. The resulting
> TSC value is beyond what it actually should be resulting in a higher
> tsc_khz.
>
> The below patch makes native_calculate_cpu_khz() take the best
> (shortest duration, lowest khz) run of it's 3 calibration loops. If a
> SMI goes off causing a bad result (long duration, higher khz) it will
> be discarded.
>
> With the patch applied, 300 boots of the same system produce good
> results:
>
> range of tsc_khz # of boots % of boots
> ----------------- ---------- ----------
> < 1999750 0 0.000%
> 1999750 - 1999800 30 10.000%
> 1999800 - 1999850 166 55.333%
> 1999850 - 1999900 89 29.667%
> 1999900 - 1999950 15 5.000%
> 1999950 < 0 0.000%
>
> Problem was found and tested against 2.6.18. Patch is against 2.6.22.
>
> Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
>
> ===== arch/i386/kernel/tsc.c 1.27 vs edited =====
> --- 1.27/arch/i386/kernel/tsc.c 2007-05-02 13:27:18 -04:00
> +++ edited/arch/i386/kernel/tsc.c 2007-10-15 16:31:04 -04:00
> @@ -122,7 +122,7 @@
> {
> unsigned long long start, end;
> unsigned long count;
> - u64 delta64;
> + u64 delta64 = (u64)ULLONG_MAX;
> int i;
> unsigned long flags;
>
> @@ -134,6 +134,7 @@
> rdtscll(start);
> mach_countup(&count);
> rdtscll(end);
> + delta64 = min(delta64, (end - start));
> }
> /*
> * Error: ECTCNEVERSET
> @@ -143,8 +144,6 @@
> */
> if (count <= 1)
> goto err;
> -
> - delta64 = end - start;
>
> /* cpu freq too fast: */
> if (delta64 > (1ULL<<32))
>
(added cc:)
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] i386: fix TSC clock source calibration error
2007-10-16 14:50 [PATCH] i386: fix TSC clock source calibration error Dave Johnson
2007-10-16 15:11 ` Chuck Ebbert
@ 2007-10-18 8:57 ` Ingo Molnar
2007-10-19 17:16 ` [PATCH] i386: fix TSC clock source calibration error [part 2] Dave Johnson
2007-10-18 20:43 ` [PATCH] i386: fix TSC clock source calibration error Daniel Walker
2 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2007-10-18 8:57 UTC (permalink / raw)
To: Dave Johnson; +Cc: linux-kernel, Thomas Gleixner, Greg KH, Chris Wright
* Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> wrote:
> I ran into this problem on a system that was unable to obtain NTP sync
> because the clock was running very slow (over 10000ppm slow). ntpd had
> declared all of its peers 'reject' with 'peer_dist' reason.
>
> On investigation, the tsc_khz variable was significantly incorrect
> causing xtime to run slow. After a reboot tsc_khz was correct so I
> did a reboot test to see how often the problem occurred:
>
> Test was done on a 2000 Mhz Xeon system. Of 689 reboots, 8 of them
> had unacceptable tsc_khz values (>500ppm):
>
> range of tsc_khz # of boots % of boots
> ----------------- ---------- ----------
> < 1999750 0 0.000%
> 1999750 - 1999800 21 3.048%
> 1999800 - 1999850 166 24.128%
> 1999850 - 1999900 241 35.029%
> 1999900 - 1999950 211 30.669%
> 1999950 - 2000000 42 6.105%
> 2000000 - 2000000 0 0.000%
> 2000050 - 2000100 0 0.000%
> [...]
> 2000100 - 2015000 1 0.145% << BAD
> 2015000 - 2030000 6 0.872% << BAD
> 2030000 - 2045000 1 0.145% << BAD
> 2045000 < 0 0.000%
>
> The worst boot was 2032.577 Mhz, over 1.5% off!
you are plain crazy, 689 reboots! :-)
> It appears that on rare occasions, mach_countup() is taking longer to
> complete than necessary.
>
> I suspect that this is caused by the CPU taking a periodic SMI
> interrupt right at the end of the 30ms calibration loop. This would
> cause the loop to delay while the SMI BIOS hander runs. The resulting
> TSC value is beyond what it actually should be resulting in a higher
> tsc_khz.
>
> The below patch makes native_calculate_cpu_khz() take the best
> (shortest duration, lowest khz) run of it's 3 calibration loops. If a
> SMI goes off causing a bad result (long duration, higher khz) it will
> be discarded.
>
> With the patch applied, 300 boots of the same system produce good
> results:
>
> range of tsc_khz # of boots % of boots
> ----------------- ---------- ----------
> < 1999750 0 0.000%
> 1999750 - 1999800 30 10.000%
> 1999800 - 1999850 166 55.333%
> 1999850 - 1999900 89 29.667%
> 1999900 - 1999950 15 5.000%
> 1999950 < 0 0.000%
>
> Problem was found and tested against 2.6.18. Patch is against 2.6.22.
very cool problem description and debugging, and a very nice patch!
We've added your fix to the x86 tree, will go to Linus in the next batch
of fixes. This patch is a stable kernel candidate as well.
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] i386: fix TSC clock source calibration error [part 2]
2007-10-18 8:57 ` Ingo Molnar
@ 2007-10-19 17:16 ` Dave Johnson
2007-10-19 18:01 ` Hiroshi Shimamoto
2007-10-22 11:42 ` Ingo Molnar
0 siblings, 2 replies; 9+ messages in thread
From: Dave Johnson @ 2007-10-19 17:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, Greg KH, Chris Wright
From: Dave Johnson <djohnson@sw.starentnetworks.com>
The previous patch wasn't correctly handling the 'count' variable. If
a CPU gave bad results on the 1st or 2nd run but good results on the
3rd, it wouldn't do the correct thing. No idea if any such CPU
exists, but the patch below handles that case by discarding the bad
runs.
If a bad result (too quick, or too slow) occurs on any of the 3 runs
it will be discarded.
Also updated some comments to explain what's going on.
Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
---
Should be applied after my previous patch.
===== arch/i386/kernel/tsc.c 1.28 vs edited =====
--- 1.28/arch/i386/kernel/tsc.c 2007-10-19 11:07:36 -04:00
+++ edited/arch/i386/kernel/tsc.c 2007-10-19 11:07:44 -04:00
@@ -128,29 +128,35 @@
local_irq_save(flags);
- /* run 3 times to ensure the cache is warm */
+ /* run 3 times to ensure the cache is warm and to get an accurate reading */
for (i = 0; i < 3; i++) {
mach_prepare_counter();
rdtscll(start);
mach_countup(&count);
rdtscll(end);
+
+ /*
+ * Error: ECTCNEVERSET
+ * The CTC wasn't reliable: we got a hit on the very first read,
+ * or the CPU was so fast/slow that the quotient wouldn't fit in
+ * 32 bits..
+ */
+ if (count <= 1)
+ continue;
+
+ /* cpu freq too slow: */
+ if ((end - start) <= CALIBRATE_TIME_MSEC)
+ continue;
+
+ /*
+ * We want the minimum time of all runs in case one of them
+ * is inaccurate due to SMI or other delay
+ */
delta64 = min(delta64, (end - start));
}
- /*
- * Error: ECTCNEVERSET
- * The CTC wasn't reliable: we got a hit on the very first read,
- * or the CPU was so fast/slow that the quotient wouldn't fit in
- * 32 bits..
- */
- if (count <= 1)
- goto err;
- /* cpu freq too fast: */
+ /* cpu freq too fast (or every run was bad): */
if (delta64 > (1ULL<<32))
- goto err;
-
- /* cpu freq too slow: */
- if (delta64 <= CALIBRATE_TIME_MSEC)
goto err;
delta64 += CALIBRATE_TIME_MSEC/2; /* round for do_div */
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] i386: fix TSC clock source calibration error [part 2]
2007-10-19 17:16 ` [PATCH] i386: fix TSC clock source calibration error [part 2] Dave Johnson
@ 2007-10-19 18:01 ` Hiroshi Shimamoto
2007-10-19 18:34 ` Dave Johnson
2007-10-22 11:42 ` Ingo Molnar
1 sibling, 1 reply; 9+ messages in thread
From: Hiroshi Shimamoto @ 2007-10-19 18:01 UTC (permalink / raw)
To: Dave Johnson
Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Greg KH, Chris Wright
Dave Johnson wrote:
> From: Dave Johnson <djohnson@sw.starentnetworks.com>
>
> The previous patch wasn't correctly handling the 'count' variable. If
> a CPU gave bad results on the 1st or 2nd run but good results on the
> 3rd, it wouldn't do the correct thing. No idea if any such CPU
> exists, but the patch below handles that case by discarding the bad
> runs.
>
> If a bad result (too quick, or too slow) occurs on any of the 3 runs
> it will be discarded.
>
> Also updated some comments to explain what's going on.
>
> Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
>
> ---
>
> Should be applied after my previous patch.
>
> ===== arch/i386/kernel/tsc.c 1.28 vs edited =====
> --- 1.28/arch/i386/kernel/tsc.c 2007-10-19 11:07:36 -04:00
> +++ edited/arch/i386/kernel/tsc.c 2007-10-19 11:07:44 -04:00
> @@ -128,29 +128,35 @@
>
> local_irq_save(flags);
>
> - /* run 3 times to ensure the cache is warm */
> + /* run 3 times to ensure the cache is warm and to get an accurate reading */
> for (i = 0; i < 3; i++) {
> mach_prepare_counter();
It's a really rare case, but if SMI interrupt takes CPU here, just after
prepare and before rdtscll, it makes delta64 shorter than expected one.
Is it possible? And how about moving rdtscll before mach_prepare_counter()?
> rdtscll(start);
Thanks
Hiroshi Shimamoto
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] i386: fix TSC clock source calibration error [part 2]
2007-10-19 18:01 ` Hiroshi Shimamoto
@ 2007-10-19 18:34 ` Dave Johnson
0 siblings, 0 replies; 9+ messages in thread
From: Dave Johnson @ 2007-10-19 18:34 UTC (permalink / raw)
To: Hiroshi Shimamoto
Cc: Ingo Molnar, linux-kernel, Thomas Gleixner, Greg KH, Chris Wright
Hiroshi Shimamoto writes:
> Dave Johnson wrote:
> > mach_prepare_counter();
>
> It's a really rare case, but if SMI interrupt takes CPU here, just after
> prepare and before rdtscll, it makes delta64 shorter than expected one.
> Is it possible? And how about moving rdtscll before mach_prepare_counter()?
>
> > rdtscll(start);
Yep, rare indeed (about 1 instruction). Moving the start read before
the prepare call should solve that one too providing the setup doesn't
take any real measurable time.
--
Dave Johnson
Starent Networks
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] i386: fix TSC clock source calibration error [part 2]
2007-10-19 17:16 ` [PATCH] i386: fix TSC clock source calibration error [part 2] Dave Johnson
2007-10-19 18:01 ` Hiroshi Shimamoto
@ 2007-10-22 11:42 ` Ingo Molnar
1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2007-10-22 11:42 UTC (permalink / raw)
To: Dave Johnson; +Cc: linux-kernel, Thomas Gleixner, Greg KH, Chris Wright
* Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> wrote:
> The previous patch wasn't correctly handling the 'count' variable. If
> a CPU gave bad results on the 1st or 2nd run but good results on the
> 3rd, it wouldn't do the correct thing. No idea if any such CPU
> exists, but the patch below handles that case by discarding the bad
> runs.
>
> If a bad result (too quick, or too slow) occurs on any of the 3 runs
> it will be discarded.
>
> Also updated some comments to explain what's going on.
thanks - applied in addition to your previous patch.
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] i386: fix TSC clock source calibration error
2007-10-16 14:50 [PATCH] i386: fix TSC clock source calibration error Dave Johnson
2007-10-16 15:11 ` Chuck Ebbert
2007-10-18 8:57 ` Ingo Molnar
@ 2007-10-18 20:43 ` Daniel Walker
2007-10-19 17:31 ` Dave Johnson
2 siblings, 1 reply; 9+ messages in thread
From: Daniel Walker @ 2007-10-18 20:43 UTC (permalink / raw)
To: Dave Johnson; +Cc: linux-kernel
On Tue, 2007-10-16 at 10:50 -0400, Dave Johnson wrote:
>
> range of tsc_khz # of boots % of boots
> ----------------- ---------- ----------
> < 1999750 0 0.000%
> 1999750 - 1999800 21 3.048%
> 1999800 - 1999850 166 24.128%
> 1999850 - 1999900 241 35.029%
> 1999900 - 1999950 211 30.669%
> 1999950 - 2000000 42 6.105%
> 2000000 - 2000000 0 0.000%
> 2000050 - 2000100 0 0.000%
> [...]
> 2000100 - 2015000 1 0.145% << BAD
> 2015000 - 2030000 6 0.872% << BAD
> 2030000 - 2045000 1 0.145% << BAD
> 2045000 < 0 0.000%
>
> The worst boot was 2032.577 Mhz, over 1.5% off!
Can you tell us what type of machine this was? I've seen complaints
where the SMI's can cause some other funny stuff with calibration , be
no one can every reproduce anything..
Daniel
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-10-22 11:42 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-16 14:50 [PATCH] i386: fix TSC clock source calibration error Dave Johnson
2007-10-16 15:11 ` Chuck Ebbert
2007-10-18 8:57 ` Ingo Molnar
2007-10-19 17:16 ` [PATCH] i386: fix TSC clock source calibration error [part 2] Dave Johnson
2007-10-19 18:01 ` Hiroshi Shimamoto
2007-10-19 18:34 ` Dave Johnson
2007-10-22 11:42 ` Ingo Molnar
2007-10-18 20:43 ` [PATCH] i386: fix TSC clock source calibration error Daniel Walker
2007-10-19 17:31 ` Dave Johnson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox