public inbox for linux-kernel@vger.kernel.org
From: Chuck Ebbert <cebbert@redhat.com>
To: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] i386: fix TSC clock source calibration error
Date: Tue, 16 Oct 2007 11:11:19 -0400
Message-ID: <4714D497.7010004@redhat.com>
In-Reply-To: <18196.53154.100115.92459@zeus.sw.starentnetworks.com>

On 10/16/2007 10:50 AM, Dave Johnson wrote:
> From: Dave Johnson <djohnson@sw.starentnetworks.com>
> 
> I ran into this problem on a system that was unable to obtain NTP sync
> because the clock was running very slow (over 10000ppm slow). ntpd had
> declared all of its peers 'reject' with 'peer_dist' reason.
> 
> On investigation, the tsc_khz variable was significantly incorrect
> causing xtime to run slow.  After a reboot tsc_khz was correct so I
> did a reboot test to see how often the problem occurred:
> 
> Test was done on a 2000 MHz Xeon system.  Of 689 reboots, 8 of them
> had unacceptable tsc_khz values (>500ppm):
> 
>  range of tsc_khz  # of boots  % of boots
> -----------------  ----------  ----------
>         < 1999750           0      0.000%
> 1999750 - 1999800          21      3.048%
> 1999800 - 1999850         166     24.128%
> 1999850 - 1999900         241     35.029%
> 1999900 - 1999950         211     30.669%
> 1999950 - 2000000          42      6.105%
> 2000000 - 2000050           0      0.000%
> 2000050 - 2000100           0      0.000%
>                    [...]
> 2000100 - 2015000           1      0.145%  << BAD
> 2015000 - 2030000           6      0.872%  << BAD
> 2030000 - 2045000           1      0.145%  << BAD
> 2045000 <                   0      0.000%
> 
> The worst boot was 2032.577 MHz, over 1.5% off!
> 
> It appears that on rare occasions, mach_countup() is taking longer to
> complete than it should.
> 
> I suspect that this is caused by the CPU taking a periodic SMI right
> at the end of the 30ms calibration loop.  This would delay the loop
> while the BIOS SMI handler runs.  The resulting TSC value is beyond
> what it actually should be, resulting in a higher tsc_khz.
> 
> The below patch makes native_calculate_cpu_khz() take the best
> (shortest duration, lowest khz) run of its 3 calibration loops.  If an
> SMI goes off causing a bad result (long duration, higher khz) it will
> be discarded.
> 
> With the patch applied, 300 boots of the same system produce good
> results:
> 
>  range of tsc_khz  # of boots  % of boots
> -----------------  ----------  ----------
>         < 1999750           0      0.000%
> 1999750 - 1999800          30     10.000%
> 1999800 - 1999850         166     55.333%
> 1999850 - 1999900          89     29.667%
> 1999900 - 1999950          15      5.000%
> 1999950 <                   0      0.000%
> 
> Problem was found and tested against 2.6.18.  Patch is against 2.6.22.
> 
> Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
> 
> ===== arch/i386/kernel/tsc.c 1.27 vs edited =====
> --- 1.27/arch/i386/kernel/tsc.c	2007-05-02 13:27:18 -04:00
> +++ edited/arch/i386/kernel/tsc.c	2007-10-15 16:31:04 -04:00
> @@ -122,7 +122,7 @@
>  {
>  	unsigned long long start, end;
>  	unsigned long count;
> -	u64 delta64;
> +	u64 delta64 = (u64)ULLONG_MAX;
>  	int i;
>  	unsigned long flags;
>  
> @@ -134,6 +134,7 @@
>  		rdtscll(start);
>  		mach_countup(&count);
>  		rdtscll(end);
> +		delta64 = min(delta64, (end - start));
>  	}
>  	/*
>  	 * Error: ECTCNEVERSET
> @@ -143,8 +144,6 @@
>  	 */
>  	if (count <= 1)
>  		goto err;
> -
> -	delta64 = end - start;
>  
>  	/* cpu freq too fast: */
>  	if (delta64 > (1ULL<<32))
> 
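
For readers following the arithmetic, here is a minimal userspace sketch of
the min-of-three idea from the patch above. It is not the kernel code: the
helper names (min_tsc_delta, tsc_delta_to_khz, khz_error_ppm) are
hypothetical, the conversion is simplified to plain integer division with no
rounding, and the 30 ms window is taken from the description of the
calibration loop in the quoted message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest TSC delta among n calibration
 * runs.  A run stretched by an SMI yields a larger delta, so taking the
 * minimum discards it.
 */
uint64_t min_tsc_delta(const uint64_t *deltas, size_t n)
{
	uint64_t best = UINT64_MAX;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (deltas[i] < best)
			best = deltas[i];
	}
	return best;
}

/*
 * Hypothetical helper: TSC ticks counted over window_ms milliseconds,
 * expressed in kHz (ticks per millisecond).  Simplified: truncating
 * division, no rounding.
 */
uint64_t tsc_delta_to_khz(uint64_t delta, uint64_t window_ms)
{
	assert(window_ms > 0);
	return delta / window_ms;
}

/*
 * Hypothetical helper: signed error of measured_khz relative to
 * nominal_khz, in parts per million (truncated toward zero).
 */
int64_t khz_error_ppm(uint64_t measured_khz, uint64_t nominal_khz)
{
	assert(nominal_khz > 0);
	return ((int64_t)measured_khz - (int64_t)nominal_khz) * 1000000
		/ (int64_t)nominal_khz;
}
```

With made-up sample deltas of 59997000, 60977310 and 59996400 ticks over a
30 ms window, the middle run alone would give 2032577 kHz (the same value as
the worst boot reported above, roughly 16288 ppm fast against a nominal
2000000 kHz), while the minimum gives 1999880 kHz, about 60 ppm slow. The
sample deltas are illustrative only and were not measured on any system.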

(added cc:)

