From: Chuck Ebbert <cebbert@redhat.com>
To: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] i386: fix TSC clock source calibration error
Date: Tue, 16 Oct 2007 11:11:19 -0400
Message-ID: <4714D497.7010004@redhat.com>
In-Reply-To: <18196.53154.100115.92459@zeus.sw.starentnetworks.com>

On 10/16/2007 10:50 AM, Dave Johnson wrote:
> From: Dave Johnson <djohnson@sw.starentnetworks.com>
> 
> I ran into this problem on a system that was unable to obtain NTP sync
> because the clock was running very slow (over 10000ppm slow). ntpd had
> declared all of its peers 'reject' with 'peer_dist' reason.
> 
> On investigation, the tsc_khz variable was significantly incorrect
> causing xtime to run slow.  After a reboot tsc_khz was correct so I
> did a reboot test to see how often the problem occurred:
> 
> Test was done on a 2000 MHz Xeon system.  Of 689 reboots, 8 of them
> had unacceptable tsc_khz values (>500ppm):
> 
>  range of tsc_khz  # of boots  % of boots
> -----------------  ----------  ----------
>         < 1999750           0      0.000%
> 1999750 - 1999800          21      3.048%
> 1999800 - 1999850         166     24.128%
> 1999850 - 1999900         241     35.029%
> 1999900 - 1999950         211     30.669%
> 1999950 - 2000000          42      6.105%
> 2000000 - 2000050           0      0.000%
> 2000050 - 2000100           0      0.000%
>                    [...]
> 2000100 - 2015000           1      0.145%  << BAD
> 2015000 - 2030000           6      0.872%  << BAD
> 2030000 - 2045000           1      0.145%  << BAD
> 2045000 <                   0      0.000%
> 
> The worst boot was 2032.577 MHz, over 1.5% off!
> 
> It appears that on rare occasions, mach_countup() is taking longer to
> complete than necessary.
> 
> I suspect that this is caused by the CPU taking a periodic SMI
> interrupt right at the end of the 30ms calibration loop.  This would
> cause the loop to delay while the SMI BIOS handler runs. The resulting
> TSC value is beyond what it actually should be resulting in a higher
> tsc_khz.
> 
> The below patch makes native_calculate_cpu_khz() take the best
> (shortest duration, lowest khz) run of its 3 calibration loops.  If an
> SMI goes off causing a bad result (long duration, higher khz) it will
> be discarded.
> 
> With the patch applied, 300 boots of the same system produce good
> results:
> 
>  range of tsc_khz  # of boots  % of boots
> -----------------  ----------  ----------
>         < 1999750           0      0.000%
> 1999750 - 1999800          30     10.000%
> 1999800 - 1999850         166     55.333%
> 1999850 - 1999900          89     29.667%
> 1999900 - 1999950          15      5.000%
> 1999950 <                   0      0.000%
> 
> Problem was found and tested against 2.6.18.  Patch is against 2.6.22.
> 
> Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com>
> 
> ===== arch/i386/kernel/tsc.c 1.27 vs edited =====
> --- 1.27/arch/i386/kernel/tsc.c	2007-05-02 13:27:18 -04:00
> +++ edited/arch/i386/kernel/tsc.c	2007-10-15 16:31:04 -04:00
> @@ -122,7 +122,7 @@
>  {
>  	unsigned long long start, end;
>  	unsigned long count;
> -	u64 delta64;
> +	u64 delta64 = (u64)ULLONG_MAX;
>  	int i;
>  	unsigned long flags;
>  
> @@ -134,6 +134,7 @@
>  		rdtscll(start);
>  		mach_countup(&count);
>  		rdtscll(end);
> +		delta64 = min(delta64, (end - start));
>  	}
>  	/*
>  	 * Error: ECTCNEVERSET
> @@ -143,8 +144,6 @@
>  	 */
>  	if (count <= 1)
>  		goto err;
> -
> -	delta64 = end - start;
>  
>  	/* cpu freq too fast: */
>  	if (delta64 > (1ULL<<32))
> 

(added cc:)
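
For anyone reading along, the selection step described in the quoted
patch can be sketched in user-space C. This is only an illustration, not
the kernel code: best_delta(), ticks_to_khz() and error_ppm() are
hypothetical helper names, and the 30000 us window is an assumption taken
from the "30ms calibration loop" mentioned above. The idea is that an SMI
can only lengthen a run, so the shortest of the measured TSC deltas is
the one least likely to be disturbed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest of n measured TSC deltas.
 * A run delayed by an SMI yields a larger delta, so the minimum is
 * taken as the cleanest measurement.
 */
static uint64_t best_delta(const uint64_t *deltas, size_t n)
{
	uint64_t best = UINT64_MAX;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		if (deltas[i] < best)
			best = deltas[i];
	return best;
}

/*
 * Convert TSC ticks counted over window_us microseconds to kHz.
 * The window length is an assumption for this sketch.
 */
static uint64_t ticks_to_khz(uint64_t ticks, uint64_t window_us)
{
	return ticks * 1000 / window_us;
}

/* Signed error of a measured frequency against nominal, in ppm. */
static int64_t error_ppm(uint64_t measured_khz, uint64_t nominal_khz)
{
	return ((int64_t)measured_khz - (int64_t)nominal_khz) * 1000000
		/ (int64_t)nominal_khz;
}
```

With three made-up deltas of 60000000, 60977310 and 60001500 ticks over
an assumed 30000 us window, the middle run alone would give 2032577 kHz
(about 16288 ppm fast against a nominal 2000000 kHz, matching the worst
boot reported above), while the minimum gives 2000000 kHz.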
