public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dan Hecht <dhecht@vmware.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	Alok Kataria <akataria@vmware.com>, Ingo Molnar <mingo@elte.hu>,
	LKML <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	dhecht@vmware.com
Subject: Re: [PATCH] Skip tsc synchronization checks if CONSTANT_TSC bit is set.
Date: Fri, 24 Oct 2008 13:18:24 -0700	[thread overview]
Message-ID: <49022D90.6080606@vmware.com> (raw)
In-Reply-To: <20081024192532.GG27492@one.firstfloor.org>

On 10/24/2008 12:25 PM, Andi Kleen wrote:
>> BIOSes are also just software, and we have to deal with bugs in them
>> *all the time*.  The reality is that we're going to have to deal with
>> both vendor and user reluctance to upgrade, and therefore have to deal
>> with brokenness in the field.
> 
> In the field they will just continue using clock=pit, like they
> always did on vmware. And also they will not update the Linux kernel.
> 
> This is strictly for new installations. And I frankly don't
> see why Linux needs to get white listed workarounds when the
> Hypervisor couldn't as well be fixed. We have the bizarre
> situation here where a HV vendor tries to add workarounds
> to Linux instead of fixing it on their products.
> 

What exactly would you like vmware to fix?  VMware fully virtualizes 
x86.  However, when running the kernel on virtual cpus, as compared to 
running on a physical cpus, the timing characteristics are different -- 
  virtual cpus have to time share physical cpus with each other.

So, timing assumptions that the kernel makes when running directly on 
physical cpus no longer hold when running on virtual cpus.

Prior to clocksource/clockevents, the timing assumptions that the Linux 
kernel made were terrible for hypervisors.  Now, the assumptions are 
much better.  However, three *minor* assumptions that the kernel makes 
are violated when running on a hypervisor:

1) The fast-path TSC calibration code makes assumptions about being able 
to sample various counters in sequence in a set amount of time that are 
not true when running virtualized.  (Actually, it makes assumptions that 
aren't really true 100% of the time on physical cpus, but in that case 
the odds of violating the assumptions (by hitting an SMI at exactly the 
right time and length) are really rare.

Note that accurate calibration of the TSC is extremely important in 
clocksource kernels since any error will lead to long term drift of 
wallclock time.

2) There is no guarantee that the acpi_pm timer will be sampled at least 
every 4.68 seconds (the wrap interval), because the vcpu, in extreme 
circumstances, may not have a chance to run in that time.  Thus, the 
acpi_pm timer is not suitable to be used as a clocksource watchdog when 
running on a hypervisor.

3) Virtual TSCs can be kept nearly in sync, but because the virtual TSC 
offset is set by software, it's not perfect.  So, the TSC 
synchronization test can fail.  (Really, it can fail on native as well, 
and that's why the tests for backwards TSC were added to 
read_tsc()/vread_tsc()).

Clearly, #1 and #2 *cannot* be fixed in the hypervisor.  These are cases 
where the kernel is making assumptions that just are not true when 
running on certain platforms (i.e. hypervisors).  Let's fix them.

#3, as you have suggested below, can perhaps be fixed by loosening the 
check a bit to allow some leeway for marginally offset TSCs.

> Now making generic code a little more flexible in what
> it accepts is fine though (like relaxing tsc_sync or
> checking and trusting UNSTABLE_TSC). That will scale at least
> and doesn't need significant new code.
> 

I think everyone can agree that this is the preferred approach, in 
general.  And in fact it was the approach Alok first used for the TSC 
frequency calibration problem (this is one reason why he merged the 
32-bit and 64-bit TSC code -- to standardize on the more robust 64-bit 
calibration code).  But, in the end, folks wanted a "fast" TSC 
calibration path, and that path makes assumptions that just won't be 
true when running on a hypervisor, so we are left with skipping that 
path if we are on a virtual cpu.

Also, with regards to your claim that users should continue to use 
clock=pit like options on newer kernels: that is just plain *wrong*. 
The reason for clock=pit (really clock=pmtmr) recommendation on 
pre-clocksource kernels wasn't to avoid using the TSC, but it was simply 
a workaround to avoid the kernel code that attempted to compensate for 
lost ticks (but would do so incorrectly in the case of late, but not 
lost, interrupts -- again, it was a kernel timing assumption that was 
invalid on hypervisors).

Dan

  parent reply	other threads:[~2008-10-24 20:15 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-21  1:35 [PATCH 0/3] Improve TSC as a clocksource under VMware Alok Kataria
2008-10-21  5:55 ` Chris Snook
2008-10-21 19:11   ` Alok Kataria
2008-10-22 19:26   ` Alok Kataria
2008-10-22 19:29     ` Jeff Hansen
2008-10-21  9:43 ` Andi Kleen
2008-10-21 16:41   ` Alok Kataria
2008-10-21 17:40     ` Andi Kleen
2008-10-21 18:04       ` Alok Kataria
2008-10-21 18:15         ` Andi Kleen
2008-10-21 19:10           ` Alok Kataria
2008-10-21 19:27             ` Andi Kleen
2008-10-21 19:47               ` Alok Kataria
2008-10-21 21:41                 ` Alok Kataria
2008-10-22 19:23               ` [PATCH] Skip tsc synchronization checks if CONSTANT_TSC bit is set Alok Kataria
2008-10-22 19:26                 ` Ingo Molnar
2008-10-22 19:30                   ` Alok Kataria
2008-10-22 20:17                   ` Andi Kleen
2008-10-22 22:04                     ` Alok Kataria
2008-10-22 19:58                 ` Andi Kleen
2008-10-22 22:00                   ` Alok Kataria
2008-10-22 22:13                     ` Andi Kleen
2008-10-22 22:11                       ` Alok Kataria
2008-10-22 22:54                         ` Andi Kleen
2008-10-23  2:21                           ` Alok Kataria
2008-10-23  8:10                             ` Andi Kleen
2008-10-23 23:39                               ` Alok Kataria
2008-10-23 23:47                                 ` H. Peter Anvin
2008-10-24  0:25                                   ` Andi Kleen
2008-10-24  0:46                                     ` H. Peter Anvin
2008-10-24  7:23                                       ` Andi Kleen
2008-10-24 15:30                                         ` H. Peter Anvin
2008-10-24 19:25                                           ` Andi Kleen
2008-10-24 19:19                                             ` H. Peter Anvin
2008-10-24 19:34                                               ` Andi Kleen
2008-10-24 19:50                                                 ` Alok Kataria
2008-10-24 20:18                                             ` Dan Hecht [this message]
2008-10-24  1:12                                   ` Alok Kataria
2008-10-24  1:18                                     ` H. Peter Anvin
2008-10-22 22:15                     ` Alan Cox
2008-10-22 22:17                       ` Alok Kataria
2008-10-21  9:50 ` [PATCH 0/3] Improve TSC as a clocksource under VMware Pavel Machek
2008-10-21 16:48   ` Alok Kataria
2008-10-21 18:36     ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49022D90.6080606@vmware.com \
    --to=dhecht@vmware.com \
    --cc=akataria@vmware.com \
    --cc=andi@firstfloor.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox