linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zachary Amsden <zamsden@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KVM <kvm@vger.kernel.org>, Avi Kivity <avi@redhat.com>,
	Glauber Costa <glommer@redhat.com>,
	Linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 09/18] Robust TSC compensation
Date: Tue, 13 Jul 2010 11:15:39 -1000	[thread overview]
Message-ID: <4C3CD77B.9050108@redhat.com> (raw)
In-Reply-To: <20100713203418.GA903@amt.cnet>

On 07/13/2010 10:34 AM, Marcelo Tosatti wrote:
> On Mon, Jul 12, 2010 at 04:25:29PM -1000, Zachary Amsden wrote:
>    
>> Make the match of TSC find TSC writes that are close to each other
>> instead of perfectly identical; this allows the compensator to also
>> work in migration / suspend scenarios.
>>
>> Signed-off-by: Zachary Amsden<zamsden@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c |   14 ++++++++++----
>>   1 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 79c4608..51d3f3e 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -926,21 +926,27 @@ void guest_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>>   	struct kvm *kvm = vcpu->kvm;
>>   	u64 offset, ns, elapsed;
>>   	struct timespec ts;
>> +	s64 sdiff;
>>
>>   	spin_lock(&kvm->arch.tsc_write_lock);
>>   	offset = data - native_read_tsc();
>>   	ns = get_kernel_ns();
>>   	elapsed = ns - kvm->arch.last_tsc_nsec;
>> +	sdiff = data - kvm->arch.last_tsc_write;
>> +	if (sdiff<  0)
>> +		sdiff = -sdiff;
>>
>>   	/*
>> -	 * Special case: identical write to TSC within 5 seconds of
>> +	 * Special case: close write to TSC within 5 seconds of
>>   	 * another CPU is interpreted as an attempt to synchronize
>> -	 * (the 5 seconds is to accomodate host load / swapping).
>> +	 * The 5 seconds is to accomodate host load / swapping as
>> +	 * well as any reset of TSC during the boot process.
>>   	 *
>>   	 * In that case, for a reliable TSC, we can match TSC offsets,
>> -	 * or make a best guest using kernel_ns value.
>> +	 * or make a best guest using elapsed value.
>>   	 */
>> -	if (data == kvm->arch.last_tsc_write&&  elapsed<  5ULL * NSEC_PER_SEC) {
>> +	if (sdiff<  nsec_to_cycles(5ULL * NSEC_PER_SEC)&&
>> +	    elapsed<  5ULL * NSEC_PER_SEC) {
>>   		if (!check_tsc_unstable()) {
>>   			offset = kvm->arch.last_tsc_offset;
>>   			pr_debug("kvm: matched tsc offset for %llu\n", data);
>>      
> What prevents a vcpu from seeing its TSC go backwards, in case the first
> write in the 5 second window is smaller than the victim vcpu's last
> visible TSC value ?
>    

Nothing, unfortunately.  However, the TSC would already have to be out 
of sync in order for the problem to occur.  It can never happen in 
normal circumstances on a stable hardware TSC except in one case; 
migration.  During the CPU state transfer phase of migration, however, 
all the VCPUs should already be stopped, so the maximum TSC that can be 
observed by any CPU is bounded.

The problem, of course is that the TSC write will latch the first TSC to 
be written, which, if you stop in order, and start in order, will be the 
lowest TSC; so later VCPUs can observe a negative TSC drift (timing is 
additionally complicated by the VCPU teardown / create time).

There are a couple solutions; some of which are ugly and some of which 
are very ugly.

1) Make some global state available about whether the VM is running or 
not, use the global TSC lock and record the last TSC when the VM is 
stopped.  Return this TSC from any reads when the VM is stopped.  I'm 
not sure the kernel model is equipped to do this properly; in theory, 
userspace could stop and start running VCPUs using the ioctls whenever 
it feels like and requires no protocol to do so.

2) Make userspace deal with it; when starting up a VM, read the VCPU 
state for all VCPUs in first, then take the maximum TSC and set all TSCs 
to this value before starting the VCPUs.  I'm not sure the userspace 
model is equipped to do this properly, it could start running earlier 
CPUs before reading later CPU states...

3) Drop passthrough TSC altogether and switch to trap / emulate TSC.

4) Pray to a deity of your choice.

Of course, none of these solutions work for a guest which deliberately 
runs with desynchronized TSCs, but we needn't really be concerned with 
that, no guest does it on purpose.

  reply	other threads:[~2010-07-13 21:15 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-13  2:25 KVM timekeeping fixes, V2 Zachary Amsden
2010-07-13  2:25 ` [PATCH 01/18] Make TSC offset writes non-preemptible Zachary Amsden
2010-07-13 21:33   ` Rik van Riel
2010-07-18 14:28   ` Avi Kivity
2010-07-18 14:30   ` Avi Kivity
2010-07-13  2:25 ` [PATCH 02/18] Fix SVM VMCB reset Zachary Amsden
2010-07-13 21:37   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 03/18] TSC reset compensation Zachary Amsden
2010-07-13 22:11   ` Rik van Riel
2010-07-18 14:34   ` Avi Kivity
2010-07-19 20:01     ` Zachary Amsden
2010-07-13  2:25 ` [PATCH 04/18] Make cpu_tsc_khz updates use local CPU Zachary Amsden
2010-07-14 14:41   ` Rik van Riel
2010-07-18 14:45   ` Avi Kivity
2010-07-19 20:06     ` Zachary Amsden
2010-07-20  8:53       ` Avi Kivity
2010-07-20 21:57         ` Zachary Amsden
2010-07-13  2:25 ` [PATCH 05/18] Warn about unstable TSC Zachary Amsden
2010-07-14 15:02   ` Rik van Riel
2010-07-18 14:47   ` Avi Kivity
2010-07-13  2:25 ` [PATCH 06/18] Unify TSC logic Zachary Amsden
2010-07-14 15:53   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 07/18] Fix deep C-state TSC desynchronization Zachary Amsden
2010-07-14 16:14   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 08/18] Add helper functions for time computation Zachary Amsden
2010-07-14 19:02   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 09/18] Robust TSC compensation Zachary Amsden
2010-07-13 20:34   ` Marcelo Tosatti
2010-07-13 21:15     ` Zachary Amsden [this message]
2010-07-13 21:42       ` David S. Ahern
2010-07-13 21:45         ` Zachary Amsden
2010-07-13 23:32         ` Zachary Amsden
2010-07-14 22:33   ` Rik van Riel
2010-07-18 14:52   ` Avi Kivity
2010-07-19 20:39     ` Zachary Amsden
2010-07-13  2:25 ` [PATCH 10/18] Keep SMP VMs more in sync on unstable TSC Zachary Amsden
2010-07-15  2:13   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 11/18] Perform hardware_enable in CPU_STARTING callback Zachary Amsden
2010-07-15  2:15   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 12/18] Add clock sync request to hardware enable Zachary Amsden
2010-07-15  2:32   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 13/18] Move scale_delta into common header Zachary Amsden
2010-07-15  2:35   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 14/18] Fix a possible backwards warp of kvmclock Zachary Amsden
2010-07-15  2:37   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 15/18] Implement getnsboottime kernel API Zachary Amsden
2010-07-15  2:41   ` Rik van Riel
2010-07-18 15:07   ` Avi Kivity
2010-07-13  2:25 ` [PATCH 16/18] Use getnsboottime in KVM Zachary Amsden
2010-07-15  2:43   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 17/18] Indicate reliable TSC in kvmclock Zachary Amsden
2010-07-15  2:44   ` Rik van Riel
2010-07-13  2:25 ` [PATCH 18/18] Add timekeeping documentation Zachary Amsden
2010-07-14  7:16   ` Takuya Yoshikawa
2010-07-14 20:28     ` Zachary Amsden
2010-07-16 13:19 ` KVM timekeeping fixes, V2 Joerg Roedel
2010-07-16 17:20   ` Zachary Amsden
2010-07-16 19:26     ` Joerg Roedel
2010-07-18 14:22       ` Avi Kivity
2010-07-18 15:08 ` Avi Kivity
2010-07-19  8:11   ` Zachary Amsden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C3CD77B.9050108@redhat.com \
    --to=zamsden@redhat.com \
    --cc=avi@redhat.com \
    --cc=glommer@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).