From mboxrd@z Thu Jan  1 00:00:00 1970
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: KVM: x86: update masterclock when kvmclock_offset is calculated
Date: Wed, 28 Aug 2013 14:37:20 +0200
Message-ID: <521DEF00.2030200@redhat.com>
References: <20130820182012.GA814@amt.cnet> <521644D0.9050902@redhat.com> <20130828025243.GA13653@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: kvm-devel <kvm@vger.kernel.org>, Gleb Natapov <gleb@redhat.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-qc0-f180.google.com ([209.85.216.180]:60058 "EHLO
	mail-qc0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752891Ab3H1MhY (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 28 Aug 2013 08:37:24 -0400
Received: by mail-qc0-f180.google.com with SMTP id l13so2926911qcy.11
        for <kvm@vger.kernel.org>; Wed, 28 Aug 2013 05:37:23 -0700 (PDT)
In-Reply-To: <20130828025243.GA13653@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Il 28/08/2013 04:52, Marcelo Tosatti ha scritto:
> On Thu, Aug 22, 2013 at 07:05:20PM +0200, Paolo Bonzini wrote:
>> Il 20/08/2013 20:20, Marcelo Tosatti ha scritto:
>>>
>>> The offset to add to the hosts monotonic time, kvmclock_offset, is
>>> calculated against the monotonic time at KVM_SET_CLOCK ioctl time.
>>>
>>> Request a master clock update at this time, to reduce a potentially
>>> unbounded difference between the values of the masterclock and
>>> the clock value used to calculate kvmclock_offset.
>>>
>>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>>
>>> Index: linux-2.6-kvmclock-fixes/arch/x86/kvm/x86.c
>>> ===================================================================
>>> --- linux-2.6-kvmclock-fixes.orig/arch/x86/kvm/x86.c
>>> +++ linux-2.6-kvmclock-fixes/arch/x86/kvm/x86.c
>>> @@ -3806,6 +3806,7 @@ long kvm_arch_vm_ioctl(struct file *filp
>>>  		delta = user_ns.clock - now_ns;
>>>  		local_irq_enable();
>>>  		kvm->arch.kvmclock_offset = delta;
>>> +		kvm_gen_update_masterclock(kvm);
>>>  		break;
>>>  	}
>>>  	case KVM_GET_CLOCK: {
>>
>> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> While reviewing this patch, which BTW looks good, I noticed the handling
>> of KVM_REQ_MCLOCK_INPROGRESS, the dummy request that is never processed
>> and is only used to block guest entry.
>>
>> It seems to me that this bit is not necessary.  After
>> KVM_REQ_CLOCK_UPDATE is issued, no guest entries will happen because
>> kvm_guest_time_update will try to take the pvclock_gtod_sync_lock,
>> currently taken by kvm_gen_update_masterclock.
> 
> Not entirely clear, to cancel guest entry the bit is necessary:
> 
>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>             || need_resched() || signal_pending(current)) {
>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>                 smp_wmb();
>                 local_irq_enable();
>                 preempt_enable();
>                 r = 1;
>                 goto cancel_injection;
>         }

Yes, KVM_REQ_CLOCK_UPDATE is enough to cancel guest entries too.  See
code below.

>> Thus, you do not need the dummy request.  You can simply issue
>> KVM_REQ_CLOCK_UPDATE before calling pvclock_update_vm_gtod_copy (with
>> the side effect of exiting VCPUs).  VCPUs will stall in
>> kvm_guest_time_update until pvclock_gtod_sync_lock is released by
>> kvm_gen_update_masterclock.  What do you think?
> 
> Not sure its safe. Can you describe the safety of your proposal in more
> detail ?

Here is the code I was thinking of:

	spin_lock(&ka->pvclock_gtod_sync_lock);
	make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE);

	/*
	 * No guest entries from this point: VCPUs will be spinning
	 * on pvclock_gtod_sync_lock in kvm_guest_time_update.
	 */
	pvclock_update_vm_gtod_copy(kvm);

	/*
	 * Let kvm_guest_time_update continue: entering the guest
	 * is now allowed too.
	 */
	spin_unlock(&ka->pvclock_gtod_sync_lock);

KVM_REQ_CLOCK_UPDATE is used to cancel guest entry and execute
kvm_guest_time_update.  But kvm_guest_time_update will spin on
pvclock_gtod_sync_lock until pvclock_update_vm_gtod_copy exits and
kvm_gen_update_masterclock releases the spinlock.

>> On top of this, optionally the spinlock could become an rw_semaphore
>> so that clock updates for different VCPUs will not be serialized.  The
>> effect is probably not visible, though.
> 
> Still not clear of the benefits, but this area certainly welcomes
> performance improvements (the global kick is one thing we discussed 
> and that should be improved).

This unfortunately does not eliminate the global kick, so there is
likely no performance improvement yet.  It simplifies the logic a bit
though.

The change I suggested above is to make pvclock_gtod_sync_lock an rwsem
or, probably better, a seqlock.  VCPUs reading ka->use_master_clock,
ka->master_cycle_now, ka->master_kernel_ns can then run concurrently,
with no locked operations in kvm_guest_time_update.

Paolo