From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: KVM: x86: update masterclock when kvmclock_offset is calculated
Date: Tue, 3 Sep 2013 00:03:51 -0300
Message-ID: <20130903030351.GA3167@amt.cnet>
References: <20130820182012.GA814@amt.cnet>
 <521644D0.9050902@redhat.com>
 <20130828025243.GA13653@amt.cnet>
 <521DEF00.2030200@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm-devel, Gleb Natapov
To: Paolo Bonzini
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:36536 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759219Ab3ICDEP
 (ORCPT ); Mon, 2 Sep 2013 23:04:15 -0400
Received: from int-mx10.intmail.prod.int.phx2.redhat.com
 (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r8334FIq007585
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
 for ; Mon, 2 Sep 2013 23:04:15 -0400
Content-Disposition: inline
In-Reply-To: <521DEF00.2030200@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wed, Aug 28, 2013 at 02:37:20PM +0200, Paolo Bonzini wrote:
> On 28/08/2013 04:52, Marcelo Tosatti wrote:
> > On Thu, Aug 22, 2013 at 07:05:20PM +0200, Paolo Bonzini wrote:
> >> On 20/08/2013 20:20, Marcelo Tosatti wrote:
> >>>
> >>> The offset to add to the host's monotonic time, kvmclock_offset, is
> >>> calculated against the monotonic time at KVM_SET_CLOCK ioctl time.
> >>>
> >>> Request a master clock update at this time, to reduce a potentially
> >>> unbounded difference between the values of the masterclock and
> >>> the clock value used to calculate kvmclock_offset.
> >>>
> >>> Signed-off-by: Marcelo Tosatti
> >>>
> >>> Index: linux-2.6-kvmclock-fixes/arch/x86/kvm/x86.c
> >>> ===================================================================
> >>> --- linux-2.6-kvmclock-fixes.orig/arch/x86/kvm/x86.c
> >>> +++ linux-2.6-kvmclock-fixes/arch/x86/kvm/x86.c
> >>> @@ -3806,6 +3806,7 @@ long kvm_arch_vm_ioctl(struct file *filp
> >>>  		delta = user_ns.clock - now_ns;
> >>>  		local_irq_enable();
> >>>  		kvm->arch.kvmclock_offset = delta;
> >>> +		kvm_gen_update_masterclock(kvm);
> >>>  		break;
> >>>  	}
> >>>  	case KVM_GET_CLOCK: {
> >>
> >> Reviewed-by: Paolo Bonzini
> >>
> >> While reviewing this patch, which BTW looks good, I noticed the handling
> >> of KVM_REQ_MCLOCK_INPROGRESS, the dummy request that is never processed
> >> and is only used to block guest entry.
> >>
> >> It seems to me that this bit is not necessary. After
> >> KVM_REQ_CLOCK_UPDATE is issued, no guest entries will happen because
> >> kvm_guest_time_update will try to take the pvclock_gtod_sync_lock,
> >> currently taken by kvm_gen_update_masterclock.
> >
> > Not entirely clear; to cancel guest entry, the bit is necessary:
> >
> > 	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> > 	    || need_resched() || signal_pending(current)) {
> > 		vcpu->mode = OUTSIDE_GUEST_MODE;
> > 		smp_wmb();
> > 		local_irq_enable();
> > 		preempt_enable();
> > 		r = 1;
> > 		goto cancel_injection;
> > 	}
>
> Yes, KVM_REQ_CLOCK_UPDATE is enough to cancel guest entries too. See
> code below.
>
> >> Thus, you do not need the dummy request. You can simply issue
> >> KVM_REQ_CLOCK_UPDATE before calling pvclock_update_vm_gtod_copy (with
> >> the side effect of exiting VCPUs). VCPUs will stall in
> >> kvm_guest_time_update until pvclock_gtod_sync_lock is released by
> >> kvm_gen_update_masterclock. What do you think?
> >
> > Not sure it's safe. Can you describe the safety of your proposal in more
> > detail?
>
> Here is the code I was thinking of:
>
> 	spin_lock(&ka->pvclock_gtod_sync_lock);
> 	make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE);
>
> 	/*
> 	 * No guest entries from this point: VCPUs will be spinning
> 	 * on pvclock_gtod_sync_lock in kvm_guest_time_update.
> 	 */
> 	pvclock_update_vm_gtod_copy(kvm);
>
> 	/*
> 	 * Let kvm_guest_time_update continue: entering the guest
> 	 * is now allowed too.
> 	 */
> 	spin_unlock(&ka->pvclock_gtod_sync_lock);
>
> KVM_REQ_CLOCK_UPDATE is used to cancel guest entry and execute
> kvm_guest_time_update. But kvm_guest_time_update will spin on
> pvclock_gtod_sync_lock until pvclock_update_vm_gtod_copy exits and
> kvm_gen_update_masterclock releases the spinlock.

Not safe, because there are places which set KVM_REQ_CLOCK_UPDATE without
kicking the target vcpu out of guest mode, unless you use a modified
make_all_cpus_request.

The point of the REQ_MCLOCK_INPROGRESS request is to guarantee that the
following is not possible:

- 2 vcpus in guest mode with per-vcpu kvmclock areas with different
  {system_timestamp, tsc_offset} values.

To achieve that:

- Kick all vcpus out of guest mode (via a request bit that can't be
  cleared).
- Update the {system_timestamp, tsc_offset} values.
- Clear the request bit.

> >> On top of this, optionally the spinlock could become an rw_semaphore
> >> so that clock updates for different VCPUs will not be serialized. The
> >> effect is probably not visible, though.
> >
> > Still not clear on the benefits, but this area certainly welcomes
> > performance improvements (the global kick is one thing we discussed
> > and that should be improved).
>
> This unfortunately does not eliminate the global kick, so there is
> likely no performance improvement yet. It simplifies the logic a bit
> though.
>
> The change I suggested above is to make pvclock_gtod_sync_lock an rwsem
> or, probably better, a seqlock.
> VCPUs reading ka->use_master_clock, ka->master_cycle_now and
> ka->master_kernel_ns can then run concurrently, with no locked
> operations in kvm_guest_time_update.

Good point.
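[Editor's note: the seqlock scheme Paolo suggests can be sketched in
plain userspace C. This is an illustration only, not kernel code: the
struct and function names below are made up for the example, and GCC's
__sync barriers stand in for the kernel's seqcount API. The writer bumps
a sequence counter to odd before updating the master clock fields and
back to even afterwards; readers retry until they observe an even,
unchanged counter, so they take no lock at all.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kvm_arch master clock fields. */
struct kvm_arch_clock {
	unsigned seq;              /* even = stable, odd = write in progress */
	bool     use_master_clock;
	uint64_t master_cycle_now;
	uint64_t master_kernel_ns;
};

/* Writer side: what pvclock_update_vm_gtod_copy() would do. */
static void clock_write(struct kvm_arch_clock *ka,
			bool use, uint64_t cycles, uint64_t ns)
{
	ka->seq++;                 /* now odd: readers will retry */
	__sync_synchronize();
	ka->use_master_clock = use;
	ka->master_cycle_now = cycles;
	ka->master_kernel_ns = ns;
	__sync_synchronize();
	ka->seq++;                 /* even again: snapshot is consistent */
}

/* Reader side: kvm_guest_time_update() reading with no locked ops. */
static void clock_read(const struct kvm_arch_clock *ka,
		       bool *use, uint64_t *cycles, uint64_t *ns)
{
	unsigned start;

	do {
		start = ka->seq;
		__sync_synchronize();
		*use    = ka->use_master_clock;
		*cycles = ka->master_cycle_now;
		*ns     = ka->master_kernel_ns;
		__sync_synchronize();
		/* retry if a write was in progress or completed meanwhile */
	} while ((start & 1) || start != ka->seq);
}
```

Note this only removes reader-side locking; the writer must still kick
all VCPUs out of guest mode first, as discussed above, or a VCPU could
keep running on a stale {system_timestamp, tsc_offset} pair.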