From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paolo Bonzini
Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
Date: Sat, 21 May 2016 08:38:58 -0400 (EDT)
Message-ID: <141447210.16267842.1463834338692.JavaMail.zimbra@redhat.com>
References: <1463708703-19208-1-git-send-email-yunhong.jiang@linux.intel.com> <1463708703-19208-5-git-send-email-yunhong.jiang@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: Yunhong Jiang, kvm@vger.kernel.org, rkrcmar@redhat.com
To: Yunhong Jiang
Return-path:
Received: from mx4-phx2.redhat.com ([209.132.183.25]:34273 "EHLO mx4-phx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751215AbcEUMjF (ORCPT ); Sat, 21 May 2016 08:39:05 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

----- Original Message -----
> From: "Yunhong Jiang"
> To: "Paolo Bonzini", "Yunhong Jiang", kvm@vger.kernel.org
> Cc: rkrcmar@redhat.com
> Sent: Saturday, May 21, 2016 12:06:16 AM
> Subject: RE: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
>
>
>
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> > Behalf Of Paolo Bonzini
> > Sent: Friday, May 20, 2016 3:34 AM
> > To: Yunhong Jiang; kvm@vger.kernel.org
> > Cc: rkrcmar@redhat.com
> > Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc
> > deadline timer
> >
> >
> >
> > On 20/05/2016 03:45, Yunhong Jiang wrote:
> > > From: Yunhong Jiang
> > >
> > > Utilizing the VMX preemption timer for tsc deadline timer
> > > virtualization. The VMX preemption timer is armed when the vCPU is
> > > running, and a VMExit will happen if the virtual TSC deadline timer
> > > expires.
> > >
> > > When the vCPU thread is scheduled out, the tsc deadline timer
> > > virtualization will be switched to use the current solution, i.e. use
> > > the timer for it. It's switched back to VMX preemption timer when the
> > > vCPU thread is scheduled in.
> > >
> > > This solution avoids the complex OS's hrtimer system, and also the host
> > > timer interrupt handling cost, with a preemption_timer VMexit. It fits
> > > well for some NFV usage scenario, when the vCPU is bound to a pCPU and
> > > the pCPU is isolated, or some similar scenario.
> > >
> > > However, it possibly has impact if the vCPU thread is scheduled in/out
> > > very frequently, because it switches from/to the hrtimer emulation a lot.
> > >
> > > Signed-off-by: Yunhong Jiang
> > > ---
> > >  arch/x86/kvm/lapic.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  arch/x86/kvm/lapic.h |  10 +++++
> > >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
> > >  arch/x86/kvm/x86.c   |   6 +++
> > >  4 files changed, 147 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 5776473be362..a613bcfda59a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >
> > >  	local_irq_disable();
> > >
> > > +	inject_expired_hwemul_timer(vcpu);
> >
> > Is this really fast enough (and does it trigger often enough) that it is
> > worth slowing down all vmenters?
> >
> > I'd rather call inject_expired_hwemul_timer from the preemption timer
> > vmexit handler instead. inject_pending_hwemul_timer will set the
> > preemption timer countdown to zero if the deadline of the guest LAPIC
> > timer has passed already. This should be relatively rare.
>
> Sure and will take this way on the new patch set. I'd give some reason why
> it's this way now. Originally this patch was for cyclictest on guest
> with latency less than 15us for 24 hours. So, if the timer expires already
> before VM entry, we try to inject it immediately, instead of waiting for
> an extra VMExit, which may be 4~5 us.

This seems too much...
A vmexit+vmentry on Ivy Bridge or newer is around 1200-1500 cycles; that
should give 1-2 microseconds at most, including the time to inject the
interrupt.

I have a few more ideas about optimizing the preemption timer; hopefully
we can get it down to that and not pessimize the sched_out/sched_in case.
Instead, I think what we want to touch is the blocking/unblocking
callback. Wanpeng Li's patches to handle the APIC timer specially in
kvm_vcpu_block could help too for this.

However, there's time for that. Please keep sched_out/sched_in in your
next submission, and we can work on it a step at a time.

Thanks,

Paolo
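
For a rough sanity check of the cycle figures above, here is a minimal
sketch of the cycles-to-time arithmetic. The helper name cycles_to_ns and
the clock frequency used with it are assumptions for illustration only;
nothing in this thread states the clock rate of the machines involved, and
this is not code from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): convert a cycle count to
 * nanoseconds, given a clock frequency in kHz supplied by the caller.
 *
 * ns = cycles / (khz * 1000) * 1e9 = cycles * 1000000 / khz
 *
 * The multiplication is done first so that integer division does not
 * throw away precision; the result is truncated towards zero.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t khz)
{
	return cycles * 1000000ULL / khz;
}
```

Assuming a 2.5 GHz clock (khz = 2500000), cycles_to_ns(1200, 2500000)
gives 480 and cycles_to_ns(1500, 2500000) gives 600, so the bare
vmexit+vmentry round trip would be roughly half a microsecond under that
assumption. That leaves room for the interrupt injection within the 1-2
microsecond estimate, and is well below the 4~5 us quoted earlier.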