From mboxrd@z Thu Jan 1 00:00:00 1970 From: Liran Alon Subject: Re: [PATCH v2 4/5] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Date: Thu, 07 Dec 2017 13:33:57 +0200 Message-ID: <5A292725.1090207@ORACLE.COM> References: <1512461786-6465-1-git-send-email-liran.alon@oracle.com> <1512461786-6465-5-git-send-email-liran.alon@oracle.com> <20171206204552.GD3644@flask> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: pbonzini@redhat.com, kvm@vger.kernel.org, jmattson@google.com, wanpeng.li@hotmail.com, idan.brown@ORACLE.COM, Krish Sadhukhan , Konrad Rzeszutek Wilk To: =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= Return-path: Received: from aserp2130.oracle.com ([141.146.126.79]:35362 "EHLO aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752749AbdLGLeI (ORCPT ); Thu, 7 Dec 2017 06:34:08 -0500 In-Reply-To: <20171206204552.GD3644@flask> Sender: kvm-owner@vger.kernel.org List-ID: On 06/12/17 22:45, Radim Krčmář wrote: > 2017-12-05 10:16+0200, Liran Alon: >> When L1 wants to send a posted-interrupt to another L1 CPU running L2, >> it sets the relevant bit in vmx->nested.pi_desc->pir and ON bit in >> vmx->nested.pi_desc->control. Then it attempts to send a >> notification-vector IPI to dest L1 CPU. >> This attempt to send IPI will exit to L0 which will reach >> vmx_deliver_nested_posted_interrupt() which does the following: >> 1. If dest L0 CPU is currently running guest (vcpu->mode == >> IN_GUEST_MODE), it sends a physical IPI of PI nested-notification-vector. >> 2. It sets KVM_REQ_EVENT in dest vCPU. This is done such that if dest >> L0 CPU exits from guest to host and therefore doesn't recognize physical >> IPI (or it wasn't sent), then KVM_REQ_EVENT will be consumed on next >> vmentry which will call vmx_check_nested_events() which should call >> (in theory) vmx_complete_nested_posted_interrupt(). That function >> should see vmx->nested.pi_desc->control ON bit is set and therefore >> "emulate" posted-interrupt delivery for L1 (sync PIR to IRR in L1 >> virtual-apic-page & update vmcs02 RVI). >> >> The above logic regarding nested-posted-interrupts contains multiple >> issues: >> >> A) Race-condition: >> On step (1) it is possible sender will see vcpu->mode == IN_GUEST_MODE >> but before sending physical IPI, the dest CPU will exit to host. >> Therefore, physical IPI could be received at host which it's hanlder >> does nothing. In addition, assume that dest CPU passes the checks >> for pending kvm requests before sender sets KVM_REQ_EVENT. Therefore, >> dest CPU will resume L2 without evaluating nested-posted-interrupts. >> >> B) vmx_complete_nested_posted_interrupt() is not always called >> when needed. Even if dest CPU consumed KVM_REQ_EVENT, there is a bug >> that vmx_check_nested_events() could exit from L2 to L1 before calling >> vmx_complete_nested_posted_interrupt(). Therefore, on next resume of >> L1 into L2, nested-posted-interrupts won't be evaluated even though L0 >> resume L2 (We may resume L2 without having KVM_REQ_EVENT set). >> >> This commit removes nested-posted-interrupts processing from >> check_nested_events() and instead makes sure to process >> nested-posted-interrupts on vmentry after interrupts >> disabled. Processing of nested-posted-interrupts is delegated >> to hardware by issueing a self-IPI of relevant notification-vector >> which will be delivered to CPU when CPU is in guest. >> >> * Bug (A) is solved by the fact that processing of >> nested-posted-interrupts is not dependent on KVM_REQ_EVENT and >> happens before every vmentry to L2. >> * Bug (B) is now trivially solved by processing >> nested-posted-interrupts before each vmentry to L2 guest. >> >> An alternative could have been to just call >> vmx_complete_nested_posted_interrupt() at this call-site. However, we >> have decided to go with this approach because: >> 1. It would require modifying vmx_complete_nested_posted_interrupt() >> to be able to work with interrupts disabled (not-trivial). >> 2. We preffer to avoid software-emulation of hardware behavior if it >> is possible. > > Nice solution! Self-IPI was slower on non-nested, but even if it is > slower here, the code saving is definitely worth it. > >> Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") >> >> Signed-off-by: Liran Alon >> Co-authored-by: Nikita Leshenko >> Reviewed-by: Krish Sadhukhan >> Signed-off-by: Krish Sadhukhan >> Signed-off-by: Konrad Rzeszutek Wilk >> --- >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> @@ -5156,6 +5114,17 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector) >> kvm_vcpu_kick(vcpu); >> } >> >> +static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_vmx *vmx = to_vmx(vcpu); >> + >> + WARN_ON(!vcpu->arch.apicv_active); >> + >> + if (is_guest_mode(vcpu) && vmx->nested.pi_desc && >> + pi_test_on(vmx->nested.pi_desc)) >> + kvm_vcpu_trigger_posted_interrupt(vcpu, true); > > We're always sending to self, so the function would be better with > apic->send_IPI_self() as it might be accelerated by hardware. > Agreed. Will change. >> +} >> + >> /* >> * Set up the vmcs's constant host-state fields, i.e., host-state fields that >> * will not change in the lifetime of the guest. >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> @@ -6962,12 +6962,18 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) >> smp_mb__after_srcu_read_unlock(); >> >> /* >> - * This handles the case where a posted interrupt was >> - * notified with kvm_vcpu_kick. >> + * In case guest got the posted-interrupt notification >> + * vector while running in host, we need to make sure >> + * it arrives to guest. >> + * For L1 posted-interrupts, we manually sync PIR to IRR. >> + * For L2 posted-interrupts, we send notification-vector >> + * again by self IPI such that it will now be received in guest. >> */ >> - if (kvm_lapic_enabled(vcpu)) { >> - if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active) >> + if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active) { >> + if (kvm_x86_ops->sync_pir_to_irr) >> kvm_x86_ops->sync_pir_to_irr(vcpu); >> + if (kvm_x86_ops->complete_nested_posted_interrupt) >> + kvm_x86_ops->complete_nested_posted_interrupt(vcpu); > > I think that vcpu->arch.apicv_active is enough to guarantee presence of > both sync_pir_to_irr and complete_nested_posted_interrupt, I am not sure you can remove the check for kvm_lapic_enabled(). apicv_active can be true even when lapic_in_kernel()==false. apicv_active is only dependent on if physical CPU supports APICv related features & enable_apicv module parameter is 1. However, sync_pir_to_irr() assumes that lapic_in_kernel()==true because it reads & updates data in in-kernel LAPIC IRR register. So I think at least it doesn't makes sense to invoke sync_pir_to_irr() in case kvm_lapic_enabled()==false. But I do agree that complete_nested_posted_interrupt() should be invoked if just apicv_active==true. Because the LAPIC that L1 emulate for L2 can be enabled and implemented in-kernel even if the LAPIC that L0 emulate for L1 is not currently enabled. Therefore, L1 can still send posted-interrupts for it's L2 guest in this case. > > and considering this is a hot path, I'd check is_guest_mode(vcpu) before > calling the second function. > > (Saving the function call with a new x86_op for those two seems viable > too, but would need deeper analysis and benchmarking as it is putting > more code into hot cache ...) > OK. Will change. Regards, -Liran > Thanks. >