From mboxrd@z Thu Jan 1 00:00:00 1970 From: Liran Alon Subject: Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest enters/leaves L2 Date: Fri, 10 Nov 2017 22:40:00 +0200 Message-ID: <5A060EA0.7060507@ORACLE.COM> References: <1510252040-5609-1-git-send-email-liran.alon@oracle.com> <20171110180616.GE15561@flask> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: kvm@vger.kernel.org, jmattson@google.com, wanpeng.li@hotmail.com, idan.brown@ORACLE.COM, Krish Sadhukhan To: =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , Paolo Bonzini Return-path: Received: from aserp1040.oracle.com ([141.146.126.69]:40452 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750799AbdKJUkK (ORCPT ); Fri, 10 Nov 2017 15:40:10 -0500 In-Reply-To: <20171110180616.GE15561@flask> Sender: kvm-owner@vger.kernel.org List-ID: On 10/11/17 20:06, Radim Krčmář wrote: > 2017-11-10 18:06+0100, Paolo Bonzini: >> On 09/11/2017 19:27, Liran Alon wrote: >>> Consider the following scenario: >>> 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI >>> to CPU B via virtual posted-interrupt mechanism. >>> 2. CPU B is currently executing L2 guest. >>> 3. vmx_deliver_nested_posted_interrupt() calls >>> kvm_vcpu_trigger_posted_interrupt() which will note that >>> vcpu->mode == IN_GUEST_MODE. >>> 4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR >>> IPI, CPU B exits from L2 to L0 during event-delivery >>> (valid IDT-vectoring-info). >>> 5. CPU A now sends the physical IPI. The IPI is received in host and >>> it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing. >>> 6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT, >>> CPU B continues to run in L0 and reach vcpu_enter_guest(). As >>> KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume >>> L2 guest. >>> 7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but >>> it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be >>> consumed at next L2 entry! >> >> The bug is real (great debugging!) but I think the fix is wrong. The >> basic issue is that we're not kicking the VCPU, so this should also fix it: >> >> /* the PIR and ON have been set by L1. */ >> if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) { > > This would still fail on the exiting case. > > If one VCPU was just after a VM exit, then the sender would see it > IN_GUEST_MODE, send the posted notification and return true, but the > notification would do nothing and we didn't set vmx->nested.pi_pending = > true, so the interrupt would not get handled until the next posted > notification or nested VM entry. > I agree. I was about to write to Paolo the exact same thing. >> /* >> * If a posted intr is not recognized by hardware, >> * we will accomplish it in the next vmentry. >> */ >> vmx->nested.pi_pending = true; >> kvm_make_request(KVM_REQ_EVENT, vcpu); >> kvm_vcpu_kick(vcpu); >> } >> >> See the comments around the setting of IN_GUEST_MODE, introduced by >> commit b95234c84004 ("kvm: x86: do not use KVM_REQ_EVENT for APICv >> interrupt injection", 2017-02-15). >> >> Even though nested PI must use KVM_REQ_EVENT, the reasoning behind the >> ordering of cli and vcpu->mode = IN_GUEST_MODE should hold in both cases. > > I think the fix is correct, but the code is using very awkward > synchronization through vmx->nested.pi_pending and slaps KVM_REQ_EVENT > on top. > If we checked vmx->nested.pi_pending after cli, we could get rid of > KVM_REQ_EVENT and if we want to support nested posted interrupts from > PCI devices, we should explicitly check for pi.on during VM entry. I like this suggestion. If I understand correctly, these are the changes I will do for v2 of this patch: 1. I Will change vmx_deliver_nested_posted_interrupt() to first set pi_pending=true and then call kvm_vcpu_trigger_posted_interrupt(). No need to set KVM_REQ_EVENT and no need to check kvm_vcpu_trigger_posted_interrupt() ret-val. 2. I will create a kvm_x86_ops->complete_nested_posted_interrupt() that will be called from vcpu_enter_guest() just after sync_pir_to_irr() is called but outside the "if" for kvm_lapic_enabled(). 3. I will remove call to vmx_complete_nested_posted_interrupt() from vmx_check_nested_events(). That call-site had a bug anyway that I fixed in another patch-series I posted here. This change will make that patch (not the series) redundant. See redundant patch here: https://marc.info/?l=kvm&m=150989090007281&w=2 I will prepare a v2 patch by this suggestion and send it. Thanks! -Liran > > Kicks do not seem necessary as the only case where we need the kick is > the same case where kvm_vcpu_trigger_posted_interrupt() should just > deliver the interrupt. >