From mboxrd@z Thu Jan  1 00:00:00 1970
From: Liran Alon <LIRAN.ALON@ORACLE.COM>
Subject: Re: [PATCH] KVM: nVMX: Fix races when sending nested PI while dest
 enters/leaves L2
Date: Thu, 16 Nov 2017 20:36:11 +0200
Message-ID: <5A0DDA9B.9040808@ORACLE.COM>
References: <1510252040-5609-1-git-send-email-liran.alon@oracle.com> <a9bf526f-032d-0515-0d69-e56425943251@redhat.com> <20171110180616.GE15561@flask> <0b202d55-6c87-e9b9-49aa-74aa7e83ff6b@redhat.com> <5A062A2C.9040102@ORACLE.COM> <20171116173714.GC20438@flask>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org,
        jmattson@google.com, wanpeng.li@hotmail.com, idan.brown@ORACLE.COM,
        Krish Sadhukhan <krish.sadhukhan@ORACLE.COM>
To: =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= <rkrcmar@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from userp1040.oracle.com ([156.151.31.81]:39300 "EHLO
        userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751741AbdKPSgS (ORCPT <rfc822;kvm@vger.kernel.org>);
        Thu, 16 Nov 2017 13:36:18 -0500
In-Reply-To: <20171116173714.GC20438@flask>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>


On 16/11/17 19:37, Radim Krčmář wrote:
> 2017-11-11 00:37+0200, Liran Alon:
>> On 10/11/17 23:37, Paolo Bonzini wrote:
>>> On 10/11/2017 19:06, Radim Krčmář wrote:
>>>>> 	/* the PIR and ON have been set by L1. */
>>>>> 	if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>>>> This would still fail on the exiting case.
>>>>
>>>> If one VCPU was just after a VM exit, then the sender would see it
>>>> IN_GUEST_MODE, send the posted notification and return true, but the
>>>> notification would do nothing
>>>
>>> It would cause *something*---a vmexit because the vector doesn't match
>>> the L1 posted interrupt.  Then smp_kvm_posted_intr_nested_ipi would be
>>> invoked from vmx_handle_external_intr.
>>>
>>> Could we detect the vector in vmx_handle_external_intr and set
>>> pi_pending+KVM_REQ_EVENT?  Or invoke a function in KVM from
>>> smp_kvm_posted_intr_nested_ipi?  Or would both be insane?...
>>>
>>
>> I have actually thought about it before writing this patch. But have found
>> an issue with this approach (which doesn't exist in this v1 patch and in
>> Radim's suggestion for v2):
>>
>> Consider the case sender sees vcpu->mode==IN_GUEST_MODE and before it sends
>> the physical IPI, dest CPU exits from guest and continues in L0 all the way
>> until vcpu_enter_guest() and pass the part it checks for KVM_REQ_EVENT but
>> before it disables interrupts. Then sender sends the physical IPI which is
>> received in host-context and therefore runs smp_kvm_posted_intr_nested_ipi()
>> which sets KVM_REQ_EVENT & pi_pending=true. But without Radim's suggestion
>> of checking pi_pending after interrupts disabled, this is too late as dest
>> CPU will not check these again until next exit from L2 guest.
>>
>> I hope I didn't misunderstand something here :)
>
> kvm_request_pending() would notice KVM_REQ_EVENT and forces the VM entry
> to restart, so that wouldn't be a problem.
>
> I just realized another complication, though: when the sender looks at
> IN_GUEST_MODE and before it sends IPI, the destination can reschedule to
> a different VCPU => the IPI handler cannot use the 'current VCPU' and we
> have to build a list of VCPUs potentially awaiting notification vector
> for every PCPU, which makes it strictly worse than just looking directly
> at the ON bit, IMO.
>

It's funny because I have actually tried to first implement Paolo's 
suggestion and noticed this exact issue when implementing the IPI 
handler at host-side.

Therefore, I decided to go with Radim's suggestion. This also makes 
nested posted-interrupt handling more symmetrical to non-nested 
posted-interrupt handling.

I will soon post here the v2 series after I will complete more internal 
review & tests on this series.

BTW, I was wondering why do we really need the vmx->nested.pi_pending.
Can't we just rely on vmx->nested.pi_desc ON bit that will be checked at 
start of vmx_complete_nested_posted_interrupt()?

Thanks.
-Liran