From: Liran Alon <LIRAN.ALON@ORACLE.COM>
To: Paolo Bonzini <pbonzini@redhat.com>,
rkrcmar@redhat.com, kvm@vger.kernel.org
Cc: jmattson@google.com, wanpeng.li@hotmail.com,
idan.brown@ORACLE.COM, Liam Merwick <liam.merwick@ORACLE.COM>
Subject: Re: [PATCH v3 11/11] KVM: nVMX: Wake L2 from HLT when nested posted-interrupt pending
Date: Wed, 27 Dec 2017 17:15:29 +0200 [thread overview]
Message-ID: <5A43B911.3090500@ORACLE.COM> (raw)
In-Reply-To: <c9b4ded1-200d-a6f7-6e9d-54a40505317b@redhat.com>
On 27/12/17 14:55, Paolo Bonzini wrote:
> On 27/12/2017 11:51, Liran Alon wrote:
>> Maybe I misunderstand you, but I don't think this is true.
>> complete_nested_posted_interrupt() relies on being called at a very
>> specific call-site: Right before VMEntry and after interrupts are
>> disabled. It works by issue a self-IPI to cause CPU to actually process
>> the nested posted-interrupts.
>>
>> In addition, I don't see how we can utilize the fact that
>> kvm_cpu_has_interrupt() calls apic_has_interrupt_for_ppr() as
>> vmx.nested.pi_desc->pir should never be synced into L0 vLAPIC IRR.
>> In contrast to vmx.pi_desc->pir which does after sync_pir_to_irr().
>
> You're right, the callbacks must stay. What confused me is the
> reference to pi_test_on in vmx_cpu_has_nested_posted_interrupt. While
> your patches are an improvement, there is still some confusion on what
> leads to injecting a nested posted interrupt, and on the split between
> "deliver" and "complete":
>
> - checking pi_test_on should only be done when consuming the L2 PI
> descriptor (which in your case is done by the processor via the
> self-IPI). So you don't need to test it before sending the self-IPI.
I agree we could have not check it. It is done as an optimization to
avoid sending a self-IPI when it won't do anything because the
vmx.nested->pi_desc.control ON bit is off.
This optimization is even more important when this code runs in L1
(think about triple-virtualization ^_^) to avoid a #VMExit on write to
LAPIC ICR.
>
> - vmx->nested.pi_pending should not be set in
> vmx_deliver_nested_posted_interrupt after patch 9. pi_pending should
> only be set by the nested PI handler, when it is invoked outside guest
> mode. It is then consumed before doing the self-IPI as in the above sketch.
I disagree.
vmx->nested.pi_pending is used as a signal to know L1 has indeed sent a
vmcs12->posted_intr_nv IPI. If vmx_deliver_nested_posted_interrupt()
will see target vCPU thread is OUTSIDE_GUEST_MODE then it won't send a
physical IPI but instead only kick vCPU. In this case, the only way the
target vCPU thread knows it should trigger self-IPI is by the fact
pi_pending was set by vmx_deliver_nested_posted_interrupt().
(As in this case, the nested PI handler in host was never invoked).
(It cannot only rely on the vmx.nested->pi_desc.control ON bit as it may
be set by L1 without L1 triggering a vmcs12->posted_intr_nv IPI)
>
> - likewise, vmx_cpu_has_nested_posted_interrupt should only check
> vmx->nested.pi_pending. I don't think it should even check
> is_guest_mode(vcpu) or vmx->nested.pi_desc; compare with the handling of
> KVM_REQ_NMI, it doesn't test nmi_allowed. This leads me to the next point.
There is an interesting question on what should happen when
(is_guest_mode(vcpu) && vmx->nested.pi_pending &&
!pi_test_on(vmx->nested.pi_desc)). I would expect L2 guest to not be
waken from it's HLT as virtual-interrupt-delivery should not see any
pending interrupt to inject. But I would expect that to also happen when
PIR is empty (even if ON bit is set) which is not checked in this
condition...
So I think I agree but we may have a small semantic issue here of waking
the L2 vCPU unnecessarily. But this should be a very rare case so maybe
handle it later (Write TODO comment).
>
> - vmx_complete_nested_posted_interrupt *must* find a valid descriptor.
> Even after a vmexit, because vmx_complete_nested_posted_interrupt is the
> very first thing that runs and L1 has had no time to run a vmwrite. So
I agree. It is better to WARN in case descriptor is not valid.
> the above could become:
>
> if (is_guest_mode(vcpu) && vmx->nested.pi_pending) {
> vmx->nested.pi_pending = false;
> apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
> POSTED_INTR_NESTED_VECTOR);
> }
>
> and then what about that is_guest_mode check? When L1 sent the PI
> notification, it thought that the destination VCPU was in guest mode.
> Upon vmexit, L1 can expect to either see ON=0 or get the notification
> interrupt itself. So either we deliver the interrupt to L1, or we must
> do L2 virtual interrupt delivery before L1 even runs. Hence IMO the
> is_guest_mode is wrong, which unfortunately throws a wrench somewhat in
> the self-IPI idea quite a bit.
>
> I agree with moving vmx_complete_nested_posted_interrupt out of
> check_nested_events, and also with the simplification of
> vmx_hwapic_irr_update. However, in my opinion the self-IPI is actually
> complicating things. Once the handling of vmx->nested.pi_pending is
> cleaned up (the nested PI interrupt handler in patch 9 is a great
> improvement in that respect), we can move
> vmx_complete_nested_posted_interrupt to a new KVM_REQ_NESTED_PI request
> that replaces vmx->nested.pi_pending. The GUEST_INTR_STATUS write can
> be done in the vmcs12, and will then be copied by prepare_vmcs02 to the
> vmcs02.
>
> I may be wrong, but based on my experience cleaning up the bare-metal
> APICv, I suspect that both the current code and this version are overly
> complicated, and there is a nice and simple core waiting to be
> extracted. Let's fix the race condition in the simple way, and
> separately focus on cleaning up pi_pending and everything that follows
> from there.
Interesting.
I think I now follow what you mean regarding cleaning logic around
pi_pending. This is how I understand it:
1. "vmx->nested.pi_pending" is a flag used to indicate "L1 has sent a
vmcs12->posted_intr_nv IPI". That's it.
2. Currently code is a bit weird in the sense that instead of signal the
pending IPI in virtual LAPIC IRR, we set it in a special variable.
If we would have set it in virtual LAPIC IRR, we could in theory behave
very similar to a standard CPU. At interrupt injection point, we could:
(a) If vCPU is in root-mode: Just inject the pending interrupt normally.
(b) If vCPU is in non-root-mode and posted-interrupts feature is active,
then instead of injecting the pending interrupt, we should simulate
processing of posted-interrupts.
3. The processing of the nested posted-interrupts itself can still be
done in self-IPI mechanism.
4. Because not doing (2), there is still currently an issue that L1
doesn't receive a vmcs12->posted_intr_nv interrupt when target vCPU
thread has exited from L2 to L1 and pi_pending=true.
Do we agree on the above? Or am I still misunderstanding something?
If we agree, I will attempt to clean code to handle it like described above.
Regards,
-Liran
>
> Paolo
>
next prev parent reply other threads:[~2017-12-27 15:15 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-24 16:12 [PATCH v3 00/11] KVM: nVMX: Fix multiple issues in nested posted-interrupts Liran Alon
2017-12-24 16:12 ` [PATCH v3 01/11] KVM: x86: Optimization: Create SVM stubs for sync_pir_to_irr() Liran Alon
2017-12-27 9:56 ` Paolo Bonzini
2017-12-27 10:01 ` Liran Alon
2017-12-24 16:12 ` [PATCH v3 02/11] KVM: x86: Change __kvm_apic_update_irr() to also return if max IRR updated Liran Alon
2018-01-02 1:51 ` Quan Xu
2017-12-24 16:12 ` [PATCH v3 03/11] KVM: nVMX: Re-evaluate L1 pending events when running L2 and L1 got posted-interrupt Liran Alon
2018-01-02 2:45 ` Quan Xu
2018-01-02 9:57 ` Liran Alon
2018-01-02 11:21 ` Quan Xu
2018-01-02 11:52 ` Quan Xu
2018-01-02 12:20 ` Liran Alon
2018-01-03 5:32 ` Quan Xu
2018-01-03 5:35 ` Quan Xu
2017-12-24 16:12 ` [PATCH v3 04/11] KVM: nVMX: Fix injection to L2 when L1 don't intercept external-interrupts Liran Alon
2017-12-24 16:12 ` [PATCH v3 05/11] KVM: x86: Rename functions which saves vCPU in per-cpu var Liran Alon
2017-12-24 16:12 ` [PATCH v3 06/11] KVM: x86: Set current_vcpu per-cpu var before enabling interrupts at host Liran Alon
2017-12-27 10:06 ` Paolo Bonzini
2017-12-27 10:44 ` Liran Alon
2017-12-24 16:12 ` [PATCH v3 07/11] KVM: x86: Add util for getting current vCPU running on CPU Liran Alon
2017-12-24 16:13 ` [PATCH v3 08/11] KVM: x86: Register empty handler for POSTED_INTR_NESTED_VECTOR IPI Liran Alon
2017-12-24 16:13 ` [PATCH v3 09/11] KVM: nVMX: Deliver missed nested-PI notification-vector via self-IPI while interrupts disabled Liran Alon
2017-12-24 16:13 ` [PATCH v3 10/11] KVM: nVMX: Wake halted L2 on nested posted-interrupt Liran Alon
2017-12-27 11:31 ` Paolo Bonzini
2017-12-27 12:01 ` Liran Alon
2017-12-27 12:27 ` Paolo Bonzini
2017-12-27 12:52 ` Liran Alon
2017-12-27 13:05 ` Paolo Bonzini
2017-12-27 15:33 ` Liran Alon
2017-12-27 15:54 ` Paolo Bonzini
2018-01-01 21:32 ` Paolo Bonzini
2018-01-01 22:37 ` Liran Alon
2018-01-02 7:25 ` Paolo Bonzini
2017-12-24 16:13 ` [PATCH v3 11/11] KVM: nVMX: Wake L2 from HLT when nested posted-interrupt pending Liran Alon
2017-12-27 10:15 ` Paolo Bonzini
2017-12-27 10:51 ` Liran Alon
2017-12-27 12:55 ` Paolo Bonzini
2017-12-27 15:15 ` Liran Alon [this message]
2017-12-27 15:55 ` Paolo Bonzini
2020-11-23 19:22 ` Oliver Upton
2020-11-23 22:42 ` Paolo Bonzini
2020-11-24 0:10 ` Oliver Upton
2020-11-24 0:13 ` Oliver Upton
2020-11-24 1:55 ` Sean Christopherson
2020-11-24 3:19 ` Sean Christopherson
2020-11-24 11:39 ` Paolo Bonzini
2020-11-24 21:22 ` Sean Christopherson
2020-11-25 0:10 ` Paolo Bonzini
2020-11-25 1:14 ` Sean Christopherson
2020-11-25 17:00 ` Paolo Bonzini
2020-11-25 18:32 ` Sean Christopherson
2020-11-26 13:13 ` Paolo Bonzini
2020-11-30 19:14 ` Sean Christopherson
2020-11-30 19:36 ` Paolo Bonzini
2020-12-03 22:07 ` Jim Mattson
2020-11-24 11:09 ` Paolo Bonzini
2020-12-03 21:45 ` Jim Mattson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5A43B911.3090500@ORACLE.COM \
--to=liran.alon@oracle.com \
--cc=idan.brown@ORACLE.COM \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=liam.merwick@ORACLE.COM \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
--cc=wanpeng.li@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.