From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
mlevitsk@redhat.com, vkuznets@redhat.com
Subject: Re: [PATCH v2 9/9] KVM: x86: never write to memory from kvm_vcpu_check_block
Date: Tue, 20 Sep 2022 00:32:47 +0000 [thread overview]
Message-ID: <YykKLx+EMufA+uuZ@google.com> (raw)
In-Reply-To: <20220811210605.402337-10-pbonzini@redhat.com>
On Thu, Aug 11, 2022, Paolo Bonzini wrote:
> kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore
> it cannot sleep. Writing to guest memory is therefore forbidden, but it
> can happen on AMD processors if kvm_check_nested_events() causes a vmexit.
>
> Fortunately, all events that are caught by kvm_check_nested_events() are
> also recognized by kvm_vcpu_has_events() through vendor callbacks such as
> kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so
> remove the call and postpone the actual processing to vcpu_block().
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/x86.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5e9358ea112b..9226fd536783 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10639,6 +10639,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> + if (is_guest_mode(vcpu)) {
> + /*
> + * Evaluate nested events before exiting the halted state.
> + * This allows the halt state to be recorded properly in
> + * the VMCS12's activity state field (AMD does not have
> + * a similar field and a vmexit always causes a spurious
> + * wakeup from HLT).
> + */
> + kvm_check_nested_events(vcpu);
> + }
> +
> if (kvm_apic_accept_events(vcpu) < 0)
> return 0;
Oof, this ends up yielding a really confusing code sequence. kvm_apic_accept_events()
has its own kvm_check_nested_events(), but has code to snapshot pending INITs/SIPIs
_before_ the call. Unpacked, KVM ends up with:
if (is_guest_mode(vcpu))
kvm_check_nested_events(vcpu);
/*
* Read pending events before calling the check_events
* callback.
*/
pe = smp_load_acquire(&apic->pending_events);
if (!pe)
return 0;
if (is_guest_mode(vcpu)) {
r = kvm_check_nested_events(vcpu);
if (r < 0)
return r == -EBUSY ? 0 : r;
}
if (kvm_vcpu_latch_init(vcpu)) {
WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
if (test_bit(KVM_APIC_SIPI, &pe))
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
return 0;
}
if (test_bit(KVM_APIC_INIT, &pe)) {
clear_bit(KVM_APIC_INIT, &apic->pending_events);
kvm_vcpu_reset(vcpu, true);
if (kvm_vcpu_is_bsp(apic->vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
}
if (test_bit(KVM_APIC_SIPI, &pe)) {
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
static_call(kvm_x86_vcpu_deliver_sipi_vector)(vcpu, sipi_vector);
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
}
}
which on the surface makes this code look broken, e.g. if kvm_check_nested_events()
_needs_ to be after the pending_events snapshot is taken, why is it safe to add a
kvm_check_nested_events() call immediately before the snapshot?
In reality, it's just a bunch of noise because the pending events snapshot is
completely unnecessary and subtly relies on INIT+SIPI being blocked after VM-Exit
on VMX (and SVM, but it's more important for VMX).
In particular, testing "pe" after VM-Exit is nonsensical. On VMX, events are consumed
if they trigger VM-Exit, i.e. processing INIT/SIPI is flat out wrong if the INIT/SIPI
was the direct cause of VM-Exit. On SVM, events are left pending, so if any pending
INIT/SIPI will still be there.
The VMX code works because kvm_vcpu_latch_init(), a.k.a. "is INIT blocked", is
always true after VM-Exit since INIT is always blocked in VMX root mode. Ditto for
the conditional clearing of SIPI; the CPU can't be in wait-for-SIPI immediately
after VM-Exit and so dropping SIPIs ends up being architecturally ok.
I'll add a patch to drop the snapshot code, assuming I'm not missing something even
more subtle...
next prev parent reply other threads:[~2022-09-20 0:32 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-11 21:05 [PATCH v2 0/9] KVM: x86: never write to memory from kvm_vcpu_check_block Paolo Bonzini
2022-08-11 21:05 ` [PATCH v2 1/9] KVM: x86: check validity of argument to KVM_SET_MP_STATE Paolo Bonzini
2022-08-15 13:31 ` Maxim Levitsky
2022-08-16 22:50 ` Sean Christopherson
2022-08-11 21:05 ` [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block Paolo Bonzini
2022-08-16 23:34 ` Sean Christopherson
2022-08-17 14:10 ` Maxim Levitsky
2022-08-17 15:31 ` Paolo Bonzini
2022-08-17 16:41 ` Sean Christopherson
2022-08-17 16:49 ` Paolo Bonzini
2022-09-20 0:42 ` Sean Christopherson
2022-08-11 21:05 ` [PATCH v2 3/9] KVM: x86: make kvm_vcpu_{block,halt} return whether vCPU is runnable Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 4/9] KVM: mips, x86: do not rely on KVM_REQ_UNHALT Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 5/9] KVM: remove KVM_REQ_UNHALT Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 6/9] KVM: x86: make vendor code check for all nested events Paolo Bonzini
2022-08-16 23:47 ` Sean Christopherson
2022-08-17 14:10 ` Maxim Levitsky
2022-08-11 21:06 ` [PATCH v2 7/9] KVM: nVMX: Make an event request when pending an MTF nested VM-Exit Paolo Bonzini
2022-08-17 14:11 ` Maxim Levitsky
2022-08-11 21:06 ` [PATCH v2 8/9] KVM: x86: lapic does not have to process INIT if it is blocked Paolo Bonzini
2022-08-17 0:07 ` Sean Christopherson
2022-08-17 14:11 ` Maxim Levitsky
2022-08-17 15:33 ` Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 9/9] KVM: x86: never write to memory from kvm_vcpu_check_block Paolo Bonzini
2022-08-16 23:45 ` Sean Christopherson
2022-08-17 14:11 ` Maxim Levitsky
2022-09-20 0:32 ` Sean Christopherson [this message]
2022-09-20 0:55 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YykKLx+EMufA+uuZ@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=vkuznets@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox