From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
mlevitsk@redhat.com, vkuznets@redhat.com
Subject: Re: [PATCH v2 9/9] KVM: x86: never write to memory from kvm_vcpu_check_block
Date: Tue, 20 Sep 2022 00:32:47 +0000 [thread overview]
Message-ID: <YykKLx+EMufA+uuZ@google.com> (raw)
In-Reply-To: <20220811210605.402337-10-pbonzini@redhat.com>
On Thu, Aug 11, 2022, Paolo Bonzini wrote:
> kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore
> it cannot sleep. Writing to guest memory is therefore forbidden, but it
> can happen on AMD processors if kvm_check_nested_events() causes a vmexit.
>
> Fortunately, all events that are caught by kvm_check_nested_events() are
> also recognized by kvm_vcpu_has_events() through vendor callbacks such as
> kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so
> remove the call and postpone the actual processing to vcpu_block().
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/x86.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5e9358ea112b..9226fd536783 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10639,6 +10639,17 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> + if (is_guest_mode(vcpu)) {
> + /*
> + * Evaluate nested events before exiting the halted state.
> + * This allows the halt state to be recorded properly in
> + * the VMCS12's activity state field (AMD does not have
> + * a similar field and a vmexit always causes a spurious
> + * wakeup from HLT).
> + */
> + kvm_check_nested_events(vcpu);
> + }
> +
> if (kvm_apic_accept_events(vcpu) < 0)
> return 0;
Oof, this ends up yielding a really confusing code sequence. kvm_apic_accept_events()
has its own kvm_check_nested_events(), but has code to snapshot pending INITs/SIPIs
_before_ the call. Unpacked, KVM ends up with:
if (is_guest_mode(vcpu))
kvm_check_nested_events(vcpu);
/*
* Read pending events before calling the check_events
* callback.
*/
pe = smp_load_acquire(&apic->pending_events);
if (!pe)
return 0;
if (is_guest_mode(vcpu)) {
r = kvm_check_nested_events(vcpu);
if (r < 0)
return r == -EBUSY ? 0 : r;
}
if (kvm_vcpu_latch_init(vcpu)) {
WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
if (test_bit(KVM_APIC_SIPI, &pe))
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
return 0;
}
if (test_bit(KVM_APIC_INIT, &pe)) {
clear_bit(KVM_APIC_INIT, &apic->pending_events);
kvm_vcpu_reset(vcpu, true);
if (kvm_vcpu_is_bsp(apic->vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
}
if (test_bit(KVM_APIC_SIPI, &pe)) {
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
static_call(kvm_x86_vcpu_deliver_sipi_vector)(vcpu, sipi_vector);
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
}
}
which on the surface makes this code look broken, e.g. if kvm_check_nested_events()
_needs_ to be after the pending_events snapshot is taken, why is it safe to add a
kvm_check_nested_events() call immediately before the snapshot?
In reality, it's just a bunch of noise because the pending events snapshot is
completely unnecessary and subtly relies on INIT+SIPI being blocked after VM-Exit
on VMX (and SVM, but it's more important for VMX).
In particular, testing "pe" after VM-Exit is nonsensical. On VMX, events are consumed
if they trigger VM-Exit, i.e. processing INIT/SIPI is flat out wrong if the INIT/SIPI
was the direct cause of VM-Exit. On SVM, events are left pending, so if any pending
INIT/SIPI will still be there.
The VMX code works because kvm_vcpu_latch_init(), a.k.a. "is INIT blocked", is
always true after VM-Exit since INIT is always blocked in VMX root mode. Ditto for
the conditional clearing of SIPI; the CPU can't be in wait-for-SIPI immediately
after VM-Exit and so dropping SIPIs ends up being architecturally ok.
I'll add a patch to drop the snapshot code, assuming I'm not missing something even
more subtle...
next prev parent reply other threads:[~2022-09-20 0:32 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-11 21:05 [PATCH v2 0/9] KVM: x86: never write to memory from kvm_vcpu_check_block Paolo Bonzini
2022-08-11 21:05 ` [PATCH v2 1/9] KVM: x86: check validity of argument to KVM_SET_MP_STATE Paolo Bonzini
2022-08-15 13:31 ` Maxim Levitsky
2022-08-16 22:50 ` Sean Christopherson
2022-08-11 21:05 ` [PATCH v2 2/9] KVM: x86: remove return value of kvm_vcpu_block Paolo Bonzini
2022-08-16 23:34 ` Sean Christopherson
2022-08-17 14:10 ` Maxim Levitsky
2022-08-17 15:31 ` Paolo Bonzini
2022-08-17 16:41 ` Sean Christopherson
2022-08-17 16:49 ` Paolo Bonzini
2022-09-20 0:42 ` Sean Christopherson
2022-08-11 21:05 ` [PATCH v2 3/9] KVM: x86: make kvm_vcpu_{block,halt} return whether vCPU is runnable Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 4/9] KVM: mips, x86: do not rely on KVM_REQ_UNHALT Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 5/9] KVM: remove KVM_REQ_UNHALT Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 6/9] KVM: x86: make vendor code check for all nested events Paolo Bonzini
2022-08-16 23:47 ` Sean Christopherson
2022-08-17 14:10 ` Maxim Levitsky
2022-08-11 21:06 ` [PATCH v2 7/9] KVM: nVMX: Make an event request when pending an MTF nested VM-Exit Paolo Bonzini
2022-08-17 14:11 ` Maxim Levitsky
2022-08-11 21:06 ` [PATCH v2 8/9] KVM: x86: lapic does not have to process INIT if it is blocked Paolo Bonzini
2022-08-17 0:07 ` Sean Christopherson
2022-08-17 14:11 ` Maxim Levitsky
2022-08-17 15:33 ` Paolo Bonzini
2022-08-11 21:06 ` [PATCH v2 9/9] KVM: x86: never write to memory from kvm_vcpu_check_block Paolo Bonzini
2022-08-16 23:45 ` Sean Christopherson
2022-08-17 14:11 ` Maxim Levitsky
2022-09-20 0:32 ` Sean Christopherson [this message]
2022-09-20 0:55 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YykKLx+EMufA+uuZ@google.com \
--to=seanjc@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=vkuznets@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.