From: Sean Christopherson <seanjc@google.com>
To: "Li, Xin3" <xin3.li@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"hpa@zytor.com" <hpa@zytor.com>,
"Tian, Kevin" <kevin.tian@intel.com>
Subject: Re: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
Date: Wed, 23 Nov 2022 20:42:51 +0000 [thread overview]
Message-ID: <Y36Fy/OYO5u0AzEG@google.com> (raw)
In-Reply-To: <BN6PR1101MB216162F44664713802201FAFA80C9@BN6PR1101MB2161.namprd11.prod.outlook.com>
On Wed, Nov 23, 2022, Li, Xin3 wrote:
> > Actually, SVM already enables NMIs far earlier, which means the probability of
> > breaking something by moving NMI handling earlier is lower. Not zero, as I
> > don't trust that SVM gets all the corner right, but definitely low.
Duh. Moving NMI handling much earlier in VMX's VM-Exit path _must_ be safe,
otherwise we already have problems. On VMX, NMIs are enabled right up until
VM-Enter (VMLAUNCH/VMRESUME), and unless the VM-Exit is due to an NMI, NMIs are
enabled immediately after VM-Exit.
This case is problematic because the CPU can end up running the NMI handler with
NMIs enabled, not because it might run with incorrect state loaded.
> > E.g. this should be doable
>
> Do we need to recover *all* host states before switching to NMI stack and
> handler?
As above, no.
> if not what is the minimal set?
The absolute minimal set is what is context switched via the VMCS.
> If we restore the minimal set and do "int $2" as early as possible, is it a
> quick and dirty approach?
No, it is simply "the approach". If there are other problems with respect to
noinstr, then they can/should be fixed separately.
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > cea8c07f5229..2fec93f38960 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7249,6 +7249,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu
> > *vcpu)
> > if (unlikely(vmx->exit_reason.failed_vmentry))
> > return EXIT_FASTPATH_NONE;
> >
> > + <handle NMIs here>
> > +
> > vmx->loaded_vmcs->launched = 1;
> >
> > vmx_recover_nmi_blocking(vmx);
> >
> >
> > thouh we'd like want a fair bit of refactoring so that all of vmx_vcpu_run() and
> > svm_vcpu_run() don't need to be noinstr.
For the record, svm_vcpu_run() is fine, at least as far as NMIs are concerned.
> This sounds reasonable to me, however from Documentation/core-api/entry.rst,
> we do need it.
Why do you say that?
I believe this is what we should end up with. Compile tested only, and needs to
split up into 4+ patches. I'll test and massage this into a proper series next
week (US holiday the rest of this week).
---
arch/x86/kvm/kvm_cache_regs.h | 16 +++++------
arch/x86/kvm/vmx/vmenter.S | 4 +--
arch/x86/kvm/vmx/vmx.c | 51 ++++++++++++++++++-----------------
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.h | 6 ++---
5 files changed, 41 insertions(+), 38 deletions(-)
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index c09174f73a34..af9bd0374915 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -50,26 +50,26 @@ BUILD_KVM_GPR_ACCESSORS(r15, R15)
* 1 0 register in vcpu->arch
* 1 1 register in vcpu->arch, needs to be stored back
*/
-static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}
-static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
}
-static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}
-static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 0b5db4de4d09..b104dfd282ed 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -318,7 +318,7 @@ SYM_FUNC_START(vmread_error_trampoline)
RET
SYM_FUNC_END(vmread_error_trampoline)
-SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
+SYM_FUNC_START(vmx_do_interrupt_irqoff)
/*
* Unconditionally create a stack frame, getting the correct RSP on the
* stack (for x86-64) would take two instructions anyways, and RBP can
@@ -349,4 +349,4 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
mov %_ASM_BP, %_ASM_SP
pop %_ASM_BP
RET
-SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
+SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cea8c07f5229..b721fde4f5af 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5064,8 +5064,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
vect_info = vmx->idt_vectoring_info;
intr_info = vmx_get_intr_info(vcpu);
+ /*
+ * Machine checks are handled by handle_exception_irqoff(), or by
+ * vmx_vcpu_run() if a #MC occurs on VM-Entry. NMIs are handled by
+ * vmx_vcpu_enter_exit().
+ */
if (is_machine_check(intr_info) || is_nmi(intr_info))
- return 1; /* handled by handle_exception_nmi_irqoff() */
+ return 1;
/*
* Queue the exception here instead of in handle_nm_fault_irqoff().
@@ -6755,18 +6760,6 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
}
-void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
-
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
- unsigned long entry)
-{
- bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
-
- kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
- vmx_do_interrupt_nmi_irqoff(entry);
- kvm_after_interrupt(vcpu);
-}
-
static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
{
/*
@@ -6787,9 +6780,8 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
}
-static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
+static void handle_exception_irqoff(struct vcpu_vmx *vmx)
{
- const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
/* if exit due to PF check for async PF */
@@ -6801,11 +6793,10 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
/* Handle machine checks before interrupts are enabled */
else if (is_machine_check(intr_info))
kvm_machine_check();
- /* We need to handle NMIs before interrupts are enabled */
- else if (is_nmi(intr_info))
- handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
}
+void vmx_do_interrupt_irqoff(unsigned long entry);
+
static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
@@ -6816,7 +6807,10 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;
- handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+ kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
+ vmx_do_interrupt_irqoff(gate_offset(desc));
+ kvm_after_interrupt(vcpu);
+
vcpu->arch.at_instruction_boundary = true;
}
@@ -6830,7 +6824,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
handle_external_interrupt_irqoff(vcpu);
else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
- handle_exception_nmi_irqoff(vmx);
+ handle_exception_irqoff(vmx);
}
/*
@@ -7091,6 +7085,18 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
vmx_enable_fb_clear(vmx);
+ if (unlikely(vmx->fail))
+ vmx->exit_reason.full = 0xdead;
+ else
+ vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+
+ if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
+ is_nmi(vmx_get_intr_info(vcpu))) {
+ kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
+ asm("int $2");
+ kvm_after_interrupt(vcpu);
+ }
+
guest_state_exit_irqoff();
}
@@ -7232,12 +7238,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
- if (unlikely(vmx->fail)) {
- vmx->exit_reason.full = 0xdead;
+ if (unlikely(vmx->fail))
return EXIT_FASTPATH_NONE;
- }
- vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
kvm_machine_check();
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a3da84f4ea45..eb52cfde5ec2 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -680,7 +680,7 @@ static inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
return vmx->exit_qualification;
}
-static inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
+static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9de72586f406..44d1827f0a30 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -382,13 +382,13 @@ enum kvm_intr_type {
KVM_HANDLING_NMI,
};
-static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
- enum kvm_intr_type intr)
+static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
+ enum kvm_intr_type intr)
{
WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr);
}
-static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
+static __always_inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
{
WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0);
}
base-commit: 0fa32dad1e78629cb42999dacd82489503fdf4c2
--
next prev parent reply other threads:[~2022-11-23 20:43 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-10 6:15 [RESEND PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
2022-11-10 6:15 ` [RESEND PATCH 1/6] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR Xin Li
2022-11-10 18:59 ` Ashok Raj
2022-11-10 22:09 ` Li, Xin3
2022-11-10 6:15 ` [RESEND PATCH 2/6] x86/traps: add a system interrupt table for system interrupt dispatch Xin Li
2022-11-10 8:56 ` Peter Zijlstra
2022-11-10 19:55 ` Li, Xin3
2022-11-10 20:36 ` Li, Xin3
2022-11-10 21:12 ` Nathan Chancellor
2022-11-10 23:00 ` Li, Xin3
2022-11-11 0:08 ` Nathan Chancellor
2022-11-11 3:03 ` Li, Xin3
2022-11-11 8:58 ` Peter Zijlstra
2022-11-11 1:12 ` Tian, Kevin
2022-11-11 3:54 ` Li, Xin3
2022-11-11 8:55 ` Peter Zijlstra
2022-11-11 22:07 ` H. Peter Anvin
2022-11-12 9:47 ` Peter Zijlstra
2022-11-10 6:15 ` [RESEND PATCH 3/6] x86/traps: add install_system_interrupt_handler() Xin Li
2022-11-10 6:15 ` [RESEND PATCH 4/6] x86/traps: add external_interrupt() to dispatch external interrupts Xin Li
2022-11-10 16:24 ` Sean Christopherson
2022-11-10 18:02 ` Li, Xin3
2022-11-10 20:10 ` Sean Christopherson
2022-11-10 6:15 ` [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Xin Li
2022-11-10 9:03 ` Peter Zijlstra
2022-11-10 20:53 ` Li, Xin3
2022-11-11 9:15 ` Peter Zijlstra
2022-11-11 12:04 ` Paolo Bonzini
2022-11-11 12:19 ` Peter Zijlstra
2022-11-11 12:48 ` Paolo Bonzini
2022-11-11 14:23 ` Peter Zijlstra
2022-11-11 16:35 ` Andrew Cooper
2022-11-11 22:22 ` H. Peter Anvin
2022-11-12 0:08 ` Andrew Cooper
2022-11-11 18:06 ` Li, Xin3
2022-11-11 19:33 ` Peter Zijlstra
2022-11-12 6:35 ` Li, Xin3
2022-11-14 4:39 ` Li, Xin3
2022-11-14 9:08 ` Peter Zijlstra
2022-11-15 7:50 ` Li, Xin3
2022-11-15 9:17 ` Peter Zijlstra
2022-11-17 3:37 ` Li, Xin3
2022-11-17 15:51 ` Sean Christopherson
2022-11-18 0:05 ` Li, Xin3
2022-11-22 13:00 ` Li, Xin3
2022-11-22 20:52 ` Sean Christopherson
2022-11-23 8:31 ` Li, Xin3
2022-11-23 20:42 ` Sean Christopherson [this message]
2022-11-24 3:40 ` Li, Xin3
2022-11-28 16:26 ` Sean Christopherson
2022-11-24 9:46 ` Peter Zijlstra
2022-11-28 19:05 ` Sean Christopherson
2022-11-23 9:16 ` Peter Zijlstra
2022-11-23 19:18 ` Sean Christopherson
2022-11-11 22:15 ` H. Peter Anvin
2022-11-10 6:15 ` [RESEND PATCH 6/6] x86/traps: remove unused NMI entry exc_nmi_noist() Xin Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y36Fy/OYO5u0AzEG@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
--cc=xin3.li@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.