From: Sean Christopherson <seanjc@google.com>
To: "Li, Xin3" <xin3.li@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"hpa@zytor.com" <hpa@zytor.com>,
"Tian, Kevin" <kevin.tian@intel.com>
Subject: Re: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection
Date: Wed, 23 Nov 2022 20:42:51 +0000 [thread overview]
Message-ID: <Y36Fy/OYO5u0AzEG@google.com> (raw)
In-Reply-To: <BN6PR1101MB216162F44664713802201FAFA80C9@BN6PR1101MB2161.namprd11.prod.outlook.com>
On Wed, Nov 23, 2022, Li, Xin3 wrote:
> > Actually, SVM already enables NMIs far earlier, which means the probability of
> > breaking something by moving NMI handling earlier is lower. Not zero, as I
> > don't trust that SVM gets all the corner right, but definitely low.
Duh. Moving NMI handling much earlier in VMX's VM-Exit path _must_ be safe,
otherwise we already have problems. On VMX, NMIs are enabled right up until
VM-Enter (VMLAUNCH/VMRESUME), and unless the VM-Exit is due to an NMI, NMIs are
enabled immediately after VM-Exit.
This case is problematic because the CPU can end up running the NMI handler with
NMIs enabled, not because it might run with incorrect state loaded.
> > E.g. this should be doable
>
> Do we need to recover *all* host states before switching to NMI stack and
> handler?
As above, no.
> if not what is the minimal set?
The absolute minimal set is what is context switched via the VMCS.
> If we restore the minimal set and do "int $2" as early as possible, is it a
> quick and dirty approach?
No, it is simply "the approach". If there are other problems with respect to
noinstr, then they can/should be fixed separately.
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > cea8c07f5229..2fec93f38960 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7249,6 +7249,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu
> > *vcpu)
> > if (unlikely(vmx->exit_reason.failed_vmentry))
> > return EXIT_FASTPATH_NONE;
> >
> > + <handle NMIs here>
> > +
> > vmx->loaded_vmcs->launched = 1;
> >
> > vmx_recover_nmi_blocking(vmx);
> >
> >
> > thouh we'd like want a fair bit of refactoring so that all of vmx_vcpu_run() and
> > svm_vcpu_run() don't need to be noinstr.
For the record, svm_vcpu_run() is fine, at least as far as NMIs are concerned.
> This sounds reasonable to me, however from Documentation/core-api/entry.rst,
> we do need it.
Why do you say that?
I believe this is what we should end up with. Compile tested only, and needs to
split up into 4+ patches. I'll test and massage this into a proper series next
week (US holiday the rest of this week).
---
arch/x86/kvm/kvm_cache_regs.h | 16 +++++------
arch/x86/kvm/vmx/vmenter.S | 4 +--
arch/x86/kvm/vmx/vmx.c | 51 ++++++++++++++++++-----------------
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.h | 6 ++---
5 files changed, 41 insertions(+), 38 deletions(-)
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index c09174f73a34..af9bd0374915 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -50,26 +50,26 @@ BUILD_KVM_GPR_ACCESSORS(r15, R15)
* 1 0 register in vcpu->arch
* 1 1 register in vcpu->arch, needs to be stored back
*/
-static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}
-static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
}
-static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
}
-static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
+static __always_inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
{
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 0b5db4de4d09..b104dfd282ed 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -318,7 +318,7 @@ SYM_FUNC_START(vmread_error_trampoline)
RET
SYM_FUNC_END(vmread_error_trampoline)
-SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
+SYM_FUNC_START(vmx_do_interrupt_irqoff)
/*
* Unconditionally create a stack frame, getting the correct RSP on the
* stack (for x86-64) would take two instructions anyways, and RBP can
@@ -349,4 +349,4 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
mov %_ASM_BP, %_ASM_SP
pop %_ASM_BP
RET
-SYM_FUNC_END(vmx_do_interrupt_nmi_irqoff)
+SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cea8c07f5229..b721fde4f5af 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5064,8 +5064,13 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
vect_info = vmx->idt_vectoring_info;
intr_info = vmx_get_intr_info(vcpu);
+ /*
+ * Machine checks are handled by handle_exception_irqoff(), or by
+ * vmx_vcpu_run() if a #MC occurs on VM-Entry. NMIs are handled by
+ * vmx_vcpu_enter_exit().
+ */
if (is_machine_check(intr_info) || is_nmi(intr_info))
- return 1; /* handled by handle_exception_nmi_irqoff() */
+ return 1;
/*
* Queue the exception here instead of in handle_nm_fault_irqoff().
@@ -6755,18 +6760,6 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
}
-void vmx_do_interrupt_nmi_irqoff(unsigned long entry);
-
-static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
- unsigned long entry)
-{
- bool is_nmi = entry == (unsigned long)asm_exc_nmi_noist;
-
- kvm_before_interrupt(vcpu, is_nmi ? KVM_HANDLING_NMI : KVM_HANDLING_IRQ);
- vmx_do_interrupt_nmi_irqoff(entry);
- kvm_after_interrupt(vcpu);
-}
-
static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
{
/*
@@ -6787,9 +6780,8 @@ static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
}
-static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
+static void handle_exception_irqoff(struct vcpu_vmx *vmx)
{
- const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
u32 intr_info = vmx_get_intr_info(&vmx->vcpu);
/* if exit due to PF check for async PF */
@@ -6801,11 +6793,10 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
/* Handle machine checks before interrupts are enabled */
else if (is_machine_check(intr_info))
kvm_machine_check();
- /* We need to handle NMIs before interrupts are enabled */
- else if (is_nmi(intr_info))
- handle_interrupt_nmi_irqoff(&vmx->vcpu, nmi_entry);
}
+void vmx_do_interrupt_irqoff(unsigned long entry);
+
static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
@@ -6816,7 +6807,10 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
"KVM: unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;
- handle_interrupt_nmi_irqoff(vcpu, gate_offset(desc));
+ kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
+ vmx_do_interrupt_irqoff(gate_offset(desc));
+ kvm_after_interrupt(vcpu);
+
vcpu->arch.at_instruction_boundary = true;
}
@@ -6830,7 +6824,7 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
handle_external_interrupt_irqoff(vcpu);
else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
- handle_exception_nmi_irqoff(vmx);
+ handle_exception_irqoff(vmx);
}
/*
@@ -7091,6 +7085,18 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
vmx_enable_fb_clear(vmx);
+ if (unlikely(vmx->fail))
+ vmx->exit_reason.full = 0xdead;
+ else
+ vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
+
+ if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
+ is_nmi(vmx_get_intr_info(vcpu))) {
+ kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
+ asm("int $2");
+ kvm_after_interrupt(vcpu);
+ }
+
guest_state_exit_irqoff();
}
@@ -7232,12 +7238,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
- if (unlikely(vmx->fail)) {
- vmx->exit_reason.full = 0xdead;
+ if (unlikely(vmx->fail))
return EXIT_FASTPATH_NONE;
- }
- vmx->exit_reason.full = vmcs_read32(VM_EXIT_REASON);
if (unlikely((u16)vmx->exit_reason.basic == EXIT_REASON_MCE_DURING_VMENTRY))
kvm_machine_check();
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a3da84f4ea45..eb52cfde5ec2 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -680,7 +680,7 @@ static inline unsigned long vmx_get_exit_qual(struct kvm_vcpu *vcpu)
return vmx->exit_qualification;
}
-static inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
+static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9de72586f406..44d1827f0a30 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -382,13 +382,13 @@ enum kvm_intr_type {
KVM_HANDLING_NMI,
};
-static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
- enum kvm_intr_type intr)
+static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
+ enum kvm_intr_type intr)
{
WRITE_ONCE(vcpu->arch.handling_intr_from_guest, (u8)intr);
}
-static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
+static __always_inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
{
WRITE_ONCE(vcpu->arch.handling_intr_from_guest, 0);
}
base-commit: 0fa32dad1e78629cb42999dacd82489503fdf4c2
--
next prev parent reply other threads:[~2022-11-23 20:43 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-10 6:15 [RESEND PATCH 0/6] x86/traps,VMX: implement software based NMI/IRQ dispatch for VMX NMI/IRQ reinjection Xin Li
2022-11-10 6:15 ` [RESEND PATCH 1/6] x86/traps: let common_interrupt() handle IRQ_MOVE_CLEANUP_VECTOR Xin Li
2022-11-10 18:59 ` Ashok Raj
2022-11-10 22:09 ` Li, Xin3
2022-11-10 6:15 ` [RESEND PATCH 2/6] x86/traps: add a system interrupt table for system interrupt dispatch Xin Li
2022-11-10 8:56 ` Peter Zijlstra
2022-11-10 19:55 ` Li, Xin3
2022-11-10 20:36 ` Li, Xin3
2022-11-10 21:12 ` Nathan Chancellor
2022-11-10 23:00 ` Li, Xin3
2022-11-11 0:08 ` Nathan Chancellor
2022-11-11 3:03 ` Li, Xin3
2022-11-11 8:58 ` Peter Zijlstra
2022-11-11 1:12 ` Tian, Kevin
2022-11-11 3:54 ` Li, Xin3
2022-11-11 8:55 ` Peter Zijlstra
2022-11-11 22:07 ` H. Peter Anvin
2022-11-12 9:47 ` Peter Zijlstra
2022-11-10 6:15 ` [RESEND PATCH 3/6] x86/traps: add install_system_interrupt_handler() Xin Li
2022-11-10 6:15 ` [RESEND PATCH 4/6] x86/traps: add external_interrupt() to dispatch external interrupts Xin Li
2022-11-10 16:24 ` Sean Christopherson
2022-11-10 18:02 ` Li, Xin3
2022-11-10 20:10 ` Sean Christopherson
2022-11-10 6:15 ` [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Xin Li
2022-11-10 9:03 ` Peter Zijlstra
2022-11-10 20:53 ` Li, Xin3
2022-11-11 9:15 ` Peter Zijlstra
2022-11-11 12:04 ` Paolo Bonzini
2022-11-11 12:19 ` Peter Zijlstra
2022-11-11 12:48 ` Paolo Bonzini
2022-11-11 14:23 ` Peter Zijlstra
2022-11-11 16:35 ` Andrew Cooper
2022-11-11 22:22 ` H. Peter Anvin
2022-11-12 0:08 ` Andrew Cooper
2022-11-11 18:06 ` Li, Xin3
2022-11-11 19:33 ` Peter Zijlstra
2022-11-12 6:35 ` Li, Xin3
2022-11-14 4:39 ` Li, Xin3
2022-11-14 9:08 ` Peter Zijlstra
2022-11-15 7:50 ` Li, Xin3
2022-11-15 9:17 ` Peter Zijlstra
2022-11-17 3:37 ` Li, Xin3
2022-11-17 15:51 ` Sean Christopherson
2022-11-18 0:05 ` Li, Xin3
2022-11-22 13:00 ` Li, Xin3
2022-11-22 20:52 ` Sean Christopherson
2022-11-23 8:31 ` Li, Xin3
2022-11-23 20:42 ` Sean Christopherson [this message]
2022-11-24 3:40 ` Li, Xin3
2022-11-28 16:26 ` Sean Christopherson
2022-11-24 9:46 ` Peter Zijlstra
2022-11-28 19:05 ` Sean Christopherson
2022-11-23 9:16 ` Peter Zijlstra
2022-11-23 19:18 ` Sean Christopherson
2022-11-11 22:15 ` H. Peter Anvin
2022-11-10 6:15 ` [RESEND PATCH 6/6] x86/traps: remove unused NMI entry exc_nmi_noist() Xin Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y36Fy/OYO5u0AzEG@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
--cc=xin3.li@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).