public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, dmatlack@google.com
Subject: Re: [PATCH v2 06/25] KVM: nVMX/nSVM: do not monkey-patch inject_page_fault callback
Date: Tue, 8 Mar 2022 20:34:49 +0000	[thread overview]
Message-ID: <Yie96ZqK8NYBOMYm@google.com> (raw)
In-Reply-To: <YieOvca6qbCDgrMl@google.com>

On Tue, Mar 08, 2022, Sean Christopherson wrote:
> On Mon, Feb 21, 2022, Paolo Bonzini wrote:
> > Currently, vendor code is patching the inject_page_fault and later, on
> > vmexit, expecting kvm_init_mmu to restore the inject_page_fault callback.
> > 
> > This is brittle, as exposed by the fact that SVM KVM_SET_NESTED_STATE
> > forgets to do it.  Instead, do the check at the time a page fault actually
> > has to be injected.  This does incur the cost of an extra retpoline
> > for nested vmexits when TDP is disabled, but is overall much cleaner.
> > While at it, add a comment that explains why the different behavior
> > is needed in this case.
> > 
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> 
> If I have NAK powers, NAK NAK NAK NAK NAK :-)
> 
> Forcing a VM-Exit is a hack, e.g. it's the entire reason inject_emulated_exception()
> returns a bool.  Even worse, it's confusing and misleading due to being incomplete.
> 
> The need hack for the hack is not unique to !tdp_enabled, the #DF can be triggered
> any time L0 is intercepting #PF.  Hello, allow_smaller_maxphyaddr.
> 
> And while I think allow_smaller_maxphyaddr should be burned with fire, architecturally
> it's still incomplete.  Any exception that is injected by KVM needs to be subjected
> to nested interception checks, not just #PF.  E.g. a #GP while vectoring a different
> fault should also be routed to L1.  KVM (mostly) gets away with special casing #PF
> because that's the only common scenario where L1 wants to intercept _and fix_ a fault
> that can occur while vectoring an exception.  E.g. in the #GP => #DF case, odds are
> very good that L1 will inject a #DF too, but that doesn't make KVM's behavior correct.
> 
> I have a series to handle this by performing the interception checks when an exception
> is queued, instead of when KVM injects the excepiton, and using a second kvm_queued_exception
> field to track exceptions that are queued for VM-Exit (so as not to lose the injected
> exception, which needs to be saved into vmc*12.  It's functional, though I haven't
> tested migration (requires minor shenanigans to perform interception checks for pending
> exceptions coming in from userspace).

Here's my preferred band-aid for this so we can make inject_page_fault() constant
without having to wait for a proper fix.  It's still putting lipstick on a pig,
but is a bit more complete and IMO better documents the mess.  This slots into
your series in place of your patch without much fuss.


From: Sean Christopherson <seanjc@google.com>
Date: Thu, 3 Mar 2022 20:20:17 -0800
Subject: [PATCH] KVM: x86: Clean up and document nested #PF workaround

Replace the per-vendor hack-a-fix for KVM's #PF => #PF => #DF workaround
with an explicit, common workaround in kvm_inject_emulated_page_fault().
Aside from being a hack, the current approach is brittle and incomplete,
e.g. nSVM's KVM_SET_NESTED_STATE fails to set ->inject_page_fault(),
and nVMX fails to apply the workaround when VMX is intercepting #PF due
to allow_smaller_maxphyaddr=1.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm/nested.c       | 15 ++++++++-------
 arch/x86/kvm/vmx/nested.c       | 15 ++++++---------
 arch/x86/kvm/x86.c              | 21 ++++++++++++++++++++-
 4 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index da2f3a21e37b..c372a74acd9c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1496,6 +1496,8 @@ struct kvm_x86_ops {
 struct kvm_x86_nested_ops {
 	void (*leave_nested)(struct kvm_vcpu *vcpu);
 	int (*check_events)(struct kvm_vcpu *vcpu);
+	int (*handle_page_fault_workaround)(struct kvm_vcpu *vcpu,
+					    struct x86_exception *fault);
 	bool (*hv_timer_pending)(struct kvm_vcpu *vcpu);
 	void (*triple_fault)(struct kvm_vcpu *vcpu);
 	int (*get_state)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 96bab464967f..dd942c719cf6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -54,22 +54,25 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
 	nested_svm_vmexit(svm);
 }

-static void svm_inject_page_fault_nested(struct kvm_vcpu *vcpu, struct x86_exception *fault)
+static int nested_svm_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
+						   struct x86_exception *fault)
 {
        struct vcpu_svm *svm = to_svm(vcpu);
+
        WARN_ON(!is_guest_mode(vcpu));

 	if (vmcb12_is_intercept(&svm->nested.ctl,
 				INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) &&
-	    !svm->nested.nested_run_pending) {
+	    !WARN_ON_ONCE(svm->nested.nested_run_pending)) {
                svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR;
                svm->vmcb->control.exit_code_hi = 0;
                svm->vmcb->control.exit_info_1 = fault->error_code;
                svm->vmcb->control.exit_info_2 = fault->address;
                nested_svm_vmexit(svm);
-       } else {
-               kvm_inject_page_fault(vcpu, fault);
+	       return 0;
        }
+
+	return -EINVAL;
 }

 static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
@@ -680,9 +683,6 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
 	if (ret)
 		return ret;

-	if (!npt_enabled)
-		vcpu->arch.mmu->inject_page_fault = svm_inject_page_fault_nested;
-
 	if (!from_vmrun)
 		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);

@@ -1567,6 +1567,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 struct kvm_x86_nested_ops svm_nested_ops = {
 	.leave_nested = svm_leave_nested,
 	.check_events = svm_check_nested_events,
+	.handle_page_fault_workaround = nested_svm_handle_page_fault_workaround,
 	.triple_fault = nested_svm_triple_fault,
 	.get_nested_state_pages = svm_get_nested_state_pages,
 	.get_state = svm_get_nested_state,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f18744f7ff82..cc4c74339d35 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -476,24 +476,23 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit
 	return 0;
 }

-
-static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
-		struct x86_exception *fault)
+static int nested_vmx_handle_page_fault_workaround(struct kvm_vcpu *vcpu,
+						   struct x86_exception *fault)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

 	WARN_ON(!is_guest_mode(vcpu));

 	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
-		!to_vmx(vcpu)->nested.nested_run_pending) {
+	    !WARN_ON_ONCE(to_vmx(vcpu)->nested.nested_run_pending)) {
 		vmcs12->vm_exit_intr_error_code = fault->error_code;
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 				  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
 				  INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
 				  fault->address);
-	} else {
-		kvm_inject_page_fault(vcpu, fault);
+		return 0;
 	}
+	return -EINVAL;
 }

 static int nested_vmx_check_io_bitmap_controls(struct kvm_vcpu *vcpu,
@@ -2614,9 +2613,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
 	}

-	if (!enable_ept)
-		vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested;
-
 	if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
 	    WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
 				     vmcs12->guest_ia32_perf_global_ctrl))) {
@@ -6804,6 +6800,7 @@ __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *))
 struct kvm_x86_nested_ops vmx_nested_ops = {
 	.leave_nested = vmx_leave_nested,
 	.check_events = vmx_check_nested_events,
+	.handle_page_fault_workaround = nested_vmx_handle_page_fault_workaround,
 	.hv_timer_pending = nested_vmx_preemption_timer_pending,
 	.triple_fault = nested_vmx_triple_fault,
 	.get_state = vmx_get_nested_state,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fa1bdd9909e..010fb54a9a82 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -748,6 +748,7 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);

+/* Returns true if the page fault was immediately morphed into a VM-Exit. */
 bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 				    struct x86_exception *fault)
 {
@@ -766,8 +767,26 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 		kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
 				       fault_mmu->root.hpa);

+	/*
+	 * A workaround for KVM's bad exception handling.  If KVM injected an
+	 * exception into L2, and L2 encountered a #PF while vectoring the
+	 * injected exception, manually check to see if L1 wants to intercept
+	 * #PF, otherwise queuing the #PF will lead to #DF or a lost exception.
+	 * In all other cases, defer the check to nested_ops->check_events(),
+	 * which will correctly handle priority (this does not).  Note, other
+	 * exceptions, e.g. #GP, are theoretically affected, #PF is simply the
+	 * most problematic, e.g. when L0 and L1 are both intercepting #PF for
+	 * shadow paging.
+	 *
+	 * TODO: Rewrite exception handling to track injected and pending
+	 *       (VM-Exit) exceptions separately.
+	 */
+	if (unlikely(vcpu->arch.exception.injected && is_guest_mode(vcpu)) &&
+	    !kvm_x86_ops.nested_ops->handle_page_fault_workaround(vcpu, fault))
+		return true;
+
 	fault_mmu->inject_page_fault(vcpu, fault);
-	return fault->nested_page_fault;
+	return false;
 }
 EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);


base-commit: bb92fb66dcf8735c5190f415fed587bff7dd6717
--


  reply	other threads:[~2022-03-08 20:34 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-21 16:22 [PATCH v2 00/25] KVM MMU refactoring part 2: role changes Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 01/25] KVM: x86/mmu: avoid indirect call for get_cr3 Paolo Bonzini
2022-03-08 16:16   ` Sean Christopherson
2022-03-08 16:21     ` Paolo Bonzini
2022-03-08 16:32       ` Sean Christopherson
2022-03-08 16:43         ` Paolo Bonzini
2022-03-08 16:53           ` Sean Christopherson
2022-03-08 17:14             ` Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 02/25] KVM: x86/mmu: nested EPT cannot be used in SMM Paolo Bonzini
2022-03-08 16:18   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 03/25] KVM: x86/mmu: constify uses of struct kvm_mmu_role_regs Paolo Bonzini
2022-03-08 16:22   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 04/25] KVM: x86/mmu: pull computation of kvm_mmu_role_regs to kvm_init_mmu Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 05/25] KVM: x86/mmu: rephrase unclear comment Paolo Bonzini
2022-03-08 16:39   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 06/25] KVM: nVMX/nSVM: do not monkey-patch inject_page_fault callback Paolo Bonzini
2022-03-08 17:13   ` Sean Christopherson
2022-03-08 20:34     ` Sean Christopherson [this message]
2022-02-21 16:22 ` [PATCH v2 07/25] KVM: x86/mmu: remove "bool base_only" arguments Paolo Bonzini
2022-03-08 17:15   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 08/25] KVM: x86/mmu: split cpu_mode from mmu_role Paolo Bonzini
2022-03-08 17:36   ` Sean Christopherson
2022-03-08 17:49     ` Paolo Bonzini
2022-03-08 18:55   ` Sean Christopherson
2022-03-09  9:58     ` Paolo Bonzini
2022-03-09 15:38       ` Sean Christopherson
2022-03-09 15:40         ` Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 09/25] KVM: x86/mmu: do not recompute root level from kvm_mmu_role_regs Paolo Bonzini
2022-03-08 17:41   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 10/25] KVM: x86/mmu: remove ept_ad field Paolo Bonzini
2022-03-08 17:42   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 11/25] KVM: x86/mmu: remove kvm_calc_shadow_root_page_role_common Paolo Bonzini
2022-03-08 17:48   ` Sean Christopherson
2022-03-08 17:50     ` Paolo Bonzini
2022-03-08 18:17       ` Sean Christopherson
2022-03-08 18:18         ` Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 12/25] KVM: x86/mmu: cleanup computation of MMU roles for two-dimensional paging Paolo Bonzini
2022-03-08 18:11   ` Sean Christopherson
2022-03-08 18:24     ` Paolo Bonzini
2022-03-08 18:44       ` Sean Christopherson
2022-03-08 18:38     ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 13/25] KVM: x86/mmu: cleanup computation of MMU roles for shadow paging Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 14/25] KVM: x86/mmu: store shadow EFER.NX in the MMU role Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 15/25] KVM: x86/mmu: remove extended bits from mmu_role, rename field Paolo Bonzini
2022-03-08 19:02   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 16/25] KVM: x86/mmu: rename kvm_mmu_role union Paolo Bonzini
2022-03-08 19:15   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 17/25] KVM: x86/mmu: remove redundant bits from extended role Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 18/25] KVM: x86/mmu: remove valid " Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 19/25] KVM: x86/mmu: simplify and/or inline computation of shadow MMU roles Paolo Bonzini
2022-03-08 19:35   ` Sean Christopherson
2022-03-08 19:41     ` Sean Christopherson
2022-03-09 10:33     ` Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 20/25] KVM: x86/mmu: pull CPU mode computation to kvm_init_mmu Paolo Bonzini
2022-03-08 19:45   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 21/25] KVM: x86/mmu: replace shadow_root_level with root_role.level Paolo Bonzini
2022-03-08 19:48   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 22/25] KVM: x86/mmu: replace root_level with cpu_mode.base.level Paolo Bonzini
2022-03-08 19:49   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 23/25] KVM: x86/mmu: replace direct_map with root_role.direct Paolo Bonzini
2022-03-08 19:52   ` Sean Christopherson
2022-02-21 16:22 ` [PATCH v2 24/25] KVM: x86/mmu: initialize constant-value fields just once Paolo Bonzini
2022-03-08 20:58   ` Sean Christopherson
2022-03-09 10:34     ` Paolo Bonzini
2022-02-21 16:22 ` [PATCH v2 25/25] KVM: x86/mmu: extract initialization of the page walking data Paolo Bonzini
2022-03-08 20:02   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yie96ZqK8NYBOMYm@google.com \
    --to=seanjc@google.com \
    --cc=dmatlack@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox