[PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups
@ 2024-05-07 15:58 Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation Paolo Bonzini
                   ` (16 more replies)
  0 siblings, 17 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm

This is an updated version of the series at
https://patchew.org/linux/20240228024147.41573-1-seanjc@google.com/
which has been used for SEV-SNP development, taking into account
all review comments.  Patch 8 is the only completely new patch
(and even then it had been posted together with the various
TDX/SNP prep series).

Here is an annotated git range-diff:

==============================================================================
KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation
- tweak commit message, add comment

    @@ Commit message
         Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault
         triggers emulation of any kind, as KVM doesn't currently support emulating
         access to guest private memory.  Practically speaking, private faults and
    -    emulation are already mutually exclusive, but there are edge cases upon
    -    edge cases where KVM can return RET_PF_EMULATE, and adding one last check
    -    to harden against weird, unexpected combinations is inexpensive.
    +    emulation are already mutually exclusive, but there are many flow that
    +    can result in KVM returning RET_PF_EMULATE, and adding one last check
    +    to harden against weird, unexpected combinations and/or KVM bugs is
    +    inexpensive.
     
         Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
         Signed-off-by: Sean Christopherson <seanjc@google.com>
    @@ arch/x86/kvm/mmu/mmu_internal.h: static inline int kvm_mmu_do_page_fault(struct
      	else
      		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
      
    ++	/*
    ++	 * Not sure what's happening, but punt to userspace and hope that
    ++	 * they can fix it by changing memory to shared, or they can
    ++	 * provide a better error.
    ++	 */
     +	if (r == RET_PF_EMULATE && fault.is_private) {
    ++		pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
     +		kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
     +		return -EFAULT;
     +	}


==============================================================================
KVM: x86: Remove separate "bit" defines for page fault error code masks
- do not use ilog2

    @@ Commit message
         just to see which flag corresponds to which bit is quite annoying, as is
         having to define two macros just to add recognition of a new flag.
     
    -    Use ilog2() to derive the bit in permission_fault(), the one function that
    -    actually needs the bit number (it does clever shifting to manipulate flags
    -    in order to avoid conditional branches).
    +    Use ternary operator to derive the bit in permission_fault(), the one
    +    function that actually needs the bit number as part of clever shifting
    +    to avoid conditional branches.  Generally the compiler is able to turn
    +    it into a conditional move, and if not it's not really a big deal.
     
         No functional change intended.
     
    @@ arch/x86/kvm/mmu.h: static inline u8 permission_fault(struct kvm_vcpu *vcpu, str
      	u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
      	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
     -	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
    -+	int index = (pfec + (not_smap << ilog2(PFERR_RSVD_MASK))) >> 1;
    ++	int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
      	u32 errcode = PFERR_PRESENT_MASK;
      	bool fault;
      
     @@ arch/x86/kvm/mmu.h: static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
    + 		pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
      
      		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
    - 		offset = (pfec & ~1) +
    +-		offset = (pfec & ~1) +
     -			((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));
    -+			((pte_access & PT_USER_MASK) << (ilog2(PFERR_RSVD_MASK) - PT_USER_SHIFT));
    ++		offset = (pfec & ~1) | ((pte_access & PT_USER_MASK) ? PFERR_RSVD_MASK : 0);
      
      		pkru_bits &= mmu->pkru_mask >> offset;
      		errcode |= -pkru_bits & PFERR_PK_MASK;

==============================================================================
KVM: x86: Define more SEV+ page fault error bits/flags for #NPF
- match commit message to bits defined in header file

    @@ Commit message
         Define more #NPF error code flags that are relevant to SEV+ (mostly SNP)
         guests, as specified by the APM:
     
    +     * Bit 31 (RMP):   Set to 1 if the fault was caused due to an RMP check or a
    +                       VMPL check failure, 0 otherwise.
          * Bit 34 (ENC):   Set to 1 if the guest’s effective C-bit was 1, 0 otherwise.
          * Bit 35 (SIZEM): Set to 1 if the fault was caused by a size mismatch between
                            PVALIDATE or RMPADJUST and the RMP, 0 otherwise.
          * Bit 36 (VMPL):  Set to 1 if the fault was caused by a VMPL permission
                            check failure, 0 otherwise.
    -     * Bit 37 (SSS):   Set to VMPL permission mask SSS (bit 4) value if VmplSSS is
    -                       enabled.
     
         Note, the APM is *extremely* misleading, and strongly implies that the
         above flags can _only_ be set for #NPF exits from SNP guests.  That is a


==============================================================================
KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
- add a description of PFERR_IMPLICIT_ACCESS

    @@ Commit message
         Add a compile-time assert in the legacy #PF handler to make sure that KVM-
         define flags are covered by its existing sanity check on the upper bits.
     
    +    Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
    +    are removing the comment that defined it.
    +
         Signed-off-by: Sean Christopherson <seanjc@google.com>
    +    Reviewed-by: Kai Huang <kai.huang@intel.com>
    +    Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
         Message-ID: <20240228024147.41573-8-seanjc@google.com>
    +    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    +
    + ## arch/x86/include/asm/kvm_host.h ##
    +@@ arch/x86/include/asm/kvm_host.h: enum x86_intercept_stage;
    + #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
    + #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
    + #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
    ++
    ++/*
    ++ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
    ++ * when emulating instructions that triggers implicit access.
    ++ */
    + #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
    ++#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
    + 
    + #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
    + 				 PFERR_WRITE_MASK |		\
     
      ## arch/x86/kvm/mmu/mmu.c ##
     @@ arch/x86/kvm/mmu/mmu.c: int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
    - 	if (WARN_ON_ONCE(error_code >> 32))
    - 		error_code = lower_32_bits(error_code);
    + 		return -EFAULT;
    + #endif
      
     +	/* Ensure the above sanity check also covers KVM-defined flags. */
     +	BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));

- move in front of the other synthetic page fault error code patches

==============================================================================
KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero
  commit message copy editing

    @@ Commit message
         and even more explicitly in the SDM as VMCS.VM_EXIT_INTR_ERROR_CODE is a
         32-bit field.
     
    -    Simply drop the upper bits of hardware provides garbage, as spurious
    +    Simply drop the upper bits if hardware provides garbage, as spurious
         information should do no harm (though in all likelihood hardware is buggy
         and the kernel is doomed).
     
    @@ Commit message
         which in turn will allow deriving PFERR_PRIVATE_ACCESS from AMD's
         PFERR_GUEST_ENC_MASK without running afoul of the sanity check.
     
    -    Note, this also why Intel uses bit 15 for SGX (highest bit on Intel CPUs)
    +    Note, this is also why Intel uses bit 15 for SGX (highest bit on Intel CPUs)
         and AMD uses bit 31 for RMP (highest bit on AMD CPUs); using the highest
         bit minimizes the probability of a collision with the "other" vendor,
         without needing to plumb more bits through microcode.


==============================================================================
KVM: x86/mmu: Use synthetic page fault error code to indicate private faults
- patch reordering, no other changes

==============================================================================
KVM: x86/mmu: check for invalid async page faults involving private memory
- new patch coming from TDX/SNP prep series; test PFERR_PRIVATE_ACCESS, set arch.error_code

    @@ arch/x86/kvm/mmu/mmu.c: static u32 alloc_apf_token(struct kvm_vcpu *vcpu)
      	arch.token = alloc_apf_token(vcpu);
     -	arch.gfn = gfn;
     +	arch.gfn = fault->gfn;
    ++	arch.error_code = fault->error_code;
      	arch.direct_map = vcpu->arch.mmu->root_role.direct;
      	arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);
      
    @@ arch/x86/kvm/mmu/mmu.c: static u32 alloc_apf_token(struct kvm_vcpu *vcpu)
      {
      	int r;
      
    -+	if (WARN_ON_ONCE(work->arch.error_code & PFERR_GUEST_ENC_MASK))
    ++	if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS))
     +		return;
     +
      	if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||


==============================================================================
KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults
- exit to userspace if the wrong case happens, test PFERR_PRIVATE_ACCESS

    @@ Commit message
     
      ## arch/x86/kvm/mmu/mmu.c ##
     @@ arch/x86/kvm/mmu/mmu.c: int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
    - 		error_code |= PFERR_PRIVATE_ACCESS;
      
      	r = RET_PF_INVALID;
    --	if (unlikely(error_code & PFERR_RSVD_MASK)) {
    -+	if (unlikely((error_code & PFERR_RSVD_MASK) &&
    -+		     !WARN_ON_ONCE(error_code & PFERR_GUEST_ENC_MASK))) {
    + 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
    ++		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
    ++			return -EFAULT;
    ++
      		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
      		if (r == RET_PF_EMULATE)
      			goto emulate;

==============================================================================
KVM: x86/mmu: Move private vs. shared check above slot validity checks
- add comment about use of mmu_invalidate_seq

    @@ arch/x86/kvm/mmu/mmu.c: static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, stru
      		return kvm_faultin_pfn_private(vcpu, fault);
      
     @@ arch/x86/kvm/mmu/mmu.c: static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
    + {
    + 	int ret;
    + 
    ++	/*
    ++	 * Note that the mmu_invalidate_seq also serves to detect a concurrent
    ++	 * change in attributes.  is_page_fault_stale() will detect an
    ++	 * invalidation relate to fault->fn and resume the guest without
    ++	 * installing a mapping in the page tables.
    ++	 */
      	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
      	smp_rmb();
      
     +	/*
    -+	 * Check for a private vs. shared mismatch *after* taking a snapshot of
    -+	 * mmu_invalidate_seq, as changes to gfn attributes are guarded by the
    -+	 * invalidation notifier.
    ++	 * Now that we have a snapshot of mmu_invalidate_seq we can check for a
    ++	 * private vs. shared mismatch.
     +	 */
     +	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
     +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);



==============================================================================
KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn()
- differences in moved code, range-diff is unreadable


==============================================================================
KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()
- remove unnecessary change

    @@ arch/x86/kvm/mmu/mmu.c: static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct
      		return kvm_handle_noslot_fault(vcpu, fault, access);
      
      	/*
    -
    - ## arch/x86/kvm/mmu/mmu_internal.h ##
    -@@ arch/x86/kvm/mmu/mmu_internal.h: struct kvm_page_fault {
    - 	/* The memslot containing gfn. May be NULL. */
    - 	struct kvm_memory_slot *slot;
    - 
    --	/* Outputs of kvm_faultin_pfn.  */
    -+	/* Outputs of kvm_faultin_pfn. */
    - 	unsigned long mmu_seq;
    - 	kvm_pfn_t pfn;
    - 	hva_t hva;


Isaku Yamahata (1):
  KVM: x86/mmu: Pass full 64-bit error code when handling page faults

Paolo Bonzini (1):
  KVM: x86/mmu: check for invalid async page faults involving private
    memory

Sean Christopherson (15):
  KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits
    emulation
  KVM: x86: Remove separate "bit" defines for page fault error code
    masks
  KVM: x86: Define more SEV+ page fault error bits/flags for #NPF
  KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
  KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are
    non-zero
  KVM: x86/mmu: Use synthetic page fault error code to indicate private
    faults
  KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page
    faults
  KVM: x86/mmu: Move private vs. shared check above slot validity checks
  KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC
    internal slots
  KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO
  KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to
    kvm_faultin_pfn()
  KVM: x86/mmu: Handle no-slot faults at the beginning of
    kvm_faultin_pfn()
  KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot"
    faults
  KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values
  KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create
    noslot pfns

 arch/x86/include/asm/kvm_host.h |  46 ++++----
 arch/x86/kvm/mmu.h              |   5 +-
 arch/x86/kvm/mmu/mmu.c          | 182 ++++++++++++++++++++------------
 arch/x86/kvm/mmu/mmu_internal.h |  28 ++++-
 arch/x86/kvm/mmu/mmutrace.h     |   2 +-
 arch/x86/kvm/svm/svm.c          |   9 ++
 6 files changed, 174 insertions(+), 98 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  5:25   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 02/17] KVM: x86: Remove separate "bit" defines for page fault error code masks Paolo Bonzini
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, Yan Zhao

From: Sean Christopherson <seanjc@google.com>

Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault
triggers emulation of any kind, as KVM doesn't currently support emulating
access to guest private memory.  Practically speaking, private faults and
emulation are already mutually exclusive, but there are many flow that
can result in KVM returning RET_PF_EMULATE, and adding one last check
to harden against weird, unexpected combinations and/or KVM bugs is
inexpensive.

Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240228024147.41573-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c          |  8 --------
 arch/x86/kvm/mmu/mmu_internal.h | 19 +++++++++++++++++++
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 45b6d8f9e359..c72a2033ca96 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4257,14 +4257,6 @@ static inline u8 kvm_max_level_for_order(int order)
 	return PG_LEVEL_4K;
 }
 
-static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
-					      struct kvm_page_fault *fault)
-{
-	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
-				      PAGE_SIZE, fault->write, fault->exec,
-				      fault->is_private);
-}
-
 static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 				   struct kvm_page_fault *fault)
 {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 5390a591a571..61f49967047a 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -279,6 +279,14 @@ enum {
 	RET_PF_SPURIOUS,
 };
 
+static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+						     struct kvm_page_fault *fault)
+{
+	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
+				      PAGE_SIZE, fault->write, fault->exec,
+				      fault->is_private);
+}
+
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 					u32 err, bool prefetch, int *emulation_type)
 {
@@ -320,6 +328,17 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	else
 		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
 
+	/*
+	 * Not sure what's happening, but punt to userspace and hope that
+	 * they can fix it by changing memory to shared, or they can
+	 * provide a better error.
+	 */
+	if (r == RET_PF_EMULATE && fault.is_private) {
+		pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
+		kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
+		return -EFAULT;
+	}
+
 	if (fault.write_fault_to_shadow_pgtable && emulation_type)
 		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 02/17] KVM: x86: Remove separate "bit" defines for page fault error code masks
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  5:29   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 03/17] KVM: x86: Define more SEV+ page fault error bits/flags for #NPF Paolo Bonzini
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

Open code the bit number directly in the PFERR_* masks and drop the
intermediate PFERR_*_BIT defines, as having to bounce through two macros
just to see which flag corresponds to which bit is quite annoying, as is
having to define two macros just to add recognition of a new flag.

Use ternary operator to derive the bit in permission_fault(), the one
function that actually needs the bit number as part of clever shifting
to avoid conditional branches.  Generally the compiler is able to turn
it into a conditional move, and if not it's not really a big deal.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20240228024147.41573-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 32 ++++++++++----------------------
 arch/x86/kvm/mmu.h              |  5 ++---
 2 files changed, 12 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9f92bdb78504..a047480da5af 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -254,28 +254,16 @@ enum x86_intercept_stage;
 	KVM_GUESTDBG_INJECT_DB | \
 	KVM_GUESTDBG_BLOCKIRQ)
 
-
-#define PFERR_PRESENT_BIT 0
-#define PFERR_WRITE_BIT 1
-#define PFERR_USER_BIT 2
-#define PFERR_RSVD_BIT 3
-#define PFERR_FETCH_BIT 4
-#define PFERR_PK_BIT 5
-#define PFERR_SGX_BIT 15
-#define PFERR_GUEST_FINAL_BIT 32
-#define PFERR_GUEST_PAGE_BIT 33
-#define PFERR_IMPLICIT_ACCESS_BIT 48
-
-#define PFERR_PRESENT_MASK	BIT(PFERR_PRESENT_BIT)
-#define PFERR_WRITE_MASK	BIT(PFERR_WRITE_BIT)
-#define PFERR_USER_MASK		BIT(PFERR_USER_BIT)
-#define PFERR_RSVD_MASK		BIT(PFERR_RSVD_BIT)
-#define PFERR_FETCH_MASK	BIT(PFERR_FETCH_BIT)
-#define PFERR_PK_MASK		BIT(PFERR_PK_BIT)
-#define PFERR_SGX_MASK		BIT(PFERR_SGX_BIT)
-#define PFERR_GUEST_FINAL_MASK	BIT_ULL(PFERR_GUEST_FINAL_BIT)
-#define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
-#define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+#define PFERR_PRESENT_MASK	BIT(0)
+#define PFERR_WRITE_MASK	BIT(1)
+#define PFERR_USER_MASK		BIT(2)
+#define PFERR_RSVD_MASK		BIT(3)
+#define PFERR_FETCH_MASK	BIT(4)
+#define PFERR_PK_MASK		BIT(5)
+#define PFERR_SGX_MASK		BIT(15)
+#define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
+#define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)
+#define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 60f21bb4c27b..2343c9f00e31 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -213,7 +213,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	 */
 	u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
 	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
-	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
+	int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
 	u32 errcode = PFERR_PRESENT_MASK;
 	bool fault;
 
@@ -234,8 +234,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
 
 		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
-		offset = (pfec & ~1) +
-			((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));
+		offset = (pfec & ~1) | ((pte_access & PT_USER_MASK) ? PFERR_RSVD_MASK : 0);
 
 		pkru_bits &= mmu->pkru_mask >> offset;
 		errcode |= -pkru_bits & PFERR_PK_MASK;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 03/17] KVM: x86: Define more SEV+ page fault error bits/flags for #NPF
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 02/17] KVM: x86: Remove separate "bit" defines for page fault error code masks Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler Paolo Bonzini
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

Define more #NPF error code flags that are relevant to SEV+ (mostly SNP)
guests, as specified by the APM:

 * Bit 31 (RMP):   Set to 1 if the fault was caused due to an RMP check or a
                   VMPL check failure, 0 otherwise.
 * Bit 34 (ENC):   Set to 1 if the guest’s effective C-bit was 1, 0 otherwise.
 * Bit 35 (SIZEM): Set to 1 if the fault was caused by a size mismatch between
                   PVALIDATE or RMPADJUST and the RMP, 0 otherwise.
 * Bit 36 (VMPL):  Set to 1 if the fault was caused by a VMPL permission
                   check failure, 0 otherwise.

Note, the APM is *extremely* misleading, and strongly implies that the
above flags can _only_ be set for #NPF exits from SNP guests.  That is a
lie, as bit 34 (C-bit=1, i.e. was encrypted) can be set when running _any_
flavor of SEV guest on SNP capable hardware.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240228024147.41573-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a047480da5af..58bbcf76ad1e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -261,8 +261,12 @@ enum x86_intercept_stage;
 #define PFERR_FETCH_MASK	BIT(4)
 #define PFERR_PK_MASK		BIT(5)
 #define PFERR_SGX_MASK		BIT(15)
+#define PFERR_GUEST_RMP_MASK	BIT_ULL(31)
 #define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
 #define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)
+#define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
+#define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
+#define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (2 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 03/17] KVM: x86: Define more SEV+ page fault error bits/flags for #NPF Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  5:50   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 05/17] KVM: x86/mmu: Pass full 64-bit error code when handling page faults Paolo Bonzini
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang, Binbin Wu

From: Sean Christopherson <seanjc@google.com>

Move the sanity check that hardware never sets bits that collide with KVM-
define synthetic bits from kvm_mmu_page_fault() to npf_interception(),
i.e. make the sanity check #NPF specific.  The legacy #PF path already
WARNs if _any_ of bits 63:32 are set, and the error code that comes from
VMX's EPT Violatation and Misconfig is 100% synthesized (KVM morphs VMX's
EXIT_QUALIFICATION into error code flags).

Add a compile-time assert in the legacy #PF handler to make sure that KVM-
define flags are covered by its existing sanity check on the upper bits.

Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
are removing the comment that defined it.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20240228024147.41573-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/mmu/mmu.c          | 14 +++-----------
 arch/x86/kvm/svm/svm.c          |  9 +++++++++
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 58bbcf76ad1e..12e727301262 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -267,7 +267,13 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
 #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
 #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
+
+/*
+ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
+ * when emulating instructions that triggers implicit access.
+ */
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
+#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c72a2033ca96..5562d693880a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 		return -EFAULT;
 #endif
 
+	/* Ensure the above sanity check also covers KVM-defined flags. */
+	BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));
+
 	vcpu->arch.l1tf_flush_l1d = true;
 	if (!flags) {
 		trace_kvm_page_fault(vcpu, fault_address, error_code);
@@ -5786,17 +5789,6 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	int r, emulation_type = EMULTYPE_PF;
 	bool direct = vcpu->arch.mmu->root_role.direct;
 
-	/*
-	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
-	 * checks when emulating instructions that triggers implicit access.
-	 * WARN if hardware generates a fault with an error code that collides
-	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
-	 * don't terminate the VM, as KVM can't possibly be relying on a flag
-	 * that KVM doesn't know about.
-	 */
-	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
-		error_code &= ~PFERR_IMPLICIT_ACCESS;
-
 	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0f3b59da0d4a..535018f152a3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2047,6 +2047,15 @@ static int npf_interception(struct kvm_vcpu *vcpu)
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
+	/*
+	 * WARN if hardware generates a fault with an error code that collides
+	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
+	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
+	 * flag that KVM doesn't know about.
+	 */
+	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
+		error_code &= ~PFERR_SYNTHETIC_MASK;
+
 	trace_kvm_page_fault(vcpu, fault_address, error_code);
 	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
 			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 05/17] KVM: x86/mmu: Pass full 64-bit error code when handling page faults
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (3 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 06/17] KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero Paolo Bonzini
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Isaku Yamahata, Michael Roth, Sean Christopherson, Xiaoyao Li

From: Isaku Yamahata <isaku.yamahata@intel.com>

Plumb the full 64-bit error code throughout the page fault handling code
so that KVM can use the upper 32 bits, e.g. SNP's PFERR_GUEST_ENC_MASK
will be used to determine whether or not a fault is private vs. shared.

Note, passing the 64-bit error code to FNAME(walk_addr)() does NOT change
the behavior of permission_fault() when invoked in the page fault path, as
KVM explicitly clears PFERR_IMPLICIT_ACCESS in kvm_mmu_page_fault().

Continue passing '0' from the async #PF worker, as guest_memfd and thus
private memory doesn't support async page faults.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[mdr: drop references/changes on rebase, update commit message]
Signed-off-by: Michael Roth <michael.roth@amd.com>
[sean: drop truncation in call to FNAME(walk_addr)(), rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240228024147.41573-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c          | 3 +--
 arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
 arch/x86/kvm/mmu/mmutrace.h     | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5562d693880a..28b1e7657d7f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5800,8 +5800,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false,
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
 					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 61f49967047a..797b80f996a7 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -190,7 +190,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 struct kvm_page_fault {
 	/* arguments to kvm_mmu_do_page_fault.  */
 	const gpa_t addr;
-	const u32 error_code;
+	const u64 error_code;
 	const bool prefetch;
 
 	/* Derived from error_code.  */
@@ -288,7 +288,7 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 }
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch, int *emulation_type)
+					u64 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index ae86820cef69..195d98bc8de8 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -260,7 +260,7 @@ TRACE_EVENT(
 	TP_STRUCT__entry(
 		__field(int, vcpu_id)
 		__field(gpa_t, cr2_or_gpa)
-		__field(u32, error_code)
+		__field(u64, error_code)
 		__field(u64 *, sptep)
 		__field(u64, old_spte)
 		__field(u64, new_spte)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 06/17] KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (4 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 05/17] KVM: x86/mmu: Pass full 64-bit error code when handling page faults Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults Paolo Bonzini
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

From: Sean Christopherson <seanjc@google.com>

WARN if bits 63:32 are non-zero when handling an intercepted legacy #PF,
as the error code for #PF is limited to 32 bits (and in practice, 16 bits
on Intel CPUS).  This behavior is architectural, is part of KVM's ABI
(see kvm_vcpu_events.error_code), and is explicitly documented as being
preserved for intecerpted #PF in both the APM:

  The error code saved in EXITINFO1 is the same as would be pushed onto
  the stack by a non-intercepted #PF exception in protected mode.

and even more explicitly in the SDM as VMCS.VM_EXIT_INTR_ERROR_CODE is a
32-bit field.

Simply drop the upper bits if hardware provides garbage, as spurious
information should do no harm (though in all likelihood hardware is buggy
and the kernel is doomed).

Handling all upper 32 bits in the #PF path will allow moving the sanity
check on synthetic checks from kvm_mmu_page_fault() to npf_interception(),
which in turn will allow deriving PFERR_PRIVATE_ACCESS from AMD's
PFERR_GUEST_ENC_MASK without running afoul of the sanity check.

Note, this is also why Intel uses bit 15 for SGX (highest bit on Intel CPUs)
and AMD uses bit 31 for RMP (highest bit on AMD CPUs); using the highest
bit minimizes the probability of a collision with the "other" vendor,
without needing to plumb more bits through microcode.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 28b1e7657d7f..3609167ba30e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4501,6 +4501,13 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	if (WARN_ON_ONCE(fault_address >> 32))
 		return -EFAULT;
 #endif
+	/*
+	 * Legacy #PF exception only have a 32-bit error code.  Simply drop the
+	 * upper bits as KVM doesn't use them for #PF (because they are never
+	 * set), and to ensure there are no collisions with KVM-defined bits.
+	 */
+	if (WARN_ON_ONCE(error_code >> 32))
+		error_code = lower_32_bits(error_code);

 	/* Ensure the above sanity check also covers KVM-defined flags. */
 	BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (5 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 06/17] KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  5:56   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 08/17] KVM: x86/mmu: check for invalid async page faults involving private memory Paolo Bonzini
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

Add and use a synthetic, KVM-defined page fault error code to indicate
whether a fault is to private vs. shared memory.  TDX and SNP have
different mechanisms for reporting private vs. shared, and KVM's
software-protected VMs have no mechanism at all.  Usurp an error code
flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
and friends.

Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
for TDX and software-protected VMs as appropriate, but that would require
*clearing* the flag for SEV and SEV-ES VMs, which support encrypted
memory at the hardware layer, but don't utilize private memory at the
KVM layer.

Opportunistically add a comment to call out that the logic for software-
protected VMs is (and was before this commit) broken for nested MMUs, i.e.
for nested TDP, as the GPA is an L2 GPA.  Punt on trying to play nice with
nested MMUs as there is a _lot_ of functionality that simply doesn't work
for software-protected VMs, e.g. all of the paths where KVM accesses guest
memory need to be updated to be aware of private vs. shared memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20240228024147.41573-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  7 ++++++-
 arch/x86/kvm/mmu/mmu.c          | 14 ++++++++++++++
 arch/x86/kvm/mmu/mmu_internal.h |  2 +-
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 12e727301262..0dc755a6dc0c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -273,7 +273,12 @@ enum x86_intercept_stage;
  * when emulating instructions that triggers implicit access.
  */
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
-#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
+/*
+ * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred
+ * when the guest was accessing private memory.
+ */
+#define PFERR_PRIVATE_ACCESS   BIT_ULL(49)
+#define PFERR_SYNTHETIC_MASK   (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3609167ba30e..eb041acec2dc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5799,6 +5799,20 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
 
+	/*
+	 * Except for reserved faults (emulated MMIO is shared-only), set the
+	 * PFERR_PRIVATE_ACCESS flag for software-protected VMs based on the gfn's
+	 * current attributes, which are the source of truth for such VMs.  Note,
+	 * this wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
+	 * currently supported nested virtualization (among many other things)
+	 * for software-protected VMs.
+	 */
+	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
+	    !(error_code & PFERR_RSVD_MASK) &&
+	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
+	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+		error_code |= PFERR_PRIVATE_ACCESS;
+
 	r = RET_PF_INVALID;
 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 797b80f996a7..dfd9ff383663 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+		.is_private = err & PFERR_PRIVATE_ACCESS,
 	};
 	int r;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 08/17] KVM: x86/mmu: check for invalid async page faults involving private memory
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (6 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults Paolo Bonzini
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm

Right now the error code is not used when an async page fault is completed.
This is not a problem in the current code, but it is untidy.  For protected
VMs, we will also need to check that the page attributes match the current
state of the page, because asynchronous page faults can only occur on
shared pages (private pages go through kvm_faultin_pfn_private() instead of
__gfn_to_pfn_memslot()).

Start by piping the error code from kvm_arch_setup_async_pf() to
kvm_arch_async_page_ready() via the architecture-specific async page
fault data.  For now, it can be used to assert that there are no
async page faults on private memory.

Extracted from a patch by Isaku Yamahata.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c          | 18 +++++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0dc755a6dc0c..9d6368512be6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1851,6 +1851,7 @@ struct kvm_arch_async_pf {
 	gfn_t gfn;
 	unsigned long cr3;
 	bool direct_map;
+	u64 error_code;
 };
 
 extern u32 __read_mostly kvm_nr_uret_msrs;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eb041acec2dc..d52794663290 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4207,24 +4207,28 @@ static u32 alloc_apf_token(struct kvm_vcpu *vcpu)
 	return (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
 }
 
-static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-				    gfn_t gfn)
+static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault)
 {
 	struct kvm_arch_async_pf arch;
 
 	arch.token = alloc_apf_token(vcpu);
-	arch.gfn = gfn;
+	arch.gfn = fault->gfn;
+	arch.error_code = fault->error_code;
 	arch.direct_map = vcpu->arch.mmu->root_role.direct;
 	arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);
 
-	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
-				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
+	return kvm_setup_async_pf(vcpu, fault->addr,
+				  kvm_vcpu_gfn_to_hva(vcpu, fault->gfn), &arch);
 }
 
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 {
 	int r;
 
+	if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS))
+		return;
+
 	if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||
 	      work->wakeup_all)
 		return;
@@ -4237,7 +4241,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	      work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
 		return;
 
-	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
+	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code, true, NULL);
 }
 
 static inline u8 kvm_max_level_for_order(int order)
@@ -4334,7 +4338,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 			trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn);
 			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
 			return RET_PF_RETRY;
-		} else if (kvm_arch_setup_async_pf(vcpu, fault->addr, fault->gfn)) {
+		} else if (kvm_arch_setup_async_pf(vcpu, fault)) {
 			return RET_PF_RETRY;
 		}
 	}
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (7 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 08/17] KVM: x86/mmu: check for invalid async page faults involving private memory Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:15   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 10/17] KVM: x86/mmu: Move private vs. shared check above slot validity checks Paolo Bonzini
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

WARN and skip the emulated MMIO fastpath if a private, reserved page fault
is encountered, as private+reserved should be an impossible combination
(KVM should never create an MMIO SPTE for a private access).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240228024147.41573-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d52794663290..0d884d0b0f35 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5819,6 +5819,9 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 
 	r = RET_PF_INVALID;
 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
+		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
+			return -EFAULT;
+
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
 		if (r == RET_PF_EMULATE)
 			goto emulate;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 10/17] KVM: x86/mmu: Move private vs. shared check above slot validity checks
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (8 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:22   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 11/17] KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots Paolo Bonzini
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Sean Christopherson, Yu Zhang, Chao Peng, Fuad Tabba,
	Michael Roth, Isaku Yamahata, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Prioritize private vs. shared gfn attribute checks above slot validity
checks to ensure a consistent userspace ABI.  E.g. as is, KVM will exit to
userspace if there is no memslot, but emulate accesses to the APIC access
page even if the attributes mismatch.

Fixes: 8dd2eee9d526 ("KVM: x86/mmu: Handle page fault for private memory")
Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d884d0b0f35..ba50b93e93ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4317,11 +4317,6 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 			return RET_PF_EMULATE;
 	}
 
-	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
-		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
-		return -EFAULT;
-	}
-
 	if (fault->is_private)
 		return kvm_faultin_pfn_private(vcpu, fault);
 
@@ -4359,9 +4354,24 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 {
 	int ret;
 
+	/*
+	 * Note that the mmu_invalidate_seq also serves to detect a concurrent
+	 * change in attributes.  is_page_fault_stale() will detect an
+	 * invalidation relate to fault->fn and resume the guest without
+	 * installing a mapping in the page tables.
+	 */
 	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	smp_rmb();
 
+	/*
+	 * Now that we have a snapshot of mmu_invalidate_seq we can check for a
+	 * private vs. shared mismatch.
+	 */
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
 	/*
 	 * Check for a relevant mmu_notifier invalidation event before getting
 	 * the pfn from the primary MMU, and before acquiring mmu_lock.
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/17] KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (9 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 10/17] KVM: x86/mmu: Move private vs. shared check above slot validity checks Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 12/17] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO Paolo Bonzini
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Allow mapping KVM's internal memslots used for EPT without unrestricted
guest into L2, i.e. allow mapping the hidden TSS and the identity mapped
page tables into L2.  Unlike the APIC access page, there is no correctness
issue with letting L2 access the "hidden" memory.  Allowing these memslots
to be mapped into L2 fixes a largely theoretical bug where KVM could
incorrectly emulate subsequent _L1_ accesses as MMIO, and also ensures
consistent KVM behavior for L2.

If KVM is using TDP, but L1 is using shadow paging for L2, then routing
through kvm_handle_noslot_fault() will incorrectly cache the gfn as MMIO,
and create an MMIO SPTE.  Creating an MMIO SPTE is ok, but only because
kvm_mmu_page_role.guest_mode ensure KVM uses different roots for L1 vs.
L2.  But vcpu->arch.mmio_gfn will remain valid, and could cause KVM to
incorrectly treat an L1 access to the hidden TSS or identity mapped page
tables as MMIO.

Furthermore, forcing L2 accesses to be treated as "no slot" faults doesn't
actually prevent exposing KVM's internal memslots to L2, it simply forces
KVM to emulate the access.  In most cases, that will trigger MMIO,
amusingly due to filling vcpu->arch.mmio_gfn, but also because
vcpu_is_mmio_gpa() unconditionally treats APIC accesses as MMIO, i.e. APIC
accesses are ok.  But the hidden TSS and identity mapped page tables could
go either way (MMIO or access the private memslot's backing memory).

Alternatively, the inconsistent emulator behavior could be addressed by
forcing MMIO emulation for L2 access to all internal memslots, not just to
the APIC.  But that's arguably less correct than letting L2 access the
hidden TSS and identity mapped page tables, not to mention that it's
*extremely* unlikely anyone cares what KVM does in this case.  From L1's
perspective there is R/W memory at those memslots, the memory just happens
to be initialized with non-zero data.  Making the memory disappear when it
is accessed by L2 is far more magical and arbitrary than the memory
existing in the first place.

The APIC access page is special because KVM _must_ emulate the access to
do the right thing (emulate an APIC access instead of reading/writing the
APIC access page).  And despite what commit 3a2936dedd20 ("kvm: mmu: Don't
expose private memslots to L2") said, it's not just necessary when L1 is
accelerating L2's virtual APIC, it's just as important (likely *more*
imporant for correctness when L1 is passing through its own APIC to L2.

Fixes: 3a2936dedd20 ("kvm: mmu: Don't expose private memslots to L2")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ba50b93e93ed..a8e14c2b68a7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4298,8 +4298,18 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
 		return RET_PF_RETRY;

-	if (!kvm_is_visible_memslot(slot)) {
-		/* Don't expose private memslots to L2. */
+	if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
+		/*
+		 * Don't map L1's APIC access page into L2, KVM doesn't support
+		 * using APICv/AVIC to accelerate L2 accesses to L1's APIC,
+		 * i.e. the access needs to be emulated.  Emulating access to
+		 * L1's APIC is also correct if L1 is accelerating L2's own
+		 * virtual APIC, but for some reason L1 also maps _L1's_ APIC
+		 * into L2.  Note, vcpu_is_mmio_gpa() always treats access to
+		 * the APIC as MMIO.  Allow an MMIO SPTE to be created, as KVM
+		 * uses different roots for L1 vs. L2, i.e. there is no danger
+		 * of breaking APICv/AVIC for L1.
+		 */
 		if (is_guest_mode(vcpu)) {
 			fault->slot = NULL;
 			fault->pfn = KVM_PFN_NOSLOT;
@@ -4312,8 +4322,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		 * MMIO SPTE.  That way the cache doesn't need to be purged
 		 * when the AVIC is re-enabled.
 		 */
-		if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
-		    !kvm_apicv_activated(vcpu->kvm))
+		if (!kvm_apicv_activated(vcpu->kvm))
 			return RET_PF_EMULATE;
 	}

-- 
2.43.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 12/17] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (10 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 11/17] KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:26   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 13/17] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() Paolo Bonzini
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson

From: Sean Christopherson <seanjc@google.com>

Explicitly detect and disallow private accesses to emulated MMIO in
kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
to perform the check.  This will allow the page fault path to go straight
to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240228024147.41573-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8e14c2b68a7..fdae6d19e72b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3262,6 +3262,11 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 {
 	gva_t gva = fault->is_tdp ? 0 : fault->addr;
 
+	if (fault->is_private) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
 	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
 			     access & shadow_mmio_access_mask);
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 13/17] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn()
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (11 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 12/17] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:27   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() Paolo Bonzini
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Move the checks related to the validity of an access to a memslot from the
inner __kvm_faultin_pfn() to its sole caller, kvm_faultin_pfn().  This
allows emulating accesses to the APIC access page, which don't need to
resolve a pfn, even if there is a relevant in-progress mmu_notifier
invalidation.  Ditto for accesses to KVM internal memslots from L2, which
KVM also treats as emulated MMIO.

More importantly, this will allow for future cleanup by having the
"no memslot" case bail from kvm_faultin_pfn() very early on.

Go to rather extreme and gross lengths to make the change a glorified
nop, e.g. call into __kvm_faultin_pfn() even when there is no slot, as the
related code is very subtle.  E.g. fault->slot can be nullified if it
points at the APIC access page, some flows in KVM x86 expect fault->pfn
to be KVM_PFN_NOSLOT, while others check only fault->slot, etc.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 115 +++++++++++++++++++++--------------------
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fdae6d19e72b..b09c8034ed15 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4292,9 +4292,64 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
 
+	if (fault->is_private)
+		return kvm_faultin_pfn_private(vcpu, fault);
+
+	async = false;
+	fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, false,
+					  &async, fault->write,
+					  &fault->map_writable, &fault->hva);
+	if (!async)
+		return RET_PF_CONTINUE; /* *pfn has correct page already */
+
+	if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
+		trace_kvm_try_async_get_page(fault->addr, fault->gfn);
+		if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) {
+			trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn);
+			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
+			return RET_PF_RETRY;
+		} else if (kvm_arch_setup_async_pf(vcpu, fault)) {
+			return RET_PF_RETRY;
+		}
+	}
+
+	/*
+	 * Allow gup to bail on pending non-fatal signals when it's also allowed
+	 * to wait for IO.  Note, gup always bails if it is unable to quickly
+	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
+	 */
+	fault->pfn = __gfn_to_pfn_memslot(fault->slot, fault->gfn, false, true,
+					  NULL, fault->write,
+					  &fault->map_writable, &fault->hva);
+	return RET_PF_CONTINUE;
+}
+
+static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			   unsigned int access)
+{
+	struct kvm_memory_slot *slot = fault->slot;
+	int ret;
+
+	/*
+	 * Note that the mmu_invalidate_seq also serves to detect a concurrent
+	 * change in attributes.  is_page_fault_stale() will detect an
+	 * invalidation relate to fault->fn and resume the guest without
+	 * installing a mapping in the page tables.
+	 */
+	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+	smp_rmb();
+
+	/*
+	 * Now that we have a snapshot of mmu_invalidate_seq we can check for a
+	 * private vs. shared mismatch.
+	 */
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
 	/*
 	 * Retry the page fault if the gfn hit a memslot that is being deleted
 	 * or moved.  This ensures any existing SPTEs for the old memslot will
@@ -4319,7 +4374,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 			fault->slot = NULL;
 			fault->pfn = KVM_PFN_NOSLOT;
 			fault->map_writable = false;
-			return RET_PF_CONTINUE;
+			goto faultin_done;
 		}
 		/*
 		 * If the APIC access page exists but is disabled, go directly
@@ -4331,61 +4386,6 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 			return RET_PF_EMULATE;
 	}
 
-	if (fault->is_private)
-		return kvm_faultin_pfn_private(vcpu, fault);
-
-	async = false;
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
-					  fault->write, &fault->map_writable,
-					  &fault->hva);
-	if (!async)
-		return RET_PF_CONTINUE; /* *pfn has correct page already */
-
-	if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
-		trace_kvm_try_async_get_page(fault->addr, fault->gfn);
-		if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) {
-			trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn);
-			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
-			return RET_PF_RETRY;
-		} else if (kvm_arch_setup_async_pf(vcpu, fault)) {
-			return RET_PF_RETRY;
-		}
-	}
-
-	/*
-	 * Allow gup to bail on pending non-fatal signals when it's also allowed
-	 * to wait for IO.  Note, gup always bails if it is unable to quickly
-	 * get a page and a fatal signal, i.e. SIGKILL, is pending.
-	 */
-	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL,
-					  fault->write, &fault->map_writable,
-					  &fault->hva);
-	return RET_PF_CONTINUE;
-}
-
-static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
-			   unsigned int access)
-{
-	int ret;
-
-	/*
-	 * Note that the mmu_invalidate_seq also serves to detect a concurrent
-	 * change in attributes.  is_page_fault_stale() will detect an
-	 * invalidation relate to fault->fn and resume the guest without
-	 * installing a mapping in the page tables.
-	 */
-	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
-	smp_rmb();
-
-	/*
-	 * Now that we have a snapshot of mmu_invalidate_seq we can check for a
-	 * private vs. shared mismatch.
-	 */
-	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
-		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
-		return -EFAULT;
-	}
-
 	/*
 	 * Check for a relevant mmu_notifier invalidation event before getting
 	 * the pfn from the primary MMU, and before acquiring mmu_lock.
@@ -4415,6 +4415,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	if (ret != RET_PF_CONTINUE)
 		return ret;
 
+faultin_done:
 	if (unlikely(is_error_pfn(fault->pfn)))
 		return kvm_handle_error_pfn(vcpu, fault);
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (12 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 13/17] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:28   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 15/17] KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults Paolo Bonzini
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, David Matlack, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Handle the "no memslot" case at the beginning of kvm_faultin_pfn(), just
after the private versus shared check, so that there's no need to
repeatedly query whether or not a slot exists.  This also makes it more
obvious that, except for private vs. shared attributes, the process of
faulting in a pfn simply doesn't apply to gfns without a slot.

Opportunistically stuff @fault's metadata in kvm_handle_noslot_fault() so
that it doesn't need to be duplicated in all paths that invoke
kvm_handle_noslot_fault(), and to minimize the probability of not stuffing
the right fields.

Leave the existing handle behind, but convert it to a WARN, to guard
against __kvm_faultin_pfn() unexpectedly nullifying fault->slot.

Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b09c8034ed15..7630ad8cb022 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3270,6 +3270,10 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
 			     access & shadow_mmio_access_mask);
 
+	fault->slot = NULL;
+	fault->pfn = KVM_PFN_NOSLOT;
+	fault->map_writable = false;
+
 	/*
 	 * If MMIO caching is disabled, emulate immediately without
 	 * touching the shadow page tables as attempting to install an
@@ -4350,15 +4354,18 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		return -EFAULT;
 	}
 
+	if (unlikely(!slot))
+		return kvm_handle_noslot_fault(vcpu, fault, access);
+
 	/*
 	 * Retry the page fault if the gfn hit a memslot that is being deleted
 	 * or moved.  This ensures any existing SPTEs for the old memslot will
 	 * be zapped before KVM inserts a new MMIO SPTE for the gfn.
 	 */
-	if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
+	if (slot->flags & KVM_MEMSLOT_INVALID)
 		return RET_PF_RETRY;
 
-	if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
+	if (slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
 		/*
 		 * Don't map L1's APIC access page into L2, KVM doesn't support
 		 * using APICv/AVIC to accelerate L2 accesses to L1's APIC,
@@ -4370,12 +4377,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		 * uses different roots for L1 vs. L2, i.e. there is no danger
 		 * of breaking APICv/AVIC for L1.
 		 */
-		if (is_guest_mode(vcpu)) {
-			fault->slot = NULL;
-			fault->pfn = KVM_PFN_NOSLOT;
-			fault->map_writable = false;
-			goto faultin_done;
-		}
+		if (is_guest_mode(vcpu))
+			return kvm_handle_noslot_fault(vcpu, fault, access);
+
 		/*
 		 * If the APIC access page exists but is disabled, go directly
 		 * to emulation without caching the MMIO access or creating a
@@ -4386,6 +4390,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			return RET_PF_EMULATE;
 	}
 
+	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+	smp_rmb();
+
 	/*
 	 * Check for a relevant mmu_notifier invalidation event before getting
 	 * the pfn from the primary MMU, and before acquiring mmu_lock.
@@ -4407,19 +4414,17 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	 * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
 	 * to detect retry guarantees the worst case latency for the vCPU.
 	 */
-	if (fault->slot &&
-	    mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
+	if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
 		return RET_PF_RETRY;
 
 	ret = __kvm_faultin_pfn(vcpu, fault);
 	if (ret != RET_PF_CONTINUE)
 		return ret;
 
-faultin_done:
 	if (unlikely(is_error_pfn(fault->pfn)))
 		return kvm_handle_error_pfn(vcpu, fault);
 
-	if (unlikely(!fault->slot))
+	if (WARN_ON_ONCE(!fault->slot))
 		return kvm_handle_noslot_fault(vcpu, fault, access);
 
 	/*
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 15/17] KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (13 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:28   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 16/17] KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values Paolo Bonzini
  2024-05-07 15:58 ` [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns Paolo Bonzini
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Explicitly set fault->hva to KVM_HVA_ERR_BAD when handling a "no slot"
fault to ensure that KVM doesn't use a bogus virtual address, e.g. if
there *was* a slot but it's unusable (APIC access page), or if there
really was no slot, in which case fault->hva will be '0' (which is a
legal address for x86).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7630ad8cb022..d717d60c6f19 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3273,6 +3273,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
 	fault->slot = NULL;
 	fault->pfn = KVM_PFN_NOSLOT;
 	fault->map_writable = false;
+	fault->hva = KVM_HVA_ERR_BAD;
 
 	/*
 	 * If MMIO caching is disabled, emulate immediately without
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 16/17] KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (14 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 15/17] KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:29   ` Xiaoyao Li
  2024-05-07 15:58 ` [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns Paolo Bonzini
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

From: Sean Christopherson <seanjc@google.com>

Explicitly set "pfn" and "hva" to error values in kvm_mmu_do_page_fault()
to harden KVM against using "uninitialized" values.  In quotes because the
fields are actually zero-initialized, and zero is a legal value for both
page frame numbers and virtual addresses.  E.g. failure to set "pfn" prior
to creating an SPTE could result in KVM pointing at physical address '0',
which is far less desirable than KVM generating a SPTE with reserved PA
bits set and thus effectively killing the VM.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index dfd9ff383663..ce2fcd19ba6b 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -307,6 +307,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
 		.is_private = err & PFERR_PRIVATE_ACCESS,
+
+		.pfn = KVM_PFN_ERR_FAULT,
+		.hva = KVM_HVA_ERR_BAD,
 	};
 	int r;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns
  2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
                   ` (15 preceding siblings ...)
  2024-05-07 15:58 ` [PATCH 16/17] KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values Paolo Bonzini
@ 2024-05-07 15:58 ` Paolo Bonzini
  2024-05-13  6:40   ` Xiaoyao Li
  16 siblings, 1 reply; 34+ messages in thread
From: Paolo Bonzini @ 2024-05-07 15:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Sean Christopherson, David Matlack, Kai Huang

From: Sean Christopherson <seanjc@google.com>

WARN if __kvm_faultin_pfn() generates a "no slot" pfn, and gracefully
handle the unexpected behavior instead of continuing on with dangerous
state, e.g. tdp_mmu_map_handle_target_level() _only_ checks fault->slot,
and so could install a bogus PFN into the guest.

The existing code is functionally ok, because kvm_faultin_pfn() pre-checks
all of the cases that result in KVM_PFN_NOSLOT, but it is unnecessarily
unsafe as it relies on __gfn_to_pfn_memslot() getting the _exact_ same
memslot, i.e. not a re-retrieved pointer with KVM_MEMSLOT_INVALID set.
And checking only fault->slot would fall apart if KVM ever added a flag or
condition that forced emulation, similar to how KVM handles writes to
read-only memslots.

Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240228024147.41573-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d717d60c6f19..510eb1117012 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4425,7 +4425,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	if (unlikely(is_error_pfn(fault->pfn)))
 		return kvm_handle_error_pfn(vcpu, fault);

-	if (WARN_ON_ONCE(!fault->slot))
+	if (WARN_ON_ONCE(!fault->slot || is_noslot_pfn(fault->pfn)))
 		return kvm_handle_noslot_fault(vcpu, fault, access);

 	/*
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation
  2024-05-07 15:58 ` [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation Paolo Bonzini
@ 2024-05-13  5:25   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  5:25 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson, Yan Zhao

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault
> triggers emulation of any kind, as KVM doesn't currently support emulating
> access to guest private memory.  Practically speaking, private faults and
> emulation are already mutually exclusive, but there are many flow that
> can result in KVM returning RET_PF_EMULATE, and adding one last check
> to harden against weird, unexpected combinations and/or KVM bugs is
> inexpensive.
> 
> Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-ID: <20240228024147.41573-2-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/mmu/mmu.c          |  8 --------
>   arch/x86/kvm/mmu/mmu_internal.h | 19 +++++++++++++++++++
>   2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 45b6d8f9e359..c72a2033ca96 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4257,14 +4257,6 @@ static inline u8 kvm_max_level_for_order(int order)
>   	return PG_LEVEL_4K;
>   }
>   
> -static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> -					      struct kvm_page_fault *fault)
> -{
> -	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> -				      PAGE_SIZE, fault->write, fault->exec,
> -				      fault->is_private);
> -}
> -
>   static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
>   				   struct kvm_page_fault *fault)
>   {
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 5390a591a571..61f49967047a 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -279,6 +279,14 @@ enum {
>   	RET_PF_SPURIOUS,
>   };
>   
> +static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> +						     struct kvm_page_fault *fault)
> +{
> +	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> +				      PAGE_SIZE, fault->write, fault->exec,
> +				      fault->is_private);
> +}
> +
>   static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   					u32 err, bool prefetch, int *emulation_type)
>   {
> @@ -320,6 +328,17 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   	else
>   		r = vcpu->arch.mmu->page_fault(vcpu, &fault);
>   
> +	/*
> +	 * Not sure what's happening, but punt to userspace and hope that
> +	 * they can fix it by changing memory to shared, or they can
> +	 * provide a better error.
> +	 */
> +	if (r == RET_PF_EMULATE && fault.is_private) {
> +		pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
> +		kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
> +		return -EFAULT;
> +	}
> +
>   	if (fault.write_fault_to_shadow_pgtable && emulation_type)
>   		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
>   


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/17] KVM: x86: Remove separate "bit" defines for page fault error code masks
  2024-05-07 15:58 ` [PATCH 02/17] KVM: x86: Remove separate "bit" defines for page fault error code masks Paolo Bonzini
@ 2024-05-13  5:29   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  5:29 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Open code the bit number directly in the PFERR_* masks and drop the
> intermediate PFERR_*_BIT defines, as having to bounce through two macros
> just to see which flag corresponds to which bit is quite annoying,

can't agree more on it.

> as is
> having to define two macros just to add recognition of a new flag.
> 
> Use ternary operator to derive the bit in permission_fault(), the one
> function that actually needs the bit number as part of clever shifting
> to avoid conditional branches.  Generally the compiler is able to turn
> it into a conditional move, and if not it's not really a big deal.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Message-ID: <20240228024147.41573-3-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.co
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/include/asm/kvm_host.h | 32 ++++++++++----------------------
>   arch/x86/kvm/mmu.h              |  5 ++---
>   2 files changed, 12 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9f92bdb78504..a047480da5af 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -254,28 +254,16 @@ enum x86_intercept_stage;
>   	KVM_GUESTDBG_INJECT_DB | \
>   	KVM_GUESTDBG_BLOCKIRQ)
>   
> -
> -#define PFERR_PRESENT_BIT 0
> -#define PFERR_WRITE_BIT 1
> -#define PFERR_USER_BIT 2
> -#define PFERR_RSVD_BIT 3
> -#define PFERR_FETCH_BIT 4
> -#define PFERR_PK_BIT 5
> -#define PFERR_SGX_BIT 15
> -#define PFERR_GUEST_FINAL_BIT 32
> -#define PFERR_GUEST_PAGE_BIT 33
> -#define PFERR_IMPLICIT_ACCESS_BIT 48
> -
> -#define PFERR_PRESENT_MASK	BIT(PFERR_PRESENT_BIT)
> -#define PFERR_WRITE_MASK	BIT(PFERR_WRITE_BIT)
> -#define PFERR_USER_MASK		BIT(PFERR_USER_BIT)
> -#define PFERR_RSVD_MASK		BIT(PFERR_RSVD_BIT)
> -#define PFERR_FETCH_MASK	BIT(PFERR_FETCH_BIT)
> -#define PFERR_PK_MASK		BIT(PFERR_PK_BIT)
> -#define PFERR_SGX_MASK		BIT(PFERR_SGX_BIT)
> -#define PFERR_GUEST_FINAL_MASK	BIT_ULL(PFERR_GUEST_FINAL_BIT)
> -#define PFERR_GUEST_PAGE_MASK	BIT_ULL(PFERR_GUEST_PAGE_BIT)
> -#define PFERR_IMPLICIT_ACCESS	BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
> +#define PFERR_PRESENT_MASK	BIT(0)
> +#define PFERR_WRITE_MASK	BIT(1)
> +#define PFERR_USER_MASK		BIT(2)
> +#define PFERR_RSVD_MASK		BIT(3)
> +#define PFERR_FETCH_MASK	BIT(4)
> +#define PFERR_PK_MASK		BIT(5)
> +#define PFERR_SGX_MASK		BIT(15)
> +#define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)
> +#define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)
> +#define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
>   
>   #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
>   				 PFERR_WRITE_MASK |		\
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 60f21bb4c27b..2343c9f00e31 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -213,7 +213,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>   	 */
>   	u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
>   	bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
> -	int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
> +	int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
>   	u32 errcode = PFERR_PRESENT_MASK;
>   	bool fault;
>   
> @@ -234,8 +234,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>   		pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;
>   
>   		/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
> -		offset = (pfec & ~1) +
> -			((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));
> +		offset = (pfec & ~1) | ((pte_access & PT_USER_MASK) ? PFERR_RSVD_MASK : 0);
>   
>   		pkru_bits &= mmu->pkru_mask >> offset;
>   		errcode |= -pkru_bits & PFERR_PK_MASK;


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
  2024-05-07 15:58 ` [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler Paolo Bonzini
@ 2024-05-13  5:50   ` Xiaoyao Li
  2024-05-13 17:31     ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  5:50 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: Sean Christopherson, Kai Huang, Binbin Wu

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Move the sanity check that hardware never sets bits that collide with KVM-
> define synthetic bits from kvm_mmu_page_fault() to npf_interception(),
> i.e. make the sanity check #NPF specific.  The legacy #PF path already
> WARNs if _any_ of bits 63:32 are set, and the error code that comes from
> VMX's EPT Violatation and Misconfig is 100% synthesized (KVM morphs VMX's
> EXIT_QUALIFICATION into error code flags).
> 
> Add a compile-time assert in the legacy #PF handler to make sure that KVM-
> define flags are covered by its existing sanity check on the upper bits.
> 
> Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
> are removing the comment that defined it.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Message-ID: <20240228024147.41573-8-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  6 ++++++
>   arch/x86/kvm/mmu/mmu.c          | 14 +++-----------
>   arch/x86/kvm/svm/svm.c          |  9 +++++++++
>   3 files changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 58bbcf76ad1e..12e727301262 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -267,7 +267,13 @@ enum x86_intercept_stage;
>   #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
>   #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
>   #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
> +
> +/*
> + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
> + * when emulating instructions that triggers implicit access.
> + */
>   #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
> +#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
>   
>   #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
>   				 PFERR_WRITE_MASK |		\
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c72a2033ca96..5562d693880a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>   		return -EFAULT;
>   #endif
>   
> +	/* Ensure the above sanity check also covers KVM-defined flags. */

1. There is no sanity check above related to KVM-defined flags yet. It 
has to be after Patch 6.

2. I somehow cannot parse the comment properly, though I know it's to 
ensure KVM-defined PFERR_SYNTHETIC_MASK not contain any bit below 32-bits.

> +	BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));
> +
>   	vcpu->arch.l1tf_flush_l1d = true;
>   	if (!flags) {
>   		trace_kvm_page_fault(vcpu, fault_address, error_code);
> @@ -5786,17 +5789,6 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>   	int r, emulation_type = EMULTYPE_PF;
>   	bool direct = vcpu->arch.mmu->root_role.direct;
>   
> -	/*
> -	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> -	 * checks when emulating instructions that triggers implicit access.
> -	 * WARN if hardware generates a fault with an error code that collides
> -	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> -	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> -	 * that KVM doesn't know about.
> -	 */
> -	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> -		error_code &= ~PFERR_IMPLICIT_ACCESS;
> -
>   	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>   		return RET_PF_RETRY;
>   
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 0f3b59da0d4a..535018f152a3 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2047,6 +2047,15 @@ static int npf_interception(struct kvm_vcpu *vcpu)
>   	u64 fault_address = svm->vmcb->control.exit_info_2;
>   	u64 error_code = svm->vmcb->control.exit_info_1;
>   
> +	/*
> +	 * WARN if hardware generates a fault with an error code that collides
> +	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
> +	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> +	 * flag that KVM doesn't know about.
> +	 */
> +	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> +		error_code &= ~PFERR_SYNTHETIC_MASK;
> +
>   	trace_kvm_page_fault(vcpu, fault_address, error_code);
>   	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
>   			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults
  2024-05-07 15:58 ` [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults Paolo Bonzini
@ 2024-05-13  5:56   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  5:56 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Add and use a synthetic, KVM-defined page fault error code to indicate
> whether a fault is to private vs. shared memory.  TDX and SNP have
> different mechanisms for reporting private vs. shared, and KVM's
> software-protected VMs have no mechanism at all.  Usurp an error code
> flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
> and friends.
> 
> Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
> for TDX and software-protected VMs as appropriate, but that would require
> *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
> memory at the hardware layer, but don't utilize private memory at the
> KVM layer.
> 
> Opportunistically add a comment to call out that the logic for software-
> protected VMs is (and was before this commit) broken for nested MMUs, i.e.
> for nested TDP, as the GPA is an L2 GPA.  Punt on trying to play nice with
> nested MMUs as there is a _lot_ of functionality that simply doesn't work
> for software-protected VMs, e.g. all of the paths where KVM accesses guest
> memory need to be updated to be aware of private vs. shared memory.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-Id: <20240228024147.41573-6-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/include/asm/kvm_host.h |  7 ++++++-
>   arch/x86/kvm/mmu/mmu.c          | 14 ++++++++++++++
>   arch/x86/kvm/mmu/mmu_internal.h |  2 +-
>   3 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 12e727301262..0dc755a6dc0c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -273,7 +273,12 @@ enum x86_intercept_stage;
>    * when emulating instructions that triggers implicit access.
>    */
>   #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
> -#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
> +/*
> + * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred
> + * when the guest was accessing private memory.
> + */
> +#define PFERR_PRIVATE_ACCESS   BIT_ULL(49)
> +#define PFERR_SYNTHETIC_MASK   (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
>   
>   #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
>   				 PFERR_WRITE_MASK |		\
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3609167ba30e..eb041acec2dc 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5799,6 +5799,20 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>   	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>   		return RET_PF_RETRY;
>   
> +	/*
> +	 * Except for reserved faults (emulated MMIO is shared-only), set the
> +	 * PFERR_PRIVATE_ACCESS flag for software-protected VMs based on the gfn's
> +	 * current attributes, which are the source of truth for such VMs.  Note,
> +	 * this wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> +	 * currently supported nested virtualization (among many other things)
> +	 * for software-protected VMs.
> +	 */
> +	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> +	    !(error_code & PFERR_RSVD_MASK) &&
> +	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> +	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> +		error_code |= PFERR_PRIVATE_ACCESS;
> +
>   	r = RET_PF_INVALID;
>   	if (unlikely(error_code & PFERR_RSVD_MASK)) {
>   		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 797b80f996a7..dfd9ff383663 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
>   		.req_level = PG_LEVEL_4K,
>   		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +		.is_private = err & PFERR_PRIVATE_ACCESS,
>   	};
>   	int r;
>   


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults
  2024-05-07 15:58 ` [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults Paolo Bonzini
@ 2024-05-13  6:15   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:15 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> WARN and skip the emulated MMIO fastpath if a private, reserved page fault
> is encountered, as private+reserved should be an impossible combination
> (KVM should never create an MMIO SPTE for a private access).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-ID: <20240228024147.41573-9-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/mmu/mmu.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d52794663290..0d884d0b0f35 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5819,6 +5819,9 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>   
>   	r = RET_PF_INVALID;
>   	if (unlikely(error_code & PFERR_RSVD_MASK)) {
> +		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
> +			return -EFAULT;
> +
>   		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
>   		if (r == RET_PF_EMULATE)
>   			goto emulate;


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 10/17] KVM: x86/mmu: Move private vs. shared check above slot validity checks
  2024-05-07 15:58 ` [PATCH 10/17] KVM: x86/mmu: Move private vs. shared check above slot validity checks Paolo Bonzini
@ 2024-05-13  6:22   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:22 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: Sean Christopherson, Yu Zhang, Chao Peng, Fuad Tabba,
	Michael Roth, Isaku Yamahata, Kai Huang

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Prioritize private vs. shared gfn attribute checks above slot validity
> checks to ensure a consistent userspace ABI.  E.g. as is, KVM will exit to
> userspace if there is no memslot, but emulate accesses to the APIC access
> page even if the attributes mismatch.
> 
> Fixes: 8dd2eee9d526 ("KVM: x86/mmu: Handle page fault for private memory")
> Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
> Cc: Chao Peng <chao.p.peng@linux.intel.com>
> Cc: Fuad Tabba <tabba@google.com>
> Cc: Michael Roth <michael.roth@amd.com>
> Cc: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Message-ID: <20240228024147.41573-10-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/mmu/mmu.c | 20 +++++++++++++++-----
>   1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 0d884d0b0f35..ba50b93e93ed 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4317,11 +4317,6 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>   			return RET_PF_EMULATE;
>   	}
>   
> -	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
> -		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> -		return -EFAULT;
> -	}
> -
>   	if (fault->is_private)
>   		return kvm_faultin_pfn_private(vcpu, fault);
>   
> @@ -4359,9 +4354,24 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>   {
>   	int ret;
>   
> +	/*
> +	 * Note that the mmu_invalidate_seq also serves to detect a concurrent
> +	 * change in attributes.  is_page_fault_stale() will detect an
> +	 * invalidation relate to fault->fn and resume the guest without

s/fault->fn/fault->gfn/

> +	 * installing a mapping in the page tables.
> +	 */
>   	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
>   	smp_rmb();
>   
> +	/*
> +	 * Now that we have a snapshot of mmu_invalidate_seq we can check for a
> +	 * private vs. shared mismatch.
> +	 */
> +	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
> +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> +		return -EFAULT;
> +	}
> +
>   	/*
>   	 * Check for a relevant mmu_notifier invalidation event before getting
>   	 * the pfn from the primary MMU, and before acquiring mmu_lock.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 12/17] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO
  2024-05-07 15:58 ` [PATCH 12/17] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO Paolo Bonzini
@ 2024-05-13  6:26   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:26 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Explicitly detect and disallow private accesses to emulated MMIO in
> kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
> to perform the check.  This will allow the page fault path to go straight
> to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-ID: <20240228024147.41573-12-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/mmu/mmu.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a8e14c2b68a7..fdae6d19e72b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3262,6 +3262,11 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>   {
>   	gva_t gva = fault->is_tdp ? 0 : fault->addr;
>   
> +	if (fault->is_private) {
> +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> +		return -EFAULT;
> +	}
> +
>   	vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
>   			     access & shadow_mmio_access_mask);
>   


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 13/17] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn()
  2024-05-07 15:58 ` [PATCH 13/17] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() Paolo Bonzini
@ 2024-05-13  6:27   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:27 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Move the checks related to the validity of an access to a memslot from the
> inner __kvm_faultin_pfn() to its sole caller, kvm_faultin_pfn().  This
> allows emulating accesses to the APIC access page, which don't need to
> resolve a pfn, even if there is a relevant in-progress mmu_notifier
> invalidation.  Ditto for accesses to KVM internal memslots from L2, which
> KVM also treats as emulated MMIO.
> 
> More importantly, this will allow for future cleanup by having the
> "no memslot" case bail from kvm_faultin_pfn() very early on.
> 
> Go to rather extreme and gross lengths to make the change a glorified
> nop, e.g. call into __kvm_faultin_pfn() even when there is no slot, as the
> related code is very subtle.  E.g. fault->slot can be nullified if it
> points at the APIC access page, some flows in KVM x86 expect fault->pfn
> to be KVM_PFN_NOSLOT, while others check only fault->slot, etc.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Message-ID: <20240228024147.41573-13-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()
  2024-05-07 15:58 ` [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() Paolo Bonzini
@ 2024-05-13  6:28   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:28 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: Sean Christopherson, David Matlack, Kai Huang

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Handle the "no memslot" case at the beginning of kvm_faultin_pfn(), just
> after the private versus shared check, so that there's no need to
> repeatedly query whether or not a slot exists.  This also makes it more
> obvious that, except for private vs. shared attributes, the process of
> faulting in a pfn simply doesn't apply to gfns without a slot.
> 
> Opportunistically stuff @fault's metadata in kvm_handle_noslot_fault() so
> that it doesn't need to be duplicated in all paths that invoke
> kvm_handle_noslot_fault(), and to minimize the probability of not stuffing
> the right fields.
> 
> Leave the existing handle behind, but convert it to a WARN, to guard
> against __kvm_faultin_pfn() unexpectedly nullifying fault->slot.
> 
> Cc: David Matlack <dmatlack@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Message-ID: <20240228024147.41573-14-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 15/17] KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults
  2024-05-07 15:58 ` [PATCH 15/17] KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults Paolo Bonzini
@ 2024-05-13  6:28   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:28 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Explicitly set fault->hva to KVM_HVA_ERR_BAD when handling a "no slot"
> fault to ensure that KVM doesn't use a bogus virtual address, e.g. if
> there *was* a slot but it's unusable (APIC access page), or if there
> really was no slot, in which case fault->hva will be '0' (which is a
> legal address for x86).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Message-ID: <20240228024147.41573-15-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/mmu/mmu.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 7630ad8cb022..d717d60c6f19 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3273,6 +3273,7 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
>   	fault->slot = NULL;
>   	fault->pfn = KVM_PFN_NOSLOT;
>   	fault->map_writable = false;
> +	fault->hva = KVM_HVA_ERR_BAD;
>   
>   	/*
>   	 * If MMIO caching is disabled, emulate immediately without


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 16/17] KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values
  2024-05-07 15:58 ` [PATCH 16/17] KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values Paolo Bonzini
@ 2024-05-13  6:29   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:29 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: Sean Christopherson, Kai Huang

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Explicitly set "pfn" and "hva" to error values in kvm_mmu_do_page_fault()
> to harden KVM against using "uninitialized" values.  In quotes because the
> fields are actually zero-initialized, and zero is a legal value for both
> page frame numbers and virtual addresses.  E.g. failure to set "pfn" prior
> to creating an SPTE could result in KVM pointing at physical address '0',
> which is far less desirable than KVM generating a SPTE with reserved PA
> bits set and thus effectively killing the VM.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Message-ID: <20240228024147.41573-16-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/mmu/mmu_internal.h | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index dfd9ff383663..ce2fcd19ba6b 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -307,6 +307,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   		.req_level = PG_LEVEL_4K,
>   		.goal_level = PG_LEVEL_4K,
>   		.is_private = err & PFERR_PRIVATE_ACCESS,
> +
> +		.pfn = KVM_PFN_ERR_FAULT,
> +		.hva = KVM_HVA_ERR_BAD,
>   	};
>   	int r;
>   


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns
  2024-05-07 15:58 ` [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns Paolo Bonzini
@ 2024-05-13  6:40   ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-13  6:40 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: Sean Christopherson, David Matlack, Kai Huang

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> WARN if __kvm_faultin_pfn() generates a "no slot" pfn, and gracefully
> handle the unexpected behavior instead of continuing on with dangerous
> state, e.g. tdp_mmu_map_handle_target_level() _only_ checks fault->slot,
> and so could install a bogus PFN into the guest.
> 
> The existing code is functionally ok, because kvm_faultin_pfn() pre-checks
> all of the cases that result in KVM_PFN_NOSLOT, but it is unnecessarily
> unsafe as it relies on __gfn_to_pfn_memslot() getting the _exact_ same
> memslot, i.e. not a re-retrieved pointer with KVM_MEMSLOT_INVALID set.
> And checking only fault->slot would fall apart if KVM ever added a flag or
> condition that forced emulation, similar to how KVM handles writes to
> read-only memslots.
> 
> Cc: David Matlack <dmatlack@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Message-ID: <20240228024147.41573-17-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/mmu/mmu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d717d60c6f19..510eb1117012 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4425,7 +4425,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>   	if (unlikely(is_error_pfn(fault->pfn)))
>   		return kvm_handle_error_pfn(vcpu, fault);
>   
> -	if (WARN_ON_ONCE(!fault->slot))
> +	if (WARN_ON_ONCE(!fault->slot || is_noslot_pfn(fault->pfn)))
>   		return kvm_handle_noslot_fault(vcpu, fault, access);
>   
>   	/*


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
  2024-05-13  5:50   ` Xiaoyao Li
@ 2024-05-13 17:31     ` Sean Christopherson
  2024-05-14  4:25       ` Xiaoyao Li
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2024-05-13 17:31 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Paolo Bonzini, linux-kernel, kvm, Kai Huang, Binbin Wu

On Mon, May 13, 2024, Xiaoyao Li wrote:
> On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> > From: Sean Christopherson <seanjc@google.com>
> > 
> > Move the sanity check that hardware never sets bits that collide with KVM-
> > define synthetic bits from kvm_mmu_page_fault() to npf_interception(),
> > i.e. make the sanity check #NPF specific.  The legacy #PF path already
> > WARNs if _any_ of bits 63:32 are set, and the error code that comes from
> > VMX's EPT Violatation and Misconfig is 100% synthesized (KVM morphs VMX's
> > EXIT_QUALIFICATION into error code flags).
> > 
> > Add a compile-time assert in the legacy #PF handler to make sure that KVM-
> > define flags are covered by its existing sanity check on the upper bits.
> > 
> > Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
> > are removing the comment that defined it.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > Reviewed-by: Kai Huang <kai.huang@intel.com>
> > Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> > Message-ID: <20240228024147.41573-8-seanjc@google.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   arch/x86/include/asm/kvm_host.h |  6 ++++++
> >   arch/x86/kvm/mmu/mmu.c          | 14 +++-----------
> >   arch/x86/kvm/svm/svm.c          |  9 +++++++++
> >   3 files changed, 18 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 58bbcf76ad1e..12e727301262 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -267,7 +267,13 @@ enum x86_intercept_stage;
> >   #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
> >   #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
> >   #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
> > +
> > +/*
> > + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
> > + * when emulating instructions that triggers implicit access.
> > + */
> >   #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
> > +#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
> >   #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
> >   				 PFERR_WRITE_MASK |		\
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index c72a2033ca96..5562d693880a 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> >   		return -EFAULT;
> >   #endif
> > +	/* Ensure the above sanity check also covers KVM-defined flags. */
> 
> 1. There is no sanity check above related to KVM-defined flags yet. It has
> to be after Patch 6.

Ya, it's not just the comment, the entire changelog expects this patch to land
after patch 6.
> 
> 2. I somehow cannot parse the comment properly, though I know it's to ensure
> KVM-defined PFERR_SYNTHETIC_MASK not contain any bit below 32-bits.

Hmm, how about this?

	/*
	 * Ensure that the above sanity check on hardware error code bits 63:32
	 * also prevents false positives on KVM-defined flags.
	 */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
  2024-05-13 17:31     ` Sean Christopherson
@ 2024-05-14  4:25       ` Xiaoyao Li
  2024-05-14 15:32         ` Sean Christopherson
  0 siblings, 1 reply; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-14  4:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, linux-kernel, kvm, Kai Huang, Binbin Wu

On 5/14/2024 1:31 AM, Sean Christopherson wrote:
> On Mon, May 13, 2024, Xiaoyao Li wrote:
>> On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
>>> From: Sean Christopherson <seanjc@google.com>
>>>
>>> Move the sanity check that hardware never sets bits that collide with KVM-
>>> define synthetic bits from kvm_mmu_page_fault() to npf_interception(),
>>> i.e. make the sanity check #NPF specific.  The legacy #PF path already
>>> WARNs if _any_ of bits 63:32 are set, and the error code that comes from
>>> VMX's EPT Violatation and Misconfig is 100% synthesized (KVM morphs VMX's
>>> EXIT_QUALIFICATION into error code flags).
>>>
>>> Add a compile-time assert in the legacy #PF handler to make sure that KVM-
>>> define flags are covered by its existing sanity check on the upper bits.
>>>
>>> Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
>>> are removing the comment that defined it.
>>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> Reviewed-by: Kai Huang <kai.huang@intel.com>
>>> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
>>> Message-ID: <20240228024147.41573-8-seanjc@google.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>    arch/x86/include/asm/kvm_host.h |  6 ++++++
>>>    arch/x86/kvm/mmu/mmu.c          | 14 +++-----------
>>>    arch/x86/kvm/svm/svm.c          |  9 +++++++++
>>>    3 files changed, 18 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 58bbcf76ad1e..12e727301262 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -267,7 +267,13 @@ enum x86_intercept_stage;
>>>    #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
>>>    #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
>>>    #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
>>> +
>>> +/*
>>> + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
>>> + * when emulating instructions that triggers implicit access.
>>> + */
>>>    #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
>>> +#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
>>>    #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
>>>    				 PFERR_WRITE_MASK |		\
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index c72a2033ca96..5562d693880a 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>>>    		return -EFAULT;
>>>    #endif
>>> +	/* Ensure the above sanity check also covers KVM-defined flags. */
>>
>> 1. There is no sanity check above related to KVM-defined flags yet. It has
>> to be after Patch 6.
> 
> Ya, it's not just the comment, the entire changelog expects this patch to land
> after patch 6.
>>
>> 2. I somehow cannot parse the comment properly, though I know it's to ensure
>> KVM-defined PFERR_SYNTHETIC_MASK not contain any bit below 32-bits.
> 
> Hmm, how about this?
> 
> 	/*
> 	 * Ensure that the above sanity check on hardware error code bits 63:32
> 	 * also prevents false positives on KVM-defined flags.
> 	 */
> 

Maybe it's just myself inability, I still cannot interpret it well.

Can't we put it above the sanity check of error code, and just with a 
comment like

	/*
  	 * Ensure KVM-defined flags not occupied any bits below 32-bits,
          * that are used by hardware.
	 * /

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
  2024-05-14  4:25       ` Xiaoyao Li
@ 2024-05-14 15:32         ` Sean Christopherson
  2024-05-15  1:03           ` Xiaoyao Li
  0 siblings, 1 reply; 34+ messages in thread
From: Sean Christopherson @ 2024-05-14 15:32 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Paolo Bonzini, linux-kernel, kvm, Kai Huang, Binbin Wu

On Tue, May 14, 2024, Xiaoyao Li wrote:
> On 5/14/2024 1:31 AM, Sean Christopherson wrote:
> > On Mon, May 13, 2024, Xiaoyao Li wrote:
> > > On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> > > > +#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
> > > >    #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
> > > >    				 PFERR_WRITE_MASK |		\
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index c72a2033ca96..5562d693880a 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> > > >    		return -EFAULT;
> > > >    #endif
> > > > +	/* Ensure the above sanity check also covers KVM-defined flags. */
> > > 
> > > 1. There is no sanity check above related to KVM-defined flags yet. It has
> > > to be after Patch 6.
> > 
> > Ya, it's not just the comment, the entire changelog expects this patch to land
> > after patch 6.
> > > 
> > > 2. I somehow cannot parse the comment properly, though I know it's to ensure
> > > KVM-defined PFERR_SYNTHETIC_MASK not contain any bit below 32-bits.
> > 
> > Hmm, how about this?
> > 
> > 	/*
> > 	 * Ensure that the above sanity check on hardware error code bits 63:32
> > 	 * also prevents false positives on KVM-defined flags.
> > 	 */
> > 
> 
> Maybe it's just myself inability, I still cannot interpret it well.
> 
> Can't we put it above the sanity check of error code, and just with a
> comment like
> 
> 	/*
>  	 * Ensure KVM-defined flags not occupied any bits below 32-bits,
>        * that are used by hardware.

This is somewhat misleading, as hardware does use bits 63:32 (for #NPF), just not
for #PF error codes.  And the reason I'm using rather indirect wording is that
KVM _could_ define synthetic flags in bits 31:0, there's simply a higher probability
of needing to reshuffle bit numbers due to a conflict with a future feature.

Is this better?  I think it captures what you're looking for, while hopefully also
capturing that staying out of bits 31:0 isn't a hard requirement.

	/*
	 * Restrict KVM-defined flags to bits 63:32 so that it's impossible for
	 * them to conflict with #PF error codes, which are limited to 32 bits.
	 */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
  2024-05-14 15:32         ` Sean Christopherson
@ 2024-05-15  1:03           ` Xiaoyao Li
  0 siblings, 0 replies; 34+ messages in thread
From: Xiaoyao Li @ 2024-05-15  1:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, linux-kernel, kvm, Kai Huang, Binbin Wu

On 5/14/2024 11:32 PM, Sean Christopherson wrote:
> On Tue, May 14, 2024, Xiaoyao Li wrote:
>> On 5/14/2024 1:31 AM, Sean Christopherson wrote:
>>> On Mon, May 13, 2024, Xiaoyao Li wrote:
>>>> On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
>>>>> +#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)
>>>>>     #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
>>>>>     				 PFERR_WRITE_MASK |		\
>>>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>>>> index c72a2033ca96..5562d693880a 100644
>>>>> --- a/arch/x86/kvm/mmu/mmu.c
>>>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>>>> @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>>>>>     		return -EFAULT;
>>>>>     #endif
>>>>> +	/* Ensure the above sanity check also covers KVM-defined flags. */
>>>>
>>>> 1. There is no sanity check above related to KVM-defined flags yet. It has
>>>> to be after Patch 6.
>>>
>>> Ya, it's not just the comment, the entire changelog expects this patch to land
>>> after patch 6.
>>>>
>>>> 2. I somehow cannot parse the comment properly, though I know it's to ensure
>>>> KVM-defined PFERR_SYNTHETIC_MASK not contain any bit below 32-bits.
>>>
>>> Hmm, how about this?
>>>
>>> 	/*
>>> 	 * Ensure that the above sanity check on hardware error code bits 63:32
>>> 	 * also prevents false positives on KVM-defined flags.
>>> 	 */
>>>
>>
>> Maybe it's just myself inability, I still cannot interpret it well.
>>
>> Can't we put it above the sanity check of error code, and just with a
>> comment like
>>
>> 	/*
>>   	 * Ensure KVM-defined flags not occupied any bits below 32-bits,
>>         * that are used by hardware.
> 
> This is somewhat misleading, as hardware does use bits 63:32 (for #NPF), just not
> for #PF error codes.  And the reason I'm using rather indirect wording is that
> KVM _could_ define synthetic flags in bits 31:0, there's simply a higher probability
> of needing to reshuffle bit numbers due to a conflict with a future feature.
> 
> Is this better?  I think it captures what you're looking for, while hopefully also
> capturing that staying out of bits 31:0 isn't a hard requirement.

yeah, it looks better!

> 	/*
> 	 * Restrict KVM-defined flags to bits 63:32 so that it's impossible for
> 	 * them to conflict with #PF error codes, which are limited to 32 bits.
> 	 */


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-05-15  1:03 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-07 15:58 [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups Paolo Bonzini
2024-05-07 15:58 ` [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation Paolo Bonzini
2024-05-13  5:25   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 02/17] KVM: x86: Remove separate "bit" defines for page fault error code masks Paolo Bonzini
2024-05-13  5:29   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 03/17] KVM: x86: Define more SEV+ page fault error bits/flags for #NPF Paolo Bonzini
2024-05-07 15:58 ` [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler Paolo Bonzini
2024-05-13  5:50   ` Xiaoyao Li
2024-05-13 17:31     ` Sean Christopherson
2024-05-14  4:25       ` Xiaoyao Li
2024-05-14 15:32         ` Sean Christopherson
2024-05-15  1:03           ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 05/17] KVM: x86/mmu: Pass full 64-bit error code when handling page faults Paolo Bonzini
2024-05-07 15:58 ` [PATCH 06/17] KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero Paolo Bonzini
2024-05-07 15:58 ` [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults Paolo Bonzini
2024-05-13  5:56   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 08/17] KVM: x86/mmu: check for invalid async page faults involving private memory Paolo Bonzini
2024-05-07 15:58 ` [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults Paolo Bonzini
2024-05-13  6:15   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 10/17] KVM: x86/mmu: Move private vs. shared check above slot validity checks Paolo Bonzini
2024-05-13  6:22   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 11/17] KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots Paolo Bonzini
2024-05-07 15:58 ` [PATCH 12/17] KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO Paolo Bonzini
2024-05-13  6:26   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 13/17] KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() Paolo Bonzini
2024-05-13  6:27   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() Paolo Bonzini
2024-05-13  6:28   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 15/17] KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults Paolo Bonzini
2024-05-13  6:28   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 16/17] KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values Paolo Bonzini
2024-05-13  6:29   ` Xiaoyao Li
2024-05-07 15:58 ` [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns Paolo Bonzini
2024-05-13  6:40   ` Xiaoyao Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox