* [PATCH 1/2] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page)
[not found] <20221212153205.3360-1-jiangshanlai@gmail.com>
@ 2022-12-12 15:32 ` Lai Jiangshan
2022-12-13 18:11 ` Sean Christopherson
2022-12-12 15:32 ` [PATCH 2/2] kvm: x86/mmu: Remove useless shadow_host_writable_mask Lai Jiangshan
1 sibling, 1 reply; 6+ messages in thread
From: Lai Jiangshan @ 2022-12-12 15:32 UTC (permalink / raw)
To: linux-kernel
Cc: Paolo Bonzini, Sean Christopherson, Lai Jiangshan,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, kvm
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Sometimes when the guest updates its pagetable, it adds only new gptes
without changing any existing one, so there is no point in updating
the sptes for those existing gptes.
Also, when the sptes for these unchanged gptes are updated, the AD
bits are also removed, since make_spte() is called with prefetch=true,
which might result in unneeded TLB flushing.
Do nothing if the permissions are unchanged or only write-access is
being added. Only update the spte when write-access is being removed.
Drop the SPTE otherwise.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e5662dbd519c..613f043a3e9e 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1023,7 +1023,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
u64 *sptep, spte;
struct kvm_memory_slot *slot;
- unsigned pte_access;
+ unsigned old_pte_access, pte_access;
pt_element_t gpte;
gpa_t pte_gpa;
gfn_t gfn;
@@ -1064,6 +1064,23 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
continue;
}
+ /*
+ * Drop the SPTE if any access permission other than write-access
+ * is changing. Do nothing if the permissions are unchanged or
+ * only write-access is being added. Only update the spte when
+ * write-access is being removed.
+ */
+ old_pte_access = kvm_mmu_page_get_access(sp, i);
+ if (old_pte_access == pte_access ||
+ (old_pte_access | ACC_WRITE_MASK) == pte_access)
+ continue;
+ if (old_pte_access != (pte_access | ACC_WRITE_MASK)) {
+ drop_spte(vcpu->kvm, &sp->spt[i]);
+ flush = true;
+ continue;
+ }
+
/* Update the shadowed access bits in case they changed. */
kvm_mmu_page_set_access(sp, i, pte_access);
--
2.19.1.6.gb485710b
* [PATCH 2/2] kvm: x86/mmu: Remove useless shadow_host_writable_mask
[not found] <20221212153205.3360-1-jiangshanlai@gmail.com>
2022-12-12 15:32 ` [PATCH 1/2] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page) Lai Jiangshan
@ 2022-12-12 15:32 ` Lai Jiangshan
1 sibling, 0 replies; 6+ messages in thread
From: Lai Jiangshan @ 2022-12-12 15:32 UTC (permalink / raw)
To: linux-kernel
Cc: Paolo Bonzini, Sean Christopherson, Lai Jiangshan,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, kvm
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
shadow_host_writable_mask is only used in FNAME(sync_page), which
doesn't actually need it.
Remove it and free up a bit in the spte.
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++++-
arch/x86/kvm/mmu/spte.c | 8 +------
arch/x86/kvm/mmu/spte.h | 38 +++++++++++-----------------------
3 files changed, 19 insertions(+), 34 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 613f043a3e9e..8b83abf1d8bc 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1084,9 +1084,14 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
/* Update the shadowed access bits in case they changed. */
kvm_mmu_page_set_access(sp, i, pte_access);
+ /*
+ * It doesn't matter whether it is host_writable or not since
+ * write-access is being removed.
+ */
+ host_writable = false;
+
sptep = &sp->spt[i];
spte = *sptep;
- host_writable = spte & shadow_host_writable_mask;
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
make_spte(vcpu, sp, slot, pte_access, gfn,
spte_to_pfn(spte), spte, true, false,
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index c0fd7e049b4e..00c88b1dca0a 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -24,7 +24,6 @@ static bool __ro_after_init allow_mmio_caching;
module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
EXPORT_SYMBOL_GPL(enable_mmio_caching);
-u64 __read_mostly shadow_host_writable_mask;
u64 __read_mostly shadow_mmu_writable_mask;
u64 __read_mostly shadow_nx_mask;
u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
@@ -192,9 +191,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (shadow_memtype_mask)
spte |= static_call(kvm_x86_get_mt_mask)(vcpu, gfn,
kvm_is_mmio_pfn(pfn));
- if (host_writable)
- spte |= shadow_host_writable_mask;
- else
+ if (!host_writable)
pte_access &= ~ACC_WRITE_MASK;
if (shadow_me_value && !kvm_is_mmio_pfn(pfn))
@@ -332,7 +329,6 @@ u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn)
new_spte |= (u64)new_pfn << PAGE_SHIFT;
new_spte &= ~PT_WRITABLE_MASK;
- new_spte &= ~shadow_host_writable_mask;
new_spte &= ~shadow_mmu_writable_mask;
new_spte = mark_spte_for_access_track(new_spte);
@@ -440,7 +436,6 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
*/
shadow_memtype_mask = VMX_EPT_MT_MASK | VMX_EPT_IPAT_BIT;
shadow_acc_track_mask = VMX_EPT_RWX_MASK;
- shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
shadow_mmu_writable_mask = EPT_SPTE_MMU_WRITABLE;
/*
@@ -500,7 +495,6 @@ void kvm_mmu_reset_all_pte_masks(void)
shadow_me_mask = 0;
shadow_me_value = 0;
- shadow_host_writable_mask = DEFAULT_SPTE_HOST_WRITABLE;
shadow_mmu_writable_mask = DEFAULT_SPTE_MMU_WRITABLE;
/*
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1f03701b943a..9824b33539c9 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -75,8 +75,7 @@ static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
* SPTE is write-protected. See is_writable_pte() for details.
*/
-/* Bits 9 and 10 are ignored by all non-EPT PTEs. */
-#define DEFAULT_SPTE_HOST_WRITABLE BIT_ULL(9)
+/* Bit 10 is ignored by all non-EPT PTEs. */
#define DEFAULT_SPTE_MMU_WRITABLE BIT_ULL(10)
/*
@@ -84,12 +83,9 @@ static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
* to not overlap the A/D type mask or the saved access bits of access-tracked
* SPTEs when A/D bits are disabled.
*/
-#define EPT_SPTE_HOST_WRITABLE BIT_ULL(57)
#define EPT_SPTE_MMU_WRITABLE BIT_ULL(58)
-static_assert(!(EPT_SPTE_HOST_WRITABLE & SPTE_TDP_AD_MASK));
static_assert(!(EPT_SPTE_MMU_WRITABLE & SPTE_TDP_AD_MASK));
-static_assert(!(EPT_SPTE_HOST_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
/* Defined only to keep the above static asserts readable. */
@@ -148,7 +144,6 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
#define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0)
-extern u64 __read_mostly shadow_host_writable_mask;
extern u64 __read_mostly shadow_mmu_writable_mask;
extern u64 __read_mostly shadow_nx_mask;
extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
@@ -383,27 +378,23 @@ static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
*
* For cases #1 and #4, KVM can safely make such SPTEs writable without taking
* mmu_lock as capturing the Accessed/Dirty state doesn't require taking it.
- * To differentiate #1 and #4 from #2 and #3, KVM uses two software-only bits
+ * To differentiate #1 and #4 from #2 and #3, KVM uses a software-only bit
* in the SPTE:
*
* shadow_mmu_writable_mask, aka MMU-writable -
* Cleared on SPTEs that KVM is currently write-protecting for shadow paging
* purposes (case 2 above).
- *
- * shadow_host_writable_mask, aka Host-writable -
* Cleared on SPTEs that are not host-writable (case 3 above)
*
- * Note, not all possible combinations of PT_WRITABLE_MASK,
- * shadow_mmu_writable_mask, and shadow_host_writable_mask are valid. A given
- * SPTE can be in only one of the following states, which map to the
- * aforementioned 3 cases:
+ * Note, not all possible combinations of PT_WRITABLE_MASK and
+ * shadow_mmu_writable_mask are valid. A given SPTE can be in only one of the
+ * following states, which map to the aforementioned 3 cases:
*
- * shadow_host_writable_mask | shadow_mmu_writable_mask | PT_WRITABLE_MASK
- * ------------------------- | ------------------------ | ----------------
- * 1 | 1 | 1 (writable)
- * 1 | 1 | 0 (case 1)
- * 1 | 0 | 0 (case 2)
- * 0 | 0 | 0 (case 3)
+ * shadow_mmu_writable_mask | PT_WRITABLE_MASK
+ * ------------------------ | ----------------
+ * 1 | 1 (writable)
+ * 1 | 0 (case 1)
+ * 0 | 0 (case 2,3)
*
* The valid combinations of these bits are checked by
* check_spte_writable_invariants() whenever an SPTE is modified.
@@ -433,13 +424,8 @@ static inline bool is_writable_pte(unsigned long pte)
/* Note: spte must be a shadow-present leaf SPTE. */
static inline void check_spte_writable_invariants(u64 spte)
{
- if (spte & shadow_mmu_writable_mask)
- WARN_ONCE(!(spte & shadow_host_writable_mask),
- "kvm: MMU-writable SPTE is not Host-writable: %llx",
- spte);
- else
- WARN_ONCE(is_writable_pte(spte),
- "kvm: Writable SPTE is not MMU-writable: %llx", spte);
+ WARN_ONCE(!(spte & shadow_mmu_writable_mask) && is_writable_pte(spte),
+ "kvm: Writable SPTE is not MMU-writable: %llx", spte);
}
static inline bool is_mmu_writable_spte(u64 spte)
--
2.19.1.6.gb485710b
* Re: [PATCH 1/2] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page)
2022-12-12 15:32 ` [PATCH 1/2] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page) Lai Jiangshan
@ 2022-12-13 18:11 ` Sean Christopherson
2022-12-14 13:47 ` Lai Jiangshan
2023-01-05 10:08 ` Lai Jiangshan
0 siblings, 2 replies; 6+ messages in thread
From: Sean Christopherson @ 2022-12-13 18:11 UTC (permalink / raw)
To: Lai Jiangshan
Cc: linux-kernel, Paolo Bonzini, Lai Jiangshan, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
kvm
On Mon, Dec 12, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> Sometimes when the guest updates its pagetable, it adds only new gptes
> without changing any existing one, so there is no point in updating
> the sptes for those existing gptes.
>
> Also, when the sptes for these unchanged gptes are updated, the AD
> bits are also removed, since make_spte() is called with prefetch=true,
> which might result in unneeded TLB flushing.
If either of the proposed changes is kept, please move this to a separate patch.
Skipping updates for PTEs with the same protections is a separate logical change
from skipping updates when making the SPTE writable.
Actually, can't we just pass @prefetch=false to make_spte()? FNAME(prefetch_invalid_gpte)
has already verified the Accessed bit is set in the GPTE, so at least for guest
correctness there's no need to access-track the SPTE. Host page aging is already
fuzzy so I don't think there are problems there.
> Do nothing if the permissions are unchanged or only write-access is
> being added.
I'm pretty sure skipping the "make writable" case is architecturally wrong. On a
#PF, any TLB entries for the faulting virtual address are required to be removed.
That means KVM _must_ refresh the SPTE if a vCPU takes a !WRITABLE fault on an
unsync page. E.g. see kvm_inject_emulated_page_fault().
> Only update the spte when write-access is being removed. Drop the SPTE
> otherwise.
Correctness aside, there needs to be far more analysis and justification for a
change like this, e.g. performance numbers for various workloads.
> ---
> arch/x86/kvm/mmu/paging_tmpl.h | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index e5662dbd519c..613f043a3e9e 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -1023,7 +1023,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
> u64 *sptep, spte;
> struct kvm_memory_slot *slot;
> - unsigned pte_access;
> + unsigned old_pte_access, pte_access;
> pt_element_t gpte;
> gpa_t pte_gpa;
> gfn_t gfn;
> @@ -1064,6 +1064,23 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> continue;
> }
>
> + /*
> + * Drop the SPTE if any access permission other than write-access
> + * is changing. Do nothing if the permissions are unchanged or
> + * only write-access is being added. Only update the spte when
> + * write-access is being removed.
> + */
> + old_pte_access = kvm_mmu_page_get_access(sp, i);
> + if (old_pte_access == pte_access ||
> + (old_pte_access | ACC_WRITE_MASK) == pte_access)
> + continue;
> + if (old_pte_access != (pte_access | ACC_WRITE_MASK)) {
> + drop_spte(vcpu->kvm, &sp->spt[i]);
> + flush = true;
> + continue;
> + }
> +
> /* Update the shadowed access bits in case they changed. */
> kvm_mmu_page_set_access(sp, i, pte_access);
>
> --
> 2.19.1.6.gb485710b
>
* Re: [PATCH 1/2] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page)
2022-12-13 18:11 ` Sean Christopherson
@ 2022-12-14 13:47 ` Lai Jiangshan
2022-12-14 19:09 ` Sean Christopherson
2023-01-05 10:08 ` Lai Jiangshan
1 sibling, 1 reply; 6+ messages in thread
From: Lai Jiangshan @ 2022-12-14 13:47 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, Paolo Bonzini, Lai Jiangshan, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
kvm
Hello Sean,
On Wed, Dec 14, 2022 at 2:12 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Dec 12, 2022, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > Sometimes when the guest updates its pagetable, it adds only new gptes
> > without changing any existing one, so there is no point in updating
> > the sptes for those existing gptes.
> >
> > Also, when the sptes for these unchanged gptes are updated, the AD
> > bits are also removed, since make_spte() is called with prefetch=true,
> > which might result in unneeded TLB flushing.
>
> If either of the proposed changes is kept, please move this to a separate patch.
> Skipping updates for PTEs with the same protections is a separate logical change
> from skipping updates when making the SPTE writable.
>
> Actually, can't we just pass @prefetch=false to make_spte()? FNAME(prefetch_invalid_gpte)
> has already verified the Accessed bit is set in the GPTE, so at least for guest
> correctness there's no need to access-track the SPTE. Host page aging is already
> fuzzy so I don't think there are problems there.
FNAME(prefetch_invalid_gpte) has already verified the Accessed bit is set
in the GPTE and FNAME(protect_clean_gpte) has already verified the Dirty
bit is set in the GPTE. These are only for guest AD bits.
And I don't think it is a good idea to pass @prefetch=false to make_spte(),
since the host might have cleared the AD bits in the spte for aging or
dirty logging; the AD bits in the spte are better kept as before.
Though passing @prefetch=false would not cause any correctness problem
from the viewpoint of maintaining guest AD bits.
>
> > Do nothing if the permissions are unchanged or only write-access is
> > being added.
>
> I'm pretty sure skipping the "make writable" case is architecturally wrong. On a
> #PF, any TLB entries for the faulting virtual address are required to be removed.
> That means KVM _must_ refresh the SPTE if a vCPU takes a !WRITABLE fault on an
> unsync page. E.g. see kvm_inject_emulated_page_fault().
I might misunderstand what you meant or I failed to connect it with
the SDM properly.
I think there is no #PF here.
And even if the guest is requesting writable, the hypervisor is allowed to
map it non-writable and be prepared to handle the ensuing write fault.
Skipping making it writable is a kind of lazy operation, which can be seen
as "the hypervisor doesn't grant the writable permission for a period
before the next write fault".
Thanks
Lai
* Re: [PATCH 1/2] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page)
2022-12-14 13:47 ` Lai Jiangshan
@ 2022-12-14 19:09 ` Sean Christopherson
0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2022-12-14 19:09 UTC (permalink / raw)
To: Lai Jiangshan
Cc: linux-kernel, Paolo Bonzini, Lai Jiangshan, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
kvm
On Wed, Dec 14, 2022, Lai Jiangshan wrote:
> Hello Sean,
>
> On Wed, Dec 14, 2022 at 2:12 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Dec 12, 2022, Lai Jiangshan wrote:
> > > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> > >
> > > Sometimes when the guest updates its pagetable, it adds only new gptes
> > > without changing any existing one, so there is no point in updating
> > > the sptes for those existing gptes.
> > >
> > > Also, when the sptes for these unchanged gptes are updated, the AD
> > > bits are also removed, since make_spte() is called with prefetch=true,
> > > which might result in unneeded TLB flushing.
> >
> > If either of the proposed changes is kept, please move this to a separate patch.
> > Skipping updates for PTEs with the same protections is a separate logical change
> > from skipping updates when making the SPTE writable.
> >
> > Actually, can't we just pass @prefetch=false to make_spte()? FNAME(prefetch_invalid_gpte)
> > has already verified the Accessed bit is set in the GPTE, so at least for guest
> > correctness there's no need to access-track the SPTE. Host page aging is already
> > fuzzy so I don't think there are problems there.
>
> FNAME(prefetch_invalid_gpte) has already verified the Accessed bit is set
> in the GPTE and FNAME(protect_clean_gpte) has already verified the Dirty
> bit is set in the GPTE. These are only for guest AD bits.
>
> And I don't think it is a good idea to pass @prefetch=false to make_spte(),
> since the host might have cleared the AD bits in the spte for aging or
> dirty logging; the AD bits in the spte are better kept as before.
Drat, I was thinking KVM never flushes when aging SPTEs, but forgot about
clear_flush_young().
Rather than skipping if the Accessed bit is the only thing that's changing, what
about simply preserving the Accessed bit? And s/prefetch/accessed in make_spte()
so that future changes to make_spte() don't make incorrect assumptions about the
meaning of "prefetch".
Another alternative would be to conditionally preserve the Accessed bit, i.e. clear
it if a flush is needed anyways, but that seems unnecessarily complex.
> Though passing @prefetch=false would not cause any correctness problem
> from the viewpoint of maintaining guest AD bits.
>
> >
> > > Do nothing if the permissions are unchanged or only write-access is
> > > being added.
> >
> > I'm pretty sure skipping the "make writable" case is architecturally wrong. On a
> > #PF, any TLB entries for the faulting virtual address are required to be removed.
> > That means KVM _must_ refresh the SPTE if a vCPU takes a !WRITABLE fault on an
> > unsync page. E.g. see kvm_inject_emulated_page_fault().
>
> I might misunderstand what you meant or I failed to connect it with
> the SDM properly.
>
> I think there is no #PF here.
>
> And even if the guest is requesting writable, the hypervisor is allowed to
> map it non-writable and be prepared to handle the ensuing write fault.
Yeah, you're right. The host will see the "spurious" page fault but it will
never get injected into the guest.
> Skipping making it writable is a kind of lazy operation, which can be seen
> as "the hypervisor doesn't grant the writable permission for a period
> before the next write fault".
But that raises the question of why? No TLB flush is needed precisely because any
!WRITABLE fault will be treated as a spurious fault. The cost of writing the
SPTE is minimal. So why skip? Skipping just to reclaim a low SPTE bit doesn't
seem like a good tradeoff, especially without a concrete use case for the SPTE bit.
E.g. on pre-Nehalem Intel CPUs, i.e. CPUs that don't support EPT and thus have
to use shadow paging, the CPU automatically retries accesses after the TLB flush
on permission faults. The lazy approach might introduce a noticeable performance
regression on such CPUs due to causing more #PF VM-Exits than the current approach.
* Re: [PATCH 1/2] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page)
2022-12-13 18:11 ` Sean Christopherson
2022-12-14 13:47 ` Lai Jiangshan
@ 2023-01-05 10:08 ` Lai Jiangshan
1 sibling, 0 replies; 6+ messages in thread
From: Lai Jiangshan @ 2023-01-05 10:08 UTC (permalink / raw)
To: Sean Christopherson
Cc: linux-kernel, Paolo Bonzini, Lai Jiangshan, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
kvm
On Wed, Dec 14, 2022 at 2:12 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Dec 12, 2022, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > Sometimes when the guest updates its pagetable, it adds only new gptes
> > without changing any existing one, so there is no point in updating
> > the sptes for those existing gptes.
> >
> > Also, when the sptes for these unchanged gptes are updated, the AD
> > bits are also removed, since make_spte() is called with prefetch=true,
> > which might result in unneeded TLB flushing.
>
> If either of the proposed changes is kept, please move this to a separate patch.
> Skipping updates for PTEs with the same protections is a separate logical change
> from skipping updates when making the SPTE writable.
>
Did as you suggested:
https://lore.kernel.org/lkml/20230105095848.6061-5-jiangshanlai@gmail.com/