* [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
@ 2023-11-10  0:37 Jacky Li
  2023-11-10  0:37 ` [RFC PATCH 1/4] KVM: SEV: Drop wbinvd_on_all_cpus() as kvm mmu notifier would flush the cache Jacky Li
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Jacky Li @ 2023-11-10  0:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Ovidiu Panait, Liam Merwick, Ashish Kalra, David Rientjes,
	David Kaplan, Peter Gonda, Mingwei Zhang, kvm, Jacky Li

The cache flush operation in SEV guest memory reclaim events was
originally introduced to prevent security issues caused by cache
incoherence and an untrusted VMM. However, when this operation is
triggered, it degrades the performance of the whole machine.

This cache flush operation is performed in the mmu_notifiers, specifically
in mmu_notifier_invalidate_range_start(), unconditionally on all guest
memory regions. Although the intention was to flush cache lines only when
guest memory is deallocated, the excessive invocations cover many other
cases where the flush is unnecessary.

This RFC proposes using the mmu notifier event to determine whether a
cache flush is needed. Specifically, the cache flush is done only when the
address range is unmapped, cleared, released or migrated. A bitmap module
param is also introduced for flexibility, so that the flush can be enabled
for additional events, or disabled entirely, depending on the hardware
platform.

Note that the cache flush operation in memory reclamation only targets
SEV/SEV-ES platforms; no cache flush is needed for SEV-SNP VMs. Therefore
the patch series does not apply in the SEV-SNP context.
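
As a rough usage sketch (not part of the patches: the parameter name and bit
layout come from patch 4/4, while the kvm_amd module name and sysfs path are
assumptions), an administrator who only wants flushes on unmap and release
events could set bits 1 and 6:

  # 0x42 = MMU_NOTIFY_UNMAP (bit 1) | MMU_NOTIFY_RELEASE (bit 6)
  echo 0x42 > /sys/module/kvm_amd/parameters/flush_on_mmu_notifier_event_bitmap

or pass kvm_amd.flush_on_mmu_notifier_event_bitmap=0x42 on the kernel command
line. The default proposed in patch 4/4 corresponds to 0xc6 (unmap, clear,
release and migrate).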

Jacky Li (4):
  KVM: SEV: Drop wbinvd_on_all_cpus() as kvm mmu notifier would flush
    the cache
  KVM: SEV: Plumb mmu_notifier_event into sev function
  KVM: SEV: Limit the call of WBINVDs based on the event type of mmu
    notifier
  KVM: SEV: Use a bitmap module param to decide whether a cache flush is
    needed during the guest memory reclaim

 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/svm/sev.c          | 62 ++++++++++++++++++++++++---------
 arch/x86/kvm/svm/svm.h          |  3 +-
 arch/x86/kvm/x86.c              |  5 +--
 include/linux/kvm_host.h        |  3 +-
 include/linux/mmu_notifier.h    |  4 +++
 virt/kvm/kvm_main.c             | 14 +++++---
 7 files changed, 68 insertions(+), 26 deletions(-)

-- 
2.43.0.rc0.421.g78406f8d94-goog



* [RFC PATCH 1/4] KVM: SEV: Drop wbinvd_on_all_cpus() as kvm mmu notifier would flush the cache
  2023-11-10  0:37 [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Jacky Li
@ 2023-11-10  0:37 ` Jacky Li
  2023-12-01 17:53   ` Sean Christopherson
  2023-11-10  0:37 ` [RFC PATCH 2/4] KVM: SEV: Plumb mmu_notifier_event into sev function Jacky Li
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Jacky Li @ 2023-11-10  0:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Ovidiu Panait, Liam Merwick, Ashish Kalra, David Rientjes,
	David Kaplan, Peter Gonda, Mingwei Zhang, kvm, Jacky Li

Remove the wbinvd_on_all_cpus() calls inside sev_mem_enc_unregister_region()
and sev_vm_destroy() because the KVM MMU notifier invalidation events already
flush the cache.

Signed-off-by: Jacky Li <jackyli@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4900c078045a..7fbcb7dea2c0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2043,13 +2043,6 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
 		goto failed;
 	}
 
-	/*
-	 * Ensure that all guest tagged cache entries are flushed before
-	 * releasing the pages back to the system for use. CLFLUSH will
-	 * not do this, so issue a WBINVD.
-	 */
-	wbinvd_on_all_cpus();
-
 	__unregister_enc_region_locked(kvm, region);
 
 	mutex_unlock(&kvm->lock);
@@ -2147,13 +2140,6 @@ void sev_vm_destroy(struct kvm *kvm)
 		return;
 	}
 
-	/*
-	 * Ensure that all guest tagged cache entries are flushed before
-	 * releasing the pages back to the system for use. CLFLUSH will
-	 * not do this, so issue a WBINVD.
-	 */
-	wbinvd_on_all_cpus();
-
 	/*
 	 * if userspace was terminated before unregistering the memory regions
 	 * then lets unpin all the registered memory.
-- 
2.43.0.rc0.421.g78406f8d94-goog



* [RFC PATCH 2/4] KVM: SEV: Plumb mmu_notifier_event into sev function
  2023-11-10  0:37 [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Jacky Li
  2023-11-10  0:37 ` [RFC PATCH 1/4] KVM: SEV: Drop wbinvd_on_all_cpus() as kvm mmu notifier would flush the cache Jacky Li
@ 2023-11-10  0:37 ` Jacky Li
  2023-11-10  0:37 ` [RFC PATCH 3/4] KVM: SEV: Limit the call of WBINVDs based on the event type of mmu notifier Jacky Li
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Jacky Li @ 2023-11-10  0:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Ovidiu Panait, Liam Merwick, Ashish Kalra, David Rientjes,
	David Kaplan, Peter Gonda, Mingwei Zhang, kvm, Jacky Li

Plumb the mmu_notifier_event enum all the way down to the SEV functions so
that the enum can provide the information SEV/SEV-ES VMs need in order to do
the cache flush only when necessary.

Signed-off-by: Jacky Li <jackyli@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/svm/sev.c          |  3 ++-
 arch/x86/kvm/svm/svm.h          |  3 ++-
 arch/x86/kvm/x86.c              |  5 +++--
 include/linux/kvm_host.h        |  3 ++-
 virt/kvm/kvm_main.c             | 14 +++++++++-----
 6 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d7036982332e..c026e171a8c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1741,7 +1741,8 @@ struct kvm_x86_ops {
 	int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
 	int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
-	void (*guest_memory_reclaimed)(struct kvm *kvm);
+	void (*guest_memory_reclaimed)(struct kvm *kvm,
+				       unsigned int mmu_notifier_event);
 
 	int (*get_msr_feature)(struct kvm_msr_entry *entry);
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7fbcb7dea2c0..8d30f6c5e872 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2329,7 +2329,8 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
 	wbinvd_on_all_cpus();
 }
 
-void sev_guest_memory_reclaimed(struct kvm *kvm)
+void sev_guest_memory_reclaimed(struct kvm *kvm,
+				unsigned int mmu_notifier_event)
 {
 	if (!sev_guest(kvm))
 		return;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index be67ab7fdd10..c8a911a02509 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -676,7 +676,8 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
 				  struct kvm_enc_region *range);
 int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd);
 int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd);
-void sev_guest_memory_reclaimed(struct kvm *kvm);
+void sev_guest_memory_reclaimed(struct kvm *kvm,
+				unsigned int mmu_notifier_event);
 
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_set_cpu_caps(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c924075f6f1..2cde9a836bf7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10592,9 +10592,10 @@ static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
 		vcpu, (u64 *)vcpu->arch.ioapic_handled_vectors);
 }
 
-void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
+void kvm_arch_guest_memory_reclaimed(struct kvm *kvm,
+				     unsigned int mmu_notifier_event)
 {
-	static_call_cond(kvm_x86_guest_memory_reclaimed)(kvm);
+	static_call_cond(kvm_x86_guest_memory_reclaimed)(kvm, mmu_notifier_event);
 }
 
 static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4944136efaa2..8984414c5b7a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2270,7 +2270,8 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp,
 }
 #endif /* CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL */
 
-void kvm_arch_guest_memory_reclaimed(struct kvm *kvm);
+void kvm_arch_guest_memory_reclaimed(struct kvm *kvm,
+				     unsigned int mmu_notifier_event);
 
 #ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE
 int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 486800a7024b..18526e198993 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -154,7 +154,8 @@ static unsigned long long kvm_active_vms;
 
 static DEFINE_PER_CPU(cpumask_var_t, cpu_kick_mask);
 
-__weak void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
+__weak void kvm_arch_guest_memory_reclaimed(struct kvm *kvm,
+					    unsigned int mmu_notifier_event)
 {
 }
 
@@ -396,7 +397,7 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
 static void kvm_flush_shadow_all(struct kvm *kvm)
 {
 	kvm_arch_flush_shadow_all(kvm);
-	kvm_arch_guest_memory_reclaimed(kvm);
+	kvm_arch_guest_memory_reclaimed(kvm, MMU_NOTIFY_RELEASE);
 }
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
@@ -546,11 +547,13 @@ typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
 			     unsigned long end);
 
-typedef void (*on_unlock_fn_t)(struct kvm *kvm);
+typedef void (*on_unlock_fn_t)(struct kvm *kvm,
+			       unsigned int mmu_notifier_event);
 
 struct kvm_hva_range {
 	unsigned long start;
 	unsigned long end;
+	unsigned int event;
 	union kvm_mmu_notifier_arg arg;
 	hva_handler_t handler;
 	on_lock_fn_t on_lock;
@@ -647,7 +650,7 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 	if (locked) {
 		KVM_MMU_UNLOCK(kvm);
 		if (!IS_KVM_NULL_FN(range->on_unlock))
-			range->on_unlock(kvm);
+			range->on_unlock(kvm, range->event);
 	}
 
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -774,6 +777,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	const struct kvm_hva_range hva_range = {
 		.start		= range->start,
 		.end		= range->end,
+		.event		= range->event,
 		.handler	= kvm_unmap_gfn_range,
 		.on_lock	= kvm_mmu_invalidate_begin,
 		.on_unlock	= kvm_arch_guest_memory_reclaimed,
@@ -1769,7 +1773,7 @@ static void kvm_invalidate_memslot(struct kvm *kvm,
 	 *	- kvm_is_visible_gfn (mmu_check_root)
 	 */
 	kvm_arch_flush_shadow_memslot(kvm, old);
-	kvm_arch_guest_memory_reclaimed(kvm);
+	kvm_arch_guest_memory_reclaimed(kvm, MMU_NOTIFY_UNMAP);
 
 	/* Was released by kvm_swap_active_memslots(), reacquire. */
 	mutex_lock(&kvm->slots_arch_lock);
-- 
2.43.0.rc0.421.g78406f8d94-goog



* [RFC PATCH 3/4] KVM: SEV: Limit the call of WBINVDs based on the event type of mmu notifier
  2023-11-10  0:37 [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Jacky Li
  2023-11-10  0:37 ` [RFC PATCH 1/4] KVM: SEV: Drop wbinvd_on_all_cpus() as kvm mmu notifier would flush the cache Jacky Li
  2023-11-10  0:37 ` [RFC PATCH 2/4] KVM: SEV: Plumb mmu_notifier_event into sev function Jacky Li
@ 2023-11-10  0:37 ` Jacky Li
  2023-11-10 18:52   ` Kalra, Ashish
  2023-11-10  0:37 ` [RFC PATCH 4/4] KVM: SEV: Use a bitmap module param to decide whether a cache flush is needed during the guest memory reclaim Jacky Li
  2023-12-01 18:05 ` [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Sean Christopherson
  4 siblings, 1 reply; 17+ messages in thread
From: Jacky Li @ 2023-11-10  0:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Ovidiu Panait, Liam Merwick, Ashish Kalra, David Rientjes,
	David Kaplan, Peter Gonda, Mingwei Zhang, kvm, Jacky Li

The cache flush was originally introduced to enforce cache coherency across
the VM boundary in SEV, so the flush is not needed in cases where the page
remains in the same VM. wbinvd_on_all_cpus() is a costly operation, so use
the mmu notifier event type information in the range struct to do the cache
flush only when needed.

The physical page might be allocated to a different VM after the range is
unmapped, cleared, released or migrated, so do a cache flush only on those
events.

Signed-off-by: Jacky Li <jackyli@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8d30f6c5e872..477df8a06629 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2335,7 +2335,11 @@ void sev_guest_memory_reclaimed(struct kvm *kvm,
 	if (!sev_guest(kvm))
 		return;
 
-	wbinvd_on_all_cpus();
+	if (mmu_notifier_event == MMU_NOTIFY_UNMAP ||
+	    mmu_notifier_event == MMU_NOTIFY_CLEAR ||
+	    mmu_notifier_event == MMU_NOTIFY_RELEASE ||
+	    mmu_notifier_event == MMU_NOTIFY_MIGRATE)
+		wbinvd_on_all_cpus();
 }
 
 void sev_free_vcpu(struct kvm_vcpu *vcpu)
-- 
2.43.0.rc0.421.g78406f8d94-goog



* [RFC PATCH 4/4] KVM: SEV: Use a bitmap module param to decide whether a cache flush is needed during the guest memory reclaim
  2023-11-10  0:37 [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Jacky Li
                   ` (2 preceding siblings ...)
  2023-11-10  0:37 ` [RFC PATCH 3/4] KVM: SEV: Limit the call of WBINVDs based on the event type of mmu notifier Jacky Li
@ 2023-11-10  0:37 ` Jacky Li
  2023-12-01 18:00   ` Sean Christopherson
  2023-12-01 18:05 ` [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Sean Christopherson
  4 siblings, 1 reply; 17+ messages in thread
From: Jacky Li @ 2023-11-10  0:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Ovidiu Panait, Liam Merwick, Ashish Kalra, David Rientjes,
	David Kaplan, Peter Gonda, Mingwei Zhang, kvm, Jacky Li

Use a bitmap to provide flexibility in deciding whether the flush is needed
for a specific mmu notifier event. The cache flush during memory reclamation
was originally introduced to address cache incoherency issues on some
SME_COHERENT platforms. Users may configure the bitmap depending on the
hardware (e.g. no flush needed when SME_COHERENT extends to DMA devices) or
the userspace VMM (e.g. no flush needed when the VMM ensures guest memory is
properly unpinned).

The bitmap is also decoupled from the mmu_notifier_event enum in order to
provide a consistent interface: the parameter keeps the same behavior
regardless of changes to the mmu notifier events in future kernel versions.
When a new mmu notifier event is added, it defaults to BIT(0) so that no
additional cache flush is accidentally introduced.

Signed-off-by: Jacky Li <jackyli@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/svm/sev.c       | 47 +++++++++++++++++++++++++++++++++---
 include/linux/mmu_notifier.h |  4 +++
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 477df8a06629..6e7530b4ae5d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -65,6 +65,47 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 #define sev_es_debug_swap_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+#define MMU_NOTIFY_OTHERS_BIT BIT_ULL(0)
+#define MMU_NOTIFY_UNMAP_BIT BIT_ULL(1)
+#define MMU_NOTIFY_CLEAR_BIT BIT_ULL(2)
+#define MMU_NOTIFY_PROTECTION_VMA_BIT BIT_ULL(3)
+#define MMU_NOTIFY_PROTECTION_PAGE_BIT BIT_ULL(4)
+#define MMU_NOTIFY_SOFT_DIRTY_BIT BIT_ULL(5)
+#define MMU_NOTIFY_RELEASE_BIT BIT_ULL(6)
+#define MMU_NOTIFY_MIGRATE_BIT BIT_ULL(7)
+#define MMU_NOTIFY_EXCLUSIVE_BIT BIT_ULL(8)
+
+/*
+ * Explicitly decouple from the mmu_notifier_event enum, so that the interface
+ * (i.e. bit definitions in the module param bitmap) remains the same when the
+ * original enum gets updated.
+ */
+static const int mmu_notifier_event_map[NR_MMU_NOTIFY_EVENTS] = {
+	[MMU_NOTIFY_UNMAP] = MMU_NOTIFY_UNMAP_BIT,
+	[MMU_NOTIFY_CLEAR] = MMU_NOTIFY_CLEAR_BIT,
+	[MMU_NOTIFY_PROTECTION_VMA] = MMU_NOTIFY_PROTECTION_VMA_BIT,
+	[MMU_NOTIFY_PROTECTION_PAGE] = MMU_NOTIFY_PROTECTION_PAGE_BIT,
+	[MMU_NOTIFY_SOFT_DIRTY] = MMU_NOTIFY_SOFT_DIRTY_BIT,
+	[MMU_NOTIFY_RELEASE] = MMU_NOTIFY_RELEASE_BIT,
+	[MMU_NOTIFY_MIGRATE] = MMU_NOTIFY_MIGRATE_BIT,
+	[MMU_NOTIFY_EXCLUSIVE] = MMU_NOTIFY_EXCLUSIVE_BIT
+};
+unsigned long flush_on_mmu_notifier_event_bitmap = MMU_NOTIFY_UNMAP_BIT |
+	MMU_NOTIFY_CLEAR_BIT | MMU_NOTIFY_RELEASE_BIT | MMU_NOTIFY_MIGRATE_BIT;
+EXPORT_SYMBOL_GPL(flush_on_mmu_notifier_event_bitmap);
+module_param(flush_on_mmu_notifier_event_bitmap, ulong, 0644);
+MODULE_PARM_DESC(flush_on_mmu_notifier_event_bitmap,
+"Whether a cache flush is needed when the sev guest memory is reclaimed with a specific mmu notifier event.\n"
+"\tBit 0 (0x01)  left to any event not yet defined in the map\n"
+"\tBit 1 (0x02)  corresponds to MMU_NOTIFY_UNMAP event\n"
+"\tBit 2 (0x04)  corresponds to MMU_NOTIFY_CLEAR event\n"
+"\tBit 3 (0x08)  corresponds to MMU_NOTIFY_PROTECTION_VMA event\n"
+"\tBit 4 (0x10)  corresponds to MMU_NOTIFY_PROTECTION_PAGE event\n"
+"\tBit 5 (0x20)  corresponds to MMU_NOTIFY_SOFT_DIRTY event\n"
+"\tBit 6 (0x80)  corresponds to MMU_NOTIFY_RELEASE event\n"
+"\tBit 7 (0x100) corresponds to MMU_NOTIFY_MIGRATE event\n"
+"\tBit 8 (0x200) corresponds to MMU_NOTIFY_EXCLUSIVE event");
+
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2335,10 +2376,8 @@ void sev_guest_memory_reclaimed(struct kvm *kvm,
 	if (!sev_guest(kvm))
 		return;
 
-	if (mmu_notifier_event == MMU_NOTIFY_UNMAP ||
-	    mmu_notifier_event == MMU_NOTIFY_CLEAR ||
-	    mmu_notifier_event == MMU_NOTIFY_RELEASE ||
-	    mmu_notifier_event == MMU_NOTIFY_MIGRATE)
+	if (mmu_notifier_event_map[mmu_notifier_event] &
+	    flush_on_mmu_notifier_event_bitmap)
 		wbinvd_on_all_cpus();
 }
 
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index f349e08a9dfe..b40db51d76a4 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -47,6 +47,9 @@ struct mmu_interval_notifier;
  * longer have exclusive access to the page. When sent during creation of an
  * exclusive range the owner will be initialised to the value provided by the
  * caller of make_device_exclusive_range(), otherwise the owner will be NULL.
+ *
+ * @NR_MMU_NOTIFY_EVENTS: number of mmu notifier events, should always be at
+ * the end of the enum list.
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
@@ -57,6 +60,7 @@ enum mmu_notifier_event {
 	MMU_NOTIFY_RELEASE,
 	MMU_NOTIFY_MIGRATE,
 	MMU_NOTIFY_EXCLUSIVE,
+	NR_MMU_NOTIFY_EVENTS,
 };
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
-- 
2.43.0.rc0.421.g78406f8d94-goog



* Re: [RFC PATCH 3/4] KVM: SEV: Limit the call of WBINVDs based on the event type of mmu notifier
  2023-11-10  0:37 ` [RFC PATCH 3/4] KVM: SEV: Limit the call of WBINVDs based on the event type of mmu notifier Jacky Li
@ 2023-11-10 18:52   ` Kalra, Ashish
  0 siblings, 0 replies; 17+ messages in thread
From: Kalra, Ashish @ 2023-11-10 18:52 UTC (permalink / raw)
  To: Jacky Li, Sean Christopherson, Paolo Bonzini
  Cc: Ovidiu Panait, Liam Merwick, David Rientjes, David Kaplan,
	Peter Gonda, Mingwei Zhang, kvm

On 11/9/2023 6:37 PM, Jacky Li wrote:
> The cache flush was originally introduced to enforce the cache coherency
> across VM boundary in SEV, so the flush is not needed in some cases when
> the page remains in the same VM. wbinvd_on_all_cpus() is a costly
> operation so use the mmu notifier event type information in the range
> struct to only do cache flush when needed.
> 
> The physical page might be allocated to a different VM after the range
> is unmapped, cleared, released or migrated. So do a cache flush only on
> those events.
> 
> Signed-off-by: Jacky Li <jackyli@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/svm/sev.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 8d30f6c5e872..477df8a06629 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2335,7 +2335,11 @@ void sev_guest_memory_reclaimed(struct kvm *kvm,
>   	if (!sev_guest(kvm))
>   		return;
>   
> -	wbinvd_on_all_cpus();
> +	if (mmu_notifier_event == MMU_NOTIFY_UNMAP ||
> +	    mmu_notifier_event == MMU_NOTIFY_CLEAR ||

When I looked at MMU notifier event filtering and the various code paths
(of interest) from where the MMU invalidation notifier gets invoked:

For NUMA load balancing during #PF fault handling, try_to_migrate_one()
does the MMU invalidation notifier invocation with the MMU_NOTIFY_CLEAR
event, and in the KSM code paths, try_to_merge_one_page() ->
write_protect_page() and try_to_merge_one_page() -> replace_page() do MMU
invalidation notifier invocations, also with the MMU_NOTIFY_CLEAR event.

Now, I remember from previous discussions that the CLEAR event is an
overloaded event used for page zapping, madvise, etc., so I don't think we
will be able to filter *out* this event. If that is the case then I have
concerns about this whole patch series, as this event is triggering most
of the performance issues we are observing.

Thanks,
Ashish

> +	    mmu_notifier_event == MMU_NOTIFY_RELEASE ||
> +	    mmu_notifier_event == MMU_NOTIFY_MIGRATE)
> +		wbinvd_on_all_cpus();
>   }
>   
>   void sev_free_vcpu(struct kvm_vcpu *vcpu)
> 


* Re: [RFC PATCH 1/4] KVM: SEV: Drop wbinvd_on_all_cpus() as kvm mmu notifier would flush the cache
  2023-11-10  0:37 ` [RFC PATCH 1/4] KVM: SEV: Drop wbinvd_on_all_cpus() as kvm mmu notifier would flush the cache Jacky Li
@ 2023-12-01 17:53   ` Sean Christopherson
  0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2023-12-01 17:53 UTC (permalink / raw)
  To: Jacky Li
  Cc: Paolo Bonzini, Ovidiu Panait, Liam Merwick, Ashish Kalra,
	David Rientjes, David Kaplan, Peter Gonda, Mingwei Zhang, kvm

On Fri, Nov 10, 2023, Jacky Li wrote:
> Remove the wbinvd_on_all_cpus inside sev_mem_enc_unregister_region() and
> sev_vm_destroy() because kvm mmu notifier invalidation event would flush
> the cache.

This needs a much longer explanation of why this is safe.  This might also need
an opt-in, e.g. if userspace is reusing the memory for something else without
freeing it back to the kernel, and thus is relying on KVM to do the WBINVD.

The key thing is that userspace can access the memory at any time and _can_ do
CLFLUSH{OPT} if userspace wants to do its own conversions.  I.e. the WBINVD doesn't
protect against a misbehaving guest/userspace corrupting data.  But it's still
possible that userspace is relying on the WBINVD, and thou shalt not break userspace.
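
As an illustration (not part of the original mail), a minimal userspace sketch
of the CLFLUSHOPT approach mentioned above might look like the code below,
assuming a 64-byte cache line and a toolchain/CPU with CLFLUSHOPT support
(e.g. built with -mclflushopt):

  #include <immintrin.h>
  #include <stddef.h>

  /* Flush every cache line backing [buf, buf + len) before reusing the memory. */
  static void flush_buffer(void *buf, size_t len)
  {
          char *p = buf;
          size_t i;

          for (i = 0; i < len; i += 64)
                  _mm_clflushopt(p + i);

          /* CLFLUSHOPT is weakly ordered; fence before the memory is reused. */
          _mm_sfence();
  }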


* Re: [RFC PATCH 4/4] KVM: SEV: Use a bitmap module param to decide whether a cache flush is needed during the guest memory reclaim
  2023-11-10  0:37 ` [RFC PATCH 4/4] KVM: SEV: Use a bitmap module param to decide whether a cache flush is needed during the guest memory reclaim Jacky Li
@ 2023-12-01 18:00   ` Sean Christopherson
  0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2023-12-01 18:00 UTC (permalink / raw)
  To: Jacky Li
  Cc: Paolo Bonzini, Ovidiu Panait, Liam Merwick, Ashish Kalra,
	David Rientjes, David Kaplan, Peter Gonda, Mingwei Zhang, kvm

On Fri, Nov 10, 2023, Jacky Li wrote:
> Use a bitmap to provide the flexibility on deciding whether the flush is
> needed for a specific mmu notifier event. The cache flush during memory
> reclamation was originally introduced to address the cache incoherency
> issues in some SME_COHERENT platforms. User may configure the bitmap
> depending on the hardware (e.g. No flush needed when SME_COHERENT can
> extend to DMA devices) or userspace VMM (e.g. No flush needed when VMM
> ensures guest memory is properly unpinned).

Absolutely not.  KVM and the kernel must know when a flush is needed and when it
is not.  The various errata around cache line aliasing are far too messy to punt
to userspace.  Not to mention this bleeds kernel internals to userspace and thus
risks unintentionally creating an ABI.


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-11-10  0:37 [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Jacky Li
                   ` (3 preceding siblings ...)
  2023-11-10  0:37 ` [RFC PATCH 4/4] KVM: SEV: Use a bitmap module param to decide whether a cache flush is needed during the guest memory reclaim Jacky Li
@ 2023-12-01 18:05 ` Sean Christopherson
  2023-12-01 19:02   ` Mingwei Zhang
  4 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2023-12-01 18:05 UTC (permalink / raw)
  To: Jacky Li
  Cc: Paolo Bonzini, Ovidiu Panait, Liam Merwick, Ashish Kalra,
	David Rientjes, David Kaplan, Peter Gonda, Mingwei Zhang, kvm

On Fri, Nov 10, 2023, Jacky Li wrote:
> The cache flush operation in sev guest memory reclaim events was
> originally introduced to prevent security issues due to cache
> incoherence and untrusted VMM. However when this operation gets
> triggered, it causes performance degradation to the whole machine.
> 
> This cache flush operation is performed in mmu_notifiers, in particular,
> in the mmu_notifier_invalidate_range_start() function, unconditionally
> on all guest memory regions. Although the intention was to flush
> cache lines only when guest memory was deallocated, the excessive
> invocations include many other cases where this flush is unnecessary.
> 
> This RFC proposes using the mmu notifier event to determine whether a
> cache flush is needed. Specifically, only do the cache flush when the
> address range is unmapped, cleared, released or migrated. A bitmap
> module param is also introduced to provide flexibility when flush is
> needed in more events or no flush is needed depending on the hardware
> platform.

I'm still not at all convinced that this is worth doing.  We have clear line of
sight to cleanly and optimally handling SNP and beyond.  If there is an actual
use case that wants to run SEV and/or SEV-ES VMs, which can't support page
migration, on the same host as traditional VMs, _and_ for some reason their
userspace is incapable of providing reasonable NUMA locality, then the owners of
that use case can speak up and provide justification for taking on this extra
complexity in KVM.


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 18:05 ` [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events Sean Christopherson
@ 2023-12-01 19:02   ` Mingwei Zhang
  2023-12-01 21:30     ` Kalra, Ashish
  0 siblings, 1 reply; 17+ messages in thread
From: Mingwei Zhang @ 2023-12-01 19:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jacky Li, Paolo Bonzini, Ovidiu Panait, Liam Merwick,
	Ashish Kalra, David Rientjes, David Kaplan, Peter Gonda, kvm

On Fri, Dec 1, 2023 at 10:05 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Nov 10, 2023, Jacky Li wrote:
> > The cache flush operation in sev guest memory reclaim events was
> > originally introduced to prevent security issues due to cache
> > incoherence and untrusted VMM. However when this operation gets
> > triggered, it causes performance degradation to the whole machine.
> >
> > This cache flush operation is performed in mmu_notifiers, in particular,
> > in the mmu_notifier_invalidate_range_start() function, unconditionally
> > on all guest memory regions. Although the intention was to flush
> > cache lines only when guest memory was deallocated, the excessive
> > invocations include many other cases where this flush is unnecessary.
> >
> > This RFC proposes using the mmu notifier event to determine whether a
> > cache flush is needed. Specifically, only do the cache flush when the
> > address range is unmapped, cleared, released or migrated. A bitmap
> > module param is also introduced to provide flexibility when flush is
> > needed in more events or no flush is needed depending on the hardware
> > platform.
>
> I'm still not at all convinced that this is worth doing.  We have clear line of
> sight to cleanly and optimally handling SNP and beyond.  If there is an actual
> use case that wants to run SEV and/or SEV-ES VMs, which can't support page
> migration, on the same host as traditional VMs, _and_ for some reason their
> userspace is incapable of providing reasonable NUMA locality, then the owners of
> that use case can speak up and provide justification for taking on this extra
> complexity in KVM.

Hi Sean,

Jacky and I were looking at some cases like mmu_notifier calls
triggered by the overloaded reason "MMU_NOTIFY_CLEAR". Even if we turn
off page migration etc., PMD splitting may still happen at some point
under this reason, and we will never be able to turn it off by
tweaking kernel CONFIG options. So, I think this is the line of sight
for this series.

Handling SNP could be separate, since in SNP we have per-page
properties, which allow KVM to know which page to flush individually.

Thanks.
-Mingwei


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 19:02   ` Mingwei Zhang
@ 2023-12-01 21:30     ` Kalra, Ashish
  2023-12-01 21:58       ` Mingwei Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Kalra, Ashish @ 2023-12-01 21:30 UTC (permalink / raw)
  To: Mingwei Zhang, Sean Christopherson
  Cc: Jacky Li, Paolo Bonzini, Ovidiu Panait, Liam Merwick,
	David Rientjes, David Kaplan, Peter Gonda, kvm, Michael Roth

On 12/1/2023 1:02 PM, Mingwei Zhang wrote:
> On Fri, Dec 1, 2023 at 10:05 AM Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Fri, Nov 10, 2023, Jacky Li wrote:
>>> The cache flush operation in sev guest memory reclaim events was
>>> originally introduced to prevent security issues due to cache
>>> incoherence and untrusted VMM. However when this operation gets
>>> triggered, it causes performance degradation to the whole machine.
>>>
>>> This cache flush operation is performed in mmu_notifiers, in particular,
>>> in the mmu_notifier_invalidate_range_start() function, unconditionally
>>> on all guest memory regions. Although the intention was to flush
>>> cache lines only when guest memory was deallocated, the excessive
>>> invocations include many other cases where this flush is unnecessary.
>>>
>>> This RFC proposes using the mmu notifier event to determine whether a
>>> cache flush is needed. Specifically, only do the cache flush when the
>>> address range is unmapped, cleared, released or migrated. A bitmap
>>> module param is also introduced to provide flexibility when flush is
>>> needed in more events or no flush is needed depending on the hardware
>>> platform.
>>
>> I'm still not at all convinced that this is worth doing.  We have clear line of
>> sight to cleanly and optimally handling SNP and beyond.  If there is an actual
>> use case that wants to run SEV and/or SEV-ES VMs, which can't support page
>> migration, on the same host as traditional VMs, _and_ for some reason their
>> userspace is incapable of providing reasonable NUMA locality, then the owners of
>> that use case can speak up and provide justification for taking on this extra
>> complexity in KVM.
> 
> Hi Sean,
> 
> Jacky and I were looking at some cases like mmu_notifier calls
> triggered by the overloaded reason "MMU_NOTIFY_CLEAR". Even if we turn
> off page migration etc, splitting PMD may still happen at some point
> under this reason, and we will never be able to turn it off by
> tweaking kernel CONFIG options. So, I think this is the line of sight
> for this series.
> 
> Handling SNP could be separate, since in SNP we have per-page
> properties, which allow KVM to know which page to flush individually.
> 

For SNP + gmem, where the HVA ranges covered by the MMU notifiers are 
not acting on encrypted pages, we are ignoring MMU invalidation 
notifiers for SNP guests as part of the SNP host patches being posted 
upstream, and instead relying on gmem's own invalidation stuff to clean
them up on a per-folio basis.

Thanks,
Ashish


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 21:30     ` Kalra, Ashish
@ 2023-12-01 21:58       ` Mingwei Zhang
  2023-12-01 22:13         ` Kalra, Ashish
  2023-12-01 22:13         ` Sean Christopherson
  0 siblings, 2 replies; 17+ messages in thread
From: Mingwei Zhang @ 2023-12-01 21:58 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: Sean Christopherson, Jacky Li, Paolo Bonzini, Ovidiu Panait,
	Liam Merwick, David Rientjes, David Kaplan, Peter Gonda, kvm,
	Michael Roth

On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
>
> On 12/1/2023 1:02 PM, Mingwei Zhang wrote:
> > On Fri, Dec 1, 2023 at 10:05 AM Sean Christopherson <seanjc@google.com> wrote:
> >>
> >> On Fri, Nov 10, 2023, Jacky Li wrote:
> >>> The cache flush operation in sev guest memory reclaim events was
> >>> originally introduced to prevent security issues due to cache
> >>> incoherence and untrusted VMM. However when this operation gets
> >>> triggered, it causes performance degradation to the whole machine.
> >>>
> >>> This cache flush operation is performed in mmu_notifiers, in particular,
> >>> in the mmu_notifier_invalidate_range_start() function, unconditionally
> >>> on all guest memory regions. Although the intention was to flush
> >>> cache lines only when guest memory was deallocated, the excessive
> >>> invocations include many other cases where this flush is unnecessary.
> >>>
> >>> This RFC proposes using the mmu notifier event to determine whether a
> >>> cache flush is needed. Specifically, only do the cache flush when the
> >>> address range is unmapped, cleared, released or migrated. A bitmap
> >>> module param is also introduced to provide flexibility when flush is
> >>> needed in more events or no flush is needed depending on the hardware
> >>> platform.
> >>
> >> I'm still not at all convinced that this is worth doing.  We have clear line of
> >> sight to cleanly and optimally handling SNP and beyond.  If there is an actual
> >> use case that wants to run SEV and/or SEV-ES VMs, which can't support page
> >> migration, on the same host as traditional VMs, _and_ for some reason their
> >> userspace is incapable of providing reasonable NUMA locality, then the owners of
> >> that use case can speak up and provide justification for taking on this extra
> >> complexity in KVM.
> >
> > Hi Sean,
> >
> > Jacky and I were looking at some cases like mmu_notifier calls
> > triggered by the overloaded reason "MMU_NOTIFY_CLEAR". Even if we turn
> > off page migration etc, splitting PMD may still happen at some point
> > under this reason, and we will never be able to turn it off by
> > tweaking kernel CONFIG options. So, I think this is the line of sight
> > for this series.
> >
> > Handling SNP could be separate, since in SNP we have per-page
> > properties, which allow KVM to know which page to flush individually.
> >
>
> For SNP + gmem, where the HVA ranges covered by the MMU notifiers are
> not acting on encrypted pages, we are ignoring MMU invalidation
> notifiers for SNP guests as part of the SNP host patches being posted
> upstream and instead relying on gmem own invalidation stuff to clean
> them up on a per-folio basis.
>
> Thanks,
> Ashish

oh, I have no question about that. This series only applies to
SEV/SEV-ES type of VMs.

For SNP + guest_memfd, I don't see the implementation details, but I
doubt you can ignore mmu_notifiers if the request does cover some
encrypted memory in error cases or corner cases. Does the SNP enforce
the usage of guest_memfd? How do we prevent exceptional cases? I am
sure you guys already figured out the answers, so I don't plan to dig
deeper until SNP host pages are accepted.

Clearly, for SEV/SEV-ES, there is no such guarantee as with guest_memfd.
Applying guest_memfd to SEV/SEV-ES might require changes to the SEV API, I
suspect, so I think that's equally non-trivial and thus may not be worth
pursuing.

Thanks.
-Mingwei


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 21:58       ` Mingwei Zhang
@ 2023-12-01 22:13         ` Kalra, Ashish
  2023-12-01 22:13         ` Sean Christopherson
  1 sibling, 0 replies; 17+ messages in thread
From: Kalra, Ashish @ 2023-12-01 22:13 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Sean Christopherson, Jacky Li, Paolo Bonzini, Ovidiu Panait,
	Liam Merwick, David Rientjes, David Kaplan, Peter Gonda, kvm,
	Michael Roth

On 12/1/2023 3:58 PM, Mingwei Zhang wrote:
> On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
>>
>> On 12/1/2023 1:02 PM, Mingwei Zhang wrote:
>>> On Fri, Dec 1, 2023 at 10:05 AM Sean Christopherson <seanjc@google.com> wrote:
>>>>
>>>> On Fri, Nov 10, 2023, Jacky Li wrote:
>>>>> The cache flush operation in sev guest memory reclaim events was
>>>>> originally introduced to prevent security issues due to cache
>>>>> incoherence and untrusted VMM. However when this operation gets
>>>>> triggered, it causes performance degradation to the whole machine.
>>>>>
>>>>> This cache flush operation is performed in mmu_notifiers, in particular,
>>>>> in the mmu_notifier_invalidate_range_start() function, unconditionally
>>>>> on all guest memory regions. Although the intention was to flush
>>>>> cache lines only when guest memory was deallocated, the excessive
>>>>> invocations include many other cases where this flush is unnecessary.
>>>>>
>>>>> This RFC proposes using the mmu notifier event to determine whether a
>>>>> cache flush is needed. Specifically, only do the cache flush when the
>>>>> address range is unmapped, cleared, released or migrated. A bitmap
>>>>> module param is also introduced to provide flexibility when flush is
>>>>> needed in more events or no flush is needed depending on the hardware
>>>>> platform.
>>>>
>>>> I'm still not at all convinced that this is worth doing.  We have clear line of
>>>> sight to cleanly and optimally handling SNP and beyond.  If there is an actual
>>>> use case that wants to run SEV and/or SEV-ES VMs, which can't support page
>>>> migration, on the same host as traditional VMs, _and_ for some reason their
>>>> userspace is incapable of providing reasonable NUMA locality, then the owners of
>>>> that use case can speak up and provide justification for taking on this extra
>>>> complexity in KVM.
>>>
>>> Hi Sean,
>>>
>>> Jacky and I were looking at some cases like mmu_notifier calls
>>> triggered by the overloaded reason "MMU_NOTIFY_CLEAR". Even if we turn
>>> off page migration etc, splitting PMD may still happen at some point
>>> under this reason, and we will never be able to turn it off by
>>> tweaking kernel CONFIG options. So, I think this is the line of sight
>>> for this series.
>>>
>>> Handling SNP could be separate, since in SNP we have per-page
>>> properties, which allow KVM to know which page to flush individually.
>>>
>>
>> For SNP + gmem, where the HVA ranges covered by the MMU notifiers are
>> not acting on encrypted pages, we are ignoring MMU invalidation
>> notifiers for SNP guests as part of the SNP host patches being posted
>> upstream and instead relying on gmem own invalidation stuff to clean
>> them up on a per-folio basis.
>>
> oh, I have no question about that. This series only applies to
> SEV/SEV-ES type of VMs.
> 
> For SNP + guest_memfd, I don't see the implementation details, but I
> doubt you can ignore mmu_notifiers if the request does cover some
> encrypted memory in error cases or corner cases.

I believe that all page state transitions from private->shared will 
invoke gmem's own invalidation stuff which should cover such corner cases.

Mike Roth can provide more specific details about that.

>Does the SNP enforce the usage of guest_memfd? 

Again, I believe that the SNP implementation is based only on, and uses,
guest_memfd support.

Thanks,
Ashish

> How do we prevent exceptional cases? I am
> sure you guys already figured out the answers, so I don't plan to dig
> deeper until SNP host pages are accepted.
> 
> Clearly, for SEV/SEV-ES, there is no such guarantee like guest_memfd.
> Applying guest_memfd on SEV/SEV-ES might require changes on SEV API I
> suspect, so I think that's equally non-trivial and thus may not be
> worth pursuing.
> 
> Thanks.
> -Mingwei
> 


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 21:58       ` Mingwei Zhang
  2023-12-01 22:13         ` Kalra, Ashish
@ 2023-12-01 22:13         ` Sean Christopherson
  2023-12-01 22:22           ` Mingwei Zhang
  1 sibling, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2023-12-01 22:13 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Ashish Kalra, Jacky Li, Paolo Bonzini, Ovidiu Panait,
	Liam Merwick, David Rientjes, David Kaplan, Peter Gonda, kvm,
	Michael Roth

On Fri, Dec 01, 2023, Mingwei Zhang wrote:
> On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
> > For SNP + gmem, where the HVA ranges covered by the MMU notifiers are
> > not acting on encrypted pages, we are ignoring MMU invalidation
> > notifiers for SNP guests as part of the SNP host patches being posted
> > upstream and instead relying on gmem own invalidation stuff to clean
> > them up on a per-folio basis.
> >
> > Thanks,
> > Ashish
> 
> oh, I have no question about that. This series only applies to
> SEV/SEV-ES type of VMs.
> 
> For SNP + guest_memfd, I don't see the implementation details, but I
> doubt you can ignore mmu_notifiers if the request does cover some
> encrypted memory in error cases or corner cases. Does the SNP enforce
> the usage of guest_memfd? How do we prevent exceptional cases? I am
> sure you guys already figured out the answers, so I don't plan to dig
> deeper until SNP host pages are accepted.

KVM will not allow SNP guests to map VMA-based memory as encrypted/private, full
stop.  Any invalidations initiated by mmu_notifiers will therefore apply only to
shared memory.

That approach doesn't work for SEV/SEV-ES because KVM can't prevent the guest
from accessing memory as encrypted, i.e. KVM needs the #NPF due to RMP violation
to intercept attempts to convert a GFN from shared to private.


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 22:13         ` Sean Christopherson
@ 2023-12-01 22:22           ` Mingwei Zhang
  2023-12-01 22:30             ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Mingwei Zhang @ 2023-12-01 22:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ashish Kalra, Jacky Li, Paolo Bonzini, Ovidiu Panait,
	Liam Merwick, David Rientjes, David Kaplan, Peter Gonda, kvm,
	Michael Roth

On Fri, Dec 1, 2023 at 2:13 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Dec 01, 2023, Mingwei Zhang wrote:
> > On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
> > > For SNP + gmem, where the HVA ranges covered by the MMU notifiers are
> > > not acting on encrypted pages, we are ignoring MMU invalidation
> > > notifiers for SNP guests as part of the SNP host patches being posted
> > > upstream and instead relying on gmem own invalidation stuff to clean
> > > them up on a per-folio basis.
> > >
> > > Thanks,
> > > Ashish
> >
> > oh, I have no question about that. This series only applies to
> > SEV/SEV-ES type of VMs.
> >
> > For SNP + guest_memfd, I don't see the implementation details, but I
> > doubt you can ignore mmu_notifiers if the request does cover some
> > encrypted memory in error cases or corner cases. Does the SNP enforce
> > the usage of guest_memfd? How do we prevent exceptional cases? I am
> > sure you guys already figured out the answers, so I don't plan to dig
> > deeper until SNP host pages are accepted.
>
> KVM will not allow SNP guests to map VMA-based memory as encrypted/private, full
> stop.  Any invalidations initiated by mmu_notifiers will therefore apply only to
> shared memory.

Remind me. If I (as a SEV-SNP guest) flip the C-bit in my own x86 page
table and write to some of the pages, am I generating encrypted dirty
cache lines? I understand that the RMP table may say, hey it is
"shared" but that's ok since I just don't need to pvalidate them,
right?

>
> That approach doesn't work for SEV/SEV-ES because KVM can't prevent the guest
> from accessing memory as encrypted, i.e. KVM needs the #NPF due to RMP violation
> to intercept attempts to convert a GFN from shared to private.


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 22:22           ` Mingwei Zhang
@ 2023-12-01 22:30             ` Sean Christopherson
  2023-12-02  6:21               ` Mingwei Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2023-12-01 22:30 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Ashish Kalra, Jacky Li, Paolo Bonzini, Ovidiu Panait,
	Liam Merwick, David Rientjes, David Kaplan, Peter Gonda, kvm,
	Michael Roth

On Fri, Dec 01, 2023, Mingwei Zhang wrote:
> On Fri, Dec 1, 2023 at 2:13 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Dec 01, 2023, Mingwei Zhang wrote:
> > > On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
> > > > For SNP + gmem, where the HVA ranges covered by the MMU notifiers are
> > > > not acting on encrypted pages, we are ignoring MMU invalidation
> > > > notifiers for SNP guests as part of the SNP host patches being posted
> > > > upstream and instead relying on gmem own invalidation stuff to clean
> > > > them up on a per-folio basis.
> > > >
> > > > Thanks,
> > > > Ashish
> > >
> > > oh, I have no question about that. This series only applies to
> > > SEV/SEV-ES type of VMs.
> > >
> > > For SNP + guest_memfd, I don't see the implementation details, but I
> > > doubt you can ignore mmu_notifiers if the request does cover some
> > > encrypted memory in error cases or corner cases. Does the SNP enforce
> > > the usage of guest_memfd? How do we prevent exceptional cases? I am
> > > sure you guys already figured out the answers, so I don't plan to dig
> > > deeper until SNP host pages are accepted.
> >
> > KVM will not allow SNP guests to map VMA-based memory as encrypted/private, full
> > stop.  Any invalidations initiated by mmu_notifiers will therefore apply only to
> > shared memory.
> 
> Remind me. If I (as a SEV-SNP guest) flip the C-bit in my own x86 page
> table and write to some of the pages, am I generating encrypted dirty
> cache lines?

No.  See Table 15-39. "RMP Memory Access Checks" in the APM (my attempts to copy
it to plain text failed miserably).

For accesses with effective C-bit == 0, the page must be Hypervisor-Owned.  For
effective C-bit == 1, the page must be fully assigned to the guest.  Violation
of those rules generates #VMEXIT.

A missing Validated attribute causes a #VC, but that case has lower priority than
the above checks.


* Re: [RFC PATCH 0/4] KVM: SEV: Limit cache flush operations in sev guest memory reclaim events
  2023-12-01 22:30             ` Sean Christopherson
@ 2023-12-02  6:21               ` Mingwei Zhang
  0 siblings, 0 replies; 17+ messages in thread
From: Mingwei Zhang @ 2023-12-02  6:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ashish Kalra, Jacky Li, Paolo Bonzini, Ovidiu Panait,
	Liam Merwick, David Rientjes, David Kaplan, Peter Gonda, kvm,
	Michael Roth

On Fri, Dec 01, 2023, Sean Christopherson wrote:
> On Fri, Dec 01, 2023, Mingwei Zhang wrote:
> > On Fri, Dec 1, 2023 at 2:13 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Fri, Dec 01, 2023, Mingwei Zhang wrote:
> > > > On Fri, Dec 1, 2023 at 1:30 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
> > > > > For SNP + gmem, where the HVA ranges covered by the MMU notifiers are
> > > > > not acting on encrypted pages, we are ignoring MMU invalidation
> > > > > notifiers for SNP guests as part of the SNP host patches being posted
> > > > > upstream and instead relying on gmem own invalidation stuff to clean
> > > > > them up on a per-folio basis.
> > > > >
> > > > > Thanks,
> > > > > Ashish
> > > >
> > > > oh, I have no question about that. This series only applies to
> > > > SEV/SEV-ES type of VMs.
> > > >
> > > > For SNP + guest_memfd, I don't see the implementation details, but I
> > > > doubt you can ignore mmu_notifiers if the request does cover some
> > > > encrypted memory in error cases or corner cases. Does the SNP enforce
> > > > the usage of guest_memfd? How do we prevent exceptional cases? I am
> > > > sure you guys already figured out the answers, so I don't plan to dig
> > > > deeper until SNP host pages are accepted.
> > >
> > > KVM will not allow SNP guests to map VMA-based memory as encrypted/private, full
> > > stop.  Any invalidations initiated by mmu_notifiers will therefore apply only to
> > > shared memory.
> > 
> > Remind me. If I (as a SEV-SNP guest) flip the C-bit in my own x86 page
> > table and write to some of the pages, am I generating encrypted dirty
> > cache lines?
> 
> No.  See Table 15-39. "RMP Memory Access Checks" in the APM (my attempts to copy
> it to plain test failed miserably).  
> 
> For accesses with effective C-bit == 0, the page must be Hypervisor-Owned.  For
> effective C-bit == 1, the page must be fully assigned to the guest.  Violation
> of those rules generates #VMEXIT.
> 
> A missing Validated attribute causes a #VC, but that case has lower priority than
> the about checks.

Thank you. The RMP check seems to be mandatory on all memory accesses when
SNP is active. Hope that property remains an invariant regardless of any
future optimizations.

Thanks.
-Mingwei

