public inbox for kvm@vger.kernel.org
* [PATCH v2 0/4] KVM: SVM: Add Page Modification Logging (PML) support
@ 2025-09-15  8:59 Nikunj A Dadhania
  2025-09-15  8:59 ` [PATCH v2 1/4] KVM: x86: Carve out PML flush routine Nikunj A Dadhania
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Nikunj A Dadhania @ 2025-09-15  8:59 UTC (permalink / raw)
  To: seanjc, pbonzini
  Cc: kvm, thomas.lendacky, santosh.shukla, bp, joao.m.martins, nikunj,
	kai.huang

This series implements Page Modification Logging (PML) for SVM guests,
bringing hardware-assisted dirty logging support. PML tracks
guest-modified memory pages, enabling the hypervisor to identify which
pages in a guest's memory have been modified since the last checkpoint or
during live migration.

The PML feature uses two new VMCB fields (PML_ADDR and PML_INDEX) and
generates a VMEXIT when the 4KB log buffer becomes full.

Patch breakdown:
1. Refactor existing VMX PML code to be shared between VMX and SVM
2. Prepare to share the PML page in VMX and SVM
3. Add AMD SVM PML CPUID
4. Implement SVM PML support using the shared infrastructure

The feature is enabled by default when hardware support is detected and
can be disabled via the 'pml' module parameter.
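
For example, assuming the usual kvm_amd module name, disabling it looks
like:

  # disable PML at module load time
  modprobe kvm_amd pml=0

or, equivalently, 'kvm_amd.pml=0' on the kernel command line.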

Changelog:

v2:
* Rebased on latest kvm/next
* Added patch to move pml_pg field from struct vcpu_vmx to struct kvm_vcpu_arch
  to share the PML page. (Kai Huang)
* Dropped the SNP safe allocation optimization patch, will submit it separately.
* Updated the commit message to explicitly mention that AMD PML follows VMX
  behavior. (Kai Huang)
* Updated SNP erratum comment to include PML buffer alongside VMCB, VMSA, and
  AVIC pages. (Kai Huang)

RFC: https://lore.kernel.org/kvm/20250825152009.3512-1-nikunj@amd.com/


Nikunj A Dadhania (4):
  KVM: x86: Carve out PML flush routine
  KVM: x86: Move PML page to common vcpu arch structure
  x86/cpufeatures: Add Page modification logging
  KVM: SVM: Add Page modification logging support

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 +
 arch/x86/include/asm/svm.h         |  6 +-
 arch/x86/include/uapi/asm/svm.h    |  2 +
 arch/x86/kernel/cpu/scattered.c    |  1 +
 arch/x86/kvm/svm/sev.c             |  2 +-
 arch/x86/kvm/svm/svm.c             | 98 +++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h             |  4 ++
 arch/x86/kvm/vmx/vmx.c             | 48 ++++-----------
 arch/x86/kvm/vmx/vmx.h             |  7 ---
 arch/x86/kvm/x86.c                 | 31 ++++++++++
 arch/x86/kvm/x86.h                 |  7 +++
 12 files changed, 162 insertions(+), 47 deletions(-)


base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.48.1



* [PATCH v2 1/4] KVM: x86: Carve out PML flush routine
  2025-09-15  8:59 [PATCH v2 0/4] KVM: SVM: Add Page Modification Logging (PML) support Nikunj A Dadhania
@ 2025-09-15  8:59 ` Nikunj A Dadhania
  2025-09-15  8:59 ` [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure Nikunj A Dadhania
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Nikunj A Dadhania @ 2025-09-15  8:59 UTC (permalink / raw)
  To: seanjc, pbonzini
  Cc: kvm, thomas.lendacky, santosh.shukla, bp, joao.m.martins, nikunj,
	kai.huang

Move the PML (Page Modification Logging) buffer flushing logic from
VMX-specific code to common x86 KVM code to enable reuse by SVM and avoid
code duplication.

The AMD SVM PML implementation shares the same behavior as VMX PML:
 1) The PML buffer is a 4K page with 512 entries
 2) Hardware records dirty GPAs in reverse order (from index 511 to 0)
 3) Hardware clears bits 11:0 when recording GPAs

The PML constants (PML_LOG_NR_ENTRIES and PML_HEAD_INDEX) are moved from
vmx.h to x86.h to make them available to both VMX and SVM.

No functional change intended for VMX, except toning down the WARN_ON()
to WARN_ON_ONCE() for the page alignment check. If hardware exhibits this
behavior once, it's likely to occur repeatedly, so WARN_ON_ONCE() avoids
log flooding while still capturing the unexpected condition.

The refactoring prepares for SVM to leverage the same PML flushing
implementation.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
---
 arch/x86/kvm/vmx/vmx.c | 26 ++------------------------
 arch/x86/kvm/vmx/vmx.h |  5 -----
 arch/x86/kvm/x86.c     | 31 +++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     |  7 +++++++
 4 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index aa157fe5b7b3..a0955155d7ca 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6107,37 +6107,15 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
 static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	u16 pml_idx, pml_tail_index;
-	u64 *pml_buf;
-	int i;
+	u16 pml_idx;
 
 	pml_idx = vmcs_read16(GUEST_PML_INDEX);
 
 	/* Do nothing if PML buffer is empty */
 	if (pml_idx == PML_HEAD_INDEX)
 		return;
-	/*
-	 * PML index always points to the next available PML buffer entity
-	 * unless PML log has just overflowed.
-	 */
-	pml_tail_index = (pml_idx >= PML_LOG_NR_ENTRIES) ? 0 : pml_idx + 1;
 
-	/*
-	 * PML log is written backwards: the CPU first writes the entry 511
-	 * then the entry 510, and so on.
-	 *
-	 * Read the entries in the same order they were written, to ensure that
-	 * the dirty ring is filled in the same order the CPU wrote them.
-	 */
-	pml_buf = page_address(vmx->pml_pg);
-
-	for (i = PML_HEAD_INDEX; i >= pml_tail_index; i--) {
-		u64 gpa;
-
-		gpa = pml_buf[i];
-		WARN_ON(gpa & (PAGE_SIZE - 1));
-		kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
-	}
+	kvm_flush_pml_buffer(vcpu, vmx->pml_pg, pml_idx);
 
 	/* reset PML index */
 	vmcs_write16(GUEST_PML_INDEX, PML_HEAD_INDEX);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d3389baf3ab3..4494c253727f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -269,11 +269,6 @@ struct vcpu_vmx {
 	unsigned int ple_window;
 	bool ple_window_dirty;
 
-	/* Support for PML */
-#define PML_LOG_NR_ENTRIES	512
-	/* PML is written backwards: this is the first entry written by the CPU */
-#define PML_HEAD_INDEX		(PML_LOG_NR_ENTRIES-1)
-
 	struct page *pml_pg;
 
 	/* apic deadline value in host tsc */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 33fba801b205..123ebe7be184 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6417,6 +6417,37 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 		kvm_vcpu_kick(vcpu);
 }
 
+void kvm_flush_pml_buffer(struct kvm_vcpu *vcpu, struct page *pml_page, u16 pml_idx)
+{
+	u16 pml_tail_index;
+	u64 *pml_buf;
+	int i;
+
+	/*
+	 * PML index always points to the next available PML buffer entity
+	 * unless PML log has just overflowed.
+	 */
+	pml_tail_index = (pml_idx >= PML_LOG_NR_ENTRIES) ? 0 : pml_idx + 1;
+
+	/*
+	 * PML log is written backwards: the CPU first writes the entry 511
+	 * then the entry 510, and so on.
+	 *
+	 * Read the entries in the same order they were written, to ensure that
+	 * the dirty ring is filled in the same order the CPU wrote them.
+	 */
+	pml_buf = page_address(pml_page);
+
+	for (i = PML_HEAD_INDEX; i >= pml_tail_index; i--) {
+		u64 gpa;
+
+		gpa = pml_buf[i];
+		WARN_ON_ONCE(gpa & (PAGE_SIZE - 1));
+		kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_flush_pml_buffer);
+
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index bcfd9b719ada..23c188c0a24b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -699,4 +699,11 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
+/* Support for PML */
+#define PML_LOG_NR_ENTRIES	512
+/* PML is written backwards: this is the first entry written by the CPU */
+#define PML_HEAD_INDEX		(PML_LOG_NR_ENTRIES-1)
+
+void kvm_flush_pml_buffer(struct kvm_vcpu *vcpu, struct page *pml_pg, u16 pml_idx);
+
 #endif
-- 
2.48.1



* [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure
  2025-09-15  8:59 [PATCH v2 0/4] KVM: SVM: Add Page Modification Logging (PML) support Nikunj A Dadhania
  2025-09-15  8:59 ` [PATCH v2 1/4] KVM: x86: Carve out PML flush routine Nikunj A Dadhania
@ 2025-09-15  8:59 ` Nikunj A Dadhania
  2025-09-16 10:27   ` Huang, Kai
  2025-09-15  8:59 ` [PATCH v2 3/4] x86/cpufeatures: Add Page modification logging Nikunj A Dadhania
  2025-09-15  8:59 ` [PATCH v2 4/4] KVM: SVM: Add Page modification logging support Nikunj A Dadhania
  3 siblings, 1 reply; 13+ messages in thread
From: Nikunj A Dadhania @ 2025-09-15  8:59 UTC (permalink / raw)
  To: seanjc, pbonzini
  Cc: kvm, thomas.lendacky, santosh.shukla, bp, joao.m.martins, nikunj,
	kai.huang

Move the PML page from VMX-specific vcpu_vmx structure to the common
kvm_vcpu_arch structure to share it between VMX and SVM implementations.

Update all VMX references accordingly, and simplify the
kvm_flush_pml_buffer() interface by removing the page parameter since it
can now access the page directly from the vcpu structure.

No functional change, restructuring to prepare for SVM PML support.

Suggested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 24 ++++++++++++------------
 arch/x86/kvm/vmx/vmx.h          |  2 --
 arch/x86/kvm/x86.c              |  4 ++--
 arch/x86/kvm/x86.h              |  2 +-
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c56cc54d682a..62a7d519fbaf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -857,6 +857,8 @@ struct kvm_vcpu_arch {
 	 */
 	struct kvm_mmu_memory_cache mmu_external_spt_cache;
 
+	struct page *pml_page;
+
 	/*
 	 * QEMU userspace and the guest each have their own FPU state.
 	 * In vcpu_run, we switch between the user and guest FPU contexts.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a0955155d7ca..9520e11b08d0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4612,7 +4612,8 @@ int vmx_vcpu_precreate(struct kvm *kvm)
 
 static void init_vmcs(struct vcpu_vmx *vmx)
 {
-	struct kvm *kvm = vmx->vcpu.kvm;
+	struct kvm_vcpu *vcpu = &vmx->vcpu;
+	struct kvm *kvm = vcpu->kvm;
 	struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
 
 	if (nested)
@@ -4703,7 +4704,7 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 		vmcs_write64(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
 
 	if (enable_pml) {
-		vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
+		vmcs_write64(PML_ADDRESS, page_to_phys(vcpu->arch.pml_page));
 		vmcs_write16(GUEST_PML_INDEX, PML_HEAD_INDEX);
 	}
 
@@ -6096,17 +6097,16 @@ void vmx_get_entry_info(struct kvm_vcpu *vcpu, u32 *intr_info, u32 *error_code)
 		*error_code = 0;
 }
 
-static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
+static void vmx_destroy_pml_buffer(struct kvm_vcpu *vcpu)
 {
-	if (vmx->pml_pg) {
-		__free_page(vmx->pml_pg);
-		vmx->pml_pg = NULL;
+	if (vcpu->arch.pml_page) {
+		__free_page(vcpu->arch.pml_page);
+		vcpu->arch.pml_page = NULL;
 	}
 }
 
 static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u16 pml_idx;
 
 	pml_idx = vmcs_read16(GUEST_PML_INDEX);
@@ -6115,7 +6115,7 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 	if (pml_idx == PML_HEAD_INDEX)
 		return;
 
-	kvm_flush_pml_buffer(vcpu, vmx->pml_pg, pml_idx);
+	kvm_flush_pml_buffer(vcpu, pml_idx);
 
 	/* reset PML index */
 	vmcs_write16(GUEST_PML_INDEX, PML_HEAD_INDEX);
@@ -7388,7 +7388,7 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	if (enable_pml)
-		vmx_destroy_pml_buffer(vmx);
+		vmx_destroy_pml_buffer(vcpu);
 	free_vpid(vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
@@ -7417,8 +7417,8 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 	 * for the guest), etc.
 	 */
 	if (enable_pml) {
-		vmx->pml_pg = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-		if (!vmx->pml_pg)
+		vcpu->arch.pml_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+		if (!vcpu->arch.pml_page)
 			goto free_vpid;
 	}
 
@@ -7489,7 +7489,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 free_vmcs:
 	free_loaded_vmcs(vmx->loaded_vmcs);
 free_pml:
-	vmx_destroy_pml_buffer(vmx);
+	vmx_destroy_pml_buffer(vcpu);
 free_vpid:
 	free_vpid(vmx->vpid);
 	return err;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 4494c253727f..6fafb6228c17 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -269,8 +269,6 @@ struct vcpu_vmx {
 	unsigned int ple_window;
 	bool ple_window_dirty;
 
-	struct page *pml_pg;
-
 	/* apic deadline value in host tsc */
 	u64 hv_deadline_tsc;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 123ebe7be184..afa7f8b46416 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6417,7 +6417,7 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 		kvm_vcpu_kick(vcpu);
 }
 
-void kvm_flush_pml_buffer(struct kvm_vcpu *vcpu, struct page *pml_page, u16 pml_idx)
+void kvm_flush_pml_buffer(struct kvm_vcpu *vcpu, u16 pml_idx)
 {
 	u16 pml_tail_index;
 	u64 *pml_buf;
@@ -6436,7 +6436,7 @@ void kvm_flush_pml_buffer(struct kvm_vcpu *vcpu, struct page *pml_page, u16 pml_
 	 * Read the entries in the same order they were written, to ensure that
 	 * the dirty ring is filled in the same order the CPU wrote them.
 	 */
-	pml_buf = page_address(pml_page);
+	pml_buf = page_address(vcpu->arch.pml_page);
 
 	for (i = PML_HEAD_INDEX; i >= pml_tail_index; i--) {
 		u64 gpa;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 23c188c0a24b..92016081a7e7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -704,6 +704,6 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 /* PML is written backwards: this is the first entry written by the CPU */
 #define PML_HEAD_INDEX		(PML_LOG_NR_ENTRIES-1)
 
-void kvm_flush_pml_buffer(struct kvm_vcpu *vcpu, struct page *pml_pg, u16 pml_idx);
+void kvm_flush_pml_buffer(struct kvm_vcpu *vcpu, u16 pml_idx);
 
 #endif
-- 
2.48.1



* [PATCH v2 3/4] x86/cpufeatures: Add Page modification logging
  2025-09-15  8:59 [PATCH v2 0/4] KVM: SVM: Add Page Modification Logging (PML) support Nikunj A Dadhania
  2025-09-15  8:59 ` [PATCH v2 1/4] KVM: x86: Carve out PML flush routine Nikunj A Dadhania
  2025-09-15  8:59 ` [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure Nikunj A Dadhania
@ 2025-09-15  8:59 ` Nikunj A Dadhania
  2025-09-15  8:59 ` [PATCH v2 4/4] KVM: SVM: Add Page modification logging support Nikunj A Dadhania
  3 siblings, 0 replies; 13+ messages in thread
From: Nikunj A Dadhania @ 2025-09-15  8:59 UTC (permalink / raw)
  To: seanjc, pbonzini
  Cc: kvm, thomas.lendacky, santosh.shukla, bp, joao.m.martins, nikunj,
	kai.huang

Page modification logging (PML) is a hardware feature designed to track
guest-modified memory pages. PML enables the hypervisor to identify which
pages in a guest's memory have been changed since the last checkpoint or
during live migration.

The PML feature is advertised via CPUID leaf 0x8000000A, ECX bit 4.
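
For illustration, a minimal userspace detection sketch (not part of this
patch; the leaf and bit come from the sentence above, and __get_cpuid() is
the compiler's <cpuid.h> helper):

	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CPUID Fn8000_000A, ECX bit 4 advertises PML */
		if (__get_cpuid(0x8000000A, &eax, &ebx, &ecx, &edx))
			printf("AMD PML %ssupported\n",
			       (ecx & (1u << 4)) ? "" : "not ");
		return 0;
	}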

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/scattered.c    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 06fc0479a23f..60e95f2bea06 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -228,6 +228,7 @@
 #define X86_FEATURE_PVUNLOCK		( 8*32+20) /* PV unlock function */
 #define X86_FEATURE_VCPUPREEMPT		( 8*32+21) /* PV vcpu_is_preempted function */
 #define X86_FEATURE_TDX_GUEST		( 8*32+22) /* "tdx_guest" Intel Trust Domain Extensions Guest */
+#define X86_FEATURE_PML			( 8*32+23) /* AMD Page Modification logging */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* "fsgsbase" RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 6b868afb26c3..aeca6f55d2fa 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -48,6 +48,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PROC_FEEDBACK,		CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_AMD_FAST_CPPC,		CPUID_EDX, 15, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,			CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_PML,			CPUID_ECX,  4, 0x8000000A, 0 },
 	{ X86_FEATURE_COHERENCY_SFW_NO,		CPUID_EBX, 31, 0x8000001f, 0 },
 	{ X86_FEATURE_SMBA,			CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_BMEC,			CPUID_EBX,  3, 0x80000020, 0 },
-- 
2.48.1



* [PATCH v2 4/4] KVM: SVM: Add Page modification logging support
  2025-09-15  8:59 [PATCH v2 0/4] KVM: SVM: Add Page Modification Logging (PML) support Nikunj A Dadhania
                   ` (2 preceding siblings ...)
  2025-09-15  8:59 ` [PATCH v2 3/4] x86/cpufeatures: Add Page modification logging Nikunj A Dadhania
@ 2025-09-15  8:59 ` Nikunj A Dadhania
  2025-09-15 12:35   ` Huang, Kai
  3 siblings, 1 reply; 13+ messages in thread
From: Nikunj A Dadhania @ 2025-09-15  8:59 UTC (permalink / raw)
  To: seanjc, pbonzini
  Cc: kvm, thomas.lendacky, santosh.shukla, bp, joao.m.martins, nikunj,
	kai.huang

Currently, dirty logging relies on write protecting guest memory and
marking dirty GFNs during subsequent write faults. This method works but
incurs overhead due to additional write faults for each dirty GFN.

Implement support for the Page Modification Logging (PML) feature, a
hardware-assisted method for efficient dirty logging. PML automatically
logs dirty GPA[51:12] to a 4K buffer when the CPU sets NPT D-bits. Two new
VMCB fields are utilized: PML_ADDR and PML_INDEX. The PML_INDEX is
initialized to 511 (8 bytes per GPA entry), and the CPU decreases the
PML_INDEX after logging each GPA. When the PML buffer is full, a
VMEXIT(PML_FULL) with exit code 0x407 is generated.
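
To make the index bookkeeping concrete, below is an illustrative-only
consumer sketch; the series itself reuses the shared kvm_flush_pml_buffer()
from patch 1, and mark_dirty() is a hypothetical stand-in for KVM's dirty
tracking. For instance, after three GPAs are logged the index reads 508 and
entries 511..509 are valid.

	#include <stdint.h>

	#define PML_LOG_NR_ENTRIES	512
	#define PML_HEAD_INDEX		(PML_LOG_NR_ENTRIES - 1)

	extern void mark_dirty(uint64_t gfn);	/* hypothetical stand-in */

	static void consume_pml(const uint64_t *pml_buf, uint16_t pml_idx)
	{
		/* An index >= 512 means the log overflowed: all entries are valid. */
		uint16_t tail = (pml_idx >= PML_LOG_NR_ENTRIES) ? 0 : pml_idx + 1;
		int i;

		/* Entry 511 is written first, so walk downwards in write order. */
		for (i = PML_HEAD_INDEX; i >= tail; i--)
			mark_dirty(pml_buf[i] >> 12);	/* entries hold GPA[51:12] */
	}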

PML is enabled by default when supported and can be disabled via the 'pml'
module parameter.

Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
---
 arch/x86/include/asm/svm.h      |  6 +-
 arch/x86/include/uapi/asm/svm.h |  2 +
 arch/x86/kvm/svm/sev.c          |  2 +-
 arch/x86/kvm/svm/svm.c          | 98 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h          |  4 ++
 5 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index ffc27f676243..9fbada95afd5 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -165,7 +165,10 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u8 reserved_9[22];
 	u64 allowed_sev_features;	/* Offset 0x138 */
 	u64 guest_sev_features;		/* Offset 0x140 */
-	u8 reserved_10[664];
+	u8 reserved_10[128];
+	u64 pml_addr;			/* Offset 0x1c8 */
+	u16 pml_index;			/* Offset 0x1d0 */
+	u8 reserved_11[526];
 	/*
 	 * Offset 0x3e0, 32 bytes reserved
 	 * for use by hypervisor/software.
@@ -239,6 +242,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_NESTED_CTL_NP_ENABLE	BIT(0)
 #define SVM_NESTED_CTL_SEV_ENABLE	BIT(1)
 #define SVM_NESTED_CTL_SEV_ES_ENABLE	BIT(2)
+#define SVM_NESTED_CTL_PML_ENABLE	BIT(11)
 
 
 #define SVM_TSC_RATIO_RSVD	0xffffff0000000000ULL
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 9c640a521a67..f329dca167de 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -101,6 +101,7 @@
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
 #define SVM_EXIT_VMGEXIT       0x403
+#define SVM_EXIT_PML_FULL	0x407
 
 /* SEV-ES software-defined VMGEXIT events */
 #define SVM_VMGEXIT_MMIO_READ			0x80000001
@@ -232,6 +233,7 @@
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
 	{ SVM_EXIT_VMGEXIT,		"vmgexit" }, \
+	{ SVM_EXIT_PML_FULL,		"pml_full" }, \
 	{ SVM_VMGEXIT_MMIO_READ,	"vmgexit_mmio_read" }, \
 	{ SVM_VMGEXIT_MMIO_WRITE,	"vmgexit_mmio_write" }, \
 	{ SVM_VMGEXIT_NMI_COMPLETE,	"vmgexit_nmi_complete" }, \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5bac4d20aec0..b179a0a2581a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4669,7 +4669,7 @@ struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
 	 * Allocate an SNP-safe page to workaround the SNP erratum where
 	 * the CPU will incorrectly signal an RMP violation #PF if a
 	 * hugepage (2MB or 1GB) collides with the RMP entry of a
-	 * 2MB-aligned VMCB, VMSA, or AVIC backing page.
+	 * 2MB-aligned VMCB, VMSA, PML or AVIC backing page.
 	 *
 	 * Allocate one extra page, choose a page which is not
 	 * 2MB-aligned, and free the other.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8a66e2e985a4..a44fd68e3e23 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -178,6 +178,9 @@ module_param(intercept_smi, bool, 0444);
 bool vnmi = true;
 module_param(vnmi, bool, 0444);
 
+bool pml = true;
+module_param(pml, bool, 0444);
+
 static bool svm_gp_erratum_intercept = true;
 
 static u8 rsm_ins_bytes[] = "\x0f\xaa";
@@ -1220,6 +1223,16 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (vcpu->kvm->arch.bus_lock_detection_enabled)
 		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
 
+	if (pml) {
+		/*
+		 * Populate the PML page address and index here; PML is enabled
+		 * when dirty logging is enabled on the memslot through
+		 * svm_update_cpu_dirty_logging().
+		 */
+		control->pml_addr = (u64)__sme_set(page_to_phys(vcpu->arch.pml_page));
+		control->pml_index = PML_HEAD_INDEX;
+	}
+
 	if (sev_guest(vcpu->kvm))
 		sev_init_vmcb(svm);
 
@@ -1296,14 +1309,20 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 			goto error_free_vmcb_page;
 	}
 
+	if (pml) {
+		vcpu->arch.pml_page = snp_safe_alloc_page();
+		if (!vcpu->arch.pml_page)
+			goto error_free_vmsa_page;
+	}
+
 	err = avic_init_vcpu(svm);
 	if (err)
-		goto error_free_vmsa_page;
+		goto error_free_pml_page;
 
 	svm->msrpm = svm_vcpu_alloc_msrpm();
 	if (!svm->msrpm) {
 		err = -ENOMEM;
-		goto error_free_vmsa_page;
+		goto error_free_pml_page;
 	}
 
 	svm->x2avic_msrs_intercepted = true;
@@ -1319,6 +1338,9 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 
 	return 0;
 
+error_free_pml_page:
+	if (vcpu->arch.pml_page)
+		__free_page(vcpu->arch.pml_page);
 error_free_vmsa_page:
 	if (vmsa_page)
 		__free_page(vmsa_page);
@@ -1339,6 +1361,9 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
 
 	sev_free_vcpu(vcpu);
 
+	if (pml)
+		__free_page(vcpu->arch.pml_page);
+
 	__free_page(__sme_pa_to_page(svm->vmcb01.pa));
 	svm_vcpu_free_msrpm(svm->msrpm);
 }
@@ -3206,6 +3231,53 @@ static int bus_lock_exit(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void svm_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (WARN_ON_ONCE(!pml))
+		return;
+
+	if (is_guest_mode(vcpu))
+		return;
+
+	/*
+	 * Note, nr_memslots_dirty_logging can be changed concurrently with this
+	 * code, but in that case another update request will be made and so the
+	 * guest will never run with a stale PML value.
+	 */
+	if (atomic_read(&vcpu->kvm->nr_memslots_dirty_logging))
+		svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_PML_ENABLE;
+	else
+		svm->vmcb->control.nested_ctl &= ~SVM_NESTED_CTL_PML_ENABLE;
+}
+
+static void svm_flush_pml_buffer(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control = &svm->vmcb->control;
+
+	/* Do nothing if PML buffer is empty */
+	if (control->pml_index == PML_HEAD_INDEX)
+		return;
+
+	kvm_flush_pml_buffer(vcpu, control->pml_index);
+
+	/* Reset the PML index */
+	control->pml_index = PML_HEAD_INDEX;
+}
+
+static int pml_full_interception(struct kvm_vcpu *vcpu)
+{
+	trace_kvm_pml_full(vcpu->vcpu_id);
+
+	/*
+	 * PML buffer is already flushed at the beginning of svm_handle_exit().
+	 * Nothing to do here.
+	 */
+	return 1;
+}
+
 static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3282,6 +3354,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 #ifdef CONFIG_KVM_AMD_SEV
 	[SVM_EXIT_VMGEXIT]			= sev_handle_vmgexit,
 #endif
+	[SVM_EXIT_PML_FULL]			= pml_full_interception,
 };
 
 static void dump_vmcb(struct kvm_vcpu *vcpu)
@@ -3330,8 +3403,10 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%016llx\n", "exit_info2:", control->exit_info_2);
 	pr_err("%-20s%08x\n", "exit_int_info:", control->exit_int_info);
 	pr_err("%-20s%08x\n", "exit_int_info_err:", control->exit_int_info_err);
-	pr_err("%-20s%lld\n", "nested_ctl:", control->nested_ctl);
+	pr_err("%-20s%llx\n", "nested_ctl:", control->nested_ctl);
 	pr_err("%-20s%016llx\n", "nested_cr3:", control->nested_cr3);
+	pr_err("%-20s%016llx\n", "pml_addr:", control->pml_addr);
+	pr_err("%-20s%04x\n", "pml_index:", control->pml_index);
 	pr_err("%-20s%016llx\n", "avic_vapic_bar:", control->avic_vapic_bar);
 	pr_err("%-20s%016llx\n", "ghcb:", control->ghcb_gpa);
 	pr_err("%-20s%08x\n", "event_inj:", control->event_inj);
@@ -3562,6 +3637,14 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	struct kvm_run *kvm_run = vcpu->run;
 	u32 exit_code = svm->vmcb->control.exit_code;
 
+	/*
+	 * Opportunistically flush the PML buffer on VM exit. This keeps the
+	 * dirty bitmap current by processing logged GPAs rather than waiting for
+	 * PML_FULL exit.
+	 */
+	if (pml && !is_guest_mode(vcpu))
+		svm_flush_pml_buffer(vcpu);
+
 	/* SEV-ES guests must use the CR write traps to track CR registers. */
 	if (!sev_es_guest(vcpu->kvm)) {
 		if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
@@ -5028,6 +5112,9 @@ static int svm_vm_init(struct kvm *kvm)
 			return ret;
 	}
 
+	if (pml)
+		kvm->arch.cpu_dirty_log_size = PML_LOG_NR_ENTRIES;
+
 	svm_srso_vm_init();
 	return 0;
 }
@@ -5181,6 +5268,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.gmem_prepare = sev_gmem_prepare,
 	.gmem_invalidate = sev_gmem_invalidate,
 	.gmem_max_mapping_level = sev_gmem_max_mapping_level,
+
+	.update_cpu_dirty_logging = svm_update_cpu_dirty_logging,
 };
 
 /*
@@ -5382,6 +5471,10 @@ static __init int svm_hardware_setup(void)
 
 	nrips = nrips && boot_cpu_has(X86_FEATURE_NRIPS);
 
+	pml = pml && npt_enabled && cpu_feature_enabled(X86_FEATURE_PML);
+	if (pml)
+		pr_info("Page modification logging supported\n");
+
 	if (lbrv) {
 		if (!boot_cpu_has(X86_FEATURE_LBRV))
 			lbrv = false;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 70df7c6413cf..7e6ee4e80021 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -335,6 +335,8 @@ struct vcpu_svm {
 
 	/* Guest GIF value, used when vGIF is not enabled */
 	bool guest_gif;
+
+	struct page *pml_page;
 };
 
 struct svm_cpu_data {
@@ -717,6 +719,8 @@ static inline void svm_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
 	svm_set_intercept_for_msr(vcpu, msr, type, true);
 }
 
+void svm_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+
 /* nested.c */
 
 #define NESTED_EXIT_HOST	0	/* Exit handled on host level */
-- 
2.48.1



* Re: [PATCH v2 4/4] KVM: SVM: Add Page modification logging support
  2025-09-15  8:59 ` [PATCH v2 4/4] KVM: SVM: Add Page modification logging support Nikunj A Dadhania
@ 2025-09-15 12:35   ` Huang, Kai
  2025-09-18  6:12     ` Nikunj A Dadhania
  0 siblings, 1 reply; 13+ messages in thread
From: Huang, Kai @ 2025-09-15 12:35 UTC (permalink / raw)
  To: pbonzini@redhat.com, seanjc@google.com, nikunj@amd.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de

On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
> Currently, dirty logging relies on write protecting guest memory and
> marking dirty GFNs during subsequent write faults. 
> 

Better to point out "On AMD platforms only".

> This method works but
> incurs overhead due to additional write faults for each dirty GFN.
> 
> Implement support for the Page Modification Logging (PML) feature, a
> hardware-assisted method for efficient dirty logging. PML automatically
> logs dirty GPA[51:12] to a 4K buffer when the CPU sets NPT D-bits. Two new
> VMCB fields are utilized: PML_ADDR and PML_INDEX. The PML_INDEX is
> initialized to 511 (8 bytes per GPA entry), and the CPU decreases the
> PML_INDEX after logging each GPA. When the PML buffer is full, a
> VMEXIT(PML_FULL) with exit code 0x407 is generated.
> 
> PML is enabled by default when supported and can be disabled via the 'pml'
> module parameter.

This changelog mentions nothing about the interaction between PML and
nested virtualization.

On VMX, PML is emulated for L2 (for nested EPT) but is never enabled in
hardware when CPU runs in L2, so:

1) PML is exposed to L1 (for nested EPT).
2) PML needs to be turned off when CPU runs in L2 otherwise L2's GPA 
   could be logged, and turned on again after CPU leaves L2 (and restore
   PML buffer/index of VMCS01).

It doesn't seem this series supports emulating PML for L2 (for nested
NPT), because AMD's PML is also enumerated via a CPUID bit (while VMX
doesn't) and it's not exposed to guest, so we don't need to handle nested
PML_FULL VMEXIT etc.

This is fine I think, and we can support this in the future if needed.

But 2) is also needed anyway for AMD's PML AFAICT, regardless of whether
1) is supported or not?

If so, could we add some text to clarify all of these in the changelog?


[...]

>  
> +void svm_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +
> +	if (WARN_ON_ONCE(!pml))
> +		return;
> +
> +	if (is_guest_mode(vcpu))
> +		return;

VMX has a vmx->nested.update_vmcs01_cpu_dirty_logging boolean.  It's set
here to indicate PML enabling is not updated for L2 here, but later when
switching to run in L1, the PML enabling needs to be updated.

Shouldn't SVM have similar handling?
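
An analogous SVM-side sketch might look like the below (hypothetical; the
update_vmcb01_cpu_dirty_logging name is assumed, mirroring the VMX flag):

	if (is_guest_mode(vcpu)) {
		/* Defer the VMCB01 update until we switch back to L1. */
		svm->nested.update_vmcb01_cpu_dirty_logging = true;
		return;
	}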

> +
> +	/*
> +	 * Note, nr_memslots_dirty_logging can be changed concurrently with this
> +	 * code, but in that case another update request will be made and so the
> +	 * guest will never run with a stale PML value.
> +	 */
> +	if (atomic_read(&vcpu->kvm->nr_memslots_dirty_logging))
> +		svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_PML_ENABLE;
> +	else
> +		svm->vmcb->control.nested_ctl &= ~SVM_NESTED_CTL_PML_ENABLE;
> +}
> +
> 
[...]

> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -335,6 +335,8 @@ struct vcpu_svm {
>  
>  	/* Guest GIF value, used when vGIF is not enabled */
>  	bool guest_gif;
> +
> +	struct page *pml_page;
>  };

This seems to be a leftover.


* Re: [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure
  2025-09-15  8:59 ` [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure Nikunj A Dadhania
@ 2025-09-16 10:27   ` Huang, Kai
  2025-09-17  3:31     ` Nikunj A. Dadhania
  0 siblings, 1 reply; 13+ messages in thread
From: Huang, Kai @ 2025-09-16 10:27 UTC (permalink / raw)
  To: pbonzini@redhat.com, seanjc@google.com, nikunj@amd.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de

On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
> Move the PML page from VMX-specific vcpu_vmx structure to the common
> kvm_vcpu_arch structure to share it between VMX and SVM implementations.
> 
> Update all VMX references accordingly, and simplify the
> kvm_flush_pml_buffer() interface by removing the page parameter since it
> can now access the page directly from the vcpu structure.
> 
> No functional change, restructuring to prepare for SVM PML support.
> 
> Suggested-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>

Nit: IMHO it's also better to explain why we only moved the PML buffer
pointer but not the code which allocates/frees the PML buffer:

  Move the PML page to x86 common code only without moving the PML page 
  allocation code, since for AMD the PML buffer must be allocated using
  snp_safe_alloc_page().


* Re: [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure
  2025-09-16 10:27   ` Huang, Kai
@ 2025-09-17  3:31     ` Nikunj A. Dadhania
  2025-09-17 10:13       ` Huang, Kai
  0 siblings, 1 reply; 13+ messages in thread
From: Nikunj A. Dadhania @ 2025-09-17  3:31 UTC (permalink / raw)
  To: Huang, Kai, pbonzini@redhat.com, seanjc@google.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de



On 9/16/2025 3:57 PM, Huang, Kai wrote:
> On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
>> Move the PML page from VMX-specific vcpu_vmx structure to the common
>> kvm_vcpu_arch structure to share it between VMX and SVM implementations.
>>
>> Update all VMX references accordingly, and simplify the
>> kvm_flush_pml_buffer() interface by removing the page parameter since it
>> can now access the page directly from the vcpu structure.
>>
>> No functional change, restructuring to prepare for SVM PML support.
>>
>> Suggested-by: Kai Huang <kai.huang@intel.com>
>> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
> 
> Nit: IMHO it's also better to explain why we only moved the PML buffer
> pointer but not the code which allocates/frees the PML buffer:
> 
>   Move the PML page to x86 common code only without moving the PML page 
>   allocation code, since for AMD the PML buffer must be allocated using
>   snp_safe_alloc_page().

Ack

Regards
Nikunj



* Re: [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure
  2025-09-17  3:31     ` Nikunj A. Dadhania
@ 2025-09-17 10:13       ` Huang, Kai
  2025-09-17 10:17         ` Nikunj A. Dadhania
  0 siblings, 1 reply; 13+ messages in thread
From: Huang, Kai @ 2025-09-17 10:13 UTC (permalink / raw)
  To: pbonzini@redhat.com, seanjc@google.com, nikunj@amd.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de

On Wed, 2025-09-17 at 09:01 +0530, Nikunj A. Dadhania wrote:
> 
> On 9/16/2025 3:57 PM, Huang, Kai wrote:
> > On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
> > > Move the PML page from VMX-specific vcpu_vmx structure to the common
> > > kvm_vcpu_arch structure to share it between VMX and SVM implementations.
> > > 
> > > Update all VMX references accordingly, and simplify the
> > > kvm_flush_pml_buffer() interface by removing the page parameter since it
> > > can now access the page directly from the vcpu structure.
> > > 
> > > No functional change, restructuring to prepare for SVM PML support.
> > > 
> > > Suggested-by: Kai Huang <kai.huang@intel.com>
> > > Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
> > 
> > Nit: IMHO it's also better to explain why we only moved the PML buffer
> > pointer but not the code which allocates/frees the PML buffer:
> > 
> >   Move the PML page to x86 common code only without moving the PML page 
> >   allocation code, since for AMD the PML buffer must be allocated using
> >   snp_safe_alloc_page().
> 
> Ack
> 

Btw, just asking: why not just merge this patch into the first patch?

I don't have a strong preference, but patches 1/2 seem connected (they
both move VMX-specific code to common x86 code for sharing) and could be one.


* Re: [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure
  2025-09-17 10:13       ` Huang, Kai
@ 2025-09-17 10:17         ` Nikunj A. Dadhania
  2025-09-17 10:28           ` Huang, Kai
  0 siblings, 1 reply; 13+ messages in thread
From: Nikunj A. Dadhania @ 2025-09-17 10:17 UTC (permalink / raw)
  To: Huang, Kai, pbonzini@redhat.com, seanjc@google.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de



On 9/17/2025 3:43 PM, Huang, Kai wrote:
> On Wed, 2025-09-17 at 09:01 +0530, Nikunj A. Dadhania wrote:
>>
>> On 9/16/2025 3:57 PM, Huang, Kai wrote:
>>> On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
>>>> Move the PML page from VMX-specific vcpu_vmx structure to the common
>>>> kvm_vcpu_arch structure to share it between VMX and SVM implementations.
>>>>
>>>> Update all VMX references accordingly, and simplify the
>>>> kvm_flush_pml_buffer() interface by removing the page parameter since it
>>>> can now access the page directly from the vcpu structure.
>>>>
>>>> No functional change, restructuring to prepare for SVM PML support.
>>>>
>>>> Suggested-by: Kai Huang <kai.huang@intel.com>
>>>> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
>>>
>>> Nit: IMHO it's also better to explain why we only moved the PML buffer
>>> pointer but not the code which allocates/frees the PML buffer:
>>>
>>>   Move the PML page to x86 common code only without moving the PML page 
>>>   allocation code, since for AMD the PML buffer must be allocated using
>>>   snp_safe_alloc_page().
>>
>> Ack
>>
> 
> Btw, just asking: why not just merge this patch into the first patch?

I just kept it separate since VMX and SVM do not share the page allocation;
if we do not want the churn of moving pml_page to the vcpu structure,
we can drop this patch.

> I don't have a strong preference, but patches 1/2 seem connected (they
> both move VMX-specific code to common x86 code for sharing) and could be one.
Same here, I can squash this with patch 1.

Regards
Nikunj


* Re: [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure
  2025-09-17 10:17         ` Nikunj A. Dadhania
@ 2025-09-17 10:28           ` Huang, Kai
  0 siblings, 0 replies; 13+ messages in thread
From: Huang, Kai @ 2025-09-17 10:28 UTC (permalink / raw)
  To: pbonzini@redhat.com, seanjc@google.com, nikunj@amd.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de

On Wed, 2025-09-17 at 15:47 +0530, Nikunj A. Dadhania wrote:
> 
> On 9/17/2025 3:43 PM, Huang, Kai wrote:
> > On Wed, 2025-09-17 at 09:01 +0530, Nikunj A. Dadhania wrote:
> > > 
> > > On 9/16/2025 3:57 PM, Huang, Kai wrote:
> > > > On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
> > > > > Move the PML page from VMX-specific vcpu_vmx structure to the common
> > > > > kvm_vcpu_arch structure to share it between VMX and SVM implementations.
> > > > > 
> > > > > Update all VMX references accordingly, and simplify the
> > > > > kvm_flush_pml_buffer() interface by removing the page parameter since it
> > > > > can now access the page directly from the vcpu structure.
> > > > > 
> > > > > No functional change, restructuring to prepare for SVM PML support.
> > > > > 
> > > > > Suggested-by: Kai Huang <kai.huang@intel.com>
> > > > > Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
> > > > 
> > > > Nit: IMHO it's also better to explain why we only moved the PML buffer
> > > > pointer but not the code which allocates/frees the PML buffer:
> > > > 
> > > >   Move the PML page to x86 common code only without moving the PML page 
> > > >   allocation code, since for AMD the PML buffer must be allocated using
> > > >   snp_safe_alloc_page().
> > > 
> > > Ack
> > > 
> > 
> > Btw, just asking: why not just merge this patch into the first patch?
> 
> I just kept it separate since VMX and SVM do not share the page allocation;
> if we do not want the churn of moving pml_page to the vcpu structure,
> we can drop this patch.
> 
> > I don't have a strong preference, but patches 1/2 seem connected (they
> > both move VMX-specific code to common x86 code for sharing) and could be one.
> Same here, I can squash this with patch 1.
> 

All right, up to you (and Sean/Paolo) :-)


* Re: [PATCH v2 4/4] KVM: SVM: Add Page modification logging support
  2025-09-15 12:35   ` Huang, Kai
@ 2025-09-18  6:12     ` Nikunj A Dadhania
  2025-09-24  6:26       ` Nikunj A Dadhania
  0 siblings, 1 reply; 13+ messages in thread
From: Nikunj A Dadhania @ 2025-09-18  6:12 UTC (permalink / raw)
  To: Huang, Kai, pbonzini@redhat.com, seanjc@google.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de

"Huang, Kai" <kai.huang@intel.com> writes:

> On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
>> Currently, dirty logging relies on write protecting guest memory and
>> marking dirty GFNs during subsequent write faults. 
>> 
>
> Better to point out "On AMD platforms only".

Subject being "KVM: SVM: ", this seems redundant to me.

>
>> This method works but
>> incurs overhead due to additional write faults for each dirty GFN.
>> 
>> Implement support for the Page Modification Logging (PML) feature, a
>> hardware-assisted method for efficient dirty logging. PML automatically
>> logs dirty GPA[51:12] to a 4K buffer when the CPU sets NPT D-bits. Two new
>> VMCB fields are utilized: PML_ADDR and PML_INDEX. The PML_INDEX is
>> initialized to 511 (8 bytes per GPA entry), and the CPU decreases the
>> PML_INDEX after logging each GPA. When the PML buffer is full, a
>> VMEXIT(PML_FULL) with exit code 0x407 is generated.
>> 
>> PML is enabled by default when supported and can be disabled via the 'pml'
>> module parameter.
>
> This changelog mentions nothing about the interaction between PML and
> nested virtualization.
>
> On VMX, PML is emulated for L2 (for nested EPT) but is never enabled in
> hardware when CPU runs in L2, so:
>
> 1) PML is exposed to L1 (for nested EPT).
> 2) PML needs to be turned off when CPU runs in L2 otherwise L2's GPA 
>    could be logged, and turned on again after CPU leaves L2 (and restore
>    PML buffer/index of VMCS01).

I get your point, and I see that on nested VM entry, PML is set in
the nested_ctl for L2.

I am trying to create this scenario, and couldn't get the L2 GPAs.

>
> It doesn't seem this series supports emulating PML for L2 (for nested
> NPT), because AMD's PML is also enumerated via a CPUID bit (while VMX
> doesn't) and it's not exposed to guest, so we don't need to handle nested
> PML_FULL VMEXIT etc.
>
> This is fine I think, and we can support this in the future if needed.
>
> But 2) is also needed anyway for AMD's PML AFAICT, regardless of whether
> 1) is supported or not?

I see your point; we will need to disable PML for L2.

>
> If so, could we add some text to clarify all of these in the changelog?
>
>
> [...]
>
>>  
>> +void svm_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	if (WARN_ON_ONCE(!pml))
>> +		return;
>> +
>> +	if (is_guest_mode(vcpu))
>> +		return;
>
> VMX has a vmx->nested.update_vmcs01_cpu_dirty_logging boolean.  It's set
> here to indicate PML enabling is not updated for L2 here, but later when
> switching to run in L1, the PML enabling needs to be updated.
>
> Shouldn't SVM have similar handling?

Sure, will get back to you on this.

>
>> +
>> +	/*
>> +	 * Note, nr_memslots_dirty_logging can be changed concurrently with this
>> +	 * code, but in that case another update request will be made and so the
>> +	 * guest will never run with a stale PML value.
>> +	 */
>> +	if (atomic_read(&vcpu->kvm->nr_memslots_dirty_logging))
>> +		svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_PML_ENABLE;
>> +	else
>> +		svm->vmcb->control.nested_ctl &= ~SVM_NESTED_CTL_PML_ENABLE;
>> +}
>> +
>> 
> [...]
>
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -335,6 +335,8 @@ struct vcpu_svm {
>>  
>>  	/* Guest GIF value, used when vGIF is not enabled */
>>  	bool guest_gif;
>> +
>> +	struct page *pml_page;
>>  };
>
> This seems to be a leftover.

Sure.

Regards
Nikunj


* Re: [PATCH v2 4/4] KVM: SVM: Add Page modification logging support
  2025-09-18  6:12     ` Nikunj A Dadhania
@ 2025-09-24  6:26       ` Nikunj A Dadhania
  0 siblings, 0 replies; 13+ messages in thread
From: Nikunj A Dadhania @ 2025-09-24  6:26 UTC (permalink / raw)
  To: Huang, Kai, pbonzini@redhat.com, seanjc@google.com
  Cc: thomas.lendacky@amd.com, kvm@vger.kernel.org,
	joao.m.martins@oracle.com, santosh.shukla@amd.com, bp@alien8.de

Nikunj A Dadhania <nikunj@amd.com> writes:

> "Huang, Kai" <kai.huang@intel.com> writes:
>
>> On Mon, 2025-09-15 at 08:59 +0000, Nikunj A Dadhania wrote:
>>> 
>>> PML is enabled by default when supported and can be disabled via the 'pml'
>>> module parameter.
>>
>> This changelog mentions nothing about the interaction between PML and
>> nested virtualization.
>>
>> On VMX, PML is emulated for L2 (for nested EPT) but is never enabled in
>> hardware when CPU runs in L2, so:
>>
>> 1) PML is exposed to L1 (for nested EPT).
>> 2) PML needs to be turned off when CPU runs in L2 otherwise L2's GPA 
>>    could be logged, and turned on again after CPU leaves L2 (and restore
>>    PML buffer/index of VMCS01).
>
> I get your point, and I see that on nested VM entry, PML is set in
> the nested_ctl for L2.

As nested_ctl from L1 was getting copied into L2, the PML flag was also
being set for L2. I have disabled this now.

>
> I am trying to create this scenario, and couldn't get the L2 GPAs.

L2 GPAs were not logged because PML_ADDR and PML_INDEX of VMCB02
were not populated.
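
For the record, the masking might look roughly like this when preparing
VMCB02 (a rough sketch only; the exact location and field naming are
assumed):

	/*
	 * Don't let L1's PML enable leak into VMCB02: PML is not exposed
	 * to L1, so L2 GPAs must never be logged.
	 */
	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl &
				     ~SVM_NESTED_CTL_PML_ENABLE;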

>
>>
>> It doesn't seem this series supports emulating PML for L2 (for nested
>> NPT), because AMD's PML is also enumerated via a CPUID bit (while VMX
>> doesn't) and it's not exposed to guest, so we don't need to handle nested
>> PML_FULL VMEXIT etc.
>>
>> This is fine I think, and we can support this in the future if needed.
>>
>> But 2) is also needed anyway for AMD's PML AFAICT, regardless of whether
>> 1) is supported or not?
>
> I see your point; we will need to disable PML for L2.
>
>>
>> If so, could we add some text to clarify all of these in the changelog?
>>
>>
>> [...]
>>
>>>  
>>> +void svm_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct vcpu_svm *svm = to_svm(vcpu);
>>> +
>>> +	if (WARN_ON_ONCE(!pml))
>>> +		return;
>>> +
>>> +	if (is_guest_mode(vcpu))
>>> +		return;
>>
>> VMX has a vmx->nested.update_vmcs01_cpu_dirty_logging boolean.  It's set
>> here to indicate PML enabling is not updated for L2 here, but later when
>> switching to run in L1, the PML enabling needs to be updated.
>>
>> Shouldn't SVM have similar handling?
>
> Sure, will get back to you on this.

Yes, this will be required, as it handles the case where there is a
dirty logging change for L1 while L2 is running. This variable records
such a change so that, when L2 exits, the pending change can be applied
to L1's VMCB.

Thanks for the detailed review.

Regards
Nikunj


end of thread

Thread overview: 13+ messages
2025-09-15  8:59 [PATCH v2 0/4] KVM: SVM: Add Page Modification Logging (PML) support Nikunj A Dadhania
2025-09-15  8:59 ` [PATCH v2 1/4] KVM: x86: Carve out PML flush routine Nikunj A Dadhania
2025-09-15  8:59 ` [PATCH v2 2/4] KVM: x86: Move PML page to common vcpu arch structure Nikunj A Dadhania
2025-09-16 10:27   ` Huang, Kai
2025-09-17  3:31     ` Nikunj A. Dadhania
2025-09-17 10:13       ` Huang, Kai
2025-09-17 10:17         ` Nikunj A. Dadhania
2025-09-17 10:28           ` Huang, Kai
2025-09-15  8:59 ` [PATCH v2 3/4] x86/cpufeatures: Add Page modification logging Nikunj A Dadhania
2025-09-15  8:59 ` [PATCH v2 4/4] KVM: SVM: Add Page modification logging support Nikunj A Dadhania
2025-09-15 12:35   ` Huang, Kai
2025-09-18  6:12     ` Nikunj A Dadhania
2025-09-24  6:26       ` Nikunj A Dadhania
