[PATCH v3 RESEND 0/5] Implement Eager Page Splitting for RISC-V

* [PATCH v3 RESEND 0/5] Implement Eager Page Splitting for RISC-V
@ 2026-06-24  8:00 ` wang.yechao255
  0 siblings, 0 replies; 17+ messages in thread
From: wang.yechao255 @ 2026-06-24  8:00 UTC (permalink / raw)
  To: anup, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: pjw, palmer, aou, atish.patra, alex, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Eager Page Splitting is implemented on x86 and ARM. It improves the
performance of dirty logging (used in live migrations) when guest memory
is backed by huge pages.

This series implements Eager Page Splitting for RISC-V. The implementation
is similar to x86 and ARM. It provides two ways to split huge pages in
ioctl context instead of on fault in vCPU context:

- Split huge pages when dirty logging is enabled and
  KVM_DIRTY_LOG_INITIALLY_SET is not set. This happens when the
  KVM_MEM_LOG_DIRTY_PAGES flag is enabled on a memslot, and it splits the
  whole memslot into 4K mappings.

- Split huge pages during KVM_CLEAR_DIRTY_LOG when
  KVM_DIRTY_LOG_INITIALLY_SET is set. This happens when dirty logging is
  enabled in small chunks. It does not split the whole memslot, but only
  the requested chunk range.
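
As a rough illustration of the two cases above, the following userspace
sketch models which path performs the split and how many 4K mappings a
range needs. It is not kernel code: split_point, choose_split_point() and
pages_4k() are hypothetical names used only here, and a 4 KiB base page
is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12	/* assumption: 4 KiB base pages */

/* Hypothetical labels for the two places where splitting can happen. */
enum split_point {
	SPLIT_ON_ENABLE,	/* whole memslot, when KVM_MEM_LOG_DIRTY_PAGES is set */
	SPLIT_ON_CLEAR,		/* requested chunk only, during KVM_CLEAR_DIRTY_LOG */
};

/* Pick the split point from the KVM_DIRTY_LOG_INITIALLY_SET setting. */
static enum split_point choose_split_point(bool initially_set)
{
	return initially_set ? SPLIT_ON_CLEAR : SPLIT_ON_ENABLE;
}

/* Number of 4K mappings needed to cover a range of 'bytes' bytes. */
static uint64_t pages_4k(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT_4K;
}
```

Under these assumptions a single 2 MiB mapping becomes 512 4K mappings,
which is why splitting a whole memslot at once can be expensive and why
the chunked KVM_CLEAR_DIRTY_LOG path only touches the requested range.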

Changes in v3 resend:
 - Fix patch format to ensure emails reach the linux-riscv and kvm-riscv
   mailing lists.
 - Move the freeing of pgd_split_page_cache into kvm_arch_destroy_vm().

Changes in v3:
 - Rebase on v7.1.
 - Add patch 3 to remove the redundant TLB flush operations.

 Changes from the sashiko-bot AI review
 (https://sashiko.dev/#/message/20260603104847.9692C1F00893%40smtp.kernel.org):
 - Check kvm->arch.pgd before splitting huge pages.
 - Align the start address to PMD_SIZE before splitting.
 - Flush remote TLBs before dropping mmu_lock.

Changes in v2:
 - Rename the split_page_cache.
 - Rename kvm_riscv_split_huge_pages and
   kvm_riscv_split_memory_region.
 - Add a lockdep_assert_held check before splitting huge pages.
 - Update Documentation/admin-guide/kernel-parameters.txt.
 - Link to v2:
   https://lore.kernel.org/linux-riscv/20260603175256408L0jnqGs1cJGc0ijCdujci@zte.com.cn/
 - Link to v1:
   https://lore.kernel.org/linux-riscv/20260513153656847l3c4fI5hBsAyoIZi8aGIs@zte.com.cn/


Wang Yechao (5):
  RISC-V: KVM: Add the split page cache for ioctl context
  RISC-V: KVM: Split huge pages when dirty logging is enabled
  RISC-V: KVM: Remove redundant TLB flush operations
  RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  RISC-V: KVM: Add the eager_page_split module parameter

 .../admin-guide/kernel-parameters.txt         |  7 +-
 arch/riscv/include/asm/kvm_gstage.h           |  6 +-
 arch/riscv/include/asm/kvm_host.h             |  1 +
 arch/riscv/kvm/gstage.c                       | 21 +++--
 arch/riscv/kvm/mmu.c                          | 91 ++++++++++++++++++-
 arch/riscv/kvm/vm.c                           |  6 ++
 6 files changed, 116 insertions(+), 16 deletions(-)

-- 
2.43.5

-- 
kvm-riscv mailing list
kvm-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kvm-riscv

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 RESEND 1/5] RISC-V: KVM: Add the split page cache for ioctl context
  2026-06-24  8:00 ` wang.yechao255
@ 2026-06-24  8:05   ` wang.yechao255
  -1 siblings, 0 replies; 17+ messages in thread
From: wang.yechao255 @ 2026-06-24  8:05 UTC (permalink / raw)
  To: anup, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: pjw, palmer, aou, atish.patra, alex, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Add the split page cache for dirty logging enablement and the
KVM_CLEAR_DIRTY_LOG ioctl.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/include/asm/kvm_host.h | 1 +
 arch/riscv/kvm/mmu.c              | 1 +
 arch/riscv/kvm/vm.c               | 6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 60017ceec9d2a..69f73fd106a94 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -86,6 +86,7 @@ struct kvm_arch {
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
 	unsigned long pgd_levels;
+	struct kvm_mmu_memory_cache pgd_split_page_cache;

 	/* Guest Timer */
 	struct kvm_guest_timer timer;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 082f9b2617338..9cf69bc28b9c5 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -676,6 +676,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
 	kvm->arch.pgd = page_to_virt(pgd_page);
 	kvm->arch.pgd_phys = page_to_phys(pgd_page);
 	kvm->arch.pgd_levels = kvm_riscv_gstage_max_pgd_levels;
+	kvm->arch.pgd_split_page_cache.gfp_zero = __GFP_ZERO;

 	return 0;
 }
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index a9f083feeb767..be38f24a297d6 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -54,6 +54,12 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_destroy_vcpus(kvm);

 	kvm_riscv_aia_destroy_vm(kvm);
+
+	/*
+	 * Free the split page cache after all vCPUs and devices are destroyed.
+	 * At this point, there are no concurrent accesses to the cache.
+	 */
+	kvm_mmu_free_memory_cache(&kvm->arch.pgd_split_page_cache);
 }

 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irql,
-- 
2.43.5
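
The idea behind a split page cache can be sketched in userspace as
follows: page-table pages are preallocated (and zeroed) while no lock is
held, handed out without allocating while the lock is held, and released
once at teardown. This is a simplified model, not the kernel's
kvm_mmu_memory_cache; struct page_cache, cache_topup(), cache_alloc(),
cache_free(), CACHE_CAPACITY and OBJ_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_CAPACITY 8	/* hypothetical capacity */
#define OBJ_SIZE 4096		/* one page-table page, assuming 4 KiB pages */

struct page_cache {
	int nobjs;
	void *objects[CACHE_CAPACITY];
};

/*
 * Fill the cache until it holds at least 'min' zeroed objects.
 * May fail, so it is meant to be called without the lock held.
 */
static int cache_topup(struct page_cache *pc, int min)
{
	if (min > CACHE_CAPACITY)
		return -1;
	while (pc->nobjs < min) {
		void *obj = calloc(1, OBJ_SIZE);	/* zeroed, like __GFP_ZERO */

		if (!obj)
			return -1;
		pc->objects[pc->nobjs++] = obj;
	}
	return 0;
}

/* Pop a preallocated object; never allocates, so usable under the lock. */
static void *cache_alloc(struct page_cache *pc)
{
	return pc->nobjs ? pc->objects[--pc->nobjs] : NULL;
}

/* Release whatever is left, as done once when the VM is destroyed. */
static void cache_free(struct page_cache *pc)
{
	while (pc->nobjs)
		free(pc->objects[--pc->nobjs]);
}
```

Objects handed out by cache_alloc() belong to the caller (in the patch
they become part of the page table), so cache_free() only releases what
was never used.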

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 RESEND 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled
  2026-06-24  8:00 ` wang.yechao255
@ 2026-06-24  8:07   ` wang.yechao255
  -1 siblings, 0 replies; 17+ messages in thread
From: wang.yechao255 @ 2026-06-24  8:07 UTC (permalink / raw)
  To: anup, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: pjw, palmer, aou, atish.patra, alex, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Split huge pages eagerly when enabling dirty logging. The goal is to
avoid doing it while faulting on write-protected pages, which
negatively impacts guest performance.

The benefits of eager page splitting are the same as on x86 and arm64,
where it was added by commit a3fe5dbda0a4 ("KVM: x86/mmu: Split huge pages
mapped by the TDP MMU when dirty logging is enabled") and commit
e7bf7a490c68 ("KVM: arm64: Split huge pages when dirty logging is
enabled").

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/include/asm/kvm_gstage.h |  6 +--
 arch/riscv/kvm/gstage.c             | 21 +++++---
 arch/riscv/kvm/mmu.c                | 74 +++++++++++++++++++++++++++++
 3 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 21e2019df0cf5..f726279780177 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -64,9 +64,9 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
 			      bool page_rdonly, bool page_exec,
 			      struct kvm_gstage_mapping *out_map);

-int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
-				struct kvm_mmu_memory_cache *pcache,
-				gpa_t addr, u32 target_level, bool flush);
+bool kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
+				 struct kvm_mmu_memory_cache *pcache,
+				 gpa_t addr, u32 target_level, bool flush);

 enum kvm_riscv_gstage_op {
 	GSTAGE_OP_NOP = 0,	/* Nothing */
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index c4c3b79567f10..291cb70ea96dd 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -303,19 +303,20 @@ static inline unsigned long make_child_pte(unsigned long huge_pte, int index,
 	return child_pte;
 }

-int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
-				struct kvm_mmu_memory_cache *pcache,
-				gpa_t addr, u32 target_level, bool flush)
+bool kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
+				 struct kvm_mmu_memory_cache *pcache,
+				 gpa_t addr, u32 target_level, bool flush)
 {
 	u32 current_level = gstage->pgd_levels - 1;
 	pte_t *next_ptep = (pte_t *)gstage->pgd;
 	unsigned long huge_pte, child_pte;
 	unsigned long child_page_size;
+	bool need_flush = false;
 	pte_t *ptep;
 	int i, ret;

 	if (!pcache)
-		return -ENOMEM;
+		return false;

 	while(current_level > target_level) {
 		ptep = (pte_t *)&next_ptep[gstage_pte_index(gstage, addr, current_level)];
@@ -337,23 +338,31 @@ int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,

 		next_ptep = kvm_mmu_memory_cache_alloc(pcache);
 		if (!next_ptep)
-			return -ENOMEM;
+			return need_flush;

 		for (i = 0; i < PTRS_PER_PTE; i++) {
 			child_pte = make_child_pte(huge_pte, i, child_page_size);
 			set_pte((pte_t *)&next_ptep[i], __pte(child_pte));
 		}

+		/*
+		 * Ensure the writes to the child PTEs are visible before
+		 * linking the new page table to the parent PTE.
+		 */
+		smp_wmb();
+
 		set_pte(ptep, pfn_pte(PFN_DOWN(__pa(next_ptep)),
 				__pgprot(_PAGE_TABLE)));

 		if (flush)
 			gstage_tlb_flush(gstage, current_level, addr);
+		else
+			need_flush = true;

 		current_level--;
 	}

-	return 0;
+	return need_flush;
 }

 bool kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 9cf69bc28b9c5..95e83c50addf5 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -97,6 +97,60 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
 					    size >> PAGE_SHIFT);
 }

+static bool need_topup_split_caches_or_resched(struct kvm_mmu_memory_cache *cache,
+					       int count)
+{
+	if (need_resched())
+		return true;
+
+	return kvm_mmu_memory_cache_nr_free_objects(cache) < count;
+}
+
+static bool mmu_split_huge_pages(struct kvm_gstage *gstage,
+				 phys_addr_t start, phys_addr_t end)
+{
+	struct kvm *kvm = gstage->kvm;
+	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
+	phys_addr_t addr = ALIGN(start, PMD_SIZE);
+	phys_addr_t last_flush_gfn = addr >> PAGE_SHIFT;
+	int count = gstage->pgd_levels;
+	bool flush = false;
+	int ret;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	while (addr < end) {
+		if (need_topup_split_caches_or_resched(pcache, count)) {
+			if (flush) {
+				kvm_flush_remote_tlbs_range(kvm, last_flush_gfn,
+					  (addr >> PAGE_SHIFT) - last_flush_gfn);
+				last_flush_gfn = addr >> PAGE_SHIFT;
+				flush = false;
+			}
+
+			write_unlock(&kvm->mmu_lock);
+			cond_resched();
+
+			ret = kvm_mmu_topup_memory_cache(pcache, count);
+			if (ret) {
+				kvm_err("Failed to top up split page cache\n");
+				write_lock(&kvm->mmu_lock);
+				return flush;
+			}
+			write_lock(&kvm->mmu_lock);
+		}
+
+		if (!kvm->arch.pgd)
+			return flush;
+
+		flush |= kvm_riscv_gstage_split_huge(gstage, pcache, addr, 0, false);
+
+		addr += PMD_SIZE;
+	}
+
+	return flush;
+}
+
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 					     struct kvm_memory_slot *slot,
 					     gfn_t gfn_offset,
@@ -151,6 +205,25 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 					    size >> PAGE_SHIFT);
 }

+static void mmu_split_memory_region(struct kvm *kvm, int slot)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	struct kvm_gstage gstage;
+	bool flush;
+
+	kvm_riscv_gstage_init(&gstage, kvm);
+
+	write_lock(&kvm->mmu_lock);
+	flush = mmu_split_huge_pages(&gstage, start, end);
+	write_unlock(&kvm->mmu_lock);
+
+	if (flush)
+		kvm_flush_remote_tlbs_memslot(kvm, memslot);
+}
+
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
@@ -164,6 +237,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
+		mmu_split_memory_region(kvm, new->id);
 	}
 }

-- 
2.43.5
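
The control flow of the split loop in this patch can be modelled in
userspace roughly as below. It is a sketch under simplifying assumptions:
every step finds a 2 MiB huge page, each split consumes exactly one
preallocated page, and locking is reduced to counters. split_range(),
struct split_stats, PMD_SIZE_MODEL and ALIGN_UP() are hypothetical names,
not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMD_SIZE_MODEL (2ULL << 20)	/* assumption: 2 MiB huge pages */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

struct split_stats {
	int splits;	/* huge pages split */
	int topups;	/* times the lock was dropped to refill the cache */
	int flushes;	/* flushes issued before dropping the lock */
};

/*
 * Walk [start, end) in huge-page steps. 'cache' is the number of
 * preallocated page-table pages, 'need' the minimum required before a
 * step, and 'refill' what one top-up provides (0 models a failure).
 * Returns true if a TLB flush is still pending when the walk stops.
 */
static bool split_range(uint64_t start, uint64_t end, int cache, int need,
			int refill, struct split_stats *st)
{
	uint64_t addr = ALIGN_UP(start, PMD_SIZE_MODEL);
	bool flush = false;

	for (; addr < end; addr += PMD_SIZE_MODEL) {
		if (cache < need) {
			if (flush) {	/* flush before "dropping the lock" */
				st->flushes++;
				flush = false;
			}
			st->topups++;	/* lock dropped, cache refilled here */
			cache += refill;
			if (cache < need)
				return flush;	/* top-up failed: stop early */
		}
		cache--;		/* one page-table page per split */
		st->splits++;
		flush = true;		/* stale TLB entries may now exist */
	}
	return flush;
}
```

Two properties of the model are worth noting. The pending flush is
always issued before the lock would be dropped, so no other context can
observe the new tables with stale TLB entries outstanding. And because
the start address is rounded up, the walk begins at the first huge-page
boundary at or after start.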

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 RESEND 3/5] RISC-V: KVM: Remove redundant TLB flush operations
  2026-06-24  8:00 ` wang.yechao255
@ 2026-06-24  8:09   ` wang.yechao255
  -1 siblings, 0 replies; 17+ messages in thread
From: wang.yechao255 @ 2026-06-24  8:09 UTC (permalink / raw)
  To: anup, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: pjw, palmer, aou, atish.patra, alex, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

The function kvm_arch_mmu_enable_log_dirty_pt_masked() is called by
kvm_vm_ioctl_reset_dirty_pages() and kvm_clear_dirty_log_protect().
Both callers already perform a TLB flush after enabling dirty logging,
so the TLB flush inside kvm_arch_mmu_enable_log_dirty_pt_masked() is
unnecessary. Remove it.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/kvm/mmu.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 95e83c50addf5..bc3bad67d507b 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -160,14 +160,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 	struct kvm_gstage gstage;
-	bool flush;

 	kvm_riscv_gstage_init(&gstage, kvm);

-	flush = kvm_riscv_gstage_wp_range(&gstage, start, end);
-	if (flush)
-		kvm_flush_remote_tlbs_range(kvm, start >> PAGE_SHIFT,
-					    (end - start) >> PAGE_SHIFT);
+	kvm_riscv_gstage_wp_range(&gstage, start, end);
 }

 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
-- 
2.43.5
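
The reasoning in this patch, that the callee can skip its own flush
because the caller flushes afterwards, can be illustrated with a small
userspace model. It only counts flushes; struct model_vm, wp_masked()
and clear_dirty_log() are hypothetical names, and the caller here is a
simplified stand-in for the generic KVM code, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_vm {
	bool writable[128];	/* one flag per guest page */
	int tlb_flushes;	/* remote TLB flushes issued */
};

/* Write-protect the pages selected by 'mask'; no TLB flush in here. */
static void wp_masked(struct model_vm *vm, size_t base, uint64_t mask)
{
	for (size_t i = 0; i < 64; i++)
		if (mask & (1ULL << i))
			vm->writable[base + i] = false;
}

/*
 * Caller: process every bitmap word (at most two in this model), then
 * flush once if any word selected at least one page.
 */
static void clear_dirty_log(struct model_vm *vm, const uint64_t *words,
			    size_t nwords)
{
	bool flush = false;

	for (size_t w = 0; w < nwords; w++) {
		if (!words[w])
			continue;
		wp_masked(vm, w * 64, words[w]);
		flush = true;
	}
	if (flush)
		vm->tlb_flushes++;
}
```

With the flush kept in the caller, a request covering several bitmap
words costs one flush instead of one per word.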

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 RESEND 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  2026-06-24  8:00 ` wang.yechao255
@ 2026-06-24  8:11   ` wang.yechao255
  -1 siblings, 0 replies; 17+ messages in thread
From: wang.yechao255 @ 2026-06-24  8:11 UTC (permalink / raw)
  To: anup, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: pjw, palmer, aou, atish.patra, alex, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Split huge pages in the range specified by KVM_CLEAR_DIRTY_LOG, and do
not split when enabling dirty logging if KVM_DIRTY_LOG_INITIALLY_SET is
set.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/kvm/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index bc3bad67d507b..cbda927dd24e3 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -164,6 +164,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_riscv_gstage_init(&gstage, kvm);

 	kvm_riscv_gstage_wp_range(&gstage, start, end);
+
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+		mmu_split_huge_pages(&gstage, start, end);
 }

 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
-- 
2.43.5
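
The range that gets write-protected and split here is derived from one
word of the dirty bitmap: the lowest and highest set bits bound it. A
minimal userspace sketch of that arithmetic follows. It assumes 64-bit
bitmap words, 4 KiB pages and the GCC/Clang builtins
__builtin_ctzll()/__builtin_clzll() as stand-ins for the kernel's
__ffs()/__fls(); struct gpa_range and mask_to_range() are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */

struct gpa_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Convert one dirty-bitmap word into the smallest guest-physical range
 * covering every set bit. 'mask' must be non-zero.
 */
static struct gpa_range mask_to_range(uint64_t base_gfn, uint64_t mask)
{
	/* Index of the lowest set bit, like __ffs(). */
	unsigned int first = (unsigned int)__builtin_ctzll(mask);
	/* Index of the highest set bit, like __fls(). */
	unsigned int last = 63u - (unsigned int)__builtin_clzll(mask);
	struct gpa_range r = {
		.start = (base_gfn + first) << PAGE_SHIFT_MODEL,
		.end = (base_gfn + last + 1) << PAGE_SHIFT_MODEL,
	};

	return r;
}
```

Since one 64-bit word covers at most 64 pages (256 KiB with 4 KiB
pages), each call works on a range smaller than a 2 MiB huge page in
this model.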

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 RESEND 5/5] RISC-V: KVM: Add the eager_page_split module parameter
  2026-06-24  8:00 ` wang.yechao255
@ 2026-06-24  8:13   ` wang.yechao255
  -1 siblings, 0 replies; 17+ messages in thread
From: wang.yechao255 @ 2026-06-24  8:13 UTC (permalink / raw)
  To: anup, kvm, kvm-riscv, linux-riscv, linux-kernel, linux-doc
  Cc: pjw, palmer, aou, atish.patra, alex, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Add an eager_page_split module parameter for RISC-V KVM, following
the same approach as on x86. This parameter controls whether eager
page splitting is enabled. The default value is on.

When eager page splitting is enabled, KVM proactively splits large
pages (huge pages) into smaller pages when needed for dirty logging
or other operations. Disabling it can be beneficial for VM workloads
that rarely perform writes, or that only write to a small region of
memory, as it allows huge pages to remain intact for read accesses.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +++++--
 arch/riscv/kvm/mmu.c                            | 13 ++++++++++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a68003c3599cc..b4c68a896fa79 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3047,7 +3047,7 @@ Kernel parameters
 			Default is 0 (don't ignore, but inject #GP)

 	kvm.eager_page_split=
-			[KVM,X86] Controls whether or not KVM will try to
+			[KVM,X86,RISCV] Controls whether or not KVM will try to
 			proactively split all huge pages during dirty logging.
 			Eager page splitting reduces interruptions to vCPU
 			execution by eliminating the write-protection faults
@@ -3067,7 +3067,10 @@ Kernel parameters
 			the KVM_CLEAR_DIRTY ioctl, and only for the pages being
 			cleared.

-			Eager page splitting is only supported when kvm.tdp_mmu=Y.
+			On x86, eager page splitting is only supported when
+			kvm.tdp_mmu=Y.
+
+			On RISCV, eager page splitting is supported by default.

 			Default is Y (on).

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index cbda927dd24e3..278dd3bba680e 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -16,6 +16,9 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_nacl.h>

+static bool __read_mostly eager_page_split = true;
+module_param(eager_page_split, bool, 0644);
+
 static void mmu_wp_memory_region(struct kvm *kvm, int slot)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
@@ -165,8 +168,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,

 	kvm_riscv_gstage_wp_range(&gstage, start, end);

-	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
-		mmu_split_huge_pages(&gstage, start, end);
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
+		if (READ_ONCE(eager_page_split))
+			mmu_split_huge_pages(&gstage, start, end);
+	}
 }

 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
@@ -236,7 +241,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
-		mmu_split_memory_region(kvm, new->id);
+
+		if (READ_ONCE(eager_page_split))
+			mmu_split_memory_region(kvm, new->id);
 	}
 }

-- 
2.43.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread
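
The gating added by patches 4/5 and 5/5 can be summarized with a small
userspace model. This is only a sketch: struct model_cfg and both helper
names are hypothetical, and the two booleans stand in for
kvm_dirty_log_manual_protect_and_init_set() and the eager_page_split
module parameter as they are used in the diffs above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state consulted by the patches. */
struct model_cfg {
	bool initially_set;	/* models kvm_dirty_log_manual_protect_and_init_set() */
	bool eager_page_split;	/* models the eager_page_split module parameter */
};

/*
 * Path 1: KVM_MEM_LOG_DIRTY_PAGES is enabled on a memslot
 * (kvm_arch_commit_memory_region). With INITIALLY_SET the function
 * returns early, so nothing is write-protected or split at this point.
 */
static bool split_on_enable(const struct model_cfg *c)
{
	assert(c != NULL);
	if (c->initially_set)
		return false;
	return c->eager_page_split;
}

/*
 * Path 2: KVM_CLEAR_DIRTY_LOG on a chunk
 * (kvm_arch_mmu_enable_log_dirty_pt_masked). Only the requested range
 * is split, and only when INITIALLY_SET is in use.
 */
static bool split_on_clear(const struct model_cfg *c)
{
	assert(c != NULL);
	return c->initially_set && c->eager_page_split;
}
```

In this model exactly one of the two paths splits for any given VM
configuration while the parameter is on, and neither splits once it is
switched off.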

* Re: [PATCH v3 RESEND 5/5] RISC-V: KVM: Add the eager_page_split module parameter
  2026-06-24  8:13   ` wang.yechao255
  (?)
@ 2026-06-24  8:31   ` sashiko-bot
  -1 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-24  8:31 UTC (permalink / raw)
  To: wang.yechao255; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Dropping `mmu_lock` during eager page splitting without flushing the TLB allows guest writes to be silently lost from the dirty log.
--

commit 76d7c935b026714d65bf09e853c93dff6ddf2853
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Add the eager_page_split module parameter

This commit adds an eager_page_split module parameter for RISC-V KVM.
When enabled, KVM proactively splits large pages into smaller pages for
dirty logging operations.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index cbda927dd24e3..278dd3bba680e 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c

[ ... ]

> @@ -165,8 +168,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  
>  	kvm_riscv_gstage_wp_range(&gstage, start, end);
>  
> -	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> -		mmu_split_huge_pages(&gstage, start, end);
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> +		if (READ_ONCE(eager_page_split))
> +			mmu_split_huge_pages(&gstage, start, end);
> +	}
>  }

[Severity: Critical]
This is a pre-existing issue, but does dropping the mmu_lock here without
flushing the TLB allow guest writes to be silently lost?

In kvm_arch_mmu_enable_log_dirty_pt_masked(), kvm_riscv_gstage_wp_range()
write-protects the stage-2 page tables, but its return value is ignored,
deferring the TLB flush.

It then calls mmu_split_huge_pages(), which can drop the kvm->mmu_lock
if it needs to reschedule or top up caches:

arch/riscv/kvm/mmu.c:mmu_split_huge_pages() {
    ...
    if (need_topup_split_caches_or_resched(pcache, count)) {
        if (flush) {
            ...
        }
        write_unlock(&kvm->mmu_lock);
        cond_resched();
    ...
}

If the lock is dropped here during eager page splitting before the deferred
TLB flush happens, can guest vCPUs still write to these pages using their
stale, writable TLB entries?

Because the dirty bitmap was already cleared by the caller
kvm_clear_dirty_log_protect(), and no page fault would occur to invoke
KVM's fault handler, these writes wouldn't be recorded in the dirty log.

Would this result in silent data corruption where the destination VM
resumes with stale memory after a live migration?
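
The concern above can be illustrated with a deterministic, single-threaded
userspace model. It is a sketch under stated assumptions, not kernel code:
struct model_vm and every function here are hypothetical, and the "TLB" is
just a per-page cached copy of the write permission.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4

struct model_vm {
	bool pte_writable[NPAGES];	/* stage-2 PTE write permission */
	bool tlb_valid[NPAGES];		/* a translation is cached */
	bool tlb_writable[NPAGES];	/* cached permission, may be stale */
	bool dirty[NPAGES];		/* dirty bitmap, already cleared */
};

static void vm_init(struct model_vm *vm)
{
	for (int i = 0; i < NPAGES; i++) {
		vm->pte_writable[i] = true;
		vm->tlb_valid[i] = true;
		vm->tlb_writable[i] = true;
		vm->dirty[i] = false;
	}
}

/* Write-protect every PTE; cached translations are left untouched. */
static void wp_range(struct model_vm *vm)
{
	for (int i = 0; i < NPAGES; i++)
		vm->pte_writable[i] = false;
}

static void flush_tlb(struct model_vm *vm)
{
	for (int i = 0; i < NPAGES; i++)
		vm->tlb_valid[i] = false;
}

/* A guest store to one page. */
static void guest_write(struct model_vm *vm, int page)
{
	assert(page >= 0 && page < NPAGES);

	/* Hit on a cached writable entry: no fault, nothing is logged. */
	if (vm->tlb_valid[page] && vm->tlb_writable[page])
		return;

	/* Miss: walk the table; a write-protected PTE faults and is logged. */
	if (!vm->pte_writable[page]) {
		vm->dirty[page] = true;
		vm->pte_writable[page] = true;
	}
	vm->tlb_valid[page] = true;
	vm->tlb_writable[page] = true;
}

/* Returns true if a write made while the lock is dropped reaches the log. */
static bool write_is_logged(bool flush_before_unlock)
{
	struct model_vm vm;

	vm_init(&vm);
	wp_range(&vm);			/* done under mmu_lock */
	if (flush_before_unlock)
		flush_tlb(&vm);
	/* mmu_lock is dropped here and a vCPU gets to run */
	guest_write(&vm, 1);
	return vm.dirty[1];
}
```

In this model write_is_logged(false) returns false, because the store hits
the stale writable entry, while write_is_logged(true) returns true.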

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624160054463wcDvJaMoydSggcNOWgcfB@zte.com.cn?part=5

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 RESEND 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled
  2026-06-24  8:07   ` wang.yechao255
  (?)
@ 2026-06-24  8:35   ` sashiko-bot
  -1 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-06-24  8:35 UTC (permalink / raw)
  To: wang.yechao255; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] `need_topup_split_caches_or_resched` fails to check `rwlock_needbreak(&kvm->mmu_lock)`, causing lock starvation and latency spikes.
--

commit 10d6c81f6c350e7e0a1eeeb75389a0fa2b8dc5f8
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Split huge pages when dirty logging is enabled

Eagerly split huge pages when enabling dirty logging to avoid faulting on
write-protected pages. This mirrors the approach taken on x86 and arm64
to improve guest performance.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 9cf69bc28b9c5..95e83c50addf5 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
[ ... ]
> +static bool need_topup_split_caches_or_resched(struct kvm_mmu_memory_cache *cache,
> +					       int count)
> +{
> +	if (need_resched())
> +		return true;
> +
> +	return kvm_mmu_memory_cache_nr_free_objects(cache) < count;
> +}
> +
> +static bool mmu_split_huge_pages(struct kvm_gstage *gstage,
> +				 phys_addr_t start, phys_addr_t end)
> +{
> +	struct kvm *kvm = gstage->kvm;
> +	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
> +	phys_addr_t addr = ALIGN(start, PMD_SIZE);
> +	phys_addr_t last_flush_gfn = addr >> PAGE_SHIFT;
> +	int count = gstage->pgd_levels;
> +	bool flush = false;
> +	int ret;
> +
> +	lockdep_assert_held_write(&kvm->mmu_lock);
> +
> +	while (addr < end) {
> +		if (need_topup_split_caches_or_resched(pcache, count)) {

[Severity: High]
Does this code need to check rwlock_needbreak() to avoid lock starvation?

When mmu_split_huge_pages() iterates over a memslot with kvm->mmu_lock held
in write mode, preemption is disabled. The need_resched() check alone might
not capture lock contention on kvm->mmu_lock.

If there are no huge pages to split in the memslot, the memory cache will
never deplete. This means the loop could run over large amounts of memory
without ever yielding the lock.

Could this starve other vCPUs waiting to acquire kvm->mmu_lock in read mode?
Perhaps need_topup_split_caches_or_resched() should check
rwlock_needbreak(&kvm->mmu_lock) like other architectures do during eager
page splitting?
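
The difference between the two yield conditions can be shown with a small
userspace model. This is a sketch only: struct model_state and both
function names are hypothetical, and the fields stand in for
need_resched(), rwlock_needbreak(&kvm->mmu_lock) and
kvm_mmu_memory_cache_nr_free_objects().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_state {
	bool resched_pending;	/* models need_resched() */
	bool lock_contended;	/* models rwlock_needbreak(&kvm->mmu_lock) */
	int cache_free;		/* models free objects in the split cache */
};

/* Condition as posted in this version of the patch. */
static bool need_yield_as_posted(const struct model_state *s, int count)
{
	assert(s != NULL);
	if (s->resched_pending)
		return true;
	return s->cache_free < count;
}

/* Condition with the lock-contention check suggested in the review. */
static bool need_yield_with_needbreak(const struct model_state *s, int count)
{
	assert(s != NULL);
	if (s->resched_pending || s->lock_contended)
		return true;
	return s->cache_free < count;
}
```

With a full cache, no pending reschedule and a contended lock, only the
second variant asks the loop to yield, which is the case described above
for a memslot that contains no huge pages.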

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624160054463wcDvJaMoydSggcNOWgcfB@zte.com.cn?part=2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 RESEND 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  2026-06-24  8:11   ` wang.yechao255
  (?)
@ 2026-06-24  8:35   ` sashiko-bot
  2026-06-24  9:41     ` wang.yechao255
  -1 siblings, 1 reply; 17+ messages in thread
From: sashiko-bot @ 2026-06-24  8:35 UTC (permalink / raw)
  To: wang.yechao255; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Eager huge page splitting during KVM_CLEAR_DIRTY_LOG silently fails for most ranges due to incorrect address alignment in mmu_split_huge_pages().

Pre-existing issues:
- [High] Write-protecting guest page table entries (SPTEs) clobbers concurrent hardware Access/Dirty bit updates.
--

commit 00cb697f813eca012a08732046a12ddb66d6af56
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG

This commit splits huge pages on the range specified by KVM_CLEAR_DIRTY_LOG.
It also avoids splitting when enabling dirty logging if
KVM_DIRTY_LOG_INITIALLY_SET is set.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index bc3bad67d507b..cbda927dd24e3 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -164,6 +164,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_riscv_gstage_init(&gstage, kvm);
>  
>  	kvm_riscv_gstage_wp_range(&gstage, start, end);

[Severity: High]
This is a pre-existing issue, but does calling kvm_riscv_gstage_wp_range()
clobber concurrent hardware Access/Dirty bit updates?

Looking at arch/riscv/kvm/gstage.c, kvm_riscv_gstage_op_pte() uses a
non-atomic read-modify-write when write-protecting:

kvm_riscv_gstage_op_pte() {
    ...
    else if (op == GSTAGE_OP_WP)
        set_pte(ptep, __pte(pte_val(ptep_get(ptep)) & ~_PAGE_WRITE));
    ...
}

If the hardware page-table walker sets the Accessed or Dirty bit between the
read and the write, it seems the bit is permanently lost.
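
The lost-update window can be reproduced deterministically in userspace
with C11 atomics. This is a sketch, not the kernel's PTE helpers: the
function names are hypothetical, the "hardware" update is simulated by an
explicit atomic_fetch_or(), and the bit positions follow the RISC-V PTE
layout (W = bit 2, A = bit 6, D = bit 7).

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_PTE_W	(UINT64_C(1) << 2)
#define MODEL_PTE_A	(UINT64_C(1) << 6)
#define MODEL_PTE_D	(UINT64_C(1) << 7)

/*
 * Non-atomic write-protect: read, then write back the modified copy.
 * The simulated hardware A/D update lands between the two steps and is
 * overwritten by the stale snapshot.
 */
static uint64_t wp_read_modify_write(uint64_t initial)
{
	_Atomic uint64_t pte;
	uint64_t snapshot;

	atomic_init(&pte, initial);
	snapshot = atomic_load(&pte);			/* read */
	atomic_fetch_or(&pte, MODEL_PTE_A | MODEL_PTE_D); /* "hardware" */
	atomic_store(&pte, snapshot & ~MODEL_PTE_W);	/* stale write */
	return atomic_load(&pte);
}

/*
 * Atomic write-protect: a single fetch-and clears W. There is no window
 * between read and write, so an A/D update is kept whether it lands
 * before the operation (as here) or after it.
 */
static uint64_t wp_atomic_and(uint64_t initial)
{
	_Atomic uint64_t pte;

	atomic_init(&pte, initial);
	atomic_fetch_or(&pte, MODEL_PTE_A | MODEL_PTE_D); /* "hardware" */
	atomic_fetch_and(&pte, ~MODEL_PTE_W);
	return atomic_load(&pte);
}
```

Starting from a valid, readable, writable entry (0x7), the first variant
ends with A and D clear, while the second keeps both and only drops W.
Whether an atomic update is the right fix for the real code is a separate
question that this model does not settle.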

> +
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> +		mmu_split_huge_pages(&gstage, start, end);

[Severity: High]
Does mmu_split_huge_pages() fail to split overlapping huge pages for small
or unaligned ranges?

Looking at mmu_split_huge_pages() in arch/riscv/kvm/mmu.c, it initializes
the address using ALIGN():

    phys_addr_t addr = ALIGN(start, PMD_SIZE);

If KVM_CLEAR_DIRTY_LOG passes an unaligned start address (e.g., clearing a
small mask), ALIGN() rounds the start address up. This could cause addr to
be greater than or equal to end, resulting in the while (addr < end) loop
terminating immediately and skipping the huge page that overlaps the
beginning of the target range. Should this use ALIGN_DOWN() instead?

>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624160054463wcDvJaMoydSggcNOWgcfB@zte.com.cn?part=4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 RESEND 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  2026-06-24  8:35   ` sashiko-bot
@ 2026-06-24  9:41     ` wang.yechao255
  0 siblings, 0 replies; 17+ messages in thread
From: wang.yechao255 @ 2026-06-24  9:41 UTC (permalink / raw)
  To: sashiko-bot; +Cc: kvm

> [Severity: High]
> Does mmu_split_huge_pages() fail to split overlapping huge pages for small
> or unaligned ranges?
> 
> Looking at mmu_split_huge_pages() in arch/riscv/kvm/mmu.c, it initializes
> the address using ALIGN():
> 
>    phys_addr_t addr = ALIGN(start, PMD_SIZE);
>
> If KVM_CLEAR_DIRTY_LOG passes an unaligned start address (e.g., clearing a
> small mask), ALIGN() rounds the start address up. This could cause addr to
> be greater than or equal to end, resulting in the while (addr < end) loop
> terminating immediately and skipping the huge page that overlaps the
> beginning of the target range. Should this use ALIGN_DOWN() instead?
Good catch. I'll change it to ALIGN_DOWN(start, PMD_SIZE) so that splitting
always starts from the huge page that contains start. I will fix this in the
next version.
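
The effect of the two alignment choices can be checked with a small
userspace sketch. The MODEL_* macros are local reimplementations rather
than the kernel's ALIGN()/ALIGN_DOWN(), the 2 MiB PMD size is an
assumption, and the loop stepping by one PMD per iteration is a
simplification of the real walk.

```c
#include <stdint.h>

#define MODEL_PMD_SIZE		(UINT64_C(1) << 21)	/* 2 MiB, assumed */

/* Round x up or down to a multiple of a (a must be a power of two). */
#define MODEL_ALIGN(x, a)	(((x) + ((a) - 1)) & ~((a) - 1))
#define MODEL_ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

/* Count the iterations a `while (addr < end)` loop would make. */
static unsigned int visits(uint64_t addr, uint64_t end)
{
	unsigned int n = 0;

	for (; addr < end; addr += MODEL_PMD_SIZE)
		n++;
	return n;
}

/* Start address rounded up, as in the version under review. */
static unsigned int visits_align_up(uint64_t start, uint64_t end)
{
	return visits(MODEL_ALIGN(start, MODEL_PMD_SIZE), end);
}

/* Start address rounded down, as proposed for the next version. */
static unsigned int visits_align_down(uint64_t start, uint64_t end)
{
	return visits(MODEL_ALIGN_DOWN(start, MODEL_PMD_SIZE), end);
}
```

For a 256 KiB chunk such as start = 0x80201000 and end = 0x80241000,
rounding up yields 0x80400000, which is already past end, so the loop
body never runs. Rounding down yields 0x80200000 and the huge page that
contains the chunk is visited once.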

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-06-24  9:41 UTC | newest]

Thread overview: 17+ messages
2026-06-24  8:00 [PATCH v3 RESEND 0/5] Implement Eager Page Splitting for RISC-V wang.yechao255
2026-06-24  8:00 ` wang.yechao255
2026-06-24  8:00 ` wang.yechao255
2026-06-24  8:05 ` [PATCH v3 RESEND 1/5] RISC-V: KVM: Add the split page cache for ioctl context wang.yechao255
2026-06-24  8:05   ` wang.yechao255
2026-06-24  8:07 ` [PATCH v3 RESEND 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled wang.yechao255
2026-06-24  8:07   ` wang.yechao255
2026-06-24  8:35   ` sashiko-bot
2026-06-24  8:09 ` [PATCH v3 RESEND 3/5] RISC-V: KVM: Remove redundant TLB flush operations wang.yechao255
2026-06-24  8:09   ` wang.yechao255
2026-06-24  8:11 ` [PATCH v3 RESEND 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG wang.yechao255
2026-06-24  8:11   ` wang.yechao255
2026-06-24  8:35   ` sashiko-bot
2026-06-24  9:41     ` wang.yechao255
2026-06-24  8:13 ` [PATCH v3 RESEND 5/5] RISC-V: KVM: Add the eager_page_split module parameter wang.yechao255
2026-06-24  8:13   ` wang.yechao255
2026-06-24  8:31   ` sashiko-bot
