[PATCH v4 0/5] Implement Eager Page Splitting for RISC-V

Kernel KVM virtualization development

* [PATCH v4 0/5] Implement Eager Page Splitting for RISC-V
From: Wang Yechao @ 2026-07-01 12:09 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao

Eager Page Splitting is implemented on x86 and ARM. It improves the
performance of dirty logging (used in live migrations) when guest memory
is backed by huge pages.

This series implements Eager Page Splitting for RISC-V. The implementation
is similar to x86 and ARM. It provides two ways to split huge pages in ioctl
context instead of on fault in vCPU context:

- Split huge pages when dirty logging is enabled and
  KVM_DIRTY_LOG_INITIALLY_SET is not set. This happens when the
  KVM_MEM_LOG_DIRTY_PAGES flag of a memslot is enabled, and splits the
  whole memslot into 4K mappings.

- Split huge pages during KVM_CLEAR_DIRTY_LOG when
  KVM_DIRTY_LOG_INITIALLY_SET is set. This happens when dirty logging is
  enabled in small chunks. It does not split the whole memslot, only the
  requested chunk range.
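
The following user-space model (not kernel code) shows which guest-frame range is split eagerly in each of the two modes. The names `split_event`, `gfn_range` and `eager_split_range()` are hypothetical. `initially_set` stands for KVM_DIRTY_LOG_INITIALLY_SET, and the cleared chunk is simplified to a contiguous gfn range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two trigger points; not kernel code. */
enum split_event {
	EV_ENABLE_DIRTY_LOG,	/* KVM_MEM_LOG_DIRTY_PAGES turned on for a memslot */
	EV_CLEAR_DIRTY_LOG,	/* KVM_CLEAR_DIRTY_LOG on one chunk of a memslot */
};

/* Half-open gfn range [start, end); start == end means nothing is split. */
struct gfn_range {
	uint64_t start;
	uint64_t end;
};

/*
 * initially_set models KVM_DIRTY_LOG_INITIALLY_SET. slot_* describe the
 * whole memslot; chunk_* describe the (simplified, contiguous) range
 * being cleared.
 */
struct gfn_range eager_split_range(enum split_event ev, bool initially_set,
				   uint64_t slot_base, uint64_t slot_npages,
				   uint64_t chunk_first, uint64_t chunk_npages)
{
	struct gfn_range r = { 0, 0 };

	if (ev == EV_ENABLE_DIRTY_LOG && !initially_set) {
		/* Mode 1: split the whole memslot up front. */
		r.start = slot_base;
		r.end = slot_base + slot_npages;
	} else if (ev == EV_CLEAR_DIRTY_LOG && initially_set) {
		/* Mode 2: split only the chunk being re-protected. */
		r.start = chunk_first;
		r.end = chunk_first + chunk_npages;
	}
	return r;
}
```

When dirty logging is enabled on a memslot without KVM_DIRTY_LOG_INITIALLY_SET, the model returns the whole memslot. When the flag is set, the same event returns an empty range, and splitting waits for later KVM_CLEAR_DIRTY_LOG calls.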

Changes in v4:
 - Rebase on v7.2-rc1 version.
 sashiko-bot AI review
 (https://sashiko.dev/#/patchset/20260624160054463wcDvJaMoydSggcNOWgcfB@zte.com.cn)
 - Add the rwlock_needbreak() check when topping up the split page cache.
 - ALIGN_DOWN the start addr to PMD_SIZE.

Changes in v3 resend:
 - Fix patch format to ensure emails reach the linux-riscv and kvm-riscv
   mailing lists.
 - Move freeing of pgd_split_page_cache into kvm_arch_destroy_vm().
 https://lore.kernel.org/kvm-riscv/20260624160054463wcDvJaMoydSggcNOWgcfB@zte.com.cn/

Changes in v3:
 - Rebase on v7.1 version.
 - Add patch03 to remove the redundant TLB flush operations.
 
 sashiko-bot AI review
 (https://sashiko.dev/#/message/20260603104847.9692C1F00893%40smtp.kernel.org)
 - Check kvm->arch.pgd before splitting huge pages.
 - Align the start address to PMD_SIZE before splitting.
 - Flush remote TLBs before dropping mmu_lock.

Changes in v2:
 - Rename the split_page_cache.
 - Rename the kvm_riscv_split_huge_pages and
   kvm_riscv_split_memory_region.
 - Add a lockdep_assert_held check before splitting huge pages.
 - Update Documentation/admin-guide/kernel-parameters.txt.
 - Link to v2
 https://lore.kernel.org/linux-riscv/20260603175256408L0jnqGs1cJGc0ijCdujci@zte.com.cn/

 - Link to v1:
 https://lore.kernel.org/linux-riscv/20260513153656847l3c4fI5hBsAyoIZi8aGIs@zte.com.cn/

Wang Yechao (5):
  RISC-V: KVM: Add the split page cache for ioctl context
  RISC-V: KVM: Split huge pages when dirty logging is enabled
  RISC-V: KVM: Remove redundant TLB flush operations
  RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  RISC-V: KVM: Add the eager_page_split module parameter

 .../admin-guide/kernel-parameters.txt         |  7 +-
 arch/riscv/include/asm/kvm_gstage.h           |  6 +-
 arch/riscv/include/asm/kvm_host.h             |  1 +
 arch/riscv/kvm/gstage.c                       | 23 +++--
 arch/riscv/kvm/mmu.c                          | 93 ++++++++++++++++++-
 arch/riscv/kvm/vm.c                           |  6 ++
 6 files changed, 119 insertions(+), 17 deletions(-)

-- 
2.43.5



* [PATCH v4 1/5] RISC-V: KVM: Add the split page cache for ioctl context
From: Wang Yechao @ 2026-07-01 12:09 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao

Add the split page cache for dirty logging enablement and the
KVM_CLEAR_DIRTY_LOG ioctl.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/include/asm/kvm_host.h | 1 +
 arch/riscv/kvm/mmu.c              | 1 +
 arch/riscv/kvm/vm.c               | 6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 60017ceec9d2a..69f73fd106a94 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -86,6 +86,7 @@ struct kvm_arch {
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
 	unsigned long pgd_levels;
+	struct kvm_mmu_memory_cache pgd_split_page_cache;
 
 	/* Guest Timer */
 	struct kvm_guest_timer timer;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 082f9b2617338..9cf69bc28b9c5 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -676,6 +676,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
 	kvm->arch.pgd = page_to_virt(pgd_page);
 	kvm->arch.pgd_phys = page_to_phys(pgd_page);
 	kvm->arch.pgd_levels = kvm_riscv_gstage_max_pgd_levels;
+	kvm->arch.pgd_split_page_cache.gfp_zero = __GFP_ZERO;
 
 	return 0;
 }
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index a9f083feeb767..be38f24a297d6 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -54,6 +54,12 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_destroy_vcpus(kvm);
 
 	kvm_riscv_aia_destroy_vm(kvm);
+
+	/*
+	 * Free the split page cache after all vCPUs and devices are destroyed.
+	 * At this point, there are no concurrent accesses to the cache.
+	 */
+	kvm_mmu_free_memory_cache(&kvm->arch.pgd_split_page_cache);
 }
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irql,
-- 
2.43.5


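mmu_lock is held for write while page tables are split, so page-table pages have to be allocated beforehand and only handed out under the lock. The following is a minimal user-space model of that top-up/consume pattern. It assumes a hypothetical fixed capacity and hypothetical names (`split_cache`, `split_cache_topup()`, `split_cache_alloc()`). calloc() stands in for the __GFP_ZERO flag set in kvm_riscv_mmu_alloc_pgd(). The real kernel allocator does not simply return NULL when the cache is empty; this model does, for simplicity.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes for this user-space model. */
#define MODEL_PAGE_SIZE		4096
#define MODEL_CACHE_CAPACITY	8

/* Simplified analogue of struct kvm_mmu_memory_cache. */
struct split_cache {
	int nobjs;
	void *objects[MODEL_CACHE_CAPACITY];
};

/*
 * Fill the cache to at least @min objects. This may allocate (and, in the
 * kernel, sleep), so it runs with mmu_lock dropped. Returns 0 or -1.
 */
int split_cache_topup(struct split_cache *mc, int min)
{
	if (min > MODEL_CACHE_CAPACITY)
		return -1;
	while (mc->nobjs < min) {
		/* calloc() plays the role of __GFP_ZERO: new tables start empty. */
		void *page = calloc(1, MODEL_PAGE_SIZE);

		if (!page)
			return -1;
		mc->objects[mc->nobjs++] = page;
	}
	return 0;
}

int split_cache_nr_free(const struct split_cache *mc)
{
	return mc->nobjs;
}

/* Hand out a preallocated page without allocating; NULL if empty. */
void *split_cache_alloc(struct split_cache *mc)
{
	return mc->nobjs ? mc->objects[--mc->nobjs] : NULL;
}

/* Release everything still cached, as kvm_arch_destroy_vm() does after this patch. */
void split_cache_free(struct split_cache *mc)
{
	while (mc->nobjs)
		free(mc->objects[--mc->nobjs]);
}
```

In the series, mmu_split_huge_pages() tops the cache up to `gstage->pgd_levels` objects before splitting. This gives at least one spare table for each level it may have to create while walking down to 4K.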

* [PATCH v4 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled
From: Wang Yechao @ 2026-07-01 12:09 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao

Split huge pages eagerly when enabling dirty logging. The goal is to
avoid doing it while faulting on write-protected pages, which
negatively impacts guest performance.

The benefits of eager page splitting are the same as in x86 and arm64,
added with commit a3fe5dbda0a4 ("KVM: x86/mmu: Split huge pages mapped
by the TDP MMU when dirty logging is enabled") and commit e7bf7a490c68
("KVM: arm64: Split huge pages when dirty logging is enabled").

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/include/asm/kvm_gstage.h |  6 +--
 arch/riscv/kvm/gstage.c             | 23 ++++++---
 arch/riscv/kvm/mmu.c                | 76 +++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 21e2019df0cf5..f726279780177 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -64,9 +64,9 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
 			      bool page_rdonly, bool page_exec,
 			      struct kvm_gstage_mapping *out_map);
 
-int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
-				struct kvm_mmu_memory_cache *pcache,
-				gpa_t addr, u32 target_level, bool flush);
+bool kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
+				 struct kvm_mmu_memory_cache *pcache,
+				 gpa_t addr, u32 target_level, bool flush);
 
 enum kvm_riscv_gstage_op {
 	GSTAGE_OP_NOP = 0,	/* Nothing */
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index c4c3b79567f10..4815233f9788d 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -303,19 +303,20 @@ static inline unsigned long make_child_pte(unsigned long huge_pte, int index,
 	return child_pte;
 }
 
-int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
-				struct kvm_mmu_memory_cache *pcache,
-				gpa_t addr, u32 target_level, bool flush)
+bool kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
+				 struct kvm_mmu_memory_cache *pcache,
+				 gpa_t addr, u32 target_level, bool flush)
 {
 	u32 current_level = gstage->pgd_levels - 1;
 	pte_t *next_ptep = (pte_t *)gstage->pgd;
 	unsigned long huge_pte, child_pte;
 	unsigned long child_page_size;
+	bool need_flush = false;
 	pte_t *ptep;
 	int i, ret;
 
 	if (!pcache)
-		return -ENOMEM;
+		return false;
 
 	while(current_level > target_level) {
 		ptep = (pte_t *)&next_ptep[gstage_pte_index(gstage, addr, current_level)];
@@ -333,27 +334,35 @@ int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
 
 		ret = gstage_level_to_page_size(gstage, current_level - 1, &child_page_size);
 		if (ret)
-			return ret;
+			return need_flush;
 
 		next_ptep = kvm_mmu_memory_cache_alloc(pcache);
 		if (!next_ptep)
-			return -ENOMEM;
+			return need_flush;
 
 		for (i = 0; i < PTRS_PER_PTE; i++) {
 			child_pte = make_child_pte(huge_pte, i, child_page_size);
 			set_pte((pte_t *)&next_ptep[i], __pte(child_pte));
 		}
 
+		/*
+		 * Ensure the writes to the child PTEs are visible before
+		 * linking the new page table to the parent PTE.
+		 */
+		smp_wmb();
+
 		set_pte(ptep, pfn_pte(PFN_DOWN(__pa(next_ptep)),
 				__pgprot(_PAGE_TABLE)));
 
 		if (flush)
 			gstage_tlb_flush(gstage, current_level, addr);
+		else
+			need_flush = true;
 
 		current_level--;
 	}
 
-	return 0;
+	return need_flush;
 }
 
 bool kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 9cf69bc28b9c5..363238efaedb4 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -97,6 +97,62 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
 					    size >> PAGE_SHIFT);
 }
 
+static bool need_topup_split_caches_or_resched(struct kvm *kvm, int count)
+{
+	struct kvm_mmu_memory_cache *cache;
+
+	if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
+		return true;
+
+	cache = &kvm->arch.pgd_split_page_cache;
+	return kvm_mmu_memory_cache_nr_free_objects(cache) < count;
+}
+
+static bool mmu_split_huge_pages(struct kvm_gstage *gstage,
+				 phys_addr_t start, phys_addr_t end)
+{
+	struct kvm *kvm = gstage->kvm;
+	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
+	phys_addr_t addr = ALIGN_DOWN(start, PMD_SIZE);
+	phys_addr_t last_flush_gfn = addr >> PAGE_SHIFT;
+	int count = gstage->pgd_levels;
+	bool flush = false;
+	int ret;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	while (addr < end) {
+		if (need_topup_split_caches_or_resched(kvm, count)) {
+			if (flush) {
+				kvm_flush_remote_tlbs_range(kvm, last_flush_gfn,
+					  (addr >> PAGE_SHIFT) - last_flush_gfn);
+				last_flush_gfn = addr >> PAGE_SHIFT;
+				flush = false;
+			}
+
+			write_unlock(&kvm->mmu_lock);
+			cond_resched();
+
+			ret = kvm_mmu_topup_memory_cache(pcache, count);
+			if (ret) {
+				kvm_err("Failed to top up split page cache\n");
+				write_lock(&kvm->mmu_lock);
+				return flush;
+			}
+			write_lock(&kvm->mmu_lock);
+		}
+
+		if (!kvm->arch.pgd)
+			return flush;
+
+		flush |= kvm_riscv_gstage_split_huge(gstage, pcache, addr, 0, false);
+
+		addr += PMD_SIZE;
+	}
+
+	return flush;
+}
+
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 					     struct kvm_memory_slot *slot,
 					     gfn_t gfn_offset,
@@ -151,6 +207,25 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 					    size >> PAGE_SHIFT);
 }
 
+static void mmu_split_memory_region(struct kvm *kvm, int slot)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	struct kvm_gstage gstage;
+	bool flush;
+
+	kvm_riscv_gstage_init(&gstage, kvm);
+
+	write_lock(&kvm->mmu_lock);
+	flush = mmu_split_huge_pages(&gstage, start, end);
+	write_unlock(&kvm->mmu_lock);
+
+	if (flush)
+		kvm_flush_remote_tlbs_memslot(kvm, memslot);
+}
+
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
@@ -164,6 +239,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
+		mmu_split_memory_region(kvm, new->id);
 	}
 }
 
-- 
2.43.5

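Each step of kvm_riscv_gstage_split_huge() replaces one leaf PTE with a pointer to a freshly filled table of PTRS_PER_PTE children. The sketch below models that transformation on raw 64-bit RISC-V PTE values. It assumes that make_child_pte() (whose body is not shown in the diff) keeps the huge PTE's flag bits and advances the PFN by `index * child_page_size`. The helper names and MODEL_* constants are hypothetical; the bit positions follow the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>

/* PTE bits from the RISC-V privileged specification. */
#define PTE_V		(1ULL << 0)
#define PTE_R		(1ULL << 1)
#define PTE_W		(1ULL << 2)
#define PTE_X		(1ULL << 3)
#define PTE_A		(1ULL << 6)
#define PTE_D		(1ULL << 7)
#define PTE_PPN_SHIFT	10
#define PTE_PPN_MASK	(((1ULL << 44) - 1) << PTE_PPN_SHIFT)	/* bits 10..53 */

/* Hypothetical model constants: 4 KiB base pages, 512 entries per table. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PTRS_PER_PTE	512

uint64_t pte_pfn(uint64_t pte)
{
	return (pte & PTE_PPN_MASK) >> PTE_PPN_SHIFT;
}

/* Assumed behaviour of make_child_pte(): same flags, PFN advanced by index. */
uint64_t model_child_pte(uint64_t huge_pte, int index, uint64_t child_page_size)
{
	uint64_t pfn = pte_pfn(huge_pte) +
		       (uint64_t)index * (child_page_size >> MODEL_PAGE_SHIFT);

	return (huge_pte & ~PTE_PPN_MASK) | ((pfn << PTE_PPN_SHIFT) & PTE_PPN_MASK);
}

/*
 * Fill @table with the children of @huge_pte and return the non-leaf PTE
 * that replaces the huge leaf: V set, R/W/X clear, pointing at @table_pfn.
 */
uint64_t model_split_leaf(uint64_t huge_pte, uint64_t child_page_size,
			  uint64_t table[MODEL_PTRS_PER_PTE], uint64_t table_pfn)
{
	for (int i = 0; i < MODEL_PTRS_PER_PTE; i++)
		table[i] = model_child_pte(huge_pte, i, child_page_size);

	/*
	 * The patch places smp_wmb() at this point so the child entries are
	 * visible before the parent entry is switched to point at them.
	 */
	return ((table_pfn << PTE_PPN_SHIFT) & PTE_PPN_MASK) | PTE_V;
}
```

When a 2 MiB leaf is split to 4 KiB, child 511 maps PFN base + 511 with the same R/W/A/D bits. The function now returns whether a flush is still owed instead of an error code. Callers such as mmu_split_huge_pages() can therefore pass flush=false and batch the TLB invalidation over a larger range.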

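The loop in mmu_split_huge_pages() walks the range one PMD_SIZE block at a time and records that a flush is owed. It only issues a ranged flush for [last_flush_gfn, addr) immediately before dropping mmu_lock. The user-space model below reproduces that bookkeeping. It uses hypothetical names, and a `break_every` knob stands in for need_resched(), rwlock_needbreak() or an empty cache. The model also shows why v4 rounds the start address down with ALIGN_DOWN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of mmu_split_huge_pages(); not kernel code. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PMD_SIZE		(1ULL << 21)	/* 2 MiB blocks */
#define MODEL_ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

struct walk_stats {
	int pmds_visited;		/* blocks handed to the split helper */
	int flushes;			/* ranged flushes issued before a lock drop */
	uint64_t flushed_pages;		/* pages covered by those flushes */
};

/*
 * Every @break_every iterations (0 = never), the model pretends that a
 * reschedule, lock contention or an empty cache forces mmu_lock to be
 * dropped. Returns true if splits are still waiting for a TLB flush.
 */
bool model_split_walk(uint64_t start, uint64_t end, bool align,
		      int break_every, struct walk_stats *st)
{
	uint64_t addr = align ? MODEL_ALIGN_DOWN(start, MODEL_PMD_SIZE) : start;
	uint64_t last_flush_gfn = addr >> MODEL_PAGE_SHIFT;
	bool flush = false;

	for (int iter = 0; addr < end; iter++) {
		if (break_every > 0 && iter > 0 && iter % break_every == 0) {
			if (flush) {
				/* kvm_flush_remote_tlbs_range(last_flush_gfn, ...) */
				st->flushes++;
				st->flushed_pages += (addr >> MODEL_PAGE_SHIFT) - last_flush_gfn;
				last_flush_gfn = addr >> MODEL_PAGE_SHIFT;
				flush = false;
			}
			/* write_unlock(); cond_resched(); top up; write_lock(); */
		}
		/* Assume every block held a huge mapping that got split. */
		flush = true;
		st->pmds_visited++;
		addr += MODEL_PMD_SIZE;
	}
	return flush;
}
```

Take start = 0x1ff000 and end = 0x201000. The unaligned walk visits only the block at 0x1ff000, then jumps to 0x3ff000 and misses the huge page that starts at 0x200000. The aligned walk visits both blocks. The pending flush returned at the end is what mmu_split_memory_region() turns into a single kvm_flush_remote_tlbs_memslot() call.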

* [PATCH v4 3/5] RISC-V: KVM: Remove redundant TLB flush operations
From: Wang Yechao @ 2026-07-01 12:09 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao

The function kvm_arch_mmu_enable_log_dirty_pt_masked() is invoked from
two distinct call paths:

kvm_clear_dirty_log_protect()
  kvm_arch_mmu_enable_log_dirty_pt_masked()

kvm_vm_ioctl_reset_dirty_pages()
  kvm_dirty_ring_reset()
    kvm_reset_dirty_gfn()
        kvm_arch_mmu_enable_log_dirty_pt_masked()

In both scenarios, the caller already performs a remote TLB flush after
dirty logging is enabled, so the TLB flush inside
kvm_arch_mmu_enable_log_dirty_pt_masked() is unnecessary. Remove it.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/kvm/mmu.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 363238efaedb4..056c0abe278af 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -162,14 +162,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 	struct kvm_gstage gstage;
-	bool flush;
 
 	kvm_riscv_gstage_init(&gstage, kvm);
 
-	flush = kvm_riscv_gstage_wp_range(&gstage, start, end);
-	if (flush)
-		kvm_flush_remote_tlbs_range(kvm, start >> PAGE_SHIFT,
-					    (end - start) >> PAGE_SHIFT);
+	kvm_riscv_gstage_wp_range(&gstage, start, end);
 }
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
-- 
2.43.5


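This patch argues that the flush belongs to the caller, which already knows whether anything was write-protected. The following toy model counts remote flushes for one KVM_CLEAR_DIRTY_LOG call, before and after the change. It assumes that every non-zero mask write-protects at least one writable page. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts calls that would be kvm_flush_remote_tlbs*() in the kernel. */
struct flush_counter {
	int flushes;
};

/*
 * Hypothetical callee shaped like kvm_arch_mmu_enable_log_dirty_pt_masked().
 * Before patch 3 it flushed on its own; after patch 3 it does not.
 */
static void model_enable_log_dirty_masked(struct flush_counter *fc,
					  bool callee_flushes)
{
	if (callee_flushes)
		fc->flushes++;
}

/*
 * Hypothetical caller shaped like kvm_clear_dirty_log_protect(): remember
 * that some mask was non-zero and flush once at the end.
 */
int model_clear_dirty_log(const unsigned long *masks, int n, bool callee_flushes)
{
	struct flush_counter fc = { 0 };
	bool flush = false;

	for (int i = 0; i < n; i++) {
		if (!masks[i])
			continue;
		flush = true;
		model_enable_log_dirty_masked(&fc, callee_flushes);
	}
	if (flush)
		fc.flushes++;	/* the caller's single flush */
	return fc.flushes;
}
```

With three non-zero masks, the pre-patch shape issues four flushes and the post-patch shape issues one. The trade-off is that stale writable TLB entries can survive until the caller's flush. This is the window the review of patch 4 asks about when mmu_lock is dropped in between.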

* [PATCH v4 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
From: Wang Yechao @ 2026-07-01 12:09 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao

Split huge pages in the range specified by KVM_CLEAR_DIRTY_LOG, and do
not split when enabling dirty logging if KVM_DIRTY_LOG_INITIALLY_SET is
set.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/kvm/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 056c0abe278af..29a7ac023d4db 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -166,6 +166,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_riscv_gstage_init(&gstage, kvm);
 
 	kvm_riscv_gstage_wp_range(&gstage, start, end);
+
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+		mmu_split_huge_pages(&gstage, start, end);
 }
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
-- 
2.43.5


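With KVM_DIRTY_LOG_INITIALLY_SET, the split range comes from the same 64-bit mask as the write-protect range. It runs from the first set bit to the last set bit, including any clear bits in between. The user-space sketch below shows that computation. The GCC/Clang builtins __builtin_ctzll() and __builtin_clzll() replace the kernel's __ffs() and __fls(), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

/*
 * Hypothetical helper: the guest-physical byte range [*start, *end) that
 * kvm_arch_mmu_enable_log_dirty_pt_masked() derives from a 64-page dirty
 * mask. __builtin_ctzll()/__builtin_clzll() stand in for __ffs()/__fls();
 * both are undefined for mask == 0, so only pass non-zero masks.
 */
void mask_to_gpa_range(uint64_t base_gfn, uint64_t mask,
		       uint64_t *start, uint64_t *end)
{
	unsigned int first = (unsigned int)__builtin_ctzll(mask);
	unsigned int last = 63u - (unsigned int)__builtin_clzll(mask);

	*start = (base_gfn + first) << MODEL_PAGE_SHIFT;
	*end = (base_gfn + last + 1) << MODEL_PAGE_SHIFT;
}
```

A mask of 0x8000000000000001 at base gfn 0 yields [0x0, 0x40000). That covers all 64 pages, even though only two bits are set. mmu_split_huge_pages() then rounds the start down to PMD_SIZE before walking the range.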

* Re: [PATCH v4 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
From: sashiko-bot @ 2026-07-01 12:32 UTC
  To: Wang Yechao; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Dropping `mmu_lock` after write-protecting page tables but before flushing the TLB leads to lost dirty pages.

Pre-existing issues:
- [Critical] `kvm_riscv_gstage_split_huge` overwrites leaf PTEs non-atomically, clobbering concurrent hardware A/D bit updates.
--

commit d0195abbe59e08ad1f5c62c2d279ff75e0b3a3f5
Author: Wang Yechao <wang.yechao255@zte.com.cn>
RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG

This commit splits huge pages on the range specified using the clear dirty log
ioctl. It also avoids splitting when enabling dirty logging if the initially
set flag is present.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 056c0abe278af..29a7ac023d4db 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -166,6 +166,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_riscv_gstage_init(&gstage, kvm);
>  
>  	kvm_riscv_gstage_wp_range(&gstage, start, end);
> +
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> +		mmu_split_huge_pages(&gstage, start, end);

[Severity: High]
Does ignoring the boolean return value from kvm_riscv_gstage_wp_range() create
a vulnerability window?

If mmu_split_huge_pages() drops the mmu_lock and yields before the TLB is
flushed:

arch/riscv/kvm/mmu.c:mmu_split_huge_pages() {
    ...
		if (need_topup_split_caches_or_resched(kvm, count)) {
			...
			write_unlock(&kvm->mmu_lock);
			cond_resched();
			...
}

vCPUs could continue writing to these pages using stale writable TLB entries.

Since the software dirty bitmap was already cleared by the clear dirty log
ioctl prior to this call, could these writes be completely lost during
migration?

[Severity: Critical]
This isn't a bug introduced by this patch, but looking at
mmu_split_huge_pages() led me to kvm_riscv_gstage_split_huge() which
overwrites leaf PTEs non-atomically:

arch/riscv/kvm/gstage.c:kvm_riscv_gstage_split_huge() {
    ...
		huge_pte = pte_val(ptep_get(ptep));
        ...
		set_pte(ptep, pfn_pte(PFN_DOWN(__pa(next_ptep)),
				__pgprot(_PAGE_TABLE)));
    ...
}

Does this set_pte() clobber concurrent hardware accessed/dirty bit updates?

If the hardware page table walker sets the dirty bit on the huge leaf PTE
between the initial read of huge_pte and this unconditional set_pte()
overwrite, those hardware updates appear to be lost.

Should this use an atomic operation (like a cmpxchg loop) when modifying leaf
PTEs shared with hardware to avoid missing dirty pages?

>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701120954.129873-1-wang.yechao255@zte.com.cn?part=4

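The review above suggests a compare-and-exchange loop instead of a plain read followed by set_pte(). The user-space sketch below illustrates the general lost-update pattern with C11 atomics. A hypothetical `race_hook` simulates a concurrent A/D-bit update between the read and the store. This is not the kernel's API. Whether the race can occur in practice depends on the hardware updating A/D bits in G-stage PTEs, and on the leaf still being writable when it is split. The sketch only shows the pattern the review names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_V	(1ULL << 0)
#define PTE_R	(1ULL << 1)
#define PTE_W	(1ULL << 2)
#define PTE_D	(1ULL << 7)

/* Hypothetical hook run between reading the old PTE and publishing the new one. */
typedef void (*race_hook)(_Atomic uint64_t *ptep);

/* Simulated hardware walker setting the dirty bit. */
void hw_sets_dirty(_Atomic uint64_t *ptep)
{
	atomic_fetch_or(ptep, PTE_D);
}

/*
 * Plain read, then an unconditional store (the pattern the review flags).
 * Returns the leaf value the children would be built from.
 */
uint64_t split_leaf_plain(_Atomic uint64_t *ptep, uint64_t table_pte, race_hook hook)
{
	uint64_t old = atomic_load(ptep);

	if (hook)
		hook(ptep);
	atomic_store(ptep, table_pte);	/* a D bit set by hook() is lost */
	return old;
}

/*
 * Compare-and-exchange loop: the store succeeds only if the PTE still
 * holds the value that was read. On failure, `old` is refreshed with the
 * current contents, so the children can be rebuilt before retrying.
 */
uint64_t split_leaf_cmpxchg(_Atomic uint64_t *ptep, uint64_t table_pte, race_hook hook)
{
	uint64_t old = atomic_load(ptep);

	if (hook)
		hook(ptep);
	while (!atomic_compare_exchange_strong(ptep, &old, table_pte))
		;	/* rebuild the children from the refreshed `old` here */
	return old;
}
```

In the cmpxchg variant, the returned value carries the D bit, so the dirty state can be propagated to the children. Kernel code would use its own atomic PTE primitives rather than C11 atomics.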

* [PATCH v4 5/5] RISC-V: KVM: Add the eager_page_split module parameter
From: Wang Yechao @ 2026-07-01 12:09 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao, linux-doc

Add an eager_page_split module parameter for RISC-V KVM, following
the same approach as on x86. This parameter controls whether eager
page splitting is enabled. The default value is on.

When eager page splitting is enabled, KVM proactively splits large
pages (huge pages) into smaller pages when needed for dirty logging
or other operations. Disabling it can be beneficial for VM workloads
that rarely perform writes, or that only write to a small region of
memory, as it allows huge pages to remain intact for read accesses.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +++++--
 arch/riscv/kvm/mmu.c                            | 13 ++++++++++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a68003c3599cc..b4c68a896fa79 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3047,7 +3047,7 @@ Kernel parameters
 			Default is 0 (don't ignore, but inject #GP)
 
 	kvm.eager_page_split=
-			[KVM,X86] Controls whether or not KVM will try to
+			[KVM,X86,RISCV] Controls whether or not KVM will try to
 			proactively split all huge pages during dirty logging.
 			Eager page splitting reduces interruptions to vCPU
 			execution by eliminating the write-protection faults
@@ -3067,7 +3067,10 @@ Kernel parameters
 			the KVM_CLEAR_DIRTY ioctl, and only for the pages being
 			cleared.
 
-			Eager page splitting is only supported when kvm.tdp_mmu=Y.
+			On x86, eager page splitting is only supported when
+			kvm.tdp_mmu=Y.
+
+			On RISCV, eager page splitting is supported by default.
 
 			Default is Y (on).
 
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 29a7ac023d4db..a534f596c8ca0 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -16,6 +16,9 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_nacl.h>
 
+static bool __read_mostly eager_page_split = true;
+module_param(eager_page_split, bool, 0644);
+
 static void mmu_wp_memory_region(struct kvm *kvm, int slot)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
@@ -167,8 +170,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 
 	kvm_riscv_gstage_wp_range(&gstage, start, end);
 
-	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
-		mmu_split_huge_pages(&gstage, start, end);
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
+		if (READ_ONCE(eager_page_split))
+			mmu_split_huge_pages(&gstage, start, end);
+	}
 }
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
@@ -238,7 +243,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
-		mmu_split_memory_region(kvm, new->id);
+
+		if (READ_ONCE(eager_page_split))
+			mmu_split_memory_region(kvm, new->id);
 	}
 }
 
-- 
2.43.5


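The parameter is registered with mode 0644, so it can be changed through sysfs while VMs are running. That is presumably why both call sites sample it with READ_ONCE() at the point of use. The following small model shows the two gates after this patch. A C11 relaxed atomic load stands in for READ_ONCE(), and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the 0644 eager_page_split parameter (default on). */
static atomic_bool model_eager_page_split = true;

/* Bit flags describing what a call path does to the G-stage mappings. */
enum model_action {
	ACT_NONE  = 0,
	ACT_WP    = 1 << 0,	/* write-protect the range */
	ACT_SPLIT = 1 << 1,	/* eagerly split huge pages in the range */
};

static bool model_param_enabled(void)
{
	/* Relaxed load: the analogue of READ_ONCE(eager_page_split). */
	return atomic_load_explicit(&model_eager_page_split, memory_order_relaxed);
}

/* Shaped like kvm_arch_commit_memory_region() when dirty logging is turned on. */
int model_on_enable_dirty_log(bool initially_set)
{
	if (initially_set)
		return ACT_NONE;	/* deferred to KVM_CLEAR_DIRTY_LOG */
	return ACT_WP | (model_param_enabled() ? ACT_SPLIT : 0);
}

/* Shaped like kvm_arch_mmu_enable_log_dirty_pt_masked() for one chunk. */
int model_on_clear_dirty_log(bool initially_set)
{
	int act = ACT_WP;

	if (initially_set && model_param_enabled())
		act |= ACT_SPLIT;
	return act;
}

/* Analogue of an administrator changing the parameter at runtime. */
void model_set_eager_page_split(bool on)
{
	atomic_store_explicit(&model_eager_page_split, on, memory_order_relaxed);
}
```

Turning the parameter off removes only the split. Write protection still happens, so dirty logging keeps working, and huge pages are split lazily on write faults instead.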

* Re: [PATCH v4 5/5] RISC-V: KVM: Add the eager_page_split module parameter
From: sashiko-bot @ 2026-07-01 12:24 UTC
  To: Wang Yechao; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Lost dirty tracking bits when yielding `mmu_lock` during eager page splitting.
--

commit c10dcf7c74a70f5a554a9a6c72e02322a9c48b68
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Add the eager_page_split module parameter

This patch adds an eager_page_split module parameter for RISC-V KVM, following
the same approach as on x86. It controls whether eager page splitting is
enabled during dirty logging. Disabling it can benefit VM workloads that
rarely perform writes by keeping huge pages intact.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 29a7ac023d4db..a534f596c8ca0 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c

[ ... ]

> @@ -167,8 +170,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  
>  	kvm_riscv_gstage_wp_range(&gstage, start, end);
>  
> -	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> -		mmu_split_huge_pages(&gstage, start, end);
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> +		if (READ_ONCE(eager_page_split))
> +			mmu_split_huge_pages(&gstage, start, end);
> +	}
>  }

[Severity: High]
This is a pre-existing issue, but since we're modifying this path, could this
lead to lost dirty tracking bits?

In kvm_arch_mmu_enable_log_dirty_pt_masked(), kvm_riscv_gstage_wp_range()
write-protects the page tables and returns a boolean indicating if a TLB
flush is needed, but this return value is currently ignored.

When mmu_split_huge_pages() is called immediately after, it can drop the
mmu_lock and yield:

arch/riscv/kvm/mmu.c:mmu_split_huge_pages() {
    ...
    if (need_topup_split_caches_or_resched(kvm, count)) {
        if (flush) {
            ...
        }

        write_unlock(&kvm->mmu_lock);
        cond_resched();
    ...
}

If the lock is dropped without flushing the TLBs for the newly
write-protected pages, couldn't guest vCPUs continue executing and write
to these pages without taking a page fault?

This seems like it would allow the hardware to use stale writable TLB entries
while KVM has already cleared its software dirty bitmap, potentially leading
to silent memory corruption during live migration.

Should we ensure the TLBs are flushed for the write-protected range before
yielding the mmu_lock?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701120954.129873-1-wang.yechao255@zte.com.cn?part=5

