* [PATCH v3 0/5] Implement Eager Page Splitting for RISC-V
@ 2026-06-23 12:13 Wang Yechao
  2026-06-23 12:13 ` [PATCH v3 1/5] RISC-V: KVM: Add the split page cache for ioctl context Wang Yechao
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Wang Yechao @ 2026-06-23 12:13 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao


Eager Page Splitting is implemented on x86 and ARM. It improves the
performance of dirty logging (used in live migrations) when guest memory
is backed by huge pages.

This series implements Eager Page Splitting for RISC-V. The implementation
is similar to x86 and ARM. It provides two ways to split huge pages in ioctl
context instead of on fault in vCPU context:

- Split huge pages when dirty logging is enabled and
  KVM_DIRTY_LOG_INITIALLY_SET is not set. This happens when enabling the
  KVM_MEM_LOG_DIRTY_PAGES flag of a memslot, and splits the whole memslot
  into 4K mappings.

- Split huge pages during KVM_CLEAR_DIRTY_LOG when
  KVM_DIRTY_LOG_INITIALLY_SET is set. This happens when enabling dirty
  logging in small chunks. It does not split the whole memslot, but only the
  requested chunk range.
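
The two trigger points above can be summarised as a small decision table.
The following is a minimal userspace sketch, not kernel code: the enum and
function names are hypothetical, and the boolean stands in for whether
KVM_DIRTY_LOG_INITIALLY_SET is in effect for the VM.

```c
#include <stdbool.h>

/* Hypothetical names; they model the two trigger points, not the KVM API. */
enum split_scope {
	SPLIT_NONE,		/* leave huge pages intact at this point */
	SPLIT_WHOLE_MEMSLOT,	/* split every huge page in the memslot */
	SPLIT_CLEARED_RANGE,	/* split only the range being cleared */
};

/* Models enabling KVM_MEM_LOG_DIRTY_PAGES on a memslot. */
enum split_scope split_on_enable(bool initially_set)
{
	return initially_set ? SPLIT_NONE : SPLIT_WHOLE_MEMSLOT;
}

/* Models a KVM_CLEAR_DIRTY_LOG call for one chunk. */
enum split_scope split_on_clear(bool initially_set)
{
	return initially_set ? SPLIT_CLEARED_RANGE : SPLIT_NONE;
}
```

With the flag set, the splitting cost is spread over the individual
KVM_CLEAR_DIRTY_LOG calls instead of being paid once when logging is enabled.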

Changes in v3:
 - Rebase on v7.1.
 - Add patch03 to remove the redundant TLB flush operations.
 
 sashiko-bot AI review
 (https://sashiko.dev/#/message/20260603104847.9692C1F00893%40smtp.kernel.org)
 - Check kvm->arch.pgd before splitting huge pages.
 - Align the start address to PMD_SIZE before splitting.
 - Flush remote TLBs before dropping mmu_lock.
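
The alignment item can be checked with plain arithmetic. Patch 2 uses
ALIGN(start, PMD_SIZE), which rounds up. The sketch below is a userspace
illustration only: the helper names are hypothetical and HUGE_SIZE assumes a
2 MiB PMD-level mapping. It shows where a walk that steps by the huge page
size would begin for an unaligned start, compared with rounding down.

```c
#include <stdint.h>

#define HUGE_SIZE (UINT64_C(1) << 21)	/* assumed 2 MiB huge page size */

/* Round up to the next boundary, like ALIGN() for power-of-two sizes. */
uint64_t align_up(uint64_t addr, uint64_t size)
{
	return (addr + size - 1) & ~(size - 1);
}

/* Round down to the previous boundary, like ALIGN_DOWN(). */
uint64_t align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

/* Iterations of "for (addr = first; addr < end; addr += size)". */
uint64_t split_steps(uint64_t first, uint64_t end, uint64_t size)
{
	return first >= end ? 0 : (end - first + size - 1) / size;
}
```

For a start of 0x201000 and an end of 0x600000, rounding up begins the walk
at 0x400000 and visits one huge page, while rounding down begins at 0x200000
and visits two.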

Changes in v2:
 - Rename the split_page_cache.
 - Rename the kvm_riscv_split_huge_pages and
   kvm_riscv_split_memory_region.
 - Add lockdep_assert_held check before split huge pages.
 - Update Documentation/admin-guide/kernel-parameters.txt.
 - Link to v2:
 https://lore.kernel.org/linux-riscv/20260603175256408L0jnqGs1cJGc0ijCdujci@zte.com.cn/

 - Link to v1:
 https://lore.kernel.org/linux-riscv/20260513153656847l3c4fI5hBsAyoIZi8aGIs@zte.com.cn/

Wang Yechao (5):
  RISC-V: KVM: Add the split page cache for ioctl context
  RISC-V: KVM: Split huge pages when dirty logging is enabled
  RISC-V: KVM: Remove redundant TLB flush operations
  RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  RISC-V: KVM: Add the eager_page_split module parameter

 .../admin-guide/kernel-parameters.txt         |  7 +-
 arch/riscv/include/asm/kvm_gstage.h           |  6 +-
 arch/riscv/include/asm/kvm_host.h             |  1 +
 arch/riscv/kvm/gstage.c                       | 15 +--
 arch/riscv/kvm/mmu.c                          | 93 ++++++++++++++++++-
 5 files changed, 106 insertions(+), 16 deletions(-)

-- 
2.43.5


* [PATCH v3 1/5] RISC-V: KVM: Add the split page cache for ioctl context
  2026-06-23 12:13 [PATCH v3 0/5] Implement Eager Page Splitting for RISC-V Wang Yechao
@ 2026-06-23 12:13 ` Wang Yechao
  2026-06-23 12:34   ` sashiko-bot
  2026-06-23 12:13 ` [PATCH v3 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled Wang Yechao
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Wang Yechao @ 2026-06-23 12:13 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao


Add the split page cache for dirty logging enablement and the
KVM_CLEAR_DIRTY_LOG ioctl.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/include/asm/kvm_host.h | 1 +
 arch/riscv/kvm/mmu.c              | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 60017ceec9d2a..69f73fd106a94 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -86,6 +86,7 @@ struct kvm_arch {
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
 	unsigned long pgd_levels;
+	struct kvm_mmu_memory_cache pgd_split_page_cache;
 
 	/* Guest Timer */
 	struct kvm_guest_timer timer;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 082f9b2617338..d69fba7d8e9a6 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -676,6 +676,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
 	kvm->arch.pgd = page_to_virt(pgd_page);
 	kvm->arch.pgd_phys = page_to_phys(pgd_page);
 	kvm->arch.pgd_levels = kvm_riscv_gstage_max_pgd_levels;
+	kvm->arch.pgd_split_page_cache.gfp_zero = __GFP_ZERO;
 
 	return 0;
 }
@@ -703,6 +704,8 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
 
 	if (pgd)
 		free_pages((unsigned long)pgd, get_order(kvm_riscv_gstage_pgd_size));
+
+	kvm_mmu_free_memory_cache(&kvm->arch.pgd_split_page_cache);
 }
 
 void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
-- 
2.43.5


* [PATCH v3 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled
  2026-06-23 12:13 [PATCH v3 0/5] Implement Eager Page Splitting for RISC-V Wang Yechao
  2026-06-23 12:13 ` [PATCH v3 1/5] RISC-V: KVM: Add the split page cache for ioctl context Wang Yechao
@ 2026-06-23 12:13 ` Wang Yechao
  2026-06-23 12:33   ` sashiko-bot
  2026-06-23 12:13 ` [PATCH v3 3/5] RISC-V: KVM: Remove redundant TLB flush operations Wang Yechao
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Wang Yechao @ 2026-06-23 12:13 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao


Split huge pages eagerly when enabling dirty logging. The goal is to
avoid doing it while faulting on write-protected pages, which
negatively impacts guest performance.

The benefits of eager page splitting are the same as on x86 and arm64,
where it was added by commit a3fe5dbda0a4 ("KVM: x86/mmu: Split huge pages
mapped by the TDP MMU when dirty logging is enabled") and commit
e7bf7a490c68 ("KVM: arm64: Split huge pages when dirty logging is enabled").

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/include/asm/kvm_gstage.h |  6 +--
 arch/riscv/kvm/gstage.c             | 15 +++---
 arch/riscv/kvm/mmu.c                | 74 +++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_gstage.h b/arch/riscv/include/asm/kvm_gstage.h
index 21e2019df0cf5..f726279780177 100644
--- a/arch/riscv/include/asm/kvm_gstage.h
+++ b/arch/riscv/include/asm/kvm_gstage.h
@@ -64,9 +64,9 @@ int kvm_riscv_gstage_map_page(struct kvm_gstage *gstage,
 			      bool page_rdonly, bool page_exec,
 			      struct kvm_gstage_mapping *out_map);
 
-int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
-				struct kvm_mmu_memory_cache *pcache,
-				gpa_t addr, u32 target_level, bool flush);
+bool kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
+				 struct kvm_mmu_memory_cache *pcache,
+				 gpa_t addr, u32 target_level, bool flush);
 
 enum kvm_riscv_gstage_op {
 	GSTAGE_OP_NOP = 0,	/* Nothing */
diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
index c4c3b79567f10..1e980b4a08dc5 100644
--- a/arch/riscv/kvm/gstage.c
+++ b/arch/riscv/kvm/gstage.c
@@ -303,19 +303,20 @@ static inline unsigned long make_child_pte(unsigned long huge_pte, int index,
 	return child_pte;
 }
 
-int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
-				struct kvm_mmu_memory_cache *pcache,
-				gpa_t addr, u32 target_level, bool flush)
+bool kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
+				 struct kvm_mmu_memory_cache *pcache,
+				 gpa_t addr, u32 target_level, bool flush)
 {
 	u32 current_level = gstage->pgd_levels - 1;
 	pte_t *next_ptep = (pte_t *)gstage->pgd;
 	unsigned long huge_pte, child_pte;
 	unsigned long child_page_size;
+	bool need_flush = false;
 	pte_t *ptep;
 	int i, ret;
 
 	if (!pcache)
-		return -ENOMEM;
+		return false;
 
 	while(current_level > target_level) {
 		ptep = (pte_t *)&next_ptep[gstage_pte_index(gstage, addr, current_level)];
@@ -337,7 +338,7 @@ int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
 
 		next_ptep = kvm_mmu_memory_cache_alloc(pcache);
 		if (!next_ptep)
-			return -ENOMEM;
+			return need_flush;
 
 		for (i = 0; i < PTRS_PER_PTE; i++) {
 			child_pte = make_child_pte(huge_pte, i, child_page_size);
@@ -349,11 +350,13 @@ int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
 
 		if (flush)
 			gstage_tlb_flush(gstage, current_level, addr);
+		else
+			need_flush = true;
 
 		current_level--;
 	}
 
-	return 0;
+	return need_flush;
 }
 
 bool kvm_riscv_gstage_op_pte(struct kvm_gstage *gstage, gpa_t addr,
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index d69fba7d8e9a6..479f7e9bb8f81 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -97,6 +97,60 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
 					    size >> PAGE_SHIFT);
 }
 
+static bool need_topup_split_caches_or_resched(struct kvm_mmu_memory_cache *cache,
+					       int count)
+{
+	if (need_resched())
+		return true;
+
+	return kvm_mmu_memory_cache_nr_free_objects(cache) < count;
+}
+
+static bool mmu_split_huge_pages(struct kvm_gstage *gstage,
+				 phys_addr_t start, phys_addr_t end)
+{
+	struct kvm *kvm = gstage->kvm;
+	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
+	phys_addr_t addr = ALIGN(start, PMD_SIZE);
+	phys_addr_t last_flush_gfn = addr >> PAGE_SHIFT;
+	int count = gstage->pgd_levels;
+	bool flush = false;
+	int ret;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	while (addr < end) {
+		if (need_topup_split_caches_or_resched(pcache, count)) {
+			if (flush) {
+				kvm_flush_remote_tlbs_range(kvm, last_flush_gfn,
+					  (addr >> PAGE_SHIFT) - last_flush_gfn);
+				last_flush_gfn = addr >> PAGE_SHIFT;
+				flush = false;
+			}
+
+			write_unlock(&kvm->mmu_lock);
+			cond_resched();
+
+			ret = kvm_mmu_topup_memory_cache(pcache, count);
+			if (ret) {
+				kvm_err("Failed to top up split page cache\n");
+				write_lock(&kvm->mmu_lock);
+				return flush;
+			}
+			write_lock(&kvm->mmu_lock);
+		}
+
+		if (!kvm->arch.pgd)
+			return flush;
+
+		flush |= kvm_riscv_gstage_split_huge(gstage, pcache, addr, 0, false);
+
+		addr += PMD_SIZE;
+	}
+
+	return flush;
+}
+
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 					     struct kvm_memory_slot *slot,
 					     gfn_t gfn_offset,
@@ -151,6 +205,25 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 					    size >> PAGE_SHIFT);
 }
 
+static void mmu_split_memory_region(struct kvm *kvm, int slot)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	struct kvm_gstage gstage;
+	bool flush;
+
+	kvm_riscv_gstage_init(&gstage, kvm);
+
+	write_lock(&kvm->mmu_lock);
+	flush = mmu_split_huge_pages(&gstage, start, end);
+	write_unlock(&kvm->mmu_lock);
+
+	if (flush)
+		kvm_flush_remote_tlbs_memslot(kvm, memslot);
+}
+
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
@@ -164,6 +237,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
+		mmu_split_memory_region(kvm, new->id);
 	}
 }
 
-- 
2.43.5


* [PATCH v3 3/5] RISC-V: KVM: Remove redundant TLB flush operations
  2026-06-23 12:13 [PATCH v3 0/5] Implement Eager Page Splitting for RISC-V Wang Yechao
  2026-06-23 12:13 ` [PATCH v3 1/5] RISC-V: KVM: Add the split page cache for ioctl context Wang Yechao
  2026-06-23 12:13 ` [PATCH v3 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled Wang Yechao
@ 2026-06-23 12:13 ` Wang Yechao
  2026-06-23 12:13 ` [PATCH v3 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG Wang Yechao
  2026-06-23 12:13 ` [PATCH v3 5/5] RISC-V: KVM: Add the eager_page_split module parameter Wang Yechao
  4 siblings, 0 replies; 10+ messages in thread
From: Wang Yechao @ 2026-06-23 12:13 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao


The function kvm_arch_mmu_enable_log_dirty_pt_masked() is called by
kvm_vm_ioctl_reset_dirty_pages() and kvm_clear_dirty_log_protect().
Both callers already perform a TLB flush after enabling dirty logging,
so the TLB flush inside kvm_arch_mmu_enable_log_dirty_pt_masked() is
unnecessary. Remove it.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/kvm/mmu.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index f9c396817e0d8..570c0371b85e8 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -160,14 +160,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 	struct kvm_gstage gstage;
-	bool flush;
 
 	kvm_riscv_gstage_init(&gstage, kvm);
 
-	flush = kvm_riscv_gstage_wp_range(&gstage, start, end);
-	if (flush)
-		kvm_flush_remote_tlbs_range(kvm, start >> PAGE_SHIFT,
-					    (end - start) >> PAGE_SHIFT);
+	kvm_riscv_gstage_wp_range(&gstage, start, end);
 }
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
-- 
2.43.5


* [PATCH v3 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  2026-06-23 12:13 [PATCH v3 0/5] Implement Eager Page Splitting for RISC-V Wang Yechao
                   ` (2 preceding siblings ...)
  2026-06-23 12:13 ` [PATCH v3 3/5] RISC-V: KVM: Remove redundant TLB flush operations Wang Yechao
@ 2026-06-23 12:13 ` Wang Yechao
  2026-06-23 12:33   ` sashiko-bot
  2026-06-23 12:13 ` [PATCH v3 5/5] RISC-V: KVM: Add the eager_page_split module parameter Wang Yechao
  4 siblings, 1 reply; 10+ messages in thread
From: Wang Yechao @ 2026-06-23 12:13 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao


Split huge pages in the range specified by KVM_CLEAR_DIRTY_LOG, and do
not split when enabling dirty logging if KVM_DIRTY_LOG_INITIALLY_SET is
set.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/kvm/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 570c0371b85e8..5b92b256ae22c 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -164,6 +164,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_riscv_gstage_init(&gstage, kvm);
 
 	kvm_riscv_gstage_wp_range(&gstage, start, end);
+
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+		mmu_split_huge_pages(&gstage, start, end);
 }
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
-- 
2.43.5


* [PATCH v3 5/5] RISC-V: KVM: Add the eager_page_split module parameter
  2026-06-23 12:13 [PATCH v3 0/5] Implement Eager Page Splitting for RISC-V Wang Yechao
                   ` (3 preceding siblings ...)
  2026-06-23 12:13 ` [PATCH v3 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG Wang Yechao
@ 2026-06-23 12:13 ` Wang Yechao
  2026-06-23 12:31   ` sashiko-bot
  4 siblings, 1 reply; 10+ messages in thread
From: Wang Yechao @ 2026-06-23 12:13 UTC
  To: Anup Patel, kvm, kvm-riscv, linux-riscv, linux-kernel
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
	Alexandre Ghiti, Wang Yechao


Add an eager_page_split module parameter for RISC-V KVM, following
the same approach as on x86. This parameter controls whether eager
page splitting is enabled. The default value is on.

When eager page splitting is enabled, KVM proactively splits large
pages (huge pages) into smaller pages when needed for dirty logging
or other operations. Disabling it can be beneficial for VM workloads
that rarely perform writes, or that only write to a small region of
memory, as it allows huge pages to remain intact for read accesses.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +++++--
 arch/riscv/kvm/mmu.c                            | 13 ++++++++++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a68003c3599cc..b4c68a896fa79 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3047,7 +3047,7 @@ Kernel parameters
 			Default is 0 (don't ignore, but inject #GP)
 
 	kvm.eager_page_split=
-			[KVM,X86] Controls whether or not KVM will try to
+			[KVM,X86,RISCV] Controls whether or not KVM will try to
 			proactively split all huge pages during dirty logging.
 			Eager page splitting reduces interruptions to vCPU
 			execution by eliminating the write-protection faults
@@ -3067,7 +3067,10 @@ Kernel parameters
 			the KVM_CLEAR_DIRTY ioctl, and only for the pages being
 			cleared.
 
-			Eager page splitting is only supported when kvm.tdp_mmu=Y.
+			On x86, eager page splitting is only supported when
+			kvm.tdp_mmu=Y.
+
+			On RISCV, eager page splitting is supported by default.
 
 			Default is Y (on).
 
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 5b92b256ae22c..b6b6e7e64db60 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -16,6 +16,9 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_nacl.h>
 
+bool __read_mostly eager_page_split = true;
+module_param(eager_page_split, bool, 0644);
+
 static void mmu_wp_memory_region(struct kvm *kvm, int slot)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
@@ -165,8 +168,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 
 	kvm_riscv_gstage_wp_range(&gstage, start, end);
 
-	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
-		mmu_split_huge_pages(&gstage, start, end);
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
+		if (READ_ONCE(eager_page_split))
+			mmu_split_huge_pages(&gstage, start, end);
+	}
 }
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
@@ -236,7 +241,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
-		mmu_split_memory_region(kvm, new->id);
+
+		if (READ_ONCE(eager_page_split))
+			mmu_split_memory_region(kvm, new->id);
 	}
 }
 
-- 
2.43.5


* Re: [PATCH v3 5/5] RISC-V: KVM: Add the eager_page_split module parameter
  2026-06-23 12:13 ` [PATCH v3 5/5] RISC-V: KVM: Add the eager_page_split module parameter Wang Yechao
@ 2026-06-23 12:31   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-23 12:31 UTC
  To: Wang Yechao; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Low] The `eager_page_split` module parameter is declared globally without the `static` keyword, leading to namespace pollution.

Pre-existing issues:
- [Critical] Dropping `mmu_lock` before flushing the TLB for newly write-protected pages allows the guest to modify memory without KVM recording it in the dirty bitmap, leading to silent data corruption during live migration. This problem wasn't introduced by this patch, but it was exposed in the modified code.
--

commit 775425d655f0c265e56c7d5ec6ecfe710e56385a
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Add the eager_page_split module parameter

This commit introduces the eager_page_split module parameter for RISC-V KVM
to control whether huge pages are proactively split during dirty logging.
Eager page splitting is enabled by default.

[ ... ]

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index d2d36a9dca06f..89f0dc237cf7b 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -16,6 +16,9 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/kvm_nacl.h>
>  
> +bool __read_mostly eager_page_split = true;
> +module_param(eager_page_split, bool, 0644);

[Severity: Low]
Should this variable be declared with the static keyword?

Because eager_page_split is only used in this file, omitting static pollutes
the global namespace and might cause silent collisions with variables in
other built-in drivers or subsystems. It can also trigger a
-Wmissing-prototypes warning.

[ ... ]

> @@ -165,8 +168,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  
>  	kvm_riscv_gstage_wp_range(&gstage, start, end);
>  
> -	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> -		mmu_split_huge_pages(&gstage, start, end);
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
> +		if (READ_ONCE(eager_page_split))
> +			mmu_split_huge_pages(&gstage, start, end);
> +	}

[Severity: Critical]
This is a pre-existing issue, but does dropping the lock in
mmu_split_huge_pages() without first flushing the TLB allow the guest to
bypass dirty tracking?

During KVM_CLEAR_DIRTY_LOG, the generic KVM core clears the dirty bit in its
bitmap before calling the architecture hook here in
kvm_arch_mmu_enable_log_dirty_pt_masked().

kvm_riscv_gstage_wp_range() makes the KVM page tables read-only, but its
return value (which indicates a TLB flush is needed) is ignored. We then
call mmu_split_huge_pages() which periodically drops kvm->mmu_lock to call
cond_resched():

mmu_split_huge_pages() {
    ...
    if (need_topup_split_caches_or_resched(pcache, count)) {
        ...
        write_unlock(&kvm->mmu_lock);
        cond_resched();
        ...
}

While the lock is dropped, guest vCPUs can still write to memory using
stale, writable TLB entries. Since the dirty bit was already cleared, these
writes silently bypass KVM's dirty tracking and are lost during live
migration.
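
The window described above can be replayed with a toy single-page model. This
is a userspace sketch under stated assumptions, not KVM code: the struct and
function names are hypothetical, and one boolean stands in for a cached TLB
permission.

```c
#include <stdbool.h>

/* Toy single-page model; all names are hypothetical, not KVM symbols. */
struct toy_vm {
	bool pte_writable;	/* permission stored in the page table */
	bool tlb_writable;	/* cached permission a vCPU may still use */
	bool dirty_bit;		/* software dirty bitmap entry */
};

/* A guest write succeeds silently if the cached permission is writable. */
static void guest_write(struct toy_vm *vm)
{
	if (vm->tlb_writable)
		return;			/* no fault: the write goes unrecorded */
	vm->dirty_bit = true;		/* write-protect fault marks the page */
	vm->pte_writable = true;
	vm->tlb_writable = true;
}

/*
 * Clear the dirty bit, write-protect the page, then let the guest write
 * while the lock is dropped. Returns true if the write was tracked.
 */
bool clear_dirty_then_guest_writes(bool flush_before_unlock)
{
	struct toy_vm vm = { .pte_writable = true, .tlb_writable = true,
			     .dirty_bit = true };

	vm.dirty_bit = false;		/* generic code clears the bitmap */
	vm.pte_writable = false;	/* arch hook write-protects the PTE */
	if (flush_before_unlock)
		vm.tlb_writable = false; /* flush drops the stale entry */

	guest_write(&vm);		/* vCPU runs in the unlocked window */
	return vm.dirty_bit;
}
```

In this model the write is tracked only when the flush happens before the
lock is released.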

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623121332.78514-1-wang.yechao255@zte.com.cn?part=5


* Re: [PATCH v3 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  2026-06-23 12:13 ` [PATCH v3 4/5] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG Wang Yechao
@ 2026-06-23 12:33   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-23 12:33 UTC
  To: Wang Yechao; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] Dropping kvm->mmu_lock before flushing TLBs leads to lost dirty page tracking and silent data corruption.
--

commit ea1e9d47d599e24f8216769c511d82126690b4ec
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG

Split huge pages on the range specified using KVM_CLEAR_DIRTY_LOG.
And do not split when enabling dirty logging if
KVM_DIRTY_LOG_INITIALLY_SET is set.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 8220efadb8713..d2d36a9dca06f 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -164,6 +164,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_riscv_gstage_init(&gstage, kvm);
>  
>  	kvm_riscv_gstage_wp_range(&gstage, start, end);
> +
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> +		mmu_split_huge_pages(&gstage, start, end);

[Severity: Critical]
Does this code inadvertently drop kvm->mmu_lock while there are pending
deferred TLB flushes?

When userspace calls KVM_CLEAR_DIRTY_LOG, virt/kvm/kvm_main.c clears bits
in the software dirty bitmap and calls
kvm_arch_mmu_enable_log_dirty_pt_masked() to write-protect the pages. The
TLB flush is intentionally deferred across the batch loop:

virt/kvm/kvm_main.c:kvm_clear_dirty_log_protect() {
    ...
    kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset, mask);
    ...
}

However, mmu_split_huge_pages() drops the lock:

arch/riscv/kvm/mmu.c:mmu_split_huge_pages() {
    ...
    write_unlock(&kvm->mmu_lock);
    cond_resched();
    ...
}

Because the lock is dropped before the deferred TLB flush completes, guest
vCPUs can resume and write to memory using stale, writable TLB entries.

Since the software dirty bitmap was already cleared and these writes bypass
KVM's write-protection faults, can this result in permanently lost dirty
page tracking and memory corruption during live migration?

>  }
>  
>  void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623121332.78514-1-wang.yechao255@zte.com.cn?part=4


* Re: [PATCH v3 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled
  2026-06-23 12:13 ` [PATCH v3 2/5] RISC-V: KVM: Split huge pages when dirty logging is enabled Wang Yechao
@ 2026-06-23 12:33   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-23 12:33 UTC
  To: Wang Yechao; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Dropping `mmu_lock` to top up the per-VM `pgd_split_page_cache` introduces a race with `mmu_notifier_release()`, leading to a Use-After-Free.
- [High] Missing smp_wmb() before linking a newly populated page table to the parent PTE allows hardware walkers to observe uninitialized memory.
--

commit 4aa84b3974bf84dfd6f8f8c1dd2e7b3b61d34417
Author: Wang Yechao <wang.yechao255@zte.com.cn>
Subject: RISC-V: KVM: Split huge pages when dirty logging is enabled

Split huge pages eagerly when enabling dirty logging. The goal is to
avoid doing it while faulting on write-protected pages, which
negatively impacts guest performance.

> diff --git a/arch/riscv/kvm/gstage.c b/arch/riscv/kvm/gstage.c
> index c4c3b79567f10..1e980b4a08dc5 100644
> --- a/arch/riscv/kvm/gstage.c
> +++ b/arch/riscv/kvm/gstage.c

[ ... ]

> @@ -337,7 +338,7 @@ int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
>  
>  		next_ptep = kvm_mmu_memory_cache_alloc(pcache);
>  		if (!next_ptep)
> -			return -ENOMEM;
> +			return need_flush;
>  
>  		for (i = 0; i < PTRS_PER_PTE; i++) {
>  			child_pte = make_child_pte(huge_pte, i, child_page_size);
> @@ -349,11 +350,13 @@ int kvm_riscv_gstage_split_huge(struct kvm_gstage *gstage,
>  
>  		if (flush)
>  			gstage_tlb_flush(gstage, current_level, addr);

[Severity: High]
Can hardware page table walkers observe an uninitialized or empty page
directory here?

Because RISC-V utilizes a weakly ordered memory model (RVWMO), the CPU
might reorder the stores to the next_ptep array after the store that links
next_ptep into the parent ptep.

Should we add an smp_wmb() before updating the parent PTE to ensure the
child PTEs are fully visible first?
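
The general "populate, then publish" ordering can be sketched in userspace
with C11 atomics. This is only a software analogy: the names are
hypothetical, a hardware page table walker is not an acquire load, and
whether the kernel path needs an explicit barrier is the open question
raised here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ENTRIES 512	/* assumed entries per table (4 KiB page, 8-byte PTE) */

struct toy_table {
	uint64_t entry[NR_ENTRIES];
};

/* Parent slot that a concurrent reader may follow at any time. */
static _Atomic(struct toy_table *) parent_slot;

/* Fill every child entry first, then publish with release ordering. */
void publish_child_table(struct toy_table *child, uint64_t base, uint64_t step)
{
	for (size_t i = 0; i < NR_ENTRIES; i++)
		child->entry[i] = base + i * step;

	/* Release: the stores above are ordered before the pointer store. */
	atomic_store_explicit(&parent_slot, child, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
const struct toy_table *lookup_child_table(void)
{
	return atomic_load_explicit(&parent_slot, memory_order_acquire);
}
```

A reader that observes the new pointer through lookup_child_table() is then
guaranteed to observe the filled entries as well.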

> +		else
> +			need_flush = true;
>  
>  		current_level--;
>  	}

[ ... ]

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index d69fba7d8e9a6..479f7e9bb8f81 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c

[ ... ]

> @@ -97,6 +97,60 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
>  					    size >> PAGE_SHIFT);
>  }
>  
> +static bool need_topup_split_caches_or_resched(struct kvm_mmu_memory_cache *cache,
> +					       int count)
> +{
> +	if (need_resched())
> +		return true;
> +
> +	return kvm_mmu_memory_cache_nr_free_objects(cache) < count;
> +}
> +
> +static bool mmu_split_huge_pages(struct kvm_gstage *gstage,
> +				 phys_addr_t start, phys_addr_t end)
> +{
> +	struct kvm *kvm = gstage->kvm;
> +	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
> +	phys_addr_t addr = ALIGN(start, PMD_SIZE);
> +	phys_addr_t last_flush_gfn = addr >> PAGE_SHIFT;
> +	int count = gstage->pgd_levels;
> +	bool flush = false;
> +	int ret;
> +
> +	lockdep_assert_held_write(&kvm->mmu_lock);
> +
> +	while (addr < end) {
> +		if (need_topup_split_caches_or_resched(pcache, count)) {
> +			if (flush) {
> +				kvm_flush_remote_tlbs_range(kvm, last_flush_gfn,
> +					  (addr >> PAGE_SHIFT) - last_flush_gfn);
> +				last_flush_gfn = addr >> PAGE_SHIFT;
> +				flush = false;
> +			}
> +
> +			write_unlock(&kvm->mmu_lock);
> +			cond_resched();
> +
> +			ret = kvm_mmu_topup_memory_cache(pcache, count);

[Severity: Critical]
Does dropping mmu_lock here introduce a race with VM teardown?

If the VMM process terminates concurrently, mmu_notifier_release() could
trigger kvm_riscv_mmu_free_pgd(), which explicitly frees
kvm->arch.pgd_split_page_cache without taking mmu_lock.

Since we call kvm_mmu_topup_memory_cache() on this shared per-VM cache
without holding mmu_lock, could this lead to a use-after-free or memory
corruption on the cache objects array?
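
The interleaving in question can be replayed deterministically in a toy
model. The sketch below is userspace C with hypothetical names: a flag marks
the cache as released, and the top-up reports when it would have touched a
released cache, which is where the real code would hit the use-after-free.

```c
#include <stdbool.h>

/* Toy cache; hypothetical stand-in for the per-VM split page cache. */
struct toy_cache {
	int nobjs;	/* free objects currently cached */
	bool freed;	/* set once teardown has released the cache */
};

/* Teardown path: releases the cache without coordinating with the splitter. */
static void toy_free_cache(struct toy_cache *c)
{
	c->nobjs = 0;
	c->freed = true;
}

/* Top-up path: returns false if it touched an already released cache. */
static bool toy_topup(struct toy_cache *c, int min)
{
	if (c->freed)
		return false;	/* the real code would access freed memory */
	if (c->nobjs < min)
		c->nobjs = min;
	return true;
}

/*
 * Replays one unlocked window. When teardown_in_window is true, the free
 * runs between the unlock and the top-up. Returns true if the top-up was
 * safe.
 */
bool replay_unlocked_window(bool teardown_in_window)
{
	struct toy_cache cache = { .nobjs = 0, .freed = false };

	/* The lock is not held from here until after the top-up. */
	if (teardown_in_window)
		toy_free_cache(&cache);
	return toy_topup(&cache, 4);
}
```

The model only shows the ordering hazard. It does not suggest which fix the
series should adopt.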

> +			if (ret) {
> +				kvm_err("Failed to top up split page cache\n");
> +				write_lock(&kvm->mmu_lock);
> +				return flush;
> +			}
> +			write_lock(&kvm->mmu_lock);
> +		}
> +

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623121332.78514-1-wang.yechao255@zte.com.cn?part=2


* Re: [PATCH v3 1/5] RISC-V: KVM: Add the split page cache for ioctl context
  2026-06-23 12:13 ` [PATCH v3 1/5] RISC-V: KVM: Add the split page cache for ioctl context Wang Yechao
@ 2026-06-23 12:34   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-06-23 12:34 UTC
  To: Wang Yechao; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Lockless freeing of `pgd_split_page_cache` in `kvm_riscv_mmu_free_pgd()` introduces a structural data race with concurrent KVM ioctls.

Pre-existing issues:
- [Critical] Guest memory fault after `mmu_notifier_release` causes a host kernel panic due to unchecked NULL pgd.
--

commit 4369101385f8b6b3e99f16d6f89e641a21847559
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Add the split page cache for ioctl context

This commit adds a split page cache for dirty logging enablement and the
KVM_CLEAR_DIRTY_LOG ioctl on RISC-V.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 082f9b2617338..d69fba7d8e9a6 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c

[ ... ]

> @@ -703,6 +704,8 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>  
>  	if (pgd)
>  		free_pages((unsigned long)pgd, get_order(kvm_riscv_gstage_pgd_size));
> +
> +	kvm_mmu_free_memory_cache(&kvm->arch.pgd_split_page_cache);

[Severity: High]
Does freeing this cache locklessly introduce a use-after-free race with
concurrent KVM ioctls?

If mmu_notifier_release() is executed by exit_mmap() while a concurrent
process (sharing the KVM fd) executes KVM_CLEAR_DIRTY_LOG, they can race.

KVM_CLEAR_DIRTY_LOG calls mmu_split_huge_pages(), which drops kvm->mmu_lock
and locklessly tops up the cache:

arch/riscv/kvm/mmu.c:mmu_split_huge_pages() {
    ...
            write_unlock(&kvm->mmu_lock);
            cond_resched();

            ret = kvm_mmu_topup_memory_cache(pcache, count);
    ...
}

Concurrently, kvm_riscv_mmu_free_pgd() (invoked by mmu_notifier_release())
takes and releases kvm->mmu_lock, and then locklessly calls
kvm_mmu_free_memory_cache() here, which can free the cache while it is being
accessed by the top-up.

[Severity: Critical]
This is a pre-existing issue, but does setting kvm->arch.pgd to NULL earlier
in kvm_riscv_mmu_free_pgd() expose a host kernel panic if a guest memory
fault occurs afterward?

If mmu_notifier_release() runs (e.g., via exit_mmap()), it sets
kvm->arch.pgd = NULL. If another process holding the KVM fd subsequently
invokes KVM_RUN, a guest memory fault will enter kvm_riscv_mmu_map() and
eventually kvm_riscv_gstage_get_leaf() without checking if pgd is NULL.

In kvm_riscv_gstage_get_leaf(), the NULL pgd is offset and blindly
dereferenced:

arch/riscv/kvm/gstage.c:kvm_riscv_gstage_get_leaf() {
    ...
        ptep = (pte_t *)gstage->pgd;
        ptep = &ptep[gstage_pte_index(gstage, addr, current_level)];
        while (ptep && pte_val(ptep_get(ptep))) {
    ...
}

Can this lead to a host panic via an unprivileged userspace process?
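
A guard of the kind implied by this question can be sketched as follows. It
is a hypothetical single-level lookup in userspace C, not the signature of
kvm_riscv_gstage_get_leaf(). It only illustrates returning early when the
root pointer has already been cleared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES 512	/* assumed number of entries in the toy root table */

/* Hypothetical lookup; returns false instead of dereferencing a NULL root. */
bool toy_get_leaf(const uint64_t *pgd, size_t index, uint64_t *out)
{
	if (!pgd || index >= TOY_ENTRIES)
		return false;		/* root torn down, or index out of range */
	if (pgd[index] == 0)
		return false;		/* no mapping present */
	*out = pgd[index];
	return true;
}
```

A caller can treat the false return as "no mapping" and fail the fault
instead of walking from a NULL base.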

>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623121332.78514-1-wang.yechao255@zte.com.cn?part=1

