[PATCH v2 0/4] Implement Eager Page Splitting for RISC-V

Kernel KVM virtualization development

* [PATCH v2 0/4] Implement Eager Page Splitting for RISC-V
From: wang.yechao255 @ 2026-06-03  9:52 UTC
  To: anup, atish.patra, pjw, palmer, aou, alex
  Cc: kvm, kvm-riscv, linux-riscv, linux-kernel, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Eager Page Splitting is implemented on x86 and ARM. It improves the
performance of dirty logging (used in live migrations) when guest memory
is backed by huge pages.

This series implements Eager Page Splitting for RISC-V. The implementation
is similar to that of x86 and ARM. It provides two ways to split huge pages
in ioctl context instead of on fault in vCPU context:

- Split huge pages when dirty logging is enabled and
  KVM_DIRTY_LOG_INITIALLY_SET is not set. This happens when the
  KVM_MEM_LOG_DIRTY_PAGES flag of a memslot is set, and splits the whole
  memslot into 4K mappings.

- Split huge pages during KVM_CLEAR_DIRTY_LOG when
  KVM_DIRTY_LOG_INITIALLY_SET is set. In this mode dirty logging is
  enabled in small chunks, so only the requested chunk range is split
  rather than the whole memslot.
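For the KVM_CLEAR_DIRTY_LOG case, the chunk handed to
kvm_arch_mmu_enable_log_dirty_pt_masked() is described by a gfn_offset and
a 64-bit mask. The sketch below assumes the GPA range is derived the way
arm64 does it (first and last set bit of the mask, relative to
slot->base_gfn + gfn_offset); chunk_range() is a hypothetical name. It
shows that one chunk covers at most 64 pages (256 KiB with 4 KiB pages),
so it is much smaller than a 2 MiB huge page and need not be aligned to one.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4 KiB base pages */

/*
 * Hypothetical helper: GPA range [*start, *end) touched by one
 * KVM_CLEAR_DIRTY_LOG chunk, assuming the arm64-style computation.
 */
static void chunk_range(uint64_t base_gfn, uint64_t gfn_offset, uint64_t mask,
			uint64_t *start, uint64_t *end)
{
	uint64_t gfn = base_gfn + gfn_offset;
	int first, last;

	assert(mask != 0); /* the ctz/clz builtins are undefined for 0 */
	first = __builtin_ctzll(mask);     /* like __ffs(mask) */
	last = 63 - __builtin_clzll(mask); /* like __fls(mask) */

	*start = (gfn + first) << PAGE_SHIFT;
	*end = (gfn + last + 1) << PAGE_SHIFT;
}
```

For example, base_gfn 0x80000, gfn_offset 64 and mask 0xC give the
two-page range [0x80042000, 0x80044000).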

Changes in v2:
 - Rename split_page_cache to pgd_split_page_cache.
 - Rename kvm_riscv_split_huge_pages and kvm_riscv_split_memory_region
   to mmu_split_huge_pages and mmu_split_memory_region.
 - Add a lockdep_assert_held() check before splitting huge pages.
 - Update Documentation/admin-guide/kernel-parameters.txt.

 - Link to v1:
 https://lore.kernel.org/linux-riscv/20260513153656847l3c4fI5hBsAyoIZi8aGIs@zte.com.cn/

Wang Yechao (4):
  RISC-V: KVM: Add the split page cache for ioctl context
  RISC-V: KVM: Split huge pages when dirty logging is enabled
  RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
  RISC-V: KVM: Add the eager_page_split module parameter

 .../admin-guide/kernel-parameters.txt         |  7 +-
 arch/riscv/include/asm/kvm_host.h             |  1 +
 arch/riscv/kvm/mmu.c                          | 77 +++++++++++++++++++
 3 files changed, 83 insertions(+), 2 deletions(-)

-- 
2.43.5


* [PATCH v2 1/4] RISC-V: KVM: Add the split page cache for ioctl context
From: wang.yechao255 @ 2026-06-03  9:54 UTC
  To: anup, atish.patra, pjw, palmer, aou, alex
  Cc: kvm, kvm-riscv, linux-riscv, linux-kernel, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Add the split page cache for dirty logging enablement and the
KVM_CLEAR_DIRTY_LOG ioctl.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/include/asm/kvm_host.h | 1 +
 arch/riscv/kvm/mmu.c              | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 75b0a951c1bc..0ee778f0f086 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -84,6 +84,7 @@ struct kvm_arch {
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
 	unsigned long pgd_levels;
+	struct kvm_mmu_memory_cache pgd_split_page_cache;

 	/* Guest Timer */
 	struct kvm_guest_timer timer;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 2d3def024270..0676937bd9a1 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -549,6 +549,7 @@ int kvm_riscv_mmu_alloc_pgd(struct kvm *kvm)
 	kvm->arch.pgd = page_to_virt(pgd_page);
 	kvm->arch.pgd_phys = page_to_phys(pgd_page);
 	kvm->arch.pgd_levels = kvm_riscv_gstage_max_pgd_levels;
+	kvm->arch.pgd_split_page_cache.gfp_zero = __GFP_ZERO;

 	return 0;
 }
@@ -572,6 +573,8 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)

 	if (pgd)
 		free_pages((unsigned long)pgd, get_order(kvm_riscv_gstage_pgd_size));
+
+	kvm_mmu_free_memory_cache(&kvm->arch.pgd_split_page_cache);
 }

 void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)
-- 
2.43.5
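The point of a per-VM split cache is that page-table pages are allocated
while mmu_lock is dropped (where sleeping allocations are allowed) and then
consumed under the lock without failing. The model below is a simplified
userspace stand-in, not struct kvm_mmu_memory_cache: struct page_cache,
cache_topup(), cache_alloc() and cache_free_all() are hypothetical names,
and unlike the kernel helper, which may fill the cache beyond the minimum,
cache_topup() allocates exactly up to min.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_CAPACITY 8
#define MODEL_PAGE_SIZE 4096

/* Simplified stand-in for struct kvm_mmu_memory_cache (hypothetical). */
struct page_cache {
	int nobjs;
	void *objects[CACHE_CAPACITY];
};

/* Refill to at least @min zeroed pages. In KVM this step runs with
 * mmu_lock dropped, where a sleeping allocation is allowed. */
static int cache_topup(struct page_cache *c, int min)
{
	if (min > CACHE_CAPACITY)
		return -1;
	while (c->nobjs < min) {
		/* zeroed, like gfp_zero = __GFP_ZERO in the patch */
		void *p = calloc(1, MODEL_PAGE_SIZE);

		if (!p)
			return -1;
		c->objects[c->nobjs++] = p;
	}
	return 0;
}

/* Take one page. In KVM this runs under mmu_lock, so the caller must
 * have topped up enough pages beforehand. */
static void *cache_alloc(struct page_cache *c)
{
	assert(c->nobjs > 0);
	return c->objects[--c->nobjs];
}

/* Plays the role of kvm_mmu_free_memory_cache() at VM teardown. */
static void cache_free_all(struct page_cache *c)
{
	while (c->nobjs > 0)
		free(c->objects[--c->nobjs]);
}
```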


* Re: [PATCH v2 1/4] RISC-V: KVM: Add the split page cache for ioctl context
From: sashiko-bot @ 2026-06-03 10:20 UTC
  To: wang.yechao255; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Destruction of stage-2 PGD during `mmu_notifier_release` allows vCPUs to execute with BARE mode stage-2 translation, granting guests full access to host physical memory
- [High] NULL pointer dereference when enabling dirty logging after MMU destruction
--

commit 5935831ace6c3305e677dc574f5aeeda39b0da2a
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Add the split page cache for ioctl context

Add the split page cache for dirty logging enablement and the
KVM_CLEAR_DIRTY_LOG ioctl.

[ ... ]

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 2d3def024270c..0676937bd9a1c 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c

[ ... ]

> @@ -572,6 +573,8 @@ void kvm_riscv_mmu_free_pgd(struct kvm *kvm)
>  
>  	if (pgd)
>  		free_pages((unsigned long)pgd, get_order(kvm_riscv_gstage_pgd_size));
> +
> +	kvm_mmu_free_memory_cache(&kvm->arch.pgd_split_page_cache);
>  }

[Severity: High]
This is a pre-existing issue, but can
kvm_arch_mmu_enable_log_dirty_pt_masked() trigger a NULL pointer dereference
after the PGD is freed here?

When a process exits, mmu_notifier_release() invokes kvm_riscv_mmu_free_pgd(),
which frees the stage-2 PGD and sets kvm->arch.pgd = NULL and
kvm->arch.pgd_levels = 0.

However, if the KVM VM file descriptor is still held open by another process
or thread, a subsequent KVM_CLEAR_DIRTY_LOG ioctl would call
kvm_arch_mmu_enable_log_dirty_pt_masked() in arch/riscv/kvm/mmu.c.

This unconditionally initializes the gstage and accesses the PGD via
kvm_riscv_gstage_get_leaf(). Since gstage->pgd_levels is 0, current_level
wraps around to UINT_MAX, and it dereferences gstage->pgd (which is NULL)
plus an out-of-bounds index.
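The wraparound described above is ordinary unsigned arithmetic. The sketch
below assumes the walk's starting level is computed as pgd_levels - 1 in an
unsigned type, as the review implies; start_level() and
gstage_walk_allowed() are hypothetical helpers, the latter showing the kind
of early-return guard an ioctl path could apply.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Starting level of a walk, computed in an unsigned type as the review
 * describes: with pgd_levels == 0 the subtraction wraps to UINT_MAX. */
static unsigned int start_level(unsigned int pgd_levels)
{
	return pgd_levels - 1;
}

/* Hypothetical guard to apply before touching the PGD. */
static bool gstage_walk_allowed(const void *pgd, unsigned int pgd_levels)
{
	return pgd != NULL && pgd_levels != 0;
}
```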

>  
>  void kvm_riscv_mmu_update_hgatp(struct kvm_vcpu *vcpu)

[Severity: Critical]
This is a pre-existing issue, but does freeing the stage-2 PGD during
mmu_notifier_release allow vCPUs to execute with BARE mode stage-2
translation?

If the VM file descriptor is kept open after kvm_riscv_mmu_free_pgd() sets
kvm->arch.pgd_phys = 0 and kvm->arch.pgd_levels = 0, another process can still
invoke KVM_RUN.

When the vCPU resumes, kvm_riscv_mmu_update_hgatp() in arch/riscv/kvm/mmu.c
programs the hgatp CSR using kvm_riscv_gstage_mode(0), which evaluates to
BARE mode:

arch/riscv/kvm/mmu.c:kvm_riscv_mmu_update_hgatp() {
	...
	unsigned long hgatp = kvm_riscv_gstage_mode(ka->pgd_levels)
			      << HGATP_MODE_SHIFT;

	hgatp |= (READ_ONCE(ka->vmid.vmid) << HGATP_VMID_SHIFT) & HGATP_VMID;
	hgatp |= (ka->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;

	ncsr_write(CSR_HGATP, hgatp);
	...
}

With BARE mode stage-2 translation, the hardware treats guest physical
addresses as host physical addresses. Could this grant guests full access to
host physical memory?
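The hgatp value can be modeled from the RV64 field layout in the RISC-V
privileged specification (MODE in bits 63:60, VMID in 57:44, PPN in 43:0;
Bare = 0, Sv39x4 = 8, Sv48x4 = 9, Sv57x4 = 10). The level-to-mode mapping
in gstage_mode() is an assumption that reproduces the claim above that zero
levels yields Bare; the kernel's kvm_riscv_gstage_mode() may be written
differently. make_hgatp() mirrors the quoted kvm_riscv_mmu_update_hgatp()
composition.

```c
#include <assert.h>
#include <stdint.h>

/* RV64 hgatp fields per the RISC-V privileged specification. */
#define HGATP_MODE_SHIFT 60
#define HGATP_VMID_SHIFT 44
#define HGATP_VMID (0x3FFFULL << HGATP_VMID_SHIFT)
#define HGATP_PPN ((1ULL << 44) - 1)
#define PAGE_SHIFT 12

#define HGATP_MODE_BARE 0ULL
#define HGATP_MODE_SV39X4 8ULL
#define HGATP_MODE_SV48X4 9ULL
#define HGATP_MODE_SV57X4 10ULL

/* Assumed mapping: anything other than 3/4/5 levels falls back to Bare. */
static uint64_t gstage_mode(unsigned long pgd_levels)
{
	switch (pgd_levels) {
	case 3: return HGATP_MODE_SV39X4;
	case 4: return HGATP_MODE_SV48X4;
	case 5: return HGATP_MODE_SV57X4;
	default: return HGATP_MODE_BARE;
	}
}

static uint64_t make_hgatp(unsigned long pgd_levels, uint64_t vmid,
			   uint64_t pgd_phys)
{
	uint64_t hgatp = gstage_mode(pgd_levels) << HGATP_MODE_SHIFT;

	hgatp |= (vmid << HGATP_VMID_SHIFT) & HGATP_VMID;
	hgatp |= (pgd_phys >> PAGE_SHIFT) & HGATP_PPN;
	return hgatp;
}
```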

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260603175256408L0jnqGs1cJGc0ijCdujci@zte.com.cn?part=1


* [PATCH v2 2/4] RISC-V: KVM: Split huge pages when dirty logging is enabled
From: wang.yechao255 @ 2026-06-03  9:57 UTC
  To: anup, atish.patra, pjw, palmer, aou, alex
  Cc: kvm, kvm-riscv, linux-riscv, linux-kernel, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Split huge pages eagerly when enabling dirty logging. The goal is to
avoid doing it while faulting on write-protected pages, which
negatively impacts guest performance.

The benefits of eager page splitting are the same as in x86 and arm64,
added with commit a3fe5dbda0a4 ("KVM: x86/mmu: Split huge pages mapped
by the TDP MMU when dirty logging is enabled") and commit e7bf7a490c68
("KVM: arm64: Split huge pages when dirty logging is enabled").

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 arch/riscv/kvm/mmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 0676937bd9a1..51bd1db74e1a 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -90,6 +90,52 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
 	spin_unlock(&kvm->mmu_lock);
 }

+static bool need_topup_split_caches_or_resched(struct kvm_mmu_memory_cache *cache,
+					       int count)
+{
+	if (need_resched())
+		return true;
+
+	return kvm_mmu_memory_cache_nr_free_objects(cache) < count;
+}
+
+/* the caller must hold mmu lock */
+static void mmu_split_huge_pages(struct kvm_gstage *gstage,
+				 phys_addr_t start, phys_addr_t end,
+				 bool flush)
+{
+	struct kvm *kvm = gstage->kvm;
+	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
+	int count = gstage->pgd_levels;
+	phys_addr_t addr = start;
+	int ret;
+
+	lockdep_assert_held(&kvm->mmu_lock);
+
+	while (addr < end) {
+		if (need_topup_split_caches_or_resched(pcache, count)) {
+			spin_unlock(&kvm->mmu_lock);
+			cond_resched();
+
+			ret = kvm_mmu_topup_memory_cache(pcache, count);
+			if (ret) {
+				kvm_err("Failed to top up split page cache\n");
+				spin_lock(&kvm->mmu_lock);
+				return;
+			}
+			spin_lock(&kvm->mmu_lock);
+		}
+
+		ret = kvm_riscv_gstage_split_huge(gstage, pcache, addr, 0, flush);
+		if (ret != 0) {
+			kvm_err("split huge page for addr(0x%llx) failed %d\n", addr, ret);
+			break;
+		}
+
+		addr += PMD_SIZE;
+	}
+}
+
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 					     struct kvm_memory_slot *slot,
 					     gfn_t gfn_offset,
@@ -136,6 +182,23 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	spin_unlock(&kvm->mmu_lock);
 }

+static void mmu_split_memory_region(struct kvm *kvm, int slot)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	struct kvm_gstage gstage;
+
+	kvm_riscv_gstage_init(&gstage, kvm);
+
+	spin_lock(&kvm->mmu_lock);
+	mmu_split_huge_pages(&gstage, start, end, false);
+	spin_unlock(&kvm->mmu_lock);
+
+	kvm_flush_remote_tlbs_memslot(kvm, memslot);
+}
+
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
@@ -150,6 +213,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
+		mmu_split_memory_region(kvm, new->id);
 	}
 }

-- 
2.43.5


* Re: [PATCH v2 2/4] RISC-V: KVM: Split huge pages when dirty logging is enabled
From: sashiko-bot @ 2026-06-03 10:34 UTC
  To: wang.yechao255; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Dropping mmu_lock without flushing remote TLBs after modifying page tables.

Pre-existing issues:
- [High] Missing memory barrier before publishing newly allocated stage-2 child page tables.
--

commit 0208ccce76fbb3c703dc5faa9c4873b1be06a0ce
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Split huge pages when dirty logging is enabled

Split huge pages eagerly when enabling dirty logging. The goal is to
avoid doing it while faulting on write-protected pages, which
negatively impacts guest performance.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 0676937bd9a1c..51bd1db74e1aa 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -90,6 +90,52 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
[ ... ]
> +/* the caller must hold mmu lock */
> +static void mmu_split_huge_pages(struct kvm_gstage *gstage,
> +				 phys_addr_t start, phys_addr_t end,
> +				 bool flush)
> +{
> +	struct kvm *kvm = gstage->kvm;
> +	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
> +	int count = gstage->pgd_levels;
> +	phys_addr_t addr = start;
> +	int ret;
> +
> +	lockdep_assert_held(&kvm->mmu_lock);
> +
> +	while (addr < end) {
> +		if (need_topup_split_caches_or_resched(pcache, count)) {
> +			spin_unlock(&kvm->mmu_lock);
> +			cond_resched();

[Severity: High]
Does this drop the MMU lock while there are unflushed page table
modifications?

Because mmu_split_huge_pages() passes flush = false to
kvm_riscv_gstage_split_huge(), TLB flushes are deferred until
mmu_split_memory_region() completes. If cache top-up or rescheduling
is needed, the kvm->mmu_lock is dropped here without flushing the
remote TLBs for the pages that were split in previous loop iterations.

Could this allow other vCPUs to observe inconsistent stage-2 page
tables or execute with stale TLB mappings while the lock is dropped?
Perhaps a remote TLB flush should be issued for the already-processed
range before unlocking?
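The suggested pattern (flush what has already been split before dropping
the lock) can be illustrated with a single-threaded model. Everything in it
is hypothetical: struct model, flush_range(), note_unlock() and the
resched_every knob stand in for mmu_lock, a remote TLB flush over
[start, addr) and need_resched(). The model only counts how often the lock
would be dropped while split PMDs are still unflushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMD_SIZE (1ULL << 21) /* 2 MiB */

/* Toy bookkeeping standing in for mmu_lock and deferred TLB flushes. */
struct model {
	uint64_t flushed_upto; /* splits below this GPA have been flushed */
	int flushes;
	int unsafe_unlocks;    /* lock drops with unflushed splits pending */
};

static void flush_range(struct model *m, uint64_t end)
{
	m->flushed_upto = end;
	m->flushes++;
}

static void note_unlock(struct model *m, uint64_t split_upto)
{
	if (split_upto > m->flushed_upto)
		m->unsafe_unlocks++;
}

/* resched_every (> 0) forces a simulated lock drop every N PMDs. */
static void split_range(struct model *m, uint64_t start, uint64_t end,
			int resched_every, bool flush_before_unlock)
{
	uint64_t addr = start;
	int n = 0;

	m->flushed_upto = start;
	while (addr < end) {
		if (n > 0 && n % resched_every == 0) {
			if (flush_before_unlock)
				flush_range(m, addr); /* flush [start, addr) */
			note_unlock(m, addr);
			/* cond_resched() and the cache top-up would run here */
		}
		/* split the PMD at addr with flush = false */
		addr += PMD_SIZE;
		n++;
	}
	flush_range(m, end); /* final flush once the whole range is split */
	note_unlock(m, end);
}
```

With 8 PMDs and a lock drop every 3, the posted behaviour drops the lock
twice with unflushed splits, while flushing first avoids both at the cost
of two extra flushes.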

> +
> +			ret = kvm_mmu_topup_memory_cache(pcache, count);
> +			if (ret) {
> +				kvm_err("Failed to top up split page cache\n");
> +				spin_lock(&kvm->mmu_lock);
> +				return;
> +			}
> +			spin_lock(&kvm->mmu_lock);
> +		}
> +
> +		ret = kvm_riscv_gstage_split_huge(gstage, pcache, addr, 0, flush);

[Severity: High]
This is a pre-existing issue, but looking at kvm_riscv_gstage_split_huge()
which is called here, it appears to lack a memory barrier before publishing
the newly allocated stage-2 child page tables:

arch/riscv/kvm/gstage.c:kvm_riscv_gstage_split_huge() {
...
		for (i = 0; i < PTRS_PER_PTE; i++) {
			child_pte = make_child_pte(huge_pte, i, child_page_size);
			set_pte((pte_t *)&next_ptep[i], __pte(child_pte));
		}

		set_pte(ptep, pfn_pte(PFN_DOWN(__pa(next_ptep)),
				__pgprot(_PAGE_TABLE)));
...
}

Because RISC-V set_pte() translates to a simple WRITE_ONCE without an
implicit memory barrier, could the stores initializing the child PTEs be
reordered after the store that publishes the page table to ptep?

If so, a concurrent hardware page table walker traversing the newly
published ptep might read uninitialized or stale data from the child
page table, resulting in spurious stage-2 page faults. Would it make
sense to insert an smp_wmb() before publishing the new page table?
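In C11 terms, an smp_wmb() before the publishing set_pte() corresponds to a
release store that makes the child table visible only after all of its
entries are written. The sketch below is a userspace analogy only: child
entries are plain addresses rather than encoded PTEs, publish_child() and
read_child_entry() are hypothetical, and it models CPU-to-CPU ordering, not
the hardware page-table walker, whose ordering rules the RISC-V privileged
specification defines separately.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define PTRS_PER_PTE 512

/* Toy parent entry: points to the child table once it is published. */
static _Atomic(uint64_t *) parent;

/* Fill the child table, then publish it with release ordering, so an
 * observer that acquire-loads @parent also sees every child entry. */
static uint64_t *publish_child(uint64_t huge_base, uint64_t child_size)
{
	uint64_t *child = malloc(PTRS_PER_PTE * sizeof(*child));

	if (!child)
		return NULL;
	for (int i = 0; i < PTRS_PER_PTE; i++)
		child[i] = huge_base + (uint64_t)i * child_size;
	atomic_store_explicit(&parent, child, memory_order_release);
	return child;
}

/* Reader side: acquire load pairs with the release store above. */
static uint64_t read_child_entry(int i)
{
	uint64_t *child = atomic_load_explicit(&parent, memory_order_acquire);

	return child ? child[i] : 0;
}
```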

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260603175256408L0jnqGs1cJGc0ijCdujci@zte.com.cn?part=2


* [PATCH v2 3/4] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
From: wang.yechao255 @ 2026-06-03  9:58 UTC
  To: anup, atish.patra, pjw, palmer, aou, alex
  Cc: kvm, kvm-riscv, linux-riscv, linux-kernel, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Split huge pages in the range specified by KVM_CLEAR_DIRTY_LOG, and do
not split when enabling dirty logging if KVM_DIRTY_LOG_INITIALLY_SET is
set.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
---
 arch/riscv/kvm/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 51bd1db74e1a..d04680687f4a 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -149,6 +149,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_riscv_gstage_init(&gstage, kvm);

 	kvm_riscv_gstage_wp_range(&gstage, start, end);
+
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+		mmu_split_huge_pages(&gstage, start, end, true);
 }

 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
-- 
2.43.5


* Re: [PATCH v2 3/4] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG
From: sashiko-bot @ 2026-06-03 10:48 UTC
  To: wang.yechao255; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [Critical] Caching `gstage.pgd` across `mmu_lock` drops in `mmu_split_huge_pages` leads to a Use-After-Free if the page table is concurrently destroyed.
- [High] Eager huge page splitting skips PMDs when the address range is unaligned, leaving huge pages unsplit.

Pre-existing issues:
- [Critical] Non-atomic SPTE updates in `kvm_riscv_gstage_op_pte` clobber concurrent hardware Access/Dirty bit updates.
--

commit 80c4b9b5e74f99371bb99acaee95451275e9865f
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG

This commit splits huge pages on the range specified using
KVM_CLEAR_DIRTY_LOG. It also ensures that splitting does not occur when
enabling dirty logging if KVM_DIRTY_LOG_INITIALLY_SET is set.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 51bd1db74e1aa..d04680687f4a1 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -149,6 +149,9 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_riscv_gstage_init(&gstage, kvm);
>  
>  	kvm_riscv_gstage_wp_range(&gstage, start, end);

[Severity: Critical]
This is a pre-existing issue, but does kvm_riscv_gstage_wp_range()
non-atomically update PTEs, risking lost hardware A/D bit updates?

When kvm_riscv_gstage_wp_range() calls kvm_riscv_gstage_op_pte(), the
write-protection is done with a read-modify-write:

arch/riscv/kvm/gstage.c:kvm_riscv_gstage_op_pte() {
    ...
    else if (op == GSTAGE_OP_WP)
        set_pte(ptep, __pte(pte_val(ptep_get(ptep)) & ~_PAGE_WRITE));
    ...
}

If the hardware page-table walker sets the Dirty (D) bit after KVM reads the
PTE but before KVM writes it back, will the hardware's update be lost? This
could lead to silent data corruption during live migration since KVM might
miss guest memory writes.
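The lost-update window can be reproduced deterministically by injecting
the "hardware" update between the load and the store. Bit positions follow
the RISC-V privileged specification (W is bit 2, D is bit 7); wp_racy() and
wp_atomic() are hypothetical, and the second shows clearing W with a single
atomic AND so that concurrently set bits survive. Whether the race can occur
at all depends on A/D bits being updated by hardware (for example with
Svadu) rather than through faults.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* RISC-V PTE bits (privileged specification). */
#define PTE_W (1ULL << 2)
#define PTE_D (1ULL << 7)

/* Racy pattern from the review: load, modify, then plain store. A D-bit
 * update landing between the load and the store is overwritten. */
static void wp_racy(_Atomic uint64_t *pte, uint64_t concurrent_set)
{
	uint64_t old = atomic_load(pte);

	atomic_fetch_or(pte, concurrent_set); /* simulated update in the window */
	atomic_store(pte, old & ~PTE_W);
}

/* Alternative: a single atomic AND. Whether the simulated update lands
 * before or after it, the concurrently set bit is preserved. */
static void wp_atomic(_Atomic uint64_t *pte, uint64_t concurrent_set)
{
	atomic_fetch_or(pte, concurrent_set);
	atomic_fetch_and(pte, ~PTE_W);
}
```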

> +
> +	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> +		mmu_split_huge_pages(&gstage, start, end, true);

[Severity: Critical]
When calling mmu_split_huge_pages(), is it possible to encounter a
use-after-free if the page table is concurrently destroyed?

In mmu_split_huge_pages(), kvm->mmu_lock can be dropped:

arch/riscv/kvm/mmu.c:mmu_split_huge_pages() {
    ...
		if (need_topup_split_caches_or_resched(pcache, count)) {
			spin_unlock(&kvm->mmu_lock);
			cond_resched();
			...
			spin_lock(&kvm->mmu_lock);
		}

		ret = kvm_riscv_gstage_split_huge(gstage, pcache, addr, 0, flush);
    ...
}

While the lock is dropped, what happens if a concurrent
kvm_mmu_notifier_release() acquires mmu_lock and frees the PGD pages? When
mmu_split_huge_pages() reacquires the lock, it continues using the cached
gstage->pgd pointer to access the now-freed page table memory. Should the
KVM PGD be re-validated after re-acquiring the lock?
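One way to re-validate, sketched with hypothetical types (struct arch_state
and struct gstage_view stand in for kvm->arch and the on-stack struct
kvm_gstage): after re-taking mmu_lock, compare the cached root with the live
one and stop the walk if it was cleared or changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kvm->arch and struct kvm_gstage. */
struct arch_state {
	void *pgd;
};

struct gstage_view {
	struct arch_state *arch;
	void *pgd; /* root cached when the walk started */
};

/* Call after re-taking mmu_lock: continue only if the live root is
 * still the one the walk started with. */
static bool gstage_still_valid(const struct gstage_view *g)
{
	return g->arch->pgd != NULL && g->arch->pgd == g->pgd;
}
```

A pointer comparison cannot detect a root that was freed and reallocated at
the same address; a generation counter or an explicit reference would be
more robust if that can happen.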

[Severity: High]
Does this code skip splitting PMDs when the address range is unaligned?

Looking at the loop inside mmu_split_huge_pages():

arch/riscv/kvm/mmu.c:mmu_split_huge_pages() {
    ...
	phys_addr_t addr = start;
    ...
	while (addr < end) {
        ...
		addr += PMD_SIZE;
	}
}

If the starting address is not aligned to PMD_SIZE, addr retains this
unaligned offset. If the range [start, end) spans across a PMD_SIZE
boundary, adding PMD_SIZE to the unaligned addr could push it to or past
end prematurely. Will this cause the loop to terminate early and leave the
last huge page unsplit?
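The arithmetic can be checked with a range that starts 4 KiB below a 2 MiB
boundary and ends 4 KiB above it. In the sketch below, visits_as_posted()
reproduces the loop as written and visits_aligned() shows one possible fix
that steps to the next PMD boundary (ALIGN_DOWN(addr, PMD_SIZE) + PMD_SIZE
in kernel terms). Both function names are hypothetical, and 2 MiB PMDs over
4 KiB base pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SIZE (1ULL << 21) /* 2 MiB, assuming 4 KiB base pages */

/* Counts PMD-sized blocks visited by the loop as posted. */
static int visits_as_posted(uint64_t start, uint64_t end)
{
	int n = 0;

	for (uint64_t addr = start; addr < end; addr += PMD_SIZE)
		n++;
	return n;
}

/* Possible fix: after each block, continue at the next PMD boundary. */
static int visits_aligned(uint64_t start, uint64_t end)
{
	int n = 0;

	for (uint64_t addr = start; addr < end;
	     addr = (addr & ~(PMD_SIZE - 1)) + PMD_SIZE)
		n++;
	return n;
}
```

For [0x1ff000, 0x201000) the posted loop visits one PMD and skips the one
at 0x200000; the aligned variant visits both. For PMD-aligned ranges the two
behave identically.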

>  }
>  
>  void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260603175256408L0jnqGs1cJGc0ijCdujci@zte.com.cn?part=3


* [PATCH v2 4/4] RISC-V: KVM: Add the eager_page_split module parameter
From: wang.yechao255 @ 2026-06-03 10:00 UTC
  To: anup, atish.patra, pjw, palmer, aou, alex, linux-doc
  Cc: kvm, kvm-riscv, linux-riscv, linux-kernel, wang.yechao255

From: Wang Yechao <wang.yechao255@zte.com.cn>

Add an eager_page_split module parameter for RISC-V KVM, following
the same approach as on x86. This parameter controls whether eager
page splitting is enabled. The default value is on.

When eager page splitting is enabled, KVM proactively splits huge pages
into 4K mappings for dirty logging, either when logging is enabled on a
memslot or when KVM_CLEAR_DIRTY_LOG is issued for a range. Disabling it
can benefit VM workloads that rarely write, or that write only to a small
region of memory, because huge pages then remain intact for read accesses.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +++++--
 arch/riscv/kvm/mmu.c                            | 13 ++++++++++---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4d0f545fb3ec..d443b5313b79 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3059,7 +3059,7 @@ Kernel parameters
 			Default is 0 (don't ignore, but inject #GP)

 	kvm.eager_page_split=
-			[KVM,X86] Controls whether or not KVM will try to
+			[KVM,X86,RISCV] Controls whether or not KVM will try to
 			proactively split all huge pages during dirty logging.
 			Eager page splitting reduces interruptions to vCPU
 			execution by eliminating the write-protection faults
@@ -3079,7 +3079,10 @@ Kernel parameters
 			the KVM_CLEAR_DIRTY ioctl, and only for the pages being
 			cleared.

-			Eager page splitting is only supported when kvm.tdp_mmu=Y.
+			On x86, eager page splitting is only supported when
+			kvm.tdp_mmu=Y.
+
+			On RISCV, eager page splitting is supported by default.

 			Default is Y (on).

diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index d04680687f4a..7a910c5a61fe 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -16,6 +16,9 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_nacl.h>

+bool __read_mostly eager_page_split = true;
+module_param(eager_page_split, bool, 0644);
+
 static void mmu_wp_memory_region(struct kvm *kvm, int slot)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
@@ -150,8 +153,10 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,

 	kvm_riscv_gstage_wp_range(&gstage, start, end);

-	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
-		mmu_split_huge_pages(&gstage, start, end, true);
+	if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
+		if (READ_ONCE(eager_page_split))
+			mmu_split_huge_pages(&gstage, start, end, true);
+	}
 }

 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
@@ -216,7 +221,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
 			return;
 		mmu_wp_memory_region(kvm, new->id);
-		mmu_split_memory_region(kvm, new->id);
+
+		if (READ_ONCE(eager_page_split))
+			mmu_split_memory_region(kvm, new->id);
 	}
 }

-- 
2.43.5
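Taken together, patches 2 to 4 decide whether to split eagerly based on the
dirty-logging event, on whether KVM_DIRTY_LOG_INITIALLY_SET is in effect,
and on the new eager_page_split parameter. The function below is a
hypothetical summary of that logic: the enum names are invented, and
initially_set stands in for kvm_dirty_log_manual_protect_and_init_set().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two paths touched by this series. */
enum dirty_log_event {
	EV_ENABLE_LOGGING,  /* KVM_MEM_LOG_DIRTY_PAGES set on a memslot */
	EV_CLEAR_DIRTY_LOG, /* KVM_CLEAR_DIRTY_LOG on a chunk */
};

/* True when the series would split eagerly on this event. */
static bool should_split_eagerly(enum dirty_log_event ev, bool initially_set,
				 bool eager_page_split)
{
	if (!eager_page_split)
		return false;
	if (ev == EV_ENABLE_LOGGING)
		return !initially_set; /* split the whole memslot */
	return initially_set;          /* split only the cleared chunk */
}
```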

