All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: wang.yechao255@zte.com.cn
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH v2 2/4] RISC-V: KVM: Split huge pages when dirty logging is enabled
Date: Wed, 03 Jun 2026 10:34:01 +0000	[thread overview]
Message-ID: <20260603103403.265151F00893@smtp.kernel.org> (raw)
In-Reply-To: <202606031757028642jwiaL-QePBLGAhMW88zw@zte.com.cn>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Dropping mmu_lock without flushing remote TLBs after modifying page tables.

Pre-existing issues:
- [High] Missing memory barrier before publishing newly allocated stage-2 child page tables.
--

commit 0208ccce76fbb3c703dc5faa9c4873b1be06a0ce
Author: Wang Yechao <wang.yechao255@zte.com.cn>

RISC-V: KVM: Split huge pages when dirty logging is enabled

Split huge pages eagerly when enabling dirty logging. The goal is to
avoid doing it while faulting on write-protected pages, which
negatively impacts guest performance.

> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index 0676937bd9a1c..51bd1db74e1aa 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -90,6 +90,52 @@ void kvm_riscv_mmu_iounmap(struct kvm *kvm, gpa_t gpa, unsigned long size)
[ ... ]
> +/* the caller must held mmu lock */
> +static void mmu_split_huge_pages(struct kvm_gstage *gstage,
> +				 phys_addr_t start, phys_addr_t end,
> +				 bool flush)
> +{
> +	struct kvm *kvm = gstage->kvm;
> +	struct kvm_mmu_memory_cache *pcache = &kvm->arch.pgd_split_page_cache;
> +	int count = gstage->pgd_levels;
> +	phys_addr_t addr = start;
> +	int ret;
> +
> +	lockdep_assert_held(&kvm->mmu_lock);
> +
> +	while (addr < end) {
> +		if (need_topup_split_caches_or_resched(pcache, count)) {
> +			spin_unlock(&kvm->mmu_lock);
> +			cond_resched();

[Severity: High]
Does this drop the MMU lock while there are unflushed page table
modifications?

Because mmu_split_huge_pages() passes flush = false to
kvm_riscv_gstage_split_huge(), TLB flushes are deferred until
mmu_split_memory_region() completes. If cache top-up or rescheduling
is needed, the kvm->mmu_lock is dropped here without flushing the
remote TLBs for the pages that were split in previous loop iterations.

Could this allow other vCPUs to observe inconsistent stage-2 page
tables or execute with stale TLB mappings while the lock is dropped?
Perhaps a remote TLB flush should be issued for the already-processed
range before unlocking?

> +
> +			ret = kvm_mmu_topup_memory_cache(pcache, count);
> +			if (ret) {
> +				kvm_err("Failed to toup split page cache\n");
> +				spin_lock(&kvm->mmu_lock);
> +				return;
> +			}
> +			spin_lock(&kvm->mmu_lock);
> +		}
> +
> +		ret = kvm_riscv_gstage_split_huge(gstage, pcache, addr, 0, flush);

[Severity: High]
This is a pre-existing issue, but looking at kvm_riscv_gstage_split_huge()
which is called here, it appears to lack a memory barrier before publishing
the newly allocated stage-2 child page tables:

arch/riscv/kvm/gstage.c:kvm_riscv_gstage_split_huge() {
...
		for (i = 0; i < PTRS_PER_PTE; i++) {
			child_pte = make_child_pte(huge_pte, i, child_page_size);
			set_pte((pte_t *)&next_ptep[i], __pte(child_pte));
		}

		set_pte(ptep, pfn_pte(PFN_DOWN(__pa(next_ptep)),
				__pgprot(_PAGE_TABLE)));
...
}

Because RISC-V set_pte() translates to a simple WRITE_ONCE without an
implicit memory barrier, could the stores initializing the child PTEs be
reordered after the store that publishes the page table to ptep?

If so, a concurrent hardware page table walker traversing the newly
published ptep might read uninitialized or stale data from the child
page table, resulting in spurious stage-2 page faults. Would it make
sense to insert an smp_wmb() before publishing the new page table?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260603175256408L0jnqGs1cJGc0ijCdujci@zte.com.cn?part=2

  reply	other threads:[~2026-06-03 10:34 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-03  9:52 [PATCH v2 0/4] Implement Eager Page Splitting for RISC-V wang.yechao255
2026-06-03  9:52 ` wang.yechao255
2026-06-03  9:52 ` wang.yechao255
2026-06-03  9:54 ` [PATCH v2 1/4] RISC-V: KVM: Add the split page cache for ioctl context wang.yechao255
2026-06-03  9:54   ` wang.yechao255
2026-06-03  9:54   ` wang.yechao255
2026-06-03 10:20   ` sashiko-bot
2026-06-03  9:57 ` [PATCH v2 2/4] RISC-V: KVM: Split huge pages when dirty logging is enabled wang.yechao255
2026-06-03  9:57   ` wang.yechao255
2026-06-03  9:57   ` wang.yechao255
2026-06-03 10:34   ` sashiko-bot [this message]
2026-06-03  9:58 ` [PATCH v2 3/4] RISC-V: KVM: Split huge pages during KVM_CLEAR_DIRTY_LOG wang.yechao255
2026-06-03  9:58   ` wang.yechao255
2026-06-03  9:58   ` wang.yechao255
2026-06-03 10:48   ` sashiko-bot
2026-06-03 10:00 ` [PATCH v2 4/4] RISC-V: KVM: Add the eager_page_split module parameter wang.yechao255
2026-06-03 10:00   ` wang.yechao255
2026-06-03 10:00   ` wang.yechao255

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260603103403.265151F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=wang.yechao255@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.