Re: [PATCH v2 08/13] KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine

Kernel KVM virtualization development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Leonardo Bras" <leo.bras@arm.com>
Cc: kvmarm@lists.linux.dev, "Marc Zyngier" <maz@kernel.org>,
	"Oliver Upton" <oupton@kernel.org>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v2 08/13] KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine
Date: Mon, 29 Jun 2026 11:45:01 +0000	[thread overview]
Message-ID: <20260629114501.A6A801F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260629111820.1873540-9-leo.bras@arm.com>

Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:
- [High] Missing `isb()` context synchronization event after writing to the `HCR_EL2` control-plane system register.
- [High] Host kernel directly accesses EL2 system registers, which will crash nVHE configurations.
- [High] Incorrect indexing of the userspace-provided `bitmap` buffer leads to out-of-bounds memory reads.
- [High] Dropping `kvm->mmu_lock` before pending HACDBS entries are flushed creates a TOCTOU race window that permanently loses guest writes.
- [High] Off-by-one error in calculating `split_end` causes KVM to mistakenly skip huge page splitting for subsequent PMDs, leading to data loss.
- [High] Flawed return value logic in `__kvm_arch_dirty_log_clear()` incorrectly returns `-EAGAIN` on exact-multiple success or a positive integer on general success.
- [High] The `wfi()` loop waits indefinitely with `preempt_disable()` and `kvm->mmu_lock` held in write mode without masking local interrupts, risking lost wakeups and hard lockups.
- [High] Ignoring the return value of `kvm_mmu_split_huge_pages()` can permanently break dirty logging for huge pages.
- [High] Dereferencing `kvm->arch.mmu.pgt` without checking if it is NULL leads to a kernel panic.
- [Low] A variable declaration immediately follows a `case` label, resulting in a build failure.
--

commit 8c1efc438f0d50c036b9548bfa7efea74d90a446
Author: Leonardo Bras <leo.bras@arm.com>

KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine

This patch implements an arm64-specific kvm_arch_dirty_log_clear() using the
FEAT_HACDBS hardware feature. It converts dirty bitmap entries into HDBSS
structures and feeds them to the hardware to clean the page tables. It also
splits huge pages in chunks if manual protect and init set options are
enabled.

> diff --git a/arch/arm64/include/asm/kvm_dirty_bit.h b/arch/arm64/include/asm/kvm_dirty_bit.h
> --- a/arch/arm64/include/asm/kvm_dirty_bit.h
> +++ b/arch/arm64/include/asm/kvm_dirty_bit.h

[ ... ]

> +static inline bool kvm_arch_dirty_clear_enabled(struct kvm *kvm)
> +{
> +	return this_cpu_read(hacdbs_pcp.status) == HACDBS_IDLE &&
> +	       (kvm->arch.mmu.pgt->flags & KVM_PGTABLE_S2_DBM);
> +}

[Severity: High]
Can this cause a kernel panic if invoked before the MMU is fully initialized?
The stage-2 page tables (pgt) are allocated lazily, so if userspace calls
the clear dirty log ioctl before initialization, kvm->arch.mmu.pgt could be
NULL.

> diff --git a/arch/arm64/kvm/dirty_bit.c b/arch/arm64/kvm/dirty_bit.c
> --- a/arch/arm64/kvm/dirty_bit.c
> +++ b/arch/arm64/kvm/dirty_bit.c

[ ... ]

> +int __kvm_arch_dirty_log_clear(struct kvm *kvm,
> +			       struct kvm_memory_slot *memslot,
> +			       struct kvm_clear_dirty_log *log,
> +			       unsigned long *bitmap,
> +			       bool *flush)
> +{

[ ... ]

> +	for (unsigned long i = start; i < end; i++) {
> +		unsigned long mask;
> +		gfn_t offset;
> +		atomic_long_t *p;
> +
> +		if (log) { /* Clean only what is in the input bitmap */
> +			mask = bitmap[i];

[Severity: High]
Does this code read out of bounds of the userspace-provided bitmap?
When a dirty log is provided, the bitmap is sized relative to log->num_pages.
Since i is an absolute index into the guest's memory slot (starting at
log->first_page / BITS_PER_LONG), accessing bitmap[i] uses the absolute 
offset instead of a 0-based relative offset.

> +			if (!mask)
> +				continue;

[ ... ]

> +		offset = base_gfn + i * BITS_PER_LONG;
> +
> +		if (kvm_dirty_log_manual_protect_and_init_set(kvm))
> +			kvm_mmu_split_huge_pages(kvm,
> +						 gfn_to_gpa(offset + __ffs(mask)),
> +						 gfn_to_gpa(offset + __fls(mask) + 1));

[Severity: High]
Does this create a race condition that loses guest writes?
The software dirty bitmap is cleared at the start of this loop. If
kvm_mmu_split_huge_pages() is called here, it drops and reacquires
kvm->mmu_lock to top up the memory cache. Any guest write that occurs
while the lock is dropped is recorded in the hardware PTE, but because
the software bit was already cleared, the subsequent hardware execution
clears the PTE's dirty bit, erasing the concurrent write.

[Severity: High]
Can this permanently break dirty logging for huge pages if splitting fails?
The return value of kvm_mmu_split_huge_pages() is ignored. If it fails
(e.g., returning -ENOMEM), the block is not split, but the IPAs are
still fed to the hardware. The hardware will then clear the dirty bit on
the 2MB block descriptor, losing dirty tracking for the entire region.

[Severity: High]
Is there an off-by-one error in the split_end calculation?
Looking at the split_end assignment in the full context:

__kvm_arch_dirty_log_clear() {
    ...
    split_end = gpa_to_gfn(ALIGN_DOWN(end, PMD_SIZE) + PMD_SIZE - 1);
    ...
}

Since end is an exclusive upper bound, if the dirty region perfectly aligns
with a PMD boundary, this calculates a split_end that spans to the end of
the next PMD. This would cause the next PMD to skip splitting, leading to
data loss when the hardware clears the dirty bit on the unsplit block.

> +		do {
> +			idx = mask_to_hdbss(&mask, hw_entries, offset, ttwl, idx, entries_sz);
> +			if (idx >= entries_sz) {
> +				ret = dirty_bit_clear(kvm, hw_entries, idx);

[Severity: High]
Is there a missing context synchronization event in dirty_bit_clear()?
When this calls dirty_bit_clear(), that function writes to the HCR_EL2
control-plane system register and immediately calls __load_stage2():

dirty_bit_clear() {
    ...
    hcr_el2 = read_sysreg(HCR_EL2);
    write_sysreg(hcr_el2 | HCR_EL2_VM, HCR_EL2);
    __load_stage2(&kvm->arch.mmu);
    ...
}

The architecture requires an isb() as the very next instruction after
writing a control-plane system register. Without it, the CPU executes
subsequent instructions in an architecturally undefined pipeline state.

[Severity: High]
Does this code access EL2 system registers directly from the host kernel?
In nVHE configurations, the host kernel runs at EL1. The dirty_bit_clear()
function directly reads and writes HCR_EL2. This will trigger an Undefined 
Instruction exception and panic the host kernel on nVHE systems. KVM must 
use EL2-private hyp code or hypercalls to access EL2 state outside of VHE 
mode.

[Severity: High]
Does the wait loop in dirty_bit_clear() risk a lost wakeup and hard lockup?
The code disables preemption but fails to disable local interrupts before
calling wfi():

dirty_bit_clear() {
    preempt_disable();
    ...
    do {
        wfi();
    } while (this_cpu_read(hacdbs_pcp.status) == HACDBS_RUNNING);
    ...
}

If the hardware interrupt fires and sets the status to idle after the
condition is checked but before the wfi() instruction executes, the CPU
will sleep indefinitely.

> +				*flush = *flush || ret > 0;
> +				if (ret != idx) {
> +					/* Save bits not converted back to bitmap */
> +					atomic_long_or(mask, (atomic_long_t *)&dirty_bitmap[i]);
> +					goto out_err;
> +				}
> +				idx = 0;
> +			}
> +		} while (mask);
> +	}

[ ... ]

> +out_err:
> +	if (unlikely(ret != idx)) {
> +		/*
> +		 * In case there is an error and not all entries in HACDBS get
> +		 * cleaned, we have to mark the dirty bits back in the bitmap,
> +		 * as that will be used by the software routine.
> +		 *
> +		 * Entries should be in order, since they were extraxed from
> +		 * the dirty-bitmap, so batching the atomic writes is efficient.
> +		 */
> +
> +		if (ret < idx)
> +			hdbss_to_bitmap(hw_entries, ret, idx, dirty_bitmap, memslot->base_gfn);
> +
> +		ret = -EAGAIN;
> +	}
> +
> +	write_unlock(&kvm->mmu_lock);
> +	kfree(hw_entries);
> +
> +	return ret;
> +}

[Severity: High]
Does this error handling block incorrectly treat exact-multiple success states
as failures?
If the processed entries perfectly divide by entries_sz, the loop resets
idx = 0 and sets ret = entries_sz. The check if (unlikely(ret != idx))
then evaluates to true, falsely triggering this error path and returning
-EAGAIN.

Additionally, if it is not an exact multiple, the function exits with
ret = idx (a positive integer), which the generic KVM ioctl layer treats
as a failure code.

>  static irqreturn_t hacdbsirq_handler(int irq, void *pcpu)
>  {
>  	u64 cons = read_sysreg_s(SYS_HACDBSCONS_EL2);

[Severity: High]
Does this handler also directly access an EL2-only register from the host?
Just like dirty_bit_clear(), calling read_sysreg_s(SYS_HACDBSCONS_EL2)
from the host IRQ handler running at EL1 will crash nVHE configurations
with an Undefined Instruction exception.

[Severity: Low]
Is there a syntax error in this IRQ handler's switch statement?
Looking at the rest of hacdbsirq_handler():

hacdbsirq_handler() {
    ...
    case HACDBSCONS_EL2_ERR_REASON_IPAHACF:
        int index = ...;
    ...
}

A variable declaration immediately following a case label is a syntax
error in C, which will cause a build failure.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260629111820.1873540-1-leo.bras@arm.com?part=8

next prev parent reply	other threads:[~2026-06-29 11:45 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-29 11:17 [PATCH v2 00/13] KVM Dirty-bit cleaning hw accelerator (HACDBS) Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 01/13] KVM: arm64: HDBSS bits Leonardo Bras
2026-06-29 11:34   ` sashiko-bot
2026-06-29 12:57     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 02/13] KVM: arm64: Enable eager hugepage splitting if HDBSS is available Leonardo Bras
2026-06-29 11:36   ` sashiko-bot
2026-06-29 14:47     ` Leonardo Bras
2026-06-29 17:06       ` Oliver Upton
2026-06-30 12:58         ` Leonardo Bras
2026-06-30 15:44           ` Oliver Upton
2026-06-30 17:09             ` Leonardo Bras
2026-06-30 18:43               ` Oliver Upton
2026-06-29 11:17 ` [PATCH v2 03/13] arm64/cpufeature: Add system-wide FEAT_HACDBS detection Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 04/13] arm64/sysreg: Add HACDBS consumer and base registers Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 05/13] KVM: arm64: Detect (via ACPI) and initialize HACDBSIRQ Leonardo Bras
2026-06-29 11:32   ` sashiko-bot
2026-06-29 15:43     ` Leonardo Bras
2026-06-29 16:52       ` Vladimir Murzin
2026-06-30 14:52         ` Leonardo Bras
2026-06-29 17:22   ` Oliver Upton
2026-06-30 14:50     ` Leonardo Bras
2026-06-30 16:03       ` Oliver Upton
2026-06-30 17:19         ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 06/13] KVM: arm64: dirty_bit: Add base FEAT_HACDBS cleaning routine Leonardo Bras
2026-06-29 11:29   ` sashiko-bot
2026-06-29 15:54     ` Leonardo Bras
2026-06-29 17:36   ` Oliver Upton
2026-06-30 14:59     ` Leonardo Bras
2026-06-30 19:06       ` Oliver Upton
2026-06-29 11:17 ` [PATCH v2 07/13] kvm: Add arch-generic interface for hw-accelerated dirty-bitmap cleaning Leonardo Bras
2026-06-29 11:38   ` sashiko-bot
2026-06-29 16:07     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 08/13] KVM: arm64: Add hardware-accelerated dirty-bitmap cleaning routine Leonardo Bras
2026-06-29 11:45   ` sashiko-bot [this message]
2026-06-29 16:49     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 09/13] KVM: arm64: Dirty-bitmap: avoid splitting previously split blocks Leonardo Bras
2026-06-29 11:39   ` sashiko-bot
2026-06-29 17:07     ` Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 10/13] kvm/dirty_ring: Introduce get_memslot and move helpers to header Leonardo Bras
2026-06-29 11:17 ` [PATCH v2 11/13] kvm/dirty_ring: Add arch-generic interface for hw-accelerated dirty-ring cleaning Leonardo Bras
2026-06-29 11:49   ` sashiko-bot
2026-06-29 17:09     ` Leonardo Bras
2026-06-29 11:18 ` [PATCH v2 12/13] KVM: arm64: Add hardware-accelerated dirty-ring cleaning routine Leonardo Bras
2026-06-29 11:49   ` sashiko-bot
2026-06-29 17:26     ` Leonardo Bras
2026-06-29 11:18 ` [PATCH v2 13/13] KVM: arm64: Enable KVM_HW_DIRTY_BIT Leonardo Bras
2026-06-29 11:52   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260629114501.A6A801F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=leo.bras@arm.com \
    --cc=maz@kernel.org \
    --cc=oupton@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox