* [PATCH] KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
       [not found] <agFyH8UoHO5Q3Fta@1wt.eu>
@ 2026-05-12  6:07 ` Aaron Sacks
  2026-05-12 19:56   ` Paolo Bonzini
  0 siblings, 1 reply; 2+ messages in thread
From: Aaron Sacks @ 2026-05-12  6:07 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: security, Paolo Bonzini, kvm

kvm_reset_dirty_gfn() guards the gfn range with

	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
		return;

but offset is u64 and the addition is unchecked. The check can be
silently bypassed by a u64 wrap.

The dirty ring backing those entries is MAP_SHARED at
KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
slot and offset fields of any entry between when the kernel pushes
them and when KVM_RESET_DIRTY_RINGS consumes them. On reset,
kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
them straight back into this check; only the flags handshake is
treated as the handover, the slot/offset payload is taken on trust.

Crafting two entries

	entry[i].offset   = 0xffffffffffffffc1
	entry[i+1].offset = 0

makes the coalescing loop in kvm_dirty_ring_reset() compute

	delta = (s64)(0 - 0xffffffffffffffc1) = 63

which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
existing mask by setting bit 63. The trailing kvm_reset_dirty_gfn()
call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
the sum is 0 in u64 and the bounds check passes.

That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
unchanged. On the legacy MMU path -- kvm_memslots_have_rmaps() ==
true, i.e. shadow paging, any VM that has allocated shadow roots, or
a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
slot->arch.rmap[0][] with a near-U64_MAX gfn. That is an
out-of-bounds load of a kvm_rmap_head, followed by a conditional
clear of PT_WRITABLE_MASK in whatever the loaded pointer points at.
The path is reachable from any process holding /dev/kvm.

Range-check offset on its own first, so the addition cannot wrap.
memslot->npages is bounded well below U64_MAX, so once offset <
npages holds, offset + __fls(mask) (with __fls(mask) < BITS_PER_LONG)
stays in range.

Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Sacks <contact@xchglabs.com>
---
Hi Willy,

Thanks for the review. Re-sending this one as a proper patch and
addressing your points below. I will reply separately to the
other three reports per your guidance there.

> Please could you check on an up-to-date kernel? 6.13.7 is long
> dead (1 year) and thousands of fixes were merged since.

Re-verified on v7.0.6 (current stable). The buggy bounds check
in virt/kvm/dirty_ring.c is unchanged:

	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
		return;

Same PoC still produces the oops:

	Comm: poc Not tainted 7.0.6 #1
	BUG: unable to handle page fault for address: ffffa9bbc002ce08
	RIP: 0010:rmap_write_protect+0x6/0xf0
	RAX: ffffffffffffffc1 RBX: 8000000000000001
	Call Trace:
	 kvm_arch_mmu_enable_log_dirty_pt_masked+0x145/0x210
	 kvm_reset_dirty_gfn+0xcd/0x100
	 kvm_dirty_ring_reset+0x12c/0x1f0
	 kvm_vm_ioctl+0xb41/0x10b0
	Kernel panic - not syncing: Fatal exception

RAX is the crafted wrapped offset; RBX is the coalesced mask with
bit 63 set (__fls == 63). Both match the values planted in the
ring entries from userspace, so the wrapped sum still passes the
existing bounds check on v7.0.6.

> Please also wrap long lines

Done -- this reply and the patch are at <=72 cols.

> Was all of this generated by an LLM?

The reports were generated by an LLM, yes, but the analysis, PoC
and patch suggestions are mine. Happy to provide the PoC source,
kernel .config, or additional artifacts.

> Could you please turn this one into a real patch

Patch follows. Applies clean on v7.0.6; with it applied, the
same trigger no longer oopses (offset is rejected by the added
offset >= npages check before the addition can wrap).
Thanks,
Aaron

 virt/kvm/dirty_ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 02bc6b00d..572b854ed 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -63,7 +63,8 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
 
-	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
+	if (!memslot || offset >= memslot->npages ||
+	    offset + __fls(mask) >= memslot->npages)
 		return;
 
 	KVM_MMU_LOCK(kvm);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 2+ messages in thread
* Re: [PATCH] KVM: Reject wrapped offset in kvm_reset_dirty_gfn()
  2026-05-12  6:07 ` [PATCH] KVM: Reject wrapped offset in kvm_reset_dirty_gfn() Aaron Sacks
@ 2026-05-12 19:56   ` Paolo Bonzini
  0 siblings, 0 replies; 2+ messages in thread
From: Paolo Bonzini @ 2026-05-12 19:56 UTC (permalink / raw)
  To: Aaron Sacks, Willy Tarreau; +Cc: security, kvm

On 5/12/26 08:07, Aaron Sacks wrote:
> kvm_reset_dirty_gfn() guards the gfn range with
>
> 	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
> 		return;
>
> but offset is u64 and the addition is unchecked. The check can be
> silently bypassed by a u64 wrap.
>
> The dirty ring backing those entries is MAP_SHARED at
> KVM_DIRTY_LOG_PAGE_OFFSET of the vcpu fd, so the VMM can rewrite the
> slot and offset fields of any entry between when the kernel pushes
> them and when KVM_RESET_DIRTY_RINGS consumes them. On reset,
> kvm_dirty_ring_reset() re-reads the values via READ_ONCE() and feeds
> them straight back into this check; only the flags handshake is
> treated as the handover, the slot/offset payload is taken on trust.
>
> Crafting two entries
>
> 	entry[i].offset   = 0xffffffffffffffc1
> 	entry[i+1].offset = 0
>
> makes the coalescing loop in kvm_dirty_ring_reset() compute
>
> 	delta = (s64)(0 - 0xffffffffffffffc1) = 63
>
> which falls in [0, BITS_PER_LONG), so it folds entry[i+1] into the
> existing mask by setting bit 63. The trailing kvm_reset_dirty_gfn()
> call then sees offset = 0xffffffffffffffc1 and __fls(mask) = 63;
> the sum is 0 in u64 and the bounds check passes.
>
> That offset propagates into kvm_arch_mmu_enable_log_dirty_pt_masked()
> unchanged. On the legacy MMU path -- kvm_memslots_have_rmaps() ==
> true, i.e. shadow paging, any VM that has allocated shadow roots, or
> a write-tracked slot -- it reaches gfn_to_rmap(), which indexes
> slot->arch.rmap[0][] with a near-U64_MAX gfn.
> That is an out-of-bounds load of a kvm_rmap_head, followed by a
> conditional clear of PT_WRITABLE_MASK in whatever the loaded
> pointer points at. The path is reachable from any process holding
> /dev/kvm.
>
> Range-check offset on its own first, so the addition cannot wrap.
> memslot->npages is bounded well below U64_MAX, so once offset <
> npages holds, offset + __fls(mask) (with __fls(mask) < BITS_PER_LONG)
> stays in range.
>
> Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
> Cc: stable@vger.kernel.org
> Signed-off-by: Aaron Sacks <contact@xchglabs.com>
> ---
> Hi Willy,
>
> Thanks for the review. Re-sending this one as a proper patch and
> addressing your points below. I will reply separately to the
> other three reports per your guidance there.
>
>> Please could you check on an up-to-date kernel? 6.13.7 is long
>> dead (1 year) and thousands of fixes were merged since.
>
> Re-verified on v7.0.6 (current stable). The buggy bounds check
> in virt/kvm/dirty_ring.c is unchanged:
>
> 	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
> 		return;
>
> Same PoC still produces the oops:
>
> 	Comm: poc Not tainted 7.0.6 #1
> 	BUG: unable to handle page fault for address: ffffa9bbc002ce08
> 	RIP: 0010:rmap_write_protect+0x6/0xf0
> 	RAX: ffffffffffffffc1 RBX: 8000000000000001
> 	Call Trace:
> 	 kvm_arch_mmu_enable_log_dirty_pt_masked+0x145/0x210
> 	 kvm_reset_dirty_gfn+0xcd/0x100
> 	 kvm_dirty_ring_reset+0x12c/0x1f0
> 	 kvm_vm_ioctl+0xb41/0x10b0
> 	Kernel panic - not syncing: Fatal exception
>
> RAX is the crafted wrapped offset; RBX is the coalesced mask with
> bit 63 set (__fls == 63). Both match the values planted in the
> ring entries from userspace, so the wrapped sum still passes the
> existing bounds check on v7.0.6.
>
>> Please also wrap long lines
>
> Done -- this reply and the patch are at <=72 cols.
>
>> Was all of this generated by an LLM?
>
> The reports were generated by an LLM yes.
> But the analysis, PoC and patch suggestions are mine. Happy to
> provide the PoC source, kernel .config, or additional artifacts.
>
>> Could you please turn this one into a real patch
>
> Patch follows. Applies clean on v7.0.6; with it applied, the
> same trigger no longer oopses (offset is rejected by the added
> offset >= npages check before the addition can wrap).

Applied, thanks.

Paolo

^ permalink raw reply	[flat|nested] 2+ messages in thread