public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0)
@ 2026-02-02  1:24 Zhangjiaji
  2026-02-06 23:30 ` Sean Christopherson
  2026-02-10  6:26 ` Paolo Bonzini
  0 siblings, 2 replies; 6+ messages in thread
From: Zhangjiaji @ 2026-02-02  1:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: Wangqinxiao (Tom), zhangyashu, wangyanan (Y)

Syzkaller hit a 'KASAN: use-after-free Read in complete_emulated_mmio' bug.

==================================================================
BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420
Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984

CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0xbe/0xfd
print_address_description.constprop.0+0x19/0x170
__kasan_report.cold+0x6c/0x84
kasan_report+0x3a/0x50
check_memory_region+0xfd/0x1f0
memcpy+0x20/0x60
complete_emulated_mmio+0x305/0x420
kvm_arch_vcpu_ioctl_run+0x63f/0x6d0
kvm_vcpu_ioctl+0x413/0xb20
__se_sys_ioctl+0x111/0x160
do_syscall_64+0x30/0x40
entry_SYSCALL_64_after_hwframe+0x67/0xd1
RIP: 0033:0x42477d
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c
R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720

The buggy address belongs to the page:
page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                 ^
ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


Syzkaller reproducer:
# {Threaded:true Repeat:true RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: SandboxArg:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false Swap:false UseTmpDir:false HandleSegv:true Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}}
r0 = openat$kvm(0xffffffffffffff9c, &(0x7f00000001c0), 0x0, 0x0)
r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0)
syz_kvm_setup_cpu$x86(r1, r2, &(0x7f0000fe2000/0x18000)=nil, &(0x7f0000000080)=[@text32={0x20, &(0x7f0000000000)="44c8df2020c020c003000000000f22c0671e26653e360f2224660f65b600000000b9e0450200f5e8f5e8f30f1ed6c744240000100000c744240200000000c7442406000000000f011424eacf5700000301b8010000000f01c1", 0x59}], 0x1, 0x27, 0x0, 0x1)
ioctl$KVM_RUN(r2, 0xae80, 0x0)
ioctl$KVM_SMI(0xffffffffffffffff, 0xaeb7) (async)
ioctl$KVM_RUN(r2, 0xae80, 0x0)


----------------------------
Hi,

I've analyzed the Syzkaller output and the complete_emulated_mmio() code path.
The buggy address originates in em_enter(), which passes a pointer to its local variable `ulong rbp` to emulate_push(); the pointer eventually ends up in emulator_read_write_onepage(), which stores it in vcpu->mmio_fragments[].data.
The bug happens when the KVM guest executes an "enter" instruction while the top of the stack crosses a page boundary.
In that case em_enter() cannot complete the instruction by itself and leaves the remaining data (the part in the other page) to complete_emulated_mmio().
By the time complete_emulated_mmio() runs, em_enter() has returned, so the local variable `ulong rbp` no longer exists.
When complete_emulated_mmio() then accesses vcpu->mmio_fragments[].data, the use-after-free is triggered.
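To make the lifetime problem concrete, here is a minimal user-space sketch of the pattern (illustrative names only, not actual KVM code): a descriptor that outlives the function which filled it must store the value itself, never a pointer into that function's stack frame.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal illustration (not KVM code): a fragment descriptor that
 * outlives the function which filled it, mirroring how
 * vcpu->mmio_fragments[].data outlives em_enter()'s stack frame. */
struct frag {
	unsigned long *data;	/* dangling if it pointed at a stack local */
	unsigned long  val;	/* safe: value copied inline */
};

/* Buggy pattern: the descriptor keeps a pointer to `rbp_local`, which
 * is dead as soon as this function returns. */
static void stash_pointer(struct frag *f, unsigned long rbp_local)
{
	f->data = &rbp_local;	/* use-after-free once we return */
}

/* Safe pattern: copy the value into storage that lives as long as the
 * descriptor itself. */
static void stash_value(struct frag *f, unsigned long rbp_local)
{
	f->data = NULL;
	f->val  = rbp_local;	/* value survives the caller's return */
}

/* Completion runs long after the stashing function has returned. */
static unsigned long complete_later(const struct frag *f)
{
	return f->data ? *f->data : f->val;
}
```

The safe variant trades one pointer indirection for an 8-byte copy, which is why it only works for fragments no larger than the inline slot.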

Any ideas?

--
Best regards,
Yashu Zhang




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0)
  2026-02-02  1:24 [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) Zhangjiaji
@ 2026-02-06 23:30 ` Sean Christopherson
  2026-02-09 20:21   ` Sean Christopherson
  2026-02-10  6:26 ` Paolo Bonzini
  1 sibling, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2026-02-06 23:30 UTC (permalink / raw)
  To: Zhangjiaji
  Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wangqinxiao (Tom), zhangyashu, wangyanan (Y)

On Mon, Feb 02, 2026, Zhangjiaji wrote:
> Syzkaller hit a 'KASAN: use-after-free Read in complete_emulated_mmio' bug.
> 
> ==================================================================
> BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420
> Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984
> 
> CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> Call Trace:
> dump_stack+0xbe/0xfd
> print_address_description.constprop.0+0x19/0x170
> __kasan_report.cold+0x6c/0x84
> kasan_report+0x3a/0x50
> check_memory_region+0xfd/0x1f0
> memcpy+0x20/0x60
> complete_emulated_mmio+0x305/0x420
> kvm_arch_vcpu_ioctl_run+0x63f/0x6d0
> kvm_vcpu_ioctl+0x413/0xb20
> __se_sys_ioctl+0x111/0x160
> do_syscall_64+0x30/0x40
> entry_SYSCALL_64_after_hwframe+0x67/0xd1
> RIP: 0033:0x42477d
> Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
> RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c
> R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720
> 
> The buggy address belongs to the page:
> page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37
> flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
> raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000
> raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
> ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>                                                  ^
> ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==================================================================
> 
> 
> Syzkaller reproducer:
> # {Threaded:true Repeat:true RepeatTimes:0 Procs:1 Slowdown:1 Sandbox: SandboxArg:0 Leak:false NetInjection:false NetDevices:false NetReset:false Cgroups:false BinfmtMisc:false CloseFDs:false KCSAN:false DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false Swap:false UseTmpDir:false HandleSegv:true Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}}
> r0 = openat$kvm(0xffffffffffffff9c, &(0x7f00000001c0), 0x0, 0x0)
> r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
> r2 = ioctl$KVM_CREATE_VCPU(r1, 0xae41, 0x0)
> syz_kvm_setup_cpu$x86(r1, r2, &(0x7f0000fe2000/0x18000)=nil, &(0x7f0000000080)=[@text32={0x20, &(0x7f0000000000)="44c8df2020c020c003000000000f22c0671e26653e360f2224660f65b600000000b9e0450200f5e8f5e8f30f1ed6c744240000100000c744240200000000c7442406000000000f011424eacf5700000301b8010000000f01c1", 0x59}], 0x1, 0x27, 0x0, 0x1)
> ioctl$KVM_RUN(r2, 0xae80, 0x0)
> ioctl$KVM_SMI(0xffffffffffffffff, 0xaeb7) (async)
> ioctl$KVM_RUN(r2, 0xae80, 0x0)
> 
> 
> ----------------------------
> Hi,
> 
> I've analyzed the Syzkaller output and the complete_emulated_mmio() code
> path.  The buggy address originates in em_enter(), which passes a pointer to
> its local variable `ulong rbp` to emulate_push(); the pointer eventually
> ends up in emulator_read_write_onepage(), which stores it in
> vcpu->mmio_fragments[].data.  The bug happens when the KVM guest executes an
> "enter" instruction while the top of the stack crosses a page boundary.  In
> that case em_enter() cannot complete the instruction by itself and leaves
> the remaining data (the part in the other page) to complete_emulated_mmio().
> By the time complete_emulated_mmio() runs, em_enter() has returned, so the
> local variable `ulong rbp` no longer exists.  When complete_emulated_mmio()
> then accesses vcpu->mmio_fragments[].data, the use-after-free is triggered.
> 
> Any ideas?

Egad, sorry!  I had reproduced this shortly after you sent the report and prepped
a fix, but got distracted and lost this in my inbox.

Can you test this on your end?  I repro'd by modifying a KVM-Unit-Test and hacking
KVM to tweak the stack, so I haven't confirmed the syzkaller version.

It's a bit gross, as it abuses an unused field as scratch space, but AFAICT that's
"fine".  The alternative would be to add a dedicated field, which seems like overkill?

I'm also going to try and add a WARN to detect if the @val parameter passed to
emulator_read_write() is ever on the kernel stack, e.g. to help detect lurking
bugs like this one without relying on KASAN.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c8e292e9a24d..dacef51c2565 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1897,13 +1897,12 @@ static int em_enter(struct x86_emulate_ctxt *ctxt)
        int rc;
        unsigned frame_size = ctxt->src.val;
        unsigned nesting_level = ctxt->src2.val & 31;
-       ulong rbp;
 
        if (nesting_level)
                return X86EMUL_UNHANDLEABLE;
 
-       rbp = reg_read(ctxt, VCPU_REGS_RBP);
-       rc = emulate_push(ctxt, &rbp, stack_size(ctxt));
+       ctxt->memop.orig_val = reg_read(ctxt, VCPU_REGS_RBP);
+       rc = emulate_push(ctxt, &ctxt->memop.orig_val, stack_size(ctxt));
        if (rc != X86EMUL_CONTINUE)
                return rc;
        assign_masked(reg_rmw(ctxt, VCPU_REGS_RBP), reg_read(ctxt, VCPU_REGS_RSP),

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0)
  2026-02-06 23:30 ` Sean Christopherson
@ 2026-02-09 20:21   ` Sean Christopherson
  0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2026-02-09 20:21 UTC (permalink / raw)
  To: Zhangjiaji
  Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wangqinxiao (Tom), zhangyashu, wangyanan (Y)

On Fri, Feb 06, 2026, Sean Christopherson wrote:
> On Mon, Feb 02, 2026, Zhangjiaji wrote:
> > Syzkaller hit a 'KASAN: use-after-free Read in complete_emulated_mmio' bug.
> > 
> > ==================================================================
> > BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420
> > Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984
> > 
> > CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> > dump_stack+0xbe/0xfd
> > print_address_description.constprop.0+0x19/0x170
> > __kasan_report.cold+0x6c/0x84
> > kasan_report+0x3a/0x50
> > check_memory_region+0xfd/0x1f0
> > memcpy+0x20/0x60
> > complete_emulated_mmio+0x305/0x420
> > kvm_arch_vcpu_ioctl_run+0x63f/0x6d0
> > kvm_vcpu_ioctl+0x413/0xb20
> > __se_sys_ioctl+0x111/0x160
> > do_syscall_64+0x30/0x40
> > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > RIP: 0033:0x42477d

...

> > I've analyzed the Syzkaller output and the complete_emulated_mmio() code
> > path.  The buggy address originates in em_enter(), which passes a pointer
> > to its local variable `ulong rbp` to emulate_push(); the pointer
> > eventually ends up in emulator_read_write_onepage(), which stores it in
> > vcpu->mmio_fragments[].data.  The bug happens when the KVM guest executes
> > an "enter" instruction while the top of the stack crosses a page
> > boundary.  In that case em_enter() cannot complete the instruction by
> > itself and leaves the remaining data (the part in the other page) to
> > complete_emulated_mmio().  By the time complete_emulated_mmio() runs,
> > em_enter() has returned, so the local variable `ulong rbp` no longer
> > exists.  When complete_emulated_mmio() then accesses
> > vcpu->mmio_fragments[].data, the use-after-free is triggered.
> > 
> > Any ideas?
> 
> Egad, sorry!  I had reproduced this shortly after you sent the report and prepped
> a fix, but got distracted and lost this in my inbox.
> 
> Can you test this on your end?  I repro'd by modifying a KVM-Unit-Test and hacking
> KVM to tweak the stack, so I haven't confirmed the syzkaller version.
> 
> It's a bit gross, as it abuses an unused field as scratch space, but AFAICT that's
> "fine".  The alternative would be to add a dedicated field, which seems like overkill?
> 
> I'm also going to try and add a WARN to detect if the @val parameter passed to
> emulator_read_write() is ever on the kernel stack, e.g. to help detect lurking
> bugs like this one without relying on KASAN.
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index c8e292e9a24d..dacef51c2565 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1897,13 +1897,12 @@ static int em_enter(struct x86_emulate_ctxt *ctxt)
>         int rc;
>         unsigned frame_size = ctxt->src.val;
>         unsigned nesting_level = ctxt->src2.val & 31;
> -       ulong rbp;
>  
>         if (nesting_level)
>                 return X86EMUL_UNHANDLEABLE;
>  
> -       rbp = reg_read(ctxt, VCPU_REGS_RBP);
> -       rc = emulate_push(ctxt, &rbp, stack_size(ctxt));
> +       ctxt->memop.orig_val = reg_read(ctxt, VCPU_REGS_RBP);
> +       rc = emulate_push(ctxt, &ctxt->memop.orig_val, stack_size(ctxt));
>         if (rc != X86EMUL_CONTINUE)
>                 return rc;
>         assign_masked(reg_rmw(ctxt, VCPU_REGS_RBP), reg_read(ctxt, VCPU_REGS_RSP),

*sigh*

This isn't going to work.  Or rather, it's far from a complete fix, as there are
several other instructions and flows that use stack variables to service reads
and writes.  Hacking all of them to not use stack variables isn't feasible, as
flows like emulate_iret_real() and em_popa() perform multiple accesses, i.e.
hijacking an unused operand simply won't work.
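A toy sketch of why a single shared scratch slot cannot back several queued accesses (illustrative user-space C with hypothetical names, not KVM code): if every fragment aliases the same scratch field, each new access clobbers the previous one's backing storage before completion runs, whereas per-fragment inline storage preserves each value.

```c
#include <assert.h>
#include <stddef.h>

#define NR_FRAGS 4

struct frag {
	unsigned long *data;	/* shared-scratch approach: aliases collide */
	unsigned long  val;	/* per-fragment storage: values survive */
};

/* Queue NR_FRAGS writes through one shared scratch slot -- the
 * em_popa()-style multi-access case that breaks operand hijacking. */
static void queue_via_shared_scratch(struct frag *frags, unsigned long *scratch)
{
	for (int i = 0; i < NR_FRAGS; i++) {
		*scratch = (unsigned long)i;	/* clobbers the previous value */
		frags[i].data = scratch;
	}
}

/* Queue the same writes with the value copied into each fragment. */
static void queue_via_inline_val(struct frag *frags)
{
	for (int i = 0; i < NR_FRAGS; i++) {
		frags[i].data = NULL;
		frags[i].val = (unsigned long)i;
	}
}

static unsigned long frag_value(const struct frag *f)
{
	return f->data ? *f->data : f->val;
}
```

With the shared slot, every fragment reads back the last value written; with inline storage, each fragment keeps its own.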

I'm tempted to go with a straightforward "fix" of rejecting userspace MMIO if
the destination is on the stack, e.g. like so:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..113287612acd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8272,6 +8272,9 @@ static int emulator_read_write(struct x86_emulate_ctxt *ctxt,
        if (!vcpu->mmio_nr_fragments)
                return X86EMUL_CONTINUE;
 
+       if (object_is_on_stack(val))
+               return X86EMUL_UNHANDLEABLE;
+
        gpa = vcpu->mmio_fragments[0].gpa;
 
        vcpu->mmio_needed = 1;

But I hate how arbitrary that is, and I'm somewhat concerned that it could break
existing setups.

I think there's a not-completely-awful solution buried in this gigantic cesspool.
The only time KVM uses on-stack variables is for qword or smaller accesses, i.e.
8 bytes in size or less.  For larger fragments, e.g. AVX to/from MMIO, the target
value will always be an operand in the emulator context.  And so, rather than
disallowing stack variables, for "small" fragments we can rework the handling to
copy the value to/from each fragment on demand instead of stashing a pointer to
the value.

This needs a _lot_ more documentation, the SEV-ES flows aren't converted, and
there is a ton of cleanup that can and should be done on top, but this appears
to work.

Note the "clobber" in complete_emulated_mmio() to fill 4096 bytes of the stack
with 0xaa, i.e. to simulate some of the stack being "freed".
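The 0xaa clobber trick generalizes beyond KVM: fill a scratch region with a recognizable poison pattern so that any stale read of "dead" stack data returns obviously bogus bytes instead of silently appearing to work. A minimal user-space sketch of the idea (illustrative names, not kernel code):

```c
#include <assert.h>
#include <string.h>

/* Byte pattern used to poison the simulated "freed" stack window.
 * KASAN does this kind of poisoning systematically; this is a manual,
 * dependency-free version of the same idea. */
#define POISON 0xaa

/* Overwrite a scratch buffer to simulate the old stack frame being
 * "freed", so a lingering pointer into it reads back poison. */
static void clobber_stack_window(unsigned char *buf, size_t len)
{
	memset(buf, POISON, len);
}

/* Returns 1 if the byte looks like it came from a clobbered frame. */
static int looks_poisoned(unsigned char byte)
{
	return byte == POISON;
}
```

Any value later read through a stale pointer into the clobbered window shows up as a run of 0xaa bytes, which is easy to spot in a dump or a WARN.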

---
 arch/x86/kvm/kvm_emulate.h |   8 +++
 arch/x86/kvm/x86.c         | 125 +++++++++++++++++++++++++++----------
 include/linux/kvm_host.h   |   6 +-
 3 files changed, 103 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index fb3dab4b5a53..f735158af05e 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -284,6 +284,14 @@ struct fetch_cache {
 	u8 *end;
 };
 
+/*
+ * To complete userspace MMIO and I/O reads, KVM re-emulates (NO_DECODE) the
+ * _entire_ instruction to propagate the read data to its final destination.
+ * To avoid re-reading values from memory and/or getting "stuck" on the access
+ * that triggered an exit to userspace, KVM caches all values that have been
+ * read for a given instruction, and reads from this cache instead of reading
+ * from guest memory or from userspace.
+ */
 struct read_cache {
 	u8 data[1024];
 	unsigned long pos;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..1b99de3e3236 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8109,7 +8109,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 }
 
 struct read_write_emulator_ops {
-	int (*read_write_prepare)(struct kvm_vcpu *vcpu, void *val,
+	int (*read_mmio_fragment)(struct kvm_vcpu *vcpu, void *val,
 				  int bytes);
 	int (*read_write_emulate)(struct kvm_vcpu *vcpu, gpa_t gpa,
 				  void *val, int bytes);
@@ -8120,16 +8120,37 @@ struct read_write_emulator_ops {
 	bool write;
 };
 
-static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
+static int read_mmio_fragment(struct kvm_vcpu *vcpu, void *val, int bytes)
 {
-	if (vcpu->mmio_read_completed) {
-		trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes,
-			       vcpu->mmio_fragments[0].gpa, val);
-		vcpu->mmio_read_completed = 0;
-		return 1;
+	struct kvm_mmio_fragment *frag;
+	unsigned int len = bytes;
+
+	while (len && vcpu->mmio_head_fragment < vcpu->mmio_tail_fragment) {
+		int i = vcpu->mmio_head_fragment++;
+
+		if (WARN_ON_ONCE(i >= vcpu->mmio_nr_fragments))
+			break;
+
+		frag = &vcpu->mmio_fragments[i];
+		if (!frag->data) {
+			if (WARN_ON_ONCE(len < frag->len || frag->len > 8u)) {
+				pr_warn("len = %u, bytes = %u, frag->len = %u, gpa = %llx, rip = %lx\n",
+					len, bytes, frag->len, frag->gpa, kvm_rip_read(vcpu));
+				break;
+			}
+
+			memcpy(val, &frag->val, min(8u, frag->len));
+		}
+
+		val += frag->len;
+		len -= min(len, frag->len);
 	}
 
-	return 0;
+	if ((int)len == bytes)
+		return 0;
+
+	trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, frag->gpa, val);
+	return 1;
 }
 
 static int read_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -8150,6 +8171,11 @@ static int write_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, int bytes, void *val)
 	return vcpu_mmio_write(vcpu, gpa, bytes, val);
 }
 
+static void *mmio_frag_data(struct kvm_mmio_fragment *frag)
+{
+	return frag->data ?: &frag->val;
+}
+
 static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
 			  void *val, int bytes)
 {
@@ -8162,12 +8188,12 @@ static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
 {
 	struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
 
-	memcpy(vcpu->run->mmio.data, frag->data, min(8u, frag->len));
+	memcpy(vcpu->run->mmio.data, mmio_frag_data(frag), min(8u, frag->len));
 	return X86EMUL_CONTINUE;
 }
 
 static const struct read_write_emulator_ops read_emultor = {
-	.read_write_prepare = read_prepare,
+	.read_mmio_fragment = read_mmio_fragment,
 	.read_write_emulate = read_emulate,
 	.read_write_mmio = vcpu_mmio_read,
 	.read_write_exit_mmio = read_exit_mmio,
@@ -8226,7 +8252,13 @@ static int emulator_read_write_onepage(unsigned long addr, void *val,
 	WARN_ON(vcpu->mmio_nr_fragments >= KVM_MAX_MMIO_FRAGMENTS);
 	frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++];
 	frag->gpa = gpa;
-	frag->data = val;
+	if (bytes > 8u) {
+		frag->data = val;
+	} else {
+		frag->data = NULL;
+		if (write)
+			memcpy(&frag->val, val, bytes);
+	}
 	frag->len = bytes;
 	return X86EMUL_CONTINUE;
 }
@@ -8241,11 +8273,16 @@ static int emulator_read_write(struct x86_emulate_ctxt *ctxt,
 	gpa_t gpa;
 	int rc;
 
-	if (ops->read_write_prepare &&
-		  ops->read_write_prepare(vcpu, val, bytes))
+	if (WARN_ON_ONCE(bytes > 8u && object_is_on_stack(val)))
+		return X86EMUL_UNHANDLEABLE;
+
+	if (ops->read_mmio_fragment &&
+	    ops->read_mmio_fragment(vcpu, val, bytes))
 		return X86EMUL_CONTINUE;
 
 	vcpu->mmio_nr_fragments = 0;
+	vcpu->mmio_head_fragment = 0;
+	vcpu->mmio_tail_fragment = 0;
 
 	/* Crossing a page boundary? */
 	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
@@ -8275,7 +8312,6 @@ static int emulator_read_write(struct x86_emulate_ctxt *ctxt,
 	gpa = vcpu->mmio_fragments[0].gpa;
 
 	vcpu->mmio_needed = 1;
-	vcpu->mmio_cur_fragment = 0;
 
 	vcpu->run->mmio.len = min(8u, vcpu->mmio_fragments[0].len);
 	vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
@@ -11832,42 +11868,61 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 {
 	struct kvm_run *run = vcpu->run;
 	struct kvm_mmio_fragment *frag;
-	unsigned len;
+	unsigned int len;
+	u8 clobber[4096];
 
-	BUG_ON(!vcpu->mmio_needed);
+	memset(clobber, 0xaa, sizeof(clobber));
+
+	if (WARN_ON_ONCE(!vcpu->mmio_needed))
+		return -EIO;
+
+	/* Complete MMIO for the current fragment. */
+	frag = &vcpu->mmio_fragments[vcpu->mmio_tail_fragment];
 
-	/* Complete previous fragment */
-	frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
 	len = min(8u, frag->len);
 	if (!vcpu->mmio_is_write)
-		memcpy(frag->data, run->mmio.data, len);
+		memcpy(mmio_frag_data(frag), run->mmio.data, len);
 
 	if (frag->len <= 8) {
 		/* Switch to the next fragment. */
 		frag++;
-		vcpu->mmio_cur_fragment++;
+		vcpu->mmio_tail_fragment++;
 	} else {
+		if (WARN_ON_ONCE(!frag->data))
+			return -EIO;
+	
 		/* Go forward to the next mmio piece. */
 		frag->data += len;
 		frag->gpa += len;
 		frag->len -= len;
 	}
 
-	if (vcpu->mmio_cur_fragment >= vcpu->mmio_nr_fragments) {
+	if (vcpu->mmio_tail_fragment >= vcpu->mmio_nr_fragments) {
 		vcpu->mmio_needed = 0;
+		WARN_ON_ONCE(vcpu->mmio_head_fragment);
 
-		/* FIXME: return into emulator if single-stepping.  */
-		if (vcpu->mmio_is_write)
+		/*
+		 * Don't re-emulate the instruction for MMIO writes, as KVM has
+		 * already committed all side effects (and the emulator simply
+		 * isn't equipped to redo them).
+		 *
+		 * FIXME: Return into emulator if single-stepping.
+		 */
+		if (vcpu->mmio_is_write) {
+			vcpu->mmio_tail_fragment = 0;
 			return 1;
-		vcpu->mmio_read_completed = 1;
+		}
+
 		return complete_emulated_io(vcpu);
 	}
 
+	len = min(8u, frag->len);
+
 	run->exit_reason = KVM_EXIT_MMIO;
 	run->mmio.phys_addr = frag->gpa;
 	if (vcpu->mmio_is_write)
-		memcpy(run->mmio.data, frag->data, min(8u, frag->len));
-	run->mmio.len = min(8u, frag->len);
+		memcpy(run->mmio.data, mmio_frag_data(frag), len);
+	run->mmio.len = len;
 	run->mmio.is_write = vcpu->mmio_is_write;
 	vcpu->arch.complete_userspace_io = complete_emulated_mmio;
 	return 0;
@@ -14247,15 +14302,15 @@ static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
 	BUG_ON(!vcpu->mmio_needed);
 
 	/* Complete previous fragment */
-	frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
+	frag = &vcpu->mmio_fragments[vcpu->mmio_tail_fragment];
 	len = min(8u, frag->len);
-	if (!vcpu->mmio_is_write)
+	if (!vcpu->mmio_is_write && frag->data)
 		memcpy(frag->data, run->mmio.data, len);
 
 	if (frag->len <= 8) {
 		/* Switch to the next fragment. */
 		frag++;
-		vcpu->mmio_cur_fragment++;
+		vcpu->mmio_tail_fragment++;
 	} else {
 		/* Go forward to the next mmio piece. */
 		frag->data += len;
@@ -14263,7 +14318,7 @@ static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
 		frag->len -= len;
 	}
 
-	if (vcpu->mmio_cur_fragment >= vcpu->mmio_nr_fragments) {
+	if (vcpu->mmio_tail_fragment >= vcpu->mmio_nr_fragments) {
 		vcpu->mmio_needed = 0;
 
 		// VMG change, at this point, we're always done
@@ -14303,13 +14358,14 @@ int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
 
 	/*TODO: Check if need to increment number of frags */
 	frag = vcpu->mmio_fragments;
-	vcpu->mmio_nr_fragments = 1;
 	frag->len = bytes;
 	frag->gpa = gpa;
 	frag->data = data;
 
 	vcpu->mmio_needed = 1;
-	vcpu->mmio_cur_fragment = 0;
+	vcpu->mmio_nr_fragments = 1;
+	vcpu->mmio_head_fragment = 0;
+	vcpu->mmio_tail_fragment = 0;
 
 	vcpu->run->mmio.phys_addr = gpa;
 	vcpu->run->mmio.len = min(8u, frag->len);
@@ -14342,13 +14398,14 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
 
 	/*TODO: Check if need to increment number of frags */
 	frag = vcpu->mmio_fragments;
-	vcpu->mmio_nr_fragments = 1;
 	frag->len = bytes;
 	frag->gpa = gpa;
 	frag->data = data;
 
 	vcpu->mmio_needed = 1;
-	vcpu->mmio_cur_fragment = 0;
+	vcpu->mmio_nr_fragments = 1;
+	vcpu->mmio_head_fragment = 0;
+	vcpu->mmio_tail_fragment = 0;
 
 	vcpu->run->mmio.phys_addr = gpa;
 	vcpu->run->mmio.len = min(8u, frag->len);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 782f4d670793..be4b9de5b8c9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -320,7 +320,8 @@ static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop)
 struct kvm_mmio_fragment {
 	gpa_t gpa;
 	void *data;
-	unsigned len;
+	u64 val;
+	unsigned int len;
 };
 
 struct kvm_vcpu {
@@ -356,7 +357,8 @@ struct kvm_vcpu {
 	int mmio_needed;
 	int mmio_read_completed;
 	int mmio_is_write;
-	int mmio_cur_fragment;
+	int mmio_head_fragment;
+	int mmio_tail_fragment;
 	int mmio_nr_fragments;
 	struct kvm_mmio_fragment mmio_fragments[KVM_MAX_MMIO_FRAGMENTS];
 #endif

base-commit: e944fe2c09f405a2e2d147145c9b470084bc4c9a
--

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0)
  2026-02-02  1:24 [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) Zhangjiaji
  2026-02-06 23:30 ` Sean Christopherson
@ 2026-02-10  6:26 ` Paolo Bonzini
  2026-02-10 14:35   ` Sean Christopherson
  1 sibling, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2026-02-10  6:26 UTC (permalink / raw)
  To: Zhangjiaji
  Cc: Sean Christopherson, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Wangqinxiao (Tom), zhangyashu,
	wangyanan (Y)


> I've analyzed the Syzkaller output and the complete_emulated_mmio() code path.
> The buggy address originates in em_enter(), which passes a pointer to its local variable `ulong rbp` to emulate_push(); the pointer eventually ends up in emulator_read_write_onepage(), which stores it in vcpu->mmio_fragments[].data.
> The bug happens when the KVM guest executes an "enter" instruction while the top of the stack crosses a page boundary.
> In that case em_enter() cannot complete the instruction by itself and leaves the remaining data (the part in the other page) to complete_emulated_mmio().
> By the time complete_emulated_mmio() runs, em_enter() has returned, so the local variable `ulong rbp` no longer exists.
> When complete_emulated_mmio() then accesses vcpu->mmio_fragments[].data, the use-after-free is triggered.
>
> Any ideas?

Ouch, the bug is certainly legit.  The easiest way to fix it is something
like this:

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c8e292e9a24d..1c8698139dd5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1905,7 +1905,7 @@ static int em_enter(struct x86_emulate_ctxt *ctxt)
  	rbp = reg_read(ctxt, VCPU_REGS_RBP);
  	rc = emulate_push(ctxt, &rbp, stack_size(ctxt));
  	if (rc != X86EMUL_CONTINUE)
-		return rc;
+		return X86EMUL_UNHANDLEABLE;
  	assign_masked(reg_rmw(ctxt, VCPU_REGS_RBP), reg_read(ctxt, VCPU_REGS_RSP),
  		      stack_mask(ctxt));
   	assign_masked(reg_rmw(ctxt, VCPU_REGS_RSP),

The hard part is auditing all similar places that lead to passing a
local variable to emulator_read_write_onepage().

For example I found this one:

	write_segment_descriptor(ctxt, old_tss_sel, &curr_tss_desc);

which however is caught at kvm_task_switch() and changed to an
emulation error.

It may be a good idea to stick a defensive "vcpu->mmio_needed = false;"
in prepare_emulation_failure_exit(), as well as
"vcpu->arch.complete_userspace_io = NULL" both there and in
kvm_task_switch().

Paolo


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0)
  2026-02-10  6:26 ` Paolo Bonzini
@ 2026-02-10 14:35   ` Sean Christopherson
  2026-02-10 17:41     ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2026-02-10 14:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Zhangjiaji, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wangqinxiao (Tom), zhangyashu, wangyanan (Y)

On Tue, Feb 10, 2026, Paolo Bonzini wrote:
> 
> > I've analyzed the Syzkaller output and the complete_emulated_mmio() code path.
> > The buggy address originates in em_enter(), which passes a pointer to its local variable `ulong rbp` to emulate_push(); the pointer eventually ends up in emulator_read_write_onepage(), which stores it in vcpu->mmio_fragments[].data.
> > The bug happens when the KVM guest executes an "enter" instruction while the top of the stack crosses a page boundary.
> > In that case em_enter() cannot complete the instruction by itself and leaves the remaining data (the part in the other page) to complete_emulated_mmio().
> > By the time complete_emulated_mmio() runs, em_enter() has returned, so the local variable `ulong rbp` no longer exists.
> > When complete_emulated_mmio() then accesses vcpu->mmio_fragments[].data, the use-after-free is triggered.
> > 
> > Any ideas?
> 
> Ouch, the bug is certainly legit.  The easiest way to fix it is something
> like this:
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index c8e292e9a24d..1c8698139dd5 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1905,7 +1905,7 @@ static int em_enter(struct x86_emulate_ctxt *ctxt)
>  	rbp = reg_read(ctxt, VCPU_REGS_RBP);
>  	rc = emulate_push(ctxt, &rbp, stack_size(ctxt));
>  	if (rc != X86EMUL_CONTINUE)
> -		return rc;
> +		return X86EMUL_UNHANDLEABLE;

This won't do anything, as rc == X86EMUL_CONTINUE when MMIO is needed.  The check
would need to be something equivalent to this hack:

	rc = emulate_push(ctxt, &rbp, stack_size(ctxt));
	if (((struct kvm_vcpu *)ctxt->vcpu)->mmio_needed)
		return X86EMUL_UNHANDLEABLE;

but that's largely a moot point.


>  	assign_masked(reg_rmw(ctxt, VCPU_REGS_RBP), reg_read(ctxt, VCPU_REGS_RSP),
>  		      stack_mask(ctxt));
>   	assign_masked(reg_rmw(ctxt, VCPU_REGS_RSP),
> 
> The hard part is auditing all similar places that lead to passing a
> local variable to emulator_read_write_onepage().
> 
> For example I found this one:
> 
> 	write_segment_descriptor(ctxt, old_tss_sel, &curr_tss_desc);
> 
> which however is caught at kvm_task_switch() and changed to an
> emulation error.
> 
> It may be a good idea to stick a defensive "vcpu->mmio_needed = false;"
> in prepare_emulation_failure_exit(), as well as
> "vcpu->arch.complete_userspace_io = NULL" both there and in
> kvm_task_switch().

Please see my other reply[*].  There are a pile of instructions and flows that
read/write using on-stack variables.  IMO, anything that relies on updating
callers is a non-starter, and I don't love the idea of simply disallowing MMIO
for any instruction that uses an on-stack variable.

[*] https://lore.kernel.org/all/aYpBz1wDdsDfl8Al@google.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0)
  2026-02-10 14:35   ` Sean Christopherson
@ 2026-02-10 17:41     ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2026-02-10 17:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Zhangjiaji, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Wangqinxiao (Tom), zhangyashu, wangyanan (Y)

On 2/10/26 15:35, Sean Christopherson wrote:
> On Tue, Feb 10, 2026, Paolo Bonzini wrote:
>>
>>> I've analyzed the Syzkaller output and the complete_emulated_mmio() code path.
>>> The buggy address originates in em_enter(), which passes a pointer to its local variable `ulong rbp` to emulate_push(); the pointer eventually ends up in emulator_read_write_onepage(), which stores it in vcpu->mmio_fragments[].data.
>>> The bug happens when the KVM guest executes an "enter" instruction while the top of the stack crosses a page boundary.
>>> In that case em_enter() cannot complete the instruction by itself and leaves the remaining data (the part in the other page) to complete_emulated_mmio().
>>> By the time complete_emulated_mmio() runs, em_enter() has returned, so the local variable `ulong rbp` no longer exists.
>>> When complete_emulated_mmio() then accesses vcpu->mmio_fragments[].data, the use-after-free is triggered.
>>>
>>> Any ideas?
>>
>> Ouch, the bug is certainly legit.  The easiest way to fix it is something
>> like this:
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index c8e292e9a24d..1c8698139dd5 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -1905,7 +1905,7 @@ static int em_enter(struct x86_emulate_ctxt *ctxt)
>>   	rbp = reg_read(ctxt, VCPU_REGS_RBP);
>>   	rc = emulate_push(ctxt, &rbp, stack_size(ctxt));
>>   	if (rc != X86EMUL_CONTINUE)
>> -		return rc;
>> +		return X86EMUL_UNHANDLEABLE;
> 
> This won't do anything, as rc == X86EMUL_CONTINUE when MMIO is needed.

Yeah, I was thinking of X86EMUL_IO_NEEDED but that's only for reads.

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-02-10 17:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-02  1:24 [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) Zhangjiaji
2026-02-06 23:30 ` Sean Christopherson
2026-02-09 20:21   ` Sean Christopherson
2026-02-10  6:26 ` Paolo Bonzini
2026-02-10 14:35   ` Sean Christopherson
2026-02-10 17:41     ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox