* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) [not found] ` <aZYneb7Dvuu-HQsP@google.com> @ 2026-05-08 7:57 ` Xinyu Zheng 2026-05-08 14:25 ` Sean Christopherson 0 siblings, 1 reply; 3+ messages in thread From: Xinyu Zheng @ 2026-05-08 7:57 UTC (permalink / raw) To: Sean Christopherson, Zhangjiaji Cc: Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Wangqinxiao (Tom), zhangyashu, wangyanan (Y), zouyipeng, wangyanan55 On 2/19/2026 4:56 AM, Sean Christopherson wrote: > On Tue, Feb 10, 2026, Sean Christopherson wrote: >> On Tue, Feb 10, 2026, Zhangjiaji wrote: >>>> I think there's a not-completely-awful solution buried in this gigantic cesspool. >>>> The only time KVM uses on-stack variables is for qword or smaller accesses, i.e. >>>> 8 bytes in size or less. For larger fragments, e.g. AVX to/from MMIO, the target >>>> value will always be an operand in the emulator context. And so rather than >>>> disallow stack variables, for "small" fragments, we can rework the handling to >>>> copy the value to/from each fragment on-demand instead of stashing a pointer to >>>> the value. >>> >>> Since we can store the frag->val in struct kvm_mmio_fragment, >>> why not just point frag->data to it? This Way we can save a lot code about >>> (frag->data == NULL). >> >> It's not quite that simple, because we need to handle reads as well. >> >>> Though this patch will block any read-into-stack calls, we can add a special path >>> in function emulator_read_write handling feasible read-into-stack calls -- the >>> target is released just after emulator_read_write returns. >>> >>> --- >>> arch/x86/kvm/x86.c | 9 ++++++++- >>> include/linux/kvm_host.h | 3 ++- >>> 2 files changed, 10 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index 72d37c8930ad..12d53d441a39 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -8197,7 +8197,14 @@ static int emulator_read_write_onepage(unsigned long addr, void *val, >>> WARN_ON(vcpu->mmio_nr_fragments >= KVM_MAX_MMIO_FRAGMENTS); >>> frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++]; >>> frag->gpa = gpa; >>> - frag->data = val; >>> + if (bytes > 8u || ! write) { >>> + if (WARN_ON_ONCE(object_is_on_stack(val))) >> >> This is user-triggerable, e.g. em_popa(), em_pop_sreg(), emulate_iret_real(), >> em_ret_near_imm(), em_ret_far(), and em_ret(). > > *sigh* > > And I was wrong. I finally sat down to write some comments for all of this, and > realized that reads _never_ pass an on-stack @val to emulator_read_write_onepage(), > because read_emulated() always buffers reads through ctxt->mem_read. > > So not only is my fancy, complex code unnecessary, it's actively broken. If a > read splits a page boundary, and the first page is NOT emulated MMIO, trying to > fulfill the read on-demand falls apart because the @val points at the start of > the operand (technically its cache "entry"). I'm sure that's a solvable problem, > but I don't see any point in manufacturing a problem in the first place. > > I need to write a changelog, but as Yashu suggested, the fix can more simply be: > > -- > From: Sean Christopherson <seanjc@google.com> > Date: Tue, 10 Feb 2026 09:45:37 -0800 > Subject: [PATCH 01/14] KVM: x86: Use scratch field in MMIO fragment to hold > small write values > > Fixes: f78146b0f923 ("KVM: Fix page-crossing MMIO") > Suggested-by: Yashu Zhang <zhangjiaji1@huawei.com> > Reported-by: Yashu Zhang <zhangjiaji1@huawei.com> > Closes: https://lore.kernel.org/all/369eaaa2b3c1425c85e8477066391bc7@huawei.com > Cc: stable@vger.kernel.org > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/x86.c | 14 +++++++++++++- > include/linux/kvm_host.h | 3 ++- > 2 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index db3f393192d9..ff3a6f86973f 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -8226,7 +8226,13 @@ static int emulator_read_write_onepage(unsigned long addr, void *val, > WARN_ON(vcpu->mmio_nr_fragments >= KVM_MAX_MMIO_FRAGMENTS); > frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++]; > frag->gpa = gpa; > - frag->data = val; > + if (write && bytes <= 8u) { > + frag->val = 0; > + frag->data = &frag->val; > + memcpy(&frag->val, val, bytes); > + } else { > + frag->data = val; > + } > frag->len = bytes; > return X86EMUL_CONTINUE; > } > @@ -8241,6 +8247,9 @@ static int emulator_read_write(struct x86_emulate_ctxt *ctxt, > gpa_t gpa; > int rc; > > + if (WARN_ON_ONCE((bytes > 8u || !ops->write) && object_is_on_stack(val))) > + return X86EMUL_UNHANDLEABLE; > + > if (ops->read_write_prepare && > ops->read_write_prepare(vcpu, val, bytes)) > return X86EMUL_CONTINUE; > @@ -11847,6 +11856,9 @@ static int complete_emulated_mmio(struct kvm_vcpu *vcpu) > frag++; > vcpu->mmio_cur_fragment++; > } else { > + if (WARN_ON_ONCE(frag->data == &frag->val)) > + return -EIO; > + > /* Go forward to the next mmio piece. */ > frag->data += len; > frag->gpa += len; > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 2c7d76262898..0bb2a34fb93d 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -320,7 +320,8 @@ static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop) > struct kvm_mmio_fragment { > gpa_t gpa; > void *data; > - unsigned len; > + u64 val; Hi, Jiayi and Sean, Since I met a KABI consistence break problem from this change, I am finding a way to avoid add including kvm_mmio_fragment.val. Can I try to directly malloc a 8 size buffer for kvm_mmio_fragment.data instead of using kvm_mmio_fragment.val, and free this buffer in complete_emulated_mmio when all fragments is been copied? Thanks! > + unsigned int len; > }; > > struct kvm_vcpu { > > base-commit: 183bb0ce8c77b0fd1fb25874112bc8751a461e49 > -- -- Xinyu ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) 2026-05-08 7:57 ` [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) Xinyu Zheng @ 2026-05-08 14:25 ` Sean Christopherson 2026-05-09 1:55 ` Xinyu 0 siblings, 1 reply; 3+ messages in thread From: Sean Christopherson @ 2026-05-08 14:25 UTC (permalink / raw) To: Xinyu Zheng Cc: Zhangjiaji, Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Wangqinxiao (Tom), zhangyashu, wangyanan (Y), zouyipeng On Fri, May 08, 2026, Xinyu Zheng wrote: > On 2/19/2026 4:56 AM, Sean Christopherson wrote: > > On Tue, Feb 10, 2026, Sean Christopherson wrote: > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index 2c7d76262898..0bb2a34fb93d 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -320,7 +320,8 @@ static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop) > > struct kvm_mmio_fragment { > > gpa_t gpa; > > void *data; > > - unsigned len; > > + u64 val; > > Hi, Jiayi and Sean, > > Since I met a KABI consistence break problem from this change, I am finding > a way to avoid add including kvm_mmio_fragment.val. I assume you're looking for a solution for a private/proprietary kernel? I.e. not trying to figure out a solution for an upstream LTS kernel? > Can I try to directly malloc a 8 size buffer for kvm_mmio_fragment.data > instead of using kvm_mmio_fragment.val, and free this buffer in > complete_emulated_mmio when all fragments is been copied? I highly doubt that will work, because you'd still need to stash the pointer somewhere. And it pretty much would have to be somewhere in kvm_vcpu, which would likely mean a change in KABI. FWIW, freeing the allocation in complete_emulated_mmio() wouldn't suffice; you'd also need to free the memory on vCPU destruction, because there's no guarantee userspace would complete KVM_RUN. You should be able to use the padding in the "kvm_run.s". Thanks to s390's massive regs size, there's a huge amount of unused space in the union on x86. Note, because there can be two fragments in-flight, you'd need to index the array using the correct fragment number. Userspace can scribble the value, but that's completely irrelevant from a host safety perspective. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6c8afa2047bf..29c7123d5467 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -515,7 +515,10 @@ struct kvm_run { __u64 kvm_valid_regs; __u64 kvm_dirty_regs; union { - struct kvm_sync_regs regs; + struct { + struct kvm_sync_regs regs; + u64 x86_mmio_val[2]; + }; char padding[SYNC_REGS_SIZE_BYTES]; } s; }; ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) 2026-05-08 14:25 ` Sean Christopherson @ 2026-05-09 1:55 ` Xinyu 0 siblings, 0 replies; 3+ messages in thread From: Xinyu @ 2026-05-09 1:55 UTC (permalink / raw) To: Sean Christopherson Cc: Zhangjiaji, Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Wangqinxiao (Tom), zhangyashu, wangyanan (Y), zouyipeng, zhengxinyu6 On 5/8/2026 10:25 PM, Sean Christopherson wrote: > On Fri, May 08, 2026, Xinyu Zheng wrote: >> On 2/19/2026 4:56 AM, Sean Christopherson wrote: >>> On Tue, Feb 10, 2026, Sean Christopherson wrote: >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h >>> index 2c7d76262898..0bb2a34fb93d 100644 >>> --- a/include/linux/kvm_host.h >>> +++ b/include/linux/kvm_host.h >>> @@ -320,7 +320,8 @@ static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop) >>> struct kvm_mmio_fragment { >>> gpa_t gpa; >>> void *data; >>> - unsigned len; >>> + u64 val; >> >> Hi, Jiayi and Sean, >> >> Since I met a KABI consistence break problem from this change, I am finding >> a way to avoid add including kvm_mmio_fragment.val. > > I assume you're looking for a solution for a private/proprietary kernel? I.e. > not trying to figure out a solution for an upstream LTS kernel? Yes. > >> Can I try to directly malloc a 8 size buffer for kvm_mmio_fragment.data >> instead of using kvm_mmio_fragment.val, and free this buffer in >> complete_emulated_mmio when all fragments is been copied? > > I highly doubt that will work, because you'd still need to stash the pointer > somewhere. And it pretty much would have to be somewhere in kvm_vcpu, which > would likely mean a change in KABI. FWIW, freeing the allocation in > complete_emulated_mmio() wouldn't suffice; you'd also need to free the memory > on vCPU destruction, because there's no guarantee userspace would complete > KVM_RUN. Thanks for your correction and patient reply! > > You should be able to use the padding in the "kvm_run.s". Thanks to s390's > massive regs size, there's a huge amount of unused space in the union on x86. > Note, because there can be two fragments in-flight, you'd need to index the > array using the correct fragment number. > > Userspace can scribble the value, but that's completely irrelevant from a host > safety perspective. > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 6c8afa2047bf..29c7123d5467 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -515,7 +515,10 @@ struct kvm_run { > __u64 kvm_valid_regs; > __u64 kvm_dirty_regs; > union { > - struct kvm_sync_regs regs; > + struct { > + struct kvm_sync_regs regs; > + u64 x86_mmio_val[2]; > + }; > char padding[SYNC_REGS_SIZE_BYTES]; > } s; > }; I will take this suggestion. -- Xinyu Zheng ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-05-09 1:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <67a2f20537354628bcb835586a7c6255@huawei.com>
[not found] ` <aYuC87rMLlBYIZRc@google.com>
[not found] ` <aZYneb7Dvuu-HQsP@google.com>
2026-05-08 7:57 ` [BUG REPORT] USE_AFTER_FREE in complete_emulated_mmio found by KASAN/Syzkaller fuzz test (v5.10.0) Xinyu Zheng
2026-05-08 14:25 ` Sean Christopherson
2026-05-09 1:55 ` Xinyu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox