public inbox for kvm@vger.kernel.org
* [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping
@ 2026-01-30 20:15 syzbot
  2026-02-04 17:01 ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
  0 siblings, 1 reply; 12+ messages in thread
From: syzbot @ 2026-01-30 20:15 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    1f97d9dcf536 Merge tag 'vfio-v6.19-rc8' of https://github...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10b3e322580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f1fac0919970b671
dashboard link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15e5ebfa580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13eef85a580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/f898291c4b7b/disk-1f97d9dc.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/cac48e20323c/vmlinux-1f97d9dc.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d2e60d34b7e7/bzImage-1f97d9dc.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com

------------[ cut here ]------------
folio_test_large(folio)
WARNING: arch/x86/kvm/../../../virt/kvm/guest_memfd.c:416 at kvm_gmem_fault_user_mapping+0x4b5/0x6e0 virt/kvm/guest_memfd.c:416, CPU#1: syz.3.124/6406
Modules linked in:
CPU: 1 UID: 0 PID: 6406 Comm: syz.3.124 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
RIP: 0010:kvm_gmem_fault_user_mapping+0x4b5/0x6e0 virt/kvm/guest_memfd.c:416
Code: 00 e9 a1 fe ff ff bd 00 04 00 00 eb d9 e8 43 b8 83 00 48 c7 c6 e0 9f 82 8b 48 89 df e8 d4 f8 ce 00 90 0f 0b e8 2c b8 83 00 90 <0f> 0b 90 48 8d 6b 34 48 89 df e8 ec f6 bb 00 be 04 00 00 00 48 89
RSP: 0018:ffffc90004ab7848 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffea00018a0000 RCX: ffffffff81834070
RDX: ffff888028b124c0 RSI: ffffffff81834334 RDI: ffff888028b124c0
RBP: ffffc90004ab79f8 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000040 R11: 0000000000000000 R12: ffffea00018a0000
R13: ffffc90004ab7a08 R14: 0000000000000040 R15: ffffea00018a0008
FS:  00007fb8562ce6c0(0000) GS:ffff8881246db000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f89fb863d58 CR3: 000000006060a000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __do_fault+0x10d/0x550 mm/memory.c:5323
 do_read_fault mm/memory.c:5758 [inline]
 do_fault+0xaf9/0x1990 mm/memory.c:5892
 do_pte_missing mm/memory.c:4404 [inline]
 handle_pte_fault mm/memory.c:6276 [inline]
 __handle_mm_fault+0x1807/0x2b50 mm/memory.c:6414
 handle_mm_fault+0x36d/0xa20 mm/memory.c:6583
 faultin_page mm/gup.c:1126 [inline]
 __get_user_pages+0xf9c/0x34d0 mm/gup.c:1428
 populate_vma_page_range+0x267/0x3f0 mm/gup.c:1860
 __mm_populate+0x107/0x3a0 mm/gup.c:1963
 do_mlock+0x3f0/0x7f0 mm/mlock.c:653
 __do_sys_mlock mm/mlock.c:661 [inline]
 __se_sys_mlock mm/mlock.c:659 [inline]
 __x64_sys_mlock+0x59/0x80 mm/mlock.c:659
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc9/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb85539aeb9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb8562ce028 EFLAGS: 00000246 ORIG_RAX: 0000000000000095
RAX: ffffffffffffffda RBX: 00007fb855615fa0 RCX: 00007fb85539aeb9
RDX: 0000000000000000 RSI: 0000000000800000 RDI: 0000200000000000
RBP: 00007fb855408c1f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb855616038 R14: 00007fb855615fa0 R15: 00007ffcc1750088
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-01-30 20:15 [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping syzbot
@ 2026-02-04 17:01 ` Ackerley Tng
  2026-02-04 18:21   ` [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping syzbot
  2026-02-04 19:10   ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
  0 siblings, 2 replies; 12+ messages in thread
From: Ackerley Tng @ 2026-02-04 17:01 UTC (permalink / raw)
  To: syzbot+33a04338019ac7e43a44
  Cc: kvm, linux-kernel, pbonzini, syzkaller-bugs, Ackerley Tng

#syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next

guest_memfd VMAs don't need to be merged, especially now, since guest_memfd
only supports PAGE_SIZE folios.

Set VM_DONTEXPAND on guest_memfd VMAs.

In addition, this prevents khugepaged from operating on guest_memfd folios,
which could otherwise result in unintended merging of guest_memfd folios.

Change-Id: I5867edcb66b075b54b25260afd22a198aee76df1
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 virt/kvm/guest_memfd.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fdaea3422c30..3d4ac461c28b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -480,6 +480,12 @@ static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
 		return -EINVAL;
 	}

+	/*
+	 * Disable VMA merging - guest_memfd VMAs should be
+	 * static. This also stops khugepaged from operating on
+	 * guest_memfd VMAs and folios.
+	 */
+	vm_flags_set(vma, VM_DONTEXPAND);
 	vma->vm_ops = &kvm_gmem_vm_ops;

 	return 0;
--
2.53.0.rc2.204.g2597b5adb4-goog


* Re: [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping
  2026-02-04 17:01 ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
@ 2026-02-04 18:21   ` syzbot
  2026-02-04 19:10   ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
  1 sibling, 0 replies; 12+ messages in thread
From: syzbot @ 2026-02-04 18:21 UTC (permalink / raw)
  To: ackerleytng, kvm, linux-kernel, pbonzini, syzkaller-bugs

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com
Tested-by: syzbot+33a04338019ac7e43a44@syzkaller.appspotmail.com

Tested on:

commit:         0499add8 Merge tag 'kvm-x86-fixes-6.19-rc1' of https:/..
git tree:       git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
console output: https://syzkaller.appspot.com/x/log.txt?x=1778a402580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3aec2f7e1730a8eb
dashboard link: https://syzkaller.appspot.com/bug?extid=33a04338019ac7e43a44
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44
patch:          https://syzkaller.appspot.com/x/patch.diff?x=13b847fa580000

Note: testing is done by a robot and is best-effort only.


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-04 17:01 ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
  2026-02-04 18:21   ` [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping syzbot
@ 2026-02-04 19:10   ` Ackerley Tng
  2026-02-04 21:37     ` Sean Christopherson
  1 sibling, 1 reply; 12+ messages in thread
From: Ackerley Tng @ 2026-02-04 19:10 UTC (permalink / raw)
  To: syzbot+33a04338019ac7e43a44
  Cc: kvm, linux-kernel, pbonzini, syzkaller-bugs, david, michael.roth,
	vannapurve, kartikey406

Ackerley Tng <ackerleytng@google.com> writes:

> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>
> guest_memfd VMAs don't need to be merged, especially now, since guest_memfd
> only supports PAGE_SIZE folios.
>
> Set VM_DONTEXPAND on guest_memfd VMAs.
>

Local tests and syzbot agree that this fixes the issue identified. :)

I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
mapping/folio collapsing before submitting a full patch series.

David, Michael, Vishal, what do you think of the choice of setting
VM_DONTEXPAND to disable khugepaged?

+ For 4K guest_memfd, there's really nothing to expand
+ For THP and HugeTLB guest_memfd (future), we actually don't want
expansion of the VMAs.

IIUC setting VM_DONTEXPAND doesn't affect mremap() as long as the
remapping does not involve expansion.
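Sketching the VM_DONTEXPAND trade-off as a small userspace model may help here. It is based on my reading of include/linux/mm.h and mm/ rather than on anything in this thread, so treat it as an assumption: VM_DONTEXPAND is one of the VM_SPECIAL flags, VMAs carrying VM_SPECIAL are never merged, the collapse paths skip VMAs with VM_NO_KHUGEPAGED (VM_SPECIAL | VM_HUGETLB), and mremap() refuses to grow a VM_DONTEXPAND (or VM_PFNMAP) VMA. The bit values and the model_* helpers are made up; only the relationships between the flags are meant to match.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real ones live in include/linux/mm.h. */
#define VM_IO         (1UL << 0)
#define VM_DONTEXPAND (1UL << 1)
#define VM_PFNMAP     (1UL << 2)
#define VM_MIXEDMAP   (1UL << 3)
#define VM_HUGETLB    (1UL << 4)

/* Same composition as the kernel macros (the values above are made up). */
#define VM_SPECIAL       (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
#define VM_NO_KHUGEPAGED (VM_SPECIAL | VM_HUGETLB)

/* Hypothetical: may a VMA with these flags merge with an adjacent one? */
static bool model_can_merge(unsigned long vm_flags)
{
	return !(vm_flags & VM_SPECIAL);
}

/* Hypothetical: may the collapse paths (khugepaged etc.) operate on it? */
static bool model_collapse_allowed(unsigned long vm_flags)
{
	return !(vm_flags & VM_NO_KHUGEPAGED);
}

/* Hypothetical: would mremap() accept resizing old_len to new_len? */
static bool model_mremap_allowed(unsigned long vm_flags,
				 unsigned long old_len, unsigned long new_len)
{
	if (new_len <= old_len)
		return true;	/* shrinking or keeping the size is unaffected */
	return !(vm_flags & (VM_DONTEXPAND | VM_PFNMAP));
}
```

In this model the same bit that keeps khugepaged away also prevents adjacent, otherwise identical guest_memfd VMAs from merging, while an mremap() that does not grow the VMA is unaffected.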

> In addition, this prevents khugepaged from operating on guest_memfd folios,
> which could otherwise result in unintended merging of guest_memfd folios.
>
> [...snip...]


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-04 19:10   ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
@ 2026-02-04 21:37     ` Sean Christopherson
  2026-02-04 21:45       ` David Hildenbrand (arm)
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2026-02-04 21:37 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, david, michael.roth, vannapurve, kartikey406

On Wed, Feb 04, 2026, Ackerley Tng wrote:
> Ackerley Tng <ackerleytng@google.com> writes:
> 
> > #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
> >
> > guest_memfd VMAs don't need to be merged,

Why not?  There are benefits to merging VMAs that have nothing to do with folios.
E.g. map 1GiB of guest_memfd with 512*512 4KiB VMAs, and then it becomes quite
desirable to merge all of those VMAs into one.

Creating _hugepages_ doesn't add value, but that's not the same thing as merging
VMAs.
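To put numbers on the example: with 4KiB VMAs, covering 1GiB takes 2^30 / 2^12 = 2^18 = 262144 = 512 * 512 VMAs if none of them may merge, versus a single VMA if they do. A trivial check of that arithmetic, with SZ_4K and SZ_1G defined locally for the sketch rather than taken from linux/sizes.h:

```c
#include <assert.h>

/* Local stand-ins for the kernel's SZ_* constants. */
#define SZ_4K (4UL << 10)
#define SZ_1G (1UL << 30)

/* How many vma_size-sized VMAs it takes to cover region bytes. */
static unsigned long vmas_to_cover(unsigned long region, unsigned long vma_size)
{
	return region / vma_size;
}
```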

> > especially now, since guest_memfd only supports PAGE_SIZE folios.
> >
> > Set VM_DONTEXPAND on guest_memfd VMAs.
>
> Local tests and syzbot agree that this fixes the issue identified. :)
> 
> I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
> mapping/folio collapsing before submitting a full patch series.
> 
> David, Michael, Vishal, what do you think of the choice of setting
> VM_DONTEXPAND to disable khugepaged?

I'm not one of the above, but for me it feels very much like treating a symptom
and not fixing the underlying cause.

It seems like what KVM should do is not block one path that triggers hugepage
processing, but instead flat out disallow creating hugepages.  Unfortunately,
AFAICT, there's no existing way to prevent madvise() from clearing VM_NOHUGEPAGE,
so we can't simply force that flag.

I'd prefer not to special case guest_memfd, a la devdax, but I also want to address
this head-on, not by removing a tangentially related trigger.
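A tiny model of why a flag forced at mmap() time is not enough. It assumes, from my memory of hugepage_madvise() in mm/khugepaged.c, that MADV_HUGEPAGE simply clears VM_NOHUGEPAGE and sets VM_HUGEPAGE without asking the file or driver behind the VMA. The bit values and the model_* names are made up for illustration.

```c
#include <assert.h>

/* Illustrative bit values; the real ones are in include/linux/mm.h. */
#define VM_HUGEPAGE   (1UL << 0)
#define VM_NOHUGEPAGE (1UL << 1)

enum model_advice { MODEL_MADV_HUGEPAGE, MODEL_MADV_NOHUGEPAGE };

/* Hypothetical: the flag updates applied for MADV_[NO]HUGEPAGE. */
static unsigned long model_hugepage_madvise(unsigned long vm_flags,
					    enum model_advice advice)
{
	switch (advice) {
	case MODEL_MADV_HUGEPAGE:
		vm_flags &= ~VM_NOHUGEPAGE;	/* whatever the driver set is lost */
		vm_flags |= VM_HUGEPAGE;
		break;
	case MODEL_MADV_NOHUGEPAGE:
		vm_flags &= ~VM_HUGEPAGE;
		vm_flags |= VM_NOHUGEPAGE;
		break;
	}
	return vm_flags;
}
```

Because nothing in that path consults the backing file, a guest_memfd VMA created with VM_NOHUGEPAGE could be flipped back by userspace, which is the gap described above.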

> + For 4K guest_memfd, there's really nothing to expand
> + For THP and HugeTLB guest_memfd (future), we actually don't want
> expansion of the VMAs.
> 
> IIUC setting VM_DONTEXPAND doesn't affect mremap() as long as the
> remapping does not involve expansion.
> 
> > In addition, this prevents khugepaged from operating on guest_memfd folios,
> > which could otherwise result in unintended merging of guest_memfd folios.
> >
> > [...snip...]


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-04 21:37     ` Sean Christopherson
@ 2026-02-04 21:45       ` David Hildenbrand (arm)
  2026-02-04 23:17         ` Ackerley Tng
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 21:45 UTC (permalink / raw)
  To: Sean Christopherson, Ackerley Tng
  Cc: syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, michael.roth, vannapurve, kartikey406

On 2/4/26 22:37, Sean Christopherson wrote:
> On Wed, Feb 04, 2026, Ackerley Tng wrote:
>> Ackerley Tng <ackerleytng@google.com> writes:
>>
>>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>>
>>> guest_memfd VMAs don't need to be merged,
> 
> Why not?  There are benefits to merging VMAs that have nothing to do with folios.
> E.g. map 1GiB of guest_memfd with 512*512 4KiB VMAs, and then it becomes quite
> desirable to merge all of those VMAs into one.
> 
> Creating _hugepages_ doesn't add value, but that's not the same thing as merging
> VMAs.
> 
>>> especially now, since guest_memfd only supports PAGE_SIZE folios.
>>>
>>> Set VM_DONTEXPAND on guest_memfd VMAs.
>>
>> Local tests and syzbot agree that this fixes the issue identified. :)
>>
>> I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
>> mapping/folio collapsing before submitting a full patch series.
>>
>> David, Michael, Vishal, what do you think of the choice of setting
>> VM_DONTEXPAND to disable khugepaged?
> 
> I'm not one of the above, but for me it feels very much like treating a symptom
> and not fixing the underlying cause.

And you are spot-on :)

> 
> It seems like what KVM should do is not block one path that triggers hugepage
> processing, but instead flat out disallow creating hugepages.  Unfortunately,
> AFAICT, there's no existing way to prevent madvise() from clearing VM_NOHUGEPAGE,
> so we can't simply force that flag.
> 
> I'd prefer not to special case guest_memfd, a la devdax, but I also want to address
> this head-on, not by removing a tangentially related trigger.

VM_NOHUGEPAGE also smells like the wrong thing. This is a file limitation.

!thp_vma_allowable_order() must take care of that somehow, down in
__thp_vma_allowable_orders(), by checking the file.

Likely the file_thp_enabled() check is the culprit with
CONFIG_READ_ONLY_THP_FOR_FS?

Maybe we need a flag to say "not even with CONFIG_READ_ONLY_THP_FOR_FS".

I wonder how we handle that for secretmem. Too late for me, going to bed :)
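One way to picture the suggested flag, as a userspace sketch: model_file_thp_enabled() follows, as I understand it, the conditions file_thp_enabled() in mm/huge_memory.c checks (CONFIG_READ_ONLY_THP_FOR_FS, a file-backed VMA, a regular file that is not open for write), plus a per-file opt-out bit that guest_memfd or secretmem could set. The never_thp field is hypothetical; its name and where such a flag would live are my assumptions, not anything proposed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the file/inode state the check looks at. */
struct model_file {
	bool is_regular;	/* S_ISREG(inode->i_mode) */
	bool open_for_write;	/* inode_is_open_for_write(inode) */
	bool never_thp;		/* hypothetical opt-out set by the filesystem */
};

static bool model_file_thp_enabled(bool read_only_thp_for_fs,
				   const struct model_file *file)
{
	if (!read_only_thp_for_fs)
		return false;
	if (!file)		/* not a file-backed VMA */
		return false;
	/* Hypothetical: honoured even with CONFIG_READ_ONLY_THP_FOR_FS=y. */
	if (file->never_thp)
		return false;
	return !file->open_for_write && file->is_regular;
}
```

A bit like this would let each filesystem opt itself out, instead of huge_memory.c keeping a list of special files.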

-- 
Cheers,

David


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-04 21:45       ` David Hildenbrand (arm)
@ 2026-02-04 23:17         ` Ackerley Tng
  2026-02-08 17:34           ` Ackerley Tng
  0 siblings, 1 reply; 12+ messages in thread
From: Ackerley Tng @ 2026-02-04 23:17 UTC (permalink / raw)
  To: David Hildenbrand (arm), Sean Christopherson
  Cc: syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, michael.roth, vannapurve, kartikey406

"David Hildenbrand (arm)" <david@kernel.org> writes:

> On 2/4/26 22:37, Sean Christopherson wrote:
>> On Wed, Feb 04, 2026, Ackerley Tng wrote:
>>> Ackerley Tng <ackerleytng@google.com> writes:
>>>
>>>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>>>
>>>> guest_memfd VMAs don't need to be merged,
>>
>> Why not?  There are benefits to merging VMAs that have nothing to do with folios.
>> E.g. map 1GiB of guest_memfd with 512*512 4KiB VMAs, and then it becomes quite
>> desirable to merge all of those VMAs into one.
>>

I didn't realise VM_DONTEXPAND's no-expansion policy also covers the case
where adjacent VMAs with the same flags, etc., are automatically merged.
Since VM_DONTEXPAND blocks this kind of expansion, I agree VM_DONTEXPAND
is not a good fit.

>> Creating _hugepages_ doesn't add value, but that's not the same thing as merging
>> VMAs.
>>
>>>> especially now, since guest_memfd only supports PAGE_SIZE folios.
>>>>
>>>> Set VM_DONTEXPAND on guest_memfd VMAs.
>>>
>>> Local tests and syzbot agree that this fixes the issue identified. :)
>>>
>>> I would like to look into madvise(MADV_COLLAPSE) and uprobes triggering
>>> mapping/folio collapsing before submitting a full patch series.
>>>
>>> David, Michael, Vishal, what do you think of the choice of setting
>>> VM_DONTEXPAND to disable khugepaged?
>>
>> I'm not one of the above, but for me it feels very much like treating a symptom

I was hoping to find a solution before getting back to you, to save you
some time :)

>> and not fixing the underlying cause.
>
> And you are spot-on :)
>
>>
>> It seems like what KVM should do is not block one path that triggers hugepage
>> processing, but instead flat out disallow creating hugepages.  Unfortunately,

__filemap_get_folio_mpol(), which we use in kvm_gmem_get_folio(), looks
up mapping_min_folio_order() to determine what order to allocate. I
think we could lock that down to always use order 0. I tried that here
[1], but in this case khugepaged allocates new folios for guest_memfd
(and others) directly in collapse_file(), explicitly specifying
PMD_ORDER.

I took a look and wasn't able to find a central callback/ops to catch
all fs allocations.

[1] https://lore.kernel.org/all/6982553e.a00a0220.34fa92.0009.GAE@google.com/
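A rough model of why locking the mapping's folio order does not help against collapse: an allocation that goes through the page cache helpers is clamped to the mapping's allowed order range, while the collapse path picks the PMD order itself. The clamping shown is a simplification of what __filemap_get_folio_mpol() does, the struct and function names are hypothetical, and MODEL_PMD_ORDER assumes 4KiB base pages and 2MiB PMDs as on x86-64.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4KiB base pages */
#define MODEL_PMD_SHIFT  21	/* assumed 2MiB PMD mappings */
#define MODEL_PMD_ORDER  (MODEL_PMD_SHIFT - MODEL_PAGE_SHIFT)

/* Hypothetical per-mapping folio order range, e.g. locked to [0, 0]. */
struct model_mapping {
	unsigned int min_order;
	unsigned int max_order;
};

/* Page cache style allocation: the requested order is clamped. */
static unsigned int model_filemap_alloc_order(const struct model_mapping *m,
					      unsigned int requested)
{
	unsigned int order = requested;

	if (order > m->max_order)
		order = m->max_order;
	if (order < m->min_order)
		order = m->min_order;
	return order;
}

/* Collapse style allocation: the order is fixed, the mapping is not asked. */
static unsigned int model_collapse_alloc_order(const struct model_mapping *m)
{
	(void)m;
	return MODEL_PMD_ORDER;
}
```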

>> AFAICT, there's no existing way to prevent madvise() from clearing VM_NOHUGEPAGE,
>> so we can't simply force that flag.
>>
>> I'd prefer not to special case guest_memfd, a la devdax, but I also want to address
>> this head-on, not by removing a tangentially related trigger.
>
> VM_NOHUGEPAGE also smells like the wrong thing. This is a file limitation.
>
> !thp_vma_allowable_order() must take care of that somehow, down in
> __thp_vma_allowable_orders(), by checking the file.
>
> Likely the file_thp_enabled() check is the culprit with
> CONFIG_READ_ONLY_THP_FOR_FS?
>
> Maybe we need a flag to say "not even with CONFIG_READ_ONLY_THP_FOR_FS".
>
> I wonder how we handle that for secretmem. Too late for me, going to bed :)
>

Let me look deeper into this. Thanks!

> --
> Cheers,
>
> David


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-04 23:17         ` Ackerley Tng
@ 2026-02-08 17:34           ` Ackerley Tng
  2026-02-09  3:40             ` Deepanshu Kartikey
  2026-02-09 10:38             ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 12+ messages in thread
From: Ackerley Tng @ 2026-02-08 17:34 UTC (permalink / raw)
  To: David Hildenbrand (arm), Sean Christopherson
  Cc: syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, michael.roth, vannapurve, kartikey406

Ackerley Tng <ackerleytng@google.com> writes:

>
> [...snip...]
>
>> !thp_vma_allowable_order() must take care of that somehow, down in
>> __thp_vma_allowable_orders(), by checking the file.
>>
>> Likely the file_thp_enabled() check is the culprit with
>> CONFIG_READ_ONLY_THP_FOR_FS?
>>
>> Maybe we need a flag to say "not even with CONFIG_READ_ONLY_THP_FOR_FS".
>>
>> I wonder how we handle that for secretmem. Too late for me, going to bed :)
>>
>
> Let me look deeper into this. Thanks!
>

I trimmed the repro to this:

static void test_guest_memfd_repro(void)
{
	struct kvm_vcpu *vcpu;
	uint8_t *unaligned_mem;
	struct kvm_vm *vm;
	uint8_t *mem;
	int fd;

	vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, guest_code);

	fd = vm_create_guest_memfd(vm, SZ_2M * 2, GUEST_MEMFD_FLAG_MMAP |
				   GUEST_MEMFD_FLAG_INIT_SHARED);

	unaligned_mem = mmap(NULL, SZ_2M * 2, PROT_READ | PROT_WRITE,
			     MAP_SHARED, fd, 0);
	TEST_ASSERT(unaligned_mem != MAP_FAILED, "mmap() failed");
	mem = align_ptr_up(unaligned_mem, SZ_2M);
	TEST_ASSERT(((unsigned long)mem & (SZ_2M - 1)) == 0,
		    "returned address must be aligned to SZ_2M");

	TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_HUGEPAGE), 0);

	for (int i = 0; i < SZ_2M; i += SZ_4K)
		READ_ONCE(mem[i]);

	TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_COLLAPSE), 0);

	TEST_ASSERT_EQ(madvise(mem, SZ_2M, MADV_DONTNEED), 0);

	/* This triggers the WARNing. */
	READ_ONCE(mem[0]);

	munmap(unaligned_mem, SZ_2M * 2);

	close(fd);
	kvm_vm_free(vm);
}
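Why the repro maps 2 * SZ_2M but only uses SZ_2M: mmap() only guarantees page alignment, so rounding the returned address up to the next 2MiB boundary can skip up to SZ_2M - SZ_4K bytes, and the doubled length guarantees a full, 2MiB-aligned window still fits inside the mapping. A standalone check of that arithmetic; align_up() is my own helper, written to behave like the selftests' align_ptr_up() for power-of-two alignments, and the 4KiB page size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K (UINT64_C(4) << 10)
#define SZ_2M (UINT64_C(2) << 20)

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Given the page-aligned start of a mapping of map_len bytes, report
 * whether a 2MiB-aligned window of SZ_2M bytes fits inside it.
 */
static bool aligned_window_fits(uint64_t start, uint64_t map_len)
{
	uint64_t win = align_up(start, SZ_2M);

	return win + SZ_2M <= start + map_len;
}
```

For any page-aligned start, rounding up skips at most SZ_2M - SZ_4K bytes, so a 2 * SZ_2M mapping always contains the window; a SZ_2M mapping only does when it happens to start on a 2MiB boundary.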

And tried replacing the fd creation with the secretmem equivalent:

	fd = syscall(__NR_memfd_secret, 0);
	TEST_ASSERT(fd >= 0, "Couldn't create secretmem fd.");
	TEST_ASSERT_EQ(ftruncate(fd, SZ_2M * 2), 0);

Should a guest_memfd selftest be added to cover this?

With secretmem, MADV_COLLAPSE fails with EINVAL, but only after going
through hpage_collapse_scan_file() -> collapse_file(), where it fails
because copy_mc_highpage() returns > 0 when collapsing the page.

I'm not very familiar with copy_mc_highpage() and haven't looked into
why it failed, but it looks like the failure would have triggered
memory_failure_queue(), which would be inappropriate here.

Since this also affects secretmem, I think thp_vma_allowable_order() is
the best place to intercept the collapsing flow for both secretmem and
guest_memfd.

Let me know if you have any ideas!

>> --
>> Cheers,
>>
>> David


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-08 17:34           ` Ackerley Tng
@ 2026-02-09  3:40             ` Deepanshu Kartikey
  2026-02-09 10:38             ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 12+ messages in thread
From: Deepanshu Kartikey @ 2026-02-09  3:40 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: David Hildenbrand (arm), Sean Christopherson,
	syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, michael.roth, vannapurve

On Sun, Feb 8, 2026 at 11:04 PM Ackerley Tng <ackerleytng@google.com> wrote:
>

> Since this also affects secretmem, I think thp_vma_allowable_order() is
> the best place to intercept the collapsing flow for both secretmem and
> guest_memfd.
>
> Let me know if you have any ideas!
>

Hi David, Ackerley,

I have been looking into this bug and I think the root cause is in
file_thp_enabled(). When CONFIG_READ_ONLY_THP_FOR_FS is enabled,
guest_memfd and secretmem inodes pass the S_ISREG() and
!inode_is_open_for_write() checks, so file_thp_enabled() incorrectly
returns true. This allows khugepaged and MADV_COLLAPSE to create large
folios in the page cache.

I sent a patch that fixes this at the source by explicitly rejecting
GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC in file_thp_enabled():

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40cf59301c21..4f57c78b57dd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -93,6 +93,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 		return false;
 
 	inode = file_inode(vma->vm_file);
+	if (inode->i_sb->s_magic == GUEST_MEMFD_MAGIC ||
+	    inode->i_sb->s_magic == SECRETMEM_MAGIC)
+		return false;
 
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }

I have tested this and confirmed the warning no longer triggers. This
approach covers both guest_memfd and secretmem in one place without
needing separate VMA flag changes in each subsystem. I have sent the
patch.
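A userspace model of the check the patch adds, to show how it composes with the existing conditions. It assumes CONFIG_READ_ONLY_THP_FOR_FS=y and a file-backed VMA. The magic values below are placeholders, not the real GUEST_MEMFD_MAGIC and SECRETMEM_MAGIC from include/uapi/linux/magic.h, and struct model_inode is a hypothetical stand-in for the inode and superblock fields the real code reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values; the real constants live in uapi/linux/magic.h. */
#define MODEL_GUEST_MEMFD_MAGIC 0x1000UL
#define MODEL_SECRETMEM_MAGIC   0x2000UL
#define MODEL_OTHER_FS_MAGIC    0x3000UL

struct model_inode {
	unsigned long s_magic;	/* inode->i_sb->s_magic */
	bool is_regular;	/* S_ISREG(inode->i_mode) */
	bool open_for_write;	/* inode_is_open_for_write(inode) */
};

/* Models file_thp_enabled() after the early returns shown in the diff. */
static bool model_file_thp_enabled(const struct model_inode *inode)
{
	/* The new check: these files must never get large folios. */
	if (inode->s_magic == MODEL_GUEST_MEMFD_MAGIC ||
	    inode->s_magic == MODEL_SECRETMEM_MAGIC)
		return false;

	return !inode->open_for_write && inode->is_regular;
}
```

The trade-off is that every new file type with the same restriction has to be added to this list by hand.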

Please have a look and let me know your thoughts.

Thanks,
Deepanshu


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-08 17:34           ` Ackerley Tng
  2026-02-09  3:40             ` Deepanshu Kartikey
@ 2026-02-09 10:38             ` David Hildenbrand (Arm)
  2026-02-09 18:24               ` Ackerley Tng
  1 sibling, 1 reply; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 10:38 UTC (permalink / raw)
  To: Ackerley Tng, Sean Christopherson
  Cc: syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, michael.roth, vannapurve, kartikey406

On 2/8/26 18:34, Ackerley Tng wrote:
> Ackerley Tng <ackerleytng@google.com> writes:
> 
>>
>> [...snip...]
>>
>>> !thp_vma_allowable_order() must take care of that somehow, down in
>>> __thp_vma_allowable_orders(), by checking the file.
>>>
>>> Likely the file_thp_enabled() check is the culprit with
>>> CONFIG_READ_ONLY_THP_FOR_FS?
>>>
>>> Maybe we need a flag to say "not even with CONFIG_READ_ONLY_THP_FOR_FS".
>>>
>>> I wonder how we handle that for secretmem. Too late for me, going to bed :)
>>>
>>
>> Let me look deeper into this. Thanks!
>>
> 
> [...snip...]
> 
> With secretmem, MADV_COLLAPSE fails with EINVAL, but only after going
> through hpage_collapse_scan_file() -> collapse_file(), where it fails
> because copy_mc_highpage() returns > 0 when collapsing the page.

Just what I suspected. :)

Thanks for digging into the details!

-- 
Cheers,

David


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-09 10:38             ` David Hildenbrand (Arm)
@ 2026-02-09 18:24               ` Ackerley Tng
  2026-02-09 19:38                 ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 12+ messages in thread
From: Ackerley Tng @ 2026-02-09 18:24 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Sean Christopherson
  Cc: syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, michael.roth, vannapurve, kartikey406

"David Hildenbrand (Arm)" <david@kernel.org> writes:

> On 2/8/26 18:34, Ackerley Tng wrote:
>> Ackerley Tng <ackerleytng@google.com> writes:
>>
>> [...snip...]
>>
>> With secretmem, MADV_COLLAPSE fails with EINVAL, but only after going
>> through hpage_collapse_scan_file() -> collapse_file(), where it fails
>> because copy_mc_highpage() returns > 0 when collapsing the page.
>
> Just what I suspected. :)
>
> Thanks for digging into the details!
>

Happy to help :)

In general, do we want the reproducers added as selftests? Should this
be added as part of tools/testing/selftests/kvm/guest_memfd_test.c or a
separate file?

> --
> Cheers,
>
> David


* Re: [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND
  2026-02-09 18:24               ` Ackerley Tng
@ 2026-02-09 19:38                 ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-09 19:38 UTC (permalink / raw)
  To: Ackerley Tng, Sean Christopherson
  Cc: syzbot+33a04338019ac7e43a44, kvm, linux-kernel, pbonzini,
	syzkaller-bugs, michael.roth, vannapurve, kartikey406

On 2/9/26 19:24, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <david@kernel.org> writes:
> 
>> On 2/8/26 18:34, Ackerley Tng wrote:
>>> Ackerley Tng <ackerleytng@google.com> writes:
>>>
>>> [...snip...]
>>>
>>> Should a guest_memfd selftest be added to cover this?
>>>
>>> With secretmem, MADV_COLLAPSE fails with EINVAL, but only after going
>>> through hpage_collapse_scan_file() -> collapse_file(), where it fails
>>> because copy_mc_highpage() returns > 0 when collapsing the page.
>>
>> Just what I suspected. :)
>>
>> Thanks for digging into the details!
>>
> 
> Happy to help :)
> 
> In general, do we want the reproducers added as selftests? Should this
> be added as part of tools/testing/selftests/kvm/guest_memfd_test.c

I guess adding it to guest_memfd_test.c and asserting that MADV_COLLAPSE 
fails as expected could be a reasonable test case. It's not a lot of 
code and easy to verify.

-- 
Cheers,

David


end of thread

Thread overview: 12+ messages
2026-01-30 20:15 [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping syzbot
2026-02-04 17:01 ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
2026-02-04 18:21   ` [syzbot] [kvm?] WARNING in kvm_gmem_fault_user_mapping syzbot
2026-02-04 19:10   ` [PATCH] KVM: guest_memfd: Disable VMA merging with VM_DONTEXPAND Ackerley Tng
2026-02-04 21:37     ` Sean Christopherson
2026-02-04 21:45       ` David Hildenbrand (arm)
2026-02-04 23:17         ` Ackerley Tng
2026-02-08 17:34           ` Ackerley Tng
2026-02-09  3:40             ` Deepanshu Kartikey
2026-02-09 10:38             ` David Hildenbrand (Arm)
2026-02-09 18:24               ` Ackerley Tng
2026-02-09 19:38                 ` David Hildenbrand (Arm)
