From: Lance Yang <lance.yang@linux.dev>
To: ljs@kernel.org
Cc: syzbot+de14f7701c22477db718@syzkaller.appspotmail.com,
Liam.Howlett@oracle.com, akpm@linux-foundation.org,
baohua@kernel.org, baolin.wang@linux.alibaba.com,
david@kernel.org, dev.jain@arm.com, lance.yang@linux.dev,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
npache@redhat.com, ryan.roberts@arm.com,
syzkaller-bugs@googlegroups.com, ziy@nvidia.com, rppt@kernel.org,
harry.yoo@oracle.com
Subject: Re: [syzbot] [mm?] general protection fault in zap_huge_pmd
Date: Thu, 19 Mar 2026 11:09:14 +0800
Message-ID: <20260319030914.12034-1-lance.yang@linux.dev>
In-Reply-To: <6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local>
On Wed, Mar 18, 2026 at 05:26:32PM +0000, Lorenzo Stoakes (Oracle) wrote:
>+cc Mike for uffd, and Harry for a fix that also resolves this (see below)
>
>On Wed, Mar 18, 2026 at 08:03:22AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: b84a0ebe421c Add linux-next specific files for 20260313
>
>For some reason I have to run git pull --tags to get this commit hash locally.
>Strange.
>
>> git tree: linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=119ddd52580000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=e7280ad1f68b2dce
>> dashboard link: https://syzkaller.appspot.com/bug?extid=de14f7701c22477db718
>> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=173b44da580000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1537b8da580000
>
>@SYZKALLER guys:
>
>Note: the repro is mislabelling this:
>
> // ioctl$UFFDIO_CONTINUE arguments: [
> // fd: fd_uffd (resource)
> // cmd: const = 0xc020aa08 (4 bytes)
>
>as UFFDIO_CONTINUE (0x7), but it's actually UFFDIO_POISON (0x8), as you can see
>from the least-significant byte.
#define _UFFDIO_CONTINUE (0x07)
#define _UFFDIO_POISON (0x08)
Ouch. I spent quite some time trying to figure out how UFFDIO_CONTINUE
could possibly install PTE markers and push the loop past the VMA
boundary. It turns out it can't: it was UFFDIO_POISON all along.
>
>It also gets things like the mmap flags wrong, e.g.:
>
> /*flags=MAP_UNINITIALIZED|MAP_POPULATE|MAP_NORESERVE|MAP_NONBLOCK|MAP_HUGETLB|0x8c4b815a506002b2*/
> 0x8c4b815a5465c2b2ul,
>
>AT LEAST MAKE THE NUMBERS MATCH :) this doesn't help with debugging.
>
>AI hallucinations?
>
>It also never returns with an error if a syscall doesn't work, which means the
>repro can run 'ok' while actually failing on something; this really slows down repro'ing.
>
>Maybe hard, but it would be good to figure out maintainers based on what the
>repro uses, e.g. uffd -> uffd entry in MAINTAINERS :)
>
>OK rants done :) got it repro'ing locally now.
>
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/09145161a8a9/disk-b84a0ebe.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/b64c254e474c/vmlinux-b84a0ebe.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/a7c33f5f7f45/bzImage-b84a0ebe.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+de14f7701c22477db718@syzkaller.appspotmail.com
>>
>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] SMP KASAN PTI
>> KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
>> CPU: 1 UID: 0 PID: 5994 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
>> RIP: 0010:folio_test_anon include/linux/page-flags.h:718 [inline]
>
>static __always_inline bool folio_test_anon(const struct folio *folio)
>{
> return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) != 0; <-- NULL folio
>}
>
>
>> RIP: 0010:zap_huge_pmd+0x7b1/0x1030 mm/huge_memory.c:2463
>
>int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> pmd_t *pmd, unsigned long addr)
>{
> ...
>
> if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
> ...
> } else if (is_huge_zero_pmd(orig_pmd)) {
> ...
> } else {
> struct folio *folio = NULL;
>
> ...
>
> if (pmd_present(orig_pmd)) {
> ...
> } else if (pmd_is_valid_softleaf(orig_pmd)) {
> ...
> }
>
> if (folio_test_anon(folio)) { <-- if !pmd_present() && !pmd_is_valid_softleaf(orig_pmd)
>
>Yikes. We should probably put an } else { VM_WARN_ON_ONCE(1); } at least above
>this...
>
>
>
>
>> Code: 08 00 00 e8 11 e0 92 ff 48 c7 44 24 10 00 00 00 00 4c 8b 3c 24 4c 8d 75 18 4c 89 f0 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 f7 e8 f1 43 fc ff 49 8b 1e 48 89 de 48 83
>> RSP: 0018:ffffc90003bb7550 EFLAGS: 00010206
>> RAX: 0000000000000003 RBX: f000000000000000 RCX: dffffc0000000000
>> RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
>> RBP: 0000000000000000 R08: ffff88807cc9802f R09: 1ffff1100f993005
>> R10: dffffc0000000000 R11: ffffed100f993006 R12: ffff88807cc98028
>> R13: fffffffffffffa00 R14: 0000000000000018 R15: ffffc90003bb7ac0
>> FS: 0000000000000000(0000) GS:ffff888124ee0000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00002000000000c0 CR3: 000000000e94a000 CR4: 00000000003526f0
>> Call Trace:
>> <TASK>
>> zap_pmd_range mm/memory.c:1990 [inline]
>
> else if (zap_huge_pmd(tlb, vma, pmd, addr)) { <-- here
>
>> zap_pud_range mm/memory.c:2032 [inline]
>> zap_p4d_range mm/memory.c:2053 [inline]
>> __zap_vma_range+0xa82/0x4bd0 mm/memory.c:2093
>> unmap_vmas+0x379/0x530 mm/memory.c:2162
>> exit_mmap+0x280/0xa10 mm/mmap.c:1302
>> __mmput+0x118/0x430 kernel/fork.c:1180
>> exit_mm+0x18e/0x250 kernel/exit.c:581
>> do_exit+0x8b9/0x2490 kernel/exit.c:962
>> do_group_exit+0x21b/0x2d0 kernel/exit.c:1116
>> __do_sys_exit_group kernel/exit.c:1127 [inline]
>> __se_sys_exit_group kernel/exit.c:1125 [inline]
>> __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1125
>> x64_sys_call+0x221a/0x2240 arch/x86/include/generated/asm/syscalls_64.h:232
>> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>> do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
>So this is on process teardown.
>
>Looking at the repro (and trying to decode what it ACTUALLY does :), it looks like
>it's installing a PTE_MARKER_POISONED at PMD level via hugetlb, which has been
>supported since commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for
>hugetlbfs").
>
>Normally this would be handled by __unmap_hugepage_range():
>
> if (unlikely(is_vm_hugetlb_page(vma))) {
> ...
> __unmap_hugepage_range(tlb, vma, start, end, NULL, zap_flags);
> } else {
> ...
> next = zap_p4d_range(tlb, vma, pgd, addr, next, details);
> }
>
>But for some reason the zap_p4d_range() path is being used.
>
>I got the repro working reliably locally (not sure why syzkaller didn't
>bisect), so I bisected it to commit 7d4d4de3ac3e ("userfaultfd: introduce
>mfill_get_vma() and mfill_put_vma()").
>
>And.. of course, after spending (wasting? :) a long time on this, it's already
>fixed...
>
>It seems it's fixed by https://lore.kernel.org/linux-mm/abehBY7QakYF9bK4@hyeyoo/
>
>Before the fix, mfill_atomic() would initialise an mfill_state helper struct like this:
>
> struct mfill_state state = (struct mfill_state){
> .ctx = ctx,
> .dst_start = dst_start,
> .src_start = src_start,
> .flags = flags,
>
> .src_addr = src_start,
> .dst_addr = dst_start,
> };
>
>BUT it did not initialise .len = len.
>
>So state.len is 0 from then on.
>
>OK so the repro, again, generates TOTALLY incorrect labelling:
>
> // ioctl$UFFDIO_CONTINUE arguments: [
> // fd: fd_uffd (resource)
> // cmd: const = 0xc020aa08 (4 bytes)
> // arg: ptr[in, uffdio_continue] {
> // uffdio_continue {
> // range: uffdio_range {
> // start: VMA[0xc00000]
> // len: len = 0xc00000 (8 bytes)
> // }
> // mode: uffdio_continue_mode = 0x0 (8 bytes)
> // mapped: int64 = 0x0 (8 bytes)
> // }
> // }
> // ]
> *(uint64_t*)0x200000000280 = 0x200000400000;
> *(uint64_t*)0x200000000288 = 0xc00000;
> *(uint64_t*)0x200000000290 = 0;
> syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa08,
> /*arg=*/0x200000000280ul);
>
>In reality this is:
>
> struct uffdio_poison poison = {
> .range = {
> .start = 0x200000400000,
> .len = 0xc00000, /* 12MB */
> },
> .mode = 0,
> };
>
>(!!!)
>
>Which in the kernel calls
>
>userfaultfd_ioctl()
>-> userfaultfd_poison()
>-> validate_range() -> validate_unaligned_range() <-- would ordinarily reject 0 len!!
>-> mfill_atomic_poison()
>-> mfill_atomic() [ hits bug]
>-> mfill_get_vma()
>-> uffd_mfill_lock(..., len=0!)
>
>static struct vm_area_struct *uffd_mfill_lock(struct mm_struct *dst_mm,
> unsigned long dst_start,
> unsigned long len)
>{
> struct vm_area_struct *dst_vma;
>
> dst_vma = uffd_lock_vma(dst_mm, dst_start);
> if (IS_ERR(dst_vma) || validate_dst_vma(dst_vma, dst_start + len))
> return dst_vma;
> ...
>}
>
>Here validate_dst_vma() succeeds trivially, as len is 0.
>
>BUT. The rest of mfill_atomic() uses len, not state.len.
>
>So ONLY the validation check uses the bogus len=0, while everything else
>works with the real 12MB size.
>
>Note that in the repro, we try to map a hugetlb VMA of (weirdly) 9.36 MB.
>
>Because hugetlb mappings are aligned and rounded up from this to 10 MB, we get VMAs like:
>
> 0x1ffffffff000 0x200000a00000 0x200001001000
> |---------|------------------------------|-----------------------|---------|
> |1pg none | 2560 pages (10 MB) hugetlb | 1536 pages (6MB) WRX | 1pg none|
> |---------|------------------------------|-----------------------|---------|
> 0x200000000000 0x200001000000
>
>Because of the len bug, we happily try to install poison markers into 2 MB of
>the 1536-page anon WRX region, which is not hugetlb, and then BOOM.
>
>So Harry's fix resolves this, but we should handle this case better in
>zap_huge_pmd(), I will send a patch for that.
>
Thanks for the thorough analysis Lorenzo!
Thread overview: 10+ messages
2026-03-18 15:03 [syzbot] [mm?] general protection fault in zap_huge_pmd syzbot
2026-03-18 16:53 ` Lance Yang
2026-03-18 17:35 ` Lorenzo Stoakes (Oracle)
2026-03-19 2:58 ` Lance Yang
2026-03-18 17:26 ` Lorenzo Stoakes (Oracle)
2026-03-18 21:54 ` Aleksandr Nogikh
2026-03-19 10:04 ` Lorenzo Stoakes (Oracle)
2026-03-19 3:09 ` Lance Yang [this message]
2026-03-19 5:45 ` Mike Rapoport
2026-03-19 8:54 ` Lorenzo Stoakes (Oracle)