From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from out-177.mta0.migadu.com (out-177.mta0.migadu.com [91.218.175.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id A260834F255
	for <linux-kernel@vger.kernel.org>; Thu, 19 Mar 2026 03:09:23 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.177
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1773889765; cv=none; b=f51UL1cgQ9RyylRxWizklfb+K1Utd3jAsHhOw4I09hzODDUdY9zEZ9ibSIznf+xtJ2U7nKqN1NuvLzvD4flukUm5oycfrms/MwIcAPcMDehMc+UBRKnOpkhKPQhUQ1kOtERuasaKewCw750jbJdeBsnohBKvt7JCptNpxF5yDNs=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1773889765; c=relaxed/simple;
	bh=uBTmAe+LqD/qOsOVdV2XkYOWyI9w5qAamw8EtTBAUpw=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	 MIME-Version:Content-Type; b=Segll6IgwyX/rYmZ9WIgHXNISagic3Izr/nnawqCXx65l11+YKr95ESLvvYfmMt5U7Bk9RusuZd53ohHEE6x9FxrKmkWJepHtx4AWCUyb7MFkAy4owK8IsvEV7b5CUJiV3f0Qcbgglfn9fzw9faUW8pq1SFtDrruq9UeuvmUnik=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=NwIljfUc; arc=none smtp.client-ip=91.218.175.177
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="NwIljfUc"
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1773889761;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=bZybTTbg8AAvyS6cqk7gUlRWFfSkSvTu+/1JIIo5e2U=;
	b=NwIljfUcwu6sWUozpAHmhTLmjRPC8kVCnJYqYcu/eAoNJQz5EuDf8KNK0ijl9aGEmcGs38
	6v5lDGQAqcd0lyC0J2RAmaKqHkieu6iLqAjUYtJt5stIfbKbB5gn3XxnR5VcmvTW5OYhWW
	jBcAYtFOlHSH1aJB3NrPSqfoesEQJvo=
From: Lance Yang <lance.yang@linux.dev>
To: ljs@kernel.org
Cc: syzbot+de14f7701c22477db718@syzkaller.appspotmail.com,
	Liam.Howlett@oracle.com,
	akpm@linux-foundation.org,
	baohua@kernel.org,
	baolin.wang@linux.alibaba.com,
	david@kernel.org,
	dev.jain@arm.com,
	lance.yang@linux.dev,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	npache@redhat.com,
	ryan.roberts@arm.com,
	syzkaller-bugs@googlegroups.com,
	ziy@nvidia.com,
	rppt@kernel.org,
	harry.yoo@oracle.com
Subject: Re: [syzbot] [mm?] general protection fault in zap_huge_pmd
Date: Thu, 19 Mar 2026 11:09:14 +0800
Message-Id: <20260319030914.12034-1-lance.yang@linux.dev>
In-Reply-To: <6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local>
References: <6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Migadu-Flow: FLOW_OUT


On Wed, Mar 18, 2026 at 05:26:32PM +0000, Lorenzo Stoakes (Oracle) wrote:
>+cc Mike for uffd, Harry for fix that also resolves this, see below
>
>On Wed, Mar 18, 2026 at 08:03:22AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    b84a0ebe421c Add linux-next specific files for 20260313
>
>For some reason I have to git pull --tags to get this... commit hash locally?
>Strange.
>
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=119ddd52580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e7280ad1f68b2dce
>> dashboard link: https://syzkaller.appspot.com/bug?extid=de14f7701c22477db718
>> compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=173b44da580000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1537b8da580000
>
>@SYZKALLER guys:
>
>Note: the repro is incorrectly labelling;
>
>  //  ioctl$UFFDIO_CONTINUE arguments: [
>  //    fd: fd_uffd (resource)
>  //    cmd: const = 0xc020aa08 (4 bytes)
>
>as UFFDIO_CONTINUE (0x7), it's actually UFFDIO_POISON (0x8) as you can see
>from the least-significant byte.

#define _UFFDIO_CONTINUE		(0x07)
#define _UFFDIO_POISON			(0x08)

Ouch. I spent quite some time trying to figure out how UFFDIO_CONTINUE
could possibly install PTE markers and push the loop past the VMA
boundary - turns out it can't, because it was UFFDIO_POISON all along.
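
For anyone else decoding these by hand, here is a minimal sketch of
splitting the repro's cmd value into its fields. It assumes the usual
asm-generic ioctl encoding (nr in bits 0-7, type in bits 8-15, size in
bits 16-29, direction in bits 30-31); the helper names are made up for
illustration and are not kernel API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Assumed layout (asm-generic/ioctl.h style):
 * bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
 * These helpers are hypothetical, written only for this example.
 */
static unsigned int cmd_nr(unsigned int cmd)   { return cmd & 0xffu; }
static unsigned int cmd_type(unsigned int cmd) { return (cmd >> 8) & 0xffu; }
static unsigned int cmd_size(unsigned int cmd) { return (cmd >> 16) & 0x3fffu; }
static unsigned int cmd_dir(unsigned int cmd)  { return (cmd >> 30) & 0x3u; }

/* Decode the cmd constant used by the repro and print its fields. */
static void decode_repro_cmd(void)
{
	unsigned int cmd = 0xc020aa08u;

	printf("dir=%u size=%u type=0x%x nr=0x%02x\n",
	       cmd_dir(cmd), cmd_size(cmd), cmd_type(cmd), cmd_nr(cmd));

	/* nr 0x08 matches _UFFDIO_POISON above, not _UFFDIO_CONTINUE (0x07). */
	assert(cmd_nr(cmd) == 0x08);
}
```

Calling decode_repro_cmd() from a main() should show nr=0x08, which is
the POISON number from the two defines above.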

>
>It's also stating things like mmap flags wrong e.g.:
>
>      /*flags=MAP_UNINITIALIZED|MAP_POPULATE|MAP_NORESERVE|MAP_NONBLOCK|MAP_HUGETLB|0x8c4b815a506002b2*/
>      0x8c4b815a5465c2b2ul,
>
>AT LEAST MAKE THE NUMBERS MATCH :) this doesn't help with debugging.
>
>AI hallucinations?
>
>It also never returns with an error if a syscall doesn't work which means the
>repro can run 'ok' but actually be failing on something, this really slows down repro'ing.
>
>Maybe hard, but it would be good to figure out maintainers based on the stuff
>the repro uses, e.g. uffd -> uffd entry in MAINTAINERS :)
>
>OK rants done :) got it repro'ing locally now.
>
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/09145161a8a9/disk-b84a0ebe.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/b64c254e474c/vmlinux-b84a0ebe.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/a7c33f5f7f45/bzImage-b84a0ebe.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+de14f7701c22477db718@syzkaller.appspotmail.com
>>
>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] SMP KASAN PTI
>> KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
>> CPU: 1 UID: 0 PID: 5994 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026
>> RIP: 0010:folio_test_anon include/linux/page-flags.h:718 [inline]
>
>static __always_inline bool folio_test_anon(const struct folio *folio)
>{
>	return ((unsigned long)folio->mapping & FOLIO_MAPPING_ANON) != 0; <-- NULL folio
>}
>
>
>> RIP: 0010:zap_huge_pmd+0x7b1/0x1030 mm/huge_memory.c:2463
>
>int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>		 pmd_t *pmd, unsigned long addr)
>{
>	...
>
>	if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
>		...
>	} else if (is_huge_zero_pmd(orig_pmd)) {
>		...
>	} else {
>		struct folio *folio = NULL;
>
>		...
>
>		if (pmd_present(orig_pmd)) {
>			...
>		} else if (pmd_is_valid_softleaf(orig_pmd)) {
>			...
>		}
>
>		if (folio_test_anon(folio)) { <-- if !pmd_present() && !pmd_is_valid_softleaf(orig_pmd)
>
>Yikes. We should probably put an } else { VM_WARN_ON_ONCE(1); } at least above
>this...
>
>
>
>
>> Code: 08 00 00 e8 11 e0 92 ff 48 c7 44 24 10 00 00 00 00 4c 8b 3c 24 4c 8d 75 18 4c 89 f0 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 f7 e8 f1 43 fc ff 49 8b 1e 48 89 de 48 83
>> RSP: 0018:ffffc90003bb7550 EFLAGS: 00010206
>> RAX: 0000000000000003 RBX: f000000000000000 RCX: dffffc0000000000
>> RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
>> RBP: 0000000000000000 R08: ffff88807cc9802f R09: 1ffff1100f993005
>> R10: dffffc0000000000 R11: ffffed100f993006 R12: ffff88807cc98028
>> R13: fffffffffffffa00 R14: 0000000000000018 R15: ffffc90003bb7ac0
>> FS:  0000000000000000(0000) GS:ffff888124ee0000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00002000000000c0 CR3: 000000000e94a000 CR4: 00000000003526f0
>> Call Trace:
>>  <TASK>
>>  zap_pmd_range mm/memory.c:1990 [inline]
>
>			else if (zap_huge_pmd(tlb, vma, pmd, addr)) { <-- here
>
>>  zap_pud_range mm/memory.c:2032 [inline]
>>  zap_p4d_range mm/memory.c:2053 [inline]
>>  __zap_vma_range+0xa82/0x4bd0 mm/memory.c:2093
>>  unmap_vmas+0x379/0x530 mm/memory.c:2162
>>  exit_mmap+0x280/0xa10 mm/mmap.c:1302
>>  __mmput+0x118/0x430 kernel/fork.c:1180
>>  exit_mm+0x18e/0x250 kernel/exit.c:581
>>  do_exit+0x8b9/0x2490 kernel/exit.c:962
>>  do_group_exit+0x21b/0x2d0 kernel/exit.c:1116
>>  __do_sys_exit_group kernel/exit.c:1127 [inline]
>>  __se_sys_exit_group kernel/exit.c:1125 [inline]
>>  __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1125
>>  x64_sys_call+0x221a/0x2240 arch/x86/include/generated/asm/syscalls_64.h:232
>>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>>  do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
>So this is on process teardown.
>
>Looking at the repro (+ trying to decode what it ACTUALLY does :) it looks like
>it's installing a PTE_MARKER_POISONED at a PMD level via hugetlb, because since
>commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
>this is supported.
>
>Normally this would be handled by __unmap_hugepage_range():
>
>	if (unlikely(is_vm_hugetlb_page(vma))) {
>		...
>		__unmap_hugepage_range(tlb, vma, start, end, NULL, zap_flags);
>	} else {
>		...
>		next = zap_p4d_range(tlb, vma, pgd, addr, next, details);
>	}
>
>But for some reason the zap_p4d_range() path is being used.
>
>I got the repro reliably working locally (not sure why syzkaller didn't
>bisect) so I have bisected it to commit 7d4d4de3ac3e ("userfaultfd: introduce
>mfill_get_vma() and mfill_put_vma()").
>
>And.. of course, after spending (wasting? :) a long time on this, it's already
>fixed...
>
>It seems it's fixed by https://lore.kernel.org/linux-mm/abehBY7QakYF9bK4@hyeyoo/
>
>Before mfill_atomic() would initialise some mfill_state helper struct like this:
>
>	struct mfill_state state = (struct mfill_state){
>		.ctx = ctx,
>		.dst_start = dst_start,
>		.src_start = src_start,
>		.flags = flags,
>
>		.src_addr = src_start,
>		.dst_addr = dst_start,
>	};
>
>BUT not initialise .len = len
>
>So length from then on is assumed to be 0.
>
>OK so the repro, again, generates TOTALLY incorrect labelling:
>
>  //  ioctl$UFFDIO_CONTINUE arguments: [
>  //    fd: fd_uffd (resource)
>  //    cmd: const = 0xc020aa08 (4 bytes)
>  //    arg: ptr[in, uffdio_continue] {
>  //      uffdio_continue {
>  //        range: uffdio_range {
>  //          start: VMA[0xc00000]
>  //          len: len = 0xc00000 (8 bytes)
>  //        }
>  //        mode: uffdio_continue_mode = 0x0 (8 bytes)
>  //        mapped: int64 = 0x0 (8 bytes)
>  //      }
>  //    }
>  //  ]
>  *(uint64_t*)0x200000000280 = 0x200000400000;
>  *(uint64_t*)0x200000000288 = 0xc00000;
>  *(uint64_t*)0x200000000290 = 0;
>  syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa08,
>          /*arg=*/0x200000000280ul);
>
>In reality this is:
>
>  struct uffdio_poison poison = {
>      .range = {
>          .start = 0x200000400000,
>          .len   = 0xc00000,         /* 12MB */
>      },
>      .mode = 0,
>  };
>
>(!!!)
>
>Which in the kernel calls
>
>userfaultfd_ioctl()
>-> userfaultfd_poison()
>-> validate_range() -> validate_unaligned_range() <-- would ordinarily reject 0 len!!
>-> mfill_atomic_poison()
>-> mfill_atomic() [ hits bug]
>-> mfill_get_vma()
>-> uffd_mfill_lock(..., len=0!)
>
>static struct vm_area_struct *uffd_mfill_lock(struct mm_struct *dst_mm,
>					      unsigned long dst_start,
>					      unsigned long len)
>{
>	struct vm_area_struct *dst_vma;
>
>	dst_vma = uffd_lock_vma(dst_mm, dst_start);
>	if (IS_ERR(dst_vma) || validate_dst_vma(dst_vma, dst_start + len))
>		return dst_vma;
>}
>
>Here validate_dst_vma() succeeds trivially as len is 0
>
>BUT. The rest of mfill_atomic() uses len, not state.len.
>
>So this results in ONLY the validation check using the bogus len=0, and works
>with a 12MB size.
>
>Note that in the repro, we try to map a hugetlb VMA of (weirdly) 9.36 MB.
>
>Because we align the hugetlb mapping and round it up from this to 10 MB, we get VMAs like:
>
>  0x1ffffffff000                          0x200000a00000                    0x200001001000
>  |---------|------------------------------|-----------------------|---------|
>  |1pg none |  2560 pages (10 MB) hugetlb  | 1535 pages (6MB) WRX  | 1pg none|
>  |---------|------------------------------|-----------------------|---------|
>            0x200000000000                                         0x200001000000
>
>Because of the len bug, we happily try to install poison markers into 2 MB of
>the 1535 page anon WRX region which is not hugetlb and then BOOM.
>
>So Harry's fix resolves this, but we should handle this case better in
>zap_huge_pmd(), I will send a patch for that.
>
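The "validation sees len=0" step can be sketched in user space as
below. This is only a simplified model: struct range_model and
dst_end_within() are hypothetical stand-ins, the real validate_dst_vma()
may check more than the end bound, and the addresses are taken from the
layout and the uffdio_poison arguments quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a VMA: only its [start, end) range. */
struct range_model {
	uint64_t start;
	uint64_t end;
};

/*
 * Simplified model of the bound check: the destination range must not
 * run past the end of the VMA. Not the kernel's actual function.
 */
static bool dst_end_within(const struct range_model *vma,
			   uint64_t dst_start, uint64_t len)
{
	return dst_start + len <= vma->end;
}

/* Compare the check with the bogus len=0 against the real 12 MB len. */
static void check_len_bug_model(void)
{
	/* 10 MB hugetlb VMA from the layout in the quoted analysis. */
	struct range_model hugetlb = {
		.start = 0x200000000000ull,
		.end   = 0x200000a00000ull,
	};
	uint64_t dst_start = 0x200000400000ull;
	uint64_t len = 0xc00000ull;	/* 12 MB, as passed by the repro */

	/* With the uninitialised (zero) length the check passes... */
	assert(dst_end_within(&hugetlb, dst_start, 0));
	/* ...while the length actually used later runs past the VMA end. */
	assert(!dst_end_within(&hugetlb, dst_start, len));
}
```

So in this model the check only fails once the real length is passed
in, which matches the description of the copy loop walking off the end
of the hugetlb VMA.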

Thanks for the thorough analysis Lorenzo!