From: Lance Yang <lance.yang@linux.dev>
To: willy@infradead.org
Cc: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com,
Liam.Howlett@oracle.com, akpm@linux-foundation.org,
baohua@kernel.org, baolin.wang@linux.alibaba.com,
david@kernel.org, dev.jain@arm.com, lance.yang@linux.dev,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
lorenzo.stoakes@oracle.com, npache@redhat.com,
ryan.roberts@arm.com, syzkaller-bugs@googlegroups.com,
ziy@nvidia.com
Subject: Re: [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2)
Date: Sun, 25 Jan 2026 20:10:01 +0800
Message-ID: <20260125121001.32733-1-lance.yang@linux.dev>
In-Reply-To: <69757ea0.a00a0220.33ccc7.0017.GAE@google.com>

Cc'ing Willy.

On Sat, 24 Jan 2026 18:23:28 -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: ca3a02fda4da Add linux-next specific files for 20260123
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10c42452580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=10f2b64f8f12b9a4
> dashboard link: https://syzkaller.appspot.com/bug?extid=bf6e6a6ca143afea5ca2
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f7cbfa580000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112d405a580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/291ebca63a31/disk-ca3a02fd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/b2112a214b54/vmlinux-ca3a02fd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/77d1ae437e07/bzImage-ca3a02fd.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com
>
> node ffff888148816ec0 offset 0 parent ffff888148817700 shift 0 count 64 values 0 array ffff88807be6b0f0 list ffff888148816ed8 ffff888148816ed8 marks 0 0 0
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/xarray.h:1441!
> Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
> CPU: 0 UID: 0 PID: 6017 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
> RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]
Seems like that is:
```
static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
{
	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
	return xas;
}
```
which was added by commit 43b00759f21b (not upstream yet):
```
commit 43b00759f21b10142094d1ae5ff65cbb368953a3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Dec 14 10:53:31 2025 -0500

    XArray: Add extra debugging check to xas_lock and friends

    While tracking down a recent bug, we discovered somewhere that had
    forgotten to call xas_reset() before calling xas_lock().  Add a debug
    check to be sure that doesn't happen in future and fix all the places in
    the test suite which were carelessly doing just this.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
```
which catches places that forget to reset xas before locking.
> RIP: 0010:collapse_file mm/khugepaged.c:2041 [inline]
Yeah, maybe it caught a bug in collapse_file() ...

IIUC, when we take the lock again with xas_lock_irq(), xas->xa_node is
still pointing at a node cached by the earlier xas_load(), so the BUG_ON
fires.

Fix it by calling xas_set() before xas_lock_irq() to reset the state.
One spot in the rollback path doesn't actually need xas at all, so I
just changed it to use xa_lock_irq() directly.
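
To make the failure mode concrete, here is a rough userspace sketch of it.
This is a toy model, not kernel code: every toy_* name is made up for the
illustration, and the validity test assumes that xas_valid() boils down to
checking the low bits of xa_node, with the restart sentinel being a pointer
value that has those bits set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an XArray node; hypothetical, not the kernel struct. */
struct toy_node {
	int dummy;	/* int alignment keeps the low two pointer bits clear */
};

/* Assumed to mirror the kernel's restart sentinel: low bits set. */
#define TOY_RESTART ((struct toy_node *)3UL)

/* Toy stand-in for struct xa_state: an index plus a cached walk position. */
struct toy_xa_state {
	unsigned long index;
	struct toy_node *node;
};

/* "Valid" means the state still caches a walk position (low bits clear). */
bool toy_xas_valid(const struct toy_xa_state *xas)
{
	return ((uintptr_t)xas->node & 3) == 0;
}

/* Models a load: the walk leaves the node it found cached in the state. */
void toy_xas_load(struct toy_xa_state *xas, struct toy_node *found)
{
	xas->node = found;
}

/* Models a set: pick a new index and throw the cached position away. */
void toy_xas_set(struct toy_xa_state *xas, unsigned long index)
{
	xas->index = index;
	xas->node = TOY_RESTART;
}

/*
 * Models the new debug check on the lock path: returns false in the
 * situation where the real check would BUG, i.e. the state still holds
 * a cached position from before the lock was dropped.
 */
bool toy_xas_lock_ok(const struct toy_xa_state *xas)
{
	return !toy_xas_valid(xas);
}
```

In this model, a state that went through toy_xas_load() fails
toy_xas_lock_ok() until toy_xas_set() is called on it, which is the same
ordering the patch below enforces with xas_set() before xas_lock_irq().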
---8<---
commit 2003255c52846ab10cad6c2e57cda4d17dddadbe
Author: Lance Yang <lance.yang@linux.dev>
Date:   Sun Jan 25 19:37:56 2026 +0800

    HACK

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fba6aea5bea6..3656ae491385 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2038,6 +2038,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			try_to_unmap(folio,
 				     TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);

+		xas_set(&xas, index);
 		xas_lock_irq(&xas);

 		VM_BUG_ON_FOLIO(folio != xa_load(xas.xa, index), folio);
@@ -2140,9 +2141,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		int nr_none_check = 0;

 		i_mmap_lock_read(mapping);
-		xas_lock_irq(&xas);
-
 		xas_set(&xas, start);
+		xas_lock_irq(&xas);
 		for (index = start; index < end; index++) {
 			if (!xas_next(&xas)) {
 				xas_store(&xas, XA_RETRY_ENTRY);
@@ -2192,6 +2192,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto rollback;
 		}
 	} else {
+		xas_set(&xas, start);
 		xas_lock_irq(&xas);
 	}

@@ -2250,9 +2251,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 rollback:
 	/* Something went wrong: roll back page cache changes */
 	if (nr_none) {
-		xas_lock_irq(&xas);
+		xa_lock_irq(&mapping->i_pages);
 		mapping->nrpages -= nr_none;
-		xas_unlock_irq(&xas);
+		xa_unlock_irq(&mapping->i_pages);
 		shmem_uncharge(mapping->host, nr_none);
 	}

---
Tested with the syzbot reproducer[1], no more crashes :)
[1] https://syzkaller.appspot.com/x/repro.c?x=112d405a580000
Cheers,
Lance
[...]