From: Lance Yang <lance.yang@linux.dev>
To: willy@infradead.org
Cc: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com,
	Liam.Howlett@oracle.com, akpm@linux-foundation.org,
	baohua@kernel.org, baolin.wang@linux.alibaba.com,
	david@kernel.org, dev.jain@arm.com, lance.yang@linux.dev,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	lorenzo.stoakes@oracle.com, npache@redhat.com,
	ryan.roberts@arm.com, syzkaller-bugs@googlegroups.com,
	ziy@nvidia.com
Subject: Re: [syzbot] [mm?] kernel BUG in hpage_collapse_scan_file (2)
Date: Sun, 25 Jan 2026 20:10:01 +0800
Message-ID: <20260125121001.32733-1-lance.yang@linux.dev>
In-Reply-To: <69757ea0.a00a0220.33ccc7.0017.GAE@google.com>

Ccing Willy.

On Sat, 24 Jan 2026 18:23:28 -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    ca3a02fda4da Add linux-next specific files for 20260123
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10c42452580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=10f2b64f8f12b9a4
> dashboard link: https://syzkaller.appspot.com/bug?extid=bf6e6a6ca143afea5ca2
> compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17f7cbfa580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=112d405a580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/291ebca63a31/disk-ca3a02fd.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/b2112a214b54/vmlinux-ca3a02fd.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/77d1ae437e07/bzImage-ca3a02fd.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+bf6e6a6ca143afea5ca2@syzkaller.appspotmail.com
> 
> node ffff888148816ec0 offset 0 parent ffff888148817700 shift 0 count 64 values 0 array ffff88807be6b0f0 list ffff888148816ed8 ffff888148816ed8 marks 0 0 0
> ------------[ cut here ]------------
> kernel BUG at ./include/linux/xarray.h:1441!
> Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
> CPU: 0 UID: 0 PID: 6017 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/13/2026
> RIP: 0010:XAS_INVALID include/linux/xarray.h:1441 [inline]

Seems like that is:

```
static inline struct xa_state *XAS_INVALID(struct xa_state *xas)
{
	XA_NODE_BUG_ON(xas->xa_node, xas_valid(xas));
	return xas;
}
```

Which was added by commit 43b00759f21b (not landed upstream yet):

```
commit 43b00759f21b10142094d1ae5ff65cbb368953a3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Dec 14 10:53:31 2025 -0500

    XArray: Add extra debugging check to xas_lock and friends

    While tracking down a recent bug, we discovered somewhere that had
    forgotten to call xas_reset() before calling xas_lock().  Add a debug
    check to be sure that doesn't happen in future and fix all the places in
    the test suite which were carelessly doing just this.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
```

which catches places that forget to reset xas before locking.
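
To make the idea concrete, here is a rough userspace model of such a check.
Everything named model_* is a hypothetical stand-in and not the kernel's
definitions; the assumption is only that a cursor counts as "reset" when it
holds a restart sentinel rather than a cached tree node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct model_node {
	int unused;
};

struct model_xas {
	unsigned long index;
	struct model_node *node;	/* cached walk position */
};

/* Sentinel: no cached position, the next walk starts from the root. */
static struct model_node model_restart;

/* Drop any cached position. */
static void model_xas_reset(struct model_xas *xas)
{
	xas->node = &model_restart;
}

/* Stand-in for a lookup that leaves the cursor pointing into the tree. */
static void model_xas_load(struct model_xas *xas, struct model_node *found)
{
	xas->node = found;
}

/* The check itself: taking the lock is only fine with a reset cursor. */
static bool model_xas_may_lock(const struct model_xas *xas)
{
	return xas->node == &model_restart;
}

/* With assertions enabled this traps, much like the BUG in the report. */
static void model_xas_lock(struct model_xas *xas)
{
	assert(model_xas_may_lock(xas));
}
```

In this model, calling model_xas_lock() right after model_xas_load() trips
the assertion, while a model_xas_reset() in between lets it pass.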

> RIP: 0010:collapse_file mm/khugepaged.c:2041 [inline]

Yeah, maybe it caught a bug in collapse_file() ...

When we lock again with xas_lock_irq(), xas->xa_node is still pointing
at a node from the earlier xas_load(), so the BUG_ON fires, IIUC.

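Fix it by calling xas_set() before xas_lock_irq() to reset the state (or,
where a call already exists, by moving it ahead of the lock). One spot in
the rollback path doesn't need xas at all, so I just changed it to use
xa_lock_irq()/xa_unlock_irq() on mapping->i_pages directly.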
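
As a sketch of the ordering problem only (a userspace model, not kernel
code; all model_* names are hypothetical, and a simple flag stands in for
the cached node pointer), the sequence before and after the change looks
roughly like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a cursor; "cached" stands in for xa_node. */
struct model_xas {
	unsigned long index;
	bool cached;		/* true once a walk has cached a position */
	int stale_locks;	/* locks taken while a position was cached */
};

/* Like xas_set(): pick an index and drop any cached position. */
static void model_set(struct model_xas *xas, unsigned long index)
{
	xas->index = index;
	xas->cached = false;
}

/* Like a load: the walk leaves a cached position behind. */
static void model_walk(struct model_xas *xas)
{
	xas->cached = true;
}

/* Count, rather than trap on, locks taken with a stale cursor. */
static void model_lock(struct model_xas *xas)
{
	if (xas->cached)
		xas->stale_locks++;
}

static void model_unlock(struct model_xas *xas)
{
	(void)xas;
}

/* Old ordering: walk, unlock, then relock with the cursor untouched. */
static int model_relock_before_patch(unsigned long index)
{
	struct model_xas xas = { 0, false, 0 };

	model_set(&xas, index);
	model_lock(&xas);
	model_walk(&xas);
	model_unlock(&xas);
	/* ... work done without the lock held ... */
	model_lock(&xas);	/* cursor is stale here */
	model_unlock(&xas);
	return xas.stale_locks;
}

/* New ordering: reset the cursor before taking the lock again. */
static int model_relock_after_patch(unsigned long index)
{
	struct model_xas xas = { 0, false, 0 };

	model_set(&xas, index);
	model_lock(&xas);
	model_walk(&xas);
	model_unlock(&xas);
	/* ... work done without the lock held ... */
	model_set(&xas, index);
	model_lock(&xas);
	model_unlock(&xas);
	return xas.stale_locks;
}
```

The model counts a stale lock instead of crashing, so the two orderings can
be compared side by side.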

---8<---
commit 2003255c52846ab10cad6c2e57cda4d17dddadbe
Author: Lance Yang <lance.yang@linux.dev>
Date:   Sun Jan 25 19:37:56 2026 +0800

    HACK

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fba6aea5bea6..3656ae491385 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2038,6 +2038,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			try_to_unmap(folio,
 					TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);

+		xas_set(&xas, index);
 		xas_lock_irq(&xas);

 		VM_BUG_ON_FOLIO(folio != xa_load(xas.xa, index), folio);
@@ -2140,9 +2141,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		int nr_none_check = 0;

 		i_mmap_lock_read(mapping);
-		xas_lock_irq(&xas);
-
 		xas_set(&xas, start);
+		xas_lock_irq(&xas);
 		for (index = start; index < end; index++) {
 			if (!xas_next(&xas)) {
 				xas_store(&xas, XA_RETRY_ENTRY);
@@ -2192,6 +2192,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto rollback;
 		}
 	} else {
+		xas_set(&xas, start);
 		xas_lock_irq(&xas);
 	}

@@ -2250,9 +2251,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 rollback:
 	/* Something went wrong: roll back page cache changes */
 	if (nr_none) {
-		xas_lock_irq(&xas);
+		xa_lock_irq(&mapping->i_pages);
 		mapping->nrpages -= nr_none;
-		xas_unlock_irq(&xas);
+		xa_unlock_irq(&mapping->i_pages);
 		shmem_uncharge(mapping->host, nr_none);
 	}
---

Tested with the syzbot reproducer[1], no more crashes :)

[1] https://syzkaller.appspot.com/x/repro.c?x=112d405a580000

Cheers,
Lance

[...]

