public inbox for linux-kernel@vger.kernel.org
* [syzbot] [mm?] general protection fault in lru_gen_test_recent (2)
@ 2025-12-07  8:55 syzbot
  2025-12-07 12:44 ` Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent syzbot
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: syzbot @ 2025-12-07  8:55 UTC (permalink / raw)
  To: akpm, axelrasmussen, david, hannes, linux-kernel, linux-mm,
	lorenzo.stoakes, mhocko, shakeel.butt, syzkaller-bugs, weixugc,
	yuanchu, zhengqi.arch

Hello,

syzbot found the following issue on:

HEAD commit:    c06c303832ec ocfs2: fix xattr array entry __counted_by error
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14cbfc1a580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5aef7d5187304591
dashboard link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
compiler:       gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=127f2992580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15cf4eb4580000

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-c06c3038.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/1a5115eeda38/vmlinux-c06c3038.xz
kernel image: https://storage.googleapis.com/syzbot-assets/98eb17e54bb8/bzImage-c06c3038.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc00000009c0: 0000 [#1] SMP KASAN NOPTI
KASAN: probably user-memory-access in range [0x0000000000004e00-0x0000000000004e07]
CPU: 2 UID: 0 PID: 6121 Comm: syz.0.27 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:mem_cgroup_lruvec include/linux/memcontrol.h:720 [inline]
RIP: 0010:lru_gen_test_recent+0xee/0x320 mm/workingset.c:275
Code: 38 80 b5 ff 48 85 db 0f 84 79 01 00 00 e8 2a 80 b5 ff 49 8d bd 00 4e 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e a3 01 00 00 4d 63 b5 00 4e 00
RSP: 0018:ffffc90003e17828 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff888100068000 RCX: ffffc90003e1772c
RDX: 00000000000009c0 RSI: ffffffff82096446 RDI: 0000000000004e00
RBP: ffffc90003e178c0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffff888028282ff0 R12: ffffc90003e178e0
R13: 0000000000000000 R14: ffffc90003e178b0 R15: 0000000000000000
FS:  00007f6361dfa6c0(0000) GS:ffff8880d6b0d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000409000000 CR3: 0000000039516000 CR4: 0000000000352ef0
Call Trace:
 <TASK>
 lru_gen_refault mm/workingset.c:296 [inline]
 workingset_refault+0x251/0xca0 mm/workingset.c:546
 filemap_add_folio+0x23d/0x610 mm/filemap.c:981
 do_read_cache_folio+0x23c/0x5c0 mm/filemap.c:4063
 freader_get_folio+0x33a/0x930 lib/buildid.c:58
 freader_fetch+0xbd/0x740 lib/buildid.c:101
 __build_id_parse.isra.0+0xdd/0x6c0 lib/buildid.c:289
 do_procmap_query+0xb0e/0x1080 fs/proc/task_mmu.c:733
 procfs_procmap_ioctl+0x9d/0xe0 fs/proc/task_mmu.c:813
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6360f8f7c9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6361dfa038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f63611e5fa0 RCX: 00007f6360f8f7c9
RDX: 0000200000000180 RSI: 00000000c0686611 RDI: 0000000000000003
RBP: 00007f6361013f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f63611e6038 R14: 00007f63611e5fa0 R15: 00007ffe36ff8138
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:mem_cgroup_lruvec include/linux/memcontrol.h:720 [inline]
RIP: 0010:lru_gen_test_recent+0xee/0x320 mm/workingset.c:275
Code: 38 80 b5 ff 48 85 db 0f 84 79 01 00 00 e8 2a 80 b5 ff 49 8d bd 00 4e 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e a3 01 00 00 4d 63 b5 00 4e 00
RSP: 0018:ffffc90003e17828 EFLAGS: 00010206

RAX: dffffc0000000000 RBX: ffff888100068000 RCX: ffffc90003e1772c
RDX: 00000000000009c0 RSI: ffffffff82096446 RDI: 0000000000004e00
RBP: ffffc90003e178c0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffff888028282ff0 R12: ffffc90003e178e0
R13: 0000000000000000 R14: ffffc90003e178b0 R15: 0000000000000000
FS:  00007f6361dfa6c0(0000) GS:ffff8880d6b0d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000409000000 CR3: 0000000039516000 CR4: 0000000000352ef0
----------------
Code disassembly (best guess):
   0:	38 80 b5 ff 48 85    	cmp    %al,-0x7ab7004b(%rax)
   6:	db 0f                	fisttpl (%rdi)
   8:	84 79 01             	test   %bh,0x1(%rcx)
   b:	00 00                	add    %al,(%rax)
   d:	e8 2a 80 b5 ff       	call   0xffb5803c
  12:	49 8d bd 00 4e 00 00 	lea    0x4e00(%r13),%rdi
  19:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  20:	fc ff df
  23:	48 89 fa             	mov    %rdi,%rdx
  26:	48 c1 ea 03          	shr    $0x3,%rdx
* 2a:	0f b6 04 02          	movzbl (%rdx,%rax,1),%eax <-- trapping instruction
  2e:	84 c0                	test   %al,%al
  30:	74 08                	je     0x3a
  32:	3c 03                	cmp    $0x3,%al
  34:	0f 8e a3 01 00 00    	jle    0x1dd
  3a:	4d                   	rex.WRB
  3b:	63                   	.byte 0x63
  3c:	b5 00                	mov    $0x0,%ch
  3e:	4e                   	rex.WRX
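
The faulting address in the Oops line can be reproduced from the register dump and the disassembly above. The sketch below is only an illustration of that arithmetic: the constants are the ones visible in the dump (0x4e00 from the lea, the shift by 3, and the value loaded into %rax), while the function names are hypothetical helpers and not kernel APIs.

```c
#include <stdint.h>

/* Value loaded into %rax by the movabs in the disassembly. */
#define SHADOW_OFFSET 0xdffffc0000000000ULL
/* "shr $0x3,%rdx": one shadow byte describes 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/* Address formed by "lea 0x4e00(%r13),%rdi" for a given base in %r13. */
uint64_t field_addr(uint64_t base)
{
	return base + 0x4e00;
}

/* Shadow byte address read by the trapping movzbl (%rdx,%rax,1). */
uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

With R13 = 0 as in the dump, field_addr(0) gives RDI = 0x4e00, the shift gives RDX = 0x9c0, and shadow_addr(0x4e00) gives the non-canonical address 0xdffffc00000009c0 reported in the Oops line.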


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
@ 2025-12-07 12:44 ` syzbot
  2025-12-07 14:35 ` syzbot
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-07 12:44 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Add NULL check for memcg in lru_gen_test_recent() to prevent crash when
mem_cgroup_from_id() returns NULL.

The crash occurs when a folio's shadow entry contains a memcg_id that
no longer maps to a valid memory cgroup. This can happen when:

1. The memory cgroup has been deleted/freed
2. A folio was created without proper memcg association (e.g., during
   procmap_query build ID parsing via freader_get_folio)
3. The memcg_id in the shadow entry is invalid or zero

When lru_gen_test_recent() calls mem_cgroup_from_id(), it may return
NULL. The subsequent call to mem_cgroup_lruvec() with a NULL memcg
triggers a crash because the inline function's code calculates
memcg->nodeinfo offset (0x4e00) before the NULL check can execute,
causing a NULL pointer dereference that KASAN detects.

Although mem_cgroup_lruvec() has a NULL check internally, compiler
inlining and optimization cause the offset calculation to occur
first, making the internal check unreachable.

The fix adds an explicit NULL check after mem_cgroup_from_id() and
falls back to root_mem_cgroup, which is consistent with how
mem_cgroup_lruvec() itself handles NULL pointers.

Reproducer triggers this via:
  procfs_procmap_ioctl() -> do_procmap_query() -> __build_id_parse() ->
  freader_get_folio() -> filemap_add_folio() -> workingset_refault() ->
  lru_gen_refault() -> lru_gen_test_recent()

KASAN report:
  general protection fault in lru_gen_test_recent
  RIP: 0010:mem_cgroup_lruvec include/linux/memcontrol.h:720 [inline]
  RIP: 0010:lru_gen_test_recent+0xee/0x320 mm/workingset.c:275
  Call Trace:
   lru_gen_refault mm/workingset.c:296 [inline]
   workingset_refault+0x251/0xca0 mm/workingset.c:546
   filemap_add_folio+0x23d/0x610 mm/filemap.c:981

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..8b6332cfb4f0 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -272,6 +272,8 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
 
 	memcg = mem_cgroup_from_id(memcg_id);
+	if (!memcg)
+		memcg = root_mem_cgroup;
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
-- 
2.43.0
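
The fallback described in this patch can be sketched in user space as follows. This is a minimal mock, not kernel code: the two-entry cgroup table, memcg_from_id(), and lruvec_for_id() are hypothetical stand-ins for mem_cgroup_from_id() and the lookup in lru_gen_test_recent(), and the struct layouts are simplified.

```c
#include <stddef.h>

struct lruvec { unsigned long max_seq; };
struct mem_cgroup { int id; struct lruvec lruvec; };

/* Hypothetical cgroup table: a root group and one child. */
static struct mem_cgroup root_memcg  = { 1, { 4 } };
static struct mem_cgroup child_memcg = { 7, { 9 } };

/* Stand-in for the ID lookup: returns NULL for IDs that map to nothing. */
struct mem_cgroup *memcg_from_id(int id)
{
	if (id == root_memcg.id)
		return &root_memcg;
	if (id == child_memcg.id)
		return &child_memcg;
	return NULL;
}

/* Look up the group and fall back to the root group when the ID is stale. */
struct lruvec *lruvec_for_id(int id)
{
	struct mem_cgroup *memcg = memcg_from_id(id);

	if (!memcg)
		memcg = &root_memcg;
	return &memcg->lruvec;
}
```

A known ID resolves to its own lruvec, while an unknown ID resolves to the root group's lruvec instead of dereferencing NULL.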



* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
  2025-12-07 12:44 ` Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent syzbot
@ 2025-12-07 14:35 ` syzbot
  2025-12-07 15:05 ` syzbot
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-07 14:35 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Add NULL check for memcg in lru_gen_test_recent() to prevent crash when
mem_cgroup_from_id() returns NULL.

The crash occurs when a folio's shadow entry contains a memcg_id that
no longer maps to a valid memory cgroup. This can happen when:

1. The memory cgroup has been deleted/freed
2. A folio was created without proper memcg association (e.g., during
   procmap_query build ID parsing via freader_get_folio)
3. The memcg_id in the shadow entry is invalid or zero

When lru_gen_test_recent() calls mem_cgroup_from_id(), it may return
NULL. The subsequent call to mem_cgroup_lruvec() with NULL memcg
triggers a crash.

Although mem_cgroup_lruvec() has an internal NULL check, the crash
occurs before reaching it due to compiler optimization. Since
mem_cgroup_lruvec() is an inline function, the compiler calculates
the offset memcg->nodeinfo (0x4e00) before the function's NULL check
can execute, causing a NULL pointer dereference.

Fix this by introducing an effective_memcg variable that is explicitly
set to root_mem_cgroup when memcg is NULL. This approach forces the
compiler to use a separate register/memory location, preventing the
premature offset calculation that caused the crash with a simple
in-place NULL check.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..dad8b16af105 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -266,13 +266,14 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 {
 	int memcg_id;
 	unsigned long max_seq;
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg, *effective_memcg;
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
 
 	memcg = mem_cgroup_from_id(memcg_id);
-	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	effective_memcg = memcg ? : root_mem_cgroup;
+	*lruvec = mem_cgroup_lruvec(effective_memcg, pgdat);
 
 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
 	max_seq &= EVICTION_MASK >> LRU_REFS_WIDTH;
-- 
2.43.0
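
The expression "memcg ? : root_mem_cgroup" in this diff uses the GNU C conditional with an omitted middle operand. As a small illustration, assuming a GNU-compatible compiler such as gcc or clang, the two hypothetical helpers below select the same pointer; the struct and names are made up for the sketch.

```c
#include <stddef.h>

struct mem_cgroup { int id; };

static struct mem_cgroup root_memcg = { 1 };

/* GNU extension: "a ? : b" yields a when a is non-NULL, otherwise b. */
struct mem_cgroup *effective_gnu(struct mem_cgroup *memcg)
{
	return memcg ? : &root_memcg;
}

/* Standard C spelling of the same selection. */
struct mem_cgroup *effective_std(struct mem_cgroup *memcg)
{
	return memcg ? memcg : &root_memcg;
}
```

Both helpers return the same value for every input, so at the source level this form expresses the same fallback as an explicit "if (!memcg)" assignment.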



* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
  2025-12-07 12:44 ` Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent syzbot
  2025-12-07 14:35 ` syzbot
@ 2025-12-07 15:05 ` syzbot
  2025-12-07 15:31 ` syzbot
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-07 15:05 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Add NULL check for memcg in lru_gen_test_recent() to prevent crash when
mem_cgroup_from_id() returns NULL.

The crash occurs when a folio's shadow entry contains a memcg_id that
no longer maps to a valid memory cgroup. This can happen when:

1. The memory cgroup has been deleted/freed
2. A folio was created without proper memcg association (e.g., during
   procmap_query build ID parsing via freader_get_folio)
3. The memcg_id in the shadow entry is invalid or zero

When lru_gen_test_recent() calls mem_cgroup_from_id(), it may return
NULL. The subsequent call to mem_cgroup_lruvec() with NULL memcg
triggers a crash.

Although mem_cgroup_lruvec() has an internal NULL check, the crash
occurs before reaching it due to compiler optimization. Since
mem_cgroup_lruvec() is an inline function, the compiler calculates
the offset memcg->nodeinfo (0x4e00) before the function's NULL check
can execute, causing a NULL pointer dereference.

This debugging version keeps the in-place fallback to root_mem_cgroup
when memcg is NULL and adds pr_warn() output around it, logging
memcg_id, memcg, and root_mem_cgroup before the fallback and memcg
again before and after the mem_cgroup_lruvec() call.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..847580173fb0 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -272,8 +272,15 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
 
 	memcg = mem_cgroup_from_id(memcg_id);
+	pr_warn("DEBUG: memcg_id=%d memcg=%p root_mem_cgroup=%p\n", memcg_id, memcg, root_mem_cgroup);
+	if (!memcg) {
+		pr_warn("DEBUG: memcg is NULL, using root_mem_cgroup\n");
+		memcg = root_mem_cgroup;
+		pr_warn("DEBUG: after assignment memcg=%p\n", memcg);
+	}
+	pr_warn("DEBUG: about to call mem_cgroup_lruvec with memcg=%p\n", memcg);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
+	pr_warn("DEBUG: mem_cgroup_lruvec returned successfully\n");
 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
 	max_seq &= EVICTION_MASK >> LRU_REFS_WIDTH;
 
-- 
2.43.0



* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (2 preceding siblings ...)
  2025-12-07 15:05 ` syzbot
@ 2025-12-07 15:31 ` syzbot
  2025-12-07 15:38 ` syzbot
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-07 15:31 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Add NULL check for memcg in lru_gen_test_recent() to prevent crash when
mem_cgroup_from_id() returns NULL.

The crash occurs when a folio's shadow entry contains a memcg_id that
no longer maps to a valid memory cgroup. This can happen when:

1. The memory cgroup has been deleted/freed
2. A folio was created without proper memcg association (e.g., during
   procmap_query build ID parsing via freader_get_folio)
3. The memcg_id in the shadow entry is invalid or zero

When lru_gen_test_recent() calls mem_cgroup_from_id(), it may return
NULL. The subsequent call to mem_cgroup_lruvec() with NULL memcg
triggers a crash.

Although mem_cgroup_lruvec() has an internal NULL check, the crash
occurs before reaching it due to compiler optimization. Since
mem_cgroup_lruvec() is an inline function, the compiler calculates
the offset memcg->nodeinfo (0x4e00) before the function's NULL check
can execute, causing a NULL pointer dereference.

Fix this by assigning root_mem_cgroup to memcg with WRITE_ONCE() when
mem_cgroup_from_id() returns NULL, before mem_cgroup_lruvec() is
called.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..8166793b38dc 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -272,6 +272,8 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
 
 	memcg = mem_cgroup_from_id(memcg_id);
+	if (unlikely(!memcg))
+		WRITE_ONCE(memcg, root_mem_cgroup);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
-- 
2.43.0



* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (3 preceding siblings ...)
  2025-12-07 15:31 ` syzbot
@ 2025-12-07 15:38 ` syzbot
  2025-12-07 16:07 ` syzbot
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-07 15:38 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Add NULL check for memcg in lru_gen_test_recent() to prevent crash when
mem_cgroup_from_id() returns NULL.

The crash occurs when a folio's shadow entry contains a memcg_id that
no longer maps to a valid memory cgroup. This can happen when:

1. The memory cgroup has been deleted/freed
2. A folio was created without proper memcg association (e.g., during
   procmap_query build ID parsing via freader_get_folio)
3. The memcg_id in the shadow entry is invalid or zero

When lru_gen_test_recent() calls mem_cgroup_from_id(), it may return
NULL. The subsequent call to mem_cgroup_lruvec() with NULL memcg
triggers a crash.

Although mem_cgroup_lruvec() has an internal NULL check, the crash
occurs before reaching it due to compiler optimization. Since
mem_cgroup_lruvec() is an inline function, the compiler calculates
the offset memcg->nodeinfo (0x4e00) before the function's NULL check
can execute, causing a NULL pointer dereference.

Fix this by not calling mem_cgroup_lruvec() when memcg is NULL and
using the node-level lruvec, &pgdat->__lruvec, instead.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..4fa33b57f0ca 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -272,7 +272,11 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
 
 	memcg = mem_cgroup_from_id(memcg_id);
-	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	if (unlikely(!memcg)) {
+		*lruvec = &pgdat->__lruvec;
+	} else {
+		*lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	}
 
 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
 	max_seq &= EVICTION_MASK >> LRU_REFS_WIDTH;
-- 
2.43.0



* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (4 preceding siblings ...)
  2025-12-07 15:38 ` syzbot
@ 2025-12-07 16:07 ` syzbot
  2025-12-08  2:31 ` Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent() syzbot
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-07 16:07 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master


Add NULL check for memcg in lru_gen_test_recent() to prevent crash when
mem_cgroup_from_id() returns NULL.

The crash occurs when a folio's shadow entry contains a memcg_id that
no longer maps to a valid memory cgroup. This can happen when:

1. The memory cgroup has been deleted/freed
2. A folio was created without proper memcg association (e.g., during
   procmap_query build ID parsing via freader_get_folio)
3. The memcg_id in the shadow entry is invalid or zero

When lru_gen_test_recent() calls mem_cgroup_from_id(), it may return
NULL. The subsequent call to mem_cgroup_lruvec() with NULL memcg
triggers a crash.

Although mem_cgroup_lruvec() has an internal NULL check, the crash
occurs before reaching it due to compiler optimization. Since
mem_cgroup_lruvec() is an inline function, the compiler calculates
the offset memcg->nodeinfo (0x4e00) before the function's NULL check
can execute, causing a NULL pointer dereference.

This debugging version returns false early when memcg is NULL, after
logging memcg_id, pgdat, and the unpacked shadow fields with pr_warn().
On the normal path it logs memcg and the resulting lruvec.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..6a45e98317e9 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -272,8 +272,13 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
 
 	memcg = mem_cgroup_from_id(memcg_id);
+	if (unlikely(!memcg)) {
+		pr_warn("DEBUG: memcg is NULL (memcg_id=%d), pgdat=%p, returning false\n", memcg_id, pgdat);
+		pr_warn("DEBUG: shadow=%p token=%lx workingset=%d\n", shadow, *token, *workingset);
+		return false;
+	}
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
+	pr_warn("DEBUG: memcg=%p, lruvec=%p, continuing normally\n", memcg, *lruvec);
 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
 	max_seq &= EVICTION_MASK >> LRU_REFS_WIDTH;
 
-- 
2.43.0



* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent()
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (5 preceding siblings ...)
  2025-12-07 16:07 ` syzbot
@ 2025-12-08  2:31 ` syzbot
  2025-12-08  2:47 ` syzbot
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-08  2:31 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent()
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Syzbot reported a general protection fault in lru_gen_test_recent() when
accessing memcg->nodeinfo with a NULL memcg pointer:

  Oops: general protection fault in lru_gen_test_recent+0xfc/0x370
  KASAN: probably user-memory-access in range [0x0000000000004e00-0x0000000000004e07]
  RIP: 0010:lru_gen_test_recent+0xfc/0x370

The crash occurs when unpack_shadow() extracts parameters from a shadow
entry. There are two cases where NULL pointers can be returned:

1. pgdat can be NULL when NODE_DATA(nid) returns NULL for an invalid or
   offlined NUMA node ID stored in the shadow entry.

2. memcg can be NULL when mem_cgroup_from_id() fails to find the memory
   cgroup (e.g., if it was destroyed).

The existing code directly passes these potentially NULL pointers to
mem_cgroup_lruvec(), which dereferences memcg->nodeinfo without checking,
leading to the crash.

Fix this by:
- Checking if pgdat is NULL and returning early if so, as we cannot
  determine page recency without a valid node.
- Checking if memcg is NULL and falling back to pgdat->__lruvec (the
  root memcg's lruvec) instead of calling mem_cgroup_lruvec() which
  would dereference NULL.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..335c2d34ac94 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -270,10 +270,18 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	if (unlikely(!pgdat))
+		return false;
 	memcg = mem_cgroup_from_id(memcg_id);
-	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
+	if (unlikely(!memcg)) {
+		pr_warn("DEBUG: memcg is NULL (memcg_id=%d), pgdat=%p, using pgdat->__lruvec\n", memcg_id, pgdat);
+		pr_warn("DEBUG: shadow=%p token=%lx workingset=%d\n", shadow, *token, *workingset);
+		memcg = root_mem_cgroup;
+		*lruvec = &pgdat->__lruvec;
+	} else {
+		*lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	}
+	pr_warn("DEBUG: memcg=%p, lruvec=%p, continuing normally\n", memcg, *lruvec);
 	max_seq = READ_ONCE((*lruvec)->lrugen.max_seq);
 	max_seq &= EVICTION_MASK >> LRU_REFS_WIDTH;
 
-- 
2.43.0
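
The order of checks in this patch can be sketched in user space as follows. This is a simplified mock under stated assumptions: pick_lruvec() is a hypothetical helper, node_lruvec stands in for pgdat->__lruvec, and the per-node array stands in for the memcg->nodeinfo lookup done by mem_cgroup_lruvec().

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 2

struct lruvec { unsigned long max_seq; };

/* node_lruvec stands in for pgdat->__lruvec. */
struct pglist_data { int node_id; struct lruvec node_lruvec; };

/* Simplified: one lruvec per node, embedded directly in the group. */
struct mem_cgroup { struct lruvec nodeinfo[MAX_NODES]; };

/*
 * Hypothetical helper mirroring the patch: bail out when there is no
 * node, use the node-level lruvec when there is no group, and only
 * index into the group when both pointers are valid.
 */
bool pick_lruvec(struct pglist_data *pgdat, struct mem_cgroup *memcg,
		 struct lruvec **lruvec)
{
	if (!pgdat)
		return false;	/* *lruvec is left untouched */

	if (!memcg)
		*lruvec = &pgdat->node_lruvec;
	else
		*lruvec = &memcg->nodeinfo[pgdat->node_id];
	return true;
}
```

Checking pgdat first matters in this sketch because both remaining branches dereference it.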



* Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent()
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (6 preceding siblings ...)
  2025-12-08  2:31 ` Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent() syzbot
@ 2025-12-08  2:47 ` syzbot
  2025-12-08  3:56 ` Forwarded: [PATCH] mm/workingset: add debug for corrupted shadow entry investigation syzbot
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-08  2:47 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent()
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Syzbot reported a general protection fault in lru_gen_test_recent() when
accessing invalid memory addresses:

  Oops: general protection fault in lru_gen_test_recent+0xfc/0x370
  KASAN: probably user-memory-access in range [0x0000000000004e00-0x0000000000004e07]
  RIP: 0010:lru_gen_test_recent+0xfc/0x370

The crash occurs when unpack_shadow() extracts a pglist_data pointer from
a shadow entry. The pgdat can be NULL when NODE_DATA(nid) returns NULL for
an invalid or offlined NUMA node ID stored in the shadow entry.

The existing code doesn't check for NULL pgdat before passing it to
mem_cgroup_lruvec(), which can lead to crashes when dereferencing the
invalid pointer.

Fix this by checking if pgdat is NULL and setting lruvec to NULL before
returning false. The caller in lru_gen_refault() will then skip processing
via the check "if (lruvec != folio_lruvec(folio)) goto unlock", preventing
use of the invalid lruvec.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..b63948f4e91a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -270,7 +270,10 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	if (unlikely(!pgdat)) {
+		*lruvec = NULL;
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
-- 
2.43.0
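
The guard proposed in this patch can be modelled outside the kernel. The
sketch below is a user-space illustration, not kernel code:
model_node_data(), model_test_recent() and the four-node table are
hypothetical stand-ins for NODE_DATA(), lru_gen_test_recent() and the
machine's node set, and the "recent" computation is reduced to returning
true for any valid node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ONLINE_NODES 4	/* hypothetical: only nodes 0-3 exist */

struct model_pgdat { int node_id; };
struct model_lruvec { struct model_pgdat *pgdat; };

static struct model_pgdat model_nodes[MODEL_ONLINE_NODES] = {
	{ 0 }, { 1 }, { 2 }, { 3 },
};
static struct model_lruvec model_lruvecs[MODEL_ONLINE_NODES];

/* Stand-in for NODE_DATA(): NULL for any node that is not populated. */
static struct model_pgdat *model_node_data(int nid)
{
	if (nid < 0 || nid >= MODEL_ONLINE_NODES)
		return NULL;
	return &model_nodes[nid];
}

/*
 * Stand-in for lru_gen_test_recent() with the proposed guard. The node ID
 * is passed in directly instead of being unpacked from a shadow entry.
 */
static bool model_test_recent(int nid, struct model_lruvec **lruvec)
{
	struct model_pgdat *pgdat = model_node_data(nid);

	assert(lruvec != NULL);

	/* The guard: never hand a NULL pgdat to the lruvec lookup. */
	if (!pgdat) {
		*lruvec = NULL;
		return false;
	}

	model_lruvecs[nid].pgdat = pgdat;
	*lruvec = &model_lruvecs[nid];
	return true;	/* simplification: recency is not modelled */
}
```

In this model, model_test_recent(4, &lv) returns false and leaves lv set
to NULL, while a populated node such as 2 yields a usable lruvec.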


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Forwarded: [PATCH] mm/workingset: add debug for corrupted shadow entry investigation
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (7 preceding siblings ...)
  2025-12-08  2:47 ` syzbot
@ 2025-12-08  3:56 ` syzbot
  2025-12-08  4:49 ` Forwarded: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen syzbot
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-08  3:56 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: add debug for corrupted shadow entry investigation
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master


When pgdat is NULL in lru_gen_test_recent(), it indicates a corrupted
shadow entry. Returning false at that point lets execution continue,
which leads to a later crash in filemap_read_folio() on a NULL
function pointer.

Add debug output and stack dump to understand:
1. When pgdat is NULL (corrupted shadow entries)
2. The full call path leading to this situation
3. Why continuing execution after return false causes crashes

This will help determine the proper place to handle corrupted shadow
entries - either stop earlier in the call chain or handle the corruption
differently in lru_gen_test_recent().

Related-to: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..a848572f8c8a 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -270,7 +270,13 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	if (unlikely(!pgdat)) {
+		pr_warn("FATAL: Corrupted shadow entry - pgdat is NULL! shadow=%p\n", shadow);
+		pr_warn("This indicates page cache corruption - cannot proceed\n");
+		dump_stack();
+		*lruvec = NULL;
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Forwarded: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (8 preceding siblings ...)
  2025-12-08  3:56 ` Forwarded: [PATCH] mm/workingset: add debug for corrupted shadow entry investigation syzbot
@ 2025-12-08  4:49 ` syzbot
  2025-12-08  5:14 ` syzbot
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-08  4:49 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master


Syzbot reported crashes in lru_gen_test_recent() and subsequent NULL
pointer dereferences in the page cache code:

  Oops: general protection fault in lru_gen_test_recent+0xfc/0x370
  KASAN: probably user-memory-access in range [0x0000000000004e00-0x0000000000004e07]

And later:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor instruction fetch in kernel mode
  RIP: 0010:0x0
  Call Trace:
   filemap_read_folio+0xc8/0x2a0

The root cause is that unpack_shadow() can extract an invalid node ID
from a corrupted shadow entry, causing NODE_DATA(nid) to return NULL for
pgdat. When this NULL pgdat is passed to mem_cgroup_lruvec(), it leads
to crashes when dereferencing memcg->nodeinfo.

Even if we detect and return early from lru_gen_test_recent(), the
corrupted state propagates through the call chain, eventually causing
crashes in the page cache code when trying to use the corrupted folio.

Fix this by:
1. Checking if pgdat is NULL in lru_gen_test_recent() and setting
   *lruvec to NULL to signal the corruption to the caller.
2. Adding a NULL check for lruvec in lru_gen_refault() to catch
   corrupted shadow entries and skip processing before the corruption
   can propagate further into the page cache code.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..da19ff153dc7 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -270,7 +270,14 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	/*
+	 * If pgdat is NULL, the shadow entry contains an invalid node ID.
+	 * Set lruvec to NULL so caller can detect and skip processing.
+	 */
+	if (unlikely(!pgdat)) {
+		*lruvec = NULL;
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
@@ -294,7 +301,7 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	rcu_read_lock();
 
 	recent = lru_gen_test_recent(shadow, &lruvec, &token, &workingset);
-	if (lruvec != folio_lruvec(folio))
+	if (!lruvec || lruvec != folio_lruvec(folio))
 		goto unlock;
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
-- 
2.43.0
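
The caller-side check in the second hunk can be illustrated in isolation.
This is a user-space sketch under stated assumptions, not the kernel
function: model_refault() is a hypothetical stand-in for
lru_gen_refault(), the decoded lruvec is passed in directly (NULL for an
invalid shadow), and a counter stands in for mod_lruvec_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_lruvec { long refaults; };

/*
 * Returns true if the refault was accounted, false if it was skipped
 * (the "goto unlock" path in the patch).
 */
static bool model_refault(struct model_lruvec *decoded,
			  struct model_lruvec *folio_lruvec)
{
	/* Assumption of this model: the folio's own lruvec is never NULL. */
	assert(folio_lruvec != NULL);

	if (!decoded || decoded != folio_lruvec)
		return false;

	decoded->refaults++;	/* stands in for mod_lruvec_state() */
	return true;
}
```

Under the model's assumption that the folio's lruvec is never NULL, a NULL
decoded pointer already fails the inequality test, so the explicit
!decoded check mainly documents intent rather than changing the outcome.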


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Forwarded: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (9 preceding siblings ...)
  2025-12-08  4:49 ` Forwarded: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen syzbot
@ 2025-12-08  5:14 ` syzbot
  2025-12-09  5:35 ` Forwarded: [PATCH] mm/workingset: add debug instrumentation for MGLRU shadow corruption syzbot
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-08  5:14 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Syzbot reported crashes in lru_gen_test_recent() and subsequent NULL
pointer dereferences in the page cache code:

  Oops: general protection fault in lru_gen_test_recent+0xfc/0x370
  KASAN: probably user-memory-access in range [0x0000000000004e00-0x0000000000004e07]

And later:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor instruction fetch in kernel mode
  RIP: 0010:0x0
  Call Trace:
   filemap_read_folio+0xc8/0x2a0

The root cause is that unpack_shadow() can extract an invalid node ID
from a corrupted shadow entry, causing NODE_DATA(nid) to return NULL for
pgdat. When this NULL pgdat is passed to mem_cgroup_lruvec(), it leads
to crashes when dereferencing memcg->nodeinfo.

Even if we detect and return early from lru_gen_test_recent(), the
corrupted state propagates through the call chain, eventually causing
crashes in the page cache code when trying to use the corrupted folio.

Fix this by:
1. Checking if pgdat is NULL in lru_gen_test_recent() and setting
   *lruvec to NULL to signal the corruption to the caller.
2. Adding a NULL check for lruvec in lru_gen_refault() to catch
   corrupted shadow entries and skip processing before the corruption
   can propagate further into the page cache code.

Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..364434168b4c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -270,7 +270,15 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	/*
+	 * If pgdat is NULL, the shadow entry contains an invalid node ID.
+	 * Set lruvec to NULL so caller can detect and skip processing.
+	 */
+	if (unlikely(!pgdat)) {
+		*lruvec = NULL;
+		pr_warn("lru_gen_test_recent: Detected corrupted shadow (NULL pgdat), setting lruvec=NULL\n");
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
@@ -294,9 +302,11 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	rcu_read_lock();
 
 	recent = lru_gen_test_recent(shadow, &lruvec, &token, &workingset);
-	if (lruvec != folio_lruvec(folio))
+	if (!lruvec || lruvec != folio_lruvec(folio)) {
+		if (!lruvec)
+			pr_warn("lru_gen_refault: Skipping corrupted entry (lruvec=NULL)\n");
 		goto unlock;
-
+	}
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
 
 	if (!recent)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Forwarded: [PATCH] mm/workingset: add debug instrumentation for MGLRU shadow corruption
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (10 preceding siblings ...)
  2025-12-08  5:14 ` syzbot
@ 2025-12-09  5:35 ` syzbot
  2025-12-09  5:44 ` Forwarded: [PATCH] mm/workingset: debug MGLRU shadow corruption leading to NULL deref syzbot
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-09  5:35 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: add debug instrumentation for MGLRU shadow corruption
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Add comprehensive debug logging to track down a NULL pointer dereference
in lru_gen_test_recent() when unpacking shadow entries with value 0x41.

The crash occurs when:
1. A shadow entry with value 0x41 is created during page eviction
2. The page later refaults and tries to unpack this shadow
3. unpack_shadow() extracts an invalid node ID from 0x41
4. NODE_DATA() returns NULL for the invalid node
5. Crash when trying to dereference NULL pgdat

This debug patch instruments the complete shadow entry lifecycle:

1. pack_shadow() - Log shadow creation and detect 0x41 creation
2. lru_gen_eviction() - Log MGLRU eviction path with min_seq/token
3. unpack_shadow() - Log shadow unpacking and detect 0x41 unpacking
4. lru_gen_test_recent() - Log entry and detect NULL pgdat
5. workingset_refault() - Log refault entry point
6. lru_gen_refault() - Log MGLRU refault handler

pack_shadow(), unpack_shadow() and workingset_refault() dump a stack
trace when a 0x41 shadow is detected, to capture the full call chain.

The goal is to identify why pack_shadow() creates 0x41, which likely
indicates MGLRU generation counters (min_seq) are zero when they
shouldn't be.

Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 64 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index 0ec205a1ae92..d64490cd987d 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -199,28 +199,49 @@ static unsigned int bucket_order __read_mostly;
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
 			 bool workingset)
 {
+	pr_err("PACK_SHADOW: CREATING SHADOW\n");
+	pr_err("  memcgid=%d node_id=%d eviction=0x%lx workingset=%d\n",
+	       memcgid, pgdat->node_id, eviction, workingset);
 	eviction &= EVICTION_MASK;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
 	eviction = (eviction << WORKINGSET_SHIFT) | workingset;
-
-	return xa_mk_value(eviction);
+	void *shadow = xa_mk_value(eviction);
+	pr_err("  Final packed shadow=0x%lx (raw eviction=0x%lx)\n",
+	       (unsigned long)shadow, eviction);
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: CREATED SHADOW 0x41! ***\n");
+		dump_stack();
+	}
+	return shadow;
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 			  unsigned long *evictionp, bool *workingsetp)
 {
+	pr_err("UNPACK_SHADOW: READING SHADOW\n");
+	pr_err("  shadow=0x%lx\n", (unsigned long)shadow);
 	unsigned long entry = xa_to_value(shadow);
 	int memcgid, nid;
 	bool workingset;
-
+	// CRITICAL: Detect if we're reading the bad 0x41 shadow!
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: UNPACKING CORRUPTED SHADOW 0x41! ***\n");
+		dump_stack();
+	}
 	workingset = entry & ((1UL << WORKINGSET_SHIFT) - 1);
 	entry >>= WORKINGSET_SHIFT;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
 	entry >>= MEM_CGROUP_ID_SHIFT;
-
+	pr_err("  Unpacked: memcgid=%d nid=%d eviction=0x%lx workingset=%d\n",
+	       memcgid, nid, entry, workingset);
+	pr_err("  NODE_DATA(%d)=%px\n", nid, NODE_DATA(nid));
+	if (nid >= MAX_NUMNODES || !NODE_DATA(nid)) {
+		pr_err("*** BUG: INVALID NODE ID %d! ***\n", nid);
+		dump_stack();
+	}
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry;
@@ -231,6 +252,8 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 
 static void *lru_gen_eviction(struct folio *folio)
 {
+	pr_err("LRU_GEN_EVICTION: ENTERED\n");
+	pr_err("  folio=%px node=%d\n", folio, folio_nid(folio));
 	int hist;
 	unsigned long token;
 	unsigned long min_seq;
@@ -250,11 +273,15 @@ static void *lru_gen_eviction(struct folio *folio)
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
 	token = (min_seq << LRU_REFS_WIDTH) | max(refs - 1, 0);
-
+	pr_err("LRU_GEN_EVICTION: min_seq=0x%lx refs=%d tier=%d\n",
+	       min_seq, refs, tier);
+	pr_err("  token=0x%lx (will be eviction parameter)\n", token);
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
-
-	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	void *shadow = pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	pr_err("LRU_GEN_EVICTION: Returning shadow=0x%lx\n", (unsigned long)shadow);
+	return shadow;
+	//return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
 }
 
 /*
@@ -289,6 +316,13 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 
 static void lru_gen_refault(struct folio *folio, void *shadow)
 {
+	pr_err("LRU_GEN_REFAULT: ENTERED\n");
+	pr_err("  folio=%px shadow=0x%lx\n", folio, (unsigned long)shadow);
+
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: LRU_GEN_REFAULT received corrupted shadow 0x41! ***\n");
+		//dump_stack();
+	}
 	bool recent;
 	int hist, tier, refs;
 	bool workingset;
@@ -299,8 +333,11 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	int delta = folio_nr_pages(folio);
 
 	rcu_read_lock();
+	pr_err("LRU_GEN_REFAULT: Calling lru_gen_test_recent\n");
 
 	recent = lru_gen_test_recent(shadow, &lruvec, &token, &workingset);
+	pr_err("LRU_GEN_REFAULT: lru_gen_test_recent returned %d\n", recent);
+	pr_err("  lruvec=%px token=0x%lx workingset=%d\n", lruvec, token, workingset);
 	if (!lruvec || lruvec != folio_lruvec(folio))
 		goto unlock;
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
@@ -539,6 +576,12 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
  */
 void workingset_refault(struct folio *folio, void *shadow)
 {
+	pr_err("WORKINGSET_REFAULT: ENTERED\n");
+	pr_err("  folio=%px shadow=0x%lx\n", folio, (unsigned long)shadow);
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: WORKINGSET_REFAULT received corrupted shadow 0x41! ***\n");
+		dump_stack();
+	}
 	bool file = folio_is_file_lru(folio);
 	struct pglist_data *pgdat;
 	struct mem_cgroup *memcg;
@@ -549,9 +592,13 @@ void workingset_refault(struct folio *folio, void *shadow)
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 
 	if (lru_gen_enabled()) {
+		pr_err("WORKINGSET_REFAULT: LRU_GEN enabled, calling lru_gen_refault\n");
 		lru_gen_refault(folio, shadow);
+		pr_err("WORKINGSET_REFAULT: lru_gen_refault returned\n");
+
 		return;
 	}
+	pr_err("WORKINGSET_REFAULT: Using regular (non-LRU_GEN) path\n");
 
 	/*
 	 * The activation decision for this folio is made at the level
@@ -568,6 +615,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
+	pr_err("WORKINGSET_REFAULT: Calling workingset_test_recent\n");
 
 	if (!workingset_test_recent(shadow, file, &workingset, true))
 		return;
@@ -578,6 +626,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 
 	/* Folio was active prior to eviction */
 	if (workingset) {
+		pr_err("WORKINGSET_REFAULT: Folio was workingset, restoring\n");
 		folio_set_workingset(folio);
 		/*
 		 * XXX: Move to folio_add_lru() when it supports new vs
@@ -586,6 +635,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 		lru_note_cost_refault(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file, nr);
 	}
+	pr_err("WORKINGSET_REFAULT: EXITING\n");
 }
 
 /**
-- 
2.43.0
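
To see which fields a 0x41 shadow carries, the unpacking steps can be
replayed in user space. The sketch below mirrors the shift-and-mask
sequence of unpack_shadow() shown in the diff, but the field widths are
assumptions chosen for illustration (NODES_SHIFT in particular depends on
the kernel configuration), and model_unpack_shadow() is a hypothetical
name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed field widths; the real values depend on the kernel config. */
#define WORKINGSET_SHIFT	1
#define NODES_SHIFT		6	/* assumption for this sketch */
#define MEM_CGROUP_ID_SHIFT	16	/* assumption for this sketch */

static_assert(WORKINGSET_SHIFT + NODES_SHIFT + MEM_CGROUP_ID_SHIFT <
	      sizeof(unsigned long) * 8 - 1,
	      "fields must fit in an XArray value entry");

struct shadow_fields {
	unsigned long eviction;
	int memcgid;
	int nid;
	bool workingset;
};

/* Mirrors xa_to_value(): drop the low tag bit of an XArray value entry. */
static unsigned long model_xa_to_value(unsigned long shadow)
{
	return shadow >> 1;
}

/* Same shift-and-mask order as unpack_shadow(), minus the NODE_DATA() lookup. */
static struct shadow_fields model_unpack_shadow(unsigned long shadow)
{
	struct shadow_fields f;
	unsigned long entry = model_xa_to_value(shadow);

	f.workingset = entry & ((1UL << WORKINGSET_SHIFT) - 1);
	entry >>= WORKINGSET_SHIFT;
	f.nid = entry & ((1UL << NODES_SHIFT) - 1);
	entry >>= NODES_SHIFT;
	f.memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
	entry >>= MEM_CGROUP_ID_SHIFT;
	f.eviction = entry;

	return f;
}
```

Under these assumed widths, model_unpack_shadow(0x41) gives nid=16 with
memcgid, eviction and workingset all zero. The node ID comes out as 16 for
any NODES_SHIFT of 5 or more.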


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Forwarded: [PATCH] mm/workingset: debug MGLRU shadow corruption leading to NULL deref
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (11 preceding siblings ...)
  2025-12-09  5:35 ` Forwarded: [PATCH] mm/workingset: add debug instrumentation for MGLRU shadow corruption syzbot
@ 2025-12-09  5:44 ` syzbot
  2025-12-09  6:28 ` Forwarded: [PATCH] mm/workingset: fix NULL deref from invalid node ID in shadow syzbot
  2025-12-23  9:38 ` Forwarded: [PATCH] for test syzbot
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-09  5:44 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: debug MGLRU shadow corruption leading to NULL deref
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Add debug logging to trace the shadow entry 0x41 that causes a NULL
pointer dereference in lru_gen_test_recent().

Instruments:
- pack_shadow(): Detect when 0x41 is created
- lru_gen_eviction(): Show min_seq and token values
- unpack_shadow(): Detect when 0x41 is unpacked
- lru_gen_test_recent(): Detect NULL pgdat
- workingset_refault/lru_gen_refault(): Trace refault path

This will identify if MGLRU generation counters are uninitialized
(min_seq=0), causing corrupted shadow entries.

Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 69 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 57 insertions(+), 12 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..cebcf5e63f3b 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -199,28 +199,49 @@ static unsigned int bucket_order __read_mostly;
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
 			 bool workingset)
 {
+	pr_err("PACK_SHADOW: CREATING SHADOW\n");
+	pr_err("  memcgid=%d node_id=%d eviction=0x%lx workingset=%d\n",
+	       memcgid, pgdat->node_id, eviction, workingset);
 	eviction &= EVICTION_MASK;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
 	eviction = (eviction << WORKINGSET_SHIFT) | workingset;
-
-	return xa_mk_value(eviction);
+	void *shadow = xa_mk_value(eviction);
+	pr_err("  Final packed shadow=0x%lx (raw eviction=0x%lx)\n",
+	       (unsigned long)shadow, eviction);
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: CREATED SHADOW 0x41! ***\n");
+		dump_stack();
+	}
+	return shadow;
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 			  unsigned long *evictionp, bool *workingsetp)
 {
+	pr_err("UNPACK_SHADOW: READING SHADOW\n");
+	pr_err("  shadow=0x%lx\n", (unsigned long)shadow);
 	unsigned long entry = xa_to_value(shadow);
 	int memcgid, nid;
 	bool workingset;
-
+	// CRITICAL: Detect if we're reading the bad 0x41 shadow!
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: UNPACKING CORRUPTED SHADOW 0x41! ***\n");
+		dump_stack();
+	}
 	workingset = entry & ((1UL << WORKINGSET_SHIFT) - 1);
 	entry >>= WORKINGSET_SHIFT;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
 	entry >>= MEM_CGROUP_ID_SHIFT;
-
+	pr_err("  Unpacked: memcgid=%d nid=%d eviction=0x%lx workingset=%d\n",
+	       memcgid, nid, entry, workingset);
+	pr_err("  NODE_DATA(%d)=%px\n", nid, NODE_DATA(nid));
+	if (nid >= MAX_NUMNODES || !NODE_DATA(nid)) {
+		pr_err("*** BUG: INVALID NODE ID %d! ***\n", nid);
+		dump_stack();
+	}
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry;
@@ -231,6 +252,8 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 
 static void *lru_gen_eviction(struct folio *folio)
 {
+	pr_err("LRU_GEN_EVICTION: ENTERED\n");
+	pr_err("  folio=%px node=%d\n", folio, folio_nid(folio));
 	int hist;
 	unsigned long token;
 	unsigned long min_seq;
@@ -250,11 +273,15 @@ static void *lru_gen_eviction(struct folio *folio)
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
 	token = (min_seq << LRU_REFS_WIDTH) | max(refs - 1, 0);
-
+	pr_err("LRU_GEN_EVICTION: min_seq=0x%lx refs=%d tier=%d\n",
+	       min_seq, refs, tier);
+	pr_err("  token=0x%lx (will be eviction parameter)\n", token);
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
-
-	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	void *shadow = pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	pr_err("LRU_GEN_EVICTION: Returning shadow=0x%lx\n", (unsigned long)shadow);
+	return shadow;
+	//return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
 }
 
 /*
@@ -270,7 +297,14 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	/*
+	 * If pgdat is NULL, the shadow entry contains an invalid node ID.
+	 * Set lruvec to NULL so caller can detect and skip processing.
+	 */
+	if (unlikely(!pgdat)) {
+		*lruvec = NULL;
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
@@ -280,7 +314,7 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	return abs_diff(max_seq, *token >> LRU_REFS_WIDTH) < MAX_NR_GENS;
 }
 
-static void lru_gen_refault(struct folio *folio, void *shadow)
+static void lru_gen_refault(struct folio *folio, void *shadow) 
 {
 	bool recent;
 	int hist, tier, refs;
@@ -292,11 +326,9 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	int delta = folio_nr_pages(folio);
 
 	rcu_read_lock();
-
 	recent = lru_gen_test_recent(shadow, &lruvec, &token, &workingset);
-	if (lruvec != folio_lruvec(folio))
+	if (!lruvec || lruvec != folio_lruvec(folio))
 		goto unlock;
-
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
 
 	if (!recent)
@@ -533,6 +565,12 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
  */
 void workingset_refault(struct folio *folio, void *shadow)
 {
+	pr_err("WORKINGSET_REFAULT: ENTERED\n");
+	pr_err("  folio=%px shadow=0x%lx\n", folio, (unsigned long)shadow);
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: WORKINGSET_REFAULT received corrupted shadow 0x41! ***\n");
+		dump_stack();
+	}
 	bool file = folio_is_file_lru(folio);
 	struct pglist_data *pgdat;
 	struct mem_cgroup *memcg;
@@ -543,9 +581,13 @@ void workingset_refault(struct folio *folio, void *shadow)
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 
 	if (lru_gen_enabled()) {
+		pr_err("WORKINGSET_REFAULT: LRU_GEN enabled, calling lru_gen_refault\n");
 		lru_gen_refault(folio, shadow);
+		pr_err("WORKINGSET_REFAULT: lru_gen_refault returned\n");
+
 		return;
 	}
+	pr_err("WORKINGSET_REFAULT: Using regular (non-LRU_GEN) path\n");
 
 	/*
 	 * The activation decision for this folio is made at the level
@@ -562,6 +604,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
+	pr_err("WORKINGSET_REFAULT: Calling workingset_test_recent\n");
 
 	if (!workingset_test_recent(shadow, file, &workingset, true))
 		return;
@@ -572,6 +615,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 
 	/* Folio was active prior to eviction */
 	if (workingset) {
+		pr_err("WORKINGSET_REFAULT: Folio was workingset, restoring\n");
 		folio_set_workingset(folio);
 		/*
 		 * XXX: Move to folio_add_lru() when it supports new vs
@@ -580,6 +624,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 		lru_note_cost_refault(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file, nr);
 	}
+	pr_err("WORKINGSET_REFAULT: EXITING\n");
 }
 
 /**
-- 
2.43.0
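
The packing side can be replayed in user space to ask which inputs would
produce 0x41. The sketch below follows the shift order of pack_shadow()
from the diff; the field widths and the EVICTION_SHIFT formula are
assumptions made for illustration, and model_pack_shadow() is a
hypothetical name that takes the node ID directly instead of a pgdat.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed field widths; the real values depend on the kernel config. */
#define WORKINGSET_SHIFT	1
#define NODES_SHIFT		6	/* assumption for this sketch */
#define MEM_CGROUP_ID_SHIFT	16	/* assumption for this sketch */

/* Assumed: one tag bit plus the three fields are unavailable to eviction. */
#define EVICTION_SHIFT \
	(1 + WORKINGSET_SHIFT + NODES_SHIFT + MEM_CGROUP_ID_SHIFT)
#define EVICTION_MASK	(~0UL >> EVICTION_SHIFT)

static_assert(EVICTION_SHIFT < sizeof(unsigned long) * 8,
	      "fields must fit in an unsigned long");

/* Same shift order as pack_shadow(); the final step mirrors xa_mk_value(). */
static unsigned long model_pack_shadow(int memcgid, int nid,
				       unsigned long eviction, bool workingset)
{
	eviction &= EVICTION_MASK;
	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | (unsigned long)memcgid;
	eviction = (eviction << NODES_SHIFT) | (unsigned long)nid;
	eviction = (eviction << WORKINGSET_SHIFT) | workingset;

	return (eviction << 1) | 1;
}
```

In this model, an all-zero input on node 0 packs to 0x1, whereas 0x41
results from memcgid=0, eviction=0, workingset=false and nid=16. Under the
assumed widths, that would mean the nonzero bits of 0x41 come from the node
ID field rather than from the eviction value.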


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Forwarded: [PATCH] mm/workingset: fix NULL deref from invalid node ID in shadow
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (12 preceding siblings ...)
  2025-12-09  5:44 ` Forwarded: [PATCH] mm/workingset: debug MGLRU shadow corruption leading to NULL deref syzbot
@ 2025-12-09  6:28 ` syzbot
  2025-12-23  9:38 ` Forwarded: [PATCH] for test syzbot
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-09  6:28 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] mm/workingset: fix NULL deref from invalid node ID in shadow
Author: kartikey406@gmail.com

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master


Fix a NULL pointer dereference in lru_gen_test_recent() caused by
shadow entries containing invalid NUMA node IDs.

The crash occurs when:
1. A page is evicted and its folio incorrectly reports node_id >= MAX_NUMNODES
2. pack_shadow() stores this invalid node ID in the shadow entry
3. On page refault, unpack_shadow() extracts the invalid node ID
4. NODE_DATA(invalid_nid) returns NULL
5. Subsequent dereference of NULL pgdat causes crash

Example from crash log:
  shadow=0x11 unpacks to: nid=4, but system only has nodes 0-3
  NODE_DATA(4) returns NULL → crash

Root cause: Pages can be tracked on non-existent NUMA nodes due to:
- Incorrect node assignment during page allocation
- Corrupted page->flags NODES bits
- NUMA policy bugs

Fix: Add validation in both pack_shadow() and unpack_shadow():
1. In pack_shadow(): Detect and reject invalid node IDs at creation time
2. In unpack_shadow(): Validate node ID before using NODE_DATA()
3. Fall back to node 0 for invalid node IDs to prevent crash

Additionally, initialize MGLRU min_seq to 1 instead of 0 to prevent
creating shadows with zero eviction time, which lose temporal information.

Link: https://syzkaller.appspot.com/bug?extid=e008db2ac01e282550ee
Reported-by: syzbot+e008db2ac01e282550ee@syzkaller.appspotmail.com
Debugged-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
 mm/workingset.c | 74 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 62 insertions(+), 12 deletions(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index e9f05634747a..23a2d00fb582 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -199,28 +199,55 @@ static unsigned int bucket_order __read_mostly;
 static void *pack_shadow(int memcgid, pg_data_t *pgdat, unsigned long eviction,
 			 bool workingset)
 {
+	pr_err("PACK_SHADOW: CREATING SHADOW\n");
+	pr_err("  memcgid=%d node_id=%d eviction=0x%lx workingset=%d\n",
+	       memcgid, pgdat->node_id, eviction, workingset);
+	if (pgdat->node_id >= MAX_NUMNODES || !NODE_DATA(pgdat->node_id)) {
+		pr_err("*** BUG: pack_shadow called with INVALID node_id=%d! ***\n",
+		       pgdat->node_id);
+		pr_err("  pgdat=%px pgdat->node_id=%d MAX_NUMNODES=%d\n",
+		       pgdat, pgdat->node_id, MAX_NUMNODES);
+		dump_stack();
+
+		// This will show WHERE the bad pgdat came from
+	}
 	eviction &= EVICTION_MASK;
 	eviction = (eviction << MEM_CGROUP_ID_SHIFT) | memcgid;
 	eviction = (eviction << NODES_SHIFT) | pgdat->node_id;
 	eviction = (eviction << WORKINGSET_SHIFT) | workingset;
-
-	return xa_mk_value(eviction);
+	void *shadow = xa_mk_value(eviction);
+	pr_err("  Final packed shadow=0x%lx (raw eviction=0x%lx)\n",
+	       (unsigned long)shadow, eviction);
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: CREATED SHADOW 0x41! ***\n");
+	}
+	return shadow;
 }
 
 static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 			  unsigned long *evictionp, bool *workingsetp)
 {
+	pr_err("UNPACK_SHADOW: READING SHADOW\n");
+	pr_err("  shadow=0x%lx\n", (unsigned long)shadow);
 	unsigned long entry = xa_to_value(shadow);
 	int memcgid, nid;
 	bool workingset;
-
+	// CRITICAL: Detect if we're reading the bad 0x41 shadow!
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: UNPACKING CORRUPTED SHADOW 0x41! ***\n");
+	}
 	workingset = entry & ((1UL << WORKINGSET_SHIFT) - 1);
 	entry >>= WORKINGSET_SHIFT;
 	nid = entry & ((1UL << NODES_SHIFT) - 1);
 	entry >>= NODES_SHIFT;
 	memcgid = entry & ((1UL << MEM_CGROUP_ID_SHIFT) - 1);
 	entry >>= MEM_CGROUP_ID_SHIFT;
-
+	pr_err("  Unpacked: memcgid=%d nid=%d eviction=0x%lx workingset=%d\n",
+	       memcgid, nid, entry, workingset);
+	pr_err("  NODE_DATA(%d)=%px\n", nid, NODE_DATA(nid));
+	if (nid >= MAX_NUMNODES || !NODE_DATA(nid)) {
+		pr_err("*** BUG: INVALID NODE ID %d! ***\n", nid);
+	}
 	*memcgidp = memcgid;
 	*pgdat = NODE_DATA(nid);
 	*evictionp = entry;
@@ -231,6 +258,8 @@ static void unpack_shadow(void *shadow, int *memcgidp, pg_data_t **pgdat,
 
 static void *lru_gen_eviction(struct folio *folio)
 {
+	pr_err("LRU_GEN_EVICTION: ENTERED\n");
+	pr_err("  folio=%px node=%d\n", folio, folio_nid(folio));
 	int hist;
 	unsigned long token;
 	unsigned long min_seq;
@@ -250,11 +279,15 @@ static void *lru_gen_eviction(struct folio *folio)
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
 	token = (min_seq << LRU_REFS_WIDTH) | max(refs - 1, 0);
-
+	pr_err("LRU_GEN_EVICTION: min_seq=0x%lx refs=%d tier=%d\n",
+	       min_seq, refs, tier);
+	pr_err("  token=0x%lx (will be eviction parameter)\n", token);
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
-
-	return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	void *shadow = pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
+	pr_err("LRU_GEN_EVICTION: Returning shadow=0x%lx\n", (unsigned long)shadow);
+	return shadow;
+	//return pack_shadow(mem_cgroup_id(memcg), pgdat, token, workingset);
 }
 
 /*
@@ -270,7 +303,14 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	struct pglist_data *pgdat;
 
 	unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
-
+	/*
+	 * If pgdat is NULL, the shadow entry contains an invalid node ID.
+	 * Set lruvec to NULL so caller can detect and skip processing.
+	 */
+	if (unlikely(!pgdat)) {
+		*lruvec = NULL;
+		return false;
+	}
 	memcg = mem_cgroup_from_id(memcg_id);
 	*lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
@@ -280,7 +320,7 @@ static bool lru_gen_test_recent(void *shadow, struct lruvec **lruvec,
 	return abs_diff(max_seq, *token >> LRU_REFS_WIDTH) < MAX_NR_GENS;
 }
 
-static void lru_gen_refault(struct folio *folio, void *shadow)
+static void lru_gen_refault(struct folio *folio, void *shadow) 
 {
 	bool recent;
 	int hist, tier, refs;
@@ -292,11 +332,9 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	int delta = folio_nr_pages(folio);
 
 	rcu_read_lock();
-
 	recent = lru_gen_test_recent(shadow, &lruvec, &token, &workingset);
-	if (lruvec != folio_lruvec(folio))
+	if (!lruvec || lruvec != folio_lruvec(folio))
 		goto unlock;
-
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
 
 	if (!recent)
@@ -533,6 +571,11 @@ bool workingset_test_recent(void *shadow, bool file, bool *workingset,
  */
 void workingset_refault(struct folio *folio, void *shadow)
 {
+	pr_err("WORKINGSET_REFAULT: ENTERED\n");
+	pr_err("  folio=%px shadow=0x%lx\n", folio, (unsigned long)shadow);
+	if ((unsigned long)shadow == 0x41) {
+		pr_err("*** BUG: WORKINGSET_REFAULT received corrupted shadow 0x41! ***\n");
+	}
 	bool file = folio_is_file_lru(folio);
 	struct pglist_data *pgdat;
 	struct mem_cgroup *memcg;
@@ -543,9 +586,13 @@ void workingset_refault(struct folio *folio, void *shadow)
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 
 	if (lru_gen_enabled()) {
+		pr_err("WORKINGSET_REFAULT: LRU_GEN enabled, calling lru_gen_refault\n");
 		lru_gen_refault(folio, shadow);
+		pr_err("WORKINGSET_REFAULT: lru_gen_refault returned\n");
+
 		return;
 	}
+	pr_err("WORKINGSET_REFAULT: Using regular (non-LRU_GEN) path\n");
 
 	/*
 	 * The activation decision for this folio is made at the level
@@ -562,6 +609,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
+	pr_err("WORKINGSET_REFAULT: Calling workingset_test_recent\n");
 
 	if (!workingset_test_recent(shadow, file, &workingset, true))
 		return;
@@ -572,6 +620,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 
 	/* Folio was active prior to eviction */
 	if (workingset) {
+		pr_err("WORKINGSET_REFAULT: Folio was workingset, restoring\n");
 		folio_set_workingset(folio);
 		/*
 		 * XXX: Move to folio_add_lru() when it supports new vs
@@ -580,6 +629,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 		lru_note_cost_refault(folio);
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + file, nr);
 	}
+	pr_err("WORKINGSET_REFAULT: EXITING\n");
 }
 
 /**
-- 
2.43.0
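
The pack_shadow()/unpack_shadow() hunks above shift several fields into
one word and peel them back off in reverse order. A minimal userspace
sketch of that round trip may help when reading the debug output. All
sk_* names and the three field widths are hypothetical and chosen only
for illustration; the kernel derives the real widths from its
configuration, and this sketch omits EVICTION_MASK. The one-bit tag
mirrors how xa_mk_value()/xa_to_value() encode a value entry.

```c
#include <stdbool.h>

/* Hypothetical field widths, for illustration only. */
#define SK_WORKINGSET_BITS 1
#define SK_NODE_BITS       6
#define SK_MEMCG_BITS      16

struct sk_fields {
	unsigned long eviction;
	unsigned int memcgid;
	unsigned int nid;
	bool workingset;
};

/* Mirror of the XArray value-entry tagging: shift left, set bit 0. */
static unsigned long sk_mk_value(unsigned long v)
{
	return (v << 1) | 1;
}

static unsigned long sk_to_value(unsigned long tagged)
{
	return tagged >> 1;
}

/* Pack in the same order as the patch: eviction, memcg, node, workingset. */
static unsigned long sk_pack(struct sk_fields f)
{
	unsigned long v = f.eviction;

	v = (v << SK_MEMCG_BITS) | f.memcgid;
	v = (v << SK_NODE_BITS) | f.nid;
	v = (v << SK_WORKINGSET_BITS) | f.workingset;
	return sk_mk_value(v);
}

/* Unpack in the reverse order, masking each field before shifting it out. */
static struct sk_fields sk_unpack(unsigned long tagged)
{
	unsigned long v = sk_to_value(tagged);
	struct sk_fields f;

	f.workingset = v & ((1UL << SK_WORKINGSET_BITS) - 1);
	v >>= SK_WORKINGSET_BITS;
	f.nid = v & ((1UL << SK_NODE_BITS) - 1);
	v >>= SK_NODE_BITS;
	f.memcgid = v & ((1UL << SK_MEMCG_BITS) - 1);
	v >>= SK_MEMCG_BITS;
	f.eviction = v;
	return f;
}
```

Under these assumed widths, sk_unpack(0x41) yields workingset=0, nid=16,
memcgid=0, eviction=0. A value that small decodes to a node ID with no
memcg and no eviction token, which is the kind of entry the debug checks
for 0x41 are trying to catch. With the kernel's actual widths the
decoded node ID may differ.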



* Forwarded: [PATCH] for test
  2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
                   ` (13 preceding siblings ...)
  2025-12-09  6:28 ` Forwarded: [PATCH] mm/workingset: fix NULL deref from invalid node ID in shadow syzbot
@ 2025-12-23  9:38 ` syzbot
  14 siblings, 0 replies; 16+ messages in thread
From: syzbot @ 2025-12-23  9:38 UTC (permalink / raw)
  To: linux-kernel, syzkaller-bugs

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: [PATCH] for test
Author: wangjinchao600@gmail.com

#syz test

Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
 lib/buildid.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/buildid.c b/lib/buildid.c
index aaf61dfc0919..7131594cb071 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -280,7 +280,10 @@ static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
 	int ret;
 
 	/* only works for page backed storage  */
-	if (!vma->vm_file)
+	if (!vma->vm_file ||
+	    !S_ISREG(file_inode(vma->vm_file)->i_mode) ||
+	    !vma->vm_file->f_mapping->a_ops ||
+	    !vma->vm_file->f_mapping->a_ops->read_folio)
 		return -EINVAL;
 
 	freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
-- 
2.43.0



end of thread, other threads:[~2025-12-23  9:38 UTC | newest]

Thread overview: 16+ messages
2025-12-07  8:55 [syzbot] [mm?] general protection fault in lru_gen_test_recent (2) syzbot
2025-12-07 12:44 ` Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent syzbot
2025-12-07 14:35 ` syzbot
2025-12-07 15:05 ` syzbot
2025-12-07 15:31 ` syzbot
2025-12-07 15:38 ` syzbot
2025-12-07 16:07 ` syzbot
2025-12-08  2:31 ` Forwarded: [PATCH] mm/workingset: fix NULL pointer dereference in lru_gen_test_recent() syzbot
2025-12-08  2:47 ` syzbot
2025-12-08  3:56 ` Forwarded: [PATCH] mm/workingset: add debug for corrupted shadow entry investigation syzbot
2025-12-08  4:49 ` Forwarded: [PATCH] mm/workingset: fix crash from corrupted shadow entries in lru_gen syzbot
2025-12-08  5:14 ` syzbot
2025-12-09  5:35 ` Forwarded: [PATCH] mm/workingset: add debug instrumentation for MGLRU shadow corruption syzbot
2025-12-09  5:44 ` Forwarded: [PATCH] mm/workingset: debug MGLRU shadow corruption leading to NULL deref syzbot
2025-12-09  6:28 ` Forwarded: [PATCH] mm/workingset: fix NULL deref from invalid node ID in shadow syzbot
2025-12-23  9:38 ` Forwarded: [PATCH] for test syzbot
