* [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
@ 2024-07-03 7:09 Youling Tang
2024-07-03 15:02 ` Kent Overstreet
2024-07-12 0:03 ` Kent Overstreet
0 siblings, 2 replies; 5+ messages in thread
From: Youling Tang @ 2024-07-03 7:09 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcachefs, linux-kernel, youling.tang, Youling Tang
From: Youling Tang <tangyouling@kylinos.cn>
After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
account for certain kmem allocations to memcg")
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
---
fs/bcachefs/fs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index be7eda94b459..092f95426dce 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -2126,7 +2126,8 @@ int __init bch2_vfs_init(void)
 {
 	int ret = -ENOMEM;
 
-	bch2_inode_cache = KMEM_CACHE(bch_inode_info, SLAB_RECLAIM_ACCOUNT);
+	bch2_inode_cache = KMEM_CACHE(bch_inode_info, SLAB_RECLAIM_ACCOUNT |
+				      SLAB_ACCOUNT);
 	if (!bch2_inode_cache)
 		goto err;
 
--
2.34.1

* Re: [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
2024-07-03 7:09 [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT Youling Tang
@ 2024-07-03 15:02 ` Kent Overstreet
2024-07-12 0:03 ` Kent Overstreet
1 sibling, 0 replies; 5+ messages in thread
From: Kent Overstreet @ 2024-07-03 15:02 UTC (permalink / raw)
To: Youling Tang; +Cc: linux-bcachefs, linux-kernel, Youling Tang

On Wed, Jul 03, 2024 at 03:09:55PM GMT, Youling Tang wrote:
> From: Youling Tang <tangyouling@kylinos.cn>
>
> After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
> the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
> account for certain kmem allocations to memcg")
>
> Signed-off-by: Youling Tang <tangyouling@kylinos.cn>

Thanks, applied

* Re: [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
2024-07-03 7:09 [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT Youling Tang
2024-07-03 15:02 ` Kent Overstreet
@ 2024-07-12 0:03 ` Kent Overstreet
2024-07-12 1:39 ` Youling Tang
1 sibling, 1 reply; 5+ messages in thread
From: Kent Overstreet @ 2024-07-12 0:03 UTC (permalink / raw)
To: Youling Tang; +Cc: linux-bcachefs, linux-kernel, Youling Tang

On Wed, Jul 03, 2024 at 03:09:55PM GMT, Youling Tang wrote:
> From: Youling Tang <tangyouling@kylinos.cn>
>
> After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
> the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
> account for certain kmem allocations to memcg")
>
> Signed-off-by: Youling Tang <tangyouling@kylinos.cn>

Turns out this was never tested with memcg enabled (!).

I'm reverting it, please feel free to send me a fixed version.

* Re: [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
2024-07-12 0:03 ` Kent Overstreet
@ 2024-07-12 1:39 ` Youling Tang
2024-07-12 2:24 ` Youling Tang
0 siblings, 1 reply; 5+ messages in thread
From: Youling Tang @ 2024-07-12 1:39 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcachefs, linux-kernel, Youling Tang

On 12/07/2024 08:03, Kent Overstreet wrote:
> On Wed, Jul 03, 2024 at 03:09:55PM GMT, Youling Tang wrote:
>> From: Youling Tang <tangyouling@kylinos.cn>
>>
>> After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
>> the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
>> account for certain kmem allocations to memcg")
>>
>> Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
> Turns out this was never tested with memcg enabled (!).
>
> I'm reverting it, please feel free to send me a fixed version.

Sorry, my oversight.

The following null pointer dereference is triggered after MEMCG
configuration is enabled.

```
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 5 PID: 1702 Comm: umount Not tainted 6.10.0-rc7-ktest-00003-g557bd05b0d4c-dirty #12
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:list_lru_add+0x83/0x100
Code: 5f 5d c3 48 8b 45 d0 48 85 c0 74 13 41 80 7c 24 1c 00 48 63 b0 68 06 00 00 74 04 85 f6 79 5e 4d 03 2c 24 49 83 c5 08 4c 89 ea <49> 8b 45 08 49 89 5d 08 48 89 13 48 89 43 08 48 89 18 49 8b 45 10
RSP: 0018:ffff8881178efd10 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810ec140f0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000017 RDI: ffff8881178efcc8
RBP: ffff8881178efd48 R08: ffff8881009de780 R09: ffffffff822e0de0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888102075c80
R13: 0000000000000000 R14: ffff88810443e6c0 R15: 0000000000000000
FS: 00007f9ed1840800(0000) GS:ffff888179940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000001062b9005 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? show_regs+0x69/0x70
? __die+0x29/0x70
? page_fault_oops+0x14f/0x3c0
? do_user_addr_fault+0x2d0/0x5b0
? default_wake_function+0x1e/0x30
? exc_page_fault+0x6d/0x130
? asm_exc_page_fault+0x2b/0x30
? list_lru_add+0x83/0x100
list_lru_add_obj+0x4b/0x60
iput+0x1fe/0x220
dentry_unlink_inode+0xbd/0x120
__dentry_kill+0x78/0x180
dput+0xc7/0x170
shrink_dcache_for_umount+0xe8/0x120
generic_shutdown_super+0x23/0x150
bch2_kill_sb+0x1b/0x30
deactivate_locked_super+0x34/0xb0
deactivate_super+0x44/0x50
cleanup_mnt+0x105/0x160
__cleanup_mnt+0x16/0x20
task_work_run+0x63/0x90
syscall_exit_to_user_mode+0x10d/0x110
do_syscall_64+0x57/0x100
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f9ed1a7a6e7
Code: 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 09 97 0c 00 f7 d8 64 89 02 b8
RSP: 002b:00007ffef8a29128 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055f4671acad8 RCX: 00007f9ed1a7a6e7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055f4671b1240
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ed1bc6244
R13: 000055f4671b1240 R14: 000055f4671acde0 R15: 000055f4671ac9d0
</TASK>
```

I'm going to analyze it.

Thanks,
Youling.

* Re: [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT
2024-07-12 1:39 ` Youling Tang
@ 2024-07-12 2:24 ` Youling Tang
0 siblings, 0 replies; 5+ messages in thread
From: Youling Tang @ 2024-07-12 2:24 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcachefs, linux-kernel, Youling Tang

Hi, Kent

On 12/07/2024 09:39, Youling Tang wrote:
> On 12/07/2024 08:03, Kent Overstreet wrote:
>> On Wed, Jul 03, 2024 at 03:09:55PM GMT, Youling Tang wrote:
>>> From: Youling Tang <tangyouling@kylinos.cn>
>>>
>>> After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
>>> the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
>>> account for certain kmem allocations to memcg")
>>>
>>> Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
>> Turns out this was never tested with memcg enabled (!).
>>
>> I'm reverting it, please feel free to send me a fixed version.
> Sorry, my oversight.
>
> The following null pointer dereference is triggered after MEMCG
> configuration is enabled.
>
> ```
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP
> CPU: 5 PID: 1702 Comm: umount Not tainted 6.10.0-rc7-ktest-00003-g557bd05b0d4c-dirty #12
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:list_lru_add+0x83/0x100
> Code: 5f 5d c3 48 8b 45 d0 48 85 c0 74 13 41 80 7c 24 1c 00 48 63 b0 68 06 00 00 74 04 85 f6 79 5e 4d 03 2c 24 49 83 c5 08 4c 89 ea <49> 8b 45 08 49 89 5d 08 48 89 13 48 89 43 08 48 89 18 49 8b 45 10
> RSP: 0018:ffff8881178efd10 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88810ec140f0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000017 RDI: ffff8881178efcc8
> RBP: ffff8881178efd48 R08: ffff8881009de780 R09: ffffffff822e0de0
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff888102075c80
> R13: 0000000000000000 R14: ffff88810443e6c0 R15: 0000000000000000
> FS: 00007f9ed1840800(0000) GS:ffff888179940000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000001062b9005 CR4: 0000000000370eb0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>
> Call Trace:
> <TASK>
> ? show_regs+0x69/0x70
> ? __die+0x29/0x70
> ? page_fault_oops+0x14f/0x3c0
> ? do_user_addr_fault+0x2d0/0x5b0
> ? default_wake_function+0x1e/0x30
> ? exc_page_fault+0x6d/0x130
> ? asm_exc_page_fault+0x2b/0x30
> ? list_lru_add+0x83/0x100
> list_lru_add_obj+0x4b/0x60
> iput+0x1fe/0x220
> dentry_unlink_inode+0xbd/0x120
> __dentry_kill+0x78/0x180
> dput+0xc7/0x170
> shrink_dcache_for_umount+0xe8/0x120
> generic_shutdown_super+0x23/0x150
> bch2_kill_sb+0x1b/0x30
> deactivate_locked_super+0x34/0xb0
> deactivate_super+0x44/0x50
> cleanup_mnt+0x105/0x160
> __cleanup_mnt+0x16/0x20
> task_work_run+0x63/0x90
> syscall_exit_to_user_mode+0x10d/0x110
> do_syscall_64+0x57/0x100
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f9ed1a7a6e7
> Code: 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 09 97 0c 00 f7 d8 64 89 02 b8
> RSP: 002b:00007ffef8a29128 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 000055f4671acad8 RCX: 00007f9ed1a7a6e7
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055f4671b1240
> RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ed1bc6244
> R13: 000055f4671b1240 R14: 000055f4671acde0 R15: 000055f4671ac9d0
> </TASK>
> ```

The direct cause of the BUG is that list_lru_from_memcg_idx() returns
NULL, so accessing l->list dereferences a NULL pointer.

The return value of list_lru_from_memcg_idx() needs to be checked,
similar to commit 5abc1e37afa0 ("mm: list_lru: allocate list_lru_one
only when needed").

Modified as follows:

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 3fd64736bc45..ee7424c3879d 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -94,6 +94,9 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 	spin_lock(&nlru->lock);
 	if (list_empty(item)) {
 		l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
+		if (!l)
+			goto out;
+
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
 		if (!l->nr_items++)
@@ -102,6 +105,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 		spin_unlock(&nlru->lock);
 		return true;
 	}
+out:
 	spin_unlock(&nlru->lock);
 	return false;
 }

With this change, the ktest run of tests/bcachefs/xfstests.ktest can
continue (with MEMCG and MEMCG_KMEM enabled).

Thanks,
Youling.

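The NULL check proposed in the diff above can be illustrated outside the
kernel. What follows is a minimal userspace sketch under stated
assumptions, not the kernel implementation: struct lru_one, slots[],
lru_from_idx() and lru_add() are hypothetical stand-ins for
struct list_lru_one, the per-memcg list table, list_lru_from_memcg_idx()
and list_lru_add(), and locking and shrinker handling are left out. It
also assumes the list head is the first member of the per-memcg
structure; in that case a NULL lookup result would fault when the prev
pointer is read at offset 8 on a 64-bit build, which appears consistent
with the fault address reported in the oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; simplified, not the real definitions. */
struct list_head {
	struct list_head *next, *prev;
};

struct lru_one {
	struct list_head list;	/* assumed to be the first member */
	long nr_items;
};

#define NR_SLOTS 4

/* Hypothetical per-memcg table: a slot stays NULL until it is allocated. */
static struct lru_one *slots[NR_SLOTS];

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_model(struct list_head *item, struct list_head *head)
{
	struct list_head *prev;

	assert(item && head);
	prev = head->prev;	/* this read is what faults when head is NULL */
	item->next = head;
	item->prev = prev;
	prev->next = item;
	head->prev = item;
}

/* Models a lookup that may return NULL for an unallocated slot. */
static struct lru_one *lru_from_idx(int idx)
{
	if (idx < 0 || idx >= NR_SLOTS)
		return NULL;
	return slots[idx];
}

/* Models the patched add path: bail out when no per-memcg list exists. */
static bool lru_add(struct list_head *item, int idx)
{
	struct lru_one *l;

	if (!list_is_empty(item))
		return false;	/* item is already on some list */

	l = lru_from_idx(idx);
	if (!l)
		return false;	/* corresponds to the "goto out" in the diff */

	list_add_tail_model(item, &l->list);
	l->nr_items++;
	return true;
}
```

In this model, lru_add() links the item and returns true when the slot
has been allocated, and returns false without touching the item when the
slot is still NULL, mirroring the "goto out" path. Whether skipping the
add is the right behaviour for the real list_lru is a separate question
that the sketch does not settle.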