Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock

From: Vlastimil Babka <vbabka@suse.cz>
To: Alan Huang <mmpgouride@gmail.com>,
	Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org,
	syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com,
	linux-mm@kvack.org, Tejun Heo <tj@kernel.org>,
	Dennis Zhou <dennis@kernel.org>, Christoph Lameter <cl@linux.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock
Date: Thu, 20 Feb 2025 18:16:43 +0100	[thread overview]
Message-ID: <78d954b5-e33f-4bbc-855b-e91e96278bef@suse.cz> (raw)
In-Reply-To: <25FBAAE5-8BC6-41F3-9A6D-65911BA5A5D7@gmail.com>

On 2/20/25 11:57, Alan Huang wrote:
> Ping
> 
>> On Feb 12, 2025, at 22:27, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>> 
>> Adding pcpu people to the CC
>> 
>> On Wed, Feb 12, 2025 at 06:06:25PM +0800, Alan Huang wrote:
>>> The cycle:
>>> 
>>> CPU0: CPU1:
>>> bc->lock pcpu_alloc_mutex
>>> pcpu_alloc_mutex bc->lock
>>> 
>>> Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>> Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
>>> Signed-off-by: Alan Huang <mmpgouride@gmail.com>
>> 
>> So pcpu_alloc_mutex -> fs_reclaim?
>> 
>> That's really awkward; seems like something that might invite more
>> issues. We can apply your fix if we need to, but I want to hear what the
>> percpu people have to say first.
>> 
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0 Not tainted
>> ------------------------------------------------------
>> syz.0.21/5625 is trying to acquire lock:
>> ffffffff8ea19608 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>> 
>> but task is already holding lock:
>> ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>> 
>> which lock already depends on the new lock.
>> 
>> 
>> the existing dependency chain (in reverse order) is:
>> 
>> -> #2 (&bc->lock){+.+.}-{4:4}:
>>       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>       __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>>       __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>>       bch2_btree_cache_scan+0x184/0xec0 fs/bcachefs/btree_cache.c:482
>>       do_shrink_slab+0x72d/0x1160 mm/shrinker.c:437
>>       shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
>>       shrink_one+0x43b/0x850 mm/vmscan.c:4868
>>       shrink_many mm/vmscan.c:4929 [inline]
>>       lru_gen_shrink_node mm/vmscan.c:5007 [inline]
>>       shrink_node+0x37c5/0x3e50 mm/vmscan.c:5978
>>       kswapd_shrink_node mm/vmscan.c:6807 [inline]
>>       balance_pgdat mm/vmscan.c:6999 [inline]
>>       kswapd+0x20f3/0x3b10 mm/vmscan.c:7264
>>       kthread+0x7a9/0x920 kernel/kthread.c:464
>>       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>> 
>> -> #1 (fs_reclaim){+.+.}-{0:0}:
>>       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>       __fs_reclaim_acquire mm/page_alloc.c:3853 [inline]
>>       fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3867
>>       might_alloc include/linux/sched/mm.h:318 [inline]
>>       slab_pre_alloc_hook mm/slub.c:4066 [inline]
>>       slab_alloc_node mm/slub.c:4144 [inline]
>>       __do_kmalloc_node mm/slub.c:4293 [inline]
>>       __kmalloc_noprof+0xae/0x4c0 mm/slub.c:4306
>>       kmalloc_noprof include/linux/slab.h:905 [inline]
>>       kzalloc_noprof include/linux/slab.h:1037 [inline]
>>       pcpu_mem_zalloc mm/percpu.c:510 [inline]
>>       pcpu_alloc_chunk mm/percpu.c:1430 [inline]
>>       pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
>>       pcpu_balance_populated mm/percpu.c:2063 [inline]
>>       pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2200
>>       process_one_work kernel/workqueue.c:3236 [inline]
>>       process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
>>       worker_thread+0x870/0xd30 kernel/workqueue.c:3398
>>       kthread+0x7a9/0x920 kernel/kthread.c:464
>>       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
>>       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Seeing this as part of the chain (fs reclaim from a worker doing
pcpu_balance_workfn) makes me think Michal's patch could be a fix to this:

https://lore.kernel.org/all/20250206122633.167896-1-mhocko@kernel.org/

>> -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
>>       check_prev_add kernel/locking/lockdep.c:3163 [inline]
>>       check_prevs_add kernel/locking/lockdep.c:3282 [inline]
>>       validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
>>       __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
>>       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>>       __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>>       __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>>       pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>>       __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
>>       bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
>>       bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
>>       __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
>>       bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
>>       bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
>>       bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
>>       bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
>>       __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
>>       bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
>>       bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
>>       bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
>>       __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
>>       bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
>>       bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
>>       bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
>>       bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
>>       bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
>>       vfs_get_tree+0x90/0x2b0 fs/super.c:1814
>>       do_new_mount+0x2be/0xb40 fs/namespace.c:3560
>>       do_mount fs/namespace.c:3900 [inline]
>>       __do_sys_mount fs/namespace.c:4111 [inline]
>>       __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
>>       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>>       entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> 
>> other info that might help us debug this:
>> 
>> Chain exists of:
>>  pcpu_alloc_mutex --> fs_reclaim --> &bc->lock
>> 
>> Possible unsafe locking scenario:
>> 
>>       CPU0                    CPU1
>>       ----                    ----
>>  lock(&bc->lock);
>>                               lock(fs_reclaim);
>>                               lock(&bc->lock);
>>  lock(pcpu_alloc_mutex);
>> 
>> *** DEADLOCK ***
>> 
>> 4 locks held by syz.0.21/5625:
>> #0: ffff888051400278 (&c->state_lock){+.+.}-{4:4}, at: bch2_fs_start+0x45/0x610 fs/bcachefs/super.c:1010
>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:164 [inline]
>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_read_lock include/linux/srcu.h:256 [inline]
>> #1: ffff888051404378 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x7e4/0xd30 fs/bcachefs/btree_iter.c:3377
>> #2: ffff8880514266d0 (&c->gc_lock){.+.+}-{4:4}, at: bch2_btree_update_start+0x682/0x14e0 fs/bcachefs/btree_update_interior.c:1180
>> #3: ffff888051401c68 (&bc->lock){+.+.}-{4:4}, at: bch2_btree_node_mem_alloc+0x559/0x16f0 fs/bcachefs/btree_cache.c:804
>> 
>> stack backtrace:
>> CPU: 0 UID: 0 PID: 5625 Comm: syz.0.21 Not tainted 6.14.0-rc2-syzkaller-00039-g09fbf3d50205 #0
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>> Call Trace:
>> <TASK>
>> __dump_stack lib/dump_stack.c:94 [inline]
>> dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>> print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2076
>> check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2208
>> check_prev_add kernel/locking/lockdep.c:3163 [inline]
>> check_prevs_add kernel/locking/lockdep.c:3282 [inline]
>> validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3906
>> __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5228
>> lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5851
>> __mutex_lock_common kernel/locking/mutex.c:585 [inline]
>> __mutex_lock+0x19c/0x1010 kernel/locking/mutex.c:730
>> pcpu_alloc_noprof+0x293/0x1760 mm/percpu.c:1782
>> __six_lock_init+0x104/0x150 fs/bcachefs/six.c:876
>> bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
>> bch2_btree_node_mem_alloc+0x565/0x16f0 fs/bcachefs/btree_cache.c:807
>> __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:304 [inline]
>> bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:532
>> bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1230
>> bch2_btree_split_leaf+0x121/0x880 fs/bcachefs/btree_update_interior.c:1851
>> bch2_trans_commit_error+0x212/0x1380 fs/bcachefs/btree_trans_commit.c:908
>> __bch2_trans_commit+0x812b/0x97a0 fs/bcachefs/btree_trans_commit.c:1085
>> bch2_trans_commit fs/bcachefs/btree_update.h:183 [inline]
>> bch2_trans_mark_metadata_bucket+0x47a/0x17b0 fs/bcachefs/buckets.c:1043
>> bch2_trans_mark_metadata_sectors fs/bcachefs/buckets.c:1060 [inline]
>> __bch2_trans_mark_dev_sb fs/bcachefs/buckets.c:1100 [inline]
>> bch2_trans_mark_dev_sb+0x3f6/0x820 fs/bcachefs/buckets.c:1128
>> bch2_trans_mark_dev_sbs_flags+0x6be/0x720 fs/bcachefs/buckets.c:1138
>> bch2_fs_initialize+0xba0/0x1610 fs/bcachefs/recovery.c:1149
>> bch2_fs_start+0x36d/0x610 fs/bcachefs/super.c:1042
>> bch2_fs_get_tree+0xd8d/0x1740 fs/bcachefs/fs.c:2203
>> vfs_get_tree+0x90/0x2b0 fs/super.c:1814
>> do_new_mount+0x2be/0xb40 fs/namespace.c:3560
>> do_mount fs/namespace.c:3900 [inline]
>> __do_sys_mount fs/namespace.c:4111 [inline]
>> __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:4088
>> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7fcaed38e58a
>> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007fcaec5fde68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
>> RAX: ffffffffffffffda RBX: 00007fcaec5fdef0 RCX: 00007fcaed38e58a
>> RDX: 00004000000000c0 RSI: 0000400000000180 RDI: 00007fcaec5fdeb0
>> RBP: 00004000000000c0 R08: 00007fcaec5fdef0 R09: 0000000000000000
>> 
>>> ---
>>> fs/bcachefs/six.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
>>> index 7e7c66a1e1a6..ccdc6d496910 100644
>>> --- a/fs/bcachefs/six.c
>>> +++ b/fs/bcachefs/six.c
>>> @@ -873,7 +873,7 @@ void __six_lock_init(struct six_lock *lock, const char *name,
>>> * failure if they wish by checking lock->readers, but generally
>>> * will not want to treat it as an error.
>>> */
>>> - lock->readers = alloc_percpu(unsigned);
>>> + lock->readers = alloc_percpu_gfp(unsigned, GFP_NOWAIT|__GFP_NOWARN);
>>> }
>>> #endif
>>> }
>>> -- 
>>> 2.47.0
>>> 
> 
>
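
The cycle in the report above can be modelled in a few lines of userspace C. This is only a sketch, not kernel code and not how lockdep is implemented: the enum, struct, and function names are hypothetical, and the model assumes (as the patch does) that a non-blocking per-CPU allocation does not take pcpu_alloc_mutex. It records the two existing dependencies (#1 and #2), then checks whether adding the mount-time acquisition (#0) would close a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the ones in the report. */
enum lock_class { PCPU_ALLOC_MUTEX, FS_RECLAIM, BC_LOCK, NR_CLASSES };

/* dep[a][b] is true when b has been acquired while a was held. */
struct lock_graph {
	bool dep[NR_CLASSES][NR_CLASSES];
};

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/* Record "held -> wanted"; return false if the edge would close a cycle. */
static bool add_dependency(struct lock_graph *g, int held, int wanted)
{
	bool seen[NR_CLASSES] = { false };

	assert(held >= 0 && held < NR_CLASSES);
	assert(wanted >= 0 && wanted < NR_CLASSES);

	if (reachable(g, wanted, held, seen))
		return false;
	g->dep[held][wanted] = true;
	return true;
}

/* Build the existing chain, then try the acquisition made at mount time. */
static bool mount_path_is_safe(bool percpu_alloc_may_block)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	/* #1: the balance worker allocates while holding pcpu_alloc_mutex. */
	add_dependency(&g, PCPU_ALLOC_MUTEX, FS_RECLAIM);
	/* #2: the shrinker takes bc->lock from reclaim context. */
	add_dependency(&g, FS_RECLAIM, BC_LOCK);

	/* Assumption: a non-blocking per-CPU allocation skips the mutex. */
	if (!percpu_alloc_may_block)
		return true;

	/* #0: __six_lock_init() allocates per-CPU memory under bc->lock. */
	return add_dependency(&g, BC_LOCK, PCPU_ALLOC_MUTEX);
}
```

In this model, mount_path_is_safe(true) returns false because the third edge closes the loop pcpu_alloc_mutex --> fs_reclaim --> &bc->lock --> pcpu_alloc_mutex, while mount_path_is_safe(false) returns true because that edge is never recorded. Removing the #1 edge instead, which is the direction suggested in the reply above, would break the same loop from the other side.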

Thread overview: 10+ messages
2025-02-12 10:06 [PATCH] bcachefs: Use alloc_percpu_gfp to avoid deadlock Alan Huang
2025-02-12 14:27 ` Kent Overstreet
2025-02-20 10:57   ` Alan Huang
2025-02-20 12:40     ` Kent Overstreet
2025-02-20 12:44       ` Alan Huang
2025-02-20 17:16     ` Vlastimil Babka [this message]
2025-02-20 20:37       ` Kent Overstreet
2025-02-21  2:46         ` Dennis Zhou
2025-02-21  7:21           ` Vlastimil Babka
2025-02-21 19:44           ` Alan Huang