* [dm-devel] Sleeping function called from invalid context
@ 2023-09-26 22:08 Matthew Wilcox
2023-09-26 22:53 ` Matthew Wilcox
0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2023-09-26 22:08 UTC (permalink / raw)
To: Alasdair Kergon, Mike Snitzer, dm-devel
Got this in linux-next 20230926, and I don't think it's due to my
patches on top (it may be, will verify):
04178 generic/347 run fstests generic/347 at 2023-09-26 17:24:55
04178 XFS (sdb): Mounting V5 Filesystem c0c11e6a-170c-48e4-84c5-42b46d6d5197
04178 XFS (sdb): Ending clean mount
04179 BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
04179 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1096408, name: kworker/u16:32
04179 preempt_count: 1, expected: 0
04179 RCU nest depth: 0, expected: 0
04179 4 locks held by kworker/u16:32/1096408:
04179 #0: ffff88800ac6ff38 ((wq_completion)dm-thin){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
04179 #1: ffff88800800be48 ((work_completion)(&pool->worker)){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
04179 #2: ffff88804d8715d0 (&pmd->root_lock){++++}-{3:3}, at: dm_thin_alloc_data_block+0x26/0xb0
04179 #3: ffff88800db2fe98 (&ea->lock){+.+.}-{2:2}, at: dm_ea_context_alloc+0x2a/0x380
04179 Preemption disabled at:
04179 [<ffffffff81878cba>] dm_ea_context_alloc+0x2a/0x380
04179 CPU: 6 PID: 1096408 Comm: kworker/u16:32 Kdump: loaded Not tainted 6.6.0-rc3-next-20230926-00005-geb8b46fa6084-dirty #289
04179 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
04179 Workqueue: dm-thin do_worker
04179 Call Trace:
04179 <TASK>
04179 dump_stack_lvl+0x69/0x80
04179 dump_stack+0x10/0x20
04179 __might_resched+0x1a8/0x2c0
04179 __might_sleep+0x3d/0x70
04179 down_read+0x19/0x180
04179 cache_get+0x46/0xc0
04179 new_read+0x44/0x180
04179 dm_bufio_read+0x24/0x30
04179 dm_bm_read_lock+0x21/0x70
04179 dm_tm_read_lock+0x24/0xa0
04179 sm_ll_find_free_block+0xa1/0x190
04179 sm_ll_find_common_free_block+0x2e/0x80
04179 sm_disk_new_block_in_range+0x2f/0x70
04179 sm_alloc_extent+0xe/0x10
04179 dm_ea_context_alloc+0x4e/0x380
04179 dm_thin_alloc_data_block+0x4f/0xb0
04179 alloc_data_block+0x51/0x190
04179 process_cell+0x34b/0x620
04179 ? _raw_spin_unlock_irq+0x22/0x50
04179 do_worker+0x275/0xa00
04179 process_one_work+0x1d8/0x4a0
04179 ? process_one_work+0x17b/0x4a0
04179 worker_thread+0x1d9/0x3d0
04179 ? rescuer_thread+0x370/0x370
04179 kthread+0x109/0x140
04179 ? kthread_complete_and_exit+0x20/0x20
04179 ret_from_fork+0x31/0x50
04179 ? kthread_complete_and_exit+0x20/0x20
04179 ret_from_fork_asm+0x11/0x20
04179 </TASK>
04179
04179 =============================
04179 [ BUG: Invalid wait context ]
04179 6.6.0-rc3-next-20230926-00005-geb8b46fa6084-dirty #289 Tainted: G W
04179 -----------------------------
04179 kworker/u16:32/1096408 is trying to lock:
04179 ffff88803d8c02f0 (&bc->trees[i].lock){++++}-{3:3}, at: cache_get+0x46/0xc0
04179 other info that might help us debug this:
04179 context-{4:4}
04179 4 locks held by kworker/u16:32/1096408:
04179 #0: ffff88800ac6ff38 ((wq_completion)dm-thin){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
04179 #1: ffff88800800be48 ((work_completion)(&pool->worker)){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
04179 #2: ffff88804d8715d0 (&pmd->root_lock){++++}-{3:3}, at: dm_thin_alloc_data_block+0x26/0xb0
04179 #3: ffff88800db2fe98 (&ea->lock){+.+.}-{2:2}, at: dm_ea_context_alloc+0x2a/0x380
04179 stack backtrace:
04179 CPU: 6 PID: 1096408 Comm: kworker/u16:32 Kdump: loaded Tainted: G W 6.6.0-rc3-next-20230926-00005-geb8b46fa6084-dirty #289
04179 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
04179 Workqueue: dm-thin do_worker
04179 Call Trace:
04179 <TASK>
04179 dump_stack_lvl+0x4a/0x80
04179 dump_stack+0x10/0x20
04179 __lock_acquire+0x9c9/0x26d0
04179 ? dump_stack_lvl+0x73/0x80
04179 lock_acquire+0xb8/0x280
04179 ? cache_get+0x46/0xc0
04179 ? __might_resched+0x1b4/0x2c0
04179 down_read+0x3c/0x180
04179 ? cache_get+0x46/0xc0
04179 cache_get+0x46/0xc0
04179 new_read+0x44/0x180
04179 dm_bufio_read+0x24/0x30
04179 dm_bm_read_lock+0x21/0x70
04179 dm_tm_read_lock+0x24/0xa0
04179 sm_ll_find_free_block+0xa1/0x190
04179 sm_ll_find_common_free_block+0x2e/0x80
04179 sm_disk_new_block_in_range+0x2f/0x70
04179 sm_alloc_extent+0xe/0x10
04179 dm_ea_context_alloc+0x4e/0x380
04179 ? save_sm_roots+0x70/0x70
04179 dm_thin_alloc_data_block+0x4f/0xb0
04179 alloc_data_block+0x51/0x190
04179 process_cell+0x34b/0x620
04179 ? _raw_spin_unlock_irq+0x22/0x50
04179 do_worker+0x275/0xa00
04179 process_one_work+0x1d8/0x4a0
04179 ? process_one_work+0x17b/0x4a0
04179 worker_thread+0x1d9/0x3d0
04179 ? rescuer_thread+0x370/0x370
04179 kthread+0x109/0x140
04179 ? kthread_complete_and_exit+0x20/0x20
04179 ret_from_fork+0x31/0x50
04179 ? kthread_complete_and_exit+0x20/0x20
04179 ret_from_fork_asm+0x11/0x20
04179 </TASK>
I have a few more of these, but I don't see any new information in them.
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [dm-devel] Sleeping function called from invalid context
2023-09-26 22:08 [dm-devel] Sleeping function called from invalid context Matthew Wilcox
@ 2023-09-26 22:53 ` Matthew Wilcox
2023-09-27 4:42 ` Matthew Wilcox
0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2023-09-26 22:53 UTC (permalink / raw)
To: Alasdair Kergon, Mike Snitzer, dm-devel
On Tue, Sep 26, 2023 at 11:08:07PM +0100, Matthew Wilcox wrote:
> Got this in linux-next 20230926, and I don't think it's due to my
> patches on top (it may be, will verify):
Confirmed not my patches;
git bisect start
# status: waiting for both good and bad commits
# good: [6465e260f48790807eef06b583b38ca9789b6072] Linux 6.6-rc3
git bisect good 6465e260f48790807eef06b583b38ca9789b6072
# status: waiting for bad commit, 1 good commit known
# bad: [4ae73bba62a367f2314f6ce69e3085a941983d8b] Add linux-next specific files for 20230926
git bisect bad 4ae73bba62a367f2314f6ce69e3085a941983d8b
# good: [cc2f2a99c4be7c44a3441dd8f4b9e475d30800c3] Merge branch 'for-next' of https://evilpiepirate.org/git/bcachefs.git
git bisect good cc2f2a99c4be7c44a3441dd8f4b9e475d30800c3
# good: [5a3bc5b9e4e9744f372692a88fd70750ee4851b3] Merge branch 'for-linux-next' of git://anongit.freedesktop.org/drm/drm-misc
git bisect good 5a3bc5b9e4e9744f372692a88fd70750ee4851b3
I'm going to sleep now instead of running the last 10 steps of the
bisect. If nobody's found this when I wake up, I'll finish it then.
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [dm-devel] Sleeping function called from invalid context
2023-09-26 22:53 ` Matthew Wilcox
@ 2023-09-27 4:42 ` Matthew Wilcox
2023-09-27 15:57 ` Mike Snitzer
0 siblings, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2023-09-27 4:42 UTC (permalink / raw)
To: Joe Thornber, dm-devel; +Cc: Mike Snitzer, Alasdair Kergon
On Tue, Sep 26, 2023 at 11:53:33PM +0100, Matthew Wilcox wrote:
> I'm going to sleep now instead of running the last 10 steps of the
> bisect. If nobody's found this when I wake up, I'll finish it then.
Bisection found it. I confirmed by hand; checking out this commit
yields a failed test, and then reverting it leads to a success.
commit 026e4728c276cdf3ec618a71a38181864596027b
Author: Joe Thornber <ejt@redhat.com>
Date: Wed Sep 13 10:39:09 2023 +0100
dm thin: Use the extent allocator for data blocks
The thin_pool object now contains an extent-allocator, and each thin
device contains an allocation-context from this. The allocation
context is used to guide data block allocations. The actual
allocation book-keeping is still done by the space-map.
2 new specific unit tests were added to dm-unit:
/thinp/fragmentation/thins
/thinp/fragmentation/snapshots
https://github.com/jthornber/dm-unit/blob/main/src/tests/thinp.rs
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Joe, in case you missed the earlier splat:
00009 generic/347 run fstests generic/347 at 2023-09-27 04:07:02
00010 BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1519
00010 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 79, name: kworker/u16:3
00010 preempt_count: 1, expected: 0
00010 RCU nest depth: 0, expected: 0
00010 4 locks held by kworker/u16:3/79:
00010 #0: ffff888008a22138 ((wq_completion)dm-thin){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
00010 #1: ffff888004fb7e48 ((work_completion)(&pool->worker)){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
00010 #2: ffff88800cb1c1d0 (&pmd->root_lock){++++}-{3:3}, at: dm_thin_alloc_data_block+0x26/0xb0
00010 #3: ffff888008b77298 (&ea->lock){+.+.}-{2:2}, at: dm_ea_context_alloc+0x2a/0x380
00010 Preemption disabled at:
00010 [<ffffffff8186f6ca>] dm_ea_context_alloc+0x2a/0x380
00010 CPU: 5 PID: 79 Comm: kworker/u16:3 Kdump: loaded Not tainted 6.6.0-rc2-00004-g026e4728c276 #306
00010 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
00010 Workqueue: dm-thin do_worker
00010 Call Trace:
00010 <TASK>
00010 dump_stack_lvl+0x69/0x80
00010 dump_stack+0x10/0x20
00010 __might_resched+0x1a8/0x2c0
00010 __might_sleep+0x3d/0x70
00010 down_read+0x19/0x180
00010 cache_get+0x46/0xc0
00010 new_read+0x44/0x180
00010 dm_bufio_read+0x24/0x30
00010 dm_bm_read_lock+0x21/0x70
00010 dm_tm_read_lock+0x24/0xa0
00010 sm_ll_find_free_block+0xa1/0x190
00010 sm_ll_find_common_free_block+0x2e/0x80
00010 sm_disk_new_block_in_range+0x2f/0x70
00010 sm_alloc_extent+0xe/0x10
00010 dm_ea_context_alloc+0x4e/0x380
00010 ? save_sm_roots+0x70/0x70
00010 dm_thin_alloc_data_block+0x4f/0xb0
00010 alloc_data_block+0x51/0x190
00010 process_cell+0x34b/0x620
00010 ? _raw_spin_unlock_irq+0x22/0x50
00010 do_worker+0x275/0xa00
00010 process_one_work+0x1d8/0x4a0
00010 ? process_one_work+0x17b/0x4a0
00010 worker_thread+0x1d9/0x3d0
00010 ? wq_sysfs_prep_attrs+0x90/0x90
00010 kthread+0x109/0x140
00010 ? kthread_complete_and_exit+0x20/0x20
00010 ret_from_fork+0x31/0x50
00010 ? kthread_complete_and_exit+0x20/0x20
00010 ret_from_fork_asm+0x11/0x20
00010 </TASK>
00010
00010 =============================
00010 [ BUG: Invalid wait context ]
00010 6.6.0-rc2-00004-g026e4728c276 #306 Tainted: G W
00010 -----------------------------
00010 kworker/u16:3/79 is trying to lock:
00010 ffff88800cda7330 (&bc->trees[i].lock){++++}-{3:3}, at: cache_get+0x46/0xc0
00010 other info that might help us debug this:
00010 context-{4:4}
00010 4 locks held by kworker/u16:3/79:
00010 #0: ffff888008a22138 ((wq_completion)dm-thin){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
00010 #1: ffff888004fb7e48 ((work_completion)(&pool->worker)){+.+.}-{0:0}, at: process_one_work+0x17b/0x4a0
00010 #2: ffff88800cb1c1d0 (&pmd->root_lock){++++}-{3:3}, at: dm_thin_alloc_data_block+0x26/0xb0
00010 #3: ffff888008b77298 (&ea->lock){+.+.}-{2:2}, at: dm_ea_context_alloc+0x2a/0x380
00010 stack backtrace:
00010 CPU: 5 PID: 79 Comm: kworker/u16:3 Kdump: loaded Tainted: G W 6.6.0-rc2-00004-g026e4728c276 #306
00010 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
00010 Workqueue: dm-thin do_worker
00010 Call Trace:
00010 <TASK>
00010 dump_stack_lvl+0x4a/0x80
00010 dump_stack+0x10/0x20
00010 __lock_acquire+0x9c9/0x26d0
00010 ? dump_stack_lvl+0x73/0x80
00010 lock_acquire+0xb8/0x280
00010 ? cache_get+0x46/0xc0
00010 ? __might_resched+0x1b4/0x2c0
00010 down_read+0x3c/0x180
00010 ? cache_get+0x46/0xc0
00010 cache_get+0x46/0xc0
00010 new_read+0x44/0x180
00010 dm_bufio_read+0x24/0x30
00010 dm_bm_read_lock+0x21/0x70
00010 dm_tm_read_lock+0x24/0xa0
00010 sm_ll_find_free_block+0xa1/0x190
00010 sm_ll_find_common_free_block+0x2e/0x80
00010 sm_disk_new_block_in_range+0x2f/0x70
00010 sm_alloc_extent+0xe/0x10
00010 dm_ea_context_alloc+0x4e/0x380
00010 ? save_sm_roots+0x70/0x70
00010 dm_thin_alloc_data_block+0x4f/0xb0
00010 alloc_data_block+0x51/0x190
00010 process_cell+0x34b/0x620
00010 ? _raw_spin_unlock_irq+0x22/0x50
00010 do_worker+0x275/0xa00
00010 process_one_work+0x1d8/0x4a0
00010 ? process_one_work+0x17b/0x4a0
00010 worker_thread+0x1d9/0x3d0
00010 ? wq_sysfs_prep_attrs+0x90/0x90
00010 kthread+0x109/0x140
00010 ? kthread_complete_and_exit+0x20/0x20
00010 ret_from_fork+0x31/0x50
00010 ? kthread_complete_and_exit+0x20/0x20
00010 ret_from_fork_asm+0x11/0x20
00010 </TASK>
This seems fairly clear to me; there's a spin_lock() around the call to
__alloc() in dm_ea_context_alloc(), which then calls all the way down
that stack until you get to cache_get(), which takes a semaphore and
the locking validation quite reasonably says "You can't do that".
I'm sure you don't need my help coming up with a fix. Although I might
ask that you turn on at least some basic locking checks in future while
developing your code, even if not full lockdep. I think this particular
warning comes out of CONFIG_DEBUG_ATOMIC_SLEEP=y.
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [dm-devel] Sleeping function called from invalid context
2023-09-27 4:42 ` Matthew Wilcox
@ 2023-09-27 15:57 ` Mike Snitzer
0 siblings, 0 replies; 4+ messages in thread
From: Mike Snitzer @ 2023-09-27 15:57 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Joe Thornber, dm-devel, Alasdair Kergon
On Wed, Sep 27 2023 at 12:42P -0400,
Matthew Wilcox <willy@infradead.org> wrote:
> On Tue, Sep 26, 2023 at 11:53:33PM +0100, Matthew Wilcox wrote:
> > I'm going to sleep now instead of running the last 10 steps of the
> > bisect. If nobody's found this when I wake up, I'll finish it then.
>
> Bisection found it. I confirmed by hand; checking out this commit
> yields a failed test, and then reverting it leads to a success.
>
> commit 026e4728c276cdf3ec618a71a38181864596027b
> Author: Joe Thornber <ejt@redhat.com>
> Date: Wed Sep 13 10:39:09 2023 +0100
>
> dm thin: Use the extent allocator for data blocks
>
> The thin_pool object now contains an extent-allocator, and each thin
> device contains an allocation-context from this. The allocation
> context is used to guide data block allocations. The actual
> allocation book-keeping is still done by the space-map.
>
> 2 new specific unit tests were added to dm-unit:
>
> /thinp/fragmentation/thins
> /thinp/fragmentation/snapshots
>
> https://github.com/jthornber/dm-unit/blob/main/src/tests/thinp.rs
>
> Signed-off-by: Joe Thornber <ejt@redhat.com>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
>
> Joe, in case you missed the earlier splat:
...
>
> This seems fairly clear to me; there's a spin_lock() around the call to
> __alloc() in dm_ea_context_alloc(), which then calls all the way down
> that stack until you get to cache_get(), which takes a semaphore and
> the locking validation quite reasonably says "You can't do that".
>
> I'm sure you don't need my help coming up with a fix. Although I might
> ask that you turn on at least some basic locking checks in future while
> developing your code, even if not full lockdep. I think this particular
> warning comes out of CONFIG_DEBUG_ATOMIC_SLEEP=y.
Thanks for the report and bisecting -- wish I caught you earlier to
save you the hassle (was immediately clear which commit caused it when
I saw the trace).
Joe is looking at how best to fix and also updating dmtest-python [1]
to scrape the kernel log to pickup such bugs (as-is the kernel will
carry on despite the splat, so far anyway).
But I've dropped the related code from linux-next until this is all
fixed properly.
Mike
[1] https://github.com/jthornber/dmtest-python.git
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-09-27 15:59 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-26 22:08 [dm-devel] Sleeping function called from invalid context Matthew Wilcox
2023-09-26 22:53 ` Matthew Wilcox
2023-09-27 4:42 ` Matthew Wilcox
2023-09-27 15:57 ` Mike Snitzer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.