* [PATCH AUTOSEL 5.15 19/39] btrfs: silence lockdep when reading chunk tree during mount
[not found] <20211126023156.441292-1-sashal@kernel.org>
@ 2021-11-26 2:31 ` Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 20/39] btrfs: check-integrity: fix a warning on write caching disabled disk Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Filipe Manana, David Sterba, Sasha Levin, clm, josef, linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
[ Upstream commit 4d9380e0da7be2351437cdac71673a9cd94e50fd ]
Often some test cases like btrfs/161 trigger lockdep splats that complain
about possible unsafe lock scenario due to the fact that during mount,
when reading the chunk tree we end up calling blkdev_get_by_path() while
holding a read lock on a leaf of the chunk tree. That produces a lockdep
splat like the following:
[ 3653.683975] ======================================================
[ 3653.685148] WARNING: possible circular locking dependency detected
[ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted
[ 3653.687239] ------------------------------------------------------
[ 3653.688400] mount/447465 is trying to acquire lock:
[ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.691054]
but task is already holding lock:
[ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.693978]
which lock already depends on the new lock.
[ 3653.695510]
the existing dependency chain (in reverse order) is:
[ 3653.696915]
-> #3 (btrfs-chunk-00){++++}-{3:3}:
[ 3653.698053] down_read_nested+0x4b/0x140
[ 3653.698893] __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.699988] btrfs_read_lock_root_node+0x31/0x40 [btrfs]
[ 3653.701205] btrfs_search_slot+0x537/0xc00 [btrfs]
[ 3653.702234] btrfs_insert_empty_items+0x32/0x70 [btrfs]
[ 3653.703332] btrfs_init_new_device+0x563/0x15b0 [btrfs]
[ 3653.704439] btrfs_ioctl+0x2110/0x3530 [btrfs]
[ 3653.705405] __x64_sys_ioctl+0x83/0xb0
[ 3653.706215] do_syscall_64+0x3b/0xc0
[ 3653.706990] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.708040]
-> #2 (sb_internal#2){.+.+}-{0:0}:
[ 3653.708994] lock_release+0x13d/0x4a0
[ 3653.709533] up_write+0x18/0x160
[ 3653.710017] btrfs_sync_file+0x3f3/0x5b0 [btrfs]
[ 3653.710699] __loop_update_dio+0xbd/0x170 [loop]
[ 3653.711360] lo_ioctl+0x3b1/0x8a0 [loop]
[ 3653.711929] block_ioctl+0x48/0x50
[ 3653.712442] __x64_sys_ioctl+0x83/0xb0
[ 3653.712991] do_syscall_64+0x3b/0xc0
[ 3653.713519] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.714233]
-> #1 (&lo->lo_mutex){+.+.}-{3:3}:
[ 3653.715026] __mutex_lock+0x92/0x900
[ 3653.715648] lo_open+0x28/0x60 [loop]
[ 3653.716275] blkdev_get_whole+0x28/0x90
[ 3653.716867] blkdev_get_by_dev.part.0+0x142/0x320
[ 3653.717537] blkdev_open+0x5e/0xa0
[ 3653.718043] do_dentry_open+0x163/0x390
[ 3653.718604] path_openat+0x3f0/0xa80
[ 3653.719128] do_filp_open+0xa9/0x150
[ 3653.719652] do_sys_openat2+0x97/0x160
[ 3653.720197] __x64_sys_openat+0x54/0x90
[ 3653.720766] do_syscall_64+0x3b/0xc0
[ 3653.721285] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.721986]
-> #0 (&disk->open_mutex){+.+.}-{3:3}:
[ 3653.722775] __lock_acquire+0x130e/0x2210
[ 3653.723348] lock_acquire+0xd7/0x310
[ 3653.723867] __mutex_lock+0x92/0x900
[ 3653.724394] blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.725041] blkdev_get_by_path+0xb8/0xd0
[ 3653.725614] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[ 3653.726332] open_fs_devices+0xd7/0x2c0 [btrfs]
[ 3653.726999] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
[ 3653.727739] open_ctree+0xb8e/0x17bf [btrfs]
[ 3653.728384] btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 3653.729130] legacy_get_tree+0x30/0x50
[ 3653.729676] vfs_get_tree+0x28/0xc0
[ 3653.730192] vfs_kern_mount.part.0+0x71/0xb0
[ 3653.730800] btrfs_mount+0x11d/0x3a0 [btrfs]
[ 3653.731427] legacy_get_tree+0x30/0x50
[ 3653.731970] vfs_get_tree+0x28/0xc0
[ 3653.732486] path_mount+0x2d4/0xbe0
[ 3653.732997] __x64_sys_mount+0x103/0x140
[ 3653.733560] do_syscall_64+0x3b/0xc0
[ 3653.734080] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.734782]
other info that might help us debug this:
[ 3653.735784] Chain exists of:
&disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00
[ 3653.737123] Possible unsafe locking scenario:
[ 3653.737865] CPU0 CPU1
[ 3653.738435] ---- ----
[ 3653.739007] lock(btrfs-chunk-00);
[ 3653.739449] lock(sb_internal#2);
[ 3653.740193] lock(btrfs-chunk-00);
[ 3653.740955] lock(&disk->open_mutex);
[ 3653.741431]
*** DEADLOCK ***
[ 3653.742176] 3 locks held by mount/447465:
[ 3653.742739] #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
[ 3653.744114] #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs]
[ 3653.745563] #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs]
[ 3653.747066]
stack backtrace:
[ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1
[ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 3653.750592] Call Trace:
[ 3653.750967] dump_stack_lvl+0x57/0x72
[ 3653.751526] check_noncircular+0xf3/0x110
[ 3653.752136] ? stack_trace_save+0x4b/0x70
[ 3653.752748] __lock_acquire+0x130e/0x2210
[ 3653.753356] lock_acquire+0xd7/0x310
[ 3653.753898] ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.754596] ? lock_is_held_type+0xe8/0x140
[ 3653.755125] ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.755729] ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.756338] __mutex_lock+0x92/0x900
[ 3653.756794] ? blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.757400] ? do_raw_spin_unlock+0x4b/0xa0
[ 3653.757930] ? _raw_spin_unlock+0x29/0x40
[ 3653.758437] ? bd_prepare_to_claim+0x129/0x150
[ 3653.758999] ? trace_module_get+0x2b/0xd0
[ 3653.759508] ? try_module_get.part.0+0x50/0x80
[ 3653.760072] blkdev_get_by_dev.part.0+0xe7/0x320
[ 3653.760661] ? devcgroup_check_permission+0xc1/0x1f0
[ 3653.761288] blkdev_get_by_path+0xb8/0xd0
[ 3653.761797] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[ 3653.762454] open_fs_devices+0xd7/0x2c0 [btrfs]
[ 3653.763055] ? clone_fs_devices+0x8f/0x170 [btrfs]
[ 3653.763689] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs]
[ 3653.764370] ? kvm_sched_clock_read+0x14/0x40
[ 3653.764922] open_ctree+0xb8e/0x17bf [btrfs]
[ 3653.765493] ? super_setup_bdi_name+0x79/0xd0
[ 3653.766043] btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 3653.766780] ? rcu_read_lock_sched_held+0x3f/0x80
[ 3653.767488] ? kfree+0x1f2/0x3c0
[ 3653.767979] legacy_get_tree+0x30/0x50
[ 3653.768548] vfs_get_tree+0x28/0xc0
[ 3653.769076] vfs_kern_mount.part.0+0x71/0xb0
[ 3653.769718] btrfs_mount+0x11d/0x3a0 [btrfs]
[ 3653.770381] ? rcu_read_lock_sched_held+0x3f/0x80
[ 3653.771086] ? kfree+0x1f2/0x3c0
[ 3653.771574] legacy_get_tree+0x30/0x50
[ 3653.772136] vfs_get_tree+0x28/0xc0
[ 3653.772673] path_mount+0x2d4/0xbe0
[ 3653.773201] __x64_sys_mount+0x103/0x140
[ 3653.773793] do_syscall_64+0x3b/0xc0
[ 3653.774333] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3653.775094] RIP: 0033:0x7f648bc45aaa
This happens because through btrfs_read_chunk_tree(), which is called only
during mount, ends up acquiring the mutex open_mutex of a block device
while holding a read lock on a leaf of the chunk tree while other paths
need to acquire other locks before locking extent buffers of the chunk
tree.
Since at mount time when we call btrfs_read_chunk_tree() we know that
we don't have other tasks running in parallel and modifying the chunk
tree, we can simply skip locking of chunk tree extent buffers. So do
that and move the assertion that checks the fs is not yet mounted to the
top block of btrfs_read_chunk_tree(), with a comment before doing it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/volumes.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6afebc5b10c84..8ad62d9fbb0ad 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7483,6 +7483,19 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
*/
fs_info->fs_devices->total_rw_bytes = 0;
+ /*
+ * Lockdep complains about possible circular locking dependency between
+ * a disk's open_mutex (struct gendisk.open_mutex), the rw semaphores
+ * used for freeze procection of a fs (struct super_block.s_writers),
+ * which we take when starting a transaction, and extent buffers of the
+ * chunk tree if we call read_one_dev() while holding a lock on an
+ * extent buffer of the chunk tree. Since we are mounting the filesystem
+ * and at this point there can't be any concurrent task modifying the
+ * chunk tree, to keep it simple, just skip locking on the chunk tree.
+ */
+ ASSERT(!test_bit(BTRFS_FS_OPEN, &fs_info->flags));
+ path->skip_locking = 1;
+
/*
* Read all device items, and then all the chunk items. All
* device items are found before any chunk item (their object id
@@ -7508,10 +7521,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
goto error;
break;
}
- /*
- * The nodes on level 1 are not locked but we don't need to do
- * that during mount time as nothing else can access the tree
- */
node = path->nodes[1];
if (node) {
if (last_ra_node != node->start) {
@@ -7539,7 +7548,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
* requirement for chunk allocation, see the comment on
* top of btrfs_chunk_alloc() for details.
*/
- ASSERT(!test_bit(BTRFS_FS_OPEN, &fs_info->flags));
chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
ret = read_one_chunk(&found_key, leaf, chunk);
if (ret)
--
2.33.0
^ permalink raw reply related [flat|nested] 2+ messages in thread* [PATCH AUTOSEL 5.15 20/39] btrfs: check-integrity: fix a warning on write caching disabled disk
[not found] <20211126023156.441292-1-sashal@kernel.org>
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 19/39] btrfs: silence lockdep when reading chunk tree during mount Sasha Levin
@ 2021-11-26 2:31 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2021-11-26 2:31 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Wang Yugui, Filipe Manana, David Sterba, Sasha Levin, clm, josef,
linux-btrfs
From: Wang Yugui <wangyugui@e16-tech.com>
[ Upstream commit a91cf0ffbc244792e0b3ecf7d0fddb2f344b461f ]
When a disk has write caching disabled, we skip submission of a bio with
flush and sync requests before writing the superblock, since it's not
needed. However when the integrity checker is enabled, this results in
reports that there are metadata blocks referred by a superblock that
were not properly flushed. So don't skip the bio submission only when
the integrity checker is enabled for the sake of simplicity, since this
is a debug tool and not meant for use in non-debug builds.
fstests/btrfs/220 trigger a check-integrity warning like the following
when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk with WCE=0.
btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
------------[ cut here ]------------
WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
FS: 00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
Call Trace:
btrfsic_process_written_block+0x2f7/0x850 [btrfs]
__btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
? bio_associate_blkg_from_css+0xa4/0x2c0
btrfsic_submit_bio+0x18/0x30 [btrfs]
write_dev_supers+0x81/0x2a0 [btrfs]
? find_get_pages_range_tag+0x219/0x280
? pagevec_lookup_range_tag+0x24/0x30
? __filemap_fdatawait_range+0x6d/0xf0
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
? find_first_extent_bit+0x9b/0x160 [btrfs]
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
write_all_supers+0x1b3/0xa70 [btrfs]
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
btrfs_commit_transaction+0x59d/0xac0 [btrfs]
close_ctree+0x11d/0x339 [btrfs]
generic_shutdown_super+0x71/0x110
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0xb8/0x140
task_work_run+0x6d/0xb0
exit_to_user_mode_prepare+0x1f0/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f009f711dfb
RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
---[ end trace 2c4b82abcef9eec4 ]---
S-65536(sdb2/65536/1)
-->
M-1064960(sdb2/1064960/1)
Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/disk-io.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e00c4c1f622f3..c37239c8ac0c6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3970,11 +3970,23 @@ static void btrfs_end_empty_barrier(struct bio *bio)
*/
static void write_dev_flush(struct btrfs_device *device)
{
- struct request_queue *q = bdev_get_queue(device->bdev);
struct bio *bio = device->flush_bio;
+#ifndef CONFIG_BTRFS_FS_CHECK_INTEGRITY
+ /*
+ * When a disk has write caching disabled, we skip submission of a bio
+ * with flush and sync requests before writing the superblock, since
+ * it's not needed. However when the integrity checker is enabled, this
+ * results in reports that there are metadata blocks referred by a
+ * superblock that were not properly flushed. So don't skip the bio
+ * submission only when the integrity checker is enabled for the sake
+ * of simplicity, since this is a debug tool and not meant for use in
+ * non-debug builds.
+ */
+ struct request_queue *q = bdev_get_queue(device->bdev);
if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
return;
+#endif
bio_reset(bio);
bio->bi_end_io = btrfs_end_empty_barrier;
--
2.33.0
^ permalink raw reply related [flat|nested] 2+ messages in thread