* [PATCH AUTOSEL 5.7 23/60] btrfs: fix lockdep splat from btrfs_dump_space_info
[not found] <20200810191028.3793884-1-sashal@kernel.org>
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 08/60] fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls Sasha Levin
@ 2020-08-10 19:09 ` Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 54/60] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 55/60] btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-08-10 19:09 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Josef Bacik, David Sterba, Sasha Levin, linux-btrfs
From: Josef Bacik <josef@toxicpanda.com>
[ Upstream commit ab0db043c35da3477e57d4d516492b2d51a5ca0f ]
When running with -o enospc_debug you can get the following splat if one
of the dump_space_info's trip
======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5+ #20 Tainted: G OE
------------------------------------------------------
dd/563090 is trying to acquire lock:
ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs]
but task is already holding lock:
ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&cache->lock){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs]
find_free_extent+0x7ef/0x13b0 [btrfs]
btrfs_reserve_extent+0x9b/0x180 [btrfs]
btrfs_alloc_tree_block+0xc1/0x340 [btrfs]
alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
__btrfs_cow_block+0x122/0x530 [btrfs]
btrfs_cow_block+0x106/0x210 [btrfs]
commit_cowonly_roots+0x55/0x300 [btrfs]
btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x36/0x70
cleanup_mnt+0x104/0x160
task_work_run+0x5f/0x90
__prepare_exit_to_usermode+0x1bd/0x1c0
do_syscall_64+0x5e/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #2 (&space_info->lock){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs]
btrfs_inode_rsv_release+0x4f/0x170 [btrfs]
btrfs_clear_delalloc_extent+0x155/0x480 [btrfs]
clear_state_bit+0x81/0x1a0 [btrfs]
__clear_extent_bit+0x25c/0x5d0 [btrfs]
clear_extent_bit+0x15/0x20 [btrfs]
btrfs_invalidatepage+0x2b7/0x3c0 [btrfs]
truncate_cleanup_page+0x47/0xe0
truncate_inode_pages_range+0x238/0x840
truncate_pagecache+0x44/0x60
btrfs_setattr+0x202/0x5e0 [btrfs]
notify_change+0x33b/0x490
do_truncate+0x76/0xd0
path_openat+0x687/0xa10
do_filp_open+0x91/0x100
do_sys_openat2+0x215/0x2d0
do_sys_open+0x44/0x80
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&tree->lock#2){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
find_first_extent_bit+0x32/0x150 [btrfs]
write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
__btrfs_write_out_cache+0x172/0x480 [btrfs]
btrfs_write_out_cache+0x7a/0xf0 [btrfs]
btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs]
commit_cowonly_roots+0x245/0x300 [btrfs]
btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
close_ctree+0xf9/0x2f5 [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x36/0x70
cleanup_mnt+0x104/0x160
task_work_run+0x5f/0x90
__prepare_exit_to_usermode+0x1bd/0x1c0
do_syscall_64+0x5e/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&ctl->tree_lock){+.+.}-{2:2}:
__lock_acquire+0x1240/0x2460
lock_acquire+0xab/0x360
_raw_spin_lock+0x25/0x30
btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_space_info+0xf4/0x120 [btrfs]
btrfs_reserve_extent+0x176/0x180 [btrfs]
__btrfs_prealloc_file_range+0x145/0x550 [btrfs]
cache_save_setup+0x28d/0x3b0 [btrfs]
btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
btrfs_commit_transaction+0xcc/0xac0 [btrfs]
btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
btrfs_file_write_iter+0x3cf/0x610 [btrfs]
new_sync_write+0x11e/0x1b0
vfs_write+0x1c9/0x200
ksys_write+0x68/0xe0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of:
&ctl->tree_lock --> &space_info->lock --> &cache->lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&cache->lock);
lock(&space_info->lock);
lock(&cache->lock);
lock(&ctl->tree_lock);
*** DEADLOCK ***
6 locks held by dd/563090:
#0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200
#1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs]
#2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs]
#3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs]
#4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs]
#5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
stack backtrace:
CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ #20
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
Call Trace:
dump_stack+0x96/0xd0
check_noncircular+0x162/0x180
__lock_acquire+0x1240/0x2460
? wake_up_klogd.part.0+0x30/0x40
lock_acquire+0xab/0x360
? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
_raw_spin_lock+0x25/0x30
? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_space_info+0xf4/0x120 [btrfs]
btrfs_reserve_extent+0x176/0x180 [btrfs]
__btrfs_prealloc_file_range+0x145/0x550 [btrfs]
? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs]
cache_save_setup+0x28d/0x3b0 [btrfs]
btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
btrfs_commit_transaction+0xcc/0xac0 [btrfs]
? start_transaction+0xe0/0x5b0 [btrfs]
btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
? ktime_get_coarse_real_ts64+0xa8/0xd0
? trace_hardirqs_on+0x1c/0xe0
btrfs_file_write_iter+0x3cf/0x610 [btrfs]
new_sync_write+0x11e/0x1b0
vfs_write+0x1c9/0x200
ksys_write+0x68/0xe0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is because we're holding the block_group->lock while trying to dump
the free space cache. However we don't need this lock, we just need it
to read the values for the printk, so move the free space cache dumping
outside of the block group lock.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/space-info.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 756950aba1a66..317d1d2160090 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -468,8 +468,8 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s",
cache->start, cache->length, cache->used, cache->pinned,
cache->reserved, cache->ro ? "[readonly]" : "");
- btrfs_dump_free_space(cache, bytes);
spin_unlock(&cache->lock);
+ btrfs_dump_free_space(cache, bytes);
}
if (++index < BTRFS_NR_RAID_TYPES)
goto again;
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH AUTOSEL 5.7 54/60] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation
[not found] <20200810191028.3793884-1-sashal@kernel.org>
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 08/60] fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls Sasha Levin
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 23/60] btrfs: fix lockdep splat from btrfs_dump_space_info Sasha Levin
@ 2020-08-10 19:10 ` Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 55/60] btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-08-10 19:10 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qu Wenruo, Martin Doucha, Filipe Manana, Anand Jain, David Sterba,
Sasha Levin, linux-btrfs
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit 6d4572a9d71d5fc2affee0258d8582d39859188c ]
[BUG]
When the data space is exhausted, even if the inode has NOCOW attribute,
we will still refuse to truncate unaligned range due to ENOSPC.
The following script can reproduce it pretty easily:
#!/bin/bash
dev=/dev/test/test
mnt=/mnt/btrfs
umount $dev &> /dev/null
umount $mnt &> /dev/null
mkfs.btrfs -f $dev -b 1G
mount -o nospace_cache $dev $mnt
touch $mnt/foobar
chattr +C $mnt/foobar
xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
sync
xfs_io -c "fpunch 0 2k" $mnt/foobar
umount $mnt
Currently this will fail at the fpunch part.
[CAUSE]
Because btrfs_truncate_block() always reserves space without checking
the NOCOW attribute.
Since the writeback path follows NOCOW bit, we only need to bother the
space reservation code in btrfs_truncate_block().
[FIX]
Make btrfs_truncate_block() follow btrfs_buffered_write() to try to
reserve data space first, and fall back to NOCOW check only when we
don't have enough space.
Such always-try-reserve is an optimization introduced in
btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
This patch will export check_can_nocow() as btrfs_check_can_nocow(), and
use it in btrfs_truncate_block() to fix the problem.
Reported-by: Martin Doucha <martin.doucha@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/ctree.h | 2 ++
fs/btrfs/file.c | 12 ++++++------
fs/btrfs/inode.c | 44 +++++++++++++++++++++++++++++++++++++-------
3 files changed, 45 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 09e6dff8a8f85..68bd89e3d4f09 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2982,6 +2982,8 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
size_t num_pages, loff_t pos, size_t write_bytes,
struct extent_state **cached);
int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes, bool nowait);
/* tree-defrag.c */
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 93244934d4f92..1e1af0ce70771 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1540,8 +1540,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
return ret;
}
-static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
- size_t *write_bytes, bool nowait)
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes, bool nowait)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct btrfs_root *root = inode->root;
@@ -1656,8 +1656,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
if (ret < 0) {
if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) &&
- check_can_nocow(BTRFS_I(inode), pos,
- &write_bytes, false) > 0) {
+ btrfs_check_can_nocow(BTRFS_I(inode), pos,
+ &write_bytes, false) > 0) {
/*
* For nodata cow case, no need to reserve
* data space.
@@ -1936,8 +1936,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
*/
if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) ||
- check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes,
- true) <= 0) {
+ btrfs_check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes,
+ true) <= 0) {
inode_unlock(inode);
return -EAGAIN;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e7bdda3ed069b..6cb3dc2748974 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4520,11 +4520,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
struct extent_state *cached_state = NULL;
struct extent_changeset *data_reserved = NULL;
char *kaddr;
+ bool only_release_metadata = false;
u32 blocksize = fs_info->sectorsize;
pgoff_t index = from >> PAGE_SHIFT;
unsigned offset = from & (blocksize - 1);
struct page *page;
gfp_t mask = btrfs_alloc_write_mask(mapping);
+ size_t write_bytes = blocksize;
int ret = 0;
u64 block_start;
u64 block_end;
@@ -4536,11 +4538,27 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
block_start = round_down(from, blocksize);
block_end = block_start + blocksize - 1;
- ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
- block_start, blocksize);
- if (ret)
- goto out;
+ ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
+ blocksize);
+ if (ret < 0) {
+ if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+ BTRFS_INODE_PREALLOC)) &&
+ btrfs_check_can_nocow(BTRFS_I(inode), block_start,
+ &write_bytes, false) > 0) {
+ /* For nocow case, no need to reserve data space */
+ only_release_metadata = true;
+ } else {
+ goto out;
+ }
+ }
+ ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
+ if (ret < 0) {
+ if (!only_release_metadata)
+ btrfs_free_reserved_data_space(inode, data_reserved,
+ block_start, blocksize);
+ goto out;
+ }
again:
page = find_or_create_page(mapping, index, mask);
if (!page) {
@@ -4609,14 +4627,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
set_page_dirty(page);
unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
+ if (only_release_metadata)
+ set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
+ block_end, EXTENT_NORESERVE, NULL, NULL,
+ GFP_NOFS);
+
out_unlock:
- if (ret)
- btrfs_delalloc_release_space(inode, data_reserved, block_start,
- blocksize, true);
+ if (ret) {
+ if (only_release_metadata)
+ btrfs_delalloc_release_metadata(BTRFS_I(inode),
+ blocksize, true);
+ else
+ btrfs_delalloc_release_space(inode, data_reserved,
+ block_start, blocksize, true);
+ }
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
unlock_page(page);
put_page(page);
out:
+ if (only_release_metadata)
+ btrfs_drew_write_unlock(&BTRFS_I(inode)->root->snapshot_lock);
extent_changeset_free(data_reserved);
return ret;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH AUTOSEL 5.7 55/60] btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped
[not found] <20200810191028.3793884-1-sashal@kernel.org>
` (2 preceding siblings ...)
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 54/60] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation Sasha Levin
@ 2020-08-10 19:10 ` Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-08-10 19:10 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Qu Wenruo, David Sterba, Sasha Levin, linux-btrfs
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit a3cf0e4342b6af9e6b34a4b913c630fbd03a82ea ]
[BUG]
Sometime fsstress could lead to qgroup warning for case like
generic/013:
BTRFS warning (device dm-3): qgroup 0/259 has unreleased space, type 1 rsv 81920
------------[ cut here ]------------
WARNING: CPU: 9 PID: 24535 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
Call Trace:
btrfs_put_super+0x15/0x17 [btrfs]
generic_shutdown_super+0x72/0x110
kill_anon_super+0x18/0x30
btrfs_kill_super+0x17/0x30 [btrfs]
deactivate_locked_super+0x3b/0xa0
deactivate_super+0x40/0x50
cleanup_mnt+0x135/0x190
__cleanup_mnt+0x12/0x20
task_work_run+0x64/0xb0
__prepare_exit_to_usermode+0x1bc/0x1c0
__syscall_return_slowpath+0x47/0x230
do_syscall_64+0x64/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace 6c341cdf9b6cc3c1 ]---
BTRFS error (device dm-3): qgroup reserved space leaked
While that subvolume 259 is no longer in that filesystem.
[CAUSE]
Normally per-trans qgroup reserved space is freed when a transaction is
committed, in commit_fs_roots().
However for completely dropped subvolume, that subvolume is completely
gone, thus is no longer in the fs_roots_radix, and its per-trans
reserved qgroup will never be freed.
Since the subvolume is already gone, leaked per-trans space won't cause
any trouble for end users.
[FIX]
Just call btrfs_qgroup_free_meta_all_pertrans() before a subvolume is
completely dropped.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/extent-tree.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 54a64d1e18c6b..7c86188b33d43 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5481,6 +5481,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
}
}
+ /*
+ * This subvolume is going to be completely dropped, and won't be
+ * recorded as dirty roots, thus pertrans meta rsv will not be freed at
+ * commit transaction time. So free it here manually.
+ */
+ btrfs_qgroup_convert_reserved_meta(root, INT_MAX);
+ btrfs_qgroup_free_meta_all_pertrans(root);
+
if (test_bit(BTRFS_ROOT_IN_RADIX, &root->state))
btrfs_add_dropped_root(trans, root);
else
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread