* [PATCH AUTOSEL 5.7 08/60] fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls
[not found] <20200810191028.3793884-1-sashal@kernel.org>
@ 2020-08-10 19:09 ` Sasha Levin
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 23/60] btrfs: fix lockdep splat from btrfs_dump_space_info Sasha Levin
` (2 subsequent siblings)
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-08-10 19:09 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Paul E. McKenney, Sasha Levin, linux-btrfs
From: "Paul E. McKenney" <paulmck@kernel.org>
[ Upstream commit 9f47eb5461aaeb6cb8696f9d11503ae90e4d5cb0 ]
Very large I/Os can cause the following RCU CPU stall warning:
RIP: 0010:rb_prev+0x8/0x50
Code: 49 89 c0 49 89 d1 48 89 c2 48 89 f8 e9 e5 fd ff ff 4c 89 48 10 c3 4c =
89 06 c3 4c 89 40 10 c3 0f 1f 00 48 8b 0f 48 39 cf 74 38 <48> 8b 47 10 48 85 c0 74 22 48 8b 50 08 48 85 d2 74 0c 48 89 d0 48
RSP: 0018:ffffc9002212bab0 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
RAX: ffff888821f93630 RBX: ffff888821f93630 RCX: ffff888821f937e0
RDX: 0000000000000000 RSI: 0000000000102000 RDI: ffff888821f93630
RBP: 0000000000103000 R08: 000000000006c000 R09: 0000000000000238
R10: 0000000000102fff R11: ffffc9002212bac8 R12: 0000000000000001
R13: ffffffffffffffff R14: 0000000000102000 R15: ffff888821f937e0
__lookup_extent_mapping+0xa0/0x110
try_release_extent_mapping+0xdc/0x220
btrfs_releasepage+0x45/0x70
shrink_page_list+0xa39/0xb30
shrink_inactive_list+0x18f/0x3b0
shrink_lruvec+0x38e/0x6b0
shrink_node+0x14d/0x690
do_try_to_free_pages+0xc6/0x3e0
try_to_free_mem_cgroup_pages+0xe6/0x1e0
reclaim_high.constprop.73+0x87/0xc0
mem_cgroup_handle_over_high+0x66/0x150
exit_to_usermode_loop+0x82/0xd0
do_syscall_64+0xd4/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
On a PREEMPT=n kernel, the try_release_extent_mapping() function's
"while" loop might run for a very long time on a large I/O. This commit
therefore adds a cond_resched() to this loop, providing RCU any needed
quiescent states.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/extent_io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 79196eb1a1b36..9d6d646e1eb08 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4518,6 +4518,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
/* once for us */
free_extent_map(em);
+
+ cond_resched(); /* Allow large-extent preemption. */
}
}
return try_release_extent_state(tree, page, mask);
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 5.7 23/60] btrfs: fix lockdep splat from btrfs_dump_space_info
[not found] <20200810191028.3793884-1-sashal@kernel.org>
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 08/60] fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls Sasha Levin
@ 2020-08-10 19:09 ` Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 54/60] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 55/60] btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-08-10 19:09 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Josef Bacik, David Sterba, Sasha Levin, linux-btrfs
From: Josef Bacik <josef@toxicpanda.com>
[ Upstream commit ab0db043c35da3477e57d4d516492b2d51a5ca0f ]
When running with -o enospc_debug you can get the following splat if one
of the dump_space_info's trip
======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5+ #20 Tainted: G OE
------------------------------------------------------
dd/563090 is trying to acquire lock:
ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs]
but task is already holding lock:
ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&cache->lock){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs]
find_free_extent+0x7ef/0x13b0 [btrfs]
btrfs_reserve_extent+0x9b/0x180 [btrfs]
btrfs_alloc_tree_block+0xc1/0x340 [btrfs]
alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
__btrfs_cow_block+0x122/0x530 [btrfs]
btrfs_cow_block+0x106/0x210 [btrfs]
commit_cowonly_roots+0x55/0x300 [btrfs]
btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x36/0x70
cleanup_mnt+0x104/0x160
task_work_run+0x5f/0x90
__prepare_exit_to_usermode+0x1bd/0x1c0
do_syscall_64+0x5e/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #2 (&space_info->lock){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs]
btrfs_inode_rsv_release+0x4f/0x170 [btrfs]
btrfs_clear_delalloc_extent+0x155/0x480 [btrfs]
clear_state_bit+0x81/0x1a0 [btrfs]
__clear_extent_bit+0x25c/0x5d0 [btrfs]
clear_extent_bit+0x15/0x20 [btrfs]
btrfs_invalidatepage+0x2b7/0x3c0 [btrfs]
truncate_cleanup_page+0x47/0xe0
truncate_inode_pages_range+0x238/0x840
truncate_pagecache+0x44/0x60
btrfs_setattr+0x202/0x5e0 [btrfs]
notify_change+0x33b/0x490
do_truncate+0x76/0xd0
path_openat+0x687/0xa10
do_filp_open+0x91/0x100
do_sys_openat2+0x215/0x2d0
do_sys_open+0x44/0x80
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&tree->lock#2){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
find_first_extent_bit+0x32/0x150 [btrfs]
write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
__btrfs_write_out_cache+0x172/0x480 [btrfs]
btrfs_write_out_cache+0x7a/0xf0 [btrfs]
btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs]
commit_cowonly_roots+0x245/0x300 [btrfs]
btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
close_ctree+0xf9/0x2f5 [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x36/0x70
cleanup_mnt+0x104/0x160
task_work_run+0x5f/0x90
__prepare_exit_to_usermode+0x1bd/0x1c0
do_syscall_64+0x5e/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&ctl->tree_lock){+.+.}-{2:2}:
__lock_acquire+0x1240/0x2460
lock_acquire+0xab/0x360
_raw_spin_lock+0x25/0x30
btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_space_info+0xf4/0x120 [btrfs]
btrfs_reserve_extent+0x176/0x180 [btrfs]
__btrfs_prealloc_file_range+0x145/0x550 [btrfs]
cache_save_setup+0x28d/0x3b0 [btrfs]
btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
btrfs_commit_transaction+0xcc/0xac0 [btrfs]
btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
btrfs_file_write_iter+0x3cf/0x610 [btrfs]
new_sync_write+0x11e/0x1b0
vfs_write+0x1c9/0x200
ksys_write+0x68/0xe0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of:
&ctl->tree_lock --> &space_info->lock --> &cache->lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&cache->lock);
lock(&space_info->lock);
lock(&cache->lock);
lock(&ctl->tree_lock);
*** DEADLOCK ***
6 locks held by dd/563090:
#0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200
#1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs]
#2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs]
#3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs]
#4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs]
#5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
stack backtrace:
CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ #20
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
Call Trace:
dump_stack+0x96/0xd0
check_noncircular+0x162/0x180
__lock_acquire+0x1240/0x2460
? wake_up_klogd.part.0+0x30/0x40
lock_acquire+0xab/0x360
? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
_raw_spin_lock+0x25/0x30
? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_space_info+0xf4/0x120 [btrfs]
btrfs_reserve_extent+0x176/0x180 [btrfs]
__btrfs_prealloc_file_range+0x145/0x550 [btrfs]
? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs]
cache_save_setup+0x28d/0x3b0 [btrfs]
btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
btrfs_commit_transaction+0xcc/0xac0 [btrfs]
? start_transaction+0xe0/0x5b0 [btrfs]
btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
? ktime_get_coarse_real_ts64+0xa8/0xd0
? trace_hardirqs_on+0x1c/0xe0
btrfs_file_write_iter+0x3cf/0x610 [btrfs]
new_sync_write+0x11e/0x1b0
vfs_write+0x1c9/0x200
ksys_write+0x68/0xe0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is because we're holding the block_group->lock while trying to dump
the free space cache. However we don't need this lock, we just need it
to read the values for the printk, so move the free space cache dumping
outside of the block group lock.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/space-info.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 756950aba1a66..317d1d2160090 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -468,8 +468,8 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s",
cache->start, cache->length, cache->used, cache->pinned,
cache->reserved, cache->ro ? "[readonly]" : "");
- btrfs_dump_free_space(cache, bytes);
spin_unlock(&cache->lock);
+ btrfs_dump_free_space(cache, bytes);
}
if (++index < BTRFS_NR_RAID_TYPES)
goto again;
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 5.7 54/60] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation
[not found] <20200810191028.3793884-1-sashal@kernel.org>
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 08/60] fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls Sasha Levin
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 23/60] btrfs: fix lockdep splat from btrfs_dump_space_info Sasha Levin
@ 2020-08-10 19:10 ` Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 55/60] btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-08-10 19:10 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qu Wenruo, Martin Doucha, Filipe Manana, Anand Jain, David Sterba,
Sasha Levin, linux-btrfs
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit 6d4572a9d71d5fc2affee0258d8582d39859188c ]
[BUG]
When the data space is exhausted, even if the inode has NOCOW attribute,
we will still refuse to truncate unaligned range due to ENOSPC.
The following script can reproduce it pretty easily:
#!/bin/bash
dev=/dev/test/test
mnt=/mnt/btrfs
umount $dev &> /dev/null
umount $mnt &> /dev/null
mkfs.btrfs -f $dev -b 1G
mount -o nospace_cache $dev $mnt
touch $mnt/foobar
chattr +C $mnt/foobar
xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
sync
xfs_io -c "fpunch 0 2k" $mnt/foobar
umount $mnt
Currently this will fail at the fpunch part.
[CAUSE]
Because btrfs_truncate_block() always reserves space without checking
the NOCOW attribute.
Since the writeback path follows NOCOW bit, we only need to bother the
space reservation code in btrfs_truncate_block().
[FIX]
Make btrfs_truncate_block() follow btrfs_buffered_write() to try to
reserve data space first, and fall back to NOCOW check only when we
don't have enough space.
Such always-try-reserve is an optimization introduced in
btrfs_buffered_write(), to avoid expensive btrfs_check_can_nocow() call.
This patch will export check_can_nocow() as btrfs_check_can_nocow(), and
use it in btrfs_truncate_block() to fix the problem.
Reported-by: Martin Doucha <martin.doucha@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/ctree.h | 2 ++
fs/btrfs/file.c | 12 ++++++------
fs/btrfs/inode.c | 44 +++++++++++++++++++++++++++++++++++++-------
3 files changed, 45 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 09e6dff8a8f85..68bd89e3d4f09 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2982,6 +2982,8 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
size_t num_pages, loff_t pos, size_t write_bytes,
struct extent_state **cached);
int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes, bool nowait);
/* tree-defrag.c */
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 93244934d4f92..1e1af0ce70771 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1540,8 +1540,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
return ret;
}
-static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
- size_t *write_bytes, bool nowait)
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+ size_t *write_bytes, bool nowait)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
struct btrfs_root *root = inode->root;
@@ -1656,8 +1656,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
if (ret < 0) {
if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) &&
- check_can_nocow(BTRFS_I(inode), pos,
- &write_bytes, false) > 0) {
+ btrfs_check_can_nocow(BTRFS_I(inode), pos,
+ &write_bytes, false) > 0) {
/*
* For nodata cow case, no need to reserve
* data space.
@@ -1936,8 +1936,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
*/
if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) ||
- check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes,
- true) <= 0) {
+ btrfs_check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes,
+ true) <= 0) {
inode_unlock(inode);
return -EAGAIN;
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e7bdda3ed069b..6cb3dc2748974 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4520,11 +4520,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
struct extent_state *cached_state = NULL;
struct extent_changeset *data_reserved = NULL;
char *kaddr;
+ bool only_release_metadata = false;
u32 blocksize = fs_info->sectorsize;
pgoff_t index = from >> PAGE_SHIFT;
unsigned offset = from & (blocksize - 1);
struct page *page;
gfp_t mask = btrfs_alloc_write_mask(mapping);
+ size_t write_bytes = blocksize;
int ret = 0;
u64 block_start;
u64 block_end;
@@ -4536,11 +4538,27 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
block_start = round_down(from, blocksize);
block_end = block_start + blocksize - 1;
- ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
- block_start, blocksize);
- if (ret)
- goto out;
+ ret = btrfs_check_data_free_space(inode, &data_reserved, block_start,
+ blocksize);
+ if (ret < 0) {
+ if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+ BTRFS_INODE_PREALLOC)) &&
+ btrfs_check_can_nocow(BTRFS_I(inode), block_start,
+ &write_bytes, false) > 0) {
+ /* For nocow case, no need to reserve data space */
+ only_release_metadata = true;
+ } else {
+ goto out;
+ }
+ }
+ ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
+ if (ret < 0) {
+ if (!only_release_metadata)
+ btrfs_free_reserved_data_space(inode, data_reserved,
+ block_start, blocksize);
+ goto out;
+ }
again:
page = find_or_create_page(mapping, index, mask);
if (!page) {
@@ -4609,14 +4627,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
set_page_dirty(page);
unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
+ if (only_release_metadata)
+ set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
+ block_end, EXTENT_NORESERVE, NULL, NULL,
+ GFP_NOFS);
+
out_unlock:
- if (ret)
- btrfs_delalloc_release_space(inode, data_reserved, block_start,
- blocksize, true);
+ if (ret) {
+ if (only_release_metadata)
+ btrfs_delalloc_release_metadata(BTRFS_I(inode),
+ blocksize, true);
+ else
+ btrfs_delalloc_release_space(inode, data_reserved,
+ block_start, blocksize, true);
+ }
btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
unlock_page(page);
put_page(page);
out:
+ if (only_release_metadata)
+ btrfs_drew_write_unlock(&BTRFS_I(inode)->root->snapshot_lock);
extent_changeset_free(data_reserved);
return ret;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 5.7 55/60] btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped
[not found] <20200810191028.3793884-1-sashal@kernel.org>
` (2 preceding siblings ...)
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 54/60] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation Sasha Levin
@ 2020-08-10 19:10 ` Sasha Levin
3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2020-08-10 19:10 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Qu Wenruo, David Sterba, Sasha Levin, linux-btrfs
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit a3cf0e4342b6af9e6b34a4b913c630fbd03a82ea ]
[BUG]
Sometime fsstress could lead to qgroup warning for case like
generic/013:
BTRFS warning (device dm-3): qgroup 0/259 has unreleased space, type 1 rsv 81920
------------[ cut here ]------------
WARNING: CPU: 9 PID: 24535 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
Call Trace:
btrfs_put_super+0x15/0x17 [btrfs]
generic_shutdown_super+0x72/0x110
kill_anon_super+0x18/0x30
btrfs_kill_super+0x17/0x30 [btrfs]
deactivate_locked_super+0x3b/0xa0
deactivate_super+0x40/0x50
cleanup_mnt+0x135/0x190
__cleanup_mnt+0x12/0x20
task_work_run+0x64/0xb0
__prepare_exit_to_usermode+0x1bc/0x1c0
__syscall_return_slowpath+0x47/0x230
do_syscall_64+0x64/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace 6c341cdf9b6cc3c1 ]---
BTRFS error (device dm-3): qgroup reserved space leaked
While that subvolume 259 is no longer in that filesystem.
[CAUSE]
Normally per-trans qgroup reserved space is freed when a transaction is
committed, in commit_fs_roots().
However for completely dropped subvolume, that subvolume is completely
gone, thus is no longer in the fs_roots_radix, and its per-trans
reserved qgroup will never be freed.
Since the subvolume is already gone, leaked per-trans space won't cause
any trouble for end users.
[FIX]
Just call btrfs_qgroup_free_meta_all_pertrans() before a subvolume is
completely dropped.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/extent-tree.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 54a64d1e18c6b..7c86188b33d43 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5481,6 +5481,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
}
}
+ /*
+ * This subvolume is going to be completely dropped, and won't be
+ * recorded as dirty roots, thus pertrans meta rsv will not be freed at
+ * commit transaction time. So free it here manually.
+ */
+ btrfs_qgroup_convert_reserved_meta(root, INT_MAX);
+ btrfs_qgroup_free_meta_all_pertrans(root);
+
if (test_bit(BTRFS_ROOT_IN_RADIX, &root->state))
btrfs_add_dropped_root(trans, root);
else
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-08-10 19:29 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20200810191028.3793884-1-sashal@kernel.org>
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 08/60] fs/btrfs: Add cond_resched() for try_release_extent_mapping() stalls Sasha Levin
2020-08-10 19:09 ` [PATCH AUTOSEL 5.7 23/60] btrfs: fix lockdep splat from btrfs_dump_space_info Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 54/60] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation Sasha Levin
2020-08-10 19:10 ` [PATCH AUTOSEL 5.7 55/60] btrfs: qgroup: free per-trans reserved space when a subvolume gets dropped Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox