* [PATCH AUTOSEL 5.4 42/46] btrfs: hold a ref on the root in btrfs_recover_relocation
[not found] <20200410034909.8922-1-sashal@kernel.org>
@ 2020-04-10 3:49 ` Sasha Levin
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 43/46] btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued Sasha Levin
` (3 subsequent siblings)
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2020-04-10 3:49 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Josef Bacik, David Sterba, Sasha Levin, linux-btrfs
From: Josef Bacik <josef@toxicpanda.com>
[ Upstream commit 932fd26df8125a5b14438563c4d3e33f59ba80f7 ]
We look up the fs root in various places in here when recovering from a
crashed relcoation. Make sure we hold a ref on the root whenever we
look them up.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/relocation.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index bc1d7f144ace9..4abdf3ec81c18 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4550,6 +4550,12 @@ int btrfs_recover_relocation(struct btrfs_root *root)
err = ret;
goto out;
}
+ } else {
+ if (!btrfs_grab_fs_root(fs_root)) {
+ err = -ENOENT;
+ goto out;
+ }
+ btrfs_put_fs_root(fs_root);
}
}
@@ -4599,10 +4605,15 @@ int btrfs_recover_relocation(struct btrfs_root *root)
list_add_tail(&reloc_root->root_list, &reloc_roots);
goto out_free;
}
+ if (!btrfs_grab_fs_root(fs_root)) {
+ err = -ENOENT;
+ goto out_free;
+ }
err = __add_reloc_root(reloc_root);
BUG_ON(err < 0); /* -ENOMEM or logic error */
fs_root->reloc_root = reloc_root;
+ btrfs_put_fs_root(fs_root);
}
err = btrfs_commit_transaction(trans);
@@ -4634,10 +4645,14 @@ int btrfs_recover_relocation(struct btrfs_root *root)
if (err == 0) {
/* cleanup orphan inode in data relocation tree */
fs_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID);
- if (IS_ERR(fs_root))
+ if (IS_ERR(fs_root)) {
err = PTR_ERR(fs_root);
- else
- err = btrfs_orphan_cleanup(fs_root);
+ } else {
+ if (btrfs_grab_fs_root(fs_root)) {
+ err = btrfs_orphan_cleanup(fs_root);
+ btrfs_put_fs_root(fs_root);
+ }
+ }
}
return err;
}
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 5.4 43/46] btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued
[not found] <20200410034909.8922-1-sashal@kernel.org>
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 42/46] btrfs: hold a ref on the root in btrfs_recover_relocation Sasha Levin
@ 2020-04-10 3:49 ` Sasha Levin
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 44/46] btrfs: remove a BUG_ON() from merge_reloc_roots() Sasha Levin
` (2 subsequent siblings)
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2020-04-10 3:49 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Qu Wenruo, Jeff Mahoney, David Sterba, Sasha Levin, linux-btrfs
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit d61acbbf54c612ea9bf67eed609494cda0857b3a ]
[BUG]
There are some reports about btrfs wait forever to unmount itself, with
the following call trace:
INFO: task umount:4631 blocked for more than 491 seconds.
Tainted: G X 5.3.8-2-default #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount D 0 4631 3337 0x00000000
Call Trace:
([<00000000174adf7a>] __schedule+0x342/0x748)
[<00000000174ae3ca>] schedule+0x4a/0xd8
[<00000000174b1f08>] schedule_timeout+0x218/0x420
[<00000000174af10c>] wait_for_common+0x104/0x1d8
[<000003ff804d6994>] btrfs_qgroup_wait_for_completion+0x84/0xb0 [btrfs]
[<000003ff8044a616>] close_ctree+0x4e/0x380 [btrfs]
[<0000000016fa3136>] generic_shutdown_super+0x8e/0x158
[<0000000016fa34d6>] kill_anon_super+0x26/0x40
[<000003ff8041ba88>] btrfs_kill_super+0x28/0xc8 [btrfs]
[<0000000016fa39f8>] deactivate_locked_super+0x68/0x98
[<0000000016fcb198>] cleanup_mnt+0xc0/0x140
[<0000000016d6a846>] task_work_run+0xc6/0x110
[<0000000016d04f76>] do_notify_resume+0xae/0xb8
[<00000000174b30ae>] system_call+0xe2/0x2c8
[CAUSE]
The problem happens when we have called qgroup_rescan_init(), but
not queued the worker. It can be caused mostly by error handling.
Qgroup ioctl thread | Unmount thread
----------------------------------------+-----------------------------------
|
btrfs_qgroup_rescan() |
|- qgroup_rescan_init() |
| |- qgroup_rescan_running = true; |
| |
|- trans = btrfs_join_transaction() |
| Some error happened |
| |
|- btrfs_qgroup_rescan() returns error |
But qgroup_rescan_running == true; |
| close_ctree()
| |- btrfs_qgroup_wait_for_completion()
| |- running == true;
| |- wait_for_completion();
btrfs_qgroup_rescan_worker is never queued, thus no one is going to wake
up close_ctree() and we get a deadlock.
All involved qgroup_rescan_init() callers are:
- btrfs_qgroup_rescan()
The example above. It's possible to trigger the deadlock when error
happened.
- btrfs_quota_enable()
Not possible. Just after qgroup_rescan_init() we queue the work.
- btrfs_read_qgroup_config()
It's possible to trigger the deadlock. It only init the work, the
work queueing happens in btrfs_qgroup_rescan_resume().
Thus if error happened in between, deadlock is possible.
We shouldn't set fs_info->qgroup_rescan_running just in
qgroup_rescan_init(), as at that stage we haven't yet queued qgroup
rescan worker to run.
[FIX]
Set qgroup_rescan_running before queueing the work, so that we ensure
the rescan work is queued when we wait for it.
Fixes: 8d9eddad1946 ("Btrfs: fix qgroup rescan worker initialization")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[ Change subject and cause analyse, use a smaller fix ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/qgroup.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 286c8c11c8d32..590defdf88609 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1030,6 +1030,7 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
ret = qgroup_rescan_init(fs_info, 0, 1);
if (!ret) {
qgroup_rescan_zero_tracking(fs_info);
+ fs_info->qgroup_rescan_running = true;
btrfs_queue_work(fs_info->qgroup_rescan_workers,
&fs_info->qgroup_rescan_work);
}
@@ -3276,7 +3277,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
sizeof(fs_info->qgroup_rescan_progress));
fs_info->qgroup_rescan_progress.objectid = progress_objectid;
init_completion(&fs_info->qgroup_rescan_completion);
- fs_info->qgroup_rescan_running = true;
spin_unlock(&fs_info->qgroup_lock);
mutex_unlock(&fs_info->qgroup_rescan_lock);
@@ -3341,8 +3341,11 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
qgroup_rescan_zero_tracking(fs_info);
+ mutex_lock(&fs_info->qgroup_rescan_lock);
+ fs_info->qgroup_rescan_running = true;
btrfs_queue_work(fs_info->qgroup_rescan_workers,
&fs_info->qgroup_rescan_work);
+ mutex_unlock(&fs_info->qgroup_rescan_lock);
return 0;
}
@@ -3378,9 +3381,13 @@ int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
void
btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
{
- if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
+ if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
+ mutex_lock(&fs_info->qgroup_rescan_lock);
+ fs_info->qgroup_rescan_running = true;
btrfs_queue_work(fs_info->qgroup_rescan_workers,
&fs_info->qgroup_rescan_work);
+ mutex_unlock(&fs_info->qgroup_rescan_lock);
+ }
}
/*
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 5.4 44/46] btrfs: remove a BUG_ON() from merge_reloc_roots()
[not found] <20200410034909.8922-1-sashal@kernel.org>
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 42/46] btrfs: hold a ref on the root in btrfs_recover_relocation Sasha Levin
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 43/46] btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued Sasha Levin
@ 2020-04-10 3:49 ` Sasha Levin
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 45/46] btrfs: restart relocate_tree_blocks properly Sasha Levin
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 46/46] btrfs: track reloc roots based on their commit root bytenr Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2020-04-10 3:49 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Josef Bacik, Qu Wenruo, David Sterba, Sasha Levin, linux-btrfs
From: Josef Bacik <josef@toxicpanda.com>
[ Upstream commit 7b7b74315b24dc064bc1c683659061c3d48f8668 ]
This was pretty subtle, we default to reloc roots having 0 root refs, so
if we crash in the middle of the relocation they can just be deleted.
If we successfully complete the relocation operations we'll set our root
refs to 1 in prepare_to_merge() and then go on to merge_reloc_roots().
At prepare_to_merge() time if any of the reloc roots have a 0 reference
still, we will remove that reloc root from our reloc root rb tree, and
then clean it up later.
However this only happens if we successfully start a transaction. If
we've aborted previously we will skip this step completely, and only
have reloc roots with a reference count of 0, but were never properly
removed from the reloc control's rb tree.
This isn't a problem per-se, our references are held by the list the
reloc roots are on, and by the original root the reloc root belongs to.
If we end up in this situation all the reloc roots will be added to the
dirty_reloc_list, and then properly dropped at that point. The reloc
control will be free'd and the rb tree is no longer used.
There were two options when fixing this, one was to remove the BUG_ON(),
the other was to make prepare_to_merge() handle the case where we
couldn't start a trans handle.
IMO this is the cleaner solution. I started with handling the error in
prepare_to_merge(), but it turned out super ugly. And in the end this
BUG_ON() simply doesn't matter, the cleanup was happening properly, we
were just panicing because this BUG_ON() only matters in the success
case. So I've opted to just remove it and add a comment where it was.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/relocation.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 4abdf3ec81c18..e1337598d15ea 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2562,7 +2562,21 @@ void merge_reloc_roots(struct reloc_control *rc)
free_reloc_roots(&reloc_roots);
}
- BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
+ /*
+ * We used to have
+ *
+ * BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
+ *
+ * here, but it's wrong. If we fail to start the transaction in
+ * prepare_to_merge() we will have only 0 ref reloc roots, none of which
+ * have actually been removed from the reloc_root_tree rb tree. This is
+ * fine because we're bailing here, and we hold a reference on the root
+ * for the list that holds it, so these roots will be cleaned up when we
+ * do the reloc_dirty_list afterwards. Meanwhile the root->reloc_root
+ * will be cleaned up on unmount.
+ *
+ * The remaining nodes will be cleaned up by free_reloc_control.
+ */
}
static void free_block_list(struct rb_root *blocks)
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 5.4 45/46] btrfs: restart relocate_tree_blocks properly
[not found] <20200410034909.8922-1-sashal@kernel.org>
` (2 preceding siblings ...)
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 44/46] btrfs: remove a BUG_ON() from merge_reloc_roots() Sasha Levin
@ 2020-04-10 3:49 ` Sasha Levin
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 46/46] btrfs: track reloc roots based on their commit root bytenr Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2020-04-10 3:49 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Josef Bacik, David Sterba, Sasha Levin, linux-btrfs
From: Josef Bacik <josef@toxicpanda.com>
[ Upstream commit 50dbbb71c79df89532ec41d118d59386e5a877e3 ]
There are two bugs here, but fixing them independently would just result
in pain if you happened to bisect between the two patches.
First is how we handle the -EAGAIN from relocate_tree_block(). We don't
set error, unless we happen to be the first node, which makes no sense,
I have no idea what the code was trying to accomplish here.
We in fact _do_ want err set here so that we know we need to restart in
relocate_block_group(). Also we need finish_pending_nodes() to not
actually call link_to_upper(), because we didn't actually relocate the
block.
And then if we do get -EAGAIN we do not want to set our backref cache
last_trans to the one before ours. This would force us to update our
backref cache if we didn't cross transaction ids, which would mean we'd
have some nodes updated to their new_bytenr, but still able to find
their old bytenr because we're searching the same commit root as the
last time we went through relocate_tree_blocks.
Fixing these two things keeps us from panicing when we start breaking
out of relocate_tree_blocks() either for delayed ref flushing or enospc.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/relocation.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index e1337598d15ea..e3ba1c12008a9 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3176,9 +3176,8 @@ int relocate_tree_blocks(struct btrfs_trans_handle *trans,
ret = relocate_tree_block(trans, rc, node, &block->key,
path);
if (ret < 0) {
- if (ret != -EAGAIN || &block->rb_node == rb_first(blocks))
- err = ret;
- goto out;
+ err = ret;
+ break;
}
}
out:
@@ -4154,12 +4153,6 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
if (!RB_EMPTY_ROOT(&blocks)) {
ret = relocate_tree_blocks(trans, rc, &blocks);
if (ret < 0) {
- /*
- * if we fail to relocate tree blocks, force to update
- * backref cache when committing transaction.
- */
- rc->backref_cache.last_trans = trans->transid - 1;
-
if (ret != -EAGAIN) {
err = ret;
break;
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH AUTOSEL 5.4 46/46] btrfs: track reloc roots based on their commit root bytenr
[not found] <20200410034909.8922-1-sashal@kernel.org>
` (3 preceding siblings ...)
2020-04-10 3:49 ` [PATCH AUTOSEL 5.4 45/46] btrfs: restart relocate_tree_blocks properly Sasha Levin
@ 2020-04-10 3:49 ` Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2020-04-10 3:49 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Josef Bacik, David Sterba, Sasha Levin, linux-btrfs
From: Josef Bacik <josef@toxicpanda.com>
[ Upstream commit ea287ab157c2816bf12aad4cece41372f9d146b4 ]
We always search the commit root of the extent tree for looking up back
references, however we track the reloc roots based on their current
bytenr.
This is wrong, if we commit the transaction between relocating tree
blocks we could end up in this code in build_backref_tree
if (key.objectid == key.offset) {
/*
* Only root blocks of reloc trees use backref
* pointing to itself.
*/
root = find_reloc_root(rc, cur->bytenr);
ASSERT(root);
cur->root = root;
break;
}
find_reloc_root() is looking based on the bytenr we had in the commit
root, but if we've COWed this reloc root we will not find that bytenr,
and we will trip over the ASSERT(root).
Fix this by using the commit_root->start bytenr for indexing the commit
root. Then we change the __update_reloc_root() caller to be used when
we switch the commit root for the reloc root during commit.
This fixes the panic I was seeing when we started throttling relocation
for delayed refs.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/relocation.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index e3ba1c12008a9..b3f04e3d3cf7b 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1298,7 +1298,7 @@ static int __must_check __add_reloc_root(struct btrfs_root *root)
if (!node)
return -ENOMEM;
- node->bytenr = root->node->start;
+ node->bytenr = root->commit_root->start;
node->data = root;
spin_lock(&rc->reloc_root_tree.lock);
@@ -1329,10 +1329,11 @@ static void __del_reloc_root(struct btrfs_root *root)
if (rc && root->node) {
spin_lock(&rc->reloc_root_tree.lock);
rb_node = tree_search(&rc->reloc_root_tree.rb_root,
- root->node->start);
+ root->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct mapping_node, rb_node);
rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
+ RB_CLEAR_NODE(&node->rb_node);
}
spin_unlock(&rc->reloc_root_tree.lock);
if (!node)
@@ -1350,7 +1351,7 @@ static void __del_reloc_root(struct btrfs_root *root)
* helper to update the 'address of tree root -> reloc tree'
* mapping
*/
-static int __update_reloc_root(struct btrfs_root *root, u64 new_bytenr)
+static int __update_reloc_root(struct btrfs_root *root)
{
struct btrfs_fs_info *fs_info = root->fs_info;
struct rb_node *rb_node;
@@ -1359,7 +1360,7 @@ static int __update_reloc_root(struct btrfs_root *root, u64 new_bytenr)
spin_lock(&rc->reloc_root_tree.lock);
rb_node = tree_search(&rc->reloc_root_tree.rb_root,
- root->node->start);
+ root->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct mapping_node, rb_node);
rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
@@ -1371,7 +1372,7 @@ static int __update_reloc_root(struct btrfs_root *root, u64 new_bytenr)
BUG_ON((struct btrfs_root *)node->data != root);
spin_lock(&rc->reloc_root_tree.lock);
- node->bytenr = new_bytenr;
+ node->bytenr = root->node->start;
rb_node = tree_insert(&rc->reloc_root_tree.rb_root,
node->bytenr, &node->rb_node);
spin_unlock(&rc->reloc_root_tree.lock);
@@ -1529,6 +1530,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
}
if (reloc_root->commit_root != reloc_root->node) {
+ __update_reloc_root(reloc_root);
btrfs_set_root_node(root_item, reloc_root->node);
free_extent_buffer(reloc_root->commit_root);
reloc_root->commit_root = btrfs_root_node(reloc_root);
@@ -4733,11 +4735,6 @@ int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
BUG_ON(rc->stage == UPDATE_DATA_PTRS &&
root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID);
- if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) {
- if (buf == root->node)
- __update_reloc_root(root, cow->start);
- }
-
level = btrfs_header_level(buf);
if (btrfs_header_generation(buf) <=
btrfs_root_last_snapshot(&root->root_item))
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread