* [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge()
@ 2023-08-03 6:06 Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw)
To: linux-btrfs
[CHANGELOG]
v3:
- Update the comments and commit messages
- Replace the deleted ASSERT(0)s with WARN_ON()s
- Remove a debug dump_stack() in the last patch
v2:
- Add two new patches
One to properly fix the root cause (race between quota tree creation
and relocation).
One to reject obviously corrupted reloc trees.
- Remove two ASSERT()s from merge_reloc_roots()
[BUG]
Syzbot reported an ASSERT() triggered inside prepare_to_merge(), which
turns out to be a regression caused by commit 85724171b302 ("btrfs: fix
the btrfs_get_global_root return value").
[CAUSE]
The race can happen between quota tree creation and relocation, the root
cause is btrfs_get_fs_root() can try to read quota tree as one fs tree,
thus setting ROOT_SHAREABLE flag.
This leads to relocation code to create a new reloc tree for quota tree,
which should not happen.
Furthermore at later relocation stages, we will grab quota root from
fs_info->quota_root, which is not the same as the one read by
btrfs_get_fs_root(), thus quota_root->reloc_root is NULL.
This triggers the ASSERT() and crash the system.
[FIX]
- Make sure non-subvolume trees are always grabbed from fs_info
This changes btrfs_get_root_ref() to a more explicit checks,
and would return PTR_ERR(-ENOENT) if a non-subvolume (data reloc tree
still counts as a subvolume tree) objectid is provided.
This is the root fix.
- Replace the ASSERT() with a more graceful exit
Still does the extra kernel warning through
btrfs_abort_transaction(), but with more useful error messages.
- Reject obviously incorrect reloc trees through tree-checker
Just another layer of sanity checks for on-disk data.
Qu Wenruo (3):
btrfs: avoid race with qgroup tree creation and relocation
btrfs: exit gracefully if reloc roots don't match
btrfs: reject invalid reloc tree root keys with stack dump
fs/btrfs/disk-io.c | 13 +++++++++++-
fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++--------
fs/btrfs/tree-checker.c | 14 +++++++++++++
3 files changed, 62 insertions(+), 9 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation
2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo
@ 2023-08-03 6:06 ` Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump Qu Wenruo
2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe Manana, syzbot+ae97a827ae1c3336bbb4
[BUG]
Syzbot reported a weird ASSERT() triggered inside prepare_to_merge().
assertion failed: root->reloc_root == reloc_root, in fs/btrfs/relocation.c:1919
------------[ cut here ]------------
kernel BUG at fs/btrfs/relocation.c:1919!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9904 Comm: syz-executor.3 Not tainted
6.4.0-syzkaller-08881-g533925cb7604 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 05/27/2023
RIP: 0010:prepare_to_merge+0xbb2/0xc40 fs/btrfs/relocation.c:1919
Code: fe e9 f5 (...)
RSP: 0018:ffffc9000325f760 EFLAGS: 00010246
RAX: 000000000000004f RBX: ffff888075644030 RCX: 1481ccc522da5800
RDX: ffffc90005c09000 RSI: 00000000000364ca RDI: 00000000000364cb
RBP: ffffc9000325f870 R08: ffffffff816f33ac R09: 1ffff9200064bea0
R10: dffffc0000000000 R11: fffff5200064bea1 R12: ffff888075644000
R13: ffff88803b166000 R14: ffff88803b166560 R15: ffff88803b166558
FS: 00007f4e305fd700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056080679c000 CR3: 00000000193ad000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
relocate_block_group+0xa5d/0xcd0 fs/btrfs/relocation.c:3749
btrfs_relocate_block_group+0x7ab/0xd70 fs/btrfs/relocation.c:4087
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3283
__btrfs_balance+0x1b06/0x2690 fs/btrfs/volumes.c:4018
btrfs_balance+0xbdb/0x1120 fs/btrfs/volumes.c:4402
btrfs_ioctl_balance+0x496/0x7c0 fs/btrfs/ioctl.c:3604
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f4e2f88c389
[CAUSE]
With extra debugging, the offending reloc_root is for quota tree
(rootid 8).
Normally we should not use reloc tree for quota root at all, as reloc
trees are only for subvolume trees.
But there is a race between quota enabling and relocation, this happens
after commit 85724171b302 ("btrfs: fix the btrfs_get_global_root return value").
Before that commit, for quota and free space tree, we exit immediately
if we can not grab it from fs_info.
But now we would try to read it from disk, just as if they are fs trees,
this sets ROOT_SHAREABLE flags in such race:
Thread A | Thread B
---------------------------------+------------------------------
btrfs_quota_enable() |
| | btrfs_get_root_ref()
| | |- btrfs_get_global_root()
| | | Returned NULL
| | |- btrfs_lookup_fs_root()
| | | Returned NULL
|- btrfs_create_tree() | |
| Now quota root item is | |
| inserted | |- btrfs_read_tree_root()
| | | Got the newly inserted quota root
| | |- btrfs_init_fs_root()
| | | Set ROOT_SHAREABLE flag
[FIX]
Get back to the old behavior by returning PTR_ERR(-ENOENT) if the target
objectid is not a subvolume tree or data reloc tree.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Fixes: 85724171b302 ("btrfs: fix the btrfs_get_global_root return value")
Reported-and-tested-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index da51e5750443..5fd336c597e9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1300,6 +1300,16 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
root = btrfs_get_global_root(fs_info, objectid);
if (root)
return root;
+
+ /*
+ * If we're called for non-subvolume trees, and above function didn't
+ * found one, do not try to read it from disk.
+ *
+ * This is mostly for fst and quota trees, which can change at runtime
+ * and should only be grabbed from fs_info.
+ */
+ if (!is_fstree(objectid) && objectid != BTRFS_DATA_RELOC_TREE_OBJECTID)
+ return ERR_PTR(-ENOENT);
again:
root = btrfs_lookup_fs_root(fs_info, objectid);
if (root) {
--
2.41.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match
2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo
@ 2023-08-03 6:06 ` Qu Wenruo
2023-08-03 9:06 ` Filipe Manana
2023-08-03 6:06 ` [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump Qu Wenruo
2 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw)
To: linux-btrfs; +Cc: syzbot+ae97a827ae1c3336bbb4
[BUG]
Syzbot reported a crash that an ASSERT() got triggered inside
prepare_to_merge().
[CAUSE]
The root cause of the triggered ASSERT() is we can have a race between
quota tree creation and relocation.
This leads us to create a duplicated quota tree in the
btrfs_read_fs_root() path, and since it's treated as fs tree, it would
have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.
The bug itself is fixed by a dedicated patch for it, but this already
taught us the ASSERT() is not something straightforward for
developers.
[ENHANCEMENT]
Instead of using an ASSERT(), let's handle it gracefully and output
extra info about the mismatch reloc roots to help debug.
Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside
merge_reloc_roots() later.
Also replace those ASSERT(0)s with WARN_ON()s.
Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 9db2e6fa2cb2..28139a47c227 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err)
err = PTR_ERR(root);
break;
}
- ASSERT(root->reloc_root == reloc_root);
+
+ if (unlikely(root->reloc_root != reloc_root)) {
+ if (root->reloc_root)
+ btrfs_err(fs_info,
+"reloc tree mismatch, root %lld has reloc root key (%lld %u %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu",
+ root->root_key.objectid,
+ root->reloc_root->root_key.objectid,
+ root->reloc_root->root_key.type,
+ root->reloc_root->root_key.offset,
+ btrfs_root_generation(
+ &root->reloc_root->root_item),
+ reloc_root->root_key.objectid,
+ reloc_root->root_key.type,
+ reloc_root->root_key.offset,
+ btrfs_root_generation(
+ &reloc_root->root_item));
+ else
+ btrfs_err(fs_info,
+"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld %u %llu) gen %llu",
+ root->root_key.objectid,
+ reloc_root->root_key.objectid,
+ reloc_root->root_key.type,
+ reloc_root->root_key.offset,
+ btrfs_root_generation(
+ &reloc_root->root_item));
+ list_add(&reloc_root->root_list, &reloc_roots);
+ btrfs_put_root(root);
+ btrfs_abort_transaction(trans, -EUCLEAN);
+ if (!err)
+ err = -EUCLEAN;
+ break;
+ }
/*
* set reference count to 1, so btrfs_recover_relocation
@@ -1989,7 +2020,7 @@ void merge_reloc_roots(struct reloc_control *rc)
root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset,
false);
if (btrfs_root_refs(&reloc_root->root_item) > 0) {
- if (IS_ERR(root)) {
+ if (WARN_ON(IS_ERR(root))) {
/*
* For recovery we read the fs roots on mount,
* and if we didn't find the root then we marked
@@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc)
* memory. However there's no reason we can't
* handle the error properly here just in case.
*/
- ASSERT(0);
ret = PTR_ERR(root);
goto out;
}
- if (root->reloc_root != reloc_root) {
+ if (WARN_ON(root->reloc_root != reloc_root)) {
/*
- * This is actually impossible without something
- * going really wrong (like weird race condition
- * or cosmic rays).
+ * This can happen if on-disk metadata has some
+ * corruption, e.g. bad reloc tree key offset.
*/
- ASSERT(0);
ret = -EINVAL;
goto out;
}
--
2.41.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump
2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo
@ 2023-08-03 6:06 ` Qu Wenruo
2 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw)
To: linux-btrfs; +Cc: Filipe Manana, syzbot+ae97a827ae1c3336bbb4
[BUG]
Syzbot reported a crash that an ASSERT() got triggered inside
prepare_to_merge().
That ASSERT() makes sure the reloc tree is properly pointed back by its
subvolume tree.
[CAUSE]
After more debugging output, it turns out we had an invalid reloc tree:
BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17
Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM,
QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree.
But reloc trees can only exist for subvolumes, as for non-subvolume
trees, we just COW the involved tree block, no need to create a reloc
tree since those tree blocks won't be shared with other trees.
Only subvolumes tree can share tree blocks with other trees (thus they
have BTRFS_ROOT_SHAREABLE flag).
Thus this new debug output proves my previous assumption that corrupted
on-disk data can trigger that ASSERT().
[FIX]
Besides the dedicated fix and the graceful exit, also let tree-checker to
check such root keys, to make sure reloc trees can only exist for
subvolumes.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/tree-checker.c | 14 ++++++++++++++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5fd336c597e9..a01eac963075 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1103,7 +1103,8 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev)
btrfs_drew_lock_init(&root->snapshot_lock);
if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID &&
- !btrfs_is_data_reloc_root(root)) {
+ !btrfs_is_data_reloc_root(root) &&
+ is_fstree(root->root_key.objectid)) {
set_bit(BTRFS_ROOT_SHAREABLE, &root->state);
btrfs_check_and_init_root_item(&root->root_item);
}
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 038dfa8f1788..739c7c1d8b06 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -446,6 +446,20 @@ static int check_root_key(struct extent_buffer *leaf, struct btrfs_key *key,
btrfs_item_key_to_cpu(leaf, &item_key, slot);
is_root_item = (item_key.type == BTRFS_ROOT_ITEM_KEY);
+ /*
+ * Bad rootid for reloc trees.
+ *
+ * Reloc trees are only for subvolume trees, other trees only need
+ * to be COWed to be relocated.
+ */
+ if (unlikely(is_root_item && key->objectid == BTRFS_TREE_RELOC_OBJECTID &&
+ !is_fstree(key->offset))) {
+ generic_err(leaf, slot,
+ "invalid reloc tree for root %lld, root id is not a subvolume tree",
+ key->offset);
+ return -EUCLEAN;
+ }
+
/* No such tree id */
if (unlikely(key->objectid == 0)) {
if (is_root_item)
--
2.41.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match
2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo
@ 2023-08-03 9:06 ` Filipe Manana
0 siblings, 0 replies; 5+ messages in thread
From: Filipe Manana @ 2023-08-03 9:06 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, syzbot+ae97a827ae1c3336bbb4
On Thu, Aug 3, 2023 at 7:40 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> Syzbot reported a crash that an ASSERT() got triggered inside
> prepare_to_merge().
>
> [CAUSE]
> The root cause of the triggered ASSERT() is we can have a race between
> quota tree creation and relocation.
>
> This leads us to create a duplicated quota tree in the
> btrfs_read_fs_root() path, and since it's treated as fs tree, it would
> have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.
>
> The bug itself is fixed by a dedicated patch for it, but this already
> taught us the ASSERT() is not something straightforward for
> developers.
>
> [ENHANCEMENT]
> Instead of using an ASSERT(), let's handle it gracefully and output
> extra info about the mismatch reloc roots to help debug.
>
> Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside
> merge_reloc_roots() later.
> Also replace those ASSERT(0)s with WARN_ON()s.
>
> Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++++--------
> 1 file changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 9db2e6fa2cb2..28139a47c227 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err)
> err = PTR_ERR(root);
> break;
> }
> - ASSERT(root->reloc_root == reloc_root);
> +
> + if (unlikely(root->reloc_root != reloc_root)) {
> + if (root->reloc_root)
> + btrfs_err(fs_info,
> +"reloc tree mismatch, root %lld has reloc root key (%lld %u %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu",
My comment about printing the key elements with commas still remains,
there's still one key printed as "(%lld, %u, %llu)" while
the other cases don't use the commas.
Otherwise it looks good, thanks.
> + root->root_key.objectid,
> + root->reloc_root->root_key.objectid,
> + root->reloc_root->root_key.type,
> + root->reloc_root->root_key.offset,
> + btrfs_root_generation(
> + &root->reloc_root->root_item),
> + reloc_root->root_key.objectid,
> + reloc_root->root_key.type,
> + reloc_root->root_key.offset,
> + btrfs_root_generation(
> + &reloc_root->root_item));
> + else
> + btrfs_err(fs_info,
> +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld %u %llu) gen %llu",
> + root->root_key.objectid,
> + reloc_root->root_key.objectid,
> + reloc_root->root_key.type,
> + reloc_root->root_key.offset,
> + btrfs_root_generation(
> + &reloc_root->root_item));
> + list_add(&reloc_root->root_list, &reloc_roots);
> + btrfs_put_root(root);
> + btrfs_abort_transaction(trans, -EUCLEAN);
> + if (!err)
> + err = -EUCLEAN;
> + break;
> + }
>
> /*
> * set reference count to 1, so btrfs_recover_relocation
> @@ -1989,7 +2020,7 @@ void merge_reloc_roots(struct reloc_control *rc)
> root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset,
> false);
> if (btrfs_root_refs(&reloc_root->root_item) > 0) {
> - if (IS_ERR(root)) {
> + if (WARN_ON(IS_ERR(root))) {
> /*
> * For recovery we read the fs roots on mount,
> * and if we didn't find the root then we marked
> @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc)
> * memory. However there's no reason we can't
> * handle the error properly here just in case.
> */
> - ASSERT(0);
> ret = PTR_ERR(root);
> goto out;
> }
> - if (root->reloc_root != reloc_root) {
> + if (WARN_ON(root->reloc_root != reloc_root)) {
> /*
> - * This is actually impossible without something
> - * going really wrong (like weird race condition
> - * or cosmic rays).
> + * This can happen if on-disk metadata has some
> + * corruption, e.g. bad reloc tree key offset.
> */
> - ASSERT(0);
> ret = -EINVAL;
> goto out;
> }
> --
> 2.41.0
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-08-03 9:06 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo
2023-08-03 9:06 ` Filipe Manana
2023-08-03 6:06 ` [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump Qu Wenruo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox