* [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge()
@ 2023-08-03 6:06 Qu Wenruo
2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw)
To: linux-btrfs
[CHANGELOG]
v3:
- Update the comments and commit messages
- Replace the deleted ASSERT(0)s with WARN_ON()s
- Remove a debug dump_stack() in the last patch
v2:
- Add two new patches
One to properly fix the root cause (race between quota tree creation
and relocation).
One to reject obviously corrupted reloc trees.
- Remove two ASSERT()s from merge_reloc_roots()
[BUG]
Syzbot reported an ASSERT() triggered inside prepare_to_merge(), which
turns out to be a regression caused by commit 85724171b302 ("btrfs: fix
the btrfs_get_global_root return value").
[CAUSE]
The race can happen between quota tree creation and relocation, the root
cause is btrfs_get_fs_root() can try to read quota tree as one fs tree,
thus setting ROOT_SHAREABLE flag.
This leads to relocation code to create a new reloc tree for quota tree,
which should not happen.
Furthermore at later relocation stages, we will grab quota root from
fs_info->quota_root, which is not the same as the one read by
btrfs_get_fs_root(), thus quota_root->reloc_root is NULL.
This triggers the ASSERT() and crash the system.
[FIX]
- Make sure non-subvolume trees are always grabbed from fs_info
This changes btrfs_get_root_ref() to a more explicit checks,
and would return PTR_ERR(-ENOENT) if a non-subvolume (data reloc tree
still counts as a subvolume tree) objectid is provided.
This is the root fix.
- Replace the ASSERT() with a more graceful exit
Still does the extra kernel warning through
btrfs_abort_transaction(), but with more useful error messages.
- Reject obviously incorrect reloc trees through tree-checker
Just another layer of sanity checks for on-disk data.
Qu Wenruo (3):
btrfs: avoid race with qgroup tree creation and relocation
btrfs: exit gracefully if reloc roots don't match
btrfs: reject invalid reloc tree root keys with stack dump
fs/btrfs/disk-io.c | 13 +++++++++++-
fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++--------
fs/btrfs/tree-checker.c | 14 +++++++++++++
3 files changed, 62 insertions(+), 9 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation 2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo @ 2023-08-03 6:06 ` Qu Wenruo 2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo 2023-08-03 6:06 ` [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump Qu Wenruo 2 siblings, 0 replies; 5+ messages in thread From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw) To: linux-btrfs; +Cc: Filipe Manana, syzbot+ae97a827ae1c3336bbb4 [BUG] Syzbot reported a weird ASSERT() triggered inside prepare_to_merge(). assertion failed: root->reloc_root == reloc_root, in fs/btrfs/relocation.c:1919 ------------[ cut here ]------------ kernel BUG at fs/btrfs/relocation.c:1919! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9904 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-08881-g533925cb7604 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:prepare_to_merge+0xbb2/0xc40 fs/btrfs/relocation.c:1919 Code: fe e9 f5 (...) RSP: 0018:ffffc9000325f760 EFLAGS: 00010246 RAX: 000000000000004f RBX: ffff888075644030 RCX: 1481ccc522da5800 RDX: ffffc90005c09000 RSI: 00000000000364ca RDI: 00000000000364cb RBP: ffffc9000325f870 R08: ffffffff816f33ac R09: 1ffff9200064bea0 R10: dffffc0000000000 R11: fffff5200064bea1 R12: ffff888075644000 R13: ffff88803b166000 R14: ffff88803b166560 R15: ffff88803b166558 FS: 00007f4e305fd700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056080679c000 CR3: 00000000193ad000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> relocate_block_group+0xa5d/0xcd0 fs/btrfs/relocation.c:3749 btrfs_relocate_block_group+0x7ab/0xd70 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3283 __btrfs_balance+0x1b06/0x2690 fs/btrfs/volumes.c:4018 btrfs_balance+0xbdb/0x1120 fs/btrfs/volumes.c:4402 btrfs_ioctl_balance+0x496/0x7c0 fs/btrfs/ioctl.c:3604 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f4e2f88c389 [CAUSE] With extra debugging, the offending reloc_root is for quota tree (rootid 8). Normally we should not use reloc tree for quota root at all, as reloc trees are only for subvolume trees. But there is a race between quota enabling and relocation, this happens after commit 85724171b302 ("btrfs: fix the btrfs_get_global_root return value"). Before that commit, for quota and free space tree, we exit immediately if we can not grab it from fs_info. But now we would try to read it from disk, just as if they are fs trees, this sets ROOT_SHAREABLE flags in such race: Thread A | Thread B ---------------------------------+------------------------------ btrfs_quota_enable() | | | btrfs_get_root_ref() | | |- btrfs_get_global_root() | | | Returned NULL | | |- btrfs_lookup_fs_root() | | | Returned NULL |- btrfs_create_tree() | | | Now quota root item is | | | inserted | |- btrfs_read_tree_root() | | | Got the newly inserted quota root | | |- btrfs_init_fs_root() | | | Set ROOT_SHAREABLE flag [FIX] Get back to the old behavior by returning PTR_ERR(-ENOENT) if the target objectid is not a subvolume tree or data reloc tree. Reviewed-by: Filipe Manana <fdmanana@suse.com> Fixes: 85724171b302 ("btrfs: fix the btrfs_get_global_root return value") Reported-and-tested-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index da51e5750443..5fd336c597e9 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1300,6 +1300,16 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, root = btrfs_get_global_root(fs_info, objectid); if (root) return root; + + /* + * If we're called for non-subvolume trees, and above function didn't + * found one, do not try to read it from disk. + * + * This is mostly for fst and quota trees, which can change at runtime + * and should only be grabbed from fs_info. + */ + if (!is_fstree(objectid) && objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) + return ERR_PTR(-ENOENT); again: root = btrfs_lookup_fs_root(fs_info, objectid); if (root) { -- 2.41.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match 2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo 2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo @ 2023-08-03 6:06 ` Qu Wenruo 2023-08-03 9:06 ` Filipe Manana 2023-08-03 6:06 ` [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump Qu Wenruo 2 siblings, 1 reply; 5+ messages in thread From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw) To: linux-btrfs; +Cc: syzbot+ae97a827ae1c3336bbb4 [BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). [CAUSE] The root cause of the triggered ASSERT() is we can have a race between quota tree creation and relocation. This leads us to create a duplicated quota tree in the btrfs_read_fs_root() path, and since it's treated as fs tree, it would have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. The bug itself is fixed by a dedicated patch for it, but this already taught us the ASSERT() is not something straightforward for developers. [ENHANCEMENT] Instead of using an ASSERT(), let's handle it gracefully and output extra info about the mismatch reloc roots to help debug. Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside merge_reloc_roots() later. Also replace those ASSERT(0)s with WARN_ON()s. Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 36 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 9db2e6fa2cb2..28139a47c227 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err) err = PTR_ERR(root); break; } - ASSERT(root->reloc_root == reloc_root); + + if (unlikely(root->reloc_root != reloc_root)) { + if (root->reloc_root) + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has reloc root key (%lld %u %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu", + root->root_key.objectid, + root->reloc_root->root_key.objectid, + root->reloc_root->root_key.type, + root->reloc_root->root_key.offset, + btrfs_root_generation( + &root->reloc_root->root_item), + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + else + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld %u %llu) gen %llu", + root->root_key.objectid, + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + list_add(&reloc_root->root_list, &reloc_roots); + btrfs_put_root(root); + btrfs_abort_transaction(trans, -EUCLEAN); + if (!err) + err = -EUCLEAN; + break; + } /* * set reference count to 1, so btrfs_recover_relocation @@ -1989,7 +2020,7 @@ void merge_reloc_roots(struct reloc_control *rc) root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset, false); if (btrfs_root_refs(&reloc_root->root_item) > 0) { - if (IS_ERR(root)) { + if (WARN_ON(IS_ERR(root))) { /* * For recovery we read the fs roots on mount, * and if we didn't find the root then we marked @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc) * memory. However there's no reason we can't * handle the error properly here just in case. */ - ASSERT(0); ret = PTR_ERR(root); goto out; } - if (root->reloc_root != reloc_root) { + if (WARN_ON(root->reloc_root != reloc_root)) { /* - * This is actually impossible without something - * going really wrong (like weird race condition - * or cosmic rays). + * This can happen if on-disk metadata has some + * corruption, e.g. bad reloc tree key offset. */ - ASSERT(0); ret = -EINVAL; goto out; } -- 2.41.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match 2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo @ 2023-08-03 9:06 ` Filipe Manana 0 siblings, 0 replies; 5+ messages in thread From: Filipe Manana @ 2023-08-03 9:06 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs, syzbot+ae97a827ae1c3336bbb4 On Thu, Aug 3, 2023 at 7:40 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > Syzbot reported a crash that an ASSERT() got triggered inside > prepare_to_merge(). > > [CAUSE] > The root cause of the triggered ASSERT() is we can have a race between > quota tree creation and relocation. > > This leads us to create a duplicated quota tree in the > btrfs_read_fs_root() path, and since it's treated as fs tree, it would > have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. > > The bug itself is fixed by a dedicated patch for it, but this already > taught us the ASSERT() is not something straightforward for > developers. > > [ENHANCEMENT] > Instead of using an ASSERT(), let's handle it gracefully and output > extra info about the mismatch reloc roots to help debug. > > Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside > merge_reloc_roots() later. > Also replace those ASSERT(0)s with WARN_ON()s. > > Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++++-------- > 1 file changed, 36 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 9db2e6fa2cb2..28139a47c227 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err) > err = PTR_ERR(root); > break; > } > - ASSERT(root->reloc_root == reloc_root); > + > + if (unlikely(root->reloc_root != reloc_root)) { > + if (root->reloc_root) > + btrfs_err(fs_info, > +"reloc tree mismatch, root %lld has reloc root key (%lld %u %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu", My comment about printing the key elements with commas still remains, there's still one key printed as "(%lld, %u, %llu)" while the other cases don't use the commas. Otherwise it looks good, thanks. > + root->root_key.objectid, > + root->reloc_root->root_key.objectid, > + root->reloc_root->root_key.type, > + root->reloc_root->root_key.offset, > + btrfs_root_generation( > + &root->reloc_root->root_item), > + reloc_root->root_key.objectid, > + reloc_root->root_key.type, > + reloc_root->root_key.offset, > + btrfs_root_generation( > + &reloc_root->root_item)); > + else > + btrfs_err(fs_info, > +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld %u %llu) gen %llu", > + root->root_key.objectid, > + reloc_root->root_key.objectid, > + reloc_root->root_key.type, > + reloc_root->root_key.offset, > + btrfs_root_generation( > + &reloc_root->root_item)); > + list_add(&reloc_root->root_list, &reloc_roots); > + btrfs_put_root(root); > + btrfs_abort_transaction(trans, -EUCLEAN); > + if (!err) > + err = -EUCLEAN; > + break; > + } > > /* > * set reference count to 1, so btrfs_recover_relocation > @@ -1989,7 +2020,7 @@ void merge_reloc_roots(struct reloc_control *rc) > root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset, > false); > if (btrfs_root_refs(&reloc_root->root_item) > 0) { > - if (IS_ERR(root)) { > + if (WARN_ON(IS_ERR(root))) { > /* > * For recovery we read the fs roots on mount, > * and if we didn't find the root then we marked > @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc) > * memory. However there's no reason we can't > * handle the error properly here just in case. > */ > - ASSERT(0); > ret = PTR_ERR(root); > goto out; > } > - if (root->reloc_root != reloc_root) { > + if (WARN_ON(root->reloc_root != reloc_root)) { > /* > - * This is actually impossible without something > - * going really wrong (like weird race condition > - * or cosmic rays). > + * This can happen if on-disk metadata has some > + * corruption, e.g. bad reloc tree key offset. > */ > - ASSERT(0); > ret = -EINVAL; > goto out; > } > -- > 2.41.0 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump 2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo 2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo 2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo @ 2023-08-03 6:06 ` Qu Wenruo 2 siblings, 0 replies; 5+ messages in thread From: Qu Wenruo @ 2023-08-03 6:06 UTC (permalink / raw) To: linux-btrfs; +Cc: Filipe Manana, syzbot+ae97a827ae1c3336bbb4 [BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). That ASSERT() makes sure the reloc tree is properly pointed back by its subvolume tree. [CAUSE] After more debugging output, it turns out we had an invalid reloc tree: BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17 Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM, QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree. But reloc trees can only exist for subvolumes, as for non-subvolume trees, we just COW the involved tree block, no need to create a reloc tree since those tree blocks won't be shared with other trees. Only subvolumes tree can share tree blocks with other trees (thus they have BTRFS_ROOT_SHAREABLE flag). Thus this new debug output proves my previous assumption that corrupted on-disk data can trigger that ASSERT(). [FIX] Besides the dedicated fix and the graceful exit, also let tree-checker to check such root keys, to make sure reloc trees can only exist for subvolumes. Reviewed-by: Filipe Manana <fdmanana@suse.com> Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 3 ++- fs/btrfs/tree-checker.c | 14 ++++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 5fd336c597e9..a01eac963075 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1103,7 +1103,8 @@ static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev) btrfs_drew_lock_init(&root->snapshot_lock); if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID && - !btrfs_is_data_reloc_root(root)) { + !btrfs_is_data_reloc_root(root) && + is_fstree(root->root_key.objectid)) { set_bit(BTRFS_ROOT_SHAREABLE, &root->state); btrfs_check_and_init_root_item(&root->root_item); } diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 038dfa8f1788..739c7c1d8b06 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -446,6 +446,20 @@ static int check_root_key(struct extent_buffer *leaf, struct btrfs_key *key, btrfs_item_key_to_cpu(leaf, &item_key, slot); is_root_item = (item_key.type == BTRFS_ROOT_ITEM_KEY); + /* + * Bad rootid for reloc trees. + * + * Reloc trees are only for subvolume trees, other trees only need + * to be COWed to be relocated. + */ + if (unlikely(is_root_item && key->objectid == BTRFS_TREE_RELOC_OBJECTID && + !is_fstree(key->offset))) { + generic_err(leaf, slot, + "invalid reloc tree for root %lld, root id is not a subvolume tree", + key->offset); + return -EUCLEAN; + } + /* No such tree id */ if (unlikely(key->objectid == 0)) { if (is_root_item) -- 2.41.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-08-03 9:06 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-03 6:06 [PATCH v3 0/3] btrfs: fix an ASSERT() triggered inside prepare_to_merge() Qu Wenruo 2023-08-03 6:06 ` [PATCH v3 1/3] btrfs: avoid race with qgroup tree creation and relocation Qu Wenruo 2023-08-03 6:06 ` [PATCH v3 2/3] btrfs: exit gracefully if reloc roots don't match Qu Wenruo 2023-08-03 9:06 ` Filipe Manana 2023-08-03 6:06 ` [PATCH v3 3/3] btrfs: reject invalid reloc tree root keys with stack dump Qu Wenruo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox