All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
	Qu Wenruo <wqu@suse.com>
Subject: Re: [PATCH v5 13/52] btrfs: handle errors from select_reloc_root()
Date: Tue, 8 Dec 2020 09:44:46 -0500	[thread overview]
Message-ID: <20201208144446.GH31381@hungrycats.org> (raw)
In-Reply-To: <5daa486a9ce06876bdef92a50f1a11d4eee9da67.1607349282.git.josef@toxicpanda.com>

On Mon, Dec 07, 2020 at 08:57:05AM -0500, Josef Bacik wrote:
> Currently select_reloc_root() doesn't return an error, but followup
> patches will make it possible for it to return an error.  We do have
> proper error recovery in do_relocation however, so handle the
> possibility of select_reloc_root() having an error properly instead of
> BUG_ON(!root).  I've also adjusted select_reloc_root() to return
> ERR_PTR(-ENOENT) if we don't find a root, instead of NULL, to make the
> error case easier to deal with.  I've replaced the BUG_ON(!root) with an
> ASSERT(ret != -ENOENT), as this indicates we messed up the backref
> walking code, but could indicate corruption so we do not want to have a
> BUG_ON() here.
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/relocation.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 4333ee329290..66515ccc04fe 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2027,7 +2027,7 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
>  			break;
>  	}
>  	if (!root)
> -		return NULL;
> +		return ERR_PTR(-ENOENT);
>  
>  	next = node;
>  	/* setup backref node path for btrfs_reloc_cow_block */
> @@ -2198,7 +2198,18 @@ static int do_relocation(struct btrfs_trans_handle *trans,
>  
>  		upper = edge->node[UPPER];
>  		root = select_reloc_root(trans, rc, upper, edges);
> -		BUG_ON(!root);
> +		if (IS_ERR(root)) {
> +			ret = PTR_ERR(root);
> +
> +			/*
> +			 * This can happen if there's fs corruption, but if we
> +			 * have ASSERT()'s on then we're developers and we
> +			 * likely made a logic mistake in the backref code, so
> +			 * check for this error condition.
> +			 */
> +			ASSERT(ret != -ENOENT);

Hit this once last night.  Test VM kept going after a reboot.

	[17579.428311][T10916] BTRFS info (device dm-0): balance: start -mlimit=1 -slimit=1
	[17579.773705][T10916] BTRFS info (device dm-0): relocating block group 3262427168768 flags metadata|raid1
	[17581.153861][T10916] assertion failed: ret != -ENOENT, in fs/btrfs/relocation.c:2369
	[17581.157815][T10916] ------------[ cut here ]------------
	[17581.161214][T10916] kernel BUG at fs/btrfs/ctree.h:3368!
	[17581.162817][T10916] invalid opcode: 0000 [#1] SMP KASAN PTI
	[17581.167786][T10916] CPU: 3 PID: 10916 Comm: btrfs Tainted: G        W         5.10.0-69dfffde3fb6-for-next+ #22
	[17581.174638][T10916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
	[17581.177226][T10916] RIP: 0010:assertfail+0x18/0x1a
	[17581.178622][T10916] Code: a0 46 aa fe 48 8b 3d a9 bd fd 02 e8 94 46 aa fe 5d c3 55 89 d1 48 89 f2 48 89 fe 48 c7 c7 00 20 24 96 48 89 e5 e8 74 36 fe ff <0f> 0b 48 89 df e8 00 2b b4 fe 48 8b 85 98 fe ff ff 44 89 ea 48 c7
	[17581.187531][T10916] RSP: 0018:ffffc90001757400 EFLAGS: 00010286
	[17581.189711][T10916] RAX: 000000000000003f RBX: ffff888012dabf00 RCX: ffffffff94265ba2
	[17581.192629][T10916] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8881f5fff28c
	[17581.195411][T10916] RBP: ffffc90001757400 R08: ffffed103ec015b5 R09: ffffed103ec015b5
	[17581.198061][T10916] R10: ffff8881f600ada7 R11: ffffed103ec015b4 R12: ffff888012dab668
	[17581.201387][T10916] R13: 0000000000000004 R14: 00000000fffffffe R15: ffff888005f18f00
	[17581.203965][T10916] FS:  00007f55e295b8c0(0000) GS:ffff8881f5e00000(0000) knlGS:0000000000000000
	[17581.207716][T10916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[17581.209584][T10916] CR2: 0000560ee9649da4 CR3: 00000000720d4005 CR4: 0000000000170ee0
	[17581.214157][T10916] Call Trace:
	[17581.215835][T10916]  do_relocation.cold.55+0x9c/0xb9
	[17581.218719][T10916]  ? select_reloc_root+0x6e0/0x6e0
	[17581.221899][T10916]  ? walk_up_backref+0x91/0xd0
	[17581.224508][T10916]  ? __asan_loadN+0xf/0x20
	[17581.234986][T10916]  ? memcpy+0x4d/0x60
	[17581.242655][T10916]  ? read_extent_buffer+0xd1/0x100
	[17581.250560][T10916]  relocate_tree_blocks+0xa70/0xb90
	[17581.253642][T10916]  ? do_relocation+0xc10/0xc10
	[17581.256300][T10916]  ? kmem_cache_alloc_trace+0x6a3/0xcb0
	[17581.259164][T10916]  ? free_extent_buffer.part.52+0xd7/0x140
	[17581.265002][T10916]  ? rb_insert_color+0x342/0x360
	[17581.267469][T10916]  relocate_block_group+0x2eb/0x780
	[17581.271941][T10916]  ? merge_reloc_roots+0x4a0/0x4a0
	[17581.273800][T10916]  btrfs_relocate_block_group+0x26e/0x4c0
	[17581.276285][T10916]  btrfs_relocate_chunk+0x52/0x120
	[17581.280022][T10916]  btrfs_balance+0xe2e/0x1900
	[17581.284864][T10916]  ? __kasan_check_read+0x11/0x20
	[17581.288259][T10916]  ? lock_acquire+0xd0/0x550
	[17581.291309][T10916]  ? btrfs_relocate_chunk+0x120/0x120
	[17581.295542][T10916]  ? kasan_unpoison_shadow+0x40/0x50
	[17581.298465][T10916]  ? kmem_cache_alloc_trace+0x6a3/0xcb0
	[17581.301092][T10916]  ? _copy_from_user+0x83/0xc0
	[17581.303427][T10916]  btrfs_ioctl_balance+0x3a7/0x460
	[17581.305850][T10916]  btrfs_ioctl+0x24c8/0x4360
	[17581.307699][T10916]  ? __kasan_check_read+0x11/0x20
	[17581.310098][T10916]  ? lock_release+0xc8/0x640
	[17581.312712][T10916]  ? lru_cache_add+0x178/0x250
	[17581.314991][T10916]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[17581.318654][T10916]  ? lock_downgrade+0x3f0/0x3f0
	[17581.320972][T10916]  ? handle_mm_fault+0x159e/0x2150
	[17581.323839][T10916]  ? __kasan_check_read+0x11/0x20
	[17581.326814][T10916]  ? lock_release+0xc8/0x640
	[17581.329251][T10916]  ? do_user_addr_fault+0x299/0x5a0
	[17581.332161][T10916]  ? do_raw_spin_unlock+0xa8/0x140
	[17581.334825][T10916]  ? lock_downgrade+0x3f0/0x3f0
	[17581.337282][T10916]  ? _raw_spin_unlock+0x22/0x30
	[17581.339972][T10916]  ? handle_mm_fault+0xad6/0x2150
	[17581.342585][T10916]  ? do_vfs_ioctl+0xfc/0x9d0
	[17581.345189][T10916]  ? ioctl_file_clone+0xe0/0xe0
	[17581.348274][T10916]  ? __kasan_check_write+0x14/0x20
	[17581.350976][T10916]  ? up_read+0x176/0x4f0
	[17581.353281][T10916]  ? down_write_nested+0x2d0/0x2d0
	[17581.356263][T10916]  ? vmacache_find+0xc9/0x120
	[17581.358822][T10916]  ? __kasan_check_read+0x11/0x20
	[17581.361699][T10916]  ? __fget_light+0xae/0x110
	[17581.364329][T10916]  __x64_sys_ioctl+0xc3/0x100
	[17581.367294][T10916]  do_syscall_64+0x37/0x80
	[17581.370340][T10916]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
	[17581.373984][T10916] RIP: 0033:0x7f55e2a4e427
	[17581.376697][T10916] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
	[17581.388513][T10916] RSP: 002b:00007ffdb30a1438 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
	[17581.393564][T10916] RAX: ffffffffffffffda RBX: 00007ffdb30a14d8 RCX: 00007f55e2a4e427
	[17581.398315][T10916] RDX: 00007ffdb30a14d8 RSI: 00000000c4009420 RDI: 0000000000000003
	[17581.402711][T10916] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
	[17581.407348][T10916] R10: fffffffffffff59d R11: 0000000000000202 R12: 0000000000000001
	[17581.412074][T10916] R13: 0000000000000000 R14: 00007ffdb30a3a34 R15: 0000000000000001
	[17581.417008][T10916] Modules linked in:
	[17581.420470][T10916] ---[ end trace 2aee413c08ff01b0 ]---
	[17581.424847][T10916] RIP: 0010:assertfail+0x18/0x1a
	[17581.428219][T10916] Code: a0 46 aa fe 48 8b 3d a9 bd fd 02 e8 94 46 aa fe 5d c3 55 89 d1 48 89 f2 48 89 fe 48 c7 c7 00 20 24 96 48 89 e5 e8 74 36 fe ff <0f> 0b 48 89 df e8 00 2b b4 fe 48 8b 85 98 fe ff ff 44 89 ea 48 c7
	[17581.441219][T10916] RSP: 0018:ffffc90001757400 EFLAGS: 00010286
	[17581.445067][T10916] RAX: 000000000000003f RBX: ffff888012dabf00 RCX: ffffffff94265ba2
	[17581.448863][T10916] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff8881f5fff28c
	[17581.454668][T10916] RBP: ffffc90001757400 R08: ffffed103ec015b5 R09: ffffed103ec015b5
	[17581.459261][T10916] R10: ffff8881f600ada7 R11: ffffed103ec015b4 R12: ffff888012dab668
	[17581.464851][T10916] R13: 0000000000000004 R14: 00000000fffffffe R15: ffff888005f18f00
	[17581.468591][T10916] FS:  00007f55e295b8c0(0000) GS:ffff8881f5e00000(0000) knlGS:0000000000000000
	[17581.473633][T10916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17581.477009][T10916] CR2: 0000560ee9649da4 CR3: 00000000720d4005 CR4: 0000000000170ee0

> +			goto next;
> +		}
>  
>  		if (upper->eb && !upper->locked) {
>  			if (!lowest) {
> -- 
> 2.26.2
> 

  reply	other threads:[~2020-12-08 14:45 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-07 13:56 [PATCH v5 00/52] Cleanup error handling in relocation Josef Bacik
2020-12-07 13:56 ` [PATCH v5 01/52] btrfs: allow error injection for btrfs_search_slot and btrfs_cow_block Josef Bacik
2020-12-07 13:56 ` [PATCH v5 02/52] btrfs: modify the new_root highest_objectid under a ref count Josef Bacik
2020-12-08 13:51   ` Nikolay Borisov
2020-12-07 13:56 ` [PATCH v5 03/52] btrfs: fix lockdep splat in btrfs_recover_relocation Josef Bacik
2020-12-07 13:56 ` [PATCH v5 04/52] btrfs: keep track of the root owner for relocation reads Josef Bacik
2020-12-07 13:56 ` [PATCH v5 05/52] btrfs: noinline btrfs_should_cancel_balance Josef Bacik
2020-12-07 13:56 ` [PATCH v5 06/52] btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node Josef Bacik
2020-12-07 13:56 ` [PATCH v5 07/52] btrfs: pass down the tree block level through ref-verify Josef Bacik
2020-12-07 13:57 ` [PATCH v5 08/52] btrfs: make sure owner is set in ref-verify Josef Bacik
2020-12-07 13:57 ` [PATCH v5 09/52] btrfs: don't clear ret in btrfs_start_dirty_block_groups Josef Bacik
2020-12-07 13:57 ` [PATCH v5 10/52] btrfs: convert some BUG_ON()'s to ASSERT()'s in do_relocation Josef Bacik
2020-12-07 13:57 ` [PATCH v5 11/52] btrfs: convert BUG_ON()'s in relocate_tree_block Josef Bacik
2020-12-07 13:57 ` [PATCH v5 12/52] btrfs: return an error from btrfs_record_root_in_trans Josef Bacik
2020-12-07 13:57 ` [PATCH v5 13/52] btrfs: handle errors from select_reloc_root() Josef Bacik
2020-12-08 14:44   ` Zygo Blaxell [this message]
2020-12-07 13:57 ` [PATCH v5 14/52] btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors Josef Bacik
2020-12-07 13:57 ` [PATCH v5 15/52] btrfs: check record_root_in_trans related failures in select_reloc_root Josef Bacik
2020-12-07 13:57 ` [PATCH v5 16/52] btrfs: do proper error handling in record_reloc_root_in_trans Josef Bacik
2020-12-07 13:57 ` [PATCH v5 17/52] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename_exchange Josef Bacik
2020-12-07 13:57 ` [PATCH v5 18/52] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename Josef Bacik
2020-12-07 13:57 ` [PATCH v5 19/52] btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume Josef Bacik
2020-12-07 13:57 ` [PATCH v5 20/52] btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees Josef Bacik
2020-12-07 13:57 ` [PATCH v5 21/52] btrfs: handle btrfs_record_root_in_trans failure in create_subvol Josef Bacik
2020-12-07 13:57 ` [PATCH v5 22/52] btrfs: btrfs: handle btrfs_record_root_in_trans failure in relocate_tree_block Josef Bacik
2020-12-07 13:57 ` [PATCH v5 23/52] btrfs: handle btrfs_record_root_in_trans failure in start_transaction Josef Bacik
2020-12-07 13:57 ` [PATCH v5 24/52] btrfs: handle record_root_in_trans failure in qgroup_account_snapshot Josef Bacik
2020-12-07 13:57 ` [PATCH v5 25/52] btrfs: handle record_root_in_trans failure in btrfs_record_root_in_trans Josef Bacik
2020-12-07 13:57 ` [PATCH v5 26/52] btrfs: handle record_root_in_trans failure in create_pending_snapshot Josef Bacik
2020-12-07 13:57 ` [PATCH v5 27/52] btrfs: do not panic in __add_reloc_root Josef Bacik
2020-12-08 13:53   ` Nikolay Borisov
2020-12-07 13:57 ` [PATCH v5 28/52] btrfs: have proper error handling in btrfs_init_reloc_root Josef Bacik
2020-12-07 13:57 ` [PATCH v5 29/52] btrfs: do proper error handling in create_reloc_root Josef Bacik
2020-12-07 13:57 ` [PATCH v5 30/52] btrfs: validate ->reloc_root after recording root in trans Josef Bacik
2020-12-07 13:57 ` [PATCH v5 31/52] btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots Josef Bacik
2020-12-07 13:57 ` [PATCH v5 32/52] btrfs: change insert_dirty_subvol to return errors Josef Bacik
2020-12-07 13:57 ` [PATCH v5 33/52] btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol Josef Bacik
2020-12-07 13:57 ` [PATCH v5 34/52] btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge Josef Bacik
2020-12-07 13:57 ` [PATCH v5 35/52] btrfs: do proper error handling in btrfs_update_reloc_root Josef Bacik
2020-12-07 13:57 ` [PATCH v5 36/52] btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s Josef Bacik
2020-12-07 13:57 ` [PATCH v5 37/52] btrfs: handle btrfs_cow_block errors in replace_path Josef Bacik
2020-12-07 13:57 ` [PATCH v5 38/52] btrfs: handle btrfs_search_slot failure " Josef Bacik
2020-12-07 13:57 ` [PATCH v5 39/52] btrfs: handle errors in reference count manipulation " Josef Bacik
2020-12-07 13:57 ` [PATCH v5 40/52] btrfs: handle extent reference errors in do_relocation Josef Bacik
2020-12-07 13:57 ` [PATCH v5 41/52] btrfs: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly Josef Bacik
2020-12-07 13:57 ` [PATCH v5 42/52] btrfs: remove the extent item sanity checks in relocate_block_group Josef Bacik
2020-12-07 13:57 ` [PATCH v5 43/52] btrfs: do proper error handling in create_reloc_inode Josef Bacik
2020-12-07 13:57 ` [PATCH v5 44/52] btrfs: handle __add_reloc_root failures in btrfs_recover_relocation Josef Bacik
2020-12-07 13:57 ` [PATCH v5 45/52] btrfs: cleanup error handling in prepare_to_merge Josef Bacik
2020-12-07 13:57 ` [PATCH v5 46/52] btrfs: handle extent corruption with select_one_root properly Josef Bacik
2020-12-07 13:57 ` [PATCH v5 47/52] btrfs: do proper error handling in merge_reloc_roots Josef Bacik
2020-12-07 13:57 ` [PATCH v5 48/52] btrfs: check return value of btrfs_commit_transaction in relocation Josef Bacik
2020-12-07 13:57 ` [PATCH v5 49/52] btrfs: do not WARN_ON() if we can't find the reloc root Josef Bacik
2020-12-07 13:57 ` [PATCH v5 50/52] btrfs: print the actual offset in btrfs_root_name Josef Bacik
2020-12-07 13:57 ` [PATCH v5 51/52] btrfs: fix reloc root leak with 0 ref reloc roots on recovery Josef Bacik
2020-12-07 13:57 ` [PATCH v5 52/52] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201208144446.GH31381@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.