public inbox for linux-btrfs@vger.kernel.org
From: Filipe Manana <fdmanana@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 1/3] btrfs: clean deleted snapshots on mount
Date: Thu, 17 Feb 2022 11:53:41 +0000	[thread overview]
Message-ID: <Yg43Rdbh3CNSzzEB@debian9.Home> (raw)
In-Reply-To: <12c9eec886e4544b71389770a069c90fac0401b9.1645038250.git.josef@toxicpanda.com>

On Wed, Feb 16, 2022 at 02:06:53PM -0500, Josef Bacik wrote:
> We hit a bug with a recovering relocation on mount for one of our file
> systems in production.  I reproduced this locally by injecting errors
> into snapshot delete with balance running at the same time.  This
> presented as an error while looking up an extent item
> 
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
> CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
> RIP: 0010:lookup_inline_extent_backref+0x647/0x680
> RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
> RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
> R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
> R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
> FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
> Call Trace:
>  <TASK>
>  insert_inline_extent_backref+0x46/0xd0
>  __btrfs_inc_extent_ref.isra.0+0x5f/0x200
>  ? btrfs_merge_delayed_refs+0x164/0x190
>  __btrfs_run_delayed_refs+0x561/0xfa0
>  ? btrfs_search_slot+0x7b4/0xb30
>  ? btrfs_update_root+0x1a9/0x2c0
>  btrfs_run_delayed_refs+0x73/0x1f0
>  ? btrfs_update_root+0x1a9/0x2c0
>  btrfs_commit_transaction+0x50/0xa50
>  ? btrfs_update_reloc_root+0x122/0x220
>  prepare_to_merge+0x29f/0x320
>  relocate_block_group+0x2b8/0x550
>  btrfs_relocate_block_group+0x1a6/0x350
>  btrfs_relocate_chunk+0x27/0xe0
>  btrfs_balance+0x777/0xe60
>  balance_kthread+0x35/0x50
>  ? btrfs_balance+0xe60/0xe60
>  kthread+0x16b/0x190
>  ? set_kthread_struct+0x40/0x40
>  ret_from_fork+0x22/0x30
>  </TASK>
> ---[ end trace 7ebc95131709d2b0 ]---
> 
> Normally snapshot deletion and relocation are excluded from running at
> the same time by the fs_info->cleaner_mutex.  However if we had a
> pending balance waiting to get the ->cleaner_mutex, and a snapshot
> deletion was running, and then the box crashed, we would come up in a
> state where we have a half deleted snapshot.
> 
> Again, in the normal case the snapshot deletion needs to complete before
> relocation can start, but in this case relocation could very well start
> before the snapshot deletion completes, as we simply add the root to the
> dead roots list and wait for the next time the cleaner runs to clean up
> the snapshot.
> 
> Fix this by simply cleaning all of the deleted snapshots at mount time,
> before we start messing with relocation.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/disk-io.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b6a81c39d5f4..ae8c201070f2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3380,6 +3380,19 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
>  	up_read(&fs_info->cleanup_work_sem);
>  
>  	mutex_lock(&fs_info->cleaner_mutex);
> +	/*
> +	 * If there are any DEAD_TREE's we must recover them here, otherwise we
> +	 * could re-start relocation and attempt to relocate blocks that are
> +	 * within a half-deleted snapshot.  Under normal operations we can't run
> +	 * relocation and snapshot delete at the same time, however if we had a
> +	 * snapshot deletion happening prior to this mount there's no way to
> +	 * guarantee that the deletion will start before we re-start (or a user
> +	 * starts) the relocation.  So do the cleanup here in order to prevent
> +	 * problems.
> +	 */
> +	while (btrfs_clean_one_deleted_snapshot(fs_info->tree_root))
> +		cond_resched();

Unless I missed something subtle, this deletes the snapshots synchronously at
mount time, meaning that if we have many and/or large snapshots to delete,
this will slow down the mount for a long time. And this happens even if the
crash scenario described above did not occur.

We already often get reports of slow mount times (several minutes), likely
due to large extent trees and many block groups, so synchronously deleting
snapshots will make things a lot slower. Such an increase in mount time is
bad for user experience.

Can we do this asynchronously, while still ensuring that balance is not resumed
before all snapshots are deleted? Or at the very least do it synchronously only
if we have a relocation to resume - as it stands, this will do the snapshot
deletion even if there is no relocation to resume.
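
For illustration only, the conditional variant could look roughly like the
sketch below. Note this is not a proposed patch: btrfs_relocation_pending() is
a hypothetical helper standing in for whatever check detects a leftover reloc
tree to resume; only btrfs_clean_one_deleted_snapshot(), cond_resched() and
btrfs_recover_relocation() are real calls taken from the hunk above.

```c
	mutex_lock(&fs_info->cleaner_mutex);
	/*
	 * Sketch: only pay the synchronous snapshot-cleanup cost on mounts
	 * that actually have a relocation to resume.  The helper below is
	 * hypothetical and would need to be implemented as whatever check
	 * detects a reloc tree left over from a previous mount.
	 */
	if (btrfs_relocation_pending(fs_info)) {
		while (btrfs_clean_one_deleted_snapshot(fs_info->tree_root))
			cond_resched();
	}
	ret = btrfs_recover_relocation(fs_info->tree_root);
	mutex_unlock(&fs_info->cleaner_mutex);
```

That way the common case (no interrupted balance) keeps its current mount time.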

Thanks.


> +
>  	ret = btrfs_recover_relocation(fs_info->tree_root);
>  	mutex_unlock(&fs_info->cleaner_mutex);
>  	if (ret < 0) {
> -- 
> 2.26.3
> 


Thread overview: 5+ messages
2022-02-16 19:06 [PATCH 0/3] btrfs: fix problem with balance recovery and snap delete Josef Bacik
2022-02-16 19:06 ` [PATCH 1/3] btrfs: clean deleted snapshots on mount Josef Bacik
2022-02-17 11:53   ` Filipe Manana [this message]
2022-02-16 19:06 ` [PATCH 2/3] btrfs: use btrfs_fs_info for deleting snapshots and cleaner Josef Bacik
2022-02-16 19:06 ` [PATCH 3/3] btrfs: pass btrfs_fs_info to btrfs_recover_relocation Josef Bacik
