Re: [PATCH v3 1/3] btrfs: discard relocated block groups

From: Filipe Manana <fdmanana@gmail.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Sterba <dsterba@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>,
	Filipe Manana <fdmanana@suse.com>,
	Anand Jain <anand.jain@oracle.com>
Subject: Re: [PATCH v3 1/3] btrfs: discard relocated block groups
Date: Wed, 14 Apr 2021 12:16:31 +0100
Message-ID: <CAL3q7H6Bgqkdf8Z+xRBH8C=XxtrGzXyNUf6BHaLw54LZb3Agsg@mail.gmail.com>
In-Reply-To: <PH0PR04MB74167FB19522DBEB1F70E80D9B4F9@PH0PR04MB7416.namprd04.prod.outlook.com>

On Tue, Apr 13, 2021 at 6:48 PM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 13/04/2021 14:57, Filipe Manana wrote:
> > And what about the other mechanism that triggers discards on pinned
> > extents, after the transaction commits the super blocks?
> > Why isn't that happening (with -o discard=sync)? We create the delayed
> > references to drop extents from the relocated block group, which
> > results in pinning extents.
> > This is the case that surprised me that it isn't working for you.
>
> I think this is the case. I would have expected to end up in this
> part of btrfs_finish_extent_commit():
>
>
>         /*
>          * Transaction is finished.  We don't need the lock anymore.  We
>          * do need to clean up the block groups in case of a transaction
>          * abort.
>          */
>         deleted_bgs = &trans->transaction->deleted_bgs;
>         list_for_each_entry_safe(block_group, tmp, deleted_bgs, bg_list) {
>                 u64 trimmed = 0;
>
>                 ret = -EROFS;
>                 if (!TRANS_ABORTED(trans))
>                         ret = btrfs_discard_extent(fs_info,
>                                                    block_group->start,
>                                                    block_group->length,
>                                                    &trimmed);
>
>                 list_del_init(&block_group->bg_list);
>                 btrfs_unfreeze_block_group(block_group);
>                 btrfs_put_block_group(block_group);
>
>                 if (ret) {
>                         const char *errstr = btrfs_decode_error(ret);
>                         btrfs_warn(fs_info,
>                            "discard failed while removing blockgroup: errno=%d %s",
>                                    ret, errstr);
>                 }
>         }
>
> and the btrfs_discard_extent() over the whole block group would then trigger a
> REQ_OP_ZONE_RESET operation, resetting the device's zone.
>
> But as btrfs_delete_unused_bgs() doesn't add the block group to the
> ->deleted_bgs list, we're not reaching the above code. I /think/ (i.e. verification
> pending) the -o discard=sync case works for regular block devices, as each extent
> is discarded on its own, by this (also in btrfs_finish_extent_commit()):
>
>         while (!TRANS_ABORTED(trans)) {
>                 struct extent_state *cached_state = NULL;
>
>                 mutex_lock(&fs_info->unused_bg_unpin_mutex);
>                 ret = find_first_extent_bit(unpin, 0, &start, &end,
>                                             EXTENT_DIRTY, &cached_state);
>                 if (ret) {
>                         mutex_unlock(&fs_info->unused_bg_unpin_mutex);
>                         break;
>                 }
>
>                 if (btrfs_test_opt(fs_info, DISCARD_SYNC))
>                         ret = btrfs_discard_extent(fs_info, start,
>                                                    end + 1 - start, NULL);
>
>                 clear_extent_dirty(unpin, start, end, &cached_state);
>                 unpin_extent_range(fs_info, start, end, true);
>                 mutex_unlock(&fs_info->unused_bg_unpin_mutex);
>                 free_extent_state(cached_state);
>                 cond_resched();
>         }
>
> If this is the case, my patch will essentially discard the data twice, for a
> non-zoned block device, which is certainly not ideal.

Yep, that's what puzzled me: why would we need to do it for non-zoned
file systems when using -o discard=sync?
I assumed you ran into a case where discard was not happening due to
some bug in the extent pinning/unpinning mechanism.
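
To make the double-discard concern concrete, here is a toy user-space
model (not kernel code). All toy_* names are hypothetical stand-ins:
toy_unpin_extents() plays the role of the per-extent discard done for
pinned extents with -o discard=sync, and toy_relocate_discard() plays
the role of a whole-block-group discard after relocation, optionally
gated on the filesystem being zoned. It only counts bytes, under the
assumption that every pinned extent belongs to the relocated block
group.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct toy_extent {
	uint64_t start;
	uint64_t len;
};

struct toy_fs {
	bool zoned;         /* models btrfs_is_zoned() */
	bool discard_sync;  /* models -o discard=sync */
	uint64_t discarded; /* total bytes passed to the toy discard */
};

/* Models a discard request; it only accounts for the bytes. */
static void toy_discard(struct toy_fs *fs, uint64_t start, uint64_t len)
{
	(void)start;
	fs->discarded += len;
}

/* Per-extent path: discards each pinned extent when discard=sync is set. */
static void toy_unpin_extents(struct toy_fs *fs,
			      const struct toy_extent *ext, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (fs->discard_sync)
			toy_discard(fs, ext[i].start, ext[i].len);
	}
}

/*
 * Whole-block-group path after relocation. With gate_on_zoned == false
 * the discard is issued regardless of the zoned flag; with true it is
 * issued only for a zoned filesystem.
 */
static void toy_relocate_discard(struct toy_fs *fs, uint64_t bg_start,
				 uint64_t bg_len, bool gate_on_zoned)
{
	if (!gate_on_zoned || fs->zoned)
		toy_discard(fs, bg_start, bg_len);
}
```

In this model, a non-zoned filesystem with discard_sync set ends up
with the block group's bytes counted twice when the relocation discard
is not gated, and once when it is.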

> So the correct fix would
> be to get the block group into the 'trans->transaction->deleted_bgs' list
> after relocation, which would work if we wouldn't check for block_group->ro in
> btrfs_delete_unused_bgs(), but I suppose this check is there for a reason.

Actually the check for ->ro does not make sense anymore since I
introduced the delete_unused_bgs_mutex in commit
67c5e7d464bc466471b05e027abe8a6b29687ebd.

When the ->ro check was added
(47ab2a6c689913db23ccae38349714edf8365e0a), it was meant to prevent
the cleaner kthread and relocation tasks from calling
btrfs_remove_chunk() concurrently, but checking for ->ro only was
buggy, hence the addition of delete_unused_bgs_mutex later.
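
As a rough illustration of the serialization idea (a user-space sketch
with pthreads, not the kernel implementation): both callers take the
same mutex and check whether the chunk still exists only while holding
it, so the check and the removal cannot interleave between tasks. The
names toy_chunk, toy_unused_bgs_mutex and toy_remove_chunk are
hypothetical; the mutex merely plays the role of
delete_unused_bgs_mutex.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical toy state; not a kernel structure. */
struct toy_chunk {
	bool present;  /* chunk still exists */
	int removals;  /* how many times removal actually ran */
};

/* Plays the role of delete_unused_bgs_mutex in this model. */
static pthread_mutex_t toy_unused_bgs_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Called by both the "cleaner" and the "relocation" paths of the model.
 * The existence check happens under the mutex, so a second caller sees
 * the result of the first one instead of racing with it.
 */
static int toy_remove_chunk(struct toy_chunk *chunk)
{
	int ret = 0;

	pthread_mutex_lock(&toy_unused_bgs_mutex);
	if (!chunk->present) {
		ret = -ENOENT; /* someone else already removed it */
	} else {
		chunk->present = false;
		chunk->removals++;
	}
	pthread_mutex_unlock(&toy_unused_bgs_mutex);
	return ret;
}
```

Whichever caller gets the mutex first performs the removal; the other
one gets -ENOENT and backs off.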

>
> How about changing the patch to the following:

Looks good.
However, would just removing the ->ro check be enough as well?

Thanks Johannes.

>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6d9b2369f17a..ba13b2ea3c6f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3103,6 +3103,9 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>         struct btrfs_root *root = fs_info->chunk_root;
>         struct btrfs_trans_handle *trans;
>         struct btrfs_block_group *block_group;
> +       u64 length;
>         int ret;
>
>         /*
> @@ -3130,8 +3133,16 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>         if (!block_group)
>                 return -ENOENT;
>         btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
> +       length = block_group->length;
>         btrfs_put_block_group(block_group);
>
> +       /*
> +        * For a zoned filesystem we need to discard/zone-reset here, as the
> +        * discard code won't discard the whole block-group, but only single
> +        * extents.
> +        */
> +       if (btrfs_is_zoned(fs_info)) {
> +               ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
> +               if (ret) /* Non working discard is not fatal */
> +                       btrfs_warn(fs_info, "discarding chunk %llu failed",
> +                                  chunk_offset);
> +       }
> +
>         trans = btrfs_start_trans_remove_block_group(root->fs_info,
>                                                      chunk_offset);
>         if (IS_ERR(trans)) {



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
