linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: David Sterba <dsterba@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>,
	Filipe Manana <fdmanana@suse.com>,
	Anand Jain <anand.jain@oracle.com>
Subject: Re: [PATCH v3 1/3] btrfs: discard relocated block groups
Date: Mon, 12 Apr 2021 15:08:21 +0100	[thread overview]
Message-ID: <CAL3q7H55vudYBNFGHWWuWCaeMLuVm8HjbBsmTYD7KQP_dKEKOQ@mail.gmail.com> (raw)
In-Reply-To: <PH0PR04MB741605A3689AA581ABC6CF3E9B709@PH0PR04MB7416.namprd04.prod.outlook.com>

On Mon, Apr 12, 2021 at 2:49 PM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 09/04/2021 13:37, Filipe Manana wrote:
> > On Fri, Apr 9, 2021 at 11:54 AM Johannes Thumshirn
> > <johannes.thumshirn@wdc.com> wrote:
> >>
> >> When relocating a block group the freed up space is not discarded. On
> >> devices like SSDs this hint is useful to tell the device the space is
> >> freed now. On zoned block devices btrfs' discard code will reset the zone
> >> the block group is on, freeing up the occupied space.
> >>
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >> ---
> >>  fs/btrfs/volumes.c | 13 +++++++++++++
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 6d9b2369f17a..d9ef8bce0cde 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -3103,6 +3103,10 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>         struct btrfs_root *root = fs_info->chunk_root;
> >>         struct btrfs_trans_handle *trans;
> >>         struct btrfs_block_group *block_group;
> >> +       const bool trim = btrfs_is_zoned(fs_info) ||
> >> +                               btrfs_test_opt(fs_info, DISCARD_SYNC);
> >> +       u64 trimmed;
> >> +       u64 length;
> >>         int ret;
> >>
> >>         /*
> >> @@ -3130,6 +3134,7 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>         if (!block_group)
> >>                 return -ENOENT;
> >>         btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
> >> +       length = block_group->length;
> >>         btrfs_put_block_group(block_group);
> >>
> >>         trans = btrfs_start_trans_remove_block_group(root->fs_info,
> >> @@ -3144,6 +3149,14 @@ static int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >>          * step two, delete the device extents and the
> >>          * chunk tree entries
> >>          */
> >> +       if (trim) {
> >> +               ret = btrfs_discard_extent(fs_info, chunk_offset, length,
> >> +                                          &trimmed);
> >
> > Ideally we do IO and potentially slow operations such as discard while
> > not holding a transaction handle open, to avoid slowing down anyone
> > trying to commit the transaction.
> >
> >> +               if (ret) {
> >> +                       btrfs_abort_transaction(trans, ret);
> >
> > This is leaking the transaction, still need to call btrfs_end_transaction().
> >
> > Making the discard before joining/starting transaction, would fix both things.
>
> Note taken.
>
> >
> > Now more importantly, I don't see why the freed space isn't already
> > discarded at this point.
> > Relocation creates delayed references to drop extents from the block
> > group, and when the delayed references are run, we pin the extents
> > through:
> >
> > __btrfs_free_extent() -> btrfs_update_block_group()
> >
> > Then at the very end of the transaction commit, we discard pinned
> > extents and then unpin them, at btrfs_finish_extent_commit().
> > Relocation commits the transaction at the end of
> > relocate_block_group(), so the delayed references are fun, and then we
> > discard and unpin their extents.
> > So that's why I don't get it why the discard is needed here (at least
> > when we are not on a zoned filesystem).
>
> This is a good point to investigate, but as of now, the zone isn't reset.
> Zone reset handling in btrfs is piggy backed onto discard handling, but
> from testing the patchset I see the zone isn't reset:
>
> [   81.014752] BTRFS info (device nullb0): reclaiming chunk 1073741824
> [   81.015732] BTRFS info (device nullb0): relocating block group 1073741824 flags data
> [   81.798090] BTRFS info (device nullb0): found 12452 extents, stage: move data extents
> [   82.314562] BTRFS info (device nullb0): found 12452 extents, stage: update data pointers
> # blkzone report -o $((1073741824 >> 9)) -c 1 /dev/nullb0
>   start: 0x000200000, len 0x080000, cap 0x080000, wptr 0x0799a0 reset:0 non-seq:0, zcond: 2(oi) [type: 2(SEQ_WRITE_REQUIRED)]

Not familiar with zoned device details, but what you are passing to
blkzone is a logical address, in non-zoned btrfs it does not always
matches the physical address on disk.
So maybe that is not a reliable way to check it.

>
> Whereas when the this patch is applied as well:
> [   85.126542] BTRFS info (device nullb0): reclaiming chunk 1073741824
> [   85.128211] BTRFS info (device nullb0): relocating block group 1073741824 flags data
> [   86.061831] BTRFS info (device nullb0): found 12452 extents, stage: move data extents
> [   86.766549] BTRFS info (device nullb0): found 12452 extents, stage: update data pointers
> # blkzone report -c 1 -o $((1073741824 >> 9)) /dev/nullb0
>   start: 0x000200000, len 0x080000, cap 0x080000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
>
> As a positive side effect of this, I now have code that can be used in
> xfstests to verify the patchset.

Ok.
Maybe the zone isn't reset properly because the default mechanism is
doing smaller discards, on a per extent basis, and perhaps the order
of those discards matters?

If it affects only the zoned case, I also don't see why doing it when
not in zoned mode (and -o discard=sync is set).
Unless of course the default discard mechanism is broken somehow, in
which case we need to find out why and fix it.

Thanks.

>
> Byte,
>         Johannes



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

  reply	other threads:[~2021-04-12 14:08 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-09 10:53 [PATCH v3 0/3] btrfs: zoned: automatic BG reclaim Johannes Thumshirn
2021-04-09 10:53 ` [PATCH v3 1/3] btrfs: discard relocated block groups Johannes Thumshirn
2021-04-09 11:37   ` Filipe Manana
2021-04-12 13:49     ` Johannes Thumshirn
2021-04-12 14:08       ` Filipe Manana [this message]
2021-04-12 14:21         ` Johannes Thumshirn
2021-04-13 12:43           ` Johannes Thumshirn
2021-04-13 12:57             ` Filipe Manana
2021-04-13 17:48               ` Johannes Thumshirn
2021-04-14 11:16                 ` Filipe Manana
2021-04-14 11:22                   ` Johannes Thumshirn
2021-04-14 11:32                     ` Filipe Manana
2021-04-14 12:59                     ` Johannes Thumshirn
2021-04-14 13:13                       ` Filipe Manana
2021-04-09 10:53 ` [PATCH v3 2/3] btrfs: rename delete_unused_bgs_mutex Johannes Thumshirn
2021-04-09 10:53 ` [PATCH v3 3/3] btrfs: zoned: automatically reclaim zones Johannes Thumshirn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL3q7H55vudYBNFGHWWuWCaeMLuVm8HjbBsmTYD7KQP_dKEKOQ@mail.gmail.com \
    --to=fdmanana@gmail.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=Naohiro.Aota@wdc.com \
    --cc=anand.jain@oracle.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).