From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 7/8] btrfs: add code to support the block group root
Date: Wed, 10 Nov 2021 08:54:38 -0500
Message-ID: <YYvPHv9dxZKFlraB@localhost.localdomain>
In-Reply-To: <e58230c4-1536-dca5-7e1c-1b6a4a0321bb@gmx.com>
On Wed, Nov 10, 2021 at 03:13:37PM +0800, Qu Wenruo wrote:
>
>
> On 2021/11/10 03:24, Josef Bacik wrote:
> > On Tue, Nov 09, 2021 at 09:14:06AM +0800, Qu Wenruo wrote:
> > >
> > >
> > > On 2021/11/9 03:36, Josef Bacik wrote:
> > > > On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
> > > > >
> > > > >
> > > > > On 2021/11/6 04:49, Josef Bacik wrote:
> > > > > > This code adds the on disk structures for the block group root, which
> > > > > > will hold the block group items for extent tree v2.
> > > > > >
> > > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > > > > ---
> > > > > > fs/btrfs/ctree.h | 26 ++++++++++++++++-
> > > > > > fs/btrfs/disk-io.c | 49 ++++++++++++++++++++++++++++-----
> > > > > > fs/btrfs/disk-io.h | 2 ++
> > > > > > fs/btrfs/print-tree.c | 1 +
> > > > > > include/trace/events/btrfs.h | 1 +
> > > > > > include/uapi/linux/btrfs_tree.h | 3 ++
> > > > > > 6 files changed, 74 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > > > > index 8ec2f068a1c2..b57367141b95 100644
> > > > > > --- a/fs/btrfs/ctree.h
> > > > > > +++ b/fs/btrfs/ctree.h
> > > > > > @@ -271,8 +271,13 @@ struct btrfs_super_block {
> > > > > > /* the UUID written into btree blocks */
> > > > > > u8 metadata_uuid[BTRFS_FSID_SIZE];
> > > > > >
> > > > > > + __le64 block_group_root;
> > > > > > + __le64 block_group_root_generation;
> > > > > > + u8 block_group_root_level;
> > > > > > +
> > > > >
> > > > > Is there any special reason that, block group root can't be put into
> > > > > root tree?
> > > > >
> > > >
> > > > Yes, I'm so glad you asked!
> > > >
> > > > One of the planned changes with extent-tree-v2 is how we do relocation. Since
> > > > we can no longer track metadata in the extent tree, relocation becomes much
> > > > more of a pain in the ass.
> > >
> > > I'm surprised that relocation can be done at all without proper metadata
> > > tracking in the new extent tree(s).
> > >
> > > >
> > > > In addition, relocation currently has a pretty big problem: it can generate
> > > > unlimited delayed refs because it absolutely has to update all paths that point
> > > > to a relocated block in a single transaction.
> > >
> > > Yep, that's also the biggest problem I attacked for the qgroup balance
> > > optimization.
> > >
> > > >
> > > > I'm fixing both of these problems with a new relocation thing, which will walk
> > > > through a block group, copy those extents to a new block group, and then update
> > > > a tree that maps the old logical address to the new logical address.
> > >
> > > That sounds like the proposal from Johannes for zoned support of RAID56.
> > > An FTL-like layer.
> > >
> > > But I'm still not sure how we could even get all the tree blocks in one
> > > block group in the first place, as there are no longer backrefs in the extent
> > > tree(s).
> > >
> > > By iterating all tree blocks? That doesn't sound sane to me...
> > >
> >
> > No, iterating the free areas in the free space tree. We no longer care about
> > the metadata itself, just the space that is utilized in the block group. We
> > will mark the block group as read only, search through the free space tree for
> > that block group to find extents, copy them to new locations, insert a mapping
> > object for that block group to say "X range is now at Y".
> >
> > As extents are freed, their new respective ranges are freed. Once a relocated
> > block group's ->used hits 0, its mapping items are deleted.
> >
> > > >
> > > > Because of this we could end up with blocks in the tree root that need to be
> > > > remapped from a relocated block group into a new block group. Thus we need to
> > > > be able to know what that mapping is before we go read the tree root. This
> > > > means we have to store the block group root (and the new mapping root I'll
> > > > introduce later) in the super block.
> > >
> > > Wouldn't the new mapping root become a new bottleneck then?
> > >
> > > If we relocate the full fs, then the mapping root (block group root) would
> > > be no different than an old extent tree?
> > >
> > > Especially since the mapping is done at the extent level, not the chunk level,
> > > it can produce tons of mapping entries, which is really not much better than
> > > the old extent tree.
> > >
> >
> > Except the problem with the old extent tree is we are constantly modifying it.
>
> I have another question related to this block group tree.
>
> AFAIK your new extent-tree-v2 will greatly reduce the amount of extent
> items by:
>
> - Skip all backref items for global trees
>
> - Skip backref items for non-shared subvolumes
> As they act just like global trees (until being snapshotted).
>
> I'm wondering if the above modifications are enough to make the extent tree so
> cold that we don't even need the block group tree?
>
We still need it to be separate, because we need to get at it from the super
block and pre-load it, so that we can load the mapping tree and do the
logical->logical translation for the new relocation scheme.

Also, the extent tree is still going to have data backrefs, so we'll still end up
with a huge spread. Thanks,
Josef
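
For illustration only, here is a minimal sketch of the logical->logical
translation mentioned in the reply above. It assumes the mapping items have
already been loaded into an in-memory array that is sorted by old start
address and has no overlapping ranges. This series does not define the
mapping item format ("the new mapping root I'll introduce later"), so
struct remap_entry and remap_logical() are hypothetical names and layouts,
not anything taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical in-memory form of one mapping item, i.e. one
 * "X range is now at Y" record. Not the on-disk format.
 */
struct remap_entry {
	uint64_t old_start;	/* first byte of the relocated range (old logical) */
	uint64_t new_start;	/* where that range lives now (new logical) */
	uint64_t len;		/* length of the range in bytes */
};

/*
 * Translate a logical address through a table that is sorted by old_start
 * and contains no overlapping ranges (both are assumptions of this sketch).
 * An address that falls outside every range was never relocated, so it is
 * returned unchanged.
 */
uint64_t remap_logical(const struct remap_entry *tbl, size_t nr,
		       uint64_t logical)
{
	size_t lo = 0, hi = nr;

	/* Plain binary search over the half-open index range [lo, hi). */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct remap_entry *e = &tbl[mid];

		if (logical < e->old_start)
			hi = mid;
		else if (logical - e->old_start >= e->len)
			lo = mid + 1;
		else
			return e->new_start + (logical - e->old_start);
	}
	return logical;
}
```

The ordering constraint from the thread shows up in how such a helper would
have to be used: the address of the tree root itself may need to go through
remap_logical() before the tree root can be read, so the table must be
populated from roots that are reachable directly from the super block.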
Thread overview: 21+ messages
2021-11-05 20:49 [PATCH 0/8] btrfs: extent tree v2, support for global roots Josef Bacik
2021-11-05 20:49 ` [PATCH 1/8] btrfs: add definition for EXTENT_TREE_V2 Josef Bacik
2021-11-05 20:49 ` [PATCH 2/8] btrfs: disable balance for extent tree v2 for now Josef Bacik
2021-11-05 20:49 ` [PATCH 3/8] btrfs: disable qgroups in extent tree v2 Josef Bacik
2021-11-05 20:49 ` [PATCH 4/8] btrfs: use metadata usage for global block rsv " Josef Bacik
2021-11-05 20:49 ` [PATCH 5/8] btrfs: tree-checker: don't fail on empty extent roots for " Josef Bacik
2021-11-06 1:05 ` Qu Wenruo
2021-11-05 20:49 ` [PATCH 6/8] btrfs: abstract out loading the tree root Josef Bacik
2021-11-05 20:49 ` [PATCH 7/8] btrfs: add code to support the block group root Josef Bacik
2021-11-06 1:11 ` Qu Wenruo
2021-11-08 19:36 ` Josef Bacik
2021-11-09 1:14 ` Qu Wenruo
2021-11-09 19:24 ` Josef Bacik
2021-11-09 23:44 ` Qu Wenruo
2021-11-10 13:57 ` Josef Bacik
2021-11-10 7:13 ` Qu Wenruo
2021-11-10 13:54 ` Josef Bacik [this message]
2021-11-05 20:49 ` [PATCH 8/8] btrfs: add support for multiple global roots Josef Bacik
2021-11-06 1:18 ` Qu Wenruo
2021-11-06 1:51 ` Qu Wenruo
2021-11-08 19:39 ` Josef Bacik