From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 7/8] btrfs: add code to support the block group root
Date: Wed, 10 Nov 2021 08:54:38 -0500
Message-ID: <YYvPHv9dxZKFlraB@localhost.localdomain>
In-Reply-To: <e58230c4-1536-dca5-7e1c-1b6a4a0321bb@gmx.com>

On Wed, Nov 10, 2021 at 03:13:37PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/11/10 03:24, Josef Bacik wrote:
> > On Tue, Nov 09, 2021 at 09:14:06AM +0800, Qu Wenruo wrote:
> > > 
> > > 
> > > On 2021/11/9 03:36, Josef Bacik wrote:
> > > > On Sat, Nov 06, 2021 at 09:11:44AM +0800, Qu Wenruo wrote:
> > > > > 
> > > > > 
> > > > > On 2021/11/6 04:49, Josef Bacik wrote:
> > > > > > This code adds the on disk structures for the block group root, which
> > > > > > will hold the block group items for extent tree v2.
> > > > > > 
> > > > > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > > > > ---
> > > > > >     fs/btrfs/ctree.h                | 26 ++++++++++++++++-
> > > > > >     fs/btrfs/disk-io.c              | 49 ++++++++++++++++++++++++++++-----
> > > > > >     fs/btrfs/disk-io.h              |  2 ++
> > > > > >     fs/btrfs/print-tree.c           |  1 +
> > > > > >     include/trace/events/btrfs.h    |  1 +
> > > > > >     include/uapi/linux/btrfs_tree.h |  3 ++
> > > > > >     6 files changed, 74 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > > > > > index 8ec2f068a1c2..b57367141b95 100644
> > > > > > --- a/fs/btrfs/ctree.h
> > > > > > +++ b/fs/btrfs/ctree.h
> > > > > > @@ -271,8 +271,13 @@ struct btrfs_super_block {
> > > > > >     	/* the UUID written into btree blocks */
> > > > > >     	u8 metadata_uuid[BTRFS_FSID_SIZE];
> > > > > > 
> > > > > > +	__le64 block_group_root;
> > > > > > +	__le64 block_group_root_generation;
> > > > > > +	u8 block_group_root_level;
> > > > > > +
> > > > > 
> > > > > Is there any special reason that, block group root can't be put into
> > > > > root tree?
> > > > > 
> > > > 
> > > > Yes, I'm so glad you asked!
> > > > 
> > > > One of the planned changes with extent-tree-v2 is how we do relocation.  With no
> > > > longer being able to track metadata in the extent tree, relocation becomes much
> > > > more of a pain in the ass.
> > > 
> > > I'm even surprised that relocation can even be done without proper metadata
> > > tracking in the new extent tree(s).
> > > 
> > > > 
> > > > In addition, relocation currently has a pretty big problem, it can generate
> > > > unlimited delayed refs because it absolutely has to update all paths that point
> > > > to a relocated block in a single transaction.
> > > 
> > > Yep, that's also the biggest problem I attacked for the qgroup balance
> > > optimization.
> > > 
> > > > 
> > > > I'm fixing both of these problems with a new relocation thing, which will walk
> > > > through a block group, copy those extents to a new block group, and then update
> > > > a tree that maps the old logical address to the new logical address.
> > > 
> > > That sounds like the proposal from Johannes for zoned support of RAID56.
> > > An FTL-like layer.
> > > 
> > > But I'm still not sure how we could even get all the tree blocks in one
> > > block group in the first place, as there is no longer backref in the extent
> > > tree(s).
> > > 
> > > By iterating all tree blocks? That doesn't sound sane to me...
> > > 
> > 
> > No, iterating the free areas in the free space tree.  We no longer care about
> > the metadata itself, just the space that is utilized in the block group.  We
> > will mark the block group as read only, search through the free space tree for
> > that block group to find extents, copy them to new locations, insert a mapping
> > object for that block group to say "X range is now at Y".
> > 
> > As extents are freed, their new respective ranges are freed.  Once a relocated
> > block group's ->used hits 0, its mapping items are deleted.
> > 
> > > > 
> > > > Because of this we could end up with blocks in the tree root that need to be
> > > > remapped from a relocated block group into a new block group.  Thus we need to
> > > > be able to know what that mapping is before we go read the tree root.  This
> > > > means we have to store the block group root (and the new mapping root I'll
> > > > introduce later) in the super block.
> > > 
> > > Wouldn't the new mapping root become a new bottleneck then?
> > > 
> > > If we relocate the full fs, then the mapping root (block group root) would
> > > be no different than an old extent tree?
> > > 
> > > Especially the mapping is done in extent level, not chunk level, thus it can
> > > cause tons of mapping entries, really not that better than old extent tree
> > > then.
> > > 
> > 
> > Except the problem with the old extent tree is we are constantly modifying it.
> 
> I have another question related to this block group tree.
> 
> AFAIK your new extent-tree-v2 will greatly reduce the amount of extent
> items by:
> 
> - Skip all backref items for global trees
> 
> - Skip backref items for non-shared subvolumes
>   As they act just like global trees (until being snapshotted).
> 
> I'm wondering if the above modifications are enough to make the extent tree
> so cold that we don't even need the block group tree?
> 

We still need it to be separate, because we have to reach it from the super
block: we pre-load it so that we can then load the mapping tree, which does the
logical->logical translation for the new relocation scheme.
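
To make the logical->logical translation concrete, here is a minimal userspace
sketch, not kernel code.  Everything in it is an assumption made for
illustration: struct remap_entry, remap_logical(), and the sorted in-memory
table are hypothetical, and the real mapping items and their lookup are not
defined by this patch.  It only shows the idea of "X range is now at Y": any
address inside a relocated range is shifted to its new location, and every
other address passes through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one "X range is now at Y" mapping item. */
struct remap_entry {
	uint64_t old_start;	/* first logical byte of the relocated range */
	uint64_t len;		/* length of the range in bytes */
	uint64_t new_start;	/* logical address the range was copied to */
};

/*
 * Translate a logical address through a table that is sorted by old_start
 * and contains no overlapping ranges (both are assumptions of this sketch).
 * Addresses that were never relocated are returned unchanged.
 */
uint64_t remap_logical(const struct remap_entry *map, size_t n,
		       uint64_t logical)
{
	const struct remap_entry *e;
	size_t lo = 0, hi = n;

	assert(map != NULL || n == 0);

	/* Binary search for the first entry with old_start > logical. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (map[mid].old_start <= logical)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* No entry starts at or before this address. */
	if (lo == 0)
		return logical;

	/* The candidate is the last entry with old_start <= logical. */
	e = &map[lo - 1];
	if (logical - e->old_start < e->len)
		return e->new_start + (logical - e->old_start);

	return logical;
}
```

The ordering constraint follows from this: if the tree root's own blocks can
sit in a relocated range, a lookup like this has to be possible before the
tree root is read, so the roots it depends on cannot themselves be found via
the tree root.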

Also the extent tree is still going to have data backrefs, so we'll still end up
with a huge spread.  Thanks,

Josef
