From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/4] xfs: support bulk loading of staged btrees
Date: Sat, 7 Mar 2020 07:40:41 -0500 [thread overview]
Message-ID: <20200307124041.GA27459@bfoster> (raw)
In-Reply-To: <20200306201458.GA8045@magnolia>
On Fri, Mar 06, 2020 at 12:14:58PM -0800, Darrick J. Wong wrote:
> On Fri, Mar 06, 2020 at 12:21:29PM -0500, Brian Foster wrote:
> > On Fri, Mar 06, 2020 at 08:27:05AM -0800, Darrick J. Wong wrote:
> > > On Fri, Mar 06, 2020 at 09:22:50AM -0500, Brian Foster wrote:
> > > > On Thu, Mar 05, 2020 at 10:13:29AM -0800, Darrick J. Wong wrote:
> > > > > On Thu, Mar 05, 2020 at 09:30:29AM -0500, Brian Foster wrote:
> > > > > > On Wed, Mar 04, 2020 at 05:22:13PM -0800, Darrick J. Wong wrote:
> > > > > > > On Wed, Mar 04, 2020 at 01:21:44PM -0500, Brian Foster wrote:
> > > > > > > > On Tue, Mar 03, 2020 at 07:28:41PM -0800, Darrick J. Wong wrote:
> > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > > >
> > > > > > > > > Add a new btree function that enables us to bulk load a btree cursor.
> > > > > > > > > This will be used by the upcoming online repair patches to generate new
> > > > > > > > > btrees. This avoids the programmatic inefficiency of calling
> > > > > > > > > xfs_btree_insert in a loop (which generates a lot of log traffic) in
> > > > > > > > > favor of stamping out new btree blocks with ordered buffers, and then
> > > > > > > > > committing both the new root and scheduling the removal of the old btree
> > > > > > > > > blocks in a single transaction commit.
> > > > > > > > >
> > > > > > > > > The design of this new generic code is based off the btree rebuilding
> > > > > > > > > code in xfs_repair's phase 5 code, with the explicit goal of enabling us
> > > > > > > > > to share that code between scrub and repair. It has the additional
> > > > > > > > > feature of being able to control btree block loading factors.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > > > ---
> > > > > > > > > fs/xfs/libxfs/xfs_btree.c | 581 +++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > fs/xfs/libxfs/xfs_btree.h | 46 ++++
> > > > > > > > > fs/xfs/xfs_trace.c | 1
> > > > > > > > > fs/xfs/xfs_trace.h | 85 +++++++
> > > > > > > > > 4 files changed, 712 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > > > > > > > index 469e1e9053bb..c21db7ed8481 100644
> > > > > > > > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > > > > > > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > > ...
> > > > > > > > > + for (cur->bc_nlevels = 1; cur->bc_nlevels < XFS_BTREE_MAXLEVELS;) {
> > > > > > > > > + uint64_t level_blocks;
> > > > > > > > > + uint64_t dontcare64;
> > > > > > > > > + unsigned int level = cur->bc_nlevels - 1;
> > > > > > > > > + unsigned int avg_per_block;
> > > > > > > > > +
> > > > > > > > > + /*
> > > > > > > > > + * If all the things we want to store at this level would fit
> > > > > > > > > + * in a single root block, then we have our btree root and are
> > > > > > > > > + * done. Note that bmap btrees do not allow records in the
> > > > > > > > > + * root.
> > > > > > > > > + */
> > > > > > > > > + if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) || level != 0) {
> > > > > > > > > + xfs_btree_bload_level_geometry(cur, bbl, level,
> > > > > > > > > + nr_this_level, &avg_per_block,
> > > > > > > > > + &level_blocks, &dontcare64);
> > > > > > > > > + if (nr_this_level <= avg_per_block) {
> > > > > > > > > + nr_blocks++;
> > > > > > > > > + break;
> > > > > > > > > + }
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > + /*
> > > > > > > > > + * Otherwise, we have to store all the records for this level
> > > > > > > > > + * in blocks and therefore need another level of btree to point
> > > > > > > > > + * to those blocks. Increase the number of levels and
> > > > > > > > > + * recompute the number of records we can store at this level
> > > > > > > > > + * because that can change depending on whether or not a level
> > > > > > > > > + * is the root level.
> > > > > > > > > + */
> > > > > > > > > + cur->bc_nlevels++;
> > > > > > > >
> > > > > > > > Hmm.. so does the ->bc_nlevels increment affect the
> > > > > > > > _bload_level_geometry() call or is it just part of the loop iteration?
> > > > > > > > If the latter, can these two _bload_level_geometry() calls be combined?
> > > > > > >
> > > > > > > It affects the xfs_btree_bload_level_geometry call because that calls
> > > > > > > ->get_maxrecs(), which returns a different answer for the root level
> > > > > > > when the root is an inode fork. Therefore, we cannot combine the calls.
> > > > > > >
> > > > > >
> > > > > > Hmm.. but doesn't this cause double calls for other cases? I.e. for
> > > > > > non-inode rooted trees it looks like we call the function once, check
> > > > > > the avg_per_block and then potentially call it again until we get to the
> > > > > > root block. Confused.. :/
> > > > >
> > > > > Yes, we do end up computing the geometry twice per level, which frees
> > > > > the bulkload code from having to know anything at all about the
> > > > > relationship between bc_nlevels and specific behaviors of some of the
> > > > > ->maxrecs functions.
> > > > >
> > > >
> > > > Sort of.. I think the pattern is odd enough that the fact it needs to
> > > > accommodate this special case kind of bleeds through even though it
> > > > isn't explicit.
> > >
> > > <nod>
> > >
> > > > > I guess you could do:
> > > > >
> > > > > xfs_btree_bload_level_geometry(...)
> > > > >
> > > > > if ((!ROOT_IN_INODE || level != 0) ** nr_this_level <= avg_per_block) {
> > > > > nr_blocks++
> > > > > break
> > > > > }
> > > > >
> > > > > nlevels++
> > > > >
> > > > > if (ROOT_IN_INODE) {
> > > > > xfs_btree_bload_level_geometry(...)
> > > > > }
> > > > >
> > > > > nr_blocks += level_blocks
> > > > > nr_this_level = level_blocks
> > > > >
> > > > > ...which would be slightly more efficient for AG btrees, though my
> > > > > crappy perf trace showed that the overhead for the _level_geometry()
> > > > > calls is ~0.4% even for a huge ugly rmap btree because most of the time
> > > > > gets spent in the delwri_submit_buffers at the end.
> > > > >
> > > >
> > > > It wasn't primarily a performance concern rather than a "this sure looks
> > > > like we call the function twice per loop for no reason" comment.
> > > > Something like the above might be more clear, but I need to make sure I
> > > > understand this loop first...
> > > >
> > > > Having stared at this some more, I _think_ I understand why this is
> > > > written as such. For the non-inode rooted case, the double call is
> > > > basically unnecessary so the whole loop could look something like this
> > > > (if we factored out the bmbt case):
> > > >
> > > > for (cur->bc_nlevels = 1; ...) {
> > > > xfs_btree_bload_level_geometry(...);
> > > > if (nr_this_level <= avg_per_block) {
> > > > nr_blocks++;
> > > > break;
> > > > }
> > > > cur->bc_nlevels++;
> > > > nr_this_level = level_blocks;
> > > > }
> > > >
> > > > Is that correct?
> > >
> > > Yes, that is correct for inode-rooted ("non-bmbt") btrees.
> > >
> > > > The bmbt case has these special cases where 1.) the bmbt root must be a
> > > > node block (not a leaf) and 2.) the root block has different size rules
> > > > than a typical node block because it's in the inode fork.
> > >
> > > Right.
> > >
> > > > The former
> > > > seems straightforward and explains the level != 0 check. The latter is
> > > > detected by the (level == ->bc_nlevels - 1) condition down in the
> > > > maxrecs code, so that means the order of ->bc_nlevels++ increment with
> > > > respect to the geometry call affects whether we check for a potential
> > > > bmbt root or regular node block (assuming level != 0).
> > >
> > > Right.
> > >
> > > > Hence, the bottom
> > > > part of the loop does the increment first and makes the geometry call
> > > > again... Am I following that correctly?
> > >
> > > Correct.
> > >
> > > > If so, I think at the very least the existing comments should start by
> > > > explaining the intentional construction of the loop and subtle ordering
> > > > requirements between ->bc_nlevels and the geometry calls for the bmbt.
> > > > In staring at it a bit more, I find something like the following more
> > > > clear even though it is more verbose:
> > > >
> > > > for (cur->bc_nlevels = 1; cur->bc_nlevels < XFS_BTREE_MAXLEVELS;) {
> > > > ...
> > > > xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > > &avg_per_block,&level_blocks,
> > > > &dontcare64);
> > > > if (<inode rooted>) {
> > > > /* bmbt root must be node format, skip check for level 0 */
> > > > if (level != 0 && nr_this_level <= avg_per_block) {
> > > > nr_blocks++;
> > > > break;
> > > > }
> > > > /*
> > > > * We have to calculate geometry for each bmbt level
> > > > * twice because there is a distinction between a bmbt
> > > > * root in an inode fork and a traditional node block.
> > > > * This distinction is made in the btree code based on
> > > > * whether level == ->bc_nlevels - 1. We aren't yet at
> > > > * the root, so bump ->bc_nevels and recalculate
> > > > * geometry for a traditional node block tree level.
> > >
> > > Oooh, I like this comment. I'll put that in, with a bit of rewording?
> > >
> > > /*
> > > * Otherwise, we have to store all the items for this
> > > * level in traditional btree blocks and therefore need
> > > * another level of btree to point to those blocks.
> > > *
> > > * We have to re-compute the geometry for each level of
> > > * an inode-rooted btree because the geometry differs
> > > * between a btree root in an inode fork and a
> > > * traditional btree block.
> > > *
> > > * This distinction is made in the btree code based on
> > > * whether level == ->bc_nlevels - 1. We know that we
> > > * aren't yet ready to populate the root, so increment
> > > * ->bc_nevels and recalculate the geometry for a
> > > * traditional block-based btree level.
> > > */
> > > cur->bc_nlevels++;
> > > xfs_btree_bload_level_geometry(...);
> >
> > One thing I would tweak (even when reading back my own comment) is to
> > say something like "We aren't ready to populate the root based on the
> > previous check ..." in that last sentence so it's clear the previous
> > check was the "root check." Otherwise that looks good to me, thanks!
>
> "This distinction is made in the btree code based on whether level ==
> bc_nlevels - 1. Based on the previous root block size check against the
> root block geometry, we know that we aren't yet ready to populate the
> root. Increment bc_nlevels and recalculate the geometry for a
> traditional block-based btree level."?
>
ACK
> --D
>
> > Brian
> >
> > >
> > >
> > > > */
> > > > cur->bc_nlevels++;
> > > > xfs_btree_bload_level_geometry();
> > > > } else {
> > > > if (nr_this_level <= avg_per_block) {
> > > > nr_blocks++;
> > > > break;
> > > > }
> > > > cur->bc_nlevels++;
> > > > }
> > > >
> > > > nr_blocks += level_blocks;
> > > > nr_this_level = level_blocks;
> > > > }
> > > >
> > > > The comments and whatnot could use massaging and perhaps it would still
> > > > be fine to factor out the root block check from the if/else, but that
> > > > illustrates the idea. Thoughts?
> > >
> > > That structure (and the more verbose commenting) looks good to me.
> > > Truth be told, it took me quite a while to work out all these weird
> > > kinks vs. the phase5.c code which never had to deal with bmbts.
> > >
> > > Onto the next email...
> > >
> > > --D
> > >
> > > > Brian
> > > >
> > > > > --D
> > > > >
> > > > > > Brian
> > > > > >
> > > > > > > >
> > > > > > > > > + xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > > > > > > > + &avg_per_block, &level_blocks, &dontcare64);
> > > > > > > > > + nr_blocks += level_blocks;
> > > > > > > > > + nr_this_level = level_blocks;
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > + if (cur->bc_nlevels == XFS_BTREE_MAXLEVELS)
> > > > > > > > > + return -EOVERFLOW;
> > > > > > > > > +
> > > > > > > > > + bbl->btree_height = cur->bc_nlevels;
> > > > > > > > > + if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> > > > > > > > > + bbl->nr_blocks = nr_blocks - 1;
> > > > > > > > > + else
> > > > > > > > > + bbl->nr_blocks = nr_blocks;
> > > > > > > > > + return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Bulk load a btree.
> > > > > > > > > + *
> > > > > > > > > + * Load @bbl->nr_records quantity of records into a btree using the supplied
> > > > > > > > > + * empty and staging btree cursor @cur and a @bbl that has been filled out by
> > > > > > > > > + * the xfs_btree_bload_compute_geometry function.
> > > > > > > > > + *
> > > > > > > > > + * The @bbl->get_data function must populate the cursor's bc_rec every time it
> > > > > > > > > + * is called. The @bbl->alloc_block function will be used to allocate new
> > > > > > > > > + * btree blocks. @priv is passed to both functions.
> > > > > > > > > + *
> > > > > > > > > + * Caller must ensure that @cur is a staging cursor. Any existing btree rooted
> > > > > > > > > + * in the fakeroot will be lost, so do not call this function twice.
> > > > > > > > > + */
> > > > > > > > > +int
> > > > > > > > > +xfs_btree_bload(
> > > > > > > > > + struct xfs_btree_cur *cur,
> > > > > > > > > + struct xfs_btree_bload *bbl,
> > > > > > > > > + void *priv)
> > > > > > > > > +{
> > > > > > > > > + union xfs_btree_ptr child_ptr;
> > > > > > > > > + union xfs_btree_ptr ptr;
> > > > > > > > > + struct xfs_buf *bp = NULL;
> > > > > > > > > + struct xfs_btree_block *block = NULL;
> > > > > > > > > + uint64_t nr_this_level = bbl->nr_records;
> > > > > > > > > + uint64_t blocks;
> > > > > > > > > + uint64_t i;
> > > > > > > > > + uint64_t blocks_with_extra;
> > > > > > > > > + uint64_t total_blocks = 0;
> > > > > > > > > + unsigned int avg_per_block;
> > > > > > > > > + unsigned int level = 0;
> > > > > > > > > + int ret;
> > > > > > > > > +
> > > > > > > > > + ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> > > > > > > > > +
> > > > > > > > > + INIT_LIST_HEAD(&bbl->buffers_list);
> > > > > > > > > + cur->bc_nlevels = bbl->btree_height;
> > > > > > > > > + xfs_btree_set_ptr_null(cur, &child_ptr);
> > > > > > > > > + xfs_btree_set_ptr_null(cur, &ptr);
> > > > > > > > > +
> > > > > > > > > + xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > > > > > > > + &avg_per_block, &blocks, &blocks_with_extra);
> > > > > > > > > +
> > > > > > > > > + /* Load each leaf block. */
> > > > > > > > > + for (i = 0; i < blocks; i++) {
> > > > > > > > > + unsigned int nr_this_block = avg_per_block;
> > > > > > > > > +
> > > > > > > > > + if (i < blocks_with_extra)
> > > > > > > > > + nr_this_block++;
> > > > > > > > > +
> > > > > > > > > + ret = xfs_btree_bload_prep_block(cur, bbl, level,
> > > > > > > > > + nr_this_block, &ptr, &bp, &block, priv);
> > > > > > > > > + if (ret)
> > > > > > > > > + return ret;
> > > > > > > > > +
> > > > > > > > > + trace_xfs_btree_bload_block(cur, level, i, blocks, &ptr,
> > > > > > > > > + nr_this_block);
> > > > > > > > > +
> > > > > > > > > + ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_data,
> > > > > > > > > + block, priv);
> > > > > > > > > + if (ret)
> > > > > > > > > + goto out;
> > > > > > > > > +
> > > > > > > > > + /* Record the leftmost pointer to start the next level. */
> > > > > > > > > + if (i == 0)
> > > > > > > > > + xfs_btree_copy_ptrs(cur, &child_ptr, &ptr, 1);
> > > > > > > >
> > > > > > > > "leftmost pointer" refers to the leftmost leaf block..?
> > > > > > >
> > > > > > > Yes. "Record the leftmost leaf pointer so we know where to start with
> > > > > > > the first node level." ?
> > > > > > >
> > > > > > > > > + }
> > > > > > > > > + total_blocks += blocks;
> > > > > > > > > + xfs_btree_bload_drop_buf(bbl, cur->bc_tp, &bp);
> > > > > > > > > +
> > > > > > > > > + /* Populate the internal btree nodes. */
> > > > > > > > > + for (level = 1; level < cur->bc_nlevels; level++) {
> > > > > > > > > + union xfs_btree_ptr first_ptr;
> > > > > > > > > +
> > > > > > > > > + nr_this_level = blocks;
> > > > > > > > > + block = NULL;
> > > > > > > > > + xfs_btree_set_ptr_null(cur, &ptr);
> > > > > > > > > +
> > > > > > > > > + xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > > > > > > > + &avg_per_block, &blocks, &blocks_with_extra);
> > > > > > > > > +
> > > > > > > > > + /* Load each node block. */
> > > > > > > > > + for (i = 0; i < blocks; i++) {
> > > > > > > > > + unsigned int nr_this_block = avg_per_block;
> > > > > > > > > +
> > > > > > > > > + if (i < blocks_with_extra)
> > > > > > > > > + nr_this_block++;
> > > > > > > > > +
> > > > > > > > > + ret = xfs_btree_bload_prep_block(cur, bbl, level,
> > > > > > > > > + nr_this_block, &ptr, &bp, &block,
> > > > > > > > > + priv);
> > > > > > > > > + if (ret)
> > > > > > > > > + return ret;
> > > > > > > > > +
> > > > > > > > > + trace_xfs_btree_bload_block(cur, level, i, blocks,
> > > > > > > > > + &ptr, nr_this_block);
> > > > > > > > > +
> > > > > > > > > + ret = xfs_btree_bload_node(cur, nr_this_block,
> > > > > > > > > + &child_ptr, block);
> > > > > > > > > + if (ret)
> > > > > > > > > + goto out;
> > > > > > > > > +
> > > > > > > > > + /*
> > > > > > > > > + * Record the leftmost pointer to start the next level.
> > > > > > > > > + */
> > > > > > > >
> > > > > > > > And the same thing here. I think the generic ptr name is a little
> > > > > > > > confusing, though I don't have a better suggestion. I think it would
> > > > > > > > help if the comments were more explicit to say something like: "ptr
> > > > > > > > refers to the current block addr. Save the first block in the current
> > > > > > > > level so the next level up knows where to start looking for keys."
> > > > > > >
> > > > > > > Yes, I'll do that:
> > > > > > >
> > > > > > > "Record the leftmost node pointer so that we know where to start the
> > > > > > > next node level above this one."
> > > > > > >
> > > > > > > Thanks for reviewing!
> > > > > > >
> > > > > > > --D
> > > > > > >
> > > > > > > > Brian
> > > > > > > >
> > > > > > > > > + if (i == 0)
> > > > > > > > > + xfs_btree_copy_ptrs(cur, &first_ptr, &ptr, 1);
> > > > > > > > > + }
> > > > > > > > > + total_blocks += blocks;
> > > > > > > > > + xfs_btree_bload_drop_buf(bbl, cur->bc_tp, &bp);
> > > > > > > > > + xfs_btree_copy_ptrs(cur, &child_ptr, &first_ptr, 1);
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > + /* Initialize the new root. */
> > > > > > > > > + if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> > > > > > > > > + ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
> > > > > > > > > + cur->bc_private.b.ifake->if_levels = cur->bc_nlevels;
> > > > > > > > > + cur->bc_private.b.ifake->if_blocks = total_blocks - 1;
> > > > > > > > > + } else {
> > > > > > > > > + cur->bc_private.a.afake->af_root = be32_to_cpu(ptr.s);
> > > > > > > > > + cur->bc_private.a.afake->af_levels = cur->bc_nlevels;
> > > > > > > > > + cur->bc_private.a.afake->af_blocks = total_blocks;
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > + /*
> > > > > > > > > + * Write the new blocks to disk. If the ordered list isn't empty after
> > > > > > > > > + * that, then something went wrong and we have to fail. This should
> > > > > > > > > + * never happen, but we'll check anyway.
> > > > > > > > > + */
> > > > > > > > > + ret = xfs_buf_delwri_submit(&bbl->buffers_list);
> > > > > > > > > + if (ret)
> > > > > > > > > + goto out;
> > > > > > > > > + if (!list_empty(&bbl->buffers_list)) {
> > > > > > > > > + ASSERT(list_empty(&bbl->buffers_list));
> > > > > > > > > + ret = -EIO;
> > > > > > > > > + }
> > > > > > > > > +out:
> > > > > > > > > + xfs_buf_delwri_cancel(&bbl->buffers_list);
> > > > > > > > > + if (bp)
> > > > > > > > > + xfs_trans_brelse(cur->bc_tp, bp);
> > > > > > > > > + return ret;
> > > > > > > > > +}
> > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > > > > > > > > index 2965ed663418..51720de366ae 100644
> > > > > > > > > --- a/fs/xfs/libxfs/xfs_btree.h
> > > > > > > > > +++ b/fs/xfs/libxfs/xfs_btree.h
> > > > > > > > > @@ -578,4 +578,50 @@ void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
> > > > > > > > > void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, int whichfork,
> > > > > > > > > const struct xfs_btree_ops *ops);
> > > > > > > > >
> > > > > > > > > +typedef int (*xfs_btree_bload_get_fn)(struct xfs_btree_cur *cur, void *priv);
> > > > > > > > > +typedef int (*xfs_btree_bload_alloc_block_fn)(struct xfs_btree_cur *cur,
> > > > > > > > > + union xfs_btree_ptr *ptr, void *priv);
> > > > > > > > > +typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur,
> > > > > > > > > + unsigned int nr_this_level, void *priv);
> > > > > > > > > +
> > > > > > > > > +/* Bulk loading of staged btrees. */
> > > > > > > > > +struct xfs_btree_bload {
> > > > > > > > > + /* Buffer list for delwri_queue. */
> > > > > > > > > + struct list_head buffers_list;
> > > > > > > > > +
> > > > > > > > > + /* Function to store a record in the cursor. */
> > > > > > > > > + xfs_btree_bload_get_fn get_data;
> > > > > > > > > +
> > > > > > > > > + /* Function to allocate a block for the btree. */
> > > > > > > > > + xfs_btree_bload_alloc_block_fn alloc_block;
> > > > > > > > > +
> > > > > > > > > + /* Function to compute the size of the in-core btree root block. */
> > > > > > > > > + xfs_btree_bload_iroot_size_fn iroot_size;
> > > > > > > > > +
> > > > > > > > > + /* Number of records the caller wants to store. */
> > > > > > > > > + uint64_t nr_records;
> > > > > > > > > +
> > > > > > > > > + /* Number of btree blocks needed to store those records. */
> > > > > > > > > + uint64_t nr_blocks;
> > > > > > > > > +
> > > > > > > > > + /*
> > > > > > > > > + * Number of free records to leave in each leaf block. If this (or
> > > > > > > > > + * any of the slack values) are negative, this will be computed to
> > > > > > > > > + * be halfway between maxrecs and minrecs. This typically leaves the
> > > > > > > > > + * block 75% full.
> > > > > > > > > + */
> > > > > > > > > + int leaf_slack;
> > > > > > > > > +
> > > > > > > > > + /* Number of free keyptrs to leave in each node block. */
> > > > > > > > > + int node_slack;
> > > > > > > > > +
> > > > > > > > > + /* Computed btree height. */
> > > > > > > > > + unsigned int btree_height;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +int xfs_btree_bload_compute_geometry(struct xfs_btree_cur *cur,
> > > > > > > > > + struct xfs_btree_bload *bbl, uint64_t nr_records);
> > > > > > > > > +int xfs_btree_bload(struct xfs_btree_cur *cur, struct xfs_btree_bload *bbl,
> > > > > > > > > + void *priv);
> > > > > > > > > +
> > > > > > > > > #endif /* __XFS_BTREE_H__ */
> > > > > > > > > diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> > > > > > > > > index bc85b89f88ca..9b5e58a92381 100644
> > > > > > > > > --- a/fs/xfs/xfs_trace.c
> > > > > > > > > +++ b/fs/xfs/xfs_trace.c
> > > > > > > > > @@ -6,6 +6,7 @@
> > > > > > > > > #include "xfs.h"
> > > > > > > > > #include "xfs_fs.h"
> > > > > > > > > #include "xfs_shared.h"
> > > > > > > > > +#include "xfs_bit.h"
> > > > > > > > > #include "xfs_format.h"
> > > > > > > > > #include "xfs_log_format.h"
> > > > > > > > > #include "xfs_trans_resv.h"
> > > > > > > > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > > > > > > > index 7e162ca80c92..69e8605f9f97 100644
> > > > > > > > > --- a/fs/xfs/xfs_trace.h
> > > > > > > > > +++ b/fs/xfs/xfs_trace.h
> > > > > > > > > @@ -35,6 +35,7 @@ struct xfs_icreate_log;
> > > > > > > > > struct xfs_owner_info;
> > > > > > > > > struct xfs_trans_res;
> > > > > > > > > struct xfs_inobt_rec_incore;
> > > > > > > > > +union xfs_btree_ptr;
> > > > > > > > >
> > > > > > > > > DECLARE_EVENT_CLASS(xfs_attr_list_class,
> > > > > > > > > TP_PROTO(struct xfs_attr_list_context *ctx),
> > > > > > > > > @@ -3655,6 +3656,90 @@ TRACE_EVENT(xfs_btree_commit_ifakeroot,
> > > > > > > > > __entry->blocks)
> > > > > > > > > )
> > > > > > > > >
> > > > > > > > > +TRACE_EVENT(xfs_btree_bload_level_geometry,
> > > > > > > > > + TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> > > > > > > > > + uint64_t nr_this_level, unsigned int nr_per_block,
> > > > > > > > > + unsigned int desired_npb, uint64_t blocks,
> > > > > > > > > + uint64_t blocks_with_extra),
> > > > > > > > > + TP_ARGS(cur, level, nr_this_level, nr_per_block, desired_npb, blocks,
> > > > > > > > > + blocks_with_extra),
> > > > > > > > > + TP_STRUCT__entry(
> > > > > > > > > + __field(dev_t, dev)
> > > > > > > > > + __field(xfs_btnum_t, btnum)
> > > > > > > > > + __field(unsigned int, level)
> > > > > > > > > + __field(unsigned int, nlevels)
> > > > > > > > > + __field(uint64_t, nr_this_level)
> > > > > > > > > + __field(unsigned int, nr_per_block)
> > > > > > > > > + __field(unsigned int, desired_npb)
> > > > > > > > > + __field(unsigned long long, blocks)
> > > > > > > > > + __field(unsigned long long, blocks_with_extra)
> > > > > > > > > + ),
> > > > > > > > > + TP_fast_assign(
> > > > > > > > > + __entry->dev = cur->bc_mp->m_super->s_dev;
> > > > > > > > > + __entry->btnum = cur->bc_btnum;
> > > > > > > > > + __entry->level = level;
> > > > > > > > > + __entry->nlevels = cur->bc_nlevels;
> > > > > > > > > + __entry->nr_this_level = nr_this_level;
> > > > > > > > > + __entry->nr_per_block = nr_per_block;
> > > > > > > > > + __entry->desired_npb = desired_npb;
> > > > > > > > > + __entry->blocks = blocks;
> > > > > > > > > + __entry->blocks_with_extra = blocks_with_extra;
> > > > > > > > > + ),
> > > > > > > > > + TP_printk("dev %d:%d btree %s level %u/%u nr_this_level %llu nr_per_block %u desired_npb %u blocks %llu blocks_with_extra %llu",
> > > > > > > > > + MAJOR(__entry->dev), MINOR(__entry->dev),
> > > > > > > > > + __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > > > > > > > > + __entry->level,
> > > > > > > > > + __entry->nlevels,
> > > > > > > > > + __entry->nr_this_level,
> > > > > > > > > + __entry->nr_per_block,
> > > > > > > > > + __entry->desired_npb,
> > > > > > > > > + __entry->blocks,
> > > > > > > > > + __entry->blocks_with_extra)
> > > > > > > > > +)
> > > > > > > > > +
> > > > > > > > > +TRACE_EVENT(xfs_btree_bload_block,
> > > > > > > > > + TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> > > > > > > > > + uint64_t block_idx, uint64_t nr_blocks,
> > > > > > > > > + union xfs_btree_ptr *ptr, unsigned int nr_records),
> > > > > > > > > + TP_ARGS(cur, level, block_idx, nr_blocks, ptr, nr_records),
> > > > > > > > > + TP_STRUCT__entry(
> > > > > > > > > + __field(dev_t, dev)
> > > > > > > > > + __field(xfs_btnum_t, btnum)
> > > > > > > > > + __field(unsigned int, level)
> > > > > > > > > + __field(unsigned long long, block_idx)
> > > > > > > > > + __field(unsigned long long, nr_blocks)
> > > > > > > > > + __field(xfs_agnumber_t, agno)
> > > > > > > > > + __field(xfs_agblock_t, agbno)
> > > > > > > > > + __field(unsigned int, nr_records)
> > > > > > > > > + ),
> > > > > > > > > + TP_fast_assign(
> > > > > > > > > + __entry->dev = cur->bc_mp->m_super->s_dev;
> > > > > > > > > + __entry->btnum = cur->bc_btnum;
> > > > > > > > > + __entry->level = level;
> > > > > > > > > + __entry->block_idx = block_idx;
> > > > > > > > > + __entry->nr_blocks = nr_blocks;
> > > > > > > > > + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> > > > > > > > > + xfs_fsblock_t fsb = be64_to_cpu(ptr->l);
> > > > > > > > > +
> > > > > > > > > + __entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsb);
> > > > > > > > > + __entry->agbno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsb);
> > > > > > > > > + } else {
> > > > > > > > > + __entry->agno = cur->bc_private.a.agno;
> > > > > > > > > + __entry->agbno = be32_to_cpu(ptr->s);
> > > > > > > > > + }
> > > > > > > > > + __entry->nr_records = nr_records;
> > > > > > > > > + ),
> > > > > > > > > + TP_printk("dev %d:%d btree %s level %u block %llu/%llu fsb (%u/%u) recs %u",
> > > > > > > > > + MAJOR(__entry->dev), MINOR(__entry->dev),
> > > > > > > > > + __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > > > > > > > > + __entry->level,
> > > > > > > > > + __entry->block_idx,
> > > > > > > > > + __entry->nr_blocks,
> > > > > > > > > + __entry->agno,
> > > > > > > > > + __entry->agbno,
> > > > > > > > > + __entry->nr_records)
> > > > > > > > > +)
> > > > > > > > > +
> > > > > > > > > #endif /* _TRACE_XFS_H */
> > > > > > > > >
> > > > > > > > > #undef TRACE_INCLUDE_PATH
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
next prev parent reply other threads:[~2020-03-07 12:40 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-04 3:28 [PATCH v3 0/4] xfs: btree bulk loading Darrick J. Wong
2020-03-04 3:28 ` [PATCH 1/4] xfs: introduce fake roots for ag-rooted btrees Darrick J. Wong
2020-03-04 18:21 ` Brian Foster
2020-03-04 23:34 ` Darrick J. Wong
2020-03-04 23:53 ` Dave Chinner
2020-03-05 1:23 ` Darrick J. Wong
2020-03-05 14:30 ` Brian Foster
2020-03-05 17:39 ` Darrick J. Wong
2020-03-05 1:20 ` Dave Chinner
2020-03-05 1:23 ` Darrick J. Wong
2020-03-04 3:28 ` [PATCH 2/4] xfs: introduce fake roots for inode-rooted btrees Darrick J. Wong
2020-03-04 23:40 ` Darrick J. Wong
2020-03-04 3:28 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2020-03-04 18:21 ` Brian Foster
2020-03-05 1:22 ` Darrick J. Wong
2020-03-05 14:30 ` Brian Foster
2020-03-05 18:13 ` Darrick J. Wong
2020-03-06 14:22 ` Brian Foster
2020-03-06 16:27 ` Darrick J. Wong
2020-03-06 17:21 ` Brian Foster
2020-03-06 20:14 ` Darrick J. Wong
2020-03-07 12:40 ` Brian Foster [this message]
2020-03-05 23:59 ` Darrick J. Wong
2020-03-06 14:23 ` Brian Foster
2020-03-06 16:51 ` Darrick J. Wong
2020-03-06 17:25 ` Brian Foster
2020-03-06 19:55 ` Darrick J. Wong
2020-03-04 3:28 ` [PATCH 4/4] xfs: support staging cursors for per-AG btree types Darrick J. Wong
2020-03-05 14:30 ` Brian Foster
2020-03-05 18:18 ` Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2020-01-01 1:00 [PATCH v2 0/4] xfs: btree bulk loading Darrick J. Wong
2020-01-01 1:01 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2019-10-29 23:30 [PATCH v2 0/4] xfs: btree bulk loading Darrick J. Wong
2019-10-29 23:30 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2019-10-09 16:47 [PATCH 0/4] xfs: btree bulk loading Darrick J. Wong
2019-10-09 16:48 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2019-10-16 15:26 ` Brian Foster
2019-10-16 18:15 ` Darrick J. Wong
2019-10-16 21:07 ` Brian Foster
2019-10-17 0:40 ` Darrick J. Wong
2019-10-17 9:32 ` Brian Foster
2019-10-17 19:06 ` Darrick J. Wong
2019-10-18 14:38 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200307124041.GA27459@bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).