From: Chandan Rajendra <chandan@linux.ibm.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Chandan Rajendra <chandanrlinux@gmail.com>,
linux-xfs@vger.kernel.org, david@fromorbit.com,
darrick.wong@oracle.com, amir73il@gmail.com
Subject: Re: [PATCH V4 RESEND 7/7] xfs: Fix log reservation calculation for xattr insert operation
Date: Thu, 27 Feb 2020 14:44:11 +0530 [thread overview]
Message-ID: <1915611.aRCDV9Np7A@localhost.localdomain> (raw)
In-Reply-To: <20200226185050.GD19695@bfoster>
On Thursday, February 27, 2020 12:20 AM Brian Foster wrote:
> On Wed, Feb 26, 2020 at 04:51:13PM +0530, Chandan Rajendra wrote:
> > On Tuesday, February 25, 2020 10:49 PM Brian Foster wrote:
> > > On Mon, Feb 24, 2020 at 09:30:44AM +0530, Chandan Rajendra wrote:
> > > > Log space reservation for xattr insert operation can be divided into two
> > > > parts,
> > > > 1. Mount time
> > > > - Inode
> > > > - Superblock for accounting space allocations
> > > > - AGF for accounting space used by count, block number, rmap and refcnt
> > > > btrees.
> > > >
> > > > 2. The remaining log space can only be calculated at run time because,
> > > > - A local xattr can be large enough to cause a double split of the dabtree.
> > > > - The value of the xattr can be large enough to be stored in remote
> > > > blocks. The contents of the remote blocks are not logged.
> > > >
> > > > The log space reservation could be,
> > > > - 2 * XFS_DA_NODE_MAXDEPTH number of blocks. Additional XFS_DA_NODE_MAXDEPTH
> > > > number of blocks are required if xattr is large enough to cause another
> > > > split of the dabtree path from root to leaf block.
> > > > - BMBT blocks for storing (2 * XFS_DA_NODE_MAXDEPTH) record
> > > > entries. Additional XFS_DA_NODE_MAXDEPTH number of blocks are required in
> > > > case of a double split of the dabtree path from root to leaf blocks.
> > > > - Space for logging blocks of count, block number, rmap and refcnt btrees.
> > > >
> > > > Presently, mount time log reservation includes block count required for a
> > > > single split of the dabtree. The dabtree block count is also taken into
> > > > account by xfs_attr_calc_size().
> > > >
> > > > Also, AGF log space reservation isn't accounted for.
> > > >
> > > > Due to the reasons mentioned above, log reservation calculation for xattr
> > > > insert operation gives an incorrect value.
> > > >
> > > > Apart from the above, xfs_log_calc_max_attrsetm_res() passes byte count as
> > > > an argument to XFS_NEXTENTADD_SPACE_RES() instead of block count.
> > > >
> > > > To fix these issues, this commit changes xfs_attr_calc_size() to also
> > > > calculate the number of dabtree blocks that need to be logged.
> > > >
> > > > xfs_attr_set() uses the following values computed by xfs_attr_calc_size()
> > > > 1. The number of dabtree blocks that need to be logged.
> > > > 2. The number of remote blocks that need to be allocated.
> > > > 3. The number of dabtree blocks that need to be allocated.
> > > > 4. The number of bmbt blocks that need to be allocated.
> > > > 5. The total number of blocks that need to be allocated.
> > > >
> > > > ... to compute number of bytes that need to be reserved in the log.
> > > >
> > > > This commit also modifies xfs_log_calc_max_attrsetm_res() to invoke
> > > > xfs_attr_calc_size() to obtain the number of blocks to be logged which it uses
> > > > to figure out the total number of bytes to be logged.
> > > >
> > > > Signed-off-by: Chandan Rajendra <chandanrlinux@gmail.com>
> > > > ---
> > > > fs/xfs/libxfs/xfs_attr.c | 7 +++--
> > > > fs/xfs/libxfs/xfs_attr.h | 3 ++
> > > > fs/xfs/libxfs/xfs_log_rlimit.c | 16 +++++------
> > > > fs/xfs/libxfs/xfs_trans_resv.c | 50 ++++++++++++++++------------------
> > > > fs/xfs/libxfs/xfs_trans_resv.h | 2 ++
> > > > 5 files changed, 41 insertions(+), 37 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index 1d62ce80d7949..f056f8597ee03 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > @@ -182,9 +182,12 @@ xfs_attr_calc_size(
> > > > size = xfs_attr_leaf_newentsize(mp->m_attr_geo, namelen, valuelen,
> > > > local);
> > > > resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK);
> > > > + resv->log_dablks = 2 * resv->total_dablks;
> > > > +
> > > > if (*local) {
> > > > if (size > (blksize / 2)) {
> > > > /* Double split possible */
> > > > + resv->log_dablks += resv->total_dablks;
> > >
> > > I'm not quite clear on why log is double the total above, then we add an
> > > increment of total here. Can you elaborate on this?
> >
> > - resv->total_dablks = XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK);
> >
> > This gives the number of dabtree blocks that we might need to allocate. If
> > we do indeed allocate resv->total_dablks worth of blocks then it would mean
> > that we would have split blocks from the root node to the leaf node of the
> > dabtree. We would then need to log the following blocks,
> > - Blocks belonging to the original path from root node to the leaf node.
> > - The new set of blocks from the root node to the leaf node which resulted
> > from splitting the corresponding blocks in the original path.
> > Thus the number of blocks to be logged is 2 * resv->total_dablks.
> >
>
> Got it.
>
> > - Double split
> > If the size of the xattr is large enough to cause another split, then we
> > would need yet another set of new blocks for representing the path from root
> > to the leaf. This is taken into consideration by 'resv->total_dablks *= 2'.
> > These new blocks might also need to logged. So resv->log_dablks should be
> > incremented by number of blocks b/w root node and the leaf. I now feel that
> > the statement could be rewritten in the following manner, 'resv->log_dablks
> > += XFS_DAENTER_BLOCKS(mp, XFS_ATTR_FORK)' if that makes it easier to
> > understand.
> >
>
> Yeah, it might be clearer overall to not intertwine the fields.
>
> > >
> > > > resv->total_dablks *= 2;
> > > > }
> > > > resv->rmt_blks = 0;
> > > > @@ -349,9 +352,7 @@ xfs_attr_set(
> > > > return error;
> > > > }
> > > >
> > > > - tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
> > > > - M_RES(mp)->tr_attrsetrt.tr_logres *
> > > > - args->total;
> > > > + tres.tr_logres = xfs_calc_attr_res(mp, &resv);
> > >
> > > So instead of using the runtime reservation calculation, we call this
> > > new function that uses the block res structure and calculates log
> > > reservation for us. Seems like a decent cleanup, but I'm still having
> > > some trouble determining what is cleanup vs. what is fixing the res
> > > calculation.
> >
> > Log reservation calculation for xattr set operation in XFS is
> > incorrect because,
> >
> > - xfs_attr_set() computes the total log space reservation as the
> > sum of
> > 1. Mount time log reservation (M_RES(mp)->tr_attrsetm.tr_logres)
> > Here, XFS incorrectly includes the number of blocks from root
> > node to the leaf of a dabtree. This is incorrect because the
> > number of such paths (from root node to leaf) depends on the
> > size of the xattr. If the xattr is local but large enough to
> > cause a double split then we would need three times the
> > number of blocks in a path from root node to leaf. Hence the
> > number of dabtree blocks that need to be logged is something
> > that cannot be calculated at mount time.
> >
>
> Ok, so the dabtree portion doesn't properly account for dabtree splits
> and cannot do so as part of a mount time calculation.
>
> > Also, the mount log reservation calcuation does not account
> > for log space required for logging the AGF.
> >
>
> Ok.
>
> > 2. Run time log reservatation
> > This is calculated by multiplying two components,
> > - The first component consists of sum of the following
> > - Superblock
> > - The number of bytes for logging a single path from root
> > to leaf of a bmbt tree.
> > - The second component consists of,
> > - The number of dabtree blocks
> > - The number of bmbt blocks required for mapping new
> > dabtree blocks.
> > Here Bmbt blocks gets accounted twice. Also it does not make sense to
> > multiply these two components.
> >
>
> Hm, Ok. I think I follow, though I'm still curious if this was a
> reproducible problem or not.
>
> > The other bug is related to calcuation performed in
> > xfs_log_calc_max_attrsetm_res(),
> > 1. The call to XFS_DAENTER_SPACE_RES() macro returns the number of
> > blocks required for a single split of dabtree and the
> > corresponding number of bmbt blocks required for mapping the new
> > dabtree blocks.
> > 2. The space occupied by the attribute value is treated as if it
> > were a remote attribute and the space for it along with the
> > corresponding bmbt blocks required is added to the number of
> > blocks obtained in step 1.
> > 3. This sum is then multiplied with the runtime reservation. Here
> > again Bmbt blocks gets accounted twice, once from the 'run time
> > reservation' and the other from the call to
> > XFS_DAENTER_SPACE_RES().
> > 4. The result from step 3 is added to 'mount time log
> > reservation'. This means that the log space reservation is
> > accounted twice for dabtree blocks.
> >
> > To solve these problems, the fix consists of,
> > 1. Mounting time log reservation now includes only
> > - Inode
> > - Superblock
> > - AGF
> > i.e. we don't consider the dabtree logging space requirement
> > when calculating mount time log reservation.
> >
> > 2. Run time log reservation
> > To calculate run time log reservation we change xfs_attr_calc_size()
> > to return,
> > 1. Number of Dabtree blocks required.
> > 2. Number of Dabtree blocks that might need to be logged.
> > 3. Number of remote blocks.
> > 4. Number of Bmbt blocks that might be required.
> >
>
> Can we further break down some of these changes into smaller patches
> that either fix one issue or do some refactoring? For example, one patch
> could move the existing rt reservation out of ->tr_attrsetrt (removing
> it) and establish the new xfs_calc_attr_res() function without changing
> the reservation. Another patch could move the dabtree component from the
> mount time to the runtime calculation. It's not clear to me if the
> individual components of the new rt calculation can be further broken
> down from there without risk of transient regression, but it seems we
> should at least be able to rework the reservation calculation in a
> single patch that does so from first principles with the necessary
> refactoring work to accommodate the proper mount/run time components
> already done..
>
Sure, I will try to break it down into as many logical patches as possible.
--
chandan
prev parent reply other threads:[~2020-02-27 9:24 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-24 4:00 [PATCH V4 RESEND 0/7] Fix log reservation calculation for xattr insert operation Chandan Rajendra
2020-02-24 4:00 ` [PATCH V4 RESEND 1/7] xfs: Pass xattr name and value length explicitly to xfs_attr_leaf_newentsize Chandan Rajendra
2020-02-25 16:11 ` Brian Foster
2020-02-26 16:58 ` Christoph Hellwig
2020-02-27 9:27 ` Chandan Rajendra
2020-02-24 4:00 ` [PATCH V4 RESEND 2/7] xfs: xfs_attr_calc_size: Use local variables to track individual space components Chandan Rajendra
2020-02-25 16:11 ` Brian Foster
2020-02-26 10:38 ` Chandan Rajendra
2020-02-24 4:00 ` [PATCH V4 RESEND 3/7] xfs: xfs_attr_calc_size: Calculate Bmbt blks only once Chandan Rajendra
2020-02-25 16:11 ` Brian Foster
2020-02-26 15:03 ` Chandan Rajendra
2020-02-26 16:42 ` Brian Foster
2020-02-27 8:59 ` Chandan Rajendra
2020-02-27 11:53 ` Brian Foster
2020-02-27 13:38 ` Chandan Rajendra
2020-02-24 4:00 ` [PATCH V4 RESEND 4/7] xfs: Introduce struct xfs_attr_set_resv Chandan Rajendra
2020-02-25 16:27 ` Brian Foster
2020-02-26 10:40 ` Chandan Rajendra
2020-02-24 4:00 ` [PATCH V4 RESEND 5/7] xfs: xfs_attr_calc_size: Explicitly pass mp, namelen and valuelen args Chandan Rajendra
2020-02-25 16:27 ` Brian Foster
2020-02-24 4:00 ` [PATCH V4 RESEND 6/7] xfs: Make xfs_attr_calc_size() non-static Chandan Rajendra
2020-02-25 16:24 ` Darrick J. Wong
2020-02-24 4:00 ` [PATCH V4 RESEND 7/7] xfs: Fix log reservation calculation for xattr insert operation Chandan Rajendra
2020-02-25 17:19 ` Brian Foster
2020-02-26 11:21 ` Chandan Rajendra
2020-02-26 18:50 ` Brian Foster
2020-02-27 9:14 ` Chandan Rajendra [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1915611.aRCDV9Np7A@localhost.localdomain \
--to=chandan@linux.ibm.com \
--cc=amir73il@gmail.com \
--cc=bfoster@redhat.com \
--cc=chandanrlinux@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox