From: Chandan Rajendra <chandan@linux.ibm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Brian Foster <bfoster@redhat.com>,
Chandan Rajendra <chandanrlinux@gmail.com>,
linux-xfs@vger.kernel.org, darrick.wong@oracle.com
Subject: Re: [PATCH 1/2] xfs: Fix log reservation calculation for xattr insert operation
Date: Tue, 07 Apr 2020 10:41:54 +0530 [thread overview]
Message-ID: <1729748.3DUTM0xfxg@localhost.localdomain> (raw)
In-Reply-To: <20200406225758.GC21885@dread.disaster.area>
On Tuesday, April 7, 2020 4:27 AM Dave Chinner wrote:
> On Mon, Apr 06, 2020 at 11:25:40AM -0400, Brian Foster wrote:
> > On Sat, Apr 04, 2020 at 02:22:02PM +0530, Chandan Rajendra wrote:
> > > Log space reservation for xattr insert operation is divided into two
> > > parts,
> > > 1. Mount time
> > > - Inode
> > > - Superblock for accounting space allocations
> > > - AGF for accounting space used by count, block number, rmap and refcnt
> > > btrees.
> > >
> > > 2. The remaining log space can only be calculated at run time because,
> > > - A local xattr can be large enough to cause a double split of the da
> > > btree.
> > > - The value of the xattr can be large enough to be stored in remote
> > > blocks. The contents of the remote blocks are not logged.
> > >
> > > The log space reservation could be,
> > > - (XFS_DA_NODE_MAXDEPTH + 1) number of blocks. The "+ 1" is required in
> > > case xattr is large enough to cause another split of the da btree path.
> > > - BMBT blocks for storing (XFS_DA_NODE_MAXDEPTH + 1) record
> > > entries.
> > > - Space for logging blocks of count, block number, rmap and refcnt btrees.
> > >
> > > At present, mount time log reservation includes block count required for a
> > > single split of the dabtree. The dabtree block count is also taken into
> > > account by xfs_attr_calc_size().
> > >
> > > Also, AGF log space reservation isn't accounted for.
> > >
> > > Due to the reasons mentioned above, log reservation calculation for xattr
> > > insert operation gives an incorrect value.
> > >
> > > Apart from the above, xfs_log_calc_max_attrsetm_res() passes byte count as
> > > an argument to XFS_NEXTENTADD_SPACE_RES() instead of block count.
> > >
> > > The above mentioned inconsistencies were discoverd when trying to mount a
> > > modified XFS filesystem which uses a 32-bit value as xattr extent counter
> > > caused the following warning messages to be printed on the console,
> > >
> > > XFS (loop0): Mounting V4 Filesystem
> > > XFS (loop0): Log size 2560 blocks too small, minimum size is 4035 blocks
> > > XFS (loop0): Log size out of supported range.
> > > XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
> > > XFS (loop0): Ending clean mount
> > >
> > > To fix the inconsistencies described above, this commit replaces 'mount'
> > > and 'runtime' components with just one static reservation. The new
> > > reservation calculates the log space for the worst case possible i.e. it
> > > considers,
> > > 1. Double split of the da btree.
> > > This happens for large local xattrs.
> > > 2. Bmbt blocks required for mapping the contents of a maximum
> > > sized (i.e. XATTR_SIZE_MAX bytes in size) remote attribute.
> > >
> >
> > Hmm.. so the last I recall looking at this, the change was more around
> > refactoring the mount vs. runtime portions of the xattr reservation to
> > be more accurate. This approach eliminates the runtime portion for a
> > 100% mount time reservation calculation. Can you elaborate on why the
> > change in approach? Also, it looks like at least one tradeoff here is
> > reservation size (to the point where we increase min log size on small
> > filesystems?).
>
> What's not in this commit message is that this was actually my idea
> that I had when Chandan contacted me off list about his refactoring
> of the reservation blowing out reservations for attribute operations
> by a factor of 10.
>
> My fault, I should have pushed the discussion back to the mailing
> list rather than answering directly. I'll repeat a lot of my
> analysis from that discussion below to get everyone up to speed.
>
> [ Chandan, in future I'm going to insist that all your XFS questions
> need to be on the list, so that everyone sees the discusions and
> understands the reasons why things are suggested. It's also a good
> idea to use "suggested-by" when presenting code based on other
> people's ideas, just so that everyone knows that there were more
> people involved than just yourself... ]
>
> So, when I went through all the reservations changes that Chandan
> had made I realised that the current code was wrong in lots of ways,
> and when I looked at it from the fundamental changes being made the
> mount vs runtime split made no sense at all.
>
> Such as:
>
> - the dabtree double split was a double _leaf block split only_. It
> is not a full tree split, and can only result in a single parent
> split because there is only one path update after the double leaf
> split has been done. Hence it can only do one full dabtree split
> and the code in xfs_attr_calc_size() that doubles the block count
> reservation for the double leaf split is wrong. We only need on
> extra block, and that is just:
>
> dgc> #define XFS_DAENTER_DBS(mp,w) \
> dgc> - (XFS_DA_NODE_MAXDEPTH + (((w) == XFS_DATA_FORK) ? 2 : 0))
> dgc> + (XFS_DA_NODE_MAXDEPTH + (((w) == XFS_DATA_FORK) ? 2 : 1))
>
> [ Note: The "+ 2" for the data fork reservation is for the dir data
> block and a potential free space index block that get added in a
> typical directory entry addition. ]
>
> - remote attributes are not logged, so only the BMBT block
> reservation is needed for that extent allocation. i.e. we need to
> reserve the blocks for the xattr, but we don't need log space
> for them. xfs_attr_calc_size() gets this right,
> xfs_log_calc_max_attrsetm_res() gets this wrong in that it does
> not take into account remote attr BMBT blocks at all.
>
> - xfs_attr_calc_size() calculates the number of blocks we need to
> allocate for the attr operation, not the number of blocks we need
> to log, hence can't be used to replace
> xfs_log_calc_max_attrsetm_res().
>
> - the runtime reservation is just a BMBT block reservation for a
> single block to be allocated. multiplying the number of blocks we
> need to allocate by M_RES(mp)->tr_attrsetrt.tr_logres to get the
> log reservation is wrong. We are not doing a full BMBT split for
> every block in the attribute we modify, so the log reservation is
> massively oversized by xfs_log_calc_max_attrsetm_res() and
> xfs_attr_set() by multipling the block count (including BMBT
> blocks we allocate) by a full bmbt split reservation.
>
> IOWs, the code as it stands now is just wrong. It works because it
> massively oversizes the runtime reservations, but that in itself is
> a problem. To quote myself again from that analysis:
>
> dgc> The log reservation that covers both local and remote attributes:
> dgc>
> dgc> blks = full dabtree split + 1 leaf block + bmbt blocks
> dgc> blks += nextent_res(MAX_ATTR_LEN/block size) // bmbt blocks only
> dgc> resv = inode + sb + agf +
> dgc> xfs_calc_buf_res(blks) +
> dgc> allocfree_log_count(blks);
>
> THe first line takes into account the blocks we modify in a local
> attribute tree modification. The second line takes into account the
> BMBT logging overhead of a remote attribute. The "resv" calculation
> converts that modified block count into a log reservation and adds
> the freespace tree logging overhead of allocating all those blocks.
>
> The only thing that is variable at runtime now is the size of the
> remote attribute, but we already have a log reservation for the
> allocation and BMBT block modification side of that and so we only
> need to physically reserve the block space (i.e. via
> block count passed to xfs_trans_alloc()).
>
> IOWs, the log reservation does not need to change at runtime now.
>
> It also makes it clear that changing the attr fork extent count from
> 16 to 32 bits should only impact the BMBT reservations. The dabtree
> reservations already use XFS_DA_NODE_MAXDEPTH for the attr fork and
> hence so they already are sized for max dabtree depth reservations.
>
> As a result, the attr reservation itself should not grow excessively
> for 32bit attribute fork extent counts. It should maybe 20-30 blocks
> on a 4kb block size filesystem as we add 4-5 levels to the max depth
> of the BMBT on the attribute fork. It should certainly not grow by
> 400 blocks as the original reworking resulted in...
>
> Again, sorry for not getting this discussion out onto the mailing
> list originally, it should have been there.
It is actually me at fault here. I am sorry for not having the conversation on
the mailing list in the first place.
--
chandan
next prev parent reply other threads:[~2020-04-07 5:08 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-04 8:52 [PATCH 0/2] Extend xattr extent counter to 32-bits Chandan Rajendra
2020-04-04 8:52 ` [PATCH 1/2] xfs: Fix log reservation calculation for xattr insert operation Chandan Rajendra
2020-04-06 15:25 ` Brian Foster
2020-04-06 22:57 ` Dave Chinner
2020-04-07 5:11 ` Chandan Rajendra [this message]
2020-04-07 12:59 ` Brian Foster
2020-04-07 0:49 ` Dave Chinner
2020-04-08 8:47 ` Chandan Rajendra
2020-04-04 8:52 ` [PATCH 2/2] xfs: Extend xattr extent counter to 32-bits Chandan Rajendra
2020-04-06 16:45 ` Brian Foster
2020-04-08 12:40 ` Chandan Rajendra
2020-04-06 17:06 ` Darrick J. Wong
2020-04-06 23:30 ` Dave Chinner
2020-04-08 12:43 ` Chandan Rajendra
2020-04-08 15:38 ` Darrick J. Wong
2020-04-08 22:43 ` Dave Chinner
2020-04-08 15:45 ` Darrick J. Wong
2020-04-08 22:45 ` Dave Chinner
2020-04-08 12:42 ` Chandan Rajendra
2020-04-07 1:20 ` Dave Chinner
2020-04-08 12:45 ` Chandan Rajendra
2020-04-10 7:46 ` Chandan Rajendra
2020-04-12 6:34 ` Chandan Rajendra
2020-04-13 18:55 ` Darrick J. Wong
2020-04-20 4:38 ` Chandan Rajendra
2020-04-22 9:38 ` Chandan Rajendra
2020-04-22 22:30 ` Dave Chinner
2020-04-25 12:07 ` Chandan Rajendra
2020-04-26 22:08 ` Dave Chinner
2020-04-29 15:35 ` Chandan Rajendra
2020-05-01 7:08 ` Chandan Rajendra
2020-05-12 23:53 ` Darrick J. Wong
2020-05-13 12:19 ` Chandan Rajendra
2020-04-22 22:51 ` Darrick J. Wong
2020-04-27 7:42 ` Christoph Hellwig
2020-04-27 7:39 ` Christoph Hellwig
2020-04-30 2:29 ` Chandan Rajendra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1729748.3DUTM0xfxg@localhost.localdomain \
--to=chandan@linux.ibm.com \
--cc=bfoster@redhat.com \
--cc=chandanrlinux@gmail.com \
--cc=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).