From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH RFC 4/4] xfs: include an allocfree res for inobt modifications
Date: Tue, 28 Nov 2017 10:49:22 -0500 [thread overview]
Message-ID: <20171128154922.GE45759@bfoster.bfoster> (raw)
In-Reply-To: <20171127232719.GC5858@dastard>
On Tue, Nov 28, 2017 at 10:27:19AM +1100, Dave Chinner wrote:
> On Mon, Nov 27, 2017 at 03:24:34PM -0500, Brian Foster wrote:
> > Analysis of recent reports of log reservation overruns and code
> > inspection has uncovered that the reservations associated with inode
> > operations may not cover the worst case scenarios. In particular,
> > many cases only include one allocfree res. for a particular
> > operation even though said operations may also entail AGFL fixups
> > and inode btree block allocations in addition to the actual inode
> > chunk allocation. This can easily turn into two or three block
> > allocations (or frees) per operation.
> >
> > In theory, the only way to define the worst case reservation is to
> > include an allocfree res for each individual allocation in a
> > transaction. Since that is impractical (we can perform multiple agfl
> > fixups per tx and not every allocation is going to result in a full
> > tree operation), implement a reasonable compromise that addresses
> > the deficiency in practice without blowing out the size of the
> > transactions.
> >
> > Refactor the inode transaction reservation code to include one
> > allocfree res. per inode btree modification to cover allocations
> > required by the tree itself. This essentially separates the
> > reservation required to allocate the physical inode chunk from
> > additional reservation required to perform inobt record
> > insertion/removal.
>
> I think you missed the most important reason the inobt/finobt
> modifications need there own allocfree reservation - btree
> modifications that cause btree blocks to be freed do not use defered
> ops and so the freeing of blocks occurs directly within the primary
> transaction. Hence the primary transaction reservation must take
> this into account ....
>
> > Apply the same logic to the finobt reservation.
> > This results in killing off the finobt modify condition because we
> > no longer assume that the broader transaction reservation will cover
> > finobt block allocations.
> >
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
>
> Code looks fine. Comments below are for another follow-on patch.
>
Actually, what do you think of the following variant (compile tested
only)? This facilites another cleanup patch I just wrote to kill off
about half of the (now effectively duplicate) xfs_calc_create_*()
helpers because the finobt and inode chunk res. helpers only include the
associated reservation based on the associated feature bits.
Brian
--- 8< ---
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index c27c57e65e15..045781f2cf00 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -172,6 +172,41 @@ xfs_calc_finobt_res(
}
/*
+ * Calculate the reservation required to allocate or free an inode chunk. This
+ * includes:
+ *
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * the inode chunk: m_ialloc_blks * N
+ *
+ * The size N of the inode chunk reservation depends on whether it is for
+ * allocation or free and which type of create transaction is in use. An inode
+ * chunk free always invalidates the buffers and only requires reservation for
+ * headers (N == 0). An inode chunk allocation requires a chunk sized
+ * reservation on v4 and older superblocks to initialize the chunk. No chunk
+ * reservation is required for allocation on v5 supers, which use ordered
+ * buffers to initialize.
+ */
+STATIC uint
+xfs_calc_inode_chunk_res(
+ struct xfs_mount *mp,
+ bool alloc)
+{
+ uint res, size = 0;
+
+ res = xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
+ XFS_FSB_TO_B(mp, 1));
+ if (alloc) {
+ /* icreate tx uses ordered buffers */
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ return res;
+ size = XFS_FSB_TO_B(mp, 1);
+ }
+
+ res += xfs_calc_buf_res(mp->m_ialloc_blks, size);
+ return res;
+}
+
+/*
* Various log reservation values.
*
* These are based on the size of the file system block because that is what
@@ -386,8 +421,7 @@ xfs_calc_create_resv_modify(
* For create we can allocate some inodes giving:
* the agi and agf of the ag getting the new inodes: 2 * sectorsize
* the superblock for the nlink flag: sector size
- * the inode blocks allocated: mp->m_ialloc_blks * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * the inode chunk (allocation/init)
* the inode btree (record insertion)
*/
STATIC uint
@@ -396,9 +430,7 @@ xfs_calc_create_resv_alloc(
{
return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
mp->m_sb.sb_sectsize +
- xfs_calc_buf_res(mp->m_ialloc_blks, XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
- XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_inode_chunk_res(mp, true) +
xfs_calc_inobt_res(mp);
}
@@ -415,7 +447,7 @@ __xfs_calc_create_reservation(
* For icreate we can allocate some inodes giving:
* the agi and agf of the ag getting the new inodes: 2 * sectorsize
* the superblock for the nlink flag: sector size
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * the inode chunk (allocation, no init)
* the inobt (record insertion)
* the finobt (record insertion)
*/
@@ -425,8 +457,7 @@ xfs_calc_icreate_resv_alloc(
{
return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
mp->m_sb.sb_sectsize +
- xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
- XFS_FSB_TO_B(mp, 1)) +
+ xfs_calc_inode_chunk_res(mp, true) +
xfs_calc_inobt_res(mp) +
xfs_calc_finobt_res(mp);
}
@@ -492,15 +523,15 @@ xfs_calc_symlink_reservation(
* the inode being freed: inode size
* the super block free inode counter, AGF and AGFL: sector size
* the on disk inode (agi unlinked list removal)
- * the inode chunk is marked stale (headers only)
+ * the inode chunk (invalidated, headers only)
* the inode btree
* the finobt (record insertion, removal or modification)
*
- * Note that the allocfree res. for the inode chunk itself is not included
- * because the extent free occurs after a transaction roll. We could take the
- * maximum of the pre/post roll operations, but the pre-roll reservation already
- * includes at least one allocfree res. for the inobt and is thus guaranteed to
- * be larger.
+ * Note that the inode chunk res. includes an allocfree res. for freeing of the
+ * inode chunk. This is technically extraneous because the inode chunk free is
+ * deferred (it occurs after a transaction roll). Include the extra reservation
+ * anyways since we've had reports of ifree transaction overruns due to too many
+ * agfl fixups during inode chunk frees.
*/
STATIC uint
xfs_calc_ifree_reservation(
@@ -511,7 +542,7 @@ xfs_calc_ifree_reservation(
xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
xfs_calc_iunlink_remove_reservation(mp) +
xfs_calc_buf_res(1, 0) +
- xfs_calc_buf_res(mp->m_ialloc_blks, 0) +
+ xfs_calc_inode_chunk_res(mp, false) +
xfs_calc_inobt_res(mp) +
xfs_calc_finobt_res(mp);
}
next prev parent reply other threads:[~2017-11-28 15:49 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-27 20:24 [PATCH 0/4] xfs: inode transaction reservation fixups Brian Foster
2017-11-27 20:24 ` [PATCH 1/4] xfs: print transaction log reservation on overrun Brian Foster
2017-11-27 22:14 ` Dave Chinner
2017-11-27 20:24 ` [PATCH 2/4] xfs: include inobt buffers in ifree tx log reservation Brian Foster
2017-11-27 22:28 ` Dave Chinner
2017-11-28 13:30 ` Brian Foster
2017-11-28 21:38 ` Dave Chinner
2017-11-29 14:31 ` Brian Foster
2017-11-27 20:24 ` [PATCH 3/4] xfs: amortize agfl block frees across multiple transactions Brian Foster
2017-11-27 23:07 ` Dave Chinner
2017-11-28 13:57 ` Brian Foster
2017-11-28 22:09 ` Dave Chinner
2017-11-29 18:24 ` Brian Foster
2017-11-29 20:36 ` Brian Foster
2017-12-05 20:53 ` Brian Foster
2017-11-27 20:24 ` [PATCH RFC 4/4] xfs: include an allocfree res for inobt modifications Brian Foster
2017-11-27 23:27 ` Dave Chinner
2017-11-28 14:04 ` Brian Foster
2017-11-28 22:26 ` Dave Chinner
2017-11-29 14:32 ` Brian Foster
2017-11-28 15:49 ` Brian Foster [this message]
2017-11-28 22:34 ` Dave Chinner
2017-11-29 14:32 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171128154922.GE45759@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).