From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Mark Tinguely <mark.tinguely@hpe.com>, linux-xfs@vger.kernel.org
Subject: Re: tr_ifree transaction log reservation calculation
Date: Wed, 22 Nov 2017 13:26:45 +1100 [thread overview]
Message-ID: <20171122022645.GR5858@dastard> (raw)
In-Reply-To: <20171121173557.GB6004@bfoster.bfoster>
On Tue, Nov 21, 2017 at 12:35:57PM -0500, Brian Foster wrote:
> cc linux-xfs
>
> On Tue, Nov 21, 2017 at 10:51:10AM -0600, Mark Tinguely wrote:
> > STATIC uint
> > xfs_calc_ifree_reservation(
> > struct xfs_mount *mp)
> > {
> > return XFS_DQUOT_LOGRES(mp) +
> > mp->m_sb.sb_inodesize +
> > mp->m_sb.sb_sectsize +
> > mp->m_sb.sb_sectsize +
> > XFS_FSB_TO_B(mp, 1) +
> > MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
> > XFS_INODE_CLUSTER_SIZE(mp)) +
> > 128 * 5 +
> > XFS_ALLOCFREE_LOG_RES(mp, 1) +
> > 128 * (2 + XFS_IALLOC_BLOCKS(mp) + mp->m_in_maxlevels +
> > XFS_ALLOCFREE_LOG_COUNT(mp, 1));
> > }
> > ...
>
> Indeed, this looks fairly similar to the line cited above from the
> current code (modulo the xfs_calc_buf_res() refactoring).
Right, the refactoring should have been a straight conversion of the
code - and not change any of the reservations. If they were wrong
before the refactoring, they will still be wrong now.
> I'm curious
> why we only consider the buffer overhead (i.e., 128 bytes *
> ->m_in_maxlevels) of the ->m_in_maxlevels value rather than the full
> buffer size.
That just looks like a bug. Earlier on in the calculation we
take into account:
* the inode btree entry: block size
....
xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
So we've accounted for the AGI btree block the record being modified
exists in. But we don't account for tree shape changes....
.... and, as usual, knowing the history of the code tells me the
probable reason why the reservation is like this.
i.e. inodes could not be removed from disk when the original
reservation was defined. Once allocated, they never got
removed, so an ifree transaction only ever touched the inode btree
block the chunk record was in. So, a quick dive into history from
the XFS archive with git blame:
814994f5c75f6 (Adam Sweeney 1994-09-23 * In freeing an inode we can modify:
814994f5c75f6 (Adam Sweeney 1994-09-23 * the inode being freed: inode size
814994f5c75f6 (Adam Sweeney 1994-09-23 * the super block free inode counter: sector size
814994f5c75f6 (Adam Sweeney 1994-09-23 * the agi hash list and counters: sector size
814994f5c75f6 (Adam Sweeney 1994-09-23 * the inode btree entry: block size
cd235688e046e (Adam Sweeney 1995-08-31 * the on disk inode before ours in the agi hash list: inode cluster size
254f6311ed1b7 (Steve Lord 2003-10-06 * the inode btree: max depth * blocksize
254f6311ed1b7 (Steve Lord 2003-10-06 * the allocation btrees: 2 trees * (max depth - 1) * block size
814994f5c75f6 (Adam Sweeney 1994-09-23 */
d3c15dd08b409 (Russell Cattelan 2003-04-15 #define XFS_CALC_IFREE_LOG_RES(mp) \
814994f5c75f6 (Adam Sweeney 1994-09-23 ((mp)->m_sb.sb_inodesize + \
814994f5c75f6 (Adam Sweeney 1994-09-23 (mp)->m_sb.sb_sectsize + \
814994f5c75f6 (Adam Sweeney 1994-09-23 (mp)->m_sb.sb_sectsize + \
814994f5c75f6 (Adam Sweeney 1994-09-23 XFS_FSB_TO_B((mp), 1) + \
0139225509eb1 (Steve Lord 2001-09-11 MAX((__uint16_t)XFS_FSB_TO_B((mp), 1), XFS_INODE_CLUSTER_SIZE(mp)) + \
254f6311ed1b7 (Steve Lord 2003-10-06 (128 * 5) + \
254f6311ed1b7 (Steve Lord 2003-10-06 (128 * (2 + XFS_IALLOC_BLOCKS(mp) + XFS_IN_MAXLEVELS(mp) + \
254f6311ed1b7 (Steve Lord 2003-10-06 XFS_ALLOCFREE_LOG_COUNT(mp, 1))))
254f6311ed1b7 (Steve Lord 2003-10-06
Without even looking I know what those 2003 changes were. For
everyone else:
254f6311ed1b Implement deletion of inode clusters in XFS.
So, yeah, this looks like a bug in the original reservation
calculation for freeing inode clusters. I suspect it got messed up
because the buffers covering XFS_IALLOC_BLOCKS (the inode chunk) get
marked stale and so their contents are not logged. That means just
the header+blf is logged and so the reservation is correct for them.
that isn't the case for the inobt blocks themselves, however.
i.e. it should be calculating:
....
* the inode chunk blocks are stale, so only headers get logged
* the inode btree: max depth * blocksize
....
xfs_calc_buf_res(2 + mp->m_ialloc_blks, 0) +
xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
....
I'm still not 100% sure what the extra "2 +" is accounting for
there, though, so to be safe that migh need to be accounted as full
blocks, not log headers.
Good find, Brian!
Of course, that doesn't mean it will fix the transaction overrun
being reported because it's overrunning the freespace tree
reservations rather than the inobt reservation. It certainly won't
hurt, though. :P
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2017-11-22 2:26 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-21 15:05 tr_ifree transaction log reservation calculation Brian Foster
[not found] ` <0a9b47ba-a41e-3b96-981f-f04f9e2ab11c@hpe.com>
2017-11-21 17:35 ` Brian Foster
2017-11-22 2:26 ` Dave Chinner [this message]
2017-11-22 12:21 ` Brian Foster
2017-11-22 20:41 ` Brian Foster
2017-11-23 0:24 ` Dave Chinner
2017-11-23 14:36 ` Brian Foster
2017-11-23 21:54 ` Dave Chinner
2017-11-24 14:51 ` Brian Foster
2017-11-25 23:20 ` Dave Chinner
2017-11-27 18:46 ` Brian Foster
2017-11-27 21:29 ` Dave Chinner
2017-11-28 13:28 ` Brian Foster
2017-11-28 21:34 ` Dave Chinner
2017-11-29 14:31 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171122022645.GR5858@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=mark.tinguely@hpe.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).