From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: fix indlen accounting error on partial delalloc conversion
Date: Tue, 9 May 2017 12:32:13 -0400 [thread overview]
Message-ID: <20170509163211.GB3241@bfoster.bfoster> (raw)
In-Reply-To: <20170509160948.GP5973@birch.djwong.org>
On Tue, May 09, 2017 at 09:09:48AM -0700, Darrick J. Wong wrote:
> On Wed, May 03, 2017 at 02:46:26PM -0400, Brian Foster wrote:
> > The delalloc -> real block conversion path uses an incorrect
> > calculation in the case where the middle part of a delalloc extent
> > is being converted. This is documented as a rare situation because
> > XFS generally attempts to maximize contiguity by converting as much
> > of a delalloc extent as possible.
> >
> > If this situation does occur, the indlen reservation for the two new
> > delalloc extents left behind by the conversion of the middle range
> > is calculated and compared with the original reservation. If more
> > blocks are required, the delta is allocated from the global block
> > pool. This delta value can be characterized as the difference
> > between the new total requirement (temp + temp2) and the currently
> > available reservation minus those blocks that have already been
> > allocated (startblockval(PREV.br_startblock) - allocated).
> >
> > The problem is that the current code does not account for previously
> > allocated blocks correctly. It subtracts the current allocation
> > count from the (new - old) delta rather than the old indlen
> > reservation. This means that more indlen blocks than have been
> > allocated end up stashed in the remaining extents and free space
> > accounting is broken as a result.
> >
> > Fix up the calculation to subtract the allocated block count from
> > the original extent indlen and thus correctly allocate the
> > reservation delta based on the difference between the new total
> > requirement and the unused blocks from the original reservation.
> > Also remove a bogus assert that contradicts the fact that the new
> > indlen reservation can be larger than the original indlen
> > reservation.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >
> > Hi all,
> >
> > So while I fortunately was able to reproduce this problem and verify the
> > patch, I'm not able to do so in any way that facilitates a generic test
> > case. To reproduce, I had to hack up the allocation path to always do
> > single physical block allocations and otherwise not attempt to convert
> > whole delalloc extents, hack up an ioctl to do a
> > filemap_write_and_wait_range(), and run a test that repeatedly does 1M
> > buffered writes followed by fragmented, page-sized writebacks of that 1M
> > extent. This forces partial (middle) extent conversions until the extent
> > tree grows enough to require a bmbt block allocation. When that occurs,
> > I always end up with the following free space accounting inconsistency
> > reported via xfs_repair:
> >
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> > - zero log...
> > - scan filesystem freespace and inode maps...
> > sb_fdblocks 3929316, counted 3929314
> > ...
> >
> > With this patch applied, the same test no longer reproduces the
> > corruption. This otherwise survives my xfstests testing (without the
> > aforementioned hacks). Thoughts, reviews, flames appreciated.
> >
> > Brian
> >
> > fs/xfs/libxfs/xfs_bmap.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index f02eb76..8adb91b 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -2065,8 +2065,10 @@ xfs_bmap_add_extent_delay_real(
> > }
> > temp = xfs_bmap_worst_indlen(bma->ip, temp);
> > temp2 = xfs_bmap_worst_indlen(bma->ip, temp2);
> > - diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) -
> > - (bma->cur ? bma->cur->bc_private.b.allocated : 0));
> > + diff = (int)(temp + temp2 -
> > + (startblockval(PREV.br_startblock) -
> > + (bma->cur ?
> > + bma->cur->bc_private.b.allocated : 0)));
>
> <thinking aloud again>
>
> Prior to this operation we made a delalloc reservation in which we
> reserved a number of blocks for bmbt expansion by subtracting that
> quantity from incore fdblocks; that is startblockval(PREV.br_startblock).
> Then we did some bmbt magic which used up cur->bc_private.b.allocated
> blocks from the reservation. Since those blocks were already subtracted
> from the incore fdblocks, we don't need to do that now. After the magic
> was over, we now have two smaller delalloc reservations (with a real
> extent in the middle), each with their own reservations for bmbt shape
> changing expansion.
>
> Now we aim to make the accounting of incore fdblocks square again by
> checking if we are now hanging on to more blocks than we were before and
> subtracting the difference from fdblocks if so. (We deal separately with
> the possibility of hanging onto /fewer/ blocks than before.)
>
> Ok, so first we reserved startblockval(PREV.br_startblock).
>
> After splitting the da extent in the middle we now have two da extents;
> the effective indlen reservation of the whole range we just touched is:
> xfs_bmap_worst_indlen(bma->ip, temp) + xfs_bmap_worst_indlen(bma->ip, temp2);
>
> Then we (may have) fed bma->cur->bc_private.b.allocated blocks to the bmbt.
> The number of blocks we now know about is:
>
> xfs_bmap_worst_indlen(bma->ip, temp) +
> xfs_bmap_worst_indlen(bma->ip, temp2) +
> bma->cur->bc_private.b.allocated
>
> Therefore, the number of blocks we are over (if at all) is:
>
> xfs_bmap_worst_indlen(bma->ip, temp) +
> xfs_bmap_worst_indlen(bma->ip, temp2) +
> bma->cur->bc_private.b.allocated -
> startblockval(PREV.br_startblock)
>
> Or, put another way:
>
> (xfs_bmap_worst_indlen(bma->ip, temp) +
> xfs_bmap_worst_indlen(bma->ip, temp2)) -
> (startblockval(PREV.br_startblock) -
> bma->cur->bc_private.b.allocated)
>
> (Exactly what Brian wrote above.)
>
> If that quantity is positive, we're now hanging on to more blocks than
> we were before, so we subtract that many from fdblocks. That part looks
> ok, though I have one question -- why is diff an int? temp/temp2 are
> both xfs_filblks_t (__uint64_t), so the whole expression is promoted to
> a 64-bit operation, then the results are squashed down into a 32-bit
> int, which is then cast back to int64_t. With our short MAXEXTLEN I
> think it's impossible ever to have diff > 2^32, but the casting back and
> forth caught my eye.
>
As to why it is signed, I'd guess that either it was coded that way to
avoid a separate conditional or perhaps diff was previously used
elsewhere in this function..? I'm not really sure why it uses 32-bit vs.
64-bit tbh. (I don't think it matters, but I'm fine if you want to use
an int64_t or something and clean up the casting).
> Aside from that, it looks ok...
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>
Thanks for review.
Brian
> --D
>
> > if (diff > 0) {
> > error = xfs_mod_fdblocks(bma->ip->i_mount,
> > -((int64_t)diff), false);
> > @@ -2123,7 +2125,6 @@ xfs_bmap_add_extent_delay_real(
> > temp = da_new;
> > if (bma->cur)
> > temp += bma->cur->bc_private.b.allocated;
> > - ASSERT(temp <= da_old);
> > if (temp < da_old)
> > xfs_mod_fdblocks(bma->ip->i_mount,
> > (int64_t)(da_old - temp), false);
> > --
> > 2.7.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
prev parent reply other threads:[~2017-05-09 16:32 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-03 18:46 [PATCH] xfs: fix indlen accounting error on partial delalloc conversion Brian Foster
2017-05-09 16:09 ` Darrick J. Wong
2017-05-09 16:32 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170509163211.GB3241@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).