From: "Darrick J. Wong" <djwong@kernel.org>
To: Gao Xiang <hsiangkao@aol.com>
Cc: Dave Chinner <david@fromorbit.com>, xfs <linux-xfs@vger.kernel.org>
Subject: Re: regressions in xfs/168?
Date: Thu, 20 May 2021 09:44:02 -0700
Message-ID: <20210520164402.GY9675@magnolia>
In-Reply-To: <20210520082316.GA1782@hsiangkao-HP-ZHAN-66-Pro-G1>

On Thu, May 20, 2021 at 04:23:22PM +0800, Gao Xiang wrote:
> Hi Darrick and Dave,
>
> On Wed, May 19, 2021 at 05:08:02PM -0700, Darrick J. Wong wrote:
> > On Thu, May 20, 2021 at 08:20:06AM +1000, Dave Chinner wrote:
> > > On Wed, May 19, 2021 at 02:02:05PM -0700, Darrick J. Wong wrote:
> > > > Hm. Does anyone /else/ see failures with the new test xfs/168 (the fs
> > > > shrink tests) on a 1k blocksize? It looks as though we shrink the AG so
> > > > small that we trip the assert at the end of xfs_ag_resv_init that checks
> > > > that the reservations for an AG don't exceed the free space in that AG,
> > > > but tripping that doesn't return any error code, so xfs_ag_shrink_space
> > > > commits the new fs size and presses on with even more shrinking until
> > > > we've depleted AG 1 so thoroughly that the fs won't mount anymore.
> > >
> > > Yup, now that I've got the latest fstests I see that failure, too.
> > >
> > > [58972.431760] Call Trace:
> > > [58972.432467] xfs_ag_resv_init+0x1d3/0x240
> > > [58972.433611] xfs_ag_shrink_space+0x1bf/0x360
> > > [58972.434801] xfs_growfs_data+0x413/0x640
> > > [58972.435894] xfs_file_ioctl+0x32f/0xd30
> > > [58972.439289] __x64_sys_ioctl+0x8e/0xc0
> > > [58972.440337] do_syscall_64+0x3a/0x70
> > > [58972.441347] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [58972.442741] RIP: 0033:0x7f7021755d87
> > >
> > > > At a bare minimum we probably need to check the same thing the assert
> > > > does and bail out of the shrink; or maybe we just need to create a
> > > > function to adjust an AG's reservation to make that function less
> > > > complicated.
> > >
> > > So if I'm reading xfs_ag_shrink_space() correctly, it doesn't
> > > check what the new reservation will be and so it's purely looking at
> > > whether the physical range can be freed or not? And when freeing
> > > that physical range results in less free space in the AG than the
> > > reservation requires, we pop an assert failure rather than failing
> > > the reservation and undoing the shrink like the code is supposed to
> > > do?
> >
> > Yes. I've wondered for a while now if that assert in xfs_ag_resv_init
> > should get turned into an ENOSPC return so that callers can decide what
> > they want to do with it.
>
> Thanks for the detailed analysis (sorry that I didn't check the 1k blocksize
> case before). I'm now renting an apartment in a new city, so no xfstests env
> is available for now.
>
> But if I read/understand correctly, the following code might resolve the issue?
>
> diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
> index 6c5f8d10589c..1f918afd5e91 100644
> --- a/fs/xfs/libxfs/xfs_ag_resv.c
> +++ b/fs/xfs/libxfs/xfs_ag_resv.c
> @@ -312,10 +312,12 @@ xfs_ag_resv_init(
>  	if (error)
>  		return error;
>
> -	ASSERT(xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved +
> -	       xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <=
> -	       pag->pagf_freeblks + pag->pagf_flcount);
>  #endif
> +	if (xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved +
> +	    xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved >
> +	    pag->pagf_freeblks + pag->pagf_flcount)
> +		return -ENOSPC;
> +
>  out:
>  	return error;
>  }
>
> If that works, could you kindly send it out (or some better/saner solution)?
> Many thanks in advance!

That does seem to fix the symptoms, though I'm gonna take a closer look
at the error handling elsewhere in that function.

--D

>
> Thanks,
> Gao Xiang
>
> >
> > --D
> >
> > > IOWs, the problem is the ASSERT firing on debug kernels, not the
> > > actual shrink code that does handle this reservation ENOSPC error
> > > case properly? i.e. we've got something like an uncaught overflow
> > > in xfs_ag_resv_init() that is tripping the assert? (e.g. used >
> > > ask)
> > >
> > > So I'm not sure that the problem is the shrink code here - it should
> > > undo a reservation failure just fine, but the reservation code is
> > > failing before we get there on a debug kernel...
> > >
> > > Cheers,
> > >
> > > Dave.
> > > --
> > > Dave Chinner
> > > david@fromorbit.com
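
For readers following the thread, the idea discussed above (return ENOSPC
instead of asserting, so the shrink path can undo its change) can be sketched
in userspace C. This is only an illustrative model under stated assumptions:
struct ag_model, ag_model_resv_check() and ag_model_shrink() are hypothetical
names that do not exist in the kernel, and the fields merely stand in for the
ar_reserved, pagf_freeblks and pagf_flcount counters named in the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-AG counters; not the kernel's xfs_perag. */
struct ag_model {
	uint32_t meta_reserved;   /* models XFS_AG_RESV_METADATA ar_reserved */
	uint32_t rmapbt_reserved; /* models XFS_AG_RESV_RMAPBT ar_reserved */
	uint32_t freeblks;        /* models pagf_freeblks */
	uint32_t flcount;         /* models pagf_flcount */
};

/*
 * Same comparison as the ASSERT in the quoted diff, but reported to the
 * caller: 0 if the reservations fit in free space, -ENOSPC otherwise.
 */
static int ag_model_resv_check(const struct ag_model *ag)
{
	uint64_t reserved = (uint64_t)ag->meta_reserved + ag->rmapbt_reserved;
	uint64_t avail = (uint64_t)ag->freeblks + ag->flcount;

	return reserved > avail ? -ENOSPC : 0;
}

/*
 * Try to take `delta` free blocks away from the AG. If the reservations
 * no longer fit afterwards, restore the old count and return the error,
 * so a failed shrink leaves the model unchanged.
 */
static int ag_model_shrink(struct ag_model *ag, uint32_t delta)
{
	int error;

	if (delta > ag->freeblks)
		return -ENOSPC;

	ag->freeblks -= delta;
	error = ag_model_resv_check(ag);
	if (error)
		ag->freeblks += delta;	/* roll back the shrink */
	return error;
}
```

With an error return in place of the assert, a caller that shrinks step by
step stops at the first step that would starve the reservations, rather than
committing it and carrying on.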