From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
Chandan Babu R <chandan.babu@oracle.com>,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH 6/7] xfs: don't update file system geometry through transaction deltas
Date: Fri, 11 Oct 2024 10:13:03 -0700 [thread overview]
Message-ID: <20241011171303.GB21853@frogsfrogsfrogs> (raw)
In-Reply-To: <Zwkv6G1ZMIdE5vs2@bfoster>
On Fri, Oct 11, 2024 at 10:02:16AM -0400, Brian Foster wrote:
> On Fri, Oct 11, 2024 at 09:57:09AM +0200, Christoph Hellwig wrote:
> > On Thu, Oct 10, 2024 at 10:05:53AM -0400, Brian Foster wrote:
> > > Ok, so we don't want geometry changes transactions in the same CIL
> > > checkpoint as alloc related transactions that might depend on the
> > > geometry changes. That seems reasonable and on a first pass I have an
> > > idea of what this is doing, but the description is kind of vague.
> > > Obviously this fixes an issue on the recovery side (since I've tested
> > > it), but it's not quite clear to me from the description and/or logic
> > > changes how that issue manifests.
> > >
> > > Could you elaborate please? For example, is this some kind of race
> > > situation between an allocation request and a growfs transaction, where
> > > the former perhaps sees a newly added AG between the time the growfs
> > > transaction commits (applying the sb deltas) and it actually commits to
> > > the log due to being a sync transaction, thus allowing an alloc on a new
> > > AG into the same checkpoint that adds the AG?
> >
> > This is based on the feedback by Dave on the previous version:
> >
> > https://lore.kernel.org/linux-xfs/Zut51Ftv%2F46Oj386@dread.disaster.area/
> >
>
> Ah, Ok. That all seems reasonably sane to me on a first pass, but I'm
> not sure I'd go straight to this change given the situation...
>
> > Just doing the perag/in-core sb updates earlier fixes all the issues
> > with my test case, so I'm not actually sure how to get more updates
> > into the check checkpoint. I'll try your exercisers if it could hit
> > that.
> >
>
> Ok, that explains things a bit. My observation is that the first 5
> patches or so address the mount failure problem, but from there I'm not
> reproducing much difference with or without the final patch.
Does this change to flush the log after committing the new sb fix the
recovery problems on older kernels? I /think/ that's the point of this
patch.
> Either way,
> I see aborts and splats all over the place, which implies at minimum
> this isn't the only issue here.
Ugh. I've recently noticed the long soak logrecovery test vm have seen
a slight tick up in failure rates -- random blocks that have clearly had
garbage written to them such that recovery tries to read the block to
recover a buffer log item and kaboom. At this point it's unclear if
that's a problem with xfs or somewhere else. :(
> So given that 1. growfs recovery seems pretty much broken, 2. this
> particular patch has no straightforward way to test that it fixes
> something and at the same time doesn't break anything else, and 3. we do
I'm curious, what might break? Was that merely a general comment, or do
you have something specific in mind? (iows: do you see more string to
pull? :))
> have at least one fairly straightforward growfs/recovery test in the
> works that reliably explodes, personally I'd suggest to split this work
> off into separate series.
>
> It seems reasonable enough to me to get patches 1-5 in asap once they're
> fully cleaned up, and then leave the next two as part of a followon
> series pending further investigation into these other issues. As part of
> that I'd like to know whether the recovery test reproduces (or can be
> made to reproduce) the issue this patch presumably fixes, but I'd also
> settle for "the grow recovery test now passes reliably and this doesn't
> regress it." But once again, just my .02.
Yeah, it's too bad there's no good way to test recovery with older
kernels either. :(
--D
> Brian
>
>
next prev parent reply other threads:[~2024-10-11 17:13 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-30 16:41 fix recovery of allocator ops after a growfs Christoph Hellwig
2024-09-30 16:41 ` [PATCH 1/7] xfs: pass the exact range to initialize to xfs_initialize_perag Christoph Hellwig
2024-10-10 14:02 ` Brian Foster
2024-10-11 7:53 ` Christoph Hellwig
2024-10-11 14:01 ` Brian Foster
2024-09-30 16:41 ` [PATCH 2/7] xfs: merge the perag freeing helpers Christoph Hellwig
2024-10-10 14:02 ` Brian Foster
2024-09-30 16:41 ` [PATCH 3/7] xfs: update the file system geometry after recoverying superblock buffers Christoph Hellwig
2024-09-30 16:50 ` Darrick J. Wong
2024-10-01 8:49 ` Christoph Hellwig
2024-10-10 16:02 ` Darrick J. Wong
2024-10-10 14:03 ` Brian Foster
2024-09-30 16:41 ` [PATCH 4/7] xfs: error out when a superblock buffer updates reduces the agcount Christoph Hellwig
2024-09-30 16:51 ` Darrick J. Wong
2024-10-01 8:47 ` Christoph Hellwig
2024-10-10 14:04 ` Brian Foster
2024-09-30 16:41 ` [PATCH 5/7] xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag Christoph Hellwig
2024-10-10 14:04 ` Brian Foster
2024-09-30 16:41 ` [PATCH 6/7] xfs: don't update file system geometry through transaction deltas Christoph Hellwig
2024-10-10 14:05 ` Brian Foster
2024-10-11 7:57 ` Christoph Hellwig
2024-10-11 14:02 ` Brian Foster
2024-10-11 17:13 ` Darrick J. Wong [this message]
2024-10-11 18:41 ` Brian Foster
2024-10-11 23:12 ` Darrick J. Wong
2024-10-11 23:29 ` Darrick J. Wong
2024-10-14 5:58 ` Christoph Hellwig
2024-10-14 15:30 ` Darrick J. Wong
2024-10-14 18:50 ` Brian Foster
2024-10-15 16:42 ` Darrick J. Wong
2024-10-18 12:27 ` Brian Foster
2024-10-21 16:59 ` Darrick J. Wong
2024-10-23 14:45 ` Brian Foster
2024-10-24 18:02 ` Darrick J. Wong
2024-10-21 13:38 ` Dave Chinner
2024-10-23 15:06 ` Brian Foster
2024-10-10 19:01 ` Darrick J. Wong
2024-10-11 7:59 ` Christoph Hellwig
2024-10-11 16:44 ` Darrick J. Wong
2024-09-30 16:41 ` [PATCH 7/7] xfs: split xfs_trans_mod_sb Christoph Hellwig
2024-10-10 14:06 ` Brian Foster
2024-10-11 7:54 ` Christoph Hellwig
2024-10-11 14:05 ` Brian Foster
2024-10-11 16:50 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241011171303.GB21853@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=bfoster@redhat.com \
--cc=chandan.babu@oracle.com \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox