From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, zlang@redhat.com
Subject: Re: [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally
Date: Sat, 11 Nov 2023 07:36:59 +1100 [thread overview]
Message-ID: <ZU6Ua7JLAioBP7PY@dread.disaster.area> (raw)
In-Reply-To: <20231110193255.GK1205143@frogsfrogsfrogs>
On Fri, Nov 10, 2023 at 11:32:55AM -0800, Darrick J. Wong wrote:
> On Fri, Nov 10, 2023 at 03:33:14PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Because on v3 inodes, di_flushiter doesn't exist. It overlaps with
> > zero padding in the inode, except when NREXT64=1 configurations are
> > in use and the zero padding is no longer padding but holds the 64
> > bit extent counter.
> >
> > This manifests obviously on big endian platforms (e.g. s390) because
> > the log dinode is in host order and the overlap is the LSBs of the
> > extent count field. It is not noticed on little endian machines
> > because the overlap is at the MSB end of the extent count field and
> > we need to get more than 2^^48 extents in the inode before it
> > manifests. i.e. the heat death of the universe will occur before we
> > see the problem in little endian machines.
> >
> > This is a zero-day issue for NREXT64=1 configuraitons on big endian
> > machines. Fix it by only clearing di_flushiter on v2 inodes during
> > recovery.
> >
> > Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers")
> > cc: stable@kernel.org # 5.19+
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++---------------
> > 1 file changed, 17 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> > index f4c31c2b60d5..dbdab4ce7c44 100644
> > --- a/fs/xfs/xfs_inode_item_recover.c
> > +++ b/fs/xfs/xfs_inode_item_recover.c
> > @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2(
> > * superblock flag to determine whether we need to look at di_flushiter
> > * to skip replay when the on disk inode is newer than the log one
> > */
> > - if (!xfs_has_v3inodes(mp) &&
> > - ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> > - /*
> > - * Deal with the wrap case, DI_MAX_FLUSH is less
> > - * than smaller numbers
> > - */
> > - if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> > - ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> > - /* do nothing */
> > - } else {
> > - trace_xfs_log_recover_inode_skip(log, in_f);
> > - error = 0;
> > - goto out_release;
> > + if (!xfs_has_v3inodes(mp)) {
> > + if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) {
> > + /*
> > + * Deal with the wrap case, DI_MAX_FLUSH is less
> > + * than smaller numbers
> > + */
> > + if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH &&
> > + ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) {
> > + /* do nothing */
> > + } else {
> > + trace_xfs_log_recover_inode_skip(log, in_f);
> > + error = 0;
> > + goto out_release;
> > + }
> > }
> > +
> > + /* Take the opportunity to reset the flush iteration count */
> > + ldip->di_flushiter = 0;
>
> Hmm. Well this fixes the zeroday problem, so thank you for getting the
> root of this!
>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> Though hch did suggest reducing the amount of indenting here by
> compressing the if tests together. I can't decide if it's worth
> rearranging that old V4 code since none of it's scheduled for removal
> until 2030, but it /is/ legacy code that maybe we just don't care to
> touch?
I'd prefer not to touch it. It's now all isolated in a
"!xfs_has_v3inodes()" branch, so I think we can largely ignore the
grot for now. If we have to do a larger rework of this code in
future, then we can look at reworking it.
But right now, I really don't feel like risking breaking something
else by doing unnecessary cleanups on code we haven't needed to
touch in over a decade.
-Dave.
--
Dave Chinner
david@fromorbit.com
prev parent reply other threads:[~2023-11-10 20:37 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-10 4:33 [PATCH 0/2] xfs: fix recovery corruption on s390 w/ nrext64 Dave Chinner
2023-11-10 4:33 ` [PATCH 1/2] xfs: inode recovery does not validate the recovered inode Dave Chinner
2023-11-10 19:27 ` Darrick J. Wong
2023-11-10 20:40 ` Dave Chinner
2023-11-17 22:19 ` Darrick J. Wong
2023-11-10 4:33 ` [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally Dave Chinner
2023-11-10 19:32 ` Darrick J. Wong
2023-11-10 20:36 ` Dave Chinner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZU6Ua7JLAioBP7PY@dread.disaster.area \
--to=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=zlang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.