* [PATCH 0/2] xfs: fix recovery corruption on s390 w/ nrext64 @ 2023-11-10 4:33 Dave Chinner 2023-11-10 4:33 ` [PATCH 1/2] xfs: inode recovery does not validate the recovered inode Dave Chinner 2023-11-10 4:33 ` [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally Dave Chinner 0 siblings, 2 replies; 8+ messages in thread From: Dave Chinner @ 2023-11-10 4:33 UTC (permalink / raw) To: linux-xfs; +Cc: zlang Hi folks, The following are two patches to fix the corruption bug that Zorro recently recently reported on big endian platforms when nrext64 is enabled. It is caused by recovery zeroing a v2 inode field even on v3 inodes. Before nrext64, this wasn't an issue because the field in v3 inodes was just padding (i.e. zeroes) and so writing zero to part of it was never noticed. However, nrext64 uses this field for the 64 bit data fork extent count, so zeroing part of it becomes a problem. The zeroing is done to the xfs_log_dinode, which is held in host format. Because of the layout of this field, on little endian machines the di_flush_iter field overlaps the high 16 bits of di_big_nextents, and so we didn't notice that recovery was overwriting the the high bits of the extent count because it's always going to be zero bytes (needs to go over 2^48 extents before we'd notice) in testing contexts. However, on big endian machines, the di_flushiter field overlays the lower 16 bits of the di_big_extents field, and so zeroing di_flushiter essentially rounds it down to the nearest 64k. For typical testing contexts, that's effectively zeroing the extent count. The fix is two parts. The first patch adds detection of this specific corruption to the xfs_dinode verifier, and adds a call to the verifier after recovery has recovered the inode. This catches corruptions at recovery time, rather than silently writing them back and not noticing them until the inode is re-read from disk at runtime. The second patch fixes the recovery bug. It simply moves the resetting of the di_flushiter field into the v2 inode specific flushiter handling branch. That's always been wrong (i.e. since v3 inodes were introduced) but it hasn't mattered until now, so it marked as fixing the introduction of the nrext64 feature.... -Dave. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] xfs: inode recovery does not validate the recovered inode 2023-11-10 4:33 [PATCH 0/2] xfs: fix recovery corruption on s390 w/ nrext64 Dave Chinner @ 2023-11-10 4:33 ` Dave Chinner 2023-11-10 19:27 ` Darrick J. Wong 2023-11-10 4:33 ` [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally Dave Chinner 1 sibling, 1 reply; 8+ messages in thread From: Dave Chinner @ 2023-11-10 4:33 UTC (permalink / raw) To: linux-xfs; +Cc: zlang From: Dave Chinner <dchinner@redhat.com> Discovered when trying to track down a weird recovery corruption issue that wasn't detected at recovery time. The specific corruption was a zero extent count field when big extent counts are in use, and it turns out the dinode verifier doesn't detect that specific corruption case, either. So fix it too. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/libxfs/xfs_inode_buf.c | 3 +++ fs/xfs/xfs_inode_item_recover.c | 14 +++++++++++++- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index a35781577cad..0f970a0b3382 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -508,6 +508,9 @@ xfs_dinode_verify( if (mode && nextents + naextents > nblocks) return __this_address; + if (nextents + naextents == 0 && nblocks != 0) + return __this_address; + if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) return __this_address; diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 6b09e2bf2d74..f4c31c2b60d5 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -286,6 +286,7 @@ xlog_recover_inode_commit_pass2( struct xfs_log_dinode *ldip; uint isize; int need_free = 0; + xfs_failaddr_t fa; if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) { in_f = item->ri_buf[0].i_addr; @@ -529,8 +530,19 @@ xlog_recover_inode_commit_pass2( (dip->di_mode != 0)) error = xfs_recover_inode_owner_change(mp, dip, in_f, buffer_list); - /* re-generate the checksum. */ + /* re-generate the checksum and validate the recovered inode. */ xfs_dinode_calc_crc(log->l_mp, dip); + fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip); + if (fa) { + XFS_CORRUPTION_ERROR( + "Bad dinode after recovery", + XFS_ERRLEVEL_LOW, mp, dip, sizeof(*dip)); + xfs_alert(mp, + "Metadata corruption detected at %pS, inode 0x%llx", + fa, in_f->ilf_ino); + error = -EFSCORRUPTED; + goto out_release; + } ASSERT(bp->b_mount == mp); bp->b_flags |= _XBF_LOGRECOVERY; -- 2.42.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] xfs: inode recovery does not validate the recovered inode 2023-11-10 4:33 ` [PATCH 1/2] xfs: inode recovery does not validate the recovered inode Dave Chinner @ 2023-11-10 19:27 ` Darrick J. Wong 2023-11-10 20:40 ` Dave Chinner 0 siblings, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2023-11-10 19:27 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, zlang On Fri, Nov 10, 2023 at 03:33:13PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > Discovered when trying to track down a weird recovery corruption > issue that wasn't detected at recovery time. > > The specific corruption was a zero extent count field when big > extent counts are in use, and it turns out the dinode verifier > doesn't detect that specific corruption case, either. So fix it too. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_inode_buf.c | 3 +++ > fs/xfs/xfs_inode_item_recover.c | 14 +++++++++++++- > 2 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index a35781577cad..0f970a0b3382 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -508,6 +508,9 @@ xfs_dinode_verify( > if (mode && nextents + naextents > nblocks) > return __this_address; > > + if (nextents + naextents == 0 && nblocks != 0) > + return __this_address; > + > if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) > return __this_address; > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > index 6b09e2bf2d74..f4c31c2b60d5 100644 > --- a/fs/xfs/xfs_inode_item_recover.c > +++ b/fs/xfs/xfs_inode_item_recover.c > @@ -286,6 +286,7 @@ xlog_recover_inode_commit_pass2( > struct xfs_log_dinode *ldip; > uint isize; > int need_free = 0; > + xfs_failaddr_t fa; > > if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) { > in_f = item->ri_buf[0].i_addr; > @@ -529,8 +530,19 @@ xlog_recover_inode_commit_pass2( > (dip->di_mode != 0)) > error = xfs_recover_inode_owner_change(mp, dip, in_f, > buffer_list); > - /* re-generate the checksum. */ > + /* re-generate the checksum and validate the recovered inode. */ > xfs_dinode_calc_crc(log->l_mp, dip); > + fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip); > + if (fa) { Does xlog_recover_dquot_commit_pass2 need to call xfs_dquot_verify as well? This patch looks good though, Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + XFS_CORRUPTION_ERROR( > + "Bad dinode after recovery", > + XFS_ERRLEVEL_LOW, mp, dip, sizeof(*dip)); > + xfs_alert(mp, > + "Metadata corruption detected at %pS, inode 0x%llx", > + fa, in_f->ilf_ino); > + error = -EFSCORRUPTED; > + goto out_release; > + } > > ASSERT(bp->b_mount == mp); > bp->b_flags |= _XBF_LOGRECOVERY; > -- > 2.42.0 > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] xfs: inode recovery does not validate the recovered inode 2023-11-10 19:27 ` Darrick J. Wong @ 2023-11-10 20:40 ` Dave Chinner 2023-11-17 22:19 ` Darrick J. Wong 0 siblings, 1 reply; 8+ messages in thread From: Dave Chinner @ 2023-11-10 20:40 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, zlang On Fri, Nov 10, 2023 at 11:27:52AM -0800, Darrick J. Wong wrote: > On Fri, Nov 10, 2023 at 03:33:13PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > Discovered when trying to track down a weird recovery corruption > > issue that wasn't detected at recovery time. > > > > The specific corruption was a zero extent count field when big > > extent counts are in use, and it turns out the dinode verifier > > doesn't detect that specific corruption case, either. So fix it too. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > fs/xfs/libxfs/xfs_inode_buf.c | 3 +++ > > fs/xfs/xfs_inode_item_recover.c | 14 +++++++++++++- > > 2 files changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > > index a35781577cad..0f970a0b3382 100644 > > --- a/fs/xfs/libxfs/xfs_inode_buf.c > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > > @@ -508,6 +508,9 @@ xfs_dinode_verify( > > if (mode && nextents + naextents > nblocks) > > return __this_address; > > > > + if (nextents + naextents == 0 && nblocks != 0) > > + return __this_address; > > + > > if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) > > return __this_address; > > > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > > index 6b09e2bf2d74..f4c31c2b60d5 100644 > > --- a/fs/xfs/xfs_inode_item_recover.c > > +++ b/fs/xfs/xfs_inode_item_recover.c > > @@ -286,6 +286,7 @@ xlog_recover_inode_commit_pass2( > > struct xfs_log_dinode *ldip; > > uint isize; > > int need_free = 0; > > + xfs_failaddr_t fa; > > > > if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) { > > in_f = item->ri_buf[0].i_addr; > > @@ -529,8 +530,19 @@ xlog_recover_inode_commit_pass2( > > (dip->di_mode != 0)) > > error = xfs_recover_inode_owner_change(mp, dip, in_f, > > buffer_list); > > - /* re-generate the checksum. */ > > + /* re-generate the checksum and validate the recovered inode. */ > > xfs_dinode_calc_crc(log->l_mp, dip); > > + fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip); > > + if (fa) { > > Does xlog_recover_dquot_commit_pass2 need to call xfs_dquot_verify as > well? Maybe - I haven't looked closely at that, and it depends what the dquot buffer verifier does. If it's similar to the inode cluster buffer verifier (i.e. only checks for dquots, doesn't verify the dquots) then it should do the same thing. I don't have time to do this right now because I'm OOO for the next week, so maybe you could check this and send a patch for it? > This patch looks good though, > Reviewed-by: Darrick J. Wong <djwong@kernel.org> Thanks! -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] xfs: inode recovery does not validate the recovered inode 2023-11-10 20:40 ` Dave Chinner @ 2023-11-17 22:19 ` Darrick J. Wong 0 siblings, 0 replies; 8+ messages in thread From: Darrick J. Wong @ 2023-11-17 22:19 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, zlang On Sat, Nov 11, 2023 at 07:40:43AM +1100, Dave Chinner wrote: > On Fri, Nov 10, 2023 at 11:27:52AM -0800, Darrick J. Wong wrote: > > On Fri, Nov 10, 2023 at 03:33:13PM +1100, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@redhat.com> > > > > > > Discovered when trying to track down a weird recovery corruption > > > issue that wasn't detected at recovery time. > > > > > > The specific corruption was a zero extent count field when big > > > extent counts are in use, and it turns out the dinode verifier > > > doesn't detect that specific corruption case, either. So fix it too. > > > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > > --- > > > fs/xfs/libxfs/xfs_inode_buf.c | 3 +++ > > > fs/xfs/xfs_inode_item_recover.c | 14 +++++++++++++- > > > 2 files changed, 16 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > > > index a35781577cad..0f970a0b3382 100644 > > > --- a/fs/xfs/libxfs/xfs_inode_buf.c > > > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > > > @@ -508,6 +508,9 @@ xfs_dinode_verify( > > > if (mode && nextents + naextents > nblocks) > > > return __this_address; > > > > > > + if (nextents + naextents == 0 && nblocks != 0) > > > + return __this_address; > > > + > > > if (S_ISDIR(mode) && nextents > mp->m_dir_geo->max_extents) > > > return __this_address; > > > > > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > > > index 6b09e2bf2d74..f4c31c2b60d5 100644 > > > --- a/fs/xfs/xfs_inode_item_recover.c > > > +++ b/fs/xfs/xfs_inode_item_recover.c > > > @@ -286,6 +286,7 @@ xlog_recover_inode_commit_pass2( > > > struct xfs_log_dinode *ldip; > > > uint isize; > > > int need_free = 0; > > > + xfs_failaddr_t fa; > > > > > > if (item->ri_buf[0].i_len == sizeof(struct xfs_inode_log_format)) { > > > in_f = item->ri_buf[0].i_addr; > > > @@ -529,8 +530,19 @@ xlog_recover_inode_commit_pass2( > > > (dip->di_mode != 0)) > > > error = xfs_recover_inode_owner_change(mp, dip, in_f, > > > buffer_list); > > > - /* re-generate the checksum. */ > > > + /* re-generate the checksum and validate the recovered inode. */ > > > xfs_dinode_calc_crc(log->l_mp, dip); > > > + fa = xfs_dinode_verify(log->l_mp, in_f->ilf_ino, dip); > > > + if (fa) { > > > > Does xlog_recover_dquot_commit_pass2 need to call xfs_dquot_verify as > > well? > > Maybe - I haven't looked closely at that, and it depends what the > dquot buffer verifier does. If it's similar to the inode cluster > buffer verifier (i.e. only checks for dquots, doesn't verify the > dquots) then it should do the same thing. I don't have time to do > this right now because I'm OOO for the next week, so maybe you could > check this and send a patch for it? I tossed on a patch to do this and after a couple of days of generic/388 and generic/475 spinning I haven't noticed any failures. --D > > This patch looks good though, > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > Thanks! > > -Dave. > -- > Dave Chinner > david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally 2023-11-10 4:33 [PATCH 0/2] xfs: fix recovery corruption on s390 w/ nrext64 Dave Chinner 2023-11-10 4:33 ` [PATCH 1/2] xfs: inode recovery does not validate the recovered inode Dave Chinner @ 2023-11-10 4:33 ` Dave Chinner 2023-11-10 19:32 ` Darrick J. Wong 1 sibling, 1 reply; 8+ messages in thread From: Dave Chinner @ 2023-11-10 4:33 UTC (permalink / raw) To: linux-xfs; +Cc: zlang From: Dave Chinner <dchinner@redhat.com> Because on v3 inodes, di_flushiter doesn't exist. It overlaps with zero padding in the inode, except when NREXT64=1 configurations are in use and the zero padding is no longer padding but holds the 64 bit extent counter. This manifests obviously on big endian platforms (e.g. s390) because the log dinode is in host order and the overlap is the LSBs of the extent count field. It is not noticed on little endian machines because the overlap is at the MSB end of the extent count field and we need to get more than 2^^48 extents in the inode before it manifests. i.e. the heat death of the universe will occur before we see the problem in little endian machines. This is a zero-day issue for NREXT64=1 configuraitons on big endian machines. Fix it by only clearing di_flushiter on v2 inodes during recovery. Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") cc: stable@kernel.org # 5.19+ Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index f4c31c2b60d5..dbdab4ce7c44 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2( * superblock flag to determine whether we need to look at di_flushiter * to skip replay when the on disk inode is newer than the log one */ - if (!xfs_has_v3inodes(mp) && - ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { - /* - * Deal with the wrap case, DI_MAX_FLUSH is less - * than smaller numbers - */ - if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && - ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { - /* do nothing */ - } else { - trace_xfs_log_recover_inode_skip(log, in_f); - error = 0; - goto out_release; + if (!xfs_has_v3inodes(mp)) { + if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { + /* + * Deal with the wrap case, DI_MAX_FLUSH is less + * than smaller numbers + */ + if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && + ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { + /* do nothing */ + } else { + trace_xfs_log_recover_inode_skip(log, in_f); + error = 0; + goto out_release; + } } + + /* Take the opportunity to reset the flush iteration count */ + ldip->di_flushiter = 0; } - /* Take the opportunity to reset the flush iteration count */ - ldip->di_flushiter = 0; if (unlikely(S_ISREG(ldip->di_mode))) { if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && -- 2.42.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally 2023-11-10 4:33 ` [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally Dave Chinner @ 2023-11-10 19:32 ` Darrick J. Wong 2023-11-10 20:36 ` Dave Chinner 0 siblings, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2023-11-10 19:32 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, zlang On Fri, Nov 10, 2023 at 03:33:14PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > Because on v3 inodes, di_flushiter doesn't exist. It overlaps with > zero padding in the inode, except when NREXT64=1 configurations are > in use and the zero padding is no longer padding but holds the 64 > bit extent counter. > > This manifests obviously on big endian platforms (e.g. s390) because > the log dinode is in host order and the overlap is the LSBs of the > extent count field. It is not noticed on little endian machines > because the overlap is at the MSB end of the extent count field and > we need to get more than 2^^48 extents in the inode before it > manifests. i.e. the heat death of the universe will occur before we > see the problem in little endian machines. > > This is a zero-day issue for NREXT64=1 configuraitons on big endian > machines. Fix it by only clearing di_flushiter on v2 inodes during > recovery. > > Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") > cc: stable@kernel.org # 5.19+ > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++--------------- > 1 file changed, 17 insertions(+), 15 deletions(-) > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > index f4c31c2b60d5..dbdab4ce7c44 100644 > --- a/fs/xfs/xfs_inode_item_recover.c > +++ b/fs/xfs/xfs_inode_item_recover.c > @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2( > * superblock flag to determine whether we need to look at di_flushiter > * to skip replay when the on disk inode is newer than the log one > */ > - if (!xfs_has_v3inodes(mp) && > - ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > - /* > - * Deal with the wrap case, DI_MAX_FLUSH is less > - * than smaller numbers > - */ > - if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > - ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > - /* do nothing */ > - } else { > - trace_xfs_log_recover_inode_skip(log, in_f); > - error = 0; > - goto out_release; > + if (!xfs_has_v3inodes(mp)) { > + if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > + /* > + * Deal with the wrap case, DI_MAX_FLUSH is less > + * than smaller numbers > + */ > + if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > + ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > + /* do nothing */ > + } else { > + trace_xfs_log_recover_inode_skip(log, in_f); > + error = 0; > + goto out_release; > + } > } > + > + /* Take the opportunity to reset the flush iteration count */ > + ldip->di_flushiter = 0; Hmm. Well this fixes the zeroday problem, so thank you for getting the root of this! Reviewed-by: Darrick J. Wong <djwong@kernel.org> Though hch did suggest reducing the amount of indenting here by compressing the if tests together. I can't decide if it's worth rearranging that old V4 code since none of it's scheduled for removal until 2030, but it /is/ legacy code that maybe we just don't care to touch? <shrug> --D > } > > - /* Take the opportunity to reset the flush iteration count */ > - ldip->di_flushiter = 0; > > if (unlikely(S_ISREG(ldip->di_mode))) { > if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && > -- > 2.42.0 > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally 2023-11-10 19:32 ` Darrick J. Wong @ 2023-11-10 20:36 ` Dave Chinner 0 siblings, 0 replies; 8+ messages in thread From: Dave Chinner @ 2023-11-10 20:36 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, zlang On Fri, Nov 10, 2023 at 11:32:55AM -0800, Darrick J. Wong wrote: > On Fri, Nov 10, 2023 at 03:33:14PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > Because on v3 inodes, di_flushiter doesn't exist. It overlaps with > > zero padding in the inode, except when NREXT64=1 configurations are > > in use and the zero padding is no longer padding but holds the 64 > > bit extent counter. > > > > This manifests obviously on big endian platforms (e.g. s390) because > > the log dinode is in host order and the overlap is the LSBs of the > > extent count field. It is not noticed on little endian machines > > because the overlap is at the MSB end of the extent count field and > > we need to get more than 2^^48 extents in the inode before it > > manifests. i.e. the heat death of the universe will occur before we > > see the problem in little endian machines. > > > > This is a zero-day issue for NREXT64=1 configuraitons on big endian > > machines. Fix it by only clearing di_flushiter on v2 inodes during > > recovery. > > > > Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") > > cc: stable@kernel.org # 5.19+ > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > fs/xfs/xfs_inode_item_recover.c | 32 +++++++++++++++++--------------- > > 1 file changed, 17 insertions(+), 15 deletions(-) > > > > diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c > > index f4c31c2b60d5..dbdab4ce7c44 100644 > > --- a/fs/xfs/xfs_inode_item_recover.c > > +++ b/fs/xfs/xfs_inode_item_recover.c > > @@ -371,24 +371,26 @@ xlog_recover_inode_commit_pass2( > > * superblock flag to determine whether we need to look at di_flushiter > > * to skip replay when the on disk inode is newer than the log one > > */ > > - if (!xfs_has_v3inodes(mp) && > > - ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > > - /* > > - * Deal with the wrap case, DI_MAX_FLUSH is less > > - * than smaller numbers > > - */ > > - if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > > - ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > > - /* do nothing */ > > - } else { > > - trace_xfs_log_recover_inode_skip(log, in_f); > > - error = 0; > > - goto out_release; > > + if (!xfs_has_v3inodes(mp)) { > > + if (ldip->di_flushiter < be16_to_cpu(dip->di_flushiter)) { > > + /* > > + * Deal with the wrap case, DI_MAX_FLUSH is less > > + * than smaller numbers > > + */ > > + if (be16_to_cpu(dip->di_flushiter) == DI_MAX_FLUSH && > > + ldip->di_flushiter < (DI_MAX_FLUSH >> 1)) { > > + /* do nothing */ > > + } else { > > + trace_xfs_log_recover_inode_skip(log, in_f); > > + error = 0; > > + goto out_release; > > + } > > } > > + > > + /* Take the opportunity to reset the flush iteration count */ > > + ldip->di_flushiter = 0; > > Hmm. Well this fixes the zeroday problem, so thank you for getting the > root of this! > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > Though hch did suggest reducing the amount of indenting here by > compressing the if tests together. I can't decide if it's worth > rearranging that old V4 code since none of it's scheduled for removal > until 2030, but it /is/ legacy code that maybe we just don't care to > touch? I'd prefer not to touch it. It's now all isolated in a "!xfs_has_v3inodes()" branch, so I think we can largely ignore the grot for now. If we have to do a larger rework of this code in future, then we can look at reworking it. But right now, I really don't feel like risking breaking something else by doing unnecessary cleanups on code we haven't needed to touch in over a decade. -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-11-17 22:19 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-10 4:33 [PATCH 0/2] xfs: fix recovery corruption on s390 w/ nrext64 Dave Chinner 2023-11-10 4:33 ` [PATCH 1/2] xfs: inode recovery does not validate the recovered inode Dave Chinner 2023-11-10 19:27 ` Darrick J. Wong 2023-11-10 20:40 ` Dave Chinner 2023-11-17 22:19 ` Darrick J. Wong 2023-11-10 4:33 ` [PATCH 2/2] xfs: recovery should not clear di_flushiter unconditionally Dave Chinner 2023-11-10 19:32 ` Darrick J. Wong 2023-11-10 20:36 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox