* [PATCHSET v2 0/3] xfs_repair: various small fixes
@ 2022-06-28 20:50 Darrick J. Wong
2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw)
To: sandeen, djwong; +Cc: Dave Chinner, Chandan Babu R, linux-xfs
Hi all,
Bug fixes for 5.16 include:
- checking V5 feature bits in secondary superblocks against whatever we
decide is the primary
- consistently warning if repair can't check existing rmap and refcount
btrees
- checking the ftype of dot and dotdot entries
Enjoy!
v2: improve documentation
If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.
This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.
--D
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-fixes
---
repair/agheader.c | 23 ++++++++++++++++++++---
repair/dino_chunks.c | 3 +--
2 files changed, 21 insertions(+), 5 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set 2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong @ 2022-06-28 20:50 ` Darrick J. Wong 2022-06-28 23:20 ` Dave Chinner 2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong 2022-06-28 20:50 ` [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes Darrick J. Wong 2 siblings, 1 reply; 6+ messages in thread From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw) To: sandeen, djwong; +Cc: Dave Chinner, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Dave Chinner complained about xfs_scrub failures coming from xfs/158. That test induces xfs_repair to fail while upgrading a filesystem to have the inobtcount feature, and then restarts xfs_repair to finish the upgrade. When the second xfs_repair run starts, it will find that the primary super has NEEDSREPAIR set, along with whatever new feature that we were trying to add to the filesystem. From there, repair completes the upgrade in much the same manner as the first repair run would have, with one big exception -- it forgets to set features_changed to trigger rewriting of the secondary supers at the end of repair. This results in discrepancies between the supers: # XFS_REPAIR_FAIL_AFTER_PHASE=2 xfs_repair -c inobtcount=1 /dev/sdf Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... - found root inode chunk Adding inode btree counts to filesystem. Killed # xfs_repair /dev/sdf Phase 1 - find and verify superblock... Phase 2 - using internal log - zero log... - scan filesystem freespace and inode maps... clearing needsrepair flag and regenerating metadata bad inobt block count 0, saw 1 bad finobt block count 0, saw 1 bad inobt block count 0, saw 1 bad finobt block count 0, saw 1 bad inobt block count 0, saw 1 bad finobt block count 0, saw 1 bad inobt block count 0, saw 1 bad finobt block count 0, saw 1 - found root inode chunk Phase 3 - for each AG... - scan and clear agi unlinked lists... - process known inodes and perform inode discovery... - agno = 0 - agno = 1 - agno = 2 - agno = 3 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 1 - agno = 2 - agno = 0 - agno = 3 Phase 5 - rebuild AG headers and trees... - reset superblock... Phase 6 - check inode connectivity... - resetting contents of realtime bitmap and summary inodes - traversing filesystem ... - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify and correct link counts... done # xfs_db -c 'sb 0' -c 'print' -c 'sb 1' -c 'print' /dev/sdf | \ egrep '(features_ro_compat|features_incompat)' features_ro_compat = 0xd features_incompat = 0xb features_ro_compat = 0x5 features_incompat = 0xb Curiously, re-running xfs_repair will not trigger any warnings about the featureset mismatch between the primary and secondary supers. xfs_scrub immediately notices, which is what causes xfs/158 to fail. This discrepancy doesn't happen when the upgrade completes successfully in a single repair run, so we need to teach repair to rewrite the secondaries at the end of repair any time needsrepair was set. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/agheader.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/repair/agheader.c b/repair/agheader.c index 36da1395..e91509d0 100644 --- a/repair/agheader.c +++ b/repair/agheader.c @@ -552,6 +552,14 @@ secondary_sb_whack( else do_warn( _("would clear needsrepair flag and regenerate metadata\n")); + /* + * If needsrepair is set on the primary super, there's + * a possibility that repair crashed during an upgrade. + * Set features_changed to ensure that the secondary + * supers are rewritten with the new feature bits once + * we've finished the upgrade. + */ + features_changed = true; } else { /* * Quietly clear needsrepair on the secondary supers as ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set 2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong @ 2022-06-28 23:20 ` Dave Chinner 0 siblings, 0 replies; 6+ messages in thread From: Dave Chinner @ 2022-06-28 23:20 UTC (permalink / raw) To: Darrick J. Wong; +Cc: sandeen, linux-xfs On Tue, Jun 28, 2022 at 01:50:40PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Dave Chinner complained about xfs_scrub failures coming from xfs/158. > That test induces xfs_repair to fail while upgrading a filesystem to > have the inobtcount feature, and then restarts xfs_repair to finish the > upgrade. When the second xfs_repair run starts, it will find that the > primary super has NEEDSREPAIR set, along with whatever new feature that > we were trying to add to the filesystem. > > From there, repair completes the upgrade in much the same manner as the > first repair run would have, with one big exception -- it forgets to set > features_changed to trigger rewriting of the secondary supers at the end > of repair. This results in discrepancies between the supers: > > # XFS_REPAIR_FAIL_AFTER_PHASE=2 xfs_repair -c inobtcount=1 /dev/sdf > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps... > - found root inode chunk > Adding inode btree counts to filesystem. > Killed > # xfs_repair /dev/sdf > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps... > clearing needsrepair flag and regenerating metadata > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > - found root inode chunk > Phase 3 - for each AG... > - scan and clear agi unlinked lists... > - process known inodes and perform inode discovery... > - agno = 0 > - agno = 1 > - agno = 2 > - agno = 3 > - process newly discovered inodes... > Phase 4 - check for duplicate blocks... > - setting up duplicate extent list... > - check for inodes claiming duplicate blocks... > - agno = 1 > - agno = 2 > - agno = 0 > - agno = 3 > Phase 5 - rebuild AG headers and trees... > - reset superblock... > Phase 6 - check inode connectivity... > - resetting contents of realtime bitmap and summary inodes > - traversing filesystem ... > - traversal finished ... > - moving disconnected inodes to lost+found ... > Phase 7 - verify and correct link counts... > done > # xfs_db -c 'sb 0' -c 'print' -c 'sb 1' -c 'print' /dev/sdf | \ > egrep '(features_ro_compat|features_incompat)' > features_ro_compat = 0xd > features_incompat = 0xb > features_ro_compat = 0x5 > features_incompat = 0xb > > Curiously, re-running xfs_repair will not trigger any warnings about the > featureset mismatch between the primary and secondary supers. xfs_scrub > immediately notices, which is what causes xfs/158 to fail. > > This discrepancy doesn't happen when the upgrade completes successfully > in a single repair run, so we need to teach repair to rewrite the > secondaries at the end of repair any time needsrepair was set. > > Reported-by: Dave Chinner <david@fromorbit.com> > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > repair/agheader.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > > diff --git a/repair/agheader.c b/repair/agheader.c > index 36da1395..e91509d0 100644 > --- a/repair/agheader.c > +++ b/repair/agheader.c > @@ -552,6 +552,14 @@ secondary_sb_whack( > else > do_warn( > _("would clear needsrepair flag and regenerate metadata\n")); > + /* > + * If needsrepair is set on the primary super, there's > + * a possibility that repair crashed during an upgrade. > + * Set features_changed to ensure that the secondary > + * supers are rewritten with the new feature bits once > + * we've finished the upgrade. > + */ > + features_changed = true; > } else { > /* > * Quietly clear needsrepair on the secondary supers as > > Looks good. Reviewed-by: Dave Chinner <dchinner@redhat.com> -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions 2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong 2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong @ 2022-06-28 20:50 ` Darrick J. Wong 2022-06-28 23:21 ` Dave Chinner 2022-06-28 20:50 ` [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes Darrick J. Wong 2 siblings, 1 reply; 6+ messages in thread From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw) To: sandeen, djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> While testing xfs/233 and xfs/127 with LARP mode enabled, I noticed errors such as the following: xfs_growfs --BlockSize=4096 --Blocks=8192 data blocks changed from 8192 to 2579968 meta-data=/dev/sdf isize=512 agcount=630, agsize=4096 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 data = bsize=4096 blocks=2579968, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=3075, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 _check_xfs_filesystem: filesystem on /dev/sdf is inconsistent (r) *** xfs_repair -n output *** Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - zero log... - 23:03:47: zeroing log - 3075 of 3075 blocks done - scan filesystem freespace and inode maps... would fix log incompat feature mismatch in AG 30 super, 0x0 != 0x1 would fix log incompat feature mismatch in AG 8 super, 0x0 != 0x1 would fix log incompat feature mismatch in AG 12 super, 0x0 != 0x1 would fix log incompat feature mismatch in AG 24 super, 0x0 != 0x1 would fix log incompat feature mismatch in AG 18 super, 0x0 != 0x1 <snip> 0x1 corresponds to XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, which is the feature bit used to indicate that the log contains extended attribute log intent items. This is a mechanism to prevent older kernels from trying to recover log items that they won't know how to recover. I thought about this a little bit more, and realized that log_incompat features bits are set on the primary sb prior to writing certain types of log records, and cleared once the log has written the committed changes back to the filesystem. If the secondary superblocks are updated at any point during that interval (due to things like growfs or setting labels), the log_incompat field will now be set on the secondary supers. Due to the ephemeral nature of the current log_incompat feature bits, a discrepancy between the primary and secondary supers is not a corruption. If we're in dry run mode, we should log the discrepancy, but that's not a reason to end with EXIT_FAILURE. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/agheader.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/repair/agheader.c b/repair/agheader.c index e91509d0..76290158 100644 --- a/repair/agheader.c +++ b/repair/agheader.c @@ -285,15 +285,24 @@ check_v5_feature_mismatch( } } + /* + * Log incompat feature bits are set and cleared from the primary super + * as needed to protect against log replay on old kernels finding log + * records that they cannot handle. Secondary sb resyncs performed as + * part of a geometry update to the primary sb (e.g. growfs, label/uuid + * changes) will copy the log incompat feature bits, but it's not a + * corruption for a secondary to have a bit set that is clear in the + * primary super. + */ if (mp->m_sb.sb_features_log_incompat != sb->sb_features_log_incompat) { if (no_modify) { - do_warn( - _("would fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"), + do_log( + _("would sync log incompat feature in AG %u super, 0x%x != 0x%x\n"), agno, mp->m_sb.sb_features_log_incompat, sb->sb_features_log_incompat); } else { do_warn( - _("will fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"), + _("will sync log incompat feature in AG %u super, 0x%x != 0x%x\n"), agno, mp->m_sb.sb_features_log_incompat, sb->sb_features_log_incompat); dirty = true; ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions 2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong @ 2022-06-28 23:21 ` Dave Chinner 0 siblings, 0 replies; 6+ messages in thread From: Dave Chinner @ 2022-06-28 23:21 UTC (permalink / raw) To: Darrick J. Wong; +Cc: sandeen, linux-xfs On Tue, Jun 28, 2022 at 01:50:45PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > While testing xfs/233 and xfs/127 with LARP mode enabled, I noticed > errors such as the following: > > xfs_growfs --BlockSize=4096 --Blocks=8192 > data blocks changed from 8192 to 2579968 > meta-data=/dev/sdf isize=512 agcount=630, agsize=4096 blks > = sectsz=512 attr=2, projid32bit=1 > = crc=1 finobt=1, sparse=1, rmapbt=1 > = reflink=1 bigtime=1 inobtcount=1 > data = bsize=4096 blocks=2579968, imaxpct=25 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0, ftype=1 > log =internal log bsize=4096 blocks=3075, version=2 > = sectsz=512 sunit=0 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > _check_xfs_filesystem: filesystem on /dev/sdf is inconsistent (r) > *** xfs_repair -n output *** > Phase 1 - find and verify superblock... > - reporting progress in intervals of 15 minutes > Phase 2 - using internal log > - zero log... > - 23:03:47: zeroing log - 3075 of 3075 blocks done > - scan filesystem freespace and inode maps... > would fix log incompat feature mismatch in AG 30 super, 0x0 != 0x1 > would fix log incompat feature mismatch in AG 8 super, 0x0 != 0x1 > would fix log incompat feature mismatch in AG 12 super, 0x0 != 0x1 > would fix log incompat feature mismatch in AG 24 super, 0x0 != 0x1 > would fix log incompat feature mismatch in AG 18 super, 0x0 != 0x1 > <snip> > > 0x1 corresponds to XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, which is the feature > bit used to indicate that the log contains extended attribute log intent > items. This is a mechanism to prevent older kernels from trying to > recover log items that they won't know how to recover. > > I thought about this a little bit more, and realized that log_incompat > features bits are set on the primary sb prior to writing certain types > of log records, and cleared once the log has written the committed > changes back to the filesystem. If the secondary superblocks are > updated at any point during that interval (due to things like growfs or > setting labels), the log_incompat field will now be set on the secondary > supers. > > Due to the ephemeral nature of the current log_incompat feature bits, > a discrepancy between the primary and secondary supers is not a > corruption. If we're in dry run mode, we should log the discrepancy, > but that's not a reason to end with EXIT_FAILURE. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > repair/agheader.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > > diff --git a/repair/agheader.c b/repair/agheader.c > index e91509d0..76290158 100644 > --- a/repair/agheader.c > +++ b/repair/agheader.c > @@ -285,15 +285,24 @@ check_v5_feature_mismatch( > } > } > > + /* > + * Log incompat feature bits are set and cleared from the primary super > + * as needed to protect against log replay on old kernels finding log > + * records that they cannot handle. Secondary sb resyncs performed as > + * part of a geometry update to the primary sb (e.g. growfs, label/uuid > + * changes) will copy the log incompat feature bits, but it's not a > + * corruption for a secondary to have a bit set that is clear in the > + * primary super. > + */ > if (mp->m_sb.sb_features_log_incompat != sb->sb_features_log_incompat) { > if (no_modify) { > - do_warn( > - _("would fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"), > + do_log( > + _("would sync log incompat feature in AG %u super, 0x%x != 0x%x\n"), > agno, mp->m_sb.sb_features_log_incompat, > sb->sb_features_log_incompat); > } else { > do_warn( > - _("will fix log incompat feature mismatch in AG %u super, 0x%x != 0x%x\n"), > + _("will sync log incompat feature in AG %u super, 0x%x != 0x%x\n"), > agno, mp->m_sb.sb_features_log_incompat, > sb->sb_features_log_incompat); > dirty = true; Looks fine. Reviewed-by: Dave Chinner <dchinner@redhat.com> -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes 2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong 2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong 2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong @ 2022-06-28 20:50 ` Darrick J. Wong 2 siblings, 0 replies; 6+ messages in thread From: Darrick J. Wong @ 2022-06-28 20:50 UTC (permalink / raw) To: sandeen, djwong; +Cc: Chandan Babu R, linux-xfs From: Chandan Babu R <chandan.babu@oracle.com> When processing an uncertain inode chunk record, if we lose 2 blocks worth of inodes or 25% of the chunk, xfs_repair decides to ignore the chunk. Otherwise, xfs_repair adds a new chunk record to inode_tree_ptrs[agno], marking each inode as either free or used. However, before adding the new chunk record, xfs_repair has to check for the existance of a conflicting record. The existing code incorrectly checks for the conflicting record in inode_uncertain_tree_ptrs[agno]. This check will succeed since the inode chunk record being processed was originally obtained from inode_uncertain_tree_ptrs[agno]. This commit fixes the bug by changing xfs_repair to search inode_tree_ptrs[agno] for conflicts. Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dino_chunks.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c index 11b0eb5f..80c52a43 100644 --- a/repair/dino_chunks.c +++ b/repair/dino_chunks.c @@ -229,8 +229,7 @@ verify_inode_chunk(xfs_mount_t *mp, /* * ok, put the record into the tree, if no conflict. */ - if (find_uncertain_inode_rec(agno, - XFS_AGB_TO_AGINO(mp, start_agbno))) + if (find_inode_rec(mp, agno, XFS_AGB_TO_AGINO(mp, start_agbno))) return(0); start_agino = XFS_AGB_TO_AGINO(mp, start_agbno); ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-06-28 23:21 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-06-28 20:50 [PATCHSET v2 0/3] xfs_repair: various small fixes Darrick J. Wong 2022-06-28 20:50 ` [PATCH 1/3] xfs_repair: always rewrite secondary supers when needsrepair is set Darrick J. Wong 2022-06-28 23:20 ` Dave Chinner 2022-06-28 20:50 ` [PATCH 2/3] xfs_repair: don't flag log_incompat inconsistencies as corruptions Darrick J. Wong 2022-06-28 23:21 ` Dave Chinner 2022-06-28 20:50 ` [PATCH 3/3] xfs_repair: Search for conflicts in inode_tree_ptrs[] when processing uncertain inodes Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox