public inbox for linux-xfs@vger.kernel.org
* [PATCH] xfs_repair: -EFSBADCRC needs action when read verifier detects it.
@ 2025-02-26 17:32 bodonnel
  2025-02-26 18:20 ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: bodonnel @ 2025-02-26 17:32 UTC
  To: linux-xfs

From: Bill O'Donnell <bodonnel@redhat.com>

xfs_repair has a case where the read verifier reports -EFSBADCRC but the
error is not acted on. Modify da_read_buf() to check for -EFSBADCRC and
release the buffer when it is set. The current implementation fails in
this case:

$ xfs_repair xfs_metadump_hosting.dmp.image
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata CRC error detected at 0x46cde8, xfs_dir3_block block 0xd3c50/0x1000
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
corrupt directory block 0 for inode 867467
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x46cc88, xfs_dir3_block block 0xd3c50/0x1000
libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0xd3c50/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
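
For readers decoding the numbers above: err=117 is EUCLEAN on Linux, which
XFS uses as EFSCORRUPTED, while -EFSBADCRC is XFS's name for EBADMSG. The
snippet below is a minimal sketch of that mapping, not code from xfsprogs;
the helper xfs_err_name() is hypothetical, and the two #defines are assumed
to mirror the XFS aliases.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Assumption: these aliases mirror how XFS names the two errnos. */
#ifndef EFSCORRUPTED
#define EFSCORRUPTED	EUCLEAN		/* verifier found corrupt metadata */
#endif
#ifndef EFSBADCRC
#define EFSBADCRC	EBADMSG		/* checksum did not match */
#endif

static_assert(EFSCORRUPTED != EFSBADCRC, "the two errors must be distinct");

/* Hypothetical helper: map a (possibly negative) errno to its XFS name. */
static const char *xfs_err_name(int err)
{
	if (err < 0)
		err = -err;
	switch (err) {
	case EFSCORRUPTED:
		return "EFSCORRUPTED";
	case EFSBADCRC:
		return "EFSBADCRC";
	default:
		return strerror(err);
	}
}
```

In the failing run the read side reports a CRC error (EFSBADCRC), and the
later write verifier rejects the same block as corrupt (EFSCORRUPTED),
which is what surfaces as err=117.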


With the patch applied:
$ xfs_repair xfs_metadump_hosting.dmp.image
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
bad directory block magic # 0x16011664 in block 0 for directory inode 867467
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
rebuilding directory inode 867467
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
---
 repair/da_util.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/repair/da_util.c b/repair/da_util.c
index 7f94f4012062..0a4785e6f69b 100644
--- a/repair/da_util.c
+++ b/repair/da_util.c
@@ -66,6 +66,9 @@ da_read_buf(
 	}
 	libxfs_buf_read_map(mp->m_dev, map, nex, LIBXFS_READBUF_SALVAGE,
 			&bp, ops);
+	if (bp->b_error == -EFSBADCRC) {
+		libxfs_buf_relse(bp);
+	}
 	if (map != map_array)
 		free(map);
 	return bp;
-- 
2.48.1
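
The repeated "cache_node_put: node put on refcount 0" warnings in the
patched run suggest the buffer is being put again after its reference count
has already reached zero. One plausible reading is that the hunk above
releases bp and then still returns it to the caller. The toy model below is
a sketch of that concern under stated assumptions: struct toy_buf,
toy_buf_relse() and toy_read_buf() are hypothetical stand-ins, not libxfs
APIs, and returning NULL assumes callers already handle a failed read.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a cached buffer; NOT the real struct xfs_buf. */
struct toy_buf {
	int	refcount;	/* references currently held */
	int	error;		/* negative errno from the read verifier, or 0 */
};

/* Drop one reference; putting a buffer with no references is a bug. */
static void toy_buf_relse(struct toy_buf *bp)
{
	assert(bp->refcount > 0);
	bp->refcount--;
}

/*
 * Hypothetical read helper: take a reference, and if the verifier flagged
 * a bad CRC (EBADMSG here), drop that reference and return NULL so the
 * caller cannot release the same buffer a second time.
 */
static struct toy_buf *toy_read_buf(struct toy_buf *cached)
{
	struct toy_buf *bp = cached;

	bp->refcount++;
	if (bp->error == -EBADMSG) {
		toy_buf_relse(bp);
		bp = NULL;
	}
	return bp;
}
```

With this shape, a caller that later releases whatever toy_read_buf()
returned never touches a buffer whose last reference is already gone.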


* [PATCH] xfs_repair: -EFSBADCRC needs action when read verifier detects it.
@ 2025-02-26 16:43 bodonnel
  0 siblings, 0 replies; 7+ messages in thread
From: bodonnel @ 2025-02-26 16:43 UTC
  To: linux-xfs; +Cc: aalbersh, djwong


end of thread, other threads:[~2025-03-13 21:40 UTC | newest]

Thread overview: 7+ messages
2025-02-26 17:32 [PATCH] xfs_repair: -EFSBADCRC needs action when read verifier detects it bodonnel
2025-02-26 18:20 ` Darrick J. Wong
2025-02-28 15:27   ` Bill O'Donnell
2025-02-28 17:35     ` Darrick J. Wong
2025-03-13 19:07   ` Bill O'Donnell
2025-03-13 21:40     ` Eric Sandeen
  -- strict thread matches above, loose matches on Subject: below --
2025-02-26 16:43 bodonnel
