Unable to fix metadata corruption with xfs_repair

public inbox for linux-xfs@vger.kernel.org

* Unable to fix metadata corruption with xfs_repair
@ 2019-01-21 15:36 Julien Lutran
  2019-01-21 16:42 ` Eric Sandeen
  2019-01-21 20:31 ` Dave Chinner
  0 siblings, 2 replies; 6+ messages in thread
From: Julien Lutran @ 2019-01-21 15:36 UTC
  To: linux-xfs@vger.kernel.org



Hello,

I’m experiencing an issue with metadata corruption while trying to fix several corrupted xfs filesystems.
Here’s an excerpt of the kernel messages when the disk is mounted:

[…]
Jan 21 15:44:16 rescue kernel: XFS (sdb): Metadata corruption detected at xfs_inode_buf_verify+0x6d/0xf0, xfs_inode block 0x300160
Jan 21 15:44:16 rescue kernel: XFS (sdb): Unmount and run xfs_repair
Jan 21 15:44:16 rescue kernel: XFS (sdb): First 64 bytes of corrupted metadata buffer:
Jan 21 15:44:16 rescue kernel: XFS (sdb): metadata I/O error: block 0x300160 ("xfs_trans_read_buf_map") error 117 numblks 16
Jan 21 15:44:16 rescue kernel: XFS (sdb): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.

I ran xfs_repair (see attached log), but it ends the same way: a metadata error on block 0x300160.
Is there a way to fix this corruption?

The Linux kernel version is 4.14.17, but I encountered exactly the same issue on several other hosts running an older kernel.
The xfsprogs version is 4.19.0.


Best regards,
Julien Lutran
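
The fields of the "metadata I/O error" line above can be pulled apart mechanically. The following is a minimal Python sketch, assuming the message format shown in this excerpt; the helper name parse_xfs_io_error is hypothetical. It relies on two facts about Linux XFS: errno 117 is EUCLEAN, which XFS uses as EFSCORRUPTED, and numblks counts 512-byte basic blocks.

```python
import re

SECTOR_BYTES = 512   # XFS "basic block" size used for block addresses and numblks
EUCLEAN = 117        # Linux errno; XFS reports EFSCORRUPTED with this value

# Matches lines such as:
#   XFS (sdb): metadata I/O error: block 0x300160 ("xfs_trans_read_buf_map") error 117 numblks 16
_PATTERN = re.compile(
    r'XFS \((?P<dev>[^)]+)\): metadata I/O error: block (?P<block>0x[0-9a-fA-F]+) '
    r'\("(?P<func>[^"]+)"\) error (?P<errno>\d+) numblks (?P<numblks>\d+)'
)

def parse_xfs_io_error(line):
    """Return the fields of an XFS metadata I/O error line, or None if it does not match."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    err = int(m.group("errno"))
    return {
        "dev": m.group("dev"),
        "daddr": int(m.group("block"), 16),            # address in 512-byte units
        "func": m.group("func"),
        "errno": err,
        "length_bytes": int(m.group("numblks")) * SECTOR_BYTES,
        "is_corruption": err == EUCLEAN,
    }

line = ('Jan 21 15:44:16 rescue kernel: XFS (sdb): metadata I/O error: block 0x300160 '
        '("xfs_trans_read_buf_map") error 117 numblks 16')
info = parse_xfs_io_error(line)
print(hex(info["daddr"]), hex(info["length_bytes"]), info["is_corruption"])
```

Sixteen basic blocks are 0x2000 bytes, which matches the "0x300160/0x2000" pair that xfs_repair prints for the same buffer in the attached log.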



[-- Attachment #1.2: xfs_repair.log --]
[-- Type: application/octet-stream, Size: 6782 bytes --]

Start xfs_repair with cmdline: xfs_repair -L -m 8047 /dev/sdb

Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - 11:33:19: scanning filesystem freespace - 32 of 32 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 11:33:19: scanning agi unlinked lists - 32 of 32 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 15
        - agno = 0
        - agno = 30
Metadata corruption detected at 0x4314b3, xfs_inode block 0x300160/0x2000
        - agno = 16
bad magic number 0x0 on inode 6292176
bad version number 0x0 on inode 6292176
bad magic number 0x0 on inode 6292177
bad version number 0x0 on inode 6292177
bad magic number 0x0 on inode 6292178
bad version number 0x0 on inode 6292178
bad magic number 0x0 on inode 6292179
bad version number 0x0 on inode 6292179
bad magic number 0x0 on inode 6292180
bad version number 0x0 on inode 6292180
bad magic number 0x0 on inode 6292181
bad version number 0x0 on inode 6292181
bad magic number 0x0 on inode 6292182
bad version number 0x0 on inode 6292182
bad magic number 0x0 on inode 6292183
bad version number 0x0 on inode 6292183
bad magic number 0x0 on inode 6292184
bad version number 0x0 on inode 6292184
bad magic number 0x0 on inode 6292185
bad version number 0x0 on inode 6292185
bad magic number 0x0 on inode 6292186
bad version number 0x0 on inode 6292186
bad magic number 0x0 on inode 6292187
bad version number 0x0 on inode 6292187
bad magic number 0x0 on inode 6292188
bad version number 0x0 on inode 6292188
bad magic number 0x0 on inode 6292189
bad version number 0x0 on inode 6292189
bad magic number 0x0 on inode 6292190
bad version number 0x0 on inode 6292190
bad magic number 0x0 on inode 6292191
bad version number 0x0 on inode 6292191
bad magic number 0x0 on inode 6292176, resetting magic number
bad version number 0x0 on inode 6292176, resetting version number
bad magic number 0x0 on inode 6292177, resetting magic number
bad version number 0x0 on inode 6292177, resetting version number
bad magic number 0x0 on inode 6292178, resetting magic number
bad version number 0x0 on inode 6292178, resetting version number
bad magic number 0x0 on inode 6292179, resetting magic number
bad version number 0x0 on inode 6292179, resetting version number
bad magic number 0x0 on inode 6292180, resetting magic number
bad version number 0x0 on inode 6292180, resetting version number
bad magic number 0x0 on inode 6292181, resetting magic number
bad version number 0x0 on inode 6292181, resetting version number
bad magic number 0x0 on inode 6292182, resetting magic number
bad version number 0x0 on inode 6292182, resetting version number
bad magic number 0x0 on inode 6292183, resetting magic number
bad version number 0x0 on inode 6292183, resetting version number
bad magic number 0x0 on inode 6292184, resetting magic number
bad version number 0x0 on inode 6292184, resetting version number
bad magic number 0x0 on inode 6292185, resetting magic number
bad version number 0x0 on inode 6292185, resetting version number
bad magic number 0x0 on inode 6292186, resetting magic number
bad version number 0x0 on inode 6292186, resetting version number
bad magic number 0x0 on inode 6292187, resetting magic number
bad version number 0x0 on inode 6292187, resetting version number
bad magic number 0x0 on inode 6292188, resetting magic number
bad version number 0x0 on inode 6292188, resetting version number
bad magic number 0x0 on inode 6292189, resetting magic number
bad version number 0x0 on inode 6292189, resetting version number
bad magic number 0x0 on inode 6292190, resetting magic number
bad version number 0x0 on inode 6292190, resetting version number
bad magic number 0x0 on inode 6292191, resetting magic number
bad version number 0x0 on inode 6292191, resetting version number
        - agno = 1
        - agno = 17
        - agno = 31
        - agno = 2
        - agno = 18
        - agno = 19
        - agno = 3
Metadata corruption detected at 0x431775, xfs_inode block 0x300160/0x2000
libxfs_writebufr: write verifer failed on xfs_inode bno 0x300160/0x2000
        - agno = 20
        - agno = 4
        - agno = 21
        - agno = 5
        - agno = 22
        - agno = 6
        - agno = 23
        - agno = 7
        - agno = 24
        - agno = 8
        - agno = 25
        - agno = 9
        - agno = 26
        - agno = 10
        - agno = 11
        - agno = 27
        - agno = 12
        - agno = 28
        - agno = 13
        - agno = 29
        - agno = 14
        - 14:22:46: process known inodes and inode discovery - 22952512 of 22952512 inodes done
        - process newly discovered inodes...
        - 14:22:46: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 14:22:47: setting up duplicate extent list - 32 of 32 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 15
        - agno = 0
        - agno = 30
Metadata corruption detected at 0x4314b3, xfs_inode block 0x300160/0x2000
        - agno = 1
        - agno = 16
        - agno = 31
        - agno = 17
        - agno = 2
        - agno = 3
        - agno = 18
        - agno = 19
        - agno = 4
        - agno = 20
        - agno = 5
        - agno = 21
        - agno = 6
        - agno = 7
        - agno = 22
        - agno = 8
        - agno = 23
        - agno = 9
        - agno = 24
        - agno = 10
        - agno = 25
        - agno = 11
        - agno = 26
        - agno = 12
        - agno = 27
        - agno = 13
        - agno = 28
        - agno = 14
        - agno = 29
        - 14:31:35: check for inodes claiming duplicate blocks - 22952512 of 22952512 inodes done
Phase 5 - rebuild AG headers and trees...
        - 14:31:37: rebuild AG headers and trees - 32 of 32 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
Metadata corruption detected at 0x4314b3, xfs_inode block 0x300160/0x2000
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
        - 14:39:59: verify and correct link counts - 32 of 32 allocation groups done
Metadata corruption detected at 0x431775, xfs_inode block 0x300160/0x2000
libxfs_writebufr: write verifer failed on xfs_inode bno 0x300160/0x2000
releasing dirty buffer (bulk) to free list!done
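
The inode numbers and the buffer address in this log can be related with a little arithmetic. The sketch below is only illustrative: the filesystem geometry was not posted, so it assumes 4096-byte blocks, 256-byte inodes, and that the buffer lives in allocation group 0 (so the AG bits of the inode number are zero). The function names are hypothetical.

```python
SECTOR_BYTES = 512  # unit of the block address printed by the kernel and xfs_repair

def inodes_in_buffer(daddr, length_bytes, block_size=4096, inode_size=256):
    """Inode numbers covered by a buffer, ASSUMING AG 0 and the given geometry."""
    sectors_per_block = block_size // SECTOR_BYTES
    inodes_per_block = block_size // inode_size
    first_agbno = daddr // sectors_per_block       # filesystem block within AG 0
    first_ino = first_agbno * inodes_per_block     # inode number = block * inodes/block + offset
    count = length_bytes // inode_size
    return range(first_ino, first_ino + count)

def inode_to_daddr(ino, block_size=4096, inode_size=256):
    """Sector address of the filesystem block holding `ino`, under the same assumptions."""
    inodes_per_block = block_size // inode_size
    agbno = ino // inodes_per_block
    return agbno * (block_size // SECTOR_BYTES)

covered = inodes_in_buffer(0x300160, 0x2000)
print(covered[0], covered[-1])        # first and last inode number in the buffer
print(hex(inode_to_daddr(6292176)))   # block that holds the first inode flagged by xfs_repair
```

Under these assumed values the buffer would cover inodes 6292160 to 6292191, and the sixteen inodes reported with magic 0x0 (6292176 to 6292191) would be exactly its second filesystem block, starting at 0x300168. Different geometry would give different numbers, so this is a consistency check rather than a diagnosis.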


* Re: Unable to fix metadata corruption with xfs_repair
  2019-01-21 15:36 Unable to fix metadata corruption with xfs_repair Julien Lutran
@ 2019-01-21 16:42 ` Eric Sandeen
  2019-01-23  9:21   ` Julien Lutran
  2019-01-24 13:07   ` Julien Lutran
  2019-01-21 20:31 ` Dave Chinner
  1 sibling, 2 replies; 6+ messages in thread
From: Eric Sandeen @ 2019-01-21 16:42 UTC
  To: Julien Lutran, linux-xfs@vger.kernel.org


On 1/21/19 9:36 AM, Julien Lutran wrote:
> Hello,
> 
> I’m experiencing an issue with metadata corruption while trying to fix several corrupted xfs filesystems.
> Here’s an excerpt of the kernel messages when the disk is mounted:
> 
> […]
> Jan 21 15:44:16 rescue kernel: XFS (sdb): Metadata corruption detected at xfs_inode_buf_verify+0x6d/0xf0, xfs_inode block 0x300160
> Jan 21 15:44:16 rescue kernel: XFS (sdb): Unmount and run xfs_repair
> Jan 21 15:44:16 rescue kernel: XFS (sdb): First 64 bytes of corrupted metadata buffer:
> Jan 21 15:44:16 rescue kernel: XFS (sdb): metadata I/O error: block 0x300160 ("xfs_trans_read_buf_map") error 117 numblks 16
> Jan 21 15:44:16 rescue kernel: XFS (sdb): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> 
> I ran xfs_repair (see attached log), but it ends the same way: a metadata error on block 0x300160.
> Is there a way to fix this corruption?
> 
> The Linux kernel version is 4.14.17, but I encountered exactly the same issue on several other hosts running an older kernel.
> The xfsprogs version is 4.19.0.
> 
> 
> Best regards,
> Julien Lutran

Hi Julien -

Your log file says:

"Start xfs_repair with cmdline: xfs_repair -L -m 8047 /dev/sdb"

1) Why did you use -L, would the log not replay?
2) Why use -m?  Does that affect the outcome at all?
3) You could give the for-next branch in git a try, just in case, but otherwise
4) Please provide a compressed xfs_metadump for me to look at, off list, and
   I'll see what I can find.

Thanks,
-Eric
 



* Re: Unable to fix metadata corruption with xfs_repair
  2019-01-21 16:42 ` Eric Sandeen
@ 2019-01-23  9:21   ` Julien Lutran
  2019-01-24 13:07   ` Julien Lutran
  1 sibling, 0 replies; 6+ messages in thread
From: Julien Lutran @ 2019-01-23  9:21 UTC
  To: linux-xfs@vger.kernel.org



Hello,

> On 21 Jan 2019, at 17:42, Eric Sandeen <sandeen@sandeen.net> wrote:
> 
> 
> 
> On 1/21/19 9:36 AM, Julien Lutran wrote:
>> Hello,
>> 
>> I’m experiencing an issue with metadata corruption while trying to fix several corrupted xfs filesystems.
>> Here’s an excerpt of the kernel messages when the disk is mounted:
>> 
>> […]
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): Metadata corruption detected at xfs_inode_buf_verify+0x6d/0xf0, xfs_inode block 0x300160
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): Unmount and run xfs_repair
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): First 64 bytes of corrupted metadata buffer:
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): metadata I/O error: block 0x300160 ("xfs_trans_read_buf_map") error 117 numblks 16
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
>> 
>> I ran xfs_repair (see attached log), but it ends the same way: a metadata error on block 0x300160.
>> Is there a way to fix this corruption?
>> 
>> The Linux kernel version is 4.14.17, but I encountered exactly the same issue on several other hosts running an older kernel.
>> The xfsprogs version is 4.19.0.
>> 
>> 
>> Best regards,
>> Julien Lutran
> 
> Hi Julien -
> 
> Your log file says:
> 
> "Start xfs_repair with cmdline: xfs_repair -L -m 8047 /dev/sdb"
> 
> 1) Why did you use -L, would the log not replay?

That’s because we deal with a lot of corrupted filesystems, and our robot sets this option by default to handle heavily damaged ones.
I ran another “xfs_repair -m 8047 -v /dev/sdb”; see the attached log.

> 2) Why use -m?  Does that affect the outcome at all?

Because it’s a heavily loaded server, we cap maxmem at 25% of the total RAM.
I tried without setting this parameter, but it ends the same way.
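
As an illustration of the 25% cap described above, here is a minimal Python sketch that derives a value for xfs_repair's -m option (which takes megabytes) from /proc/meminfo-style text. The helper name maxmem_mb and the sample MemTotal value are hypothetical; they are not taken from the affected server.

```python
def maxmem_mb(meminfo_text, percent=25):
    """Return `percent` of MemTotal, in whole megabytes, from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])          # /proc/meminfo reports kB
            return total_kb * percent // 100 // 1024
    raise ValueError("MemTotal line not found")

# Hypothetical sample; on a real host, read the text from /proc/meminfo instead.
sample = "MemTotal:       32960512 kB\nMemFree:         1024000 kB\n"
limit = maxmem_mb(sample)

# Build the command as an argument list rather than running it here.
command = ["xfs_repair", "-m", str(limit), "/dev/sdb"]
print(" ".join(command))
```

Integer arithmetic is used throughout so the result is always a whole number of megabytes, rounded down.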

> 3) You could give the for-next branch in git a try, just in case, but otherwise

Will do :)

> 4) Please provide a compressed xfs_metadump for me to look at, off list, and
>   I'll see what I can find.

xfs_metadump is currently running; I will provide the dump as soon as it finishes.

> 
> Thanks,
> -Eric
> 
> 


[-- Attachment #1.2: xfs_repair_verbose.log --]
[-- Type: application/octet-stream, Size: 8715 bytes --]

root@rescue:~# xfs_repair -m 8087 -v /dev/sdb
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
        - block cache size set to 928312 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1265445 tail block 1265445
        - scan filesystem freespace and inode maps...
        - 09:25:25: scanning filesystem freespace - 32 of 32 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 09:25:25: scanning agi unlinked lists - 32 of 32 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 15
        - agno = 30
Metadata corruption detected at 0x4314b3, xfs_inode block 0x300160/0x2000
        - agno = 16
bad magic number 0x0 on inode 6292176
bad version number 0x0 on inode 6292176
bad magic number 0x0 on inode 6292177
bad version number 0x0 on inode 6292177
bad magic number 0x0 on inode 6292178
bad version number 0x0 on inode 6292178
bad magic number 0x0 on inode 6292179
bad version number 0x0 on inode 6292179
bad magic number 0x0 on inode 6292180
bad version number 0x0 on inode 6292180
bad magic number 0x0 on inode 6292181
bad version number 0x0 on inode 6292181
bad magic number 0x0 on inode 6292182
bad version number 0x0 on inode 6292182
bad magic number 0x0 on inode 6292183
bad version number 0x0 on inode 6292183
bad magic number 0x0 on inode 6292184
bad version number 0x0 on inode 6292184
bad magic number 0x0 on inode 6292185
bad version number 0x0 on inode 6292185
bad magic number 0x0 on inode 6292186
bad version number 0x0 on inode 6292186
bad magic number 0x0 on inode 6292187
bad version number 0x0 on inode 6292187
bad magic number 0x0 on inode 6292188
bad version number 0x0 on inode 6292188
bad magic number 0x0 on inode 6292189
bad version number 0x0 on inode 6292189
bad magic number 0x0 on inode 6292190
bad version number 0x0 on inode 6292190
bad magic number 0x0 on inode 6292191
bad version number 0x0 on inode 6292191
bad magic number 0x0 on inode 6292176, resetting magic number
bad version number 0x0 on inode 6292176, resetting version number
bad magic number 0x0 on inode 6292177, resetting magic number
bad version number 0x0 on inode 6292177, resetting version number
bad magic number 0x0 on inode 6292178, resetting magic number
bad version number 0x0 on inode 6292178, resetting version number
bad magic number 0x0 on inode 6292179, resetting magic number
bad version number 0x0 on inode 6292179, resetting version number
bad magic number 0x0 on inode 6292180, resetting magic number
bad version number 0x0 on inode 6292180, resetting version number
bad magic number 0x0 on inode 6292181, resetting magic number
bad version number 0x0 on inode 6292181, resetting version number
bad magic number 0x0 on inode 6292182, resetting magic number
bad version number 0x0 on inode 6292182, resetting version number
bad magic number 0x0 on inode 6292183, resetting magic number
bad version number 0x0 on inode 6292183, resetting version number
bad magic number 0x0 on inode 6292184, resetting magic number
bad version number 0x0 on inode 6292184, resetting version number
bad magic number 0x0 on inode 6292185, resetting magic number
bad version number 0x0 on inode 6292185, resetting version number
bad magic number 0x0 on inode 6292186, resetting magic number
bad version number 0x0 on inode 6292186, resetting version number
bad magic number 0x0 on inode 6292187, resetting magic number
bad version number 0x0 on inode 6292187, resetting version number
bad magic number 0x0 on inode 6292188, resetting magic number
bad version number 0x0 on inode 6292188, resetting version number
bad magic number 0x0 on inode 6292189, resetting magic number
bad version number 0x0 on inode 6292189, resetting version number
bad magic number 0x0 on inode 6292190, resetting magic number
bad version number 0x0 on inode 6292190, resetting version number
bad magic number 0x0 on inode 6292191, resetting magic number
bad version number 0x0 on inode 6292191, resetting version number
        - agno = 1
        - agno = 17
        - agno = 31
        - agno = 2
        - agno = 18
        - agno = 19
        - agno = 3
Metadata corruption detected at 0x431775, xfs_inode block 0x300160/0x2000
libxfs_writebufr: write verifer failed on xfs_inode bno 0x300160/0x2000
        - agno = 20
        - agno = 4
        - agno = 21
        - agno = 5
        - agno = 22
        - agno = 6
        - agno = 23
        - agno = 7
        - agno = 24
        - agno = 8
        - agno = 25
        - agno = 9
        - agno = 26
        - agno = 10
        - agno = 27
        - agno = 11
        - agno = 28
        - agno = 12
        - agno = 13
        - agno = 29
        - agno = 14
        - 12:12:06: process known inodes and inode discovery - 22822912 of 22822912 inodes done
        - process newly discovered inodes...
        - 12:12:06: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 12:12:06: setting up duplicate extent list - 32 of 32 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 15
        - agno = 0
        - agno = 30
Metadata corruption detected at 0x4314b3, xfs_inode block 0x300160/0x2000
        - agno = 16
        - agno = 1
        - agno = 31
        - agno = 17
        - agno = 2
        - agno = 18
        - agno = 3
        - agno = 19
        - agno = 4
        - agno = 20
        - agno = 21
        - agno = 5
        - agno = 6
        - agno = 22
        - agno = 7
        - agno = 23
        - agno = 8
        - agno = 9
        - agno = 24
        - agno = 10
        - agno = 25
        - agno = 11
        - agno = 26
        - agno = 12
        - agno = 13
        - agno = 27
        - agno = 14
        - agno = 28
        - agno = 29
        - 12:20:54: check for inodes claiming duplicate blocks - 22822912 of 22822912 inodes done
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - 12:20:58: rebuild AG headers and trees - 32 of 32 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
Metadata corruption detected at 0x4314b3, xfs_inode block 0x300160/0x2000
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
        - 12:29:22: verify and correct link counts - 32 of 32 allocation groups done
Metadata corruption detected at 0x431775, xfs_inode block 0x300160/0x2000
libxfs_writebufr: write verifer failed on xfs_inode bno 0x300160/0x2000
releasing dirty buffer (bulk) to free list!
        XFS_REPAIR Summary    Tue Jan 22 12:29:31 2019

Phase           Start           End             Duration
Phase 1:        01/22 09:25:08  01/22 09:25:08
Phase 2:        01/22 09:25:08  01/22 09:25:25  17 seconds
Phase 3:        01/22 09:25:25  01/22 12:12:06  2 hours, 46 minutes, 41 seconds
Phase 4:        01/22 12:12:06  01/22 12:20:54  8 minutes, 48 seconds
Phase 5:        01/22 12:20:54  01/22 12:20:58  4 seconds
Phase 6:        01/22 12:20:58  01/22 12:29:22  8 minutes, 24 seconds
Phase 7:        01/22 12:29:22  01/22 12:29:22

Total run time: 3 hours, 4 minutes, 14 seconds
done


* Re: Unable to fix metadata corruption with xfs_repair
  2019-01-21 16:42 ` Eric Sandeen
  2019-01-23  9:21   ` Julien Lutran
@ 2019-01-24 13:07   ` Julien Lutran
  1 sibling, 0 replies; 6+ messages in thread
From: Julien Lutran @ 2019-01-24 13:07 UTC
  To: Eric Sandeen; +Cc: linux-xfs@vger.kernel.org


Here’s the xfs_metadump (1.35 GB): https://dl.plik.ovh/file/kyXGCIy5luJe7ZKi/ZvUMwvCqUVPiQ2Oe/sdb.metadump.xz


Julien




* Re: Unable to fix metadata corruption with xfs_repair
  2019-01-21 15:36 Unable to fix metadata corruption with xfs_repair Julien Lutran
  2019-01-21 16:42 ` Eric Sandeen
@ 2019-01-21 20:31 ` Dave Chinner
  2019-01-23  9:33   ` Julien Lutran
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2019-01-21 20:31 UTC
  To: Julien Lutran; +Cc: linux-xfs@vger.kernel.org

On Mon, Jan 21, 2019 at 03:36:11PM +0000, Julien Lutran wrote:
> Hello,
> 
> I’m experiencing an issue with metadata corruption while trying to fix several corrupted xfs filesystems.
> Here’s an excerpt of the kernel messages when the disk is mounted:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> […]
> Jan 21 15:44:16 rescue kernel: XFS (sdb): Metadata corruption detected at xfs_inode_buf_verify+0x6d/0xf0, xfs_inode block 0x300160
> Jan 21 15:44:16 rescue kernel: XFS (sdb): Unmount and run xfs_repair
> Jan 21 15:44:16 rescue kernel: XFS (sdb): First 64 bytes of corrupted metadata buffer:
> Jan 21 15:44:16 rescue kernel: XFS (sdb): metadata I/O error: block 0x300160 ("xfs_trans_read_buf_map") error 117 numblks 16
> Jan 21 15:44:16 rescue kernel: XFS (sdb): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.

What's in the 64 bytes of the corrupted metadata buffer output?
i.e. you trimmed away the bit of the error message that we actually
need to see what went wrong. Can you paste the unedited log of the
error, including the output from mount time from the filesystem?

> I ran xfs_repair (see attached log), but it ends the same way: a metadata error on block 0x300160.
> Is there a way to fix this corruption?

Likely a repair bug - the inode cluster has been trashed for some
reason and it's not fixing it properly so it's refusing to write
back corrupt inode metadata.

I really need to see what was in the first 64 bytes of that buffer
and xfs_info output to determine if we have a corrupt cluster, a
corrupt filesystem block, or the hardware has returned a completely
zeroed sector....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Unable to fix metadata corruption with xfs_repair
  2019-01-21 20:31 ` Dave Chinner
@ 2019-01-23  9:33   ` Julien Lutran
  0 siblings, 0 replies; 6+ messages in thread
From: Julien Lutran @ 2019-01-23  9:33 UTC
  To: linux-xfs@vger.kernel.org


Hello,

> On 21 Jan 2019, at 21:31, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Mon, Jan 21, 2019 at 03:36:11PM +0000, Julien Lutran wrote:
>> Hello,
>> 
>> I’m experiencing an issue with metadata corruption while trying to fix several corrupted xfs filesystems.
>> Here’s an excerpt of the kernel messages when the disk is mounted:
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> 
>> […]
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): Metadata corruption detected at xfs_inode_buf_verify+0x6d/0xf0, xfs_inode block 0x300160
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): Unmount and run xfs_repair
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): First 64 bytes of corrupted metadata buffer:
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): metadata I/O error: block 0x300160 ("xfs_trans_read_buf_map") error 117 numblks 16
>> Jan 21 15:44:16 rescue kernel: XFS (sdb): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
> 
> What's in the 64 bytes of the corrupted metadata buffer output?
> i.e. you trimmed away the bit of the error message that we actually
> need to see what went wrong. Can you paste the unedited log of the
> error, including the output from mount time from the filesystem?

Sorry, my bad. Here’s the full log:

Jan 23 09:43:36 rescue kernel: XFS (sdb): Metadata corruption detected at xfs_inode_buf_verify+0x6d/0xf0, xfs_inode block 0x300160
Jan 23 09:43:36 rescue kernel: XFS (sdb): Unmount and run xfs_repair
Jan 23 09:43:36 rescue kernel: XFS (sdb): First 64 bytes of corrupted metadata buffer:
Jan 23 09:43:36 rescue kernel: ffff998f8f33b000: 49 4e 41 ed 02 01 00 00 00 00 03 e7 00 00 03 e7  INA.............
Jan 23 09:43:36 rescue kernel: ffff998f8f33b010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 01  ................
Jan 23 09:43:36 rescue kernel: ffff998f8f33b020: 59 c8 26 66 34 15 44 a6 59 c8 26 68 04 66 94 c0  Y.&f4.D.Y.&h.f..
Jan 23 09:43:36 rescue kernel: ffff998f8f33b030: 59 c8 26 68 04 66 94 c0 00 00 00 00 00 00 00 2f  Y.&h.f........./
Jan 23 09:43:36 rescue kernel: XFS (sdb): metadata I/O error: block 0x300160 ("xfs_trans_read_buf_map") error 117 numblks 16
Jan 23 09:43:36 rescue kernel: XFS (sdb): xfs_imap_to_bp: xfs_trans_read_buf() returned error -117.
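
The 64 bytes in this dump can be decoded field by field. The following is a minimal Python sketch, assuming the bytes follow the start of the classic big-endian on-disk inode core (struct xfs_dinode: magic, mode, version, format, onlink, uid, gid, nlink, project id, padding, flush iteration, three timestamps, size). The variable names are illustrative only.

```python
import struct

# Hex bytes copied from the four dump lines above.
DUMP = """
49 4e 41 ed 02 01 00 00 00 00 03 e7 00 00 03 e7
00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 01
59 c8 26 66 34 15 44 a6 59 c8 26 68 04 66 94 c0
59 c8 26 68 04 66 94 c0 00 00 00 00 00 00 00 2f
"""
raw = bytes.fromhex("".join(DUMP.split()))

# Big-endian layout of the first 64 bytes of the inode core (assumed, see lead-in):
# magic(2s) mode(H) version(b) format(B) onlink(H) uid(I) gid(I) nlink(I)
# projid_lo(H) projid_hi(H) pad(6x) flushiter(H) atime/mtime/ctime (sec, nsec as i each) size(q)
CORE = struct.Struct(">2sHbBHIIIHH6xHiiiiiiq")
(magic, mode, version, fmt, onlink, uid, gid, nlink,
 projid_lo, projid_hi, flushiter,
 atime_s, atime_ns, mtime_s, mtime_ns, ctime_s, ctime_ns, size) = CORE.unpack(raw)

print("magic  :", magic)        # the inode magic is the ASCII string "IN"
print("mode   :", oct(mode))    # file type and permission bits
print("version:", version, "format:", fmt)
print("uid/gid:", uid, gid, "nlink:", nlink)
print("size   :", size)
```

If that layout assumption holds, the start of the buffer is a well-formed inode: magic "IN", version 2, mode 040755 (a directory), local format (1), and size 47. The buffer as a whole is therefore not zeroed, even though xfs_repair reports magic 0x0 for inodes 6292176 to 6292191 further into it.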

> 
>> I ran xfs_repair (see attached log), but it ends the same way: a metadata error on block 0x300160.
>> Is there a way to fix this corruption?
> 
> Likely a repair bug - the inode cluster has been trashed for some
> reason and it's not fixing it properly so it's refusing to write
> back corrupt inode metadata.
> 
> I really need to see what was in the first 64 bytes of that buffer
> and xfs_info output to determine if we have a corrupt cluster, a
> corrupt filesystem block, or the hardware has returned a completely
> zeroed sector....
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

