From: Brian Foster <bfoster@redhat.com>
To: Avi Kivity <avi@scylladb.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: filesystem dead, xfs_repair won't help
Date: Mon, 10 Apr 2017 11:35:32 -0400
Message-ID: <20170410153529.GE3991@bfoster.bfoster>
In-Reply-To: <612a9d5a-9877-405c-ef7c-dc9632a1d8bc@scylladb.com>
On Mon, Apr 10, 2017 at 12:42:33PM +0300, Avi Kivity wrote:
> On 04/10/2017 12:23 PM, Avi Kivity wrote:
> > Today my kernel complained that in-memory metadata is corrupt and
> > asked that I run xfs_repair. But xfs_repair doesn't like the
> > superblock and isn't able to find a secondary superblock.
> >
> > Latest Fedora 25 kernel, new Intel NVMe drive (worked for a few weeks
> > without issue).
> >
> > Anything I can do to recover the data?
>
Well, I can't explain why you have a checksum error, but what do you mean
by xfs_repair not liking the superblock? Can you provide the xfs_repair
output?

It seems strange for xfs_repair not to find the superblock of a
filesystem that can otherwise run log recovery right up until it
encounters the buffer with the bad CRC, since mount has to read a valid
superblock just to locate the log.

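For illustration only, a minimal read-only sketch of what locating the
superblocks involves. It assumes the standard big-endian on-disk
superblock layout (magic "XFSB" at byte 0, sb_blocksize at 4, sb_dblocks
at 8, sb_agblocks at 84, sb_agcount at 88); parse_sb() and
secondary_sb_offsets() are hypothetical helpers, not xfs_repair
internals. It checks the magic in sector 0 and, if that looks sane,
probes the secondary copy at the start of each AG. The actual diagnosis
still needs the output of xfs_repair -n (no-modify mode).

```python
import struct
import sys

XFS_SB_MAGIC = b"XFSB"  # 0x58465342, first field of every XFS superblock copy


def parse_sb(buf: bytes) -> dict:
    """Decode a few geometry fields from an on-disk XFS superblock (big-endian).

    Assumed offsets: sb_magicnum @0, sb_blocksize @4, sb_dblocks @8,
    sb_agblocks @84, sb_agcount @88.
    """
    if len(buf) < 92:
        raise ValueError("need at least the first 92 bytes of the superblock")
    (blocksize,) = struct.unpack_from(">I", buf, 4)
    (dblocks,) = struct.unpack_from(">Q", buf, 8)
    agblocks, agcount = struct.unpack_from(">II", buf, 84)
    return {
        "magic_ok": buf[:4] == XFS_SB_MAGIC,
        "blocksize": blocksize,
        "dblocks": dblocks,
        "agblocks": agblocks,
        "agcount": agcount,
    }


def secondary_sb_offsets(sb: dict) -> list:
    """Byte offsets of the secondary superblocks, one at the start of AGs 1..agcount-1."""
    ag_bytes = sb["agblocks"] * sb["blocksize"]
    return [ag * ag_bytes for ag in range(1, sb["agcount"])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read-only inspection of an unmounted block device, e.g. /dev/nvme0n1.
    with open(sys.argv[1], "rb") as dev:
        primary = parse_sb(dev.read(512))
        print("primary:", primary)
        if primary["magic_ok"]:
            for off in secondary_sb_offsets(primary):
                dev.seek(off)
                print(f"secondary at {off}: magic_ok={dev.read(4) == XFS_SB_MAGIC}")
```

If the primary magic is already wrong, the geometry fields are garbage
too, which is exactly the case where xfs_repair falls back to searching
the device for a secondary superblock.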
It might also be useful to find out exactly what the error reported by
smartctl means. Do you know whether it predates the filesystem issue?

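A minimal sketch of splitting the Status column from the smartctl error
log into its sub-fields. It assumes the NVMe Status Field layout (bit 0
phase tag, bits 8:1 Status Code, bits 11:9 Status Code Type, bit 14 More,
bit 15 Do Not Retry) and that smartctl prints the raw 16-bit value with
the phase tag included; that second assumption should be checked against
the smartctl version in use. decode_nvme_status() is a hypothetical
helper.

```python
# Status Code Type names from the NVMe spec; other values are reserved
# (or only defined in later spec revisions).
SCT_NAMES = {
    0: "generic command status",
    1: "command specific status",
    2: "media and data integrity errors",
    7: "vendor specific",
}


def decode_nvme_status(raw: int) -> dict:
    """Split the 16-bit Status Field of an NVMe error log entry.

    Assumes `raw` includes the phase tag in bit 0, so the status proper
    starts at bit 1.
    """
    sct = (raw >> 9) & 0x7
    return {
        "phase": raw & 0x1,
        "sc": (raw >> 1) & 0xFF,   # Status Code, bits 8:1
        "sct": sct,                # Status Code Type, bits 11:9
        "sct_name": SCT_NAMES.get(sct, "reserved/other"),
        "more": (raw >> 14) & 0x1,
        "dnr": (raw >> 15) & 0x1,  # Do Not Retry
    }


if __name__ == "__main__":
    print(decode_nvme_status(0x0286))
```

Under that assumption 0x0286 splits into SCT 1, SC 0x43. If the printed
value does not include the phase tag, the same bits would instead read
SCT 2 (media and data integrity errors), SC 0x86, which would line up
with the "Media and Data Integrity Errors: 1" counter. The sketch cannot
settle which reading is right; the spec's status code tables can.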
Brian
>
> Initial error:
>
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Metadata CRC error detected at xfs_agfl_read_verify+0xcd/0x100 [xfs], xfs_agfl block 0x2cb68e13
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Unmount and run xfs_repair
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): First 64 bytes of corrupted metadata buffer:
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75400: 23 40 8f 28 5b 50 3a b4 f8 54 1e 31 97 f4 fe ed #@.([P:..T.1....
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75410: 62 87 57 51 ee 9d 31 02 ec 2c 10 46 6c 93 db 09 b.WQ..1..,.Fl...
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75420: ae 7a ea b3 91 49 7e d3 99 a4 25 49 11 c5 8b be .z...I~...%I....
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: ffff9004a5b75430: e4 2e 14 d4 8a f8 5f 98 66 d8 67 72 ec c9 1a d5 ......_.f.gr....
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): metadata I/O error: block 0x2cb68e13 ("xfs_trans_read_buf_map") error 74 numblks 1
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffc05bdbc6
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Corruption of in-memory data detected. Shutting down filesystem
> Apr 10 11:41:20 avi.cloudius-systems.com kernel: XFS (nvme0n1): Please umount the filesystem and rectify the problem(s)
>
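A sketch of two quick checks on the message above, under stated
assumptions: the "block" in these XFS messages is a disk address in
512-byte basic blocks (so numblks 1 is a single sector), and a V5 AGFL
is expected to begin with the magic "XAFL" (0x5841464c). The helper
names are hypothetical. The dumped bytes start with 23 40 8f 28, so at
least the header of that sector does not look like an AGFL at all.

```python
import sys

BBSIZE = 512              # XFS basic block: daddrs in these messages count 512-byte units
XFS_AGFL_MAGIC = b"XAFL"  # 0x5841464c, first field of a V5 AGFL header


def daddr_to_offset(daddr: int) -> int:
    """Convert an XFS disk address (in basic blocks) to a byte offset on the device."""
    return daddr * BBSIZE


def looks_like_agfl(buf: bytes) -> bool:
    """True if the buffer starts with the V5 AGFL magic number."""
    return buf[:4] == XFS_AGFL_MAGIC


# First 16 bytes of the buffer the verifier dumped above.
dumped = bytes.fromhex("23 40 8f 28 5b 50 3a b4 f8 54 1e 31 97 f4 fe ed")

if __name__ == "__main__":
    print(daddr_to_offset(0x2CB68E13))  # 384082650624
    print(looks_like_agfl(dumped))      # False
    if len(sys.argv) > 1:
        # Optional and read-only: re-read the same sector from the unmounted device.
        with open(sys.argv[1], "rb") as dev:
            dev.seek(daddr_to_offset(0x2CB68E13))
            sector = dev.read(BBSIZE)
        print(sector[:16].hex(" "), looks_like_agfl(sector))
```

The byte offset (about 384 GB) is well inside the 512 GB namespace, so
the sector is at least a valid address on the drive.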
>
> After restart:
>
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Mounting V5 Filesystem
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Starting recovery (logdev: internal)
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Metadata CRC error detected at xfs_agfl_read_verify+0xcd/0x100 [xfs], xfs_agfl block 0x2cb68e13
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Unmount and run xfs_repair
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): First 64 bytes of corrupted metadata buffer:
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a00: 23 40 8f 28 5b 50 3a b4 f8 54 1e 31 97 f4 fe ed #@.([P:..T.1....
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a10: 62 87 57 51 ee 9d 31 02 ec 2c 10 46 6c 93 db 09 b.WQ..1..,.Fl...
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a20: ae 7a ea b3 91 49 7e d3 99 a4 25 49 11 c5 8b be .z...I~...%I....
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ffff9450761d4a30: e4 2e 14 d4 8a f8 5f 98 66 d8 67 72 ec c9 1a d5 ......_.f.gr....
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): metadata I/O error: block 0x2cb68e13 ("xfs_trans_read_buf_map") error 74 numblks 1
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_efi_recover+0x18e/0x1c0 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: CPU: 3 PID: 1063 Comm: mount Not tainted 4.10.8-200.fc25.x86_64 #1
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: Hardware name: /DH77EB, BIOS EBH7710H.86A.0099.2013.0125.1400 01/25/2013
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: Call Trace:
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: dump_stack+0x63/0x86
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_error_report+0x3c/0x40 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ? xfs_efi_recover+0x18e/0x1c0 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_trans_cancel+0xb6/0xe0 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_efi_recover+0x18e/0x1c0 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xlog_recover_process_efi+0x2c/0x50 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xlog_recover_process_intents.isra.42+0x122/0x160 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ? xfs_reinit_percpu_counters+0x46/0x50 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xlog_recover_finish+0x23/0xb0 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_log_mount_finish+0x29/0x50 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_mountfs+0x6ce/0x930 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_fs_fill_super+0x3ee/0x570 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: mount_bdev+0x178/0x1b0
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ? xfs_test_remount_options.isra.14+0x60/0x60 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: xfs_fs_mount+0x15/0x20 [xfs]
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: mount_fs+0x38/0x150
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ? __alloc_percpu+0x15/0x20
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: vfs_kern_mount+0x67/0x130
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: do_mount+0x1dd/0xc50
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ? _copy_from_user+0x4e/0x80
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: ? memdup_user+0x4f/0x70
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: SyS_mount+0x83/0xd0
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: do_syscall_64+0x67/0x180
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: entry_SYSCALL64_slow_path+0x25/0x25
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: RIP: 0033:0x7f5cb9a626fa
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: RSP: 002b:00007ffeffa2c928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: RAX: ffffffffffffffda RBX: 000055b59fd6f030 RCX: 00007f5cb9a626fa
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: RDX: 000055b59fd6f210 RSI: 000055b59fd6f250 RDI: 000055b59fd6f230
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000012
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: R10: 00000000c0ed0000 R11: 0000000000000246 R12: 000055b59fd6f230
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: R13: 000055b59fd6f210 R14: 0000000000000000 R15: 00000000ffffffff
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): xfs_do_force_shutdown(0x8) called from line 984 of file fs/xfs/xfs_trans.c. Return address = 0xffffffffc056324f
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Corruption of in-memory data detected. Shutting down filesystem
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Please umount the filesystem and rectify the problem(s)
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): Failed to recover intents
> Apr 10 11:47:58 avi.cloudius-systems.com kernel: XFS (nvme0n1): log mount finish failed
>
>
>
> SMART output (note the error at the end; there were no kernel I/O errors
> from the block layer):
>
> $ sudo smartctl -a /dev/nvme0n1
> smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.8-200.fc25.x86_64] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Number: INTEL SSDPEKKW512G7
> Serial Number: BTPY6313086D512F
> Firmware Version: PSF100C
> PCI Vendor/Subsystem ID: 0x8086
> IEEE OUI Identifier: 0x5cd2e4
> Controller ID: 1
> Number of Namespaces: 1
> Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
> Namespace 1 Formatted LBA Size: 512
> Local Time is: Mon Apr 10 12:36:41 2017 IDT
> Firmware Updates (0x12): 1 Slot, no Reset required
> Optional Admin Commands (0x0006): Format Frmw_DL
> Optional NVM Commands (0x001e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
> Maximum Data Transfer Size: 32 Pages
> Warning Comp. Temp. Threshold: 70 Celsius
> Critical Comp. Temp. Threshold: 80 Celsius
>
> Supported Power States
> St Op      Max   Active     Idle RL RT WL WT  Ent_Lat  Ex_Lat
>  0  +    9.00W        -        -  0  0  0  0        5       5
>  1  +    4.60W        -        -  1  1  1  1       30      30
>  2  +    3.80W        -        -  2  2  2  2       30      30
>  3  -  0.0700W        -        -  3  3  3  3    10000     300
>  4  -  0.0050W        -        -  4  4  4  4     2000   10000
>
> Supported LBA Sizes (NSID 0x1)
> Id Fmt  Data  Metadt  Rel_Perf
>  0 +     512       0         0
>
> === START OF SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> SMART/Health Information (NVMe Log 0x02, NSID 0x1)
> Critical Warning: 0x00
> Temperature: 27 Celsius
> Available Spare: 100%
> Available Spare Threshold: 10%
> Percentage Used: 0%
> Data Units Read: 8,854,487 [4.53 TB]
> Data Units Written: 5,652,445 [2.89 TB]
> Host Read Commands: 446,901,662
> Host Write Commands: 35,627,742
> Controller Busy Time: 633
> Power Cycles: 24
> Power On Hours: 987
> Unsafe Shutdowns: 16
> Media and Data Integrity Errors: 1
> Error Information Log Entries: 1
> Warning Comp. Temperature Time: 11
> Critical Comp. Temperature Time: 0
>
> Error Information (NVMe Log 0x01, max 64 entries)
> Num ErrCount SQId  CmdId Status PELoc          LBA NSID  VS
>   0        1    1 0x0000 0x0286     -            0    1   -
>