linux-xfs.vger.kernel.org archive mirror
From: David T-G <davidtg@justpickone.org>
To: Linux-XFS list <linux-xfs@vger.kernel.org>
Subject: is it safe to xfs_repair this volume? do i have a different first step?
Date: Thu, 7 Feb 2019 08:25:34 -0500	[thread overview]
Message-ID: <20190207132534.GA2185@justpickone.org> (raw)

Good morning!

I have a four-disk RAID5 volume with an ~11T filesystem that suddenly
won't mount 

  diskfarm:root:4:~> mount -v /mnt/4Traid5md/
  mount: mount /dev/md0p1 on /mnt/4Traid5md failed: Bad message

after a power outage :-(  Because of the GPT errors I see

  diskfarm:root:4:~> fdisk -l /dev/md0
  The backup GPT table is corrupt, but the primary appears OK, so that will be used.
  Disk /dev/md0: 10.9 TiB, 12001551581184 bytes, 23440530432 sectors
  Units: sectors of 1 * 512 = 512 bytes
  Sector size (logical/physical): 512 bytes / 4096 bytes
  I/O size (minimum/optimal): 524288 bytes / 1572864 bytes
  Disklabel type: gpt
  Disk identifier: 8D29E2FB-1A26-4C46-B284-99FA7163B89D

  Device     Start         End     Sectors  Size Type
  /dev/md0p1  2048 23440530398 23440528351 10.9T Linux filesystem

  diskfarm:root:4:~> parted /dev/md0 print
  Error: end of file while reading /dev/md0
  Retry/Ignore/Cancel? ignore
  Error: The backup GPT table is corrupt, but the primary appears OK, so that will be used.
  OK/Cancel? ok
  Model: Linux Software RAID Array (md)
  Disk /dev/md0: 12.0TB
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt
  Disk Flags:

  Number  Start   End     Size    File system  Name              Flags
   1      1049kB  12.0TB  12.0TB  xfs          Linux filesystem

when poking, I at first thought that this was a RAID issue, but all of
the md reports look good and apparently the GPT table issue is common, so
I'll leave all of that out unless someone asks for it.
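
In case it matters, the fix I've seen suggested for the backup-GPT
complaint is to have sgdisk rebuild the backup table from the
(reportedly good) primary -- I have *not* run this yet, and the
paths here are just placeholders, so please confirm before I do:

```shell
# Save the current partition data somewhere safe first; sgdisk's
# --backup writes the headers and table to a file, not to the disk:
sgdisk --backup=/root/md0-gpt.bin /dev/md0

# Then have sgdisk rewrite the backup GPT structures at the end of
# the disk, regenerated from the primary table (-e relocates the
# backup header/table to the actual end of the device):
sgdisk -e /dev/md0
```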

dmesg reports some XFS problems

  diskfarm:root:5:~> dmesg | egrep 'md[:/0]'
  [  117.999012] md/raid:md127: device sdg2 operational as raid disk 1
  [  117.999014] md/raid:md127: device sdh2 operational as raid disk 2
  [  117.999015] md/raid:md127: device sdd2 operational as raid disk 0
  [  117.999246] md/raid:md127: raid level 5 active with 3 out of 3 devices, algorithm 2
  [  120.820661] md/raid:md0: not clean -- starting background reconstruction
  [  120.821279] md/raid:md0: device sdf1 operational as raid disk 2
  [  120.821282] md/raid:md0: device sda1 operational as raid disk 3
  [  120.821283] md/raid:md0: device sdb1 operational as raid disk 0
  [  120.821284] md/raid:md0: device sde1 operational as raid disk 1
  [  120.822028] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
  [  120.822063] md0: detected capacity change from 0 to 12001551581184
  [  120.888841]  md0: p1
  [  202.230961] XFS (md0p1): Mounting V4 Filesystem
  [  203.182567] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
  [  203.367581] XFS (md0p1): failed to locate log tail
  [  203.367587] XFS (md0p1): log mount/recovery failed: error -74
  [  203.367712] XFS (md0p1): log mount failed
  [  285.893728] XFS (md0p1): Mounting V4 Filesystem
  [  286.057829] XFS (md0p1): Torn write (CRC failure) detected at log block 0x3397e8. Truncating head block from 0x3399e8.
  [  286.203436] XFS (md0p1): failed to locate log tail
  [  286.203440] XFS (md0p1): log mount/recovery failed: error -74
  [  286.203497] XFS (md0p1): log mount failed

but doesn't tell me a whole lot -- or at least not a whole lot that makes
enough sense to me :-)  I tried an xfs_repair dry run and here

  diskfarm:root:4:~> xfs_repair -n /dev/disk/by-label/4Traid5md 2>&1 | egrep -v 'agno = '
  Phase 1 - find and verify superblock...
          - reporting progress in intervals of 15 minutes
  Phase 2 - using internal log
          - zero log...
          - scan filesystem freespace and inode maps...
  sb_fdblocks 471930978, counted 471939170
          - 09:18:47: scanning filesystem freespace - 48 of 48 allocation groups done
          - found root inode chunk
  Phase 3 - for each AG...
          - scan (but don't clear) agi unlinked lists...
          - 09:18:47: scanning agi unlinked lists - 48 of 48 allocation groups done
          - process known inodes and perform inode discovery...
          - 09:24:17: process known inodes and inode discovery - 4466560 of 4466560 inodes done
          - process newly discovered inodes...
          - 09:24:17: process newly discovered inodes - 48 of 48 allocation groups done
  Phase 4 - check for duplicate blocks...
          - setting up duplicate extent list...
          - 09:24:17: setting up duplicate extent list - 48 of 48 allocation groups done
          - check for inodes claiming duplicate blocks...
          - 09:29:44: check for inodes claiming duplicate blocks - 4466560 of 4466560 inodes done
  No modify flag set, skipping phase 5
  Phase 6 - check inode connectivity...
          - traversing filesystem ...
          - traversal finished ...
          - moving disconnected inodes to lost+found ...
  Phase 7 - verify link counts...
          - 09:34:02: verify and correct link counts - 48 of 48 allocation groups done
  No modify flag set, skipping filesystem flush and exiting.

is the output, trimmed to fit on one screen.  Since I don't have a
second copy of all of this data, I'm a bit nervous about pulling the
trigger and writing changes, and I want to make sure that I take the
right steps!  How should I proceed?
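
For what it's worth, here is the sequence I've pieced together from
the archives; every step is my own guess (and the destination paths
are made up), so please tell me which parts are wrong before I run
any of them:

```shell
# 1) Take a metadata-only image first, so there's a copy to
#    experiment on (and to post here) before anything writes to the
#    real volume; xfs_metadump copies metadata, not file contents:
xfs_metadump -g /dev/md0p1 /elsewhere/md0p1.metadump
xfs_mdrestore /elsewhere/md0p1.metadump /elsewhere/md0p1.img
xfs_repair -n /elsewhere/md0p1.img   # dry run against the copy

# 2) See whether the data is reachable read-only without replaying
#    the damaged log (files in flight at the outage may look stale
#    or missing in this view):
mount -o ro,norecovery /dev/md0p1 /mnt/4Traid5md
# ... copy off anything irreplaceable, then umount ...

# 3) Only as a last resort, zero the log and repair; -L throws away
#    whatever transactions were sitting in the log at the outage:
xfs_repair -L /dev/md0p1
```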

I'm not subscribed to this list, so please do cc/bcc me on your replies.
I didn't see any other lists and did see some discussion here, so I hope
that I'm in the right place, but please feel free also to point me in
another direction if that's better.


TIA & HAND

:-D
-- 
David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt

Thread overview: 6+ messages
2019-02-07 13:25 David T-G [this message]
2019-02-07 14:52 ` is it safe to xfs_repair this volume? do i have a different first step? Brian Foster
2019-02-08  2:25   ` David T-G
2019-02-08 13:00     ` Brian Foster
2019-02-08 19:45     ` Chris Murphy
2019-02-08 18:40 ` Chris Murphy
