What to do when... xfs_repair hangs?

From: Sean Caron @ 2014-05-30 19:49 UTC
  To: xfs, Sean Caron


Hi all,

Long story short, we have a big array formatted as XFS, we had a machine go
down hard maybe a month, month and a half ago... when it came back up, XFS
faulted out when we attempted to mount the filesystem; it complained the
log was bad or something... I did a dry run of xfs_repair (-n) and it
looked pretty bad, so we mounted up the filesystem read-only, ran a
backup... I think we got pretty much everything out OK except maybe files
that were open at the time of the crash.

Now with a backup in hand, we kicked off xfs_repair "for real"... it ran
for a while and did its thing, but now it appears to be stuck at the stage -

- agno = 436
rebuilding directory inode ...
rebuilding directory inode ...
rebuilding directory inode ...
...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 1109099673,

and then it just stops. I don't know how long it's been sitting like that,
but it hasn't moved in the last hour or two. I assume that's not good...
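
One rough way to tell "hung" from "slow" is to sample the process state and
I/O counters that Linux exposes under /proc and see whether they change
between samples. The sketch below is only an illustration under stated
assumptions: it needs a second shell on the box (which this machine does not
currently have), a Python interpreter, and root access to /proc/<pid>/io. The
helper names (parse_state, parse_io, looks_stalled, watch) are made up for
this example and are not part of xfsprogs. Unchanged counters are a hint, not
proof: reads served from the page cache do not show up in read_bytes, and a
process spinning on the CPU does no I/O at all.

```python
import time


def parse_state(stat_text):
    """Return the one-letter state (R, S, D, ...) from /proc/<pid>/stat text."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' before reading the state field.
    return stat_text.rsplit(")", 1)[1].split()[0]


def parse_io(io_text):
    """Return (read_bytes, write_bytes) from /proc/<pid>/io text."""
    fields = dict(line.split(": ", 1) for line in io_text.strip().splitlines())
    return int(fields["read_bytes"]), int(fields["write_bytes"])


def looks_stalled(io_before, io_after):
    """True if no block-device I/O was accounted between the two samples."""
    return io_before == io_after


def sample(pid):
    """Read the current state and I/O counters for one process."""
    with open("/proc/%d/stat" % pid) as f:
        state = parse_state(f.read())
    with open("/proc/%d/io" % pid) as f:
        io = parse_io(f.read())
    return state, io


def watch(pid, interval=60, rounds=5):
    """Print one line per interval showing state and whether I/O advanced."""
    _, previous = sample(pid)
    for _ in range(rounds):
        time.sleep(interval)
        state, current = sample(pid)
        verdict = "no I/O" if looks_stalled(previous, current) else "progress"
        print("state=%s read=%d written=%d (%s)"
              % (state, current[0], current[1], verdict))
        previous = current

# Example with a hypothetical PID for the running xfs_repair: watch(4321)
```

A process that stays in state D with flat counters over several samples is
probably blocked in the kernel; state R with flat counters suggests a CPU-bound
loop in user space rather than a storage problem.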

Interestingly, when we ran a dry run of xfs_repair (-n) it got all the way
through; it never hung up at any point. Not sure why it would start to hang
once it gets run "for real".

This machine is in single-user mode. I have exactly 24 lines of console
with no scrollback buffer and no other tty available besides the one I'm
running xfs_repair on, the system console.

Running Linux kernel 3.4.61, Ubuntu 12.04 LTS 64-bit with whatever their
current xfsprogs is.

This is a bit of an exceptional situation for me; I've never seen
xfs_repair just hang outright. I hoped I could maybe get some feedback from
the experts here... what should I do?

Try to Control-C out of the xfs_repair and ... re-run it?

Should I just quit wasting time at this point, wipe out the filesystem,
reformat, then just start the long process of restoring from the backups?

Original plan was just to run xfs_repair, see what happened and pull from
backups as required to fix damage. Perhaps we should just cut to the chase,
rebuild, and restore everything? Would the file system ultimately be
healthier starting from scratch than whatever xfs_repair leaves behind?

Any insight would be very much appreciated!

Thanks,

Sean
