From: Brian Foster <bfoster@redhat.com>
To: Sean Caron <scaron@umich.edu>
Cc: xfs@oss.sgi.com
Subject: Re: What to do when... xfs_repair hangs?
Date: Fri, 30 May 2014 17:30:51 -0400
Message-ID: <20140530213051.GB26314@bfoster.bfoster>
In-Reply-To: <CAA43vkVzWRTqNQh2VSi5yvFLtstmVOKRJUnYw_ZSkYJGsex8Uw@mail.gmail.com>

On Fri, May 30, 2014 at 03:49:13PM -0400, Sean Caron wrote:
> Hi all,
> 
> Long story short, we have a big array formatted as XFS, we had a machine go
> down hard maybe a month, month and a half ago... when it came back up, XFS
> faulted out when we attempted to mount the filesystem; it complained the
> log was bad or something... I did a dry run of xfs_repair (-L) and it
> looked pretty bad, so we mounted up the filesystem read-only, ran a
> backup... I think we got pretty much everything out OK except maybe files
> that were open at the time of the crash.
> 

I assume you've reasonably verified that the files that have been backed
up at this point have valid content.

> Now with a backup in hand, we kicked off xfs_repair "for real"... it ran
> for a while and did its thing, but now it appears to be stuck at the stage -
> 
> - agno = 436
> rebuilding directory inode ...
> rebuilding directory inode ...
> rebuilding directory inode ...
> ...
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> disconnected inode 1109099673,
> 
> and then it just stops. I don't know how long it's been sitting like that,
> but it hasn't moved in the last hour or two. I assume that's not good...
> 

You might want to include a bit more information about your storage and
filesystem geometry, if possible. See here:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

In terms of the hang, does the process appear to be active and spinning
in top, or is it idle? If the latter, are there any hung task messages
in dmesg or the system logs? A blocked tasks dump might also be
informative here (see the sysrq-trigger bit in the link). In either
case, some information on the runtime state of xfs_repair would be
useful.
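
If top is awkward to read on a single 24-line console, the same
question (spinning vs. blocked vs. idle) can be answered from
/proc/<pid>/stat. The following is only a rough sketch, not an
established tool: the function names are made up for illustration, and
it assumes the standard Linux stat layout (state in field 3, utime and
stime in fields 14 and 15). The state letter is that of the main thread
only, so a multithreaded repair may need a look at its other threads
too.

```python
import time
from pathlib import Path


def parse_stat(stat_text):
    """Return (state, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before reading the remaining fields.
    rest = stat_text.rsplit(")", 1)[1].split()
    state = rest[0]                # field 3: R, S, D, Z, T, ...
    utime = int(rest[11])          # field 14: user-mode CPU ticks
    stime = int(rest[12])          # field 15: kernel-mode CPU ticks
    return state, utime + stime


def classify(first, second):
    """Compare two (state, cpu_ticks) samples taken some seconds apart."""
    (_, ticks1), (state2, ticks2) = first, second
    if ticks2 > ticks1:
        return "active: consuming CPU"
    if state2 == "D":
        return "blocked: uninterruptible sleep, likely waiting on I/O"
    return "idle: state " + state2


def sample(pid, interval=5.0):
    """Sample /proc/<pid>/stat twice and classify what the process did."""
    path = Path("/proc") / str(pid) / "stat"
    first = parse_stat(path.read_text())
    time.sleep(interval)
    second = parse_stat(path.read_text())
    return classify(first, second)
```

Calling sample() with the PID of xfs_repair (from pidof, for instance)
would then report one of the three cases. A "blocked" result is the one
where hung task messages and a blocked tasks dump are most likely to
show something.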

> Interestingly when we ran a dry run of xfs_repair (-L) it got all the way
> through; it never hung up at any point. Not sure why it would start to hang
> up, once it gets run "for real".
> 

Perhaps writing to the storage is the problem? Have you encountered any
other errors related to the storage?

> This machine is in single-user-mode, I have exactly 24 lines of console
> with no scrollback buffer, no other tty available besides that which I'm
> running xfs_repair on, the system console.
> 
> Running Linux kernel 3.4.61, Ubuntu 12.04 LTS 64-bit with whatever their
> current xfsprogs is.
> 
> This is a bit of an exceptional situation for me; I've never seen
> xfs_repair just hang outright. I hoped I could maybe get some feedback from
> the experts here... what should I do?
> 
> Try to Control-C out of the xfs_repair and ... re-run it?
> 
> Should I just quit wasting time at this point, wipe out the filesystem,
> reformat, then just start the long process of restoring from the backups?
> 

I'm not totally sure, but I think if you include some more of this data,
others might have some suggestions. If there really is something about
the filesystem causing repair to choke/spin/fall-over, a metadump of the
fs might be useful (taken before you reformat, if you do happen to go
that route).

Brian

> Original plan was just to run xfs_repair, see what happened and pull from
> backups as required to fix damage. Perhaps we should just cut to the chase,
> rebuild, and restore everything? Probably the file system would be
> ultimately healthier starting from scratch, than what xfs_repair leaves
> behind?
> 
> Any insight would be very much appreciated!
> 
> Thanks,
> 
> Sean

> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs