From: Dave Chinner <david@fromorbit.com>
To: Dave Jones <davej@redhat.com>,
xfs@oss.sgi.com, Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: 3.10-rc3 xfs mount/recovery failure & ext fsck hang.
Date: Wed, 29 May 2013 08:04:09 +1000
Message-ID: <20130528220409.GD29338@dastard>
In-Reply-To: <20130528214137.GC24342@redhat.com>

On Tue, May 28, 2013 at 05:41:37PM -0400, Dave Jones wrote:
> On Wed, May 29, 2013 at 07:32:48AM +1000, Dave Chinner wrote:
> > On Tue, May 28, 2013 at 05:15:44PM -0400, Dave Jones wrote:
> > > On Wed, May 29, 2013 at 07:10:12AM +1000, Dave Chinner wrote:
> > > > On Tue, May 28, 2013 at 12:12:30PM -0400, Dave Jones wrote:
> > > > > box crashed, and needed rebooting. On next bootup, when it found the dirty partition,
> > > > > xfs chose to spew and then hang instead of replaying the journal and mounting :(
> > > > >
> > > > > [ 14.694731] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, debug enabled
> > > > > [ 14.722328] XFS (sda2): Mounting Filesystem
> > > > > [ 14.757801] XFS (sda2): Starting recovery (logdev: internal)
> > > > > [ 14.782049] XFS: Assertion failed: fs_is_ok, file: fs/xfs/xfs_dir2_data.c, line: 169
> > > >
> > > > A directory block has an entry that is not in the hash index.
> > > > Either there's an underlying corruption on disk, or there's an
> > > > inconsistency in what has been logged and so an entire change has
> > > > not been replayed. Hence the post recovery verification has thrown a
> > > > corruption error....
> > > >
> > > > If you haven't already repaired the filesystem, can you send me a
> > > > metadump of the filesystem in question?
> > >
> > > Sorry, too late. If I can repro, I'll do so next time.
> > > FYI, I ran xfs_repair and it just hung. Wouldn't even answer ctrl-c.
> > > Rebooted, and then it mounted and recovered just fine!
> >
> > Strange. I can't think of any reason outside a kernel problem for
> > xfs_repair going into an uninterruptible sleep. Did it happen after
> > the repair completed (i.e. after phase 7)? If so, then closing the
> > block device might have tripped the same problem that fsck.ext2
> > hit....
>
> didn't even get that far. It opened the block dev, and then just sat there.
> I left it for a few minutes before deciding it was hung.
> And of course, this is an SSD, so there was no way I could tell if there
> was any IO going on by sound/feel/lights.

OK. Normally when it hangs you can kill it or ctrl-c out because it
gets stuck on a futex. You can then run xfs_repair -P to turn off
threading (and speed :() to avoid such hangs. But given that you
couldn't kill it, it doesn't sound like that sort of problem....
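
For telling the two cases apart next time, here is a minimal sketch (not part of xfsprogs, and not something used in this thread) that reads the state field of /proc/<pid>/stat. A process stuck in uninterruptible sleep shows state "D", while one blocked on a futex would normally show "S" and still respond to signals. The helper names parse_proc_state and is_uninterruptible are made up for illustration.

```python
from pathlib import Path


def parse_proc_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The second field (comm) is wrapped in parentheses and may itself
    # contain spaces or ')', so take everything after the LAST ')'.
    fields_after_comm = stat_line[stat_line.rfind(")") + 1:].split()
    return fields_after_comm[0]


def is_uninterruptible(pid: int) -> bool:
    """True if the process is in state D (uninterruptible sleep)."""
    # Hypothetical helper: reads the live /proc entry, Linux only.
    stat_line = Path(f"/proc/{pid}/stat").read_text()
    return parse_proc_state(stat_line) == "D"
```

Something like is_uninterruptible(pid_of_xfs_repair) from a second terminal would then indicate whether the hang is the unkillable kind, assuming the box is still responsive enough to run it.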
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com