From: Dave Chinner <david@fromorbit.com>
To: Ole Tange <tange@binf.ku.dk>
Cc: xfs@oss.sgi.com
Subject: Re: xfs_repair segfaults
Date: Tue, 5 Mar 2013 10:23:19 +1100
Message-ID: <20130304232319.GR23616@dastard>
In-Reply-To: <CANU9nTmmw3FcHRBNvu_S6Uj8M-B2JFf5poQfHbZuCbJ6_=_RgA@mail.gmail.com>

On Mon, Mar 04, 2013 at 10:03:29AM +0100, Ole Tange wrote:
> On Fri, Mar 1, 2013 at 9:53 PM, Dave Chinner <david@fromorbit.com> wrote:
> :
> > What filesystem errors occurred
> > when the drives went offline?
> 
> See http://dna.ku.dk/~tange/tmp/syslog.3

Your log is full of this:

mpt2sas1: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)

What's that mean?
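
For what it's worth, the fields the driver prints line up with the hex value itself, so the word can at least be split mechanically. A rough sketch, assuming the layout implied by that log line (top nibble, then originator, code and sub_code); treating the top nibble as a bus type is my assumption, and `decode_log_info` is just an illustrative name. It does not tell you what sub_code 0x0303 actually means; that needs the vendor's documentation.

```python
def decode_log_info(value):
    """Split a 32-bit mpt2sas log_info word into its bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,    # top nibble (assumed to be a bus type)
        "originator": (value >> 24) & 0xF,  # the log line prints this one as "PL"
        "code": (value >> 16) & 0xFF,       # printed as code(0x12)
        "sub_code": value & 0xFFFF,         # printed as sub_code(0x0303)
    }


fields = decode_log_info(0x31120303)
print({name: hex(v) for name, v in fields.items()})
```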

> 
> Feb 26 00:46:52 franklin kernel: [556238.429259] XFS (md5p1): metadata
> I/O error: block 0x459b8 ("xfs_buf_iodone_callbacks") error 5 buf
> count 4096

So, the first IO errors appear at 23:00 on /dev/sdb, and the
controller does a full reset and reprobe. Looks like a port failure
of some kind. Notable:

mpt2sas1: LSISAS2008: FWVersion(07.00.00.00), ChipRevision(0x03), BiosVersion(07.11.10.00)

From a quick google, that firmware looks out of date (current
LSISAS2008 firmwares are numbered 10 or 11, and BIOS versions are at
7.21).

So, /dev/md1 reported a failure (/dev/sdb) around 23:01:16. Looks
like it swapped in /dev/sdd and started a rebuild.

/dev/md4 had a failure (/dev/sds) around 00:19, no rebuild started.
Down to 8 disks in /dev/md4, no rebuild in progress, no redundancy
available.

/dev/md1 had another failure (/dev/sdj) around 00:46, this time on a
SYNCHRONISE CACHE command (i.e. log write). This IO failure caused
the shutdown to occur. And this is the result:

[556219.292225] end_request: I/O error, dev sdj, sector 10
[556219.292275] md: super_written gets error=-5, uptodate=0
[556219.292283] md/raid:md1: Disk failure on sdj, disabling device.
[556219.292286] md/raid:md1: Operation continuing on 7 devices.

At this point, /dev/md1 is reporting 7 working disks and has had an
EIO on its superblock write, which means it's probably in an
inconsistent state. Further, it's only got 8 disks associated with
it, and as a rebuild was still in progress when this failure hit, it
means that data loss has occurred. There's your problem.
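
If you want to pull the md failure events out of the full syslog to line up a timeline like the one above, something along these lines would do as a rough sketch. The regexes are modelled only on the four kernel lines quoted above, so other kernel versions may word these messages differently, and `md_events` is a name made up for illustration.

```python
import re

# Sample input: the four kernel lines quoted above.
LOG = """\
[556219.292225] end_request: I/O error, dev sdj, sector 10
[556219.292275] md: super_written gets error=-5, uptodate=0
[556219.292283] md/raid:md1: Disk failure on sdj, disabling device.
[556219.292286] md/raid:md1: Operation continuing on 7 devices.
"""

FAIL = re.compile(r"\[\s*(\d+\.\d+)\] md/raid:(\w+): Disk failure on (\w+)")
LEFT = re.compile(
    r"\[\s*(\d+\.\d+)\] md/raid:(\w+): Operation continuing on (\d+) devices"
)


def md_events(text):
    """Return (timestamp, array, event, detail) tuples for md failure lines."""
    events = []
    for line in text.splitlines():
        m = FAIL.search(line)
        if m:
            events.append((float(m.group(1)), m.group(2), "failed", m.group(3)))
            continue
        m = LEFT.search(line)
        if m:
            events.append(
                (float(m.group(1)), m.group(2), "remaining", int(m.group(3)))
            )
    return events


for event in md_events(LOG):
    print(event)
```

Run over the whole log, this gives one line per disk dropped and the device count md reports afterwards, per array.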

Essentially, you need to fix your hardware before you do anything
else. Get it all back fully online and fix whatever the problems are
that are causing IO errors, then you can worry about recovering the
filesystem and your data. Until the hardware is stable and not
throwing errors, recovery is going to be unreliable (if not
impossible).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
