From: Daniel Browning <db@kavod.com>
To: linux-raid@vger.kernel.org
Subject: What to do about "ignoring %s as it reports %s as failed"?
Date: Thu, 10 Jan 2013 11:01:01 -0800
Message-ID: <201301101101.01371.db@kavod.com>

Hello, folks. What should I do about the following error?

	mdadm: ignoring /dev/sdd1 as it reports /dev/sdb1 as failed

I'm building a new replacement array and restoring from backup, but I would
still like to salvage the failed one if possible. I was surprised to find
very few results on Google for that particular error message.

Here is the background. I recently had a 4-disk raid5 array made up of:

	/dev/sdb1
	/dev/sdc1
	/dev/sdd1
	/dev/sde1

Wednesday afternoon (yesterday), /dev/sde1 failed, so the array went into a
degraded (no redundancy) state. I thought I'd give sde another chance, so I
zeroed its superblock and re-added it to the array, which began rebuilding.
But when the rebuild had reached 72.4% early this morning, /dev/sdb1 failed:

md127 : active raid5 sde1[5] sdc1[0] sdb1[1](F) sdd1[4]
      5859302400 blocks super 1.2 level 5, 512k chunk,
      algorithm 2 [4/2] [U__U]
      [==============>......]  recovery = 72.4% (1414790348/1953100800)
      finish=1192.3min speed=7524K/sec
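
As a cross-check on that status, here is a minimal Python sketch that pulls
the member counts and recovery progress out of the mdstat text above. The
function name summarize() is made up for illustration, and the ETA
arithmetic assumes the block counts and the speed share the same 1 KiB
unit, which seems to agree with the finish= figure mdstat printed.

```python
import re

# Status text copied from the mdstat output above.
MDSTAT = """\
md127 : active raid5 sde1[5] sdc1[0] sdb1[1](F) sdd1[4]
      5859302400 blocks super 1.2 level 5, 512k chunk,
      algorithm 2 [4/2] [U__U]
      [==============>......]  recovery = 72.4% (1414790348/1953100800)
      finish=1192.3min speed=7524K/sec
"""


def summarize(mdstat: str) -> dict:
    """Extract member counts and recovery progress from one mdstat entry."""
    # "[4/2]" is raid devices / devices currently working.
    total, working = map(int, re.search(r"\[(\d+)/(\d+)\]", mdstat).groups())
    # "[U__U]" has one character per slot: 'U' is up, '_' is not.
    slots = re.search(r"\[([U_]+)\]", mdstat).group(1)
    # "(done/size)" is the recovery position on each member.
    done, size = map(int, re.search(r"\((\d+)/(\d+)\)", mdstat).groups())
    speed = int(re.search(r"speed=(\d+)K/sec", mdstat).group(1))
    return {
        "raid_devices": total,
        "working": working,
        "missing_slots": [i for i, c in enumerate(slots) if c == "_"],
        "percent": 100.0 * done / size,
        # Assumption: done, size and speed are all in 1 KiB units.
        "eta_min": (size - done) / speed / 60.0,
    }


if __name__ == "__main__":
    print(summarize(MDSTAT))
```

Run against the text above, this reports slots 1 and 2 as not up, which
matches sdb1 (slot 1, failed) and the slot that sde1 was rebuilding into.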

/dev/sdb1 is responding again now (as is /dev/sde1), so I tried to
re-assemble the array:

[root@lx4 ~]# mdadm --assemble --verbose /dev/md127 /dev/sd[bcde]1
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot -1.
mdadm: added /dev/sdb1 to /dev/md127 as 1 (possibly out of date)
mdadm: no uptodate device for slot 2 of /dev/md127
mdadm: added /dev/sdd1 to /dev/md127 as 3
mdadm: added /dev/sde1 to /dev/md127 as -1
mdadm: added /dev/sdc1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 2 drives and 1 spare - not enough to start the array.

That left /dev/sdb1 flagged as possibly out of date, so I added --force to
have mdadm update its event count:

[root@lx4 ~]# mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcde]1
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot -1.
mdadm: forcing event count in /dev/sdb1(1) from 905199 upto 905262
mdadm: clearing FAULTY flag for device 0 in /dev/md127 for /dev/sdb1
mdadm: Marking array /dev/md127 as 'clean'
mdadm: added /dev/sdb1 to /dev/md127 as 1
mdadm: no uptodate device for slot 2 of /dev/md127
mdadm: added /dev/sdd1 to /dev/md127 as 3
mdadm: added /dev/sde1 to /dev/md127 as -1
mdadm: added /dev/sdc1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 3 drives and 1 spare - not enough to start the array.

This surprised me, because I expected three of the four drives to be enough
to start a raid5 array. When I ran the same command again, I got a
different error:

[root@lx4 ~]# mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcde]1
mdadm: looking for devices for /dev/md127
mdadm: /dev/sdb1 is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sde1 is identified as a member of /dev/md127, slot -1.
mdadm: ignoring /dev/sdd1 as it reports /dev/sdb1 as failed
mdadm: added /dev/sdb1 to /dev/md127 as 1
mdadm: no uptodate device for slot 2 of /dev/md127
mdadm: no uptodate device for slot 3 of /dev/md127
mdadm: added /dev/sde1 to /dev/md127 as -1
mdadm: added /dev/sdc1 to /dev/md127 as 0
mdadm: /dev/md127 assembled from 2 drives and 1 spare - not enough to start the array.
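
To make the slot bookkeeping in these runs easier to compare, here is a
minimal Python sketch that groups the devices in verbose --assemble output
by outcome. The function name tally() and the category names are made up
for illustration. It only matches the message shapes that appear in this
mail, and it treats slot -1 as "spare", which is an assumption based on
mdadm's own summary line.

```python
import re

# Relevant lines copied from the third --assemble run above.
THIRD_RUN = """\
mdadm: ignoring /dev/sdd1 as it reports /dev/sdb1 as failed
mdadm: added /dev/sdb1 to /dev/md127 as 1
mdadm: no uptodate device for slot 2 of /dev/md127
mdadm: no uptodate device for slot 3 of /dev/md127
mdadm: added /dev/sde1 to /dev/md127 as -1
mdadm: added /dev/sdc1 to /dev/md127 as 0
"""

ADDED = re.compile(r"mdadm: added (\S+) to \S+ as (-?\d+)( \(possibly out of date\))?")
IGNORED = re.compile(r"mdadm: ignoring (\S+) as it reports")
EMPTY = re.compile(r"mdadm: no uptodate device for slot (\d+)")


def tally(log: str) -> dict:
    """Group devices from verbose assemble output by what happened to them."""
    result = {"active": {}, "spare": [], "stale": [], "ignored": [], "empty_slots": []}
    for line in log.splitlines():
        if m := ADDED.match(line):
            dev, slot, stale = m.group(1), int(m.group(2)), m.group(3)
            if stale:
                result["stale"].append(dev)      # added, but flagged out of date
            elif slot < 0:
                result["spare"].append(dev)      # assumption: slot -1 means spare
            else:
                result["active"][slot] = dev
        elif m := IGNORED.match(line):
            result["ignored"].append(m.group(1))
        elif m := EMPTY.match(line):
            result["empty_slots"].append(int(m.group(1)))
    return result


if __name__ == "__main__":
    print(tally(THIRD_RUN))
```

For the third run this gives two active slots (0 and 1), one spare, and
/dev/sdd1 ignored, which is consistent with the "2 drives and 1 spare"
summary that mdadm printed.
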

It appears to be failing because of this:

	mdadm: ignoring /dev/sdd1 as it reports /dev/sdb1 as failed

The mdadm source says this:

/* If this device thinks that 'most_recent' has failed, then
 * we must reject this device.
 */

But I can't translate that into a possible fix. Any ideas?

Thanks in advance,
--
Daniel Browning



Appendix A. Versions
Distro: Fedora 16
Kernel: 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012
mdadm: v3.2.5 - 18th May 2012



Appendix B. contents of mdstat after a failed "--assemble":
md127 : inactive sdc1[0](S) sdb1[1](S)
      3906202639 blocks super 1.2



Appendix C. mdadm --examine for all disks, from *before* the 
"--assemble --force" was executed:
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 4ca86345:c28c62be:03c9f77b:6760ef5c
           Name : lx4:127
  Creation Time : Sun Oct 10 15:46:28 2010
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906202639 (1862.62 GiB 1999.98 GB)
     Array Size : 5859302400 (5587.87 GiB 5999.93 GB)
  Used Dev Size : 3906201600 (1862.62 GiB 1999.98 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 156bc6e0:eaa285fd:8f4ef720:6f2171c2

    Update Time : Thu Jan 10 00:50:25 2013
       Checksum : f0945b4a - correct
         Events : 905199

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 4ca86345:c28c62be:03c9f77b:6760ef5c
           Name : lx4:127
  Creation Time : Sun Oct 10 15:46:28 2010
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906202639 (1862.62 GiB 1999.98 GB)
     Array Size : 5859302400 (5587.87 GiB 5999.93 GB)
  Used Dev Size : 3906201600 (1862.62 GiB 1999.98 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 2dbbc5d0:f3deb841:50c7c992:c9abf856

    Update Time : Thu Jan 10 09:14:03 2013
       Checksum : 2b1b4f88 - correct
         Events : 905262

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A..A ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 4ca86345:c28c62be:03c9f77b:6760ef5c
           Name : lx4:127
  Creation Time : Sun Oct 10 15:46:28 2010
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906202639 (1862.62 GiB 1999.98 GB)
     Array Size : 5859302400 (5587.87 GiB 5999.93 GB)
  Used Dev Size : 3906201600 (1862.62 GiB 1999.98 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : bdd8c401:9389bf9b:c80762a2:682b0297

    Update Time : Thu Jan 10 09:14:03 2013
       Checksum : 5c2d7d3 - correct
         Events : 905262

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : A..A ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 4ca86345:c28c62be:03c9f77b:6760ef5c
           Name : lx4:127
  Creation Time : Sun Oct 10 15:46:28 2010
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3906202639 (1862.62 GiB 1999.98 GB)
     Array Size : 5859302400 (5587.87 GiB 5999.93 GB)
  Used Dev Size : 3906201600 (1862.62 GiB 1999.98 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 78c381f7:1447cbd4:6af86729:d4c08320

    Update Time : Thu Jan 10 09:14:03 2013
       Checksum : 4513061e - correct
         Events : 905262

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : A..A ('A' == active, '.' == missing)
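
The superblocks above disagree in two fields: /dev/sdb1 was last updated at
00:50:25 with 905199 events and still records the array as AAAA, while the
other three were updated at 09:14:03 with 905262 events and record A..A. A
minimal Python sketch for pulling out that comparison from --examine text
follows. The function names parse_examine() and compare() are made up for
illustration, and the sample is abbreviated to the two fields of interest.

```python
import re

# Abbreviated from the --examine output above (Events and Array State only).
EXAMINE = """\
/dev/sdb1:
         Events : 905199
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdc1:
         Events : 905262
   Array State : A..A ('A' == active, '.' == missing)
/dev/sdd1:
         Events : 905262
   Array State : A..A ('A' == active, '.' == missing)
/dev/sde1:
         Events : 905262
   Array State : A..A ('A' == active, '.' == missing)
"""


def parse_examine(text: str) -> dict:
    """Split --examine text into {device: {field: value}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        if m := re.match(r"(/dev/\S+):\s*$", line):
            current = devices.setdefault(m.group(1), {})
        elif current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return devices


def compare(devices: dict) -> dict:
    """Report how far each member's event count lags the newest one."""
    newest = max(int(d["Events"]) for d in devices.values())
    return {
        name: {
            "behind": newest - int(d["Events"]),
            # Keep only the slot map, e.g. "A..A", and drop the legend.
            "state": d["Array State"].split()[0],
        }
        for name, d in devices.items()
    }


if __name__ == "__main__":
    for name, info in compare(parse_examine(EXAMINE)).items():
        print(name, info)
```

On this data /dev/sdb1 comes out 63 events behind, which is the same gap
that --force closed when it raised the count from 905199 to 905262.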
