RAID5, 2 out of 4 disks failed, Events value differs too much

linux-raid.vger.kernel.org archive mirror

* RAID5, 2 out of 4 disks failed, Events value differs too much
From: Bodo Thiesen @ 2006-12-05  8:48 UTC
  To: linux-raid

Hi, I have a little problem:

Some hours ago the second of four disks was kicked out of my RAID5, rendering it unusable. As far as I know, the disks are still working correctly (I suspect a cable connection problem), but that's not the issue. The real problem is that the first failed disk has an event count of 9102893, the second failed disk has 9324862, and the other two disks have 9324869. What is the best way to recover the array in this case? Simply recreating the array from the 9324862 disk and the two 9324869 disks, and hot-adding the 9102893 disk later, seems unclean, and as I understand it, this would cause some silent data corruption. Is there a way to prevent this corruption entirely, or at least to tell where the errors are, so I can check the data manually and take appropriate steps? Keep in mind that I still have the data from the first failed disk, parts of which may still be relatively up to date.

Has anyone already had this problem and found a good solution?
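A short worked example of why losing two of the four members stops the array. RAID5 gives up one member's worth of space to parity, so the Array Size in the dump is (4 - 1) x Device Size, and parity can rebuild at most one missing member. This is a minimal Python sketch with hypothetical helper names. It assumes the sizes are the KiB figures mdadm prints (244198464 KiB = 232.89 GiB).

```python
def raid5_array_size_kib(device_size_kib: int, raid_devices: int) -> int:
    """Usable RAID5 capacity: one member's worth of space holds parity."""
    if raid_devices < 3:
        raise ValueError("RAID5 needs at least 3 devices")
    return (raid_devices - 1) * device_size_kib


def raid5_is_usable(raid_devices: int, failed: int) -> bool:
    """True while parity can still reconstruct every stripe (at most one member missing)."""
    working = raid_devices - failed
    return working >= raid_devices - 1


if __name__ == "__main__":
    # Device Size and Raid Devices as reported by mdadm -E in this thread.
    print(raid5_array_size_kib(244198464, 4))  # 732595392, the reported Array Size
    print(raid5_is_usable(4, 1))               # True: degraded but running
    print(raid5_is_usable(4, 2))               # False: the situation described here
```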

Thanks in advance, Bodo

Output of mdadm -E for all affected disks:
--> please bite here <---> please bite here <---> please bite here <--
/dev/mapper/sda:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 876ed759:74a31656:94332345:a976171b
  Creation Time : Thu Feb  2 05:30:53 2006
     Raid Level : raid5
    Device Size : 244198464 (232.89 GiB 250.06 GB)
     Array Size : 732595392 (698.66 GiB 750.18 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Wed Nov 29 03:12:12 2006
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7bd96cb9 - correct
         Events : 0.9102893

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     1     253        0        1      active sync   /dev/mapper/sda

   0     0     253        2        0      active sync   /dev/mapper/sdc
   1     1     253        0        1      active sync   /dev/mapper/sda
   2     2     253        1        2      active sync   /dev/mapper/sdb
   3     3     253        3        3      active sync   /dev/mapper/sdd
/dev/mapper/sdb:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 876ed759:74a31656:94332345:a976171b
  Creation Time : Thu Feb  2 05:30:53 2006
     Raid Level : raid5
    Device Size : 244198464 (232.89 GiB 250.06 GB)
     Array Size : 732595392 (698.66 GiB 750.18 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Tue Dec  5 04:31:06 2006
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 7be82e68 - correct
         Events : 0.9324862

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     2     253        1        2      active sync   /dev/mapper/sdb

   0     0     253        2        0      active sync   /dev/mapper/sdc
   1     1       0        0        1      faulty removed
   2     2     253        1        2      active sync   /dev/mapper/sdb
   3     3     253        3        3      active sync   /dev/mapper/sdd
/dev/mapper/sdc:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 876ed759:74a31656:94332345:a976171b
  Creation Time : Thu Feb  2 05:30:53 2006
     Raid Level : raid5
    Device Size : 244198464 (232.89 GiB 250.06 GB)
     Array Size : 732595392 (698.66 GiB 750.18 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Tue Dec  5 04:45:20 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 4
  Spare Devices : 0
       Checksum : 7be831d7 - correct
         Events : 0.9324869

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     0     253        2        0      active sync   /dev/mapper/sdc

   0     0     253        2        0      active sync   /dev/mapper/sdc
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3     253        3        3      active sync   /dev/mapper/sdd
/dev/mapper/sdd:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 876ed759:74a31656:94332345:a976171b
  Creation Time : Thu Feb  2 05:30:53 2006
     Raid Level : raid5
    Device Size : 244198464 (232.89 GiB 250.06 GB)
     Array Size : 732595392 (698.66 GiB 750.18 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Tue Dec  5 04:45:20 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 4
  Spare Devices : 0
       Checksum : 7be831de - correct
         Events : 0.9324869

         Layout : left-symmetric
     Chunk Size : 32K

      Number   Major   Minor   RaidDevice State
this     3     253        3        3      active sync   /dev/mapper/sdd

   0     0     253        2        0      active sync   /dev/mapper/sdc
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3     253        3        3      active sync   /dev/mapper/sdd
--> please bite here <---> please bite here <---> please bite here <--
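The Events lines in the dump can be compared directly to see how stale each member is. This is a minimal Python sketch that assumes 0.90-superblock output, where mdadm appears to print the counter as `hi.lo` (here always `0.`). A plain integer is also accepted. The function names and the trimmed SAMPLE excerpt are illustrative and not part of mdadm.

```python
import re
from typing import Dict

# A device header such as "/dev/mapper/sda:" in `mdadm -E` output.
DEVICE_RE = re.compile(r"^(/dev/\S+):\s*$")
# An event line: "Events : 0.9102893" (0.90 "hi.lo" form) or "Events : 42".
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)(?:\.(\d+))?\s*$")


def parse_events(examine_text: str) -> Dict[str, int]:
    """Map each examined device to its event counter."""
    events: Dict[str, int] = {}
    current = None
    for line in examine_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        m = EVENTS_RE.match(line)
        if m and current is not None:
            if m.group(2) is None:
                events[current] = int(m.group(1))
            else:
                # Assumption: "hi.lo" are the upper and lower 32-bit halves.
                events[current] = (int(m.group(1)) << 32) | int(m.group(2))
    return events


def staleness(events: Dict[str, int]) -> Dict[str, int]:
    """How many events each device lags behind the freshest one."""
    newest = max(events.values())
    return {dev: newest - count for dev, count in events.items()}


# Trimmed excerpt of the mdadm -E output above (device headers and Events lines only).
SAMPLE = """\
/dev/mapper/sda:
         Events : 0.9102893
/dev/mapper/sdb:
         Events : 0.9324862
/dev/mapper/sdc:
         Events : 0.9324869
/dev/mapper/sdd:
         Events : 0.9324869
"""

if __name__ == "__main__":
    lag = staleness(parse_events(SAMPLE))
    for dev in sorted(lag, key=lag.get):
        print(f"{dev}: {lag[dev]} events behind")
```

On this excerpt, sdc and sdd are current, sdb is 7 events behind and sda is 221976 events behind. That gap is why sdb, and not sda, is the natural third member.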

* Re: RAID5, 2 out of 4 disks failed, Events value differs too much
From: Corey Hickey @ 2006-12-08  0:34 UTC
  Cc: linux-raid

Bodo Thiesen wrote:
> Hi, I have a little problem:
> 
> Some hours ago the second of four disks was kicked out of my RAID5, rendering it unusable. As far as I know, the disks are still working correctly (I suspect a cable connection problem), but that's not the issue. The real problem is that the first failed disk has an event count of 9102893, the second failed disk has 9324862, and the other two disks have 9324869. What is the best way to recover the array in this case? Simply recreating the array from the 9324862 disk and the two 9324869 disks, and hot-adding the 9102893 disk later, seems unclean, and as I understand it, this would cause some silent data corruption. Is there a way to prevent this corruption entirely, or at least to tell where the errors are, so I can check the data manually and take appropriate steps? Keep in mind that I still have the data from the first failed disk, parts of which may still be relatively up to date.
> 
> Has anyone already had this problem and found a good solution?

If nobody gives you any better advice, I would follow this approach. 
These commands are examples and may need to be fixed; I haven't had this 
exact problem before (only similar ones) and I can't test anything right 
now.

First, force the reassembly of the array using the three freshest disks (the ones with the highest Events counts).
# mdadm --assemble --force --run /dev/md0 /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd
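The phrase "the three freshest disks" generalises: a RAID5 with n members needs n - 1 of them, so keep the n - 1 highest event counts. This hedged Python sketch uses hypothetical helpers. It only builds the command line and does not run it. It assumes the /dev/mapper names shown by mdadm -E are the actual member devices.

```python
import shlex
from typing import Dict, List


def pick_members(events: Dict[str, int], raid_devices: int) -> List[str]:
    """Return the raid_devices - 1 members with the highest event counts.

    A RAID5 with n members can run degraded on n - 1 of them. Ties at the
    cut-off are broken by input order, so check the result by hand.
    """
    needed = raid_devices - 1
    if len(events) < needed:
        raise ValueError("not enough members to start the array degraded")
    freshest_first = sorted(events, key=events.get, reverse=True)
    return sorted(freshest_first[:needed])


def assemble_command(md_device: str, members: List[str]) -> str:
    """Build, but do not run, the forced-assembly command line."""
    return shlex.join(["mdadm", "--assemble", "--force", "--run", md_device, *members])


if __name__ == "__main__":
    # Event counts copied from the mdadm -E output in this thread.
    events = {
        "/dev/mapper/sda": 9102893,
        "/dev/mapper/sdb": 9324862,
        "/dev/mapper/sdc": 9324869,
        "/dev/mapper/sdd": 9324869,
    }
    print(assemble_command("/dev/md0", pick_members(events, 4)))
```

With the counts from this thread, it leaves out /dev/mapper/sda and prints the same three-member command as above.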

Next, use whatever fsck program corresponds to your filesystem and do a 
read-only check. Something like:
# reiserfsck --check /dev/md0

If fsck finds only a few problems, then it's probably safe to go ahead 
and tell fsck to fix them; data loss should be minimal or nonexistent.
# reiserfsck --fix-fixable /dev/md0

Now you ought to be able to mount the filesystem and look around.
# mount /dev/md0

If all looks good, then hot-add the stale disk and let it resync.
# mdadm /dev/md0 -a /dev/mapper/sda
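You can follow the resync in /proc/mdstat while the stale disk is being rebuilt. This is a minimal Python sketch that assumes the usual md progress line of the form `recovery = X% (done/total)`. The helper name is hypothetical, and the sample line uses made-up numbers for illustration only.

```python
import re
from typing import Optional, Tuple

# Assumption: a rebuilding md array shows a line in /proc/mdstat such as
#   [>....................]  recovery =  1.2% (2930381/244198464) ...
PROGRESS_RE = re.compile(r"\b(recovery|resync)\s*=\s*([\d.]+)%\s*\((\d+)/(\d+)\)")


def rebuild_progress(mdstat_text: str) -> Optional[Tuple[str, float, int, int]]:
    """Return (kind, percent, done, total) for the first rebuild found, else None."""
    m = PROGRESS_RE.search(mdstat_text)
    if m is None:
        return None
    kind, percent, done, total = m.groups()
    return kind, float(percent), int(done), int(total)


# Hypothetical progress line with made-up numbers, for illustration only.
SAMPLE_LINE = "      [>....................]  recovery =  1.2% (2930381/244198464) finish=80.0min speed=50000K/sec"

if __name__ == "__main__":
    print(rebuild_progress(SAMPLE_LINE))
    # On a live system, pass the text of /proc/mdstat instead of SAMPLE_LINE.
```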

Good luck,
Corey
