* RAID5 in strange state
@ 2009-04-08 21:29 Frank Baumgart
  2009-04-08 21:59 ` Goswin von Brederlow
  2009-04-09  5:51 ` Neil Brown
From: Frank Baumgart @ 2009-04-08 21:29 UTC (permalink / raw)
  To: linux-raid

Dear List,

I have been using MD RAID 5 for some years and have had to recover from
single disk failures a few times, always successfully.
Now, though, I am puzzled.

Setup:
A PC with 3x WD 1 TB SATA disk drives set up as RAID 5, currently running
kernel 2.6.27.21; the array has run fine for at least 6 months.

I check the state of the RAID every few days by looking at
/proc/mdstat manually.
Apparently one drive was kicked out of the array 4 days ago without
me noticing.
The root cause seems to be bad cabling, but that is not confirmed yet.
Anyway, the disk in question ("sde") reports 23 UDMA_CRC errors,
compared to 0 about 2 weeks ago.
Reading the complete device just now via dd still reports those 23
errors but no new ones.
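
For reference, the kind of check I mean (smartctl is from
smartmontools; the exact attribute name may differ on other drives):

smartctl -A /dev/sde | grep -i CRC   # attribute 199, UDMA_CRC_Error_Count
dd if=/dev/sde of=/dev/null bs=1M    # full read; new errors would bump it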

Well, RAID 5 should survive a single disk failure (again), but after a
reboot (for non-RAID-related reasons) the array came up as "md0 stopped".

cat /proc/mdstat

Personalities :
md0 : inactive sdc1[1](S) sdd1[2](S) sde1[0](S)
      2930279424 blocks

unused devices: <none>



What does that mean?
First, documentation on the web is rather outdated and/or incomplete.
Second, my guess that "(S)" represents a spare is backed up by the
kernel source.
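
From memory, the relevant part of md_seq_show() in drivers/md/md.c
looks roughly like this (a paraphrase, not a verbatim quote):

    if (test_bit(Faulty, &rdev->flags)) {
        seq_printf(seq, "(F)");
        continue;
    } else if (rdev->raid_disk < 0)
        seq_printf(seq, "(S)"); /* spare */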


mdadm --examine [devices] reports a consistent RAID 5 structure on all
three members:

          Magic : a92b4efc
        Version : 0.90.00
           UUID : ec4fdb7b:e57733c0:4dc42c07:36d99219
  Creation Time : Wed Dec 24 11:40:29 2008
     Raid Level : raid5
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
     Array Size : 1953519616 (1863.02 GiB 2000.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
...
         Layout : left-symmetric
     Chunk Size : 256K



The state sections, however, differ:

sdc1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a15 - correct
         Events : 177920
...
      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sdd1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a27 - correct
         Events : 177920

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2       8       49        2      active sync   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sde1:
    Update Time : Fri Apr  3 15:00:31 2009
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccf463ec - correct
         Events : 7

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8       65        0      active sync   /dev/sde1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sde is the device that failed and was kicked out of the array.
The update time reflects that, if I interpret it right.
But how can the sde1 superblock claim 3 active and working devices?
IMO that's way off.
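
A quick way to line up the differing fields side by side (plain shell,
nothing beyond mdadm --examine):

for d in /dev/sdc1 /dev/sdd1 /dev/sde1; do
    echo "== $d =="
    mdadm --examine $d | egrep 'Update Time|State :|Events'
done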


Now, my assumption:
I think I should be able to remove sde temporarily and just restart the
degraded array from sdc1/sdd1. Correct?
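
Something along these lines is what I have in mind (untested; /dev/md0
and the member names are taken from the listings above):

mdadm --stop /dev/md0
mdadm --assemble --run /dev/md0 /dev/sdc1 /dev/sdd1

and, only if the stale superblock on sde1 gets in the way:

mdadm --assemble --force --run /dev/md0 /dev/sdc1 /dev/sdd1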

My backup is a few days old and I would really like to keep the work
done on the RAID in the meantime.

If the answer is just 2 or 3 mdadm command lines, I am yours :-)

Best regards

Frank Baumgart

