From: Frank Baumgart <frank.baumgart@gmx.net>
To: linux-raid <linux-raid@vger.kernel.org>
Subject: RAID5 in strange state
Date: Wed, 08 Apr 2009 23:29:20 +0200
Message-ID: <49DD1730.4070108@gmx.net>

Dear List,

I have been using MD RAID 5 for some years and so far have had to recover
from single disk failures a few times, always successfully.
Now, though, I am puzzled.

Setup:
A PC with 3x WD 1 TB SATA disk drives set up as RAID 5, currently running
kernel 2.6.27.21; the array has run fine for at least 6 months.

I check the state of the RAID every few days by looking at /proc/mdstat
manually.
Apparently one drive was kicked out of the array 4 days ago without me
noticing it.
The root cause seems to be bad cabling, but that is not confirmed yet.
Anyway, the disk in question ("sde") reports 23 UDMA_CRC errors,
compared to 0 about 2 weeks ago.
Reading the complete device just now via dd still shows those 23 errors
but no new ones.
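
For reference, the check went roughly like this (exact invocations from
memory; SMART attribute 199 is the UDMA_CRC error count):

# SMART attributes; 199 / UDMA_CRC_Error_Count is at 23
smartctl -A /dev/sde
# full sequential read of the device to see whether new errors appear
dd if=/dev/sde of=/dev/null bs=1M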

Well, RAID 5 should survive a single disk failure (again), but after a
reboot (for non-RAID-related reasons) the RAID came up as "md0 stopped".

cat /proc/mdstat

Personalities :
md0 : inactive sdc1[1](S) sdd1[2](S) sde1[0](S)
      2930279424 blocks

unused devices: <none>



What's that?
First, documentation on the web is rather outdated and/or incomplete.
Second, my guess that "(S)" marks a spare is backed up by the kernel
source.
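
The bit of drivers/md/md.c (md_seq_show) I base that on looks roughly
like the following; paraphrased from the 2.6.27 source from memory, not
copied verbatim:

    if (test_bit(Faulty, &rdev->flags)) {
            seq_printf(seq, "(F)");     /* failed member */
            continue;
    } else if (rdev->raid_disk < 0)
            seq_printf(seq, "(S)");     /* no slot in the array: listed as spare */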


mdadm --examine [devices] gives consistent reports about the RAID 5
structure:

          Magic : a92b4efc
        Version : 0.90.00
           UUID : ec4fdb7b:e57733c0:4dc42c07:36d99219
  Creation Time : Wed Dec 24 11:40:29 2008
     Raid Level : raid5
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
     Array Size : 1953519616 (1863.02 GiB 2000.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
...
         Layout : left-symmetric
     Chunk Size : 256K



The state, though, differs between the devices:

sdc1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a15 - correct
         Events : 177920
...
      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sdd1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a27 - correct
         Events : 177920

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2       8       49        2      active sync   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sde1:
    Update Time : Fri Apr  3 15:00:31 2009
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccf463ec - correct
         Events : 7

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8       65        0      active sync   /dev/sde1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sde is the device that failed and was kicked out of the array.
The update time reflects that, if I interpret it correctly.
But how can the sde1 superblock claim 3 active and working devices? IMO
that's way off.
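
The summaries above show only the fields that differ (Update Time, State,
Events). If it helps, something like this untested one-liner pulls them
out of the full --examine output for comparison:

mdadm --examine /dev/sd[cde]1 | egrep '^/dev|Update Time|State :|Events'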


Now, my assumption:
I think I should be able to remove sde temporarily and simply restart
the degraded array from sdc1/sdd1.
Correct?

My backup is a few days old, and I would really like to keep the work
done on the RAID in the meantime.

If the answer is just 2 or 3 mdadm command lines, I am yours :-)
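
My own guess at those lines, untested and based only on my reading of the
mdadm man page, so please correct me:

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sdc1 /dev/sdd1
# if mdadm refuses to start the array with only 2 of 3 members present:
#   mdadm --assemble --run /dev/md0 /dev/sdc1 /dev/sdd1
# and later, once the cabling is sorted out, re-add the third disk:
#   mdadm /dev/md0 --add /dev/sde1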

Best regards

Frank Baumgart

