From: Frank Baumgart <frank.baumgart@gmx.net>
To: linux-raid <linux-raid@vger.kernel.org>
Subject: RAID5 in strange state
Date: Wed, 08 Apr 2009 23:29:20 +0200
Message-ID: <49DD1730.4070108@gmx.net>

Dear List,

I have been using MD RAID 5 for some years and have so far had to
recover from single-disk failures a few times, which always succeeded.
Now though, I am puzzled.

Setup:
A PC with 3x WD 1 TB SATA disk drives set up as RAID 5, currently
running kernel 2.6.27.21; the array has run fine for at least 6 months.

I check the state of the RAID every few days by looking at
/proc/mdstat manually.
Apparently one drive was kicked out of the array 4 days ago without
my noticing.
The root cause seems to be bad cabling, but that is not confirmed yet.
Anyway, the disk in question ("sde") reports 23 UDMA_CRC errors,
compared to 0 about 2 weeks ago.
Reading the complete device just now with dd still shows those 23
errors but no new ones.
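
The manual /proc/mdstat check described above could be automated so that
a dropped member is noticed sooner. What follows is a minimal Python
sketch, not a tested monitoring tool. The function name
find_unhealthy_arrays is hypothetical, SAMPLE_INACTIVE is copied from the
output quoted further down in this message, and SAMPLE_DEGRADED is an
illustrative excerpt (an assumption, not captured from this system) in
the usual "[expected/working]" status-line format.

```python
import re

# Copied from the /proc/mdstat output quoted in this message.
SAMPLE_INACTIVE = """\
Personalities :
md0 : inactive sdc1[1](S) sdd1[2](S) sde1[0](S)
      2930279424 blocks

unused devices: <none>
"""

# Illustrative only (assumed, not captured from this system): a running
# RAID 5 with one of its three members missing.
SAMPLE_DEGRADED = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[2] sdc1[1]
      1953519616 blocks level 5, 256k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""


def find_unhealthy_arrays(mdstat_text):
    """Return {array_name: reason} for arrays that look inactive or degraded."""
    problems = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header line, e.g. "md0 : active raid5 sdd1[2] sdc1[1]".
        header = re.match(r"^(md\S+)\s*:\s*(\S+)", line)
        if header:
            current, state = header.groups()
            if state != "active":
                problems[current] = "state is " + state
            continue
        # Status line of a running array, e.g. "... [3/2] [_UU]".
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, working = map(int, counts.groups())
            if working < expected:
                problems[current] = "%d of %d devices working" % (working, expected)
    return problems


for sample in (SAMPLE_INACTIVE, SAMPLE_DEGRADED):
    print(find_unhealthy_arrays(sample))
```

On a real system the text would come from open("/proc/mdstat").read(),
run from cron or similar. mdadm also has its own monitor mode
(mdadm --monitor), which is probably the better choice for production
use.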

Well, RAID 5 should survive a single disk failure (again), but after a
reboot (for reasons unrelated to the RAID) the array came up as
"md0 stopped".

cat /proc/mdstat

Personalities :
md0 : inactive sdc1[1](S) sdd1[2](S) sde1[0](S)
      2930279424 blocks

unused devices: <none>



What's that?
First, documentation on the web is rather outdated and/or incomplete.
Second, my guess that "(S)" represents a spare is backed up by the
kernel source.
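
To make the member list in that mdstat line easier to read, here is a
small Python sketch that splits each entry into device name, bracketed
number and flag letters. The names MEMBER_RE and parse_members are
hypothetical, and the regular expression assumes the "name[number](FLAG)"
layout seen in the line above. As far as I can tell, the kernel prints
"(S)" for any member that currently has no slot in a running array, which
would explain why every member of an inactive array carries the flag;
treat that reading as an assumption.

```python
import re

# name[number] followed by zero or more one-letter flags such as (S) or (F).
MEMBER_RE = re.compile(r"([^\s\[]+)\[(\d+)\]((?:\([A-Z]\))*)")


def parse_members(mdstat_line):
    """Parse the member list of one 'mdX : ...' line from /proc/mdstat."""
    # Everything after the first colon holds the state and the members.
    _, _, rest = mdstat_line.partition(":")
    members = []
    for name, number, flags in MEMBER_RE.findall(rest):
        members.append({
            "device": name,
            "number": int(number),
            "flags": re.findall(r"\(([A-Z])\)", flags),
        })
    return members


line = "md0 : inactive sdc1[1](S) sdd1[2](S) sde1[0](S)"
for member in parse_members(line):
    print(member)
```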


mdadm --examine [devices] gives consistent reports about the RAID 5
structure as:

          Magic : a92b4efc
        Version : 0.90.00
           UUID : ec4fdb7b:e57733c0:4dc42c07:36d99219
  Creation Time : Wed Dec 24 11:40:29 2008
     Raid Level : raid5
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
     Array Size : 1953519616 (1863.02 GiB 2000.40 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
...
         Layout : left-symmetric
     Chunk Size : 256K
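
As a sanity check on the sizes above, the following Python sketch redoes
the arithmetic, assuming the mdadm figures are in KiB. RAID 5 keeps one
member's worth of parity, so the usable size should be
(Raid Devices - 1) x Used Dev Size. The helper names are hypothetical.
The 2930279424 blocks shown in /proc/mdstat happen to equal
3 x Used Dev Size, which suggests (an inference, not confirmed) that the
figure for an inactive array is a plain sum over all members.

```python
KIB = 1024


def raid5_capacity_kib(used_dev_size_kib, raid_devices):
    """Usable RAID 5 capacity: one member's worth of space holds parity."""
    return used_dev_size_kib * (raid_devices - 1)


def describe(kib):
    """Format a KiB count as GiB (2**30 bytes) and GB (10**9 bytes)."""
    size_bytes = kib * KIB
    return "%.2f GiB %.2f GB" % (size_bytes / 2**30, size_bytes / 10**9)


used = 976759808                        # "Used Dev Size" from --examine
array = raid5_capacity_kib(used, 3)     # three raid devices

print(array, describe(array))           # compare with "Array Size"
print(used * 3)                         # compare with the mdstat block count
```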



The state, though, differs:

sdc1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a15 - correct
         Events : 177920
...
      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sdd1:
    Update Time : Tue Apr  7 20:51:33 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccff6a27 - correct
         Events : 177920

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     2       8       49        2      active sync   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sde1:
    Update Time : Fri Apr  3 15:00:31 2009
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ccf463ec - correct
         Events : 7

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8       65        0      active sync   /dev/sde1

   0     0       8       65        0      active sync   /dev/sde1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1



sde is the device that failed once and was kicked out of the array.
The update time reflects that, if I interpret it correctly.
But how can the sde1 status claim 3 active and working devices? IMO
that's way off.
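
One way to read the three reports together is to compare their event
counters. The Python sketch below uses the Events values quoted above;
the names SUPERBLOCKS and split_by_freshness are hypothetical. It rests
on the assumption that a member's superblock is no longer updated once
the member has been kicked out, so sde1 would still carry the view of the
array from its last write on Apr 3, when all three devices were active.

```python
# Events and update times as reported by mdadm --examine in this message.
SUPERBLOCKS = {
    "sdc1": {"events": 177920, "update_time": "Tue Apr  7 20:51:33 2009"},
    "sdd1": {"events": 177920, "update_time": "Tue Apr  7 20:51:33 2009"},
    "sde1": {"events": 7, "update_time": "Fri Apr  3 15:00:31 2009"},
}


def split_by_freshness(superblocks):
    """Split members into those with the highest event count and the rest."""
    newest = max(sb["events"] for sb in superblocks.values())
    fresh = sorted(d for d, sb in superblocks.items() if sb["events"] == newest)
    stale = sorted(d for d, sb in superblocks.items() if sb["events"] != newest)
    return fresh, stale


fresh, stale = split_by_freshness(SUPERBLOCKS)
raid_devices = 3
# RAID 5 tolerates one missing member, so n - 1 fresh members are enough.
can_start_degraded = len(fresh) >= raid_devices - 1

print("fresh:", fresh)
print("stale:", stale)
print("enough members for a degraded start:", can_start_degraded)
```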


Now, my assumption:
I think I should be able to remove sde temporarily and just restart
the degraded array from sdc1/sdd1.
Is that correct?
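
Spelled out, that assumption corresponds roughly to the command sequence
below. This is an unverified sketch, not advice confirmed for this array:
the Python helper degraded_restart_commands is hypothetical and only
prints the mdadm command lines instead of running them. The last step
(re-adding sde1, which starts a rebuild onto it) is assumed to happen
only after the cabling has been checked.

```python
import shlex


def degraded_restart_commands(array, fresh_members, stale_member):
    """Build (but do not run) mdadm command lines for a degraded restart."""
    return [
        # Release the members held by the inactive array.
        ["mdadm", "--stop", array],
        # Assemble from the up-to-date members only; --run starts the
        # array even though fewer devices than expected are present.
        ["mdadm", "--assemble", "--run", array] + list(fresh_members),
        # Later, once the hardware is trusted again: re-add the old member.
        ["mdadm", array, "--add", stale_member],
    ]


commands = degraded_restart_commands(
    "/dev/md0", ["/dev/sdc1", "/dev/sdd1"], "/dev/sde1"
)
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing rather than executing keeps the sketch harmless; each line
would need to be reviewed and run by hand as root.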

My backup is a few days old, and I would really like to keep the work
done on the RAID in the meantime.

If the answer is just 2 or 3 mdadm command lines, I am yours :-)

Best regards

Frank Baumgart

