How should a raid array fail? shall we count the ways...


From: David Greaves <david@dgreaves.com>
To: linux-raid@vger.kernel.org
Subject: How should a raid array fail? shall we count the ways...
Date: Fri, 04 Jun 2004 21:54:38 +0100
Message-ID: <40C0E18E.2070903@dgreaves.com>

Summary:
If I fault one device on a raid5 array, it goes degraded.
If I fault a second, the array is dead. But:
a) mdadm --detail still says "State : clean, degraded", although I
suspect the array should have been stopped automatically.
Then either:
b1) adding another device results in a sync loop, or
b2) if the array is mounted, it can't be stopped and a reboot is needed.

I hope this is useful - please tell me if I'm being dim...

So here's my array:
(yep, I got my disk :) )

cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 20:43:43 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 20:44:40 2004
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
           UUID : e95ff7de:36d3f438:0a021fa4:b473a6e2
         Events : 0.2

cu:~# mdadm /dev/md0 -f /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md0

cu:~# mdadm --detail /dev/md0
/dev/md0:
<snip>
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
<snip>
    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

       4       8        1       -1      faulty   /dev/sda1


################################################
Failure a) --detail is somewhat optimistic :)

cu:~# mdadm /dev/md0 -f /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
cu:~# mdadm --detail /dev/md0
/dev/md0:
<snip>
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
<snip>
    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

       4       8       17       -1      faulty   /dev/sdb1
       5       8        1       -1      faulty   /dev/sda1
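
A minimal sketch of the check that the State line seems to be missing, assuming a per-level loss-tolerance table. The table and the function names are illustrative only and are not mdadm's own logic. It reads the "Key : Value" fields of `mdadm --detail` output and flags an array that has lost more members than its level can survive:

```python
import re

# Assumption: how many missing members each level can survive.
# raid1 and raid10 are left out because their tolerance depends on the
# number of mirrors and the layout.
TOLERATED_LOSSES = {"linear": 0, "raid0": 0, "raid4": 1, "raid5": 1, "raid6": 2}


def parse_detail(text):
    """Collect the 'Key : Value' lines of `mdadm --detail` output into a dict."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def is_failed(fields):
    """True if more members are missing than the RAID level tolerates."""
    missing = int(fields["Raid Devices"]) - int(fields["Active Devices"])
    return missing > TOLERATED_LOSSES[fields["Raid Level"]]
```

With the values reported above (raid5, 4 raid devices, 2 active), `is_failed` returns True even though the State field still reads "clean, degraded".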



################################################
Failure b1) failed 2 devices, now add one

cu:~# mdadm /dev/md0 -a /dev/sda2
mdadm: hot added /dev/sda2

dmesg starts printing:
Jun  4 22:10:21 cu kernel: md: syncing RAID array md0
Jun  4 22:10:21 cu kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun  4 22:10:21 cu kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun  4 22:10:21 cu kernel: md: using 128k window, over a total of 979840 blocks.
Jun  4 22:10:21 cu kernel: md: md0: sync done.
Jun  4 22:10:21 cu kernel: md: syncing RAID array md0
Jun  4 22:10:21 cu kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun  4 22:10:21 cu kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun  4 22:10:21 cu kernel: md: using 128k window, over a total of 979840 blocks.
Jun  4 22:10:21 cu kernel: md: md0: sync done.
Jun  4 22:10:21 cu kernel: md: syncing RAID array md0
...
over and over *very* quickly
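
One way to spot this loop from the log, sketched under the assumption that each kernel message sits on a single syslog line in the format shown above. The function names and the threshold of two completions per second are hypothetical choices: a resync of 979840 blocks cannot plausibly finish more than once within the same second.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jun  4 22:10:21 cu kernel: md: md0: sync done.
SYNC_DONE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: md: (md\d+): sync done\."
)


def sync_completions_per_second(lines):
    """Count 'sync done' messages per (array, timestamp) pair."""
    counts = Counter()
    for line in lines:
        m = SYNC_DONE.match(line)
        if m:
            counts[(m.group(2), m.group(1))] += 1
    return counts


def looks_like_sync_loop(lines, threshold=2):
    """True if any array reports `threshold` or more completions in one second."""
    counts = sync_completions_per_second(lines)
    return any(n >= threshold for n in counts.values())
```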


cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 22:03:22 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 22:10:40 2004
          State : clean, degraded
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

       4       8        2        0      spare   /dev/sda2
       5       8       17       -1      faulty   /dev/sdb1
       6       8        1       -1      faulty   /dev/sda1
           UUID : 76cd1aba:ae9bb374:8ddc1702:a7e9631e
         Events : 0.903
cu:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [raid6]
md0 : active raid5 sda2[4] sdd1[3] sdc1[2] sdb1[5](F) sda1[6](F)
      2939520 blocks level 5, 128k chunk, algorithm 2 [4/2] [__UU]

unused devices: <none>
cu:~#

################################################
Failure b2) the filesystem was mounted before either disk failed. After
the 2nd failure:

cu:~# mount /dev/md0 /huge
cu:~# mdadm /dev/md0 -f /dev/sdd1
mdadm: set /dev/sdd1 faulty in /dev/md0
cu:~# mdadm /dev/md0 -f /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0

cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 22:47:36 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 22:49:16 2004
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       0        0       -1      removed

       4       8       49       -1      faulty   /dev/sdd1
       5       8       17       -1      faulty   /dev/sdb1
           UUID : 15fa81ab:806e18a2:acfefe4f:b644647d
         Events : 0.13

cu:~# mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
cu:~# umount /huge

Message from syslogd@cu at Fri Jun  4 22:49:38 2004 ...
cu kernel: journal-601, buffer write failed
Segmentation fault
cu:~# umount /huge
umount: /dev/md0: not mounted
umount: /dev/md0: not mounted
cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 22:47:36 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 22:49:38 2004
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       0        0       -1      removed

       4       8       49       -1      faulty   /dev/sdd1
       5       8       17       -1      faulty   /dev/sdb1
           UUID : 15fa81ab:806e18a2:acfefe4f:b644647d
         Events : 0.15
cu:~# mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
cu:~# mount
/dev/hda2 on / type xfs (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
usbfs on /proc/bus/usb type usbfs (rw)
cu:(pid1404) on /net type nfs (intr,rw,port=1023,timeo=8,retrans=110,indirect,map=/usr/share/am-utils/amd.net)

cu:~# mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
cu:~#

BTW, no mdadm process is following (monitoring) the array.




I know that if you hit your head against a brick wall and it hurts, you
should stop, but I thought this behaviour was worth reporting :)


David
