How should a raid array fail? shall we count the ways...

From: David Greaves @ 2004-06-04 20:54 UTC
  To: linux-raid

Summary:
If I fault one device on a raid5 array, the array goes degraded.
If I fault a second one, the array is dead. But:
a) mdadm --detail still says "State : clean, degraded", although I suspect
the array should have been stopped automatically.
Then either:
b1) adding another device results in a sync loop, or
b2) if the array is mounted, it can't be stopped and a reboot is needed.

I hope this is useful - please tell me if I'm being dim...

So here's my array:
(yep, I got my disk :) )

cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 20:43:43 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 20:44:40 2004
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
           UUID : e95ff7de:36d3f438:0a021fa4:b473a6e2
         Events : 0.2

cu:~# mdadm /dev/md0 -f /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md0

cu:~# mdadm --detail /dev/md0
/dev/md0:
<snip>
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
<snip>
    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

       4       8        1       -1      faulty   /dev/sda1


################################################
Failure a) --detail is somewhat optimistic :)

cu:~# mdadm /dev/md0 -f /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
cu:~# mdadm --detail /dev/md0
/dev/md0:
<snip>
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
<snip>
    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

       4       8       17       -1      faulty   /dev/sdb1
       5       8        1       -1      faulty   /dev/sda1
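
For what it's worth, the check I expected can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not what md or mdadm actually does: the function names and the TOLERATED_LOSSES table are my own invention, and it assumes the full (un-snipped) --detail output, so that "Raid Level" and "Raid Devices" are present.

```python
import re

# Assumption: how many member devices each level can lose before data is gone.
# Hypothetical table for illustration; it is not taken from md or mdadm.
TOLERATED_LOSSES = {"linear": 0, "raid0": 0, "raid5": 1, "raid6": 2}


def parse_detail(text):
    """Collect the 'Key : value' lines of `mdadm --detail` output into a dict."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([A-Za-z ]+?)\s*:\s*(.+?)\s*$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def array_is_dead(fields):
    """True if more members are missing than the raid level can tolerate."""
    level = fields["Raid Level"]
    missing = int(fields["Raid Devices"]) - int(fields["Active Devices"])
    return missing > TOLERATED_LOSSES[level]
```

Fed the numbers above (raid5, 4 raid devices, 2 active), this reports the array as dead, while the State line still reads "clean, degraded".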



################################################
Failure b1) failed 2 devices, now add one

cu:~# mdadm /dev/md0 -a /dev/sda2
mdadm: hot added /dev/sda2

dmesg starts printing:
Jun  4 22:10:21 cu kernel: md: syncing RAID array md0
Jun  4 22:10:21 cu kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun  4 22:10:21 cu kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun  4 22:10:21 cu kernel: md: using 128k window, over a total of 979840 blocks.
Jun  4 22:10:21 cu kernel: md: md0: sync done.
Jun  4 22:10:21 cu kernel: md: syncing RAID array md0
Jun  4 22:10:21 cu kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun  4 22:10:21 cu kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun  4 22:10:21 cu kernel: md: using 128k window, over a total of 979840 blocks.
Jun  4 22:10:21 cu kernel: md: md0: sync done.
Jun  4 22:10:21 cu kernel: md: syncing RAID array md0
...
over and over *very* quickly
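
If anyone wants to spot this loop from a saved log rather than by watching the console scroll, here is a rough Python sketch. It assumes the classic syslog prefix shown above ("Jun  4 22:10:21 host kernel: ..."). The function names and the threshold of two completions per second are my own guesses, on the reasoning that a real resync of 979840 blocks cannot finish twice within one second.

```python
from collections import Counter


def sync_completions_per_second(log_lines, array="md0"):
    """Count 'sync done' messages per syslog timestamp (1 s resolution)."""
    marker = f"md: {array}: sync done."
    counts = Counter()
    for line in log_lines:
        if marker in line:
            # The first three whitespace-separated tokens are month, day, time.
            timestamp = " ".join(line.split()[:3])
            counts[timestamp] += 1
    return counts


def looks_like_sync_loop(log_lines, array="md0", threshold=2):
    """Hypothetical heuristic: several completed syncs within one second."""
    counts = sync_completions_per_second(log_lines, array)
    return any(n >= threshold for n in counts.values())
```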


cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 22:03:22 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 22:10:40 2004
          State : clean, degraded
 Active Devices : 2
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       0        0       -1      removed
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

       4       8        2        0      spare   /dev/sda2
       5       8       17       -1      faulty   /dev/sdb1
       6       8        1       -1      faulty   /dev/sda1
           UUID : 76cd1aba:ae9bb374:8ddc1702:a7e9631e
         Events : 0.903
cu:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [raid6]
md0 : active raid5 sda2[4] sdd1[3] sdc1[2] sdb1[5](F) sda1[6](F)
      2939520 blocks level 5, 128k chunk, algorithm 2 [4/2] [__UU]

unused devices: <none>
cu:~#

################################################
Failure b2) the filesystem was mounted before either disk failed. After the
2nd failure:

cu:~# mount /dev/md0 /huge
cu:~# mdadm /dev/md0 -f /dev/sdd1
mdadm: set /dev/sdd1 faulty in /dev/md0
cu:~# mdadm /dev/md0 -f /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0

cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 22:47:36 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 22:49:16 2004
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       0        0       -1      removed

       4       8       49       -1      faulty   /dev/sdd1
       5       8       17       -1      faulty   /dev/sdb1
           UUID : 15fa81ab:806e18a2:acfefe4f:b644647d
         Events : 0.13

cu:~# mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
cu:~# umount /huge

Message from syslogd@cu at Fri Jun  4 22:49:38 2004 ...
cu kernel: journal-601, buffer write failed
Segmentation fault
cu:~# umount /huge
umount: /dev/md0: not mounted
umount: /dev/md0: not mounted
cu:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Fri Jun  4 22:47:36 2004
     Raid Level : raid5
     Array Size : 2939520 (2.80 GiB 3.01 GB)
    Device Size : 979840 (956.88 MiB 1003.36 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jun  4 22:49:38 2004
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0       -1      removed
       2       8       33        2      active sync   /dev/sdc1
       3       0        0       -1      removed

       4       8       49       -1      faulty   /dev/sdd1
       5       8       17       -1      faulty   /dev/sdb1
           UUID : 15fa81ab:806e18a2:acfefe4f:b644647d
         Events : 0.15
cu:~# mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
cu:~# mount
/dev/hda2 on / type xfs (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
usbfs on /proc/bus/usb type usbfs (rw)
cu:(pid1404) on /net type nfs (intr,rw,port=1023,timeo=8,retrans=110,indirect,map=/usr/share/am-utils/amd.net)

cu:~# mdadm --stop /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
cu:~#

BTW, no mdadm process is following (monitoring) the array.




I know that if you hit your head against a brick wall and it hurts, you
should stop, but I thought this behaviour was worth reporting :)


David


