From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brad Campbell
Subject: 2.6.11-rc4 md loops on missing drives
Date: Tue, 15 Feb 2005 17:38:33 +0400
Message-ID: <4211FB59.7090104@wasp.net.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-raid-owner@vger.kernel.org
To: RAID Linux
List-Id: linux-raid.ids

G'day all,

I have just finished my shiny new RAID-6 box: 15 x 250GB SATA drives.

While doing some failure testing (inadvertently, due to libata SMART causing
command errors) I dropped 3 drives out of the array in sequence. md coped with
the first two (as it should), but after the third one dropped out I got the
errors below spinning continuously in my syslog until I managed to stop the
array with:

    mdadm --stop /dev/md0

I'm not really sure how it's supposed to cope with losing more disks than
planned, but filling the syslog with nastiness is not very polite.

This box takes _ages_ (between 6 and 10 hours) to rebuild the array, but I'm
willing to run some tests if anyone has particular RAID-6 stuff they want
tested before I put it into service. I do plan on a couple of days of burn-in
testing before I really load it up anyway. The last disk is missing at the
moment as I'm short one disk due to a Maxtor dropping its bundle after about
5000 hours.

I'm using today's BK kernel plus the libata and libata-dev trees. The drives
are all on Promise SATA150TX4 controllers.

Feb 15 17:58:28 storage1 kernel: .<6>md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
Feb 15 17:58:28 storage1 kernel: .<6>md: syncing RAID array md0
Feb 15 17:58:28 storage1 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb 15 17:58:28 storage1 kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Feb 15 17:58:28 storage1 kernel: md: using 128k window, over a total of 245117312 blocks.
Feb 15 17:58:28 storage1 kernel: md: md0: sync done.
[the five lines above repeat continuously, all with the same timestamp, until the array is stopped]

Existing raid config below. Fail any additional 2 drives due to IO errors to
reproduce this issue.
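As an aside, for anyone wondering why exactly two missing drives is the design
limit: RAID-6 keeps two independent syndromes per stripe, P (a plain XOR) and
Q (a weighted sum over GF(2^8)), which gives two equations and therefore at
most two recoverable unknowns. Here's a standalone toy sketch of that recovery
(my own illustration, not the kernel's raid6 code, though it uses the same
0x11d field polynomial):

```python
# Toy RAID-6 P/Q parity demo: why two lost disks are recoverable but not three.
# Works on one byte per "disk"; a real stripe just repeats this per byte.

# Log/exp tables for GF(2^8) with polynomial 0x11d, generator 2.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gmul(a, b):
    """Multiply in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gdiv(a, b):
    """Divide in GF(2^8); b must be non-zero."""
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]

def pq(data):
    """P = XOR of all data bytes, Q = XOR of g^i * d_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gmul(EXP[i], d)
    return p, q

def recover_two(data, x_idx, y_idx, p, q):
    """Recover the two lost bytes data[x_idx], data[y_idx] from P and Q."""
    # Fold the surviving data back out of P and Q, leaving two unknowns.
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x_idx, y_idx):
            pxy ^= d
            qxy ^= gmul(EXP[i], d)
    # Solve:  dx ^ dy = pxy   and   g^x*dx ^ g^y*dy = qxy
    gx, gy = EXP[x_idx], EXP[y_idx]
    dx = gdiv(gmul(gy, pxy) ^ qxy, gx ^ gy)
    dy = dx ^ pxy
    return dx, dy

if __name__ == "__main__":
    data = [0x12, 0x34, 0x56, 0x78, 0x9a]   # one stripe byte from 5 "disks"
    p, q = pq(data)
    print(recover_two(data, 1, 3, p, q))    # recovers data[1] and data[3]
```

With a third disk gone there would be three unknowns against the same two
syndromes, so the system is underdetermined -- hence md can only fail the
array at that point, not rebuild it.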
storage1:/home/brad# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Tue Feb 15 22:00:16 2005
     Raid Level : raid6
     Array Size : 3186525056 (3038.91 GiB 3263.00 GB)
    Device Size : 245117312 (233.76 GiB 251.00 GB)
   Raid Devices : 15
  Total Devices : 15
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Feb 15 17:17:36 2005
          State : clean, degraded, resyncing
 Active Devices : 14
Working Devices : 14
 Failed Devices : 1
  Spare Devices : 0

     Chunk Size : 128K

 Rebuild Status : 0% complete

           UUID : 11217f79:ac676966:279f2816:f5678084
         Events : 0.40101

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/devfs/scsi/host0/bus0/target0/lun0/disc
       1       8       16        1      active sync   /dev/devfs/scsi/host1/bus0/target0/lun0/disc
       2       8       32        2      active sync   /dev/devfs/scsi/host2/bus0/target0/lun0/disc
       3       8       48        3      active sync   /dev/devfs/scsi/host3/bus0/target0/lun0/disc
       4       8       64        4      active sync   /dev/devfs/scsi/host4/bus0/target0/lun0/disc
       5       8       80        5      active sync   /dev/devfs/scsi/host5/bus0/target0/lun0/disc
       6       8       96        6      active sync   /dev/devfs/scsi/host6/bus0/target0/lun0/disc
       7       8      112        7      active sync   /dev/devfs/scsi/host7/bus0/target0/lun0/disc
       8       8      128        8      active sync   /dev/devfs/scsi/host8/bus0/target0/lun0/disc
       9       8      144        9      active sync   /dev/devfs/scsi/host9/bus0/target0/lun0/disc
      10       8      160       10      active sync   /dev/devfs/scsi/host10/bus0/target0/lun0/disc
      11       8      176       11      active sync   /dev/devfs/scsi/host11/bus0/target0/lun0/disc
      12       8      192       12      active sync   /dev/devfs/scsi/host12/bus0/target0/lun0/disc
      13       8      208       13      active sync   /dev/devfs/scsi/host13/bus0/target0/lun0/disc
      14       0        0        -      removed
      15       8      224        -      faulty   /dev/devfs/scsi/host14/bus0/target0/lun0/disc

Regards,
Brad
--
"Human beings, who are almost unique in having the ability to learn from the
experience of others, are also remarkable for their apparent disinclination
to do so." -- Douglas Adams