From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bart Kus
Subject: Re: (help!) MD RAID6 won't --re-add devices?
Date: Sat, 15 Jan 2011 09:48:55 -0800
Message-ID: <4D31DE07.1000507@bartk.us>
References: <4D2EF83D.6080203@bartk.us>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4D2EF83D.6080203@bartk.us>
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Things seem to have gone from bad to worse.  I upgraded to the latest
mdadm, and it actually let me do an --add operation, but --re-add was
still failing.  It added all the devices as spares, though.  I stopped
the array and tried to re-assemble it, but it won't start:

jo ~ # mdadm -A /dev/md4 -f -u da14eb85:00658f24:80f7a070:b9026515
mdadm: /dev/md4 assembled from 5 drives and 5 spares - not enough to
start the array.

How do I promote these "spares" back to the active devices they once
were?  Yes, they're behind a few events, so there will be some data loss.

--Bart
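PS: The only way forward I can think of is recreating the array in
place with --assume-clean, matching the original geometry exactly so
no data blocks get rewritten.  Below is a rough sketch of what I mean,
NOT something I've run: slots 0, 2, 3, 5 and 9 (sda1, sdc1, sdd1,
sdm1, sdb1) are known from the --detail output quoted below, but my
placement of the five returned drives into slots 1, 4, 6, 7 and 8 is
pure guesswork, and the new superblocks would also have to land on the
same 272-sector data offset for this to be safe:

# stop the half-assembled array before touching anything
mdadm --stop /dev/md4
# recreate with identical geometry; --assume-clean skips the initial
# resync, so the existing data is left in place
mdadm --create /dev/md4 --metadata=1.2 --level=6 --raid-devices=10 \
    --chunk=64 --assume-clean \
    /dev/sda1 /dev/sdn1 /dev/sdc1 /dev/sdd1 /dev/sdo1 /dev/sdm1 \
    /dev/sdp1 /dev/sdq1 /dev/sdr1 /dev/sdb1
# (the sdn/sdo/sdp/sdq/sdr slot assignments above are unverified)

Is that the right tool here, or is there a safer way to flip those
"spare" role bits back to active and bump the event counters?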
On 1/13/2011 5:03 AM, Bart Kus wrote:
> Hello,
>
> I had a Port Multiplier failure overnight.  This put 5 out of 10
> drives offline, degrading my RAID6 array.  The file system is still
> mounted (and failing to write):
>
> Buffer I/O error on device md4, logical block 3907023608
> Filesystem "md4": xfs_log_force: error 5 returned.
> etc...
>
> The array is in the following state:
>
> /dev/md4:
>         Version : 1.02
>   Creation Time : Sun Aug 10 23:41:49 2008
>      Raid Level : raid6
>      Array Size : 15628094464 (14904.11 GiB 16003.17 GB)
>   Used Dev Size : 1953511808 (1863.01 GiB 2000.40 GB)
>    Raid Devices : 10
>   Total Devices : 11
>     Persistence : Superblock is persistent
>
>     Update Time : Wed Jan 12 05:32:14 2011
>           State : clean, degraded
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 6
>   Spare Devices : 0
>
>      Chunk Size : 64K
>
>            Name : 4
>            UUID : da14eb85:00658f24:80f7a070:b9026515
>          Events : 4300692
>
>     Number   Major   Minor   RaidDevice State
>       15       8        1        0      active sync   /dev/sda1
>        1       0        0        1      removed
>       12       8       33        2      active sync   /dev/sdc1
>       16       8       49        3      active sync   /dev/sdd1
>        4       0        0        4      removed
>       20       8      193        5      active sync   /dev/sdm1
>        6       0        0        6      removed
>        7       0        0        7      removed
>        8       0        0        8      removed
>       13       8       17        9      active sync   /dev/sdb1
>
>       10       8       97        -      faulty spare
>       11       8      129        -      faulty spare
>       14       8      113        -      faulty spare
>       17       8       81        -      faulty spare
>       18       8       65        -      faulty spare
>       19       8      145        -      faulty spare
>
> I have replaced the faulty PM and the drives have registered back with
> the system, under new names:
>
> sd 3:0:0:0: [sdn] Attached SCSI disk
> sd 3:1:0:0: [sdo] Attached SCSI disk
> sd 3:2:0:0: [sdp] Attached SCSI disk
> sd 3:4:0:0: [sdr] Attached SCSI disk
> sd 3:3:0:0: [sdq] Attached SCSI disk
>
> But I can't seem to --re-add them into the array now!
>
> # mdadm /dev/md4 --re-add /dev/sdn1 --re-add /dev/sdo1 --re-add
> /dev/sdp1 --re-add /dev/sdr1 --re-add /dev/sdq1
> mdadm: add new device failed for /dev/sdn1 as 21: Device or resource busy
>
> I haven't unmounted the file system and/or stopped the /dev/md4
> device, since I think that would drop any buffers either layer might
> be holding.  I'd of course prefer to lose as little data as possible.
> How can I get this array going again?
>
> PS: I think the reason "Failed Devices" shows 6 and not 5 is because I
> had a single HD failure a couple weeks back.  I replaced the drive and
> the array re-built A-OK.  I guess it still counted the failure since
> the array wasn't stopped during the repair.
>
> Thanks for any guidance,
>
> --Bart
>
> PPS: mdadm - v3.0 - 2nd June 2009
> PPS: Linux jo.bartk.us 2.6.35-gentoo-r9 #1 SMP Sat Oct 2 21:22:14 PDT
> 2010 x86_64 Intel(R) Core(TM)2 Quad CPU @ 2.40GHz GenuineIntel GNU/Linux
> PPS: # mdadm --examine /dev/sdn1
> /dev/sdn1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : da14eb85:00658f24:80f7a070:b9026515
>            Name : 4
>   Creation Time : Sun Aug 10 23:41:49 2008
>      Raid Level : raid6
>    Raid Devices : 10
>
>  Avail Dev Size : 3907023730 (1863.01 GiB 2000.40 GB)
>      Array Size : 31256188928 (14904.11 GiB 16003.17 GB)
>   Used Dev Size : 3907023616 (1863.01 GiB 2000.40 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : c0cf419f:4c33dc64:84bc1c1a:7e9778ba
>
>     Update Time : Wed Jan 12 05:39:55 2011
>        Checksum : bdb14e66 - correct
>          Events : 4300672
>
>      Chunk Size : 64K
>
>     Device Role : spare
>     Array State : A.AA.A...A ('A' == active, '.' == missing)
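
PPS: In case the per-drive state helps, this is the loop I'm using to
dump the event counter and role from each member's superblock (the
/dev/sd[a-r]1 glob just happens to cover my drive letters; adjust to
taste):

# print the superblock event count and role for every member disk
for d in /dev/sd[a-r]1; do
    echo "== $d"
    mdadm --examine "$d" 2>/dev/null | egrep 'Events|Device Role'
done

/dev/sdn1, for example, reports "Device Role : spare" at Events
4300672 versus 4300692 on the drives that stayed up, so the five
returned drives are only about 20 events behind.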