From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Osiel
Subject: Re: A few mdadm questions
Date: Sun, 14 Nov 2004 10:12:39 -0600
Message-ID: <419783F7.5030807@osiel.org>
References: <200411140203.iAE23rN08652@www.watkins-home.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <200411140203.iAE23rN08652@www.watkins-home.com>
Sender: linux-raid-owner@vger.kernel.org
To: Guy
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Guy,

Thanks for the input. I'm not sure why that disk is now a spare either.
I was hoping that there was some way to re-write that superblock to
convince the array it was a good disk. I saw some old (pre-mdadm)
advice which mentioned using mkraid to rewrite (all) of the superblocks,
but that seems really drastic.

In the worst case, as you mentioned, I would try to start with the other
(failed) disk. Most of the data on that drive is fairly static, so I hope
to have some good recovery -- assuming the disk is still OK (in the past
it has been something like a loose cable, so I'm hopeful).

I'll wait and see if Neil has any advice. *crosses fingers*

Bob

Guy wrote:

>Your array had 5 disks, not counting any spares.
>You need to start the array with at least 4 of the five disks, spares don't
>help when starting an array.
>
>I don't know why it thinks your disk (hdi1) is a spare. But, that may
>explain how it was removed from the array. Unless Neil has some magic
>incantations, I think you are out of luck.
>
>If Neil has no ideas, you could try to start the array with the drive that
>failed (hdk1), but that will cause corruption of any stripes that have
>changed since the drive was removed from the array. So, save this option as
>a last resort. Of course, if hdk1 has failed hard, you will not be able to
>use it.
>
>Last resort!!! Corruption will occur!
>mdadm --assemble --force /dev/md0 /dev/hde1 /dev/hdk1 /dev/hdm1 /dev/hdo1
>
>Guy
>
>-----Original Message-----
>From: linux-raid-owner@vger.kernel.org
>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Robert Osiel
>Sent: Saturday, November 13, 2004 7:36 PM
>To: linux-raid@vger.kernel.org
>Subject: Re: A few mdadm questions
>
>Guy/Neil:
>
>Thanks a lot for the help.
>Sorry that I didn't include all of the info in my last message, but this
>box is off the network right now and doesn't even have a floppy or
>monitor, so I had to do a little work to get the info out.
>
>I tried to start the array with the 3 good disks and the 1 spare, but I
>got an error to the effect that 3 good + 1 spare drives are not enough
>to start the array (see below)
>
> > cat /proc/mdstat
>Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
>read_ahead not set
>unused devices:
>
> > mdadm -D /dev/md0
>mdadm: md device /dev/md0 does not appear to be active
>
> > mdadm --assemble --force /dev/md0 /dev/hde1 /dev/hdi1 /dev/hdm1 /dev/hdo1
>mdadm: /dev/md0 assembled from 3 drives and 1 spare - not enough to
>start the array
>
> > cat /proc/mdstat
>Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
>read_ahead not set
>md0: inactive
>ide/host2/bus0/target0/lun0/part1[0]
>ide/host4/bus0/target0/lun0/part1[5]
>ide/host6/bus1/target0/lun0/part1[4]
>ide/host6/bus0/target0/lun0/part1[3]
>
>Some notes:
>hdk1 is the disk which failed initially
>hdi1 is the disk which I removed and which thinks it is a 'spare'
>
>The other three drives report basically identical info, like this:
> > mdadm -E /dev/hde1
>
>Magic : a92b4efc
>Version : 00.90.00
>UUID : ec2e64a8:fffd3e41:ffee5518:2f3e858c
>Creation Time : Sun Oct 5 01:25:49 2003
>Build Level: raid5
>Device Size : 160079488 (152.66 GiB 163.92 GB)
>Raid Devices : 5
>Total Devices : 5
>Preferred Minor : 0
>
>Update Time Sat Sep 25 22:07:26 2004
>State : dirty
>Active Devices : 3
>Working Devices : 4
>Failed Devices : 1
>Spare Devices : 1
>Checksum : 4ee5cc77 - correct
>Events : 0.10
>
>Layout : left-symmetric
>Chunk Size : 128K
>
> Number Major Minor RaidDevice State
>this 0 22 1 0 active sync
>0 0 22 1 0 active sync
>1 1 0 0 1 faulty removed
>2 2 56 1 2 faulty
>/dev/ide/host4/bus0/target0/lun0/part1
>3 3 57 1 3 active sync
>/dev/ide/host4/bus1/target0/lun0/part1
>4 4 88 1 4 active sync
>/dev/ide/host6/bus0/target0/lun0/part1
>5 5 34 1 5 spare
>
>Here are the two drives in question:
>
>__________mdadm -E /dev/hdi1:
>
>Magic : a92b4efc
>Version : 00.90.00
>UUID : ec2e64a8:fffd3e41:ffee5518:2f3e858c
>Creation Time : Sun Oct 5 01:25:49 2003
>Build Level: raid5
>Device Size : 160079488 (152.66 GiB 163.92 GB)
>Raid Devices : 5
>Total Devices : 5
>Preferred Minor : 0
>
>Update Time Sat Sep 25 22:07:26 2004
>State : dirty
>Active Devices : 3
>Working Devices : 4
>Failed Devices : 1
>Spare Devices : 1
>Checksum : 4ee5cc77 - correct
>Events : 0.10
>
>Layout : left-symmetric
>Chunk Size : 128K
>
> Number Major Minor RaidDevice State
>this 5 34 1 5 spare
>0 0 22 1 0 active sync
>1 1 0 0 1 faulty removed
>2 2 56 1 2 faulty
>/dev/ide/host4/bus0/target0/lun0/part1
>3 3 57 1 3 active sync
>/dev/ide/host4/bus1/target0/lun0/part1
>4 4 88 1 4 active sync
>/dev/ide/host6/bus0/target0/lun0/part1
>5 5 34 1 5 spare
>
>
>__________mdadm -E /dev/hdk1
>Magic : a92b4efc
>Version : 00.90.00
>UUID : ec2e64a8:fffd3e41:ffee5518:2f3e858c
>Creation Time : Sun Oct 5 01:25:49 2003
>Build Level: raid5
>Device Size : 160079488 (152.66 GiB 163.92 GB)
>Raid Devices : 5
>Total Devices : 5
>Preferred Minor : 0
>
>Update Time Sat Sep 25 22:07:24 2004
>State : dirty
>Active Devices : 4
>Working Devices : 5
>Failed Devices : 0
>Spare Devices : 1
>Checksum : 4ee5cc77 - correct
>Events : 0.9
>
>Layout : left-symmetric
>Chunk Size : 128K
>
> Number Major Minor RaidDevice State
>this 2 56 1 2 active sync
>/dev/ide/host4/bus0/target0/lun0/part1
>0 0 22 1 0 active sync
>1 1 0 0 1 faulty removed
>2 2 56 1 2 active sync
>/dev/ide/host4/bus0/target0/lun0/part1
>3 3 57 1 3 active sync
>/dev/ide/host4/bus1/target0/lun0/part1
>4 4 88 1 4 active sync
>/dev/ide/host6/bus0/target0/lun0/part1
>5 5 34 1 5 spare
>
>Neil Brown wrote:
>
>>On Friday November 12, bugzilla@watkins-home.com wrote:
>>
>>>First, stop using the old raid tools. Use mdadm only! mdadm would not have
>>>allowed your error to occur.
>>
>>I'm afraid this isn't correct, though the rest of Guy's advice is very
>>good (thanks Guy!).
>>
>>   mdadm --remove
>>does exactly the same thing as
>>   raidhotremove
>>
>>It is the kernel that should (and does) stop you from hot-removing a
>>device that is working and active. So I'm not quite sure what
>>happened to Robert...
>>
>>Robert: it is always useful to provide specifics with the output of
>>   cat /proc/mdstat
>>and
>>   mdadm -D /dev/mdX
>>
>>This avoids possible confusion over terminology.
>>
>>NeilBrown
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html