* mdadm: failed devices become spares!
  From: Pierre Vignéras
  Date: 2010-05-16 15:40 UTC
  To: linux-raid

Hi,

I encountered a critical problem with mdadm that I first submitted to the
Debian mailing list (the system is Debian lenny/stable). They asked me to
report it here, so that is what I am doing.

To avoid duplicating the description, here is the URL of the bug report:

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578352

If you prefer the full details to be copied to this mailing list, just ask.

Note: the bug happened again today, on another RAID array. The good news is
that it is somewhat reproducible! The bad news is that, unless you have a
magic solution, all my data are lost (half of it was in the backup pipe!)...

Thanks for any help, and regards.
--
Pierre Vignéras
* RE: mdadm: failed devices become spares!
  From: Leslie Rhorer
  Date: 2010-05-16 19:56 UTC
  To: 'Pierre Vignéras', linux-raid

> From: Pierre Vignéras
> Sent: Sunday, May 16, 2010 10:41 AM
> Subject: mdadm: failed devices become spares!
>
> I encountered a critical problem with mdadm that I first submitted to the
> Debian mailing list (the system is Debian lenny/stable). [...]
>
>     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578352
>
> Note: the bug happened again today, on another RAID array. The good news
> is that it is somewhat reproducible! The bad news is that, unless you
> have a magic solution, all my data are lost (half of it was in the backup
> pipe!)...

It's not quite clear to me from the link whether your drives are truly
toast or not. If they are, then you are hosed. Assuming not, you need to use

    `mdadm --examine /dev/sdxx` and `mdadm -Dt /dev/mdyy`

to determine precisely all the parameters and the order of the block
devices in the array. You need the chunk size, the superblock type, which
slot was occupied by each device in the array (this may not be the same as
when the array was created), the size of the array (if it did not fill the
entire partition in every case), the RAID level, etc. Once you are certain
you have all the information needed to re-create the array if need be, then
try to re-assemble the array with

    `mdadm --assemble --force /dev/mdyy`

If it works, then fsck the file system. (I think I noticed you are using
XFS. If so, do not use xfs_check. Instead, use xfs_repair with the -n
option.) After you have a clean file system, issue the command

    `echo repair > /sys/block/mdyy/md/sync_action`

to re-sync the array. If the array does not assemble, then you will need to
stop it and re-create it using the options you obtained from your research
above, adding the --assume-clean switch to prevent a resync in case
something is wrong. If the fsck won't work after re-creating the array,
then you probably got one or more of the parameters incorrect.
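As a rough illustration of the sequence Leslie describes (the device names
/dev/sd[c-f]1 and /dev/mdY are placeholders here, not values taken from the
report), the inspection and forced reassembly might look like this:

    # Record every parameter before touching anything
    mdadm --examine /dev/sd[c-f]1 > /root/md-examine.txt
    mdadm -Dt /dev/mdY >> /root/md-examine.txt 2>&1

    # A forced assembly is the least invasive first attempt
    mdadm --assemble --force /dev/mdY /dev/sd[c-f]1

    # If it assembles, check the filesystem without modifying it
    # (XFS example; point it at the LVM volume instead if one sits on top)
    xfs_repair -n /dev/mdY

    # Only once the filesystem is clean, resync the array
    echo repair > /sys/block/mdY/md/sync_action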
* Re: mdadm: failed devices become spares!
  From: Pierre Vignéras
  Date: 2010-05-17 18:10 UTC
  To: Leslie Rhorer; Cc: linux-raid

On Sunday 16 May 2010, Leslie Rhorer wrote:
> It's not quite clear to me from the link whether your drives are truly
> toast or not. If they are, then you are hosed. Assuming not, you need to
> use
>
>     `mdadm --examine /dev/sdxx` and `mdadm -Dt /dev/mdyy`
>
> to determine precisely all the parameters and the order of the block
> devices in the array. [...] Once you are certain you have all the
> information needed to re-create the array if need be, then try to
> re-assemble the array with
>
>     `mdadm --assemble --force /dev/mdyy`
>
> If it works, then fsck the file system. [...] If the array does not
> assemble, then you will need to stop it and re-create it using the options
> you obtained from your research above, adding the --assume-clean switch
> to prevent a resync in case something is wrong. If the fsck won't work
> after re-creating the array, then you probably got one or more of the
> parameters incorrect.

Thanks for your help. Here is what I did:

# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
[...]
md2 : inactive sdc1[2](S) sdd1[5](S) sdf1[4](S) sde1[3](S)
      1250274304 blocks
[...]
# mdadm --examine /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sde1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7939 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7949 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       49        5      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1
/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7967 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       81        4      spare   /dev/sdf1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf795b - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       65        3      active sync   /dev/sde1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1

# mdadm -Dt /dev/md2
mdadm: md device /dev/md2 does not appear to be active.
phobos:~#
# mdadm --assemble --force /dev/md2
mdadm: /dev/md2 assembled from 2 drives and 2 spares - not enough to start
the array.
#

What I don't get is how those devices, /dev/sdf1 and /dev/sdd1, came to be
marked as spares after being marked as faulty! I never asked for that. As
shown at the previous Debian bug link (repeated here for your convenience):

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578352

<bug description extract>
...
Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device
/dev/md2, component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device
/dev/md2, component device /dev/sdf1

Is that last line normal? It seems to me that this failed drive has been
made a spare! (I really hope that I have misunderstood something.) Is it
possible that the USB system (with its "plug'n play" sort-of feature) made
the behaviour of mdadm so strange?
</bug>

And the next question is: how do I activate those two spare drives? I was
expecting mdadm to use them automagically.

Did I miss something, or is there something really strange happening there?

Thanks again.
--
Pierre Vignéras
* Re: mdadm: failed devices become spares!
  From: Tim Small
  Date: 2010-05-17 21:09 UTC
  To: Pierre Vignéras; Cc: Leslie Rhorer, linux-raid, 578352

Pierre Vignéras wrote:
> And the next question is: how do I activate those two spare drives? I was
> expecting mdadm to use them automagically.

If you want to experiment with different ways of getting the data back, but
without risking writing anything to the drives, you could do this:

1. Use dmsetup to create copy-on-write "virtual drives" which "see through"
   to the content of your real drives, but don't risk writing anything at
   all to them.

2. Use mdadm --create --assume-clean ... /dev/mapper/cow_drive_1 ... to
   force mdadm to put the array back together the way you think it was (the
   output of --examine will be useful here). You'll need to specify (at
   least, from memory):

   - the chunk size
   - the metadata version (this affects the metadata location on the drives)
   - the correct device order (with or without a single failed drive)

... after that you can run a read-only (or read-write) check on the COW md
device to verify that you've got your data back, then mount it read-only,
etc. Once you're happy that your commands are going to get things running
again, you can run them "for real" on the non-COW devices.

See the recent list archives for my post on using a similar set of commands
for HW RAID data forensics, along with references.

HTH,

Tim.
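A minimal sketch of the overlay approach Tim describes, using the
device-mapper snapshot target; the file size, loop device, and member name
below are illustrative assumptions, not values taken from this thread:

    # Sparse file to absorb any writes diverted away from the real drive
    dd if=/dev/zero of=/tmp/cow_sdc1.img bs=1M count=0 seek=4096
    losetup /dev/loop0 /tmp/cow_sdc1.img

    # Non-persistent ("N") snapshot: reads fall through to /dev/sdc1,
    # writes land only on the loop device (chunk size 8 sectors = 4 KiB)
    SECTORS=$(blockdev --getsz /dev/sdc1)
    echo "0 $SECTORS snapshot /dev/sdc1 /dev/loop0 N 8" | dmsetup create cow_sdc1

    # Repeat for each member, then experiment only on /dev/mapper/cow_*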
* Re: mdadm: failed devices become spares!
  From: Neil Brown
  Date: 2010-05-18  1:30 UTC
  To: Pierre Vignéras; Cc: Leslie Rhorer, linux-raid

On Mon, 17 May 2010 20:10:36 +0200 Pierre Vignéras <pierre@vigneras.name> wrote:

> Did I miss something, or is there something really strange happening there?

Something strange... I cannot explain the 'SpareActive' messages. Most of
the rest makes sense.

You had a RAID10 - 4 drives in near=2 mode. So the first two disks contain
identical data, and the second two are also identical and contain the rest.

The second device failed due to a write error. Why it seemed to become a
spare I'm not sure. I'm not at all sure it did become a spare immediately -
your logs aren't conclusive on that point. It did eventually become a
spare, but that could be because you "removed and added the devices", which
would have changed them from 'failed' to 'spares'.

Then the first device in the array reported an error and so was failed.
After this you would not be able to read or write the even chunks of the
array; xfs noticed and complained.

By this time sdf1 seemed to be a spare, so md gave recovery a try. The
recovery process discovered there was nowhere to read good data from and
immediately gave up.

However, if the devices really are OK, then sdf1 and sdc1 should contain
identical data (except that the superblock would be slightly different).
You could check this with "cmp -l", though that might not be very
efficient. Also, sdd1 and sde1 should be identical.

I suggest that you try:

    mdadm -S /dev/md2
    mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 /dev/sdc1 missing /dev/sdd1 missing --assume-clean

and then see what the data on md2 looks like. You could equally try sdf1 in
place of sdc1, or sde1 in place of sdd1 (make sure you double-check the
device names, don't assume I got them right).

Once you have a combination that looks good, you can add the other two
devices and they will recover, and you should have your data back.

BUT be warned. Something caused some errors to be reported. Unless you find
out what that was and fix it, errors will occur again. I have no idea what
might have caused those errors. Bad media? Bad controller? Bad USB
controller? Bad luck? I wouldn't write new data, or even perform a
recovery, until you are quite confident of the devices.

NeilBrown
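One way to check the result of such a re-creation without modifying the
data, assuming (as later messages in this thread suggest) that XFS sits on
top of the array via LVM; the volume and mount paths here follow Pierre's
own commands further down and may need adjusting:

    mdadm -S /dev/md2
    mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 --assume-clean \
          /dev/sdc1 missing /dev/sdd1 missing
    vgchange -a y                     # only if LVM is layered on top
    xfs_repair -n /dev/my-vg/my-lv    # report-only check, no writes
    mount -o ro /dev/my-vg/my-lv /mnt/tmp && ls /mnt/tmp && umount /mnt/tmp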
* Re: mdadm: failed devices become spares!
  From: Neil Brown
  Date: 2010-05-18  2:06 UTC
  To: Neil Brown; Cc: Pierre Vignéras, Leslie Rhorer, linux-raid

On Tue, 18 May 2010 11:30:16 +1000 Neil Brown <neilb@suse.de> wrote:

> On Mon, 17 May 2010 20:10:36 +0200 Pierre Vignéras <pierre@vigneras.name> wrote:
>
> > Did I miss something, or is there something really strange happening there?
>
> Something strange...
> I cannot explain the 'SpareActive' messages.

Actually, I think I can explain that.

When a device fails it gets marked as faulty; then, as soon as there is no
more pending IO, it gets moved out of the array. "mdadm -D" will show it
with a larger 'Number' and a 'RaidDevice' of '-'. Normally these happen
almost as a single operation, though a lot of pending IO can slow it down.

"mdadm --monitor" identifies devices based on 'Number', so it would
normally see a working device disappear - which is reported as a failure -
and a 'faulty/spare' device appear, which it ignores.

However, if --monitor gets to check the array between the above two events,
it will first see that the working drive is now faulty, so it reports a
failure, and then see that the faulty device isn't faulty any more and in
fact isn't even there. The "isn't even there" bit doesn't register, and it
treats it as 'SpareActive'.

I should fix that.

So I'm quite sure now that your devices didn't really become spares until
you removed and added them, which is exactly the way to turn failed devices
into spares.

NeilBrown
* Re: mdadm: failed devices become spares!
  From: MRK
  Date: 2010-05-18 22:25 UTC
  Cc: linux-raid

On 05/18/2010 04:06 AM, Neil Brown wrote:
> However, if --monitor gets to check the array between the above two
> events, it will first see that the working drive is now faulty, so it
> reports a failure, and then see that the faulty device isn't faulty any
> more and in fact isn't even there. The "isn't even there" bit doesn't
> register, and it treats it as 'SpareActive'.
>
> I should fix that.

However, in one case the two events are not detected in the same polling
round:

Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device
/dev/md2, component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device
/dev/md2, component device /dev/sdf1

One minute passes between the two entries; I suppose that's the mdadm
daemon's polling interval.

In the other case all the entries carry the same timestamp:

Apr 13 08:00:02 phobos mdadm[3157]: Fail event detected on md device
/dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device
/dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos last message repeated 7 times
[...many more of those messages...]

...plus, in this second case the SpareActive triggers many times within
that same second. (Pierre, you cut the log short, but are all of those
repeated messages at the exact same time, or do they span a few seconds?)

It looks to me like some kind of USB failure where the USB connection or
USB bridge momentarily fails, then immediately gets re-detected and
re-added to the system. But since there are no usb entries in dmesg, that
would also point to an issue in the usb driver. Could the problem also be a
mixture with some unwise udev triggers in Debian, maybe somehow causing the
auto-re-add of the drive to the RAID?

Pierre:
- can you post your mdadm.conf?
- USB is not good for RAID, imho. Many times I have seen problems with
  USB/SATA bridges where the drive would get disconnected under high I/O
  activity and then reconnected after a few seconds. Anyway, re-adding it
  to the RAID shouldn't have happened. Also, in my case there were "usb"
  entries in dmesg.
* Re: mdadm: failed devices become spares!
  From: Simon Matthews
  Date: 2010-05-19 19:56 UTC
  To: MRK; Cc: linux-raid

On Tue, May 18, 2010 at 3:25 PM, MRK <mrk@shiftmail.org> wrote:
> - USB is not good for RAID, imho.

I can second that. At one time I had a USB backup drive that was configured
as half of a RAID 1 set, so that the drive could immediately be used in the
event of a massive failure of the file server.

Pulling this USB drive before stopping the RAID device caused the machine
to become unresponsive. I think it was trying to do some kind of I/O; all I
know is that a hard boot was the only way I could get the machine out of
that condition.

Simon
* Re: mdadm: failed devices become spares!
  From: Pierre Vignéras
  Date: 2010-05-21 21:00 UTC
  To: MRK, linux-raid

On Wednesday 19 May 2010, MRK wrote:
> ...plus, in this second case the SpareActive triggers many times within
> that same second. (Pierre, you cut the log short, but are all of those
> repeated messages at the exact same time, or do they span a few seconds?)

Well, I was probably tired when I filtered the log for the bug report. It
seems that the 'last message repeated 7 times' refers to:

Apr 13 08:00:02 phobos kernel: [5814019.208017] nfsd: non-standard errno: 5

not to:

Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device
/dev/md2, component device /dev/sdd1

I looked into my log and can't find anything else. Sorry, sorry, sorry if
this led you to false conclusions.

> It looks to me like some kind of USB failure where the USB connection or
> USB bridge momentarily fails, then immediately gets re-detected and
> re-added to the system. [...]
>
> Pierre:
> - can you post your mdadm.conf?

Sure, but I am not certain it will be useful:

$ cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=13f4fdef:db0bd815:77e02d4f:1bda00b4
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=4a120782:2ed3053c:e99784b3:b8e5f7bf
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=b3c7212a:e95c5081:24bf28c1:396de87f
ARRAY /dev/md2 level=raid10 num-devices=4 UUID=b34f4192:f823df58:24bf28c1:396de87f
ARRAY /dev/md3 level=raid5 num-devices=3 UUID=e1f30f82:0999431b:24bf28c1:396de87f

> - USB is not good for RAID, imho. Many times I have seen problems with
> USB/SATA bridges where the drive would get disconnected under high I/O
> activity and then reconnected after a few seconds. Anyway, re-adding it
> to the RAID shouldn't have happened. Also, in my case there were "usb"
> entries in dmesg.

Well, that is what I am discovering: USB and RAID do not currently work
well together (hmm, on Debian stable at least; I am not sure we can say
"currently" - the kernel is:

$ uname -a
Linux phobos 2.6.26-2-686 #1 SMP Tue Mar 9 17:35:51 UTC 2010 i686 GNU/Linux
$

). Anyway, it would be a great feature if USB could be used for a RAID
setup, at least for end users. (My setup actually uses a "special" layout
for RAID over several heterogeneous drives, which I described here:
http://www.linuxconfig.org/prouhd-raid-for-the-end-user )

Thanks for your help and regards.
--
Pierre Vignéras
* Re: mdadm: failed devices become spares! -> Solved!
  From: Pierre Vignéras
  Date: 2010-05-21 21:27 UTC
  To: linux-raid

Sorry for the delay of my reply... This small mail is to let you know that
my RAID array is currently recovering, thanks to the valuable input from
the users of this mailing list. You are great!

For the curious, here is what I did:

# ##### Do not forget the '--assume-clean', as I almost did! ;-(
# mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 --assume-clean /dev/sdd1 missing /dev/sdc1 missing
# vgchange -a y
# xfs_repair -n -t 1 -v /dev/my-vg/my-lv
# mount -o ro /dev/my-vg/my-lv /mnt/tmp
# find /mnt/tmp
# du -ks /mnt/tmp/
# umount /mnt/tmp
# #### Required: XFS asked for the log to be replayed
# mount /dev/my-vg/my-lv /mnt/tmp/
# umount /mnt/tmp
# xfs_repair -t 1 -v /dev/my-vg/my-lv
# mdadm --manage /dev/md2 --add /dev/sde1
# mdadm --manage /dev/md2 --add /dev/sdf1

The array is currently at 25% of the recovery process. A bit too soon to
say that everything is fine...

By the way, I am now quite sure that my USB controllers (or the usb driver,
or something else in the chain other than the disks themselves) are buggy:
all the other RAID arrays in my setup are gone! I will try to recover them
using the same kind of process, in order to back up all the data.

Do you think that by using BBR (since each time the trouble started with a
sector (write?) error), the problem will be "solved" (or at least postponed
until BBR itself runs out of spare sectors)?

Anyway, again, thanks a lot to all of you. Open Source rocks! ;-)
--
Pierre Vignéras
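For anyone following a rebuild like this, the recovery progress can be
watched with something along these lines (md2 as in the commands above):

    watch -d cat /proc/mdstat
    cat /sys/block/md2/md/sync_completed   # sectors resynced / total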
* Re: mdadm: failed devices become spares!
  From: Pierre Vignéras
  Date: 2010-05-18 23:07 UTC
  To: Neil Brown; Cc: Leslie Rhorer, linux-raid

On Tuesday 18 May 2010, Neil Brown wrote:
> Something strange... I cannot explain the 'SpareActive' messages. Most of
> the rest makes sense.
>
> You had a RAID10 - 4 drives in near=2 mode. So the first two disks
> contain identical data, and the second two are also identical and contain
> the rest.
> [...]
> However, if the devices really are OK, then sdf1 and sdc1 should contain
> identical data (except that the superblock would be slightly different).
> You could check this with "cmp -l", though that might not be very
> efficient. Also, sdd1 and sde1 should be identical.

Well, actually, here is what I have:

phobos:~# mdadm --examine /dev/sd[c-f]1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7939 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7949 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       49        5      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1
/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf795b - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       65        3      active sync   /dev/sde1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1
/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7967 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       81        4      spare   /dev/sdf1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1
phobos:~#

> I suggest that you try:
>
>     mdadm -S /dev/md2
>     mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 /dev/sdc1 missing /dev/sdd1 missing --assume-clean
>
> and then see what the data on md2 looks like. You could equally try sdf1
> in place of sdc1, or sde1 in place of sdd1 (make sure you double-check
> the device names, don't assume I got them right).

So, I double-checked the names. ;-)

I first tried to work out which devices were mirrors using cmp -l (thanks
for that command, I didn't know it), and here is the (strange) result:

phobos:~# time cmp -l /dev/sdc1 /dev/sdd1 > /tmp/cmp-sdc1-sdd1
^C
real    0m56.337s
user    0m52.539s
sys     0m3.016s
phobos:~# time cmp -l /dev/sdc1 /dev/sde1 > /tmp/cmp-sdc1-sde1
^C
real    0m54.733s
user    0m0.380s
sys     0m7.688s
phobos:~# time cmp -l /dev/sdc1 /dev/sdf1 > /tmp/cmp-sdc1-sdf1
^C
real    0m58.236s
user    0m54.099s
sys     0m3.216s
phobos:~# time cmp -l /dev/sdd1 /dev/sde1 > /tmp/cmp-sdd1-sde1
^C
real    0m57.932s
user    0m53.063s
sys     0m3.284s
phobos:~# time cmp -l /dev/sdd1 /dev/sdf1 > /tmp/cmp-sdd1-sdf1
^C
real    0m58.882s
user    0m26.486s
sys     0m6.152s
phobos:~# time cmp -l /dev/sde1 /dev/sdf1 > /tmp/cmp-sde1-sdf1
^C
real    0m57.996s
user    0m49.639s
sys     0m3.100s
phobos:~# ls -lh /tmp/cmp-sd*
-rw-r--r-- 1 root root 954M 2010-05-19 00:23 /tmp/cmp-sdc1-sdd1
-rw-r--r-- 1 root root    0 2010-05-19 00:25 /tmp/cmp-sdc1-sde1
-rw-r--r-- 1 root root 982M 2010-05-19 00:27 /tmp/cmp-sdc1-sdf1
-rw-r--r-- 1 root root 964M 2010-05-19 00:28 /tmp/cmp-sdd1-sde1
-rw-r--r-- 1 root root 466M 2010-05-19 00:30 /tmp/cmp-sdd1-sdf1
-rw-r--r-- 1 root root 872M 2010-05-19 00:31 /tmp/cmp-sde1-sdf1
phobos:~#

Therefore, as far as I understand, /dev/sdc1 does not hold the same data as
/dev/sdd1 or /dev/sdf1. Even if this short ~1 minute test does not prove
anything, there is a good probability that /dev/sdc1 and /dev/sde1 were
mirrors at some point.

Which should be considered strange: that sdc1 contains exactly the same
content as sde1 over that one-minute scan, or that sdd1 and sdf1 are so
different (~500 MB of differences per minute of scanning)?

Therefore, I am not sure that the command you suggested is the right one:

    mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 /dev/sdc1 missing /dev/sdd1 missing --assume-clean

It seems that I only have half of the data for sure (sdc1 and sde1), but I
don't know which is the other good half (sdd1 or sdf1)... Is there any way
to know?

Given this information, can you confirm that the above command is the one I
should execute?

> BUT be warned. Something caused some errors to be reported. Unless you
> find out what that was and fix it, errors will occur again. I have no
> idea what might have caused those errors. Bad media? Bad controller? Bad
> USB controller? Bad luck?

Well, all of those, maybe! Anyway, I will consider using BBR. I have the
feeling that on such mass-market 1 TB USB drives, even the internal
"hardware" bad-block relocation is not sufficient. There are too many
errors (at least, that is what my log suggests)... It's a shame that BBR is
not well documented and not as easy to set up with mdadm as it is with
EVMS.

> I wouldn't write new data, or even perform a recovery, until you are
> quite confident of the devices.

Sure.

Again, thanks a lot!
--
Pierre Vignéras
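A way to make that kind of spot check reproducible is to bound the
comparison explicitly instead of interrupting it; this assumes GNU cmp's
-n/--bytes option, and the 4 GiB limit and pairings are only illustrative:

    # Count differing bytes in the first 4 GiB of each candidate pair
    for pair in "sdc1 sde1" "sdd1 sdf1"; do
        set -- $pair
        printf '%s vs %s: ' "$1" "$2"
        cmp -l -n $((4 * 1024 * 1024 * 1024)) "/dev/$1" "/dev/$2" | wc -l
    done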
* Re: mdadm: failed devices become spares!
  From: Neil Brown
  Date: 2010-05-19  1:45 UTC
  To: Pierre Vignéras; Cc: Leslie Rhorer, linux-raid

On Wed, 19 May 2010 01:07:40 +0200 Pierre Vignéras <pierre@vigneras.name> wrote:

> Well, actually, here is what I have:
>
> phobos:~# mdadm --examine /dev/sd[c-f]1
> [...]
>
> I first tried to work out which devices were mirrors using cmp -l (thanks
> for that command, I didn't know it), and here is the (strange) result:
>
> [...]
>
> phobos:~# ls -lh /tmp/cmp-sd*
> -rw-r--r-- 1 root root 954M 2010-05-19 00:23 /tmp/cmp-sdc1-sdd1
> -rw-r--r-- 1 root root    0 2010-05-19 00:25 /tmp/cmp-sdc1-sde1
> -rw-r--r-- 1 root root 982M 2010-05-19 00:27 /tmp/cmp-sdc1-sdf1
> -rw-r--r-- 1 root root 964M 2010-05-19 00:28 /tmp/cmp-sdd1-sde1
> -rw-r--r-- 1 root root 466M 2010-05-19 00:30 /tmp/cmp-sdd1-sdf1
> -rw-r--r-- 1 root root 872M 2010-05-19 00:31 /tmp/cmp-sde1-sdf1
> phobos:~#

The fact that sdc1 appears to have the same content as sde1 perfectly
matches the fact that these two devices think they are devices "2" and "3"
in the array, so they still contain half of your data. This is good.

The fact that sdf1 appears to match sdd1 partly but not completely suggests
that they were devices "0" and "1", but that one of them has had other
stuff written to it. It is hard to know, based on the available
information, which is the case.

> It seems that I only have half of the data for sure (sdc1 and sde1), but
> I don't know which is the other good half (sdd1 or sdf1)... Is there any
> way to know?

The way to find out is to try and see. If you create an array following the
above pattern it will not change any data on the devices, just the
superblock, of which you now have a record in this email. So you should try
creating an array, run "fsck -n", and see if the filesystem looks OK. If it
does, mount it (-o ro) and see what it looks like. Then try the other
possibility and see how that compares.

Given the current names of the devices, the device list given to the mdadm
command should be:

    /dev/sdd1 missing /dev/sdc1 missing
or
    /dev/sdf1 missing /dev/sdc1 missing

Hopefully one of those will mount and fsck successfully.

NeilBrown
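A sketch of trying the two candidate orderings non-destructively, under the
same assumption that only the superblocks are rewritten; the LVM and mount
paths follow Pierre's own commands elsewhere in this thread and may need
adjusting to the actual layout:

    # First candidate: sdd1 as device 0
    mdadm -S /dev/md2
    mdadm -C /dev/md2 -l 10 -n 4 -c 64 -e 0.90 --assume-clean \
          /dev/sdd1 missing /dev/sdc1 missing
    vgchange -a y
    xfs_repair -n -t 1 /dev/my-vg/my-lv && mount -o ro /dev/my-vg/my-lv /mnt/tmp

    # If that does not look right: umount, stop md2, and repeat with
    # /dev/sdf1 missing /dev/sdc1 missing instead.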