* Two Drive Failure on RAID-5
  From: Cry @ 2008-05-19 22:49 UTC
  To: linux-raid

Folks,

I had a drive fail on my 6-drive RAID-5 array. While syncing in the replacement
drive (11 percent complete), a second drive went bad.

Any suggestions to recover as much data as possible from the array?

Joel
* Re: Two Drive Failure on RAID-5
  From: David Greaves @ 2008-05-20 7:37 UTC
  To: Cry; +Cc: linux-raid

Yep. Don't panic, and don't do anything else yet if you're not confident about
what you're doing.

I'll follow up with more info in a short while.

Info you can provide:
  kernel version
  mdadm version
  cat /proc/mdstat
  mdadm --examine /dev/sd[abcdef]1   (or whatever your array components are)
  relevant smartctl info on the bad drive(s)
  dmesg info about the drive failures

Assuming genuine hardware failure: do you have any spare drives that you can
use to replace the components?

David

Cry wrote:
> I had a drive fail on my 6-drive RAID-5 array. While syncing in the
> replacement drive (11 percent complete), a second drive went bad.
>
> Any suggestions to recover as much data as possible from the array?
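For readers following along, that list of diagnostics can be gathered in one
pass along these lines. This is a minimal sketch: it assumes the array is
/dev/md0 and the members are /dev/sd[a-f]1, so substitute your own device
names.

  uname -r                              # kernel version
  mdadm --version                       # mdadm version
  cat /proc/mdstat                      # current array state
  mdadm --detail /dev/md0               # array-level view
  mdadm --examine /dev/sd[abcdef]1      # per-component superblocks
  smartctl -a /dev/sda                  # repeat for each suspect drive
  dmesg | grep -i -e ata -e 'i/o error' # the kernel's view of the failures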
* Re: Two Drive Failure on RAID-5
  From: Cry @ 2008-05-20 15:32 UTC
  To: linux-raid

David Greaves <david <at> dgreaves.com> writes:
> Yep. Don't panic and don't do anything else yet if you're not confident
> about what you're doing.
> [...]
> Assuming genuine hardware failure: do you have any spare drives that you
> can use to replace the components?

Thanks for the info. I was able to do a --force --assemble on the array and I
copied off my most critical data. At the moment, I don't have enough drives to
take all the data on the array, so I'm going to be at a bit of a standstill
until new hardware arrives.

Since the copy of that data (about 500 GB of about 2 TB) went so well, I
decided to try to sync up the spare again; it died at the same point and the
RAID system pulled down the array. I'm trying to decide whether I should
follow your suggestion in the sister post to copy the failed drive onto my
spare, or just format the spare and try to recover another 500 GB of data off
the array.

Is there an mdadm or other command to tell the RAID system to stay up in the
face of errors? Can the array be assembled in a way that doesn't change the
array at all (completely read-only)?

I've also got the older failed drive (about 15 hours older). Can that be
leveraged too?

The server isn't networked right now, but I'll try to get the above requested
logs tonight.

By the way, I'm thinking about buying five of these:

  Seagate Barracuda 7200.11 1TB ST31000340AS SATA-II 32MB Cache

and one of these:

  Supermicro CSE-M35T-1 Hot-Swappable SATA HDD Enclosure
  http://www.supermicro.com/products/accessories/mobilerack/CSE-M35T-1.cfm

and building a RAID-6 array. I'll convert the surviving drives into a backup
for the primary array. Any feedback on the above? Is there a suggestion for an
inexpensive controller to give more SATA ports that is very software-RAID
compatible?

Any suggestions for optimal configuration (ext3) and tuning for the new array?
My load consists of serving a photo gallery via Apache and Gallery2, as well
as a local media (audio/video) server, so file sizes tend to be large.

Thanks,

Joel
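On the read-only question: a rough sketch of the options, with placeholder
device names. Whether --readonly is accepted at assemble time depends on the
mdadm version, and note that --force itself may still rewrite superblock
event counts on the members, so this is not a bit-perfect guarantee.

  # some mdadm versions accept this directly
  mdadm --assemble --force --readonly /dev/md0 /dev/sdX1 /dev/sdY1 ...

  # otherwise: make newly started arrays come up read-only...
  echo 1 > /sys/module/md_mod/parameters/start_ro
  # ...or flip an already-assembled array explicitly
  mdadm --readonly /dev/md0

  # and mount the filesystem read-only on top of that
  mount -o ro /dev/md0 /mnt/recovery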
* RE: Re: Two Drive Failure on RAID-5
  From: David Lethe @ 2008-05-20 17:18 UTC
  To: Cry, linux-raid

Cry wrote:
> By the way, I'm thinking about buying five of these:
>   Seagate Barracuda 7200.11 1TB ST31000340AS SATA-II 32MB Cache
> [...]
> and building a RAID-6 array. I'll convert the surviving drives into a
> backup for the primary array. Any feedback on the above?

Joel:

Respectfully .. are you nuts???

Don't buy the 7200.11 disks. You bought a bunch of desktop-class drives, and
they crapped out on you, and you are about to make the same mistake again.
Get the server-class disk that is designed to run a 24x7 duty cycle, which in
your case would be the 'cuda ES.2.

Sorry about the soapbox, but it never ceases to amaze me how people try to
save by buying disk drives architected with the lowest possible cost in mind,
and don't investigate the higher-quality disks that are designed for extended
reliability and data integrity.

David
* Re: Re: Two Drive Failure on RAID-5
  From: Cry @ 2008-05-20 19:01 UTC
  To: linux-raid

David Lethe <david <at> santools.com> writes:
> Respectfully .. are you nuts???

Probably; that's why I asked for a sanity check. I had originally gotten a
batch of six extremely inexpensive WD 500GB drives. I have now had 4 of those
six drives go tango uniform since May of '07. None of the replacement drives
I've purchased (Samsung, Hitachi) have reported even a single SMART error.
For my personal systems these are the first drives I've had crash in 20 years.

> Don't buy the 7200.11 disks. You bought a bunch of desktop class
> drives, and they crapped out on you, and you are about to make the same
> mistake again. Get the server class disk that is designed to run 24x7
> duty cycle, which in your case would be the 'cuda ES.2

If I go with the 'cuda ES.2, is that enough risk management to stick with a
RAID-5 arrangement? I am doing this on my own dime, so if I can go with four
drives now instead of five it would pay for the increased drive grade.

> Sorry about the soapbox, but it never ceases to amaze me how people try
> to save by buying disk drives architected with lowest possible cost in
> mind, and don't investigate the higher-quality disks that are designed
> for extended reliability and data integrity.

That is the RAID meme: if you have the redundancy, why spend money on the
fancy drives? On the other hand, four drives crashing has cost me about $500
in replacement drives and lots of time. Always looking for an angle ;-)
* RE: Re: Re: Two Drive Failure on RAID-5
  From: David Lethe @ 2008-05-20 20:09 UTC
  To: Cry, linux-raid

Here is a good analogy that puts this in perspective. I haven't seen anybody
equate the two yet, so get the name right if you quote this ;)

Disk drives are like light bulbs. You can buy the server class (similar to
CFLs) or desktop (incandescent). If you don't mind the dark, replace them as
they fail, and buy spares as they go on sale. Conversely, if you have to
maintain a vaulted-ceiling chandelier, and are afraid of heights, then
spending twice as much to never have to deal with *THAT* again will seem like
a bargain.

- David Lethe
* RE: Re: Re: Two Drive Failure on RAID-5
  From: Keith Roberts @ 2008-05-20 23:11 UTC
  To: linux-raid

On Tue, 20 May 2008, David Lethe wrote:
> Disk drives are like light bulbs. You can buy the server class (similar
> to CFLs), or desktop (incandescent). [...]

So are there such things as server-class EIDE drives? Or are they all SCSI or
SATA?

Keith
* Re: Re: Two Drive Failure on RAID-5
  From: Janos Haar @ 2008-05-20 19:40 UTC
  To: David Lethe, cry_regarder; +Cc: linux-raid

David Lethe wrote:
> Don't buy the 7200.11 disks. You bought a bunch of desktop class
> drives, and they crapped out on you, and you are about to make the same
> mistake again. Get the server class disk that is designed to run 24x7
> duty cycle, which in your case would be the 'cuda ES.2

David and Joel,

Let me remind you about the power supply! This is really important too.
24x7 systems need a good-quality PSU and good cables and connectors for the
drives. One poor (Y) cable or connector can easily produce 1-2 or more failed
drives at the same time! SMART can monitor the drive's actual state, but it
cannot monitor a bad connection and/or noise on the voltage.

Cheers,
Janos
* RE: Re: Two Drive Failure on RAID-5
  From: David Lethe @ 2008-05-20 17:27 UTC
  To: Cry, linux-raid

Cry wrote:
> Thanks for the info. I was able to do a --force --assemble on the array
> and I copied off my most critical data. At the moment, I don't have enough
> drives to take all the data on the array [...]

Also .. statistically speaking, you just lost 2 of 6 disks within hours of
each other, and you have 4 more disks that had the same workload and exposure
to environmental conditions. Assuming you deployed all 6 disks at the same
time, and they were all made at the same time, then you are betting your data
against those other 4 drives failing soon. Make sure you have a current
backup.

David
* Re: Two Drive Failure on RAID-5
  From: Brad Campbell @ 2008-05-20 19:28 UTC
  To: Cry; +Cc: linux-raid

Cry wrote:
> Supermicro CSE-M35T-1 Hot-Swappable SATA HDD Enclosure
> http://www.supermicro.com/products/accessories/mobilerack/CSE-M35T-1.cfm
>
> and building a RAID-6 array. I'll convert the surviving drives into a
> backup for the primary array. Any feedback on the above? Is there a
> suggestion on an inexpensive controller to give more SATA ports that is
> very software-RAID compatible?

I've got 5 of those enclosures with Maxtor Maxline-II drives in them. I've
had them all running between 3 and 4 years now and I've been *extremely*
happy with the enclosures. The whole lot are running on 7 Promise SATA150TX4
cards. So I'd certainly be happy with the enclosures; however, I tend to
agree with what David said about going for the higher-grade drives.

I paid a bit extra for the Maxline-II drives over the desktop-grade disks,
and I've got 27 of them with about 30,000 hours on them now. One early-life
failure (in the first 5 hours) and one recently replaced as it was growing
defects, but the 26 remaining drives are solid.

Oh, 15 drives are in a RAID-6 and 10 are in a RAID-5. I plan to replace the
10-drive RAID-5 with 10 1TB drives in a RAID-6 in the not too distant future.
I did have a dual drive failure on the RAID-6 (not actually drive related,
but a software glitch) and having the RAID-6 saved the data nicely.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the
experience of others, are also remarkable for their apparent disinclination
to do so." -- Douglas Adams
* Re: Two Drive Failure on RAID-5
  From: David Greaves @ 2008-05-20 9:14 UTC
  To: Cry; +Cc: linux-raid

Cry wrote:
> I had a drive fail on my 6-drive RAID-5 array. While syncing in the
> replacement drive (11 percent complete), a second drive went bad.
>
> Any suggestions to recover as much data as possible from the array?

Let us know if any step fails...

How valuable is your data? If it is very valuable and you have no backups,
then you may want to seek professional help.

The replacement drive *may* help to rebuild up to 11% of your data in the
event that the bad drive fails completely. You can keep it to one side to try
this if you get really desperate.

Assuming a real drive hardware failure (smartctl shows errors and dmesg showed
media errors or similar), I would first suggest using ddrescue to duplicate
the 2nd failed drive onto a spare drive (the replacement is fine if you want
to risk that <11% of potentially saved data - a new drive would be better -
you're going to need a new one anyway!)

SOURCE is the 2nd failed drive; TARGET is its replacement:

  blockdev --getra /dev/SOURCE     <note the readahead value>
  blockdev --setro /dev/SOURCE
  blockdev --setra 0 /dev/SOURCE
  ddrescue /dev/SOURCE /dev/TARGET /somewhere_safe/logfile

Note: Janos Haar recently (18 May) posted a more conservative approach that
you may want to use. Additionally, you may want to use a logfile.

ddrescue lets you know how much data it failed to recover. If this is a lot,
then you may want to read up on the ddrescue info page (it includes a tutorial
and lots of explanation) and consider drive data-recovery tricks such as drive
cooling (which some sources suggest may cause more damage than it solves, but
it has worked for me in the past).

I have also left ddrescue running overnight against a system that repeatedly
timed out, and in the morning I've had a *lot* more recovered data.

Having *successfully* done that, you can re-assemble the array using the 4
good disks and the newly duplicated one.

Unless you've rebooted:

  blockdev --setrw /dev/SOURCE
  blockdev --setra <saved readahead value> /dev/SOURCE

Then:

  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

  cat /proc/mdstat                     will show the drive status
  mdadm --detail /dev/md0
  mdadm --examine /dev/sd[abcdef]1     [components]

These should all show a reasonably healthy but degraded array.

This should now be amenable to a read-only fsck/xfs_repair/whatever.

If that looks reasonable, then you may want to do a proper fsck, perform a
backup and add a new drive.

HTH - let me know if any steps don't make sense; I think it's about time I put
something on the wiki about data recovery...

David
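For the "read-only fsck" step above, the usual no-modify invocations look
roughly like this. Which one applies depends on the filesystem actually
sitting on /dev/md0 (the thread mentions ext3); treat this as a sketch rather
than a prescription.

  # ext3: force a full check but answer 'no' to every repair, so nothing is written
  fsck.ext3 -f -n /dev/md0

  # xfs: report problems without modifying the filesystem
  xfs_repair -n /dev/md0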
* Re: Two Drive Failure on RAID-5
  From: Janos Haar @ 2008-05-20 12:17 UTC
  To: David Greaves, cry_regarder; +Cc: linux-raid

David Greaves wrote:
> Should all show a reasonably healthy but degraded array.
>
> This should now be amenable to a read-only fsck/xfs_repair/whatever.

Maybe a COW loop helps a lot. ;-)
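For anyone wondering what such a copy-on-write setup might look like: one
common approach layers device-mapper's snapshot target over the degraded
array, backed by a sparse file, so that any writes from fsck or a mount land
in the overlay rather than on the array. This is a rough sketch of that idea,
not necessarily the exact recipe Janos had in mind; the sizes, paths and
device names are placeholders.

  # sparse overlay file to absorb writes (size it generously)
  dd if=/dev/zero of=/tmp/md0-overlay bs=1 count=0 seek=10G
  losetup /dev/loop1 /tmp/md0-overlay

  # snapshot target: reads fall through to /dev/md0, writes go to the overlay
  SIZE=$(blockdev --getsz /dev/md0)
  dmsetup create md0_cow --table "0 $SIZE snapshot /dev/md0 /dev/loop1 N 8"

  # run the repair (or mount) against the overlay device instead of the array
  fsck.ext3 -f /dev/mapper/md0_cow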
* Re: Two Drive Failure on RAID-5
  From: Cry @ 2008-05-21 14:14 UTC
  To: linux-raid

David Greaves <david <at> dgreaves.com> writes:
> blockdev --getra /dev/SOURCE     <note the readahead value>
> blockdev --setro /dev/SOURCE
> blockdev --setra 0 /dev/SOURCE
> ddrescue /dev/SOURCE /dev/TARGET /somewhere_safe/logfile
> [...]
> mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> cat /proc/mdstat
> mdadm --detail /dev/md0
> mdadm --examine /dev/sd[abcdef]1

I performed the above steps, however I used dd_rescue instead of ddrescue.

  ]# dd_rescue -l sda_rescue.log -o sda_rescue.bad -v /dev/sda /dev/sdg1
  dd_rescue: (info): about to transfer 0.0 kBytes from /dev/sda to /dev/sdg1
  dd_rescue: (info): blocksizes: soft 65536, hard 512
  dd_rescue: (info): starting positions: in 0.0k, out 0.0k
  dd_rescue: (info): Logfile: sda_rescue.log, Maxerr: 0
  dd_rescue: (info): Reverse: no , Trunc: no , interactive: no
  dd_rescue: (info): abort on Write errs: no , spArse write: if err
  .......
  dd_rescue: (info): /dev/sda (488386592.0k): EOF
  Summary for /dev/sda -> /dev/sdg1:
  dd_rescue: (info): ipos: 488386592.0k, opos: 488386592.0k, xferd: 488386592.0k
                     errs: 504, errxfer: 252.0k, succxfer: 488386336.0k
               +curr.rate: 47904kB/s, avg.rate: 14835kB/s, avg.load: 9.6%

/dev/sdg1 is my replacement drive (750G) that I had tried to sync previously.

The problem now is that while the copy happened, mdadm still thinks it is
that old spare:

  ~]# mdadm -E /dev/sdg1 /dev/sda
  /dev/sdg1:
            Magic : a92b4efc
          Version : 00.90.00
             UUID : 18e3d0b8:a21b31d2:7216c3e5:9bbd9f39
    Creation Time : Thu May 24 01:55:48 2007
       Raid Level : raid5
    Used Dev Size : 488386432 (465.76 GiB 500.11 GB)
       Array Size : 2441932160 (2328.81 GiB 2500.54 GB)
     Raid Devices : 6
    Total Devices : 6
  Preferred Minor : 0

      Update Time : Mon May 19 22:32:56 2008
            State : clean
   Active Devices : 4
  Working Devices : 5
   Failed Devices : 2
    Spare Devices : 1
         Checksum : 1dc8dcfa - correct
           Events : 0.1187802

           Layout : left-symmetric
       Chunk Size : 128K

        Number   Major   Minor   RaidDevice   State
  this     6       8       97        6        spare   /dev/sdg1

     0     0       8       80        0        active sync   /dev/sdf
     1     1       8       64        1        active sync   /dev/sde
     2     2       8      128        2        active sync   /dev/sdi
     3     3       0        0        3        faulty removed
     4     4       0        0        4        faulty removed
     5     5       8      144        5        active sync   /dev/sdj
     6     6       8       97        6        spare   /dev/sdg1

  /dev/sda:
            Magic : a92b4efc
          Version : 00.90.00
             UUID : 18e3d0b8:a21b31d2:7216c3e5:9bbd9f39
    Creation Time : Thu May 24 01:55:48 2007
       Raid Level : raid5
    Used Dev Size : 488386432 (465.76 GiB 500.11 GB)
       Array Size : 2441932160 (2328.81 GiB 2500.54 GB)
     Raid Devices : 6
    Total Devices : 6
  Preferred Minor : 0

      Update Time : Mon May 19 21:40:13 2008
            State : clean
   Active Devices : 5
  Working Devices : 6
   Failed Devices : 1
    Spare Devices : 1
         Checksum : 1dc8d023 - correct
           Events : 0.1187796

           Layout : left-symmetric
       Chunk Size : 128K

        Number   Major   Minor   RaidDevice   State
  this     4       8        0        4        active sync   /dev/sda

     0     0       8       80        0        active sync   /dev/sdf
     1     1       8       64        1        active sync   /dev/sde
     2     2       8      128        2        active sync   /dev/sdi
     3     3       0        0        3        faulty removed
     4     4       8        0        4        active sync   /dev/sda
     5     5       8      144        5        active sync   /dev/sdj
     6     6       8       97        6        spare   /dev/sdg1

And when I try to assemble the array, it only sees four disks plus a spare:

  ]# mdadm --assemble --force --verbose /dev/md0 /dev/sdf /dev/sde /dev/sdi /dev/sdj /dev/sdg1
  mdadm: looking for devices for /dev/md0
  mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
  mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
  mdadm: /dev/sdi is identified as a member of /dev/md0, slot 2.
  mdadm: /dev/sdj is identified as a member of /dev/md0, slot 5.
  mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 6.
  mdadm: added /dev/sde to /dev/md0 as 1
  mdadm: added /dev/sdi to /dev/md0 as 2
  mdadm: no uptodate device for slot 3 of /dev/md0
  mdadm: no uptodate device for slot 4 of /dev/md0
  mdadm: added /dev/sdj to /dev/md0 as 5
  mdadm: added /dev/sdg1 to /dev/md0 as 6
  mdadm: added /dev/sdf to /dev/md0 as 0
  mdadm: /dev/md0 assembled from 4 drives and 1 spare - not enough to start the array.

How do I transfer the label from /dev/sda (no partitions) to /dev/sdg1?

Thanks,

Cry
* Re: Two Drive Failure on RAID-5
  From: David Greaves @ 2008-05-21 20:15 UTC
  To: Cry; +Cc: linux-raid

Cry wrote:
> I performed the above steps, however I used dd_rescue instead of ddrescue.

Similar software. I think dd_rescue is more 'scripted' and less maintained.

> ]# dd_rescue -l sda_rescue.log -o sda_rescue.bad -v /dev/sda /dev/sdg1

Doh!! You copied the disk (/dev/sda) into a partition (/dev/sdg1)...

> dd_rescue: (info): /dev/sda (488386592.0k): EOF
> Summary for /dev/sda -> /dev/sdg1:
> dd_rescue: (info): ipos: 488386592.0k, opos: 488386592.0k, xferd: 488386592.0k
>                    errs: 504, errxfer: 252.0k, succxfer: 488386336.0k

So you lost 252k of data. There may be filesystem corruption, a file may be
corrupt, or some blank disk space may be even more blank. Almost impossible
to tell.

[Aside: it would be nice if we could take the output from ddrescue and
friends and determine what the lost blocks map to via the md stripes.]

> /dev/sdg1 is my replacement drive (750G) that I had tried to sync
> previously.

No. /dev/sdg1 is a *partition* on your old drive.

I'm concerned that running the first ddrescue may have stressed /dev/sda and
you'd lose data running it again with the correct arguments.

> How do I transfer the label from /dev/sda (no partitions) to /dev/sdg1?

Can anyone suggest anything? Cry, don't do this...

I wonder about

  dd if=/dev/sdg1 of=/dev/sdg

but goodness knows if it would work... it'd rely on dd reading from the start
of the partition device and writes to the disk device not overlapping - which
they shouldn't, but...

David
* Re: Two Drive Failure on RAID-5
  From: Janos Haar @ 2008-05-21 20:47 UTC
  To: David Greaves, cry_regarder; +Cc: linux-raid

David Greaves wrote:
>> dd_rescue: (info): ipos: 488386592.0k, opos: 488386592.0k, xferd: 488386592.0k
>>                    errs: 504, errxfer: 252.0k, succxfer: 488386336.0k
>
> So you lost 252k of data. There may be filesystem corruption, a file may be
> corrupt, or some blank disk space may be even more blank. Almost impossible
> to tell.

dd_rescue shows it if the target device fills up. The errs number is
divisible by 8, so I think these are just bad sectors.

But let me note: with the default -b 64k, dd_rescue sometimes drops the
entire soft-block area on the first error! If you want a more precise result,
run it again with -b 4096 and -B 1024, and if you can, don't copy the drive
to a partition! :-)

>> How do I transfer the label from /dev/sda (no partitions) to /dev/sdg1?
>
> Can anyone suggest anything?

Cry, I only have this idea:

  dd_rescue -v -m 128k -r /dev/source -S 128k superblock.bin
  losetup /dev/loop0 superblock.bin
  mdadm --build -l linear --raid-devices=2 /dev/md1 /dev/sdg1 /dev/loop0

And the working RAID member is /dev/md1. ;-) But only for recovery!!!
(Only an idea, not tested.)

Cheers,
Janos
* Re: Two Drive Failure on RAID-5
  From: Cry @ 2008-05-21 21:21 UTC
  To: linux-raid

Janos Haar <janos.haar <at> netcenter.hu> writes:
> But let me note: with the default -b 64k, dd_rescue sometimes drops the
> entire soft-block area on the first error! If you want a more precise
> result, run it again with -b 4096 and -B 1024, and if you can, don't copy
> the drive to a partition!

Since I kept the bad-blocks file from the dd_rescue run, can I just use that
to have dd_rescue try to copy exactly the right blocks out? This would avoid
over-stressing the drive. Would it be best to have dd_rescue copy the blocks
to a file and then use dd to write them onto /dev/sdg1 in the right place?

>> [Aside: it would be nice if we could take the output from ddrescue and
>> friends and determine what the lost blocks map to via the md stripes.]

Yes, because I also have /dev/sdc, which failed several hours before
/dev/sda. Between the two, everything should be recoverable, modulo the low
probability of the same block failing on both. Is there a procedure to
rebuild the lost stripes leveraging the other failed drive?

>>> /dev/sdg1 is my replacement drive (750G) that I had tried to sync
>>> previously.
>> No. /dev/sdg1 is a *partition* on your old drive.

Nope. /dev/sda is my old drive. It has NO partitions because I was retarded
1 year ago: I made a mistake when I created my original RAID array (there is
a note about it in the archives of this group) in that I built the array on
the raw drives, not on partitions. /dev/sda IS the drive. There is no
/dev/sda1. However, the replacement drive is a 750GB (not 500 like the
originals), so I built a partition on the drive of the correct size:
/dev/sdg1.

>>> How do I transfer the label from /dev/sda (no partitions) to /dev/sdg1?
>> Can anyone suggest anything?
>
> Cry, I only have this idea:
>   dd_rescue -v -m 128k -r /dev/source -S 128k superblock.bin
>   losetup /dev/loop0 superblock.bin
>   mdadm --build -l linear --raid-devices=2 /dev/md1 /dev/sdg1 /dev/loop0
>
> And the working RAID member is /dev/md1. But only for recovery!!!

Let me think about the above. This will copy the information that mdadm -E
gets from the entire drive /dev/sda into the partition /dev/sdg1?

Also, I ordered:

  SUPERMICRO CSE-M35T-1 Hot-Swappable SATA HDD Enclosure

and 5 of these:

  Seagate Barracuda ES.2 ST31000340NS 1TB 7200 RPM SATA 3.0Gb/s Hard Drive

to build a RAID-6 replacement for my old array. I'm planning on turning the
old drives into an LVM or RAID-0 set to serve as a backup to the primary
array. Any suggestions for configuring the array (performance parameters
etc.)? Given my constraints about getting this all working again, I can't go
through a real performance-testing loop.

Thanks,

Cry
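On re-reading just the failed areas: GNU ddrescue (as opposed to dd_rescue)
can do this natively when it has its own logfile from a previous pass - it
skips everything already marked rescued and retries only the bad areas. Note
that dd_rescue's bad-block list is a different format, so switching tools
means letting ddrescue build its own log. A sketch of the kind of invocation
meant here, with the device names from this thread standing in as
placeholders:

  # -d: direct disc access, -r3: retry bad areas up to 3 times;
  # the logfile makes the run resumable and confines retries to bad spots
  ddrescue -d -r3 /dev/sda /dev/sdg1 /somewhere_safe/sda_rescue_map.log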
* Re: Two Drive Failure on RAID-5
  From: David Greaves @ 2008-05-22 8:38 UTC
  To: Cry; +Cc: linux-raid

Cry wrote:
> I made a mistake when I created my original RAID array (there is a note
> about it in the archives of this group) in that I built the array on the
> raw drives, not on partitions.

That is not a 'problem', although it is not regarded as best practice.

> /dev/sda IS the drive. There is no /dev/sda1. However, the replacement
> drive is a 750GB (not 500 like the originals), so I built a partition on
> the drive of the correct size: /dev/sdg1.

And you didn't think to mention this? Maybe you thought it would be in the
support file I keep for you?

When people offer suggestions they (or at least I) will probably form a
picture of what's going on - if you are going to throw tweaks into the mix
then they may throw us off. Mention them.

You have failed to answer some potentially relevant questions and, before you
got this array rebuilt, you wandered off (on the same thread) into discussions
about what disk drives you might like to buy, the best type of external
enclosure and various other oddments. This is not helpful.

> The correct information should already be in /dev/sdg1 since I copied the
> entire /dev/sda there (probably overwrote stuff in /dev/sdg2 since /dev/sda
> was 160K bigger than /dev/sdg1).

Err, no. Linux doesn't randomly overwrite other partitions...

You're on 0.90 superblocks, which are located at the end of the device.

I *think* your assemble problem is that /dev/sdg1 was an old component
(slot 6); it had a superblock near the end of the partition which you didn't
zero. You copied most of /dev/sda into it, but you made /dev/sdg1 too small
by a few k. The copy finished before it copied the /dev/sda superblock (I
don't know why it didn't overwrite the old superblock??). Also, at some
point, md will try to seek past the end of /dev/sdg1 and will die.

You have now dug a maze of twisty passages...

I think at this point you should enlarge /dev/sdg1, recopy /dev/sda to
/dev/sdg1 and try again. That will probably work.

David
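To see why a slightly-too-small partition misses the superblock: a 0.90
superblock sits on a 64 KiB boundary, between 64 KiB and 128 KiB before the
end of the device, so its location depends entirely on the device's size. A
rough way to check where md expects it; treat the exact formula as an
assumption from memory and verify against mdadm -E rather than relying on it.

  # device size in 512-byte sectors
  SECTORS=$(blockdev --getsz /dev/sdg1)
  # round down to a 64 KiB (128-sector) boundary, then step back one 64 KiB block
  SB_SECTOR=$(( (SECTORS & ~127) - 128 ))
  echo "0.90 superblock expected at sector $SB_SECTOR"

If /dev/sdg1 is even slightly smaller than /dev/sda, the superblock copied
from near the end of /dev/sda does not land where md computes the offset for
/dev/sdg1 (or was never copied at all), which matches the "still looks like
the old spare" symptom.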
* Re: Two Drive Failure on RAID-5
  From: Cry @ 2008-05-31 9:27 UTC
  To: linux-raid

David Greaves <david <at> dgreaves.com> writes:
> When people offer suggestions they (or at least I) will probably form
> a picture of what's going on - if you are going to throw tweaks into
> the mix then they may throw us off. Mention them.

Point taken.

> You have failed to answer some potentially relevant questions and,
> before you got this array rebuilt, you wandered off (on the same
> thread) into discussions about what disk drives you might like to
> buy, the best type of external enclosure and various other oddments.
> This is not helpful.

Yup, I should have put that stuff into a separate thread. That said, I did
get good feedback on those questions on the other branch of the thread.

> You're on 0.90 superblocks, which are located at the end of the device.

Thanks for the above line. It was key.

> You have now dug a maze of twisty passages...

:-)

> I think at this point you should enlarge /dev/sdg1, recopy /dev/sda to
> /dev/sdg1 and try again. That will probably work.

What I ended up doing was writing off the extra 250G in /dev/sdg and using
ddrescue to copy the failed drive to the whole device:

  ddrescue -dr3 /dev/sdf /dev/sdg 750_ddrescue.log

  Press Ctrl-C to interrupt
  Initial status (read from logfile)
  rescued:   500107 MB,  errsize:    356 kB,  errors:      72
  Current status
  rescued:   500107 MB,  errsize:  48128 B,   current rate:      0 B/s
     ipos:   482042 MB,  errors:         79,  average rate:    269 B/s
     opos:   482042 MB
  Copying bad blocks... Retry 1

It was nice that ddrescue as invoked got a good chunk more off the drive than
I'd gotten with dd_rescue.

The interesting thing was that mdadm -E /dev/sdg reported that there wasn't a
superblock on it! Then I remembered your note above about where the
superblocks are located, so I figured the data was all fine - just that the
superblock was in the wrong place. So I had mdadm re-create the array. (At
this point I had moved the drives around extensively, so drive letters do not
match earlier posts.)

  mdadm --create /dev/md0 --verbose --level=5 --raid-devices=6 --chunk=128 /dev/sdl /dev/sdi /dev/sdj missing /dev/sdg /dev/sdk
  mdadm: layout defaults to left-symmetric
  mdadm: /dev/sdl appears to be part of a raid array:
      level=raid5 devices=6 ctime=Thu May 24 01:55:48 2007
  mdadm: /dev/sdi appears to be part of a raid array:
      level=raid5 devices=6 ctime=Thu May 24 01:55:48 2007
  mdadm: /dev/sdj appears to be part of a raid array:
      level=raid5 devices=6 ctime=Thu May 24 01:55:48 2007
  mdadm: /dev/sdk appears to be part of a raid array:
      level=raid5 devices=6 ctime=Thu May 24 01:55:48 2007
  mdadm: size set to 488386432K
  mdadm: largest drive (/dev/sdg) exceed size (488386432K) by more than 1%
  Continue creating array? yes
  mdadm: array /dev/md0 started.

At this point I was able to recover all but a couple of files onto a second
RAID array.

Thanks, David Greaves and Janos Haar, for the wonderful advice on restoring
my data. Thanks to David Lethe for the advice to get the server-class drives,
and thanks to Brad Campbell for endorsing the Supermicro CSE-M35T enclosure.
It was quite easy to install and seems to be working well and keeping the
drives nice and cool.

The old and the new arrays:

  md0 : active raid5 sdk[5] sdg[4] sdj[2] sdi[1] sdl[0]
        2441932160 blocks level 5, 128k chunk, algorithm 2 [6/5] [UUU_UU]

  md1 : active raid6 sdd1[3] sdc1[2] sde1[4] sda1[0] sdb1[1]
        2930279808 blocks level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]

Thanks again,

Cry
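For completeness: md0 above is still degraded ([UUU_UU], one slot missing).
Once a trustworthy replacement drive is on hand, re-adding it and watching
the rebuild would presumably look along these lines; the device name is a
placeholder, not one from the thread.

  mdadm --add /dev/md0 /dev/sdX      # add the replacement into the missing slot
  cat /proc/mdstat                   # watch the recovery progress
  mdadm --detail /dev/md0            # confirm: clean, [UUUUUU] when finished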
* Re: Two Drive Failure on RAID-5
  From: Cry @ 2008-05-22 0:05 UTC
  To: linux-raid

Janos Haar <janos.haar <at> netcenter.hu> writes:
> Cry, I only have this idea:
>   dd_rescue -v -m 128k -r /dev/source -S 128k superblock.bin
>   losetup /dev/loop0 superblock.bin
>   mdadm --build -l linear --raid-devices=2 /dev/md1 /dev/sdg1 /dev/loop0

Janos,

The correct information should already be in /dev/sdg1, since I copied the
entire /dev/sda there (probably overwriting stuff in /dev/sdg2, since
/dev/sda was 160K bigger than /dev/sdg1). This means the superblock should
already be there at the start of /dev/sdg1, so wouldn't the steps above
result in two superblocks stacked back to back? Should I use a losetup offset
into /dev/sdg1 to get to the real data and bypass the MBR copied in from
/dev/sda?

I am a bit confused on how these things are laid out. Where on the disk does
the information printed by mdadm -E /dev/sda and mdadm -E /dev/sdg1 come
from?

Thanks!

Cry