* device with newer data added as spare - data now gone?
From: Molinero @ 2009-06-26 11:46 UTC
To: linux-raid

Hi all

I've lost quite a lot of data on my /home raid partition and I'm
wondering what exactly I did to make it happen. I'd like to know so
something similar won't happen in the future.

I'm pretty much a raid newbie. I set up raid1 on my home server and I'm
guessing that something like this happened. Please tell me if it's
possible.

* Some time ago I did something that made one device fail, which left
  md3 with only 1 device.
* Time went by without me noticing (because I suck)
* An update broke my raid setup and gave me a kernel panic (because I
  suck). I hadn't put the mdadm and raid hooks in mkinitcpio.conf
* Booted a live-cd, mounted the drives, chrooted back into the system
  and fixed the mkinitcpio.conf
* Rebooted and noticed that md3 was running with only 1 device
* Added sdb4 to md3 and it then read 1 device with 1 spare
* cat /proc/mdstat started to say "recovery"
* All data from approx. 1 year is gone

I'm guessing that the old (not updated) device was set as "master" and
the data on the drive containing newer data was overwritten by data from
the old device - is this plausible?

If not, what exactly did I do to delete all of the data?

It's not the end of the world but it does suck a lot - especially some
email and pictures are gone that I will miss.

Any clarifications will be much appreciated.

* RE: device with newer data added as spare - data now gone?
From: Leslie Rhorer @ 2009-06-28 19:04 UTC
To: linux-raid

> Hi all
>
> I've lost quite a lot of data on my /home raid partition and I'm
> wondering what exactly I did to make it happen. I'd like to know so
> something similar won't happen in the future.

Well, first of all, of course learning from your mistakes will prevent
them from happening in the future. If you ask me, a second very
important utility to enable is mdadm's ability to notify you via e-mail
whenever a significant event transpires. You will then be notified
quickly of any significant changes to any RAID array, such as losing a
hard drive.

Finally, more important than anything else: BACK UP YOUR IMPORTANT DATA.
If it is data that can be recovered through some external process, but
takes a bit of doing, back it up once, and keep it handy. A different
drive or array on the same machine or a different machine in the same
room is fine for this level of backup. If it is data that cannot be
recovered and would cause some heartache if lost, then include it in the
local handy backup, but also include it in an off-premises backup. If it
is critical data, like financial information, then back it up 16 ways
from Sunday. I keep all critical data backed up on two different servers
with independent RAID arrays, DVD-ROM backups offsite, and independent
multi-generation backups on every workstation which accesses the data.
If it is a commercial application and the revenue supports it, or if it
is important enough to you and you can personally afford it, I suggest
you might look into an online storage solution.

Remember, RAID arrays are fault tolerant, not fault-free, and while hard
drives are frail, the most likely source of data failure by far is user
error.

> * Some time ago I did something to have one device fail which resulted
>   md3 in having only 1 device.

I presume md3 is the /home array and this was a 2 drive RAID1 array,
yes?

> * Time went by without me noticing (because I suck)

See above. We human beings all tend to suck from time to time. Computers
can help by reminding or notifying us of things - if we bother to set
them up to do so.

> * An update broke my raid setup and gave me a kernel panic (because I
>   suck). Didn't put the mdadm and raid hooks in mkinitcpio.conf
> * Booted a live-cd, mounted the drives and chrooted back into the
>   system and fixed the mkinitcpio.conf

This all sounds like lessons learned.

> * Rebooted and noticed that md3 was running with only 1 device
> * Added sdb4 to md3 and it then read 1 device with 1 spare
> * cat /proc/mdstat started to say "recovery"
> * All data from approx. 1 year is gone

Was sdb4 originally the second partition in the array? What is the first
partition? What was the apparent cause of the failure?

> I guessing that the old (not updated) device was set as "master" and
> the data on the drive (containing newer data) was overwritten by data
> on the old device - is this plausible?

Well, I suppose, yeah.

> If not what exactly did I do to delete all of the data?

What command did you use to add the partition back to the array?
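
A minimal sketch of the kind of automated check suggested above: spotting an array that has quietly lost a member. This is not mdadm's own notification mechanism, only an illustration in Python. The function name `degraded_arrays` and the sample text are hypothetical, and the sketch assumes the usual `[expected/active]` counter on the status line of /proc/mdstat.

```python
import re

# Hypothetical /proc/mdstat contents: md3 has lost one of its two members.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md3 : active raid1 sda4[0]
      104320 blocks [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[1]
      4192896 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries "[expected/active]", e.g. "[2/1]".
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None
    return degraded


if __name__ == "__main__":
    for name, (expected, active) in degraded_arrays(SAMPLE_MDSTAT).items():
        print(f"{name}: only {active} of {expected} devices active")
```

On a real system the text would come from reading /proc/mdstat, and the result could be fed into whatever reminder you actually read (cron mail, for instance).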

* Re: device with newer data added as spare - data now gone?
From: Roger Heflin @ 2009-07-01 2:43 UTC
To: Molinero; +Cc: linux-raid

Molinero wrote:
> [...]
> I guessing that the old (not updated) device was set as "master" and
> the data on the drive (containing newer data) was overwritten by data
> on the old device - is this plausible?

If the old device was brought up as md3 and had dropped out months ago,
the data would now be the data that existed when that disk dropped off.
And when a device drops out, there is no mark on that device marking it
as bad, since the typical reason for a device dropping off is that it is
no longer talking. And sometimes mirrors are intentionally broken to
preserve a copy for one reason or another, such as to be able to quickly
back out of a serious OS upgrade that did not go well.

If you added the current device as a spare it would have copied the data
from the old device over the current device.

That is one thing that would make a 3+ disk raid5 a bit more resistant
to this: with a dropped-off disk you could not start the array with only
the dropped device, and with all 3 present, 2 of the devices would know
the 3rd was dropped at some time in the past. With any 2, one of those
devices would believe the other one was marked bad.
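
Since the dropped disk carries no "bad" mark, one way to tell the two halves of a broken mirror apart before re-adding anything is to compare what each superblock reports. Below is a minimal sketch in Python, assuming the output of `mdadm --examine` for each member has been captured as text and that it contains an `Events : <integer>` line. The sample text, device roles, and function names are hypothetical, and other ways of printing the counter are not handled.

```python
import re

# Hypothetical excerpts of `mdadm --examine` output for each mirror half.
EXAMINE_OUTPUT = {
    "/dev/sda4": """\
/dev/sda4:
    Update Time : Mon Jun 16 09:12:44 2008
         Events : 412
""",
    "/dev/sdb4": """\
/dev/sdb4:
    Update Time : Thu Jun 25 22:40:03 2009
         Events : 98731
""",
}


def event_count(examine_text):
    """Extract the integer event counter from one --examine excerpt."""
    match = re.search(r"^\s*Events\s*:\s*(\d+)\s*$", examine_text, re.MULTILINE)
    if match is None:
        raise ValueError("no integer Events line found")
    return int(match.group(1))


def freshest_member(examine_by_device):
    """Return the device whose superblock shows the highest event count."""
    return max(examine_by_device,
               key=lambda dev: event_count(examine_by_device[dev]))


if __name__ == "__main__":
    newest = freshest_member(EXAMINE_OUTPUT)
    print(f"most recently updated member: {newest}")
    print("start the array from this one; add the others to it afterwards")
```

The point of the comparison is only to decide which device should be the source of the resync. The device added afterwards is the one that gets overwritten.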

* Re: device with newer data added as spare - data now gone?
From: John McNulty @ 2009-07-01 9:32 UTC
To: linux-raid

I've got myself into the habit of comparing the output from
"cat /proc/mdstat" and "mdadm -Esbv" to see if there's any old md
metadata floating around on disks I'm about to use before using them.
Just as a precaution. If I find any then I --zero-superblock the disk
first before re-using it, just to prevent myself getting caught out by
events like this.

Rgds,

John

On Wed, Jul 1, 2009 at 3:43 AM, Roger Heflin<rogerheflin@gmail.com> wrote:
> [...]
> If you added the current device as a spare it would have copied the
> data from the old device over the current device.
> [...]