* device with newer data added as spare - data now gone?
@ 2009-06-26 11:46 Molinero
2009-06-28 19:04 ` Leslie Rhorer
2009-07-01 2:43 ` Roger Heflin
0 siblings, 2 replies; 4+ messages in thread
From: Molinero @ 2009-06-26 11:46 UTC
To: linux-raid
Hi all
I've lost quite a lot of data on my /home raid partition and I'm wondering
what exactly I did to make it happen. I'd like to know so something similar
won't happen in the future.
I'm pretty much a raid newbie. I set up raid1 on my home server and I'm
guessing that something like this happened. Please tell me if it's possible.
* Some time ago I did something that made one device fail, which resulted
in md3 having only 1 device.
* Time went by without me noticing (because I suck)
* An update broke my raid setup and gave me a kernel panic (because I suck).
Didn't put the mdadm and raid hooks in mkinitcpio.conf
* Booted a live-cd, mounted the drives and chrooted back into the system and
fixed the mkinitcpio.conf
* Rebooted and noticed that md3 was running with only 1 device
* Added sdb4 to md3 and it then read 1 device with 1 spare
* cat /proc/mdstat started to say "recovery"
* All data from approx. 1 year is gone
I'm guessing that the old (not updated) device was set as "master" and the
data on the drive (containing newer data) was overwritten by data on the old
device - is this plausible?
If not what exactly did I do to delete all of the data?
It's not the end of the world but it does suck a lot - especially some email
and pictures are gone that I will miss.
Any clarifications will be much appreciated.
--
View this message in context: http://www.nabble.com/device-with-newer-data-added-as-spare---data-now-gone--tp24218843p24218843.html
Sent from the linux-raid mailing list archive at Nabble.com.
* RE: device with newer data added as spare - data now gone?
2009-06-26 11:46 device with newer data added as spare - data now gone? Molinero
@ 2009-06-28 19:04 ` Leslie Rhorer
2009-07-01 2:43 ` Roger Heflin
1 sibling, 0 replies; 4+ messages in thread
From: Leslie Rhorer @ 2009-06-28 19:04 UTC
To: linux-raid
> Hi all
>
> I've lost quite a lot of data on my /home raid partition and I'm wondering
> what exactly I did to make it happen. I'd like to know so something
> similar
> won't happen in the future.
Well, first of all, of course learning from your mistakes will
prevent them from happening in the future. If you ask me, a second very
important utility to enable is mdadm's ability to notify you via e-mail
whenever a significant event transpires. You will then be notified quickly
of any significant changes to any RAID array, such as losing a hard drive.
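A minimal sketch of the kind of check that would have flagged the degraded md3 array early. The notification feature mentioned above is mdadm's own monitor mode (`mdadm --monitor`, with a `MAILADDR` line in mdadm.conf); the Python below is not that, only an illustration of what "degraded" looks like in /proc/mdstat. The function name `degraded_arrays`, the sample text, and the device names in it are hypothetical, and the parser assumes status lines of the form `[2/1] [U_]` and array names of the form `mdN`.

```python
import re

def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries an "[expected/active]" pair, e.g. "[2/1] [U_]".
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded[current] = (expected, active)
            current = None
    return degraded

# Hypothetical /proc/mdstat contents: md3 has lost one of its two members.
SAMPLE = """Personalities : [raid1]
md3 : active raid1 sda4[0]
      104320 blocks [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[1]
      2931776 blocks [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, (expected, active) in degraded_arrays(SAMPLE).items():
        print(f"{name}: {active} of {expected} members active")
```

On a real machine the text would come from reading /proc/mdstat rather than from the sample string.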
Finally, more important than anything else: BACK UP YOUR IMPORTANT
DATA. If it is data that can be recovered through some external process,
but takes a bit of doing, back it up once, and keep it handy. A different
drive or array on the same machine or a different machine in the same room
is fine for this level of backup. If it is data that cannot be recovered
and would cause some heartache if lost, then include it in the local handy
backup, but also include it in an off-premise backup. If it is critical
data - like financial information, then back it up 16 ways from Sunday. I
keep all critical data backed up on two different servers with independent
RAID arrays, DVD-ROM backups offsite, and independent multi-generation
backups on every workstation which accesses the data. If it is a commercial
application and the revenue supports it, or if it is important enough to you
and you can personally afford it, I suggest you might look into an online
storage solution.
Remember, RAID arrays are fault tolerant, not fault-free, and while
hard drives are frail, the most likely source of data failure by far is user
error.
> * Some time ago I did something to have one device fail which resulted md3
> in having only 1 device.
I presume md3 is the /home array and this was a 2 drive RAID1 array,
yes?
> * Time went by without me noticing (because I suck)
See above. We human beings all tend to suck from time to time.
Computers can help by reminding or notifying us of things - if we bother to
set them up to do so.
> * An update broke my raid setup and gave me a kernel panic (because I
> suck).
> Didn't put the mdadm and raid hooks in mkinitcpio.conf
> * Booted a live-cd, mounted the drives and chrooted back into the system
> and
> fixed the mkinitcpio.conf
This all sounds like lessons learned.
> * Rebooted and noticed that md3 was running with only 1 device
> * Added sdb4 to md3 and it then read 1 device with 1 spare
> * cat /proc/mdstat started to say "recovery"
> * All data from approx. 1 year is gone
Was sdb4 originally the second partition in the array? What is the
first partition? What was the apparent cause of the failure?
> I guessing that the old (not updated) device was set as "master" and the
> data on the drive (containing newer data) was overwritten by data on the
> old
> device - is this plausible?
Well, I suppose, yeah.
> If not what exactly did I do to delete all of the data?
What command did you use to add the partition back to the array?
* Re: device with newer data added as spare - data now gone?
2009-06-26 11:46 device with newer data added as spare - data now gone? Molinero
2009-06-28 19:04 ` Leslie Rhorer
@ 2009-07-01 2:43 ` Roger Heflin
[not found] ` <ab50889f0907010231k2af2f35l7d880187a395b6d0@mail.gmail.com>
1 sibling, 1 reply; 4+ messages in thread
From: Roger Heflin @ 2009-07-01 2:43 UTC
To: Molinero; +Cc: linux-raid
Molinero wrote:
> Hi all
>
> I've lost quite a lot of data on my /home raid partition and I'm wondering
> what exactly I did to make it happen. I'd like to know so something similar
> won't happen in the future.
>
> I'm pretty much a raid newbie. I setup raid1 on my home server and I'm
> guessing that something like this happened. Please tell me if it's possible.
>
> * Some time ago I did something to have one device fail which resulted md3
> in having only 1 device.
> * Time went by without me noticing (because I suck)
> * An update broke my raid setup and gave me a kernel panic (because I suck).
> Didn't put the mdadm and raid hooks in mkinitcpio.conf
> * Booted a live-cd, mounted the drives and chrooted back into the system and
> fixed the mkinitcpio.conf
> * Rebooted and noticed that md3 was running with only 1 device
> * Added sdb4 to md3 and it then read 1 device with 1 spare
> * cat /proc/mdstat started to say "recovery"
> * All data from approx. 1 year is gone
>
> I guessing that the old (not updated) device was set as "master" and the
> data on the drive (containing newer data) was overwritten by data on the old
> device - is this plausible?
If the old device was brought up as md3 and had dropped out months
ago, the data would now be the data that existed when that disk
dropped off. And when a device drops out, there is no mark on that
device marking it as bad, since the typical reason for a device
dropping off is that it is no longer talking. And sometimes
mirrors are intentionally broken to preserve a copy, for example to
be able to quickly back out of a serious OS upgrade that did not go
well.
If you added the current device as a spare it would have copied the
data from the old device over the current device.
That is one thing that would make a 3+ disk raid5 a bit more resistant
to this: with a dropped-off disk you could not start the array with
only the dropped device, and with all 3, 2 of the devices will know
the 3rd was dropped at some time in the past, and with any 2, one of
those devices would believe the other one was marked bad.
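One way to see which member is stale before adding anything back is to compare the event counters that `mdadm --examine` reports for each device: the member that stayed in the array keeps counting, the dropped one stops. The following is a minimal sketch under stated assumptions, not a recovery tool. The function names, the sample texts, and the "stale"/"current" labels are hypothetical, and the "high.low" form of the Events line is assumed to be how some older mdadm versions print the counter.

```python
import re

def parse_events(examine_text):
    """Extract the event counter from `mdadm --examine`-style text."""
    match = re.search(r"^\s*Events\s*:\s*(\d+)(?:\.(\d+))?\s*$",
                      examine_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Events line found")
    first, second = match.group(1), match.group(2)
    if second is None:
        return int(first)
    # Assumption: "high.low" are the two 32-bit halves of a 64-bit counter.
    return (int(first) << 32) | int(second)

def newest_first(members):
    """members maps a label to its examine text; return labels, newest first."""
    return sorted(members, key=lambda label: parse_events(members[label]),
                  reverse=True)

# Hypothetical excerpts: one member stopped updating long ago.
STALE = """     Raid Level : raid1
         Events : 0.48
"""
CURRENT = """     Raid Level : raid1
         Events : 0.91342
"""

if __name__ == "__main__":
    order = newest_first({"stale": STALE, "current": CURRENT})
    print("newest member:", order[0])
```

The member with the highest counter is the one whose data should survive; the array should be running from that member before the other one is added.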
* Re: device with newer data added as spare - data now gone?
[not found] ` <ab50889f0907010231k2af2f35l7d880187a395b6d0@mail.gmail.com>
@ 2009-07-01 9:32 ` John McNulty
0 siblings, 0 replies; 4+ messages in thread
From: John McNulty @ 2009-07-01 9:32 UTC
To: linux-raid
I've got myself into the habit of comparing the output from "cat /proc/mdstat"
and "mdadm -Esbv" to see if there's any old md metadata
floating around on disks I'm about to use before using them. Just as
a precaution. If I find any then I --zero-superblock the disk first
before re-using it, just to prevent myself getting caught out by
events like this.
Rgds,
John