linux-raid.vger.kernel.org archive mirror
* raid1-diseaster on reboot: old version overwrites new version
@ 2005-04-02 15:43 peter pilsl
  2005-04-02 17:27 ` Gordon Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: peter pilsl @ 2005-04-02 15:43 UTC (permalink / raw)
  To: linux-raid


Two days ago I had a severe server crash due to RAID problems. The whole 
thing started with a (homemade) DoS attack on the server. The server 
went to its knees and had to be reset. After the reboot the server 
was working fine and background reconstruction of the mirrors started.
About 30 minutes later the first anomalies occurred. Applications 
reported missing libraries, fs errors (reiserfs) and so on.
It took a while until I recognized what was going on:

The / partition was on a raid1, /dev/md2, built from two partitions: hda6+hdc6.

For some reason the raid seems to have been out of sync for over a year, 
and hdc6 held an old copy that was now successively overwriting hda6 and 
changing the content of / while the raid was running.
I booted with a live CD and discovered that hdc6 was an exact copy from 
spring 2004 (easily determined from the content and timestamps of various 
files across the system) and hda6 was not mountable. I ran reiserfsck and 
had the tree rebuilt on hda6, but it was too late. All current data was gone.

I had a backup, the server is up again and my head is still on my 
shoulders, but it leaves me with a lot of questions:

* How can the raid be out of sync? I monitor /proc/mdstat at a 
5-minute interval and log the content to files. The output was 
definitely like:

md2 : active raid1 hdc6[0] hda6[1]
       5120000 blocks [2/2] [UU]

over the last year without a single exception. I just tested the entries 
in my watchdog and checked that the watchdog works by removing one 
disk. It definitely barks.

* How can the old version overwrite the new version when a raid is out 
of sync? This is like a nightmare (and I remember having such a thing 
happen before).

* What did I do wrong?
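
For illustration, the kind of /proc/mdstat check described above could be sketched roughly as follows. This is not the actual watchdog; the function name and the sample text are hypothetical, and it assumes the 2.4-style status line shown above. Note that a "[2/2] [UU]" line only says that both members are present in the array; by itself it says nothing about whether the two halves hold identical data or whether the filesystem is really mounted through the md device.

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status line shows a missing member.

    Hypothetical helper: expects text in the format of /proc/mdstat, where an
    'mdN : ...' line is followed by a line such as '5120000 blocks [2/2] [UU]'.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            total = int(status.group(1))    # configured members
            active = int(status.group(2))   # members currently active
            flags = status.group(3)         # 'U' = up, '_' = missing
            if active < total or "_" in flags:
                degraded.append(current)
            current = None
    return degraded

# Sample in the format quoted above (healthy), plus a degraded variant.
HEALTHY = "md2 : active raid1 hdc6[0] hda6[1]\n       5120000 blocks [2/2] [UU]\n"
DEGRADED = "md2 : active raid1 hda6[1]\n       5120000 blocks [2/1] [_U]\n"

if __name__ == "__main__":
    print(degraded_arrays(HEALTHY))
    print(degraded_arrays(DEGRADED))
```

On a live system the text would come from reading /proc/mdstat; the sketch takes a string so it can be tried against logged output as well.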

The only explanation I can think of is that I had the wrong entry in my 
lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2.
So maybe root was always mounted as /dev/hda6 and never as /dev/md2, 
which was started but never had any data written to it. Is this a 
possible explanation?
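
A small sketch of how such a lilo.conf entry could be spotted: it scans the configuration text for root= lines that point at something other than an md device. The function name is hypothetical, and the parsing is deliberately simple (it assumes unquoted values, one option per line, and '#' comments), so treat it as an assumption-laden illustration rather than a full lilo.conf parser.

```python
import re

def non_md_roots(conf_text):
    """Return root= values in a lilo.conf-style text that are not /dev/mdN.

    Hypothetical helper: handles only simple 'root=/dev/xxx' lines.
    """
    roots = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and indentation
        match = re.match(r"^root\s*=\s*(\S+)$", line)
        if match and not re.match(r"^/dev/md\d+$", match.group(1)):
            roots.append(match.group(1))
    return roots

# Example fragments; the image path is made up for illustration.
WRONG = "boot=/dev/hda\nimage=/boot/vmlinuz\n  label=linux\n  root=/dev/hda6\n"
RIGHT = "boot=/dev/hda\nimage=/boot/vmlinuz\n  label=linux\n  root=/dev/md2\n"

if __name__ == "__main__":
    print(non_md_roots(WRONG))
    print(non_md_roots(RIGHT))
```

A non-empty result would mean that at least one boot entry mounts a raw partition as root, bypassing the mirror.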


kernel 2.4.24
raidtools-0.90

Thanks for any advice,
peter







-- 
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl@goldfisch.at


end of thread, other threads:[~2005-04-04 19:39 UTC | newest]

Thread overview: 6+ messages
2005-04-02 15:43 raid1-diseaster on reboot: old version overwrites new version peter pilsl
2005-04-02 17:27 ` Gordon Henderson
2005-04-02 17:35 ` Tim Moore
2005-04-02 18:10   ` peter pilsl
2005-04-04 19:39   ` Doug Ledford
2005-04-02 22:31 ` Neil Brown
