raid1-disaster on reboot: old version overwrites new version

From: peter pilsl <pilsl@goldfisch.at>
To: linux-raid@vger.kernel.org
Subject: raid1-disaster on reboot: old version overwrites new version
Date: Sat, 02 Apr 2005 17:43:51 +0200
Message-ID: <424EBDB7.2000106@goldfisch.at>

Two days ago I had a severe server crash due to RAID problems. The whole
thing started with a (homemade) DoS attack on the server. The server
went to its knees and had to be reset. After the reboot the server
was working fine and background reconstruction of the mirrors started.
About 30 minutes later the first anomalies occurred: applications
reported missing libraries, filesystem errors (reiserfs) and so on.
It took a while until I recognized what was going on:

The / partition was on a RAID1 array, /dev/md2, built from partitions on
two disks: hda6 + hdc6.

For some reason the RAID seems to have been out of sync for over a year:
hdc6 held an old copy that was now successively overwriting hda6 and
changing the content of / while the RAID was running.
I booted a live CD and found that hdc6 was an exact copy from
spring 2004 (easily established from the content and timestamps of
various files across the system) and that hda6 was not mountable. I ran
reiserfsck and had the tree rebuilt on hda6, but it was too late. All
current data was gone.
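
One way to notice diverged mirror halves before they cause damage is to
compare the two members directly. The following is only a minimal sketch in
Python, not something from the original setup: the function name
first_difference and its parameters are made up for illustration. It assumes
the comparison is run from a rescue system with the array stopped (on a live
array the result would be racy), and that `length` is chosen by the reader to
stop short of the per-member md superblock, which as far as I know sits near
the end of each member and is not expected to be identical on both.

```python
def first_difference(path_a, path_b, length=None, chunk_size=1 << 20):
    """Return the byte offset of the first difference between two files or
    block devices, or None if the compared range is identical.

    length: number of bytes to compare (None means "until end of file").
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while length is None or offset < length:
            want = chunk_size if length is None else min(chunk_size, length - offset)
            block_a = a.read(want)
            block_b = b.read(want)
            if block_a != block_b:
                # Locate the first differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i
                # One input ended earlier than the other.
                return offset + min(len(block_a), len(block_b))
            if not block_a:
                # Both inputs reached end of file without a difference.
                return None
            offset += len(block_a)
    return None
```

Called as first_difference("/dev/hda6", "/dev/hdc6", length=...), a result
other than None would mean the two halves do not hold the same data, no
matter what /proc/mdstat reports.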

I had a backup, the server is up again and my head is still on my
shoulders, but it leaves me with a lot of questions:

* How can the RAID be out of sync? I monitor /proc/mdstat at a
5-minute interval and log the content to files. The output was
definitely:

md2 : active raid1 hdc6[0] hda6[1]
       5120000 blocks [2/2] [UU]

over the last year without a single exception. I just tested the entries
in my watchdog and checked that the watchdog works by removing one
disk. It definitely barks.
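
For illustration, a watchdog check of this kind could look roughly like the
Python sketch below. It is not the script used on the server; the function
names parse_mdstat and degraded are hypothetical, and the parser only
assumes the two-line layout shown in the excerpt above (an "mdN : ..."
header followed by a "[n/m] [UU]" status line).

```python
import re

# Header line of an array, e.g. "md2 : active raid1 hdc6[0] hda6[1]".
HEADER_RE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
# Status part of the following line, e.g. "[2/2] [UU]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# A member entry such as "hdc6[0]" (a trailing "(F)" is ignored).
MEMBER_RE = re.compile(r"([^\s\[]+)\[\d+\]")


def parse_mdstat(text):
    """Parse /proc/mdstat-style text into {array_name: info_dict}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            name, rest = header.groups()
            current = {
                "members": MEMBER_RE.findall(rest),
                "raid_disks": None,   # first number in [n/m]
                "working": None,      # second number in [n/m]
                "flags": None,        # e.g. "UU" or "_U"
            }
            arrays[name] = current
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            current["raid_disks"] = int(status.group(1))
            current["working"] = int(status.group(2))
            current["flags"] = status.group(3)
    return arrays


def degraded(arrays):
    """Return the sorted names of arrays that do not look fully healthy."""
    return sorted(
        name
        for name, info in arrays.items()
        if info["flags"] is None
        or "_" in info["flags"]
        or info["working"] != info["raid_disks"]
    )
```

Such a check only reports what the md driver believes about its members. A
line ending in [2/2] [UU] does not by itself show that both halves contain
the same data.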

* How can the old version overwrite the new version when a RAID is out
of sync? This is like a nightmare (and I remember having such a thing
before).

* What did I do wrong?

The only explanation I can see is that I had the wrong entry in my
lilo.conf: root=/dev/hda6 instead of root=/dev/md2.
So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
which was started but never had any data written to it. Is this a
possible explanation?
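
A setup like the suspected one, where a partition is mounted directly while
it is also a member of an md array, could be detected with a check along the
following lines. This is a hedged sketch in Python, not a tested tool; the
function names md_members and mounted_members are hypothetical, and it
assumes the usual text layouts of /proc/mdstat and /proc/mounts.

```python
import re


def md_members(mdstat_text):
    """Map each member device name (e.g. 'hda6') to its array (e.g. 'md2')."""
    members = {}
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            for dev in re.findall(r"([^\s\[]+)\[\d+\]", header.group(2)):
                members[dev] = header.group(1)
    return members


def mounted_members(mdstat_text, mounts_text):
    """Return (device, mountpoint, array) for every mounted device that is
    also listed as a member of an md array."""
    members = md_members(mdstat_text)
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith("/dev/"):
            continue
        dev = fields[0][len("/dev/"):]
        if dev in members:
            hits.append((fields[0], fields[1], members[dev]))
    return hits
```

Fed with the contents of /proc/mdstat and /proc/mounts, any hit means a
filesystem is being written to behind the back of the array. One limitation:
some kernels list the root filesystem as /dev/root in /proc/mounts, and this
check would not catch that case.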

kernel 2.4.24
raidtools-0.90

thnx for any advice,
peter

-- 
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl@goldfisch.at
