From: peter pilsl <pilsl@goldfisch.at>
To: linux-raid@vger.kernel.org
Subject: raid1-diseaster on reboot: old version overwrites new version
Date: Sat, 02 Apr 2005 17:43:51 +0200
Message-ID: <424EBDB7.2000106@goldfisch.at>
Two days ago I had a severe server crash due to RAID problems. The whole
thing started with a (homemade) DoS attack on the server. The server
was brought to its knees and had to be reset. After the reboot the server
was working fine and background reconstruction of the mirrors started.
About 30 minutes later the first anomalies occurred. Applications
reported missing libraries, fs errors (reiserfs) and so on.
It took a while until I recognized what was going on:
the / partition was on a raid1, /dev/md2, based on two disks: hda6+hdc6.
For some reason the raid seems to have been out of sync for over a year, and
hdc6 held an old copy that was now successively overwriting hda6 and
changing the content of / while the raid was running.
I booted from a live CD and discovered that hdc6 was an exact copy from
spring 2004 (easily determined from the content and timestamps of various
files across the system) and hda6 was not mountable. I ran reiserfsck and had
the tree rebuilt on hda6, but it was too late. All current data was gone.
I had a backup, the server is up again and my head is still on my shoulders,
but it leaves me with a lot of questions:
* How can the raid be out of sync? I monitor /proc/mdstat at a
5-minute interval and log the content to files. The output was
definitely like:
md2 : active raid1 hdc6[0] hda6[1]
5120000 blocks [2/2] [UU]
over the last year without a single exception. I just tested the entries
in my watchdog and checked that the watchdog works by removing one
disk. It definitely barks.
* How can the old version overwrite the new version when a raid is out of
sync? This is like a nightmare (and I remember having such a thing before).
* What did I do wrong?
The only explanation I have is that I had the wrong entry in my
lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2.
So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
which was started, but never had any data written to it. Is this a
possible explanation?
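A minimal sketch of a check for the situation suspected above: a partition
that is a member of a running array being used directly instead of through
the md device. The [UU] field in /proc/mdstat only reports which members md
considers active, so it cannot show this by itself. The function names
(md_members, bypassed_members) are hypothetical, and the sketch assumes that
/proc/mounts and the kernel command line name devices as /dev/... paths. Some
kernels list the root filesystem as /dev/root, and a boot loader may pass
root= as a numeric device id; neither case is decoded here.

```python
import re

def md_members(mdstat_text):
    """Map each member device name (e.g. 'hda6') to its array (e.g. 'md2')."""
    members = {}
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:\s*(.*)$", line.strip())
        if not m:
            continue
        array, rest = m.groups()
        # Member entries look like 'hdc6[0]'; the bracketed number is the slot.
        for dev in re.findall(r"(\S+?)\[\d+\]", rest):
            members[dev] = array
    return members

def bypassed_members(mdstat_text, mounts_text, cmdline_text=""):
    """Return (device, array, where) for md members that are used directly."""
    members = md_members(mdstat_text)
    found = []
    # Assumption: mount sources appear as /dev/<name> in the first field.
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            dev = fields[0][len("/dev/"):]
            if dev in members:
                found.append((dev, members[dev], "mounted on " + fields[1]))
    # Assumption: root= is given as a /dev/<name> path, not a device number.
    m = re.search(r"(?:^|\s)root=/dev/(\S+)", cmdline_text)
    if m and m.group(1) in members:
        found.append((m.group(1), members[m.group(1)], "kernel root= parameter"))
    return found

if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        mdstat = f.read()
    with open("/proc/mounts") as f:
        mounts = f.read()
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    for dev, array, where in bypassed_members(mdstat, mounts, cmdline):
        print("WARNING: %s is a member of %s but is %s" % (dev, array, where))
```

Run next to the existing /proc/mdstat logging, such a check would bark when
a member partition is mounted directly, even while mdstat still shows [UU].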
kernel 2.4.24
raidtools-0.90
thnx for any advice,
peter
--
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl@goldfisch.at