From: peter pilsl <pilsl@goldfisch.at>
To: linux-raid@vger.kernel.org
Subject: raid1-diseaster on reboot: old version overwrites new version
Date: Sat, 02 Apr 2005 17:43:51 +0200
Message-ID: <424EBDB7.2000106@goldfisch.at>
Two days ago I had a severe server crash due to RAID problems. The whole
thing started with a (homemade) DoS attack on the server. The server
was brought to its knees and had to be reset. After the reboot the server
was working fine and background reconstruction of the mirrors started.
About 30 minutes later the first anomalies occurred. Applications
reported missing libraries, fs errors (reiserfs) and so on.
It took a while until I recognized what was going on:
the / partition was on a raid1, /dev/md2, built from two partitions: hda6+hdc6.
For some reason the raid seemed to have been out of sync for over a year, and
hdc6 held an old copy that was now successively overwriting hda6 and
changing the content of / while the raid was running.
I booted from a live CD and discovered that hdc6 was an exact copy from
spring 2004 (easily determined from the content and timestamps of various
files across the system) and hda6 was not mountable. I ran reiserfsck and had
the tree rebuilt on hda6, but it was too late. All current data was gone.
I had a backup, the server is up again and my head is still on my shoulders,
but it leaves me with a lot of questions:
* How can the raid have been out of sync? I monitor /proc/mdstat at a
5-minute interval and log the content to files. The output was
definitely:
md2 : active raid1 hdc6[0] hda6[1]
5120000 blocks [2/2] [UU]
over the last year without a single exception. I just checked the logged
entries and verified that the watchdog works by removing one
disk. It definitely barks.
* How can, in the case of an unsynced raid, the old version overwrite the new
version? This is like a nightmare (and I remember having such a thing before).
* What did I do wrong?
The only explanation I can think of is that I had the wrong entry in my
lilo.conf. I had root=/dev/hda6 there instead of root=/dev/md2.
So maybe root was always mounted as /dev/hda6 and never as /dev/md2,
which was started but never had any data written to it. Is this a
possible explanation?
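
If that is what happened, a watchdog that only looks for [UU] in
/proc/mdstat would never notice it. Below is a minimal sketch of an
additional check that flags any partition that is a member of an md array
and is also mounted directly. The function names and the sample mount line
are made up for illustration. It assumes /proc/mounts lists the real device
name; on some kernels the root filesystem shows up under a generic name
there, in which case the root= value on the kernel command line would have
to be compared instead.

```python
import re


def parse_mdstat(text):
    """Return {md_name: [member partitions]} from /proc/mdstat-style text."""
    arrays = {}
    for line in text.splitlines():
        m = re.match(r"^(md\d+)\s*:\s*active\s+\S+\s+(.*)$", line)
        if m:
            # Members look like "hdc6[0]"; keep only the partition name.
            arrays[m.group(1)] = re.findall(r"(\w+)\[\d+\]", m.group(2))
    return arrays


def mounted_members(mdstat_text, mounts_text):
    """List (device, mountpoint, md_name) for array members mounted directly."""
    member_of = {}
    for md, members in parse_mdstat(mdstat_text).items():
        for part in members:
            member_of["/dev/" + part] = md
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in member_of:
            hits.append((fields[0], fields[1], member_of[fields[0]]))
    return hits


# The mdstat text is the output quoted above; the mount line is hypothetical.
MDSTAT = """md2 : active raid1 hdc6[0] hda6[1]
      5120000 blocks [2/2] [UU]
"""
MOUNTS = "/dev/hda6 / reiserfs rw 0 0\n"

print(mounted_members(MDSTAT, MOUNTS))
```

On a live system the two arguments would be the contents of /proc/mdstat
and /proc/mounts; any non-empty result means a filesystem is being written
to behind the array's back.
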
kernel 2.4.24
raidtools-0.90
thnx for any advice,
peter
--
mag. peter pilsl
goldfisch.at
IT-management
tel +43 699 1 3574035
fax +43 699 4 3574035
pilsl@goldfisch.at