From: Mitchell Laks <mlaks@verizon.net>
To: linux-raid@vger.kernel.org
Subject: Re: disaster. raid1 drive failure rsync=DELAYED why?? please help
Date: Mon, 14 Mar 2005 02:43:24 -0500
Message-ID: <200503140243.25308.mlaks@verizon.net>
In-Reply-To: <423460EE.9070602@dgreaves.com>
On Sunday 13 March 2005 10:49 am, David Greaves wrote:
> [many helpful remarks]
David, I am grateful that you were there for me.
I went to the site and connected a monitor to the headless machine. The screen
was flooded with write errors to the spare drive in the original raid1; all
the terminals on tty1-6 were flooded, so I had to log in remotely over the
network. As root I tried "shutdown -h now" and then "init 1". Neither worked,
and I had to hard power down the machine. On bootup Debian dropped me into
single-user mode, and I removed /dev/md0 from /etc/fstab to get the machine to
boot past the fsck.
I looked through /var/log/messages.0 and found that drive /dev/hda1 failed last
Wednesday at 10am. The paired drive in the raid1 (it was actually /dev/hdg1)
immediately began generating kernel error messages. These filled the /var
partition to 100% and seem to have caused all the bad behavior we saw.
When I first noticed that /var was full, I made room by cutting out much of
/var/log/messages.
I probably could not run "shutdown -h now" successfully because the /var
partition needed some kind of fsck after having been filled, and the many
processes that had been writing to it were very "angry" and in a "messy
state". They needed a power-down. (Very m$ft-like.)
I am not sure I mentioned it here (though I discussed it on another mailing
list :)), but my main application on the server is a database-backed
application running on a postgresql backend.
Postgresql was also put into a weird state by this incident, not because there
was anything wrong with it, but because filling the /var partition had
multiple effects, since the postgresql databases live in /var/lib/postgres. I
could not run "pg_ctl stop" or "pg_ctl stop -m fast", only "pg_ctl stop -m
immediate". Luckily I was able to rescue the database.
My assessment (correct me if I am wrong) is that I have to rethink my
architecture. As I continue to work with software raid, I will likely move the
postgresql database to a separate partition, so that a single point of failure
(such as runaway logging) cannot take down both the logs and the database.
I took the 2 drives, /dev/hda1 and /dev/hdg1, out of the machine. I restored
my systems from the most recent backup, with only a few days' worth of suspect
data (Wed/Thu/Fri ...), and replaced the drives with new ones. It's good to
have both duplicate servers and raids. Both are necessary, I see.
I will play with /dev/hdg1 a little on a different machine to see how it
behaves. I suspect that with all those errors it is really dead too.
I just had bad luck: a double disk failure.
Thank you David again!
Mitchell