From: Mitchell Laks <mlaks@verizon.net>
To: linux-raid@vger.kernel.org
Subject: apparent but not real raid1 failure. what happened? still confused. Gurus Please help...
Date: Mon, 02 May 2005 12:24:35 -0400
Message-ID: <200505021224.35396.mlaks@verizon.net>
I recently had what appeared to be a raid1 failure, and I am wondering what
conclusions I should draw from it. The kernel diagnostics suggested a dual
drive failure, but the data turned out to still be there. What does this mean?
I described what happened in an earlier post, but I still don't understand it
and would be very grateful for insight from the gurus on the list.
Is it a bug in the kernel? In software raid? Is it my stupidity?
My system:
ASUS K8V-X motherboard with an AMD64 processor,
uname -a
Linux A2 2.6.8-1-386 #1 Mon Jan 24 03:01:58 EST 2005 i686 GNU/Linux
(this is the stock Debian 2.6.8 kernel, circa January)
mdadm v1.9.0
All hard drives are 250GB parallel ATA (IDE) drives,
WD2500JB (3 year warranty).
Initially, one array failed: /dev/md0, a mirror of /dev/hda1 and /dev/hdg1,
with /dev/hdg1 on a HighPoint Rocket 133 controller.
From reading the log files I see that /dev/hda1 died first:
Apr 21 07:36:01 A2 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr 21 07:36:01 A2 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=209715335, high=12, low=8388743, sector=209715335
Apr 21 07:36:01 A2 kernel: end_request: I/O error, dev hda, sector 209715335
Apr 21 07:36:01 A2 kernel: raid1: Disk failure on hda1, disabling device.
Apr 21 07:36:01 A2 kernel: ^IOperation continuing on 1 devices
Apr 21 07:36:01 A2 kernel: raid1: hda1: rescheduling sector 209715272
Apr 21 07:36:01 A2 kernel: RAID1 conf printout:
Apr 21 07:36:01 A2 kernel: --- wd:1 rd:2
Apr 21 07:36:01 A2 kernel: disk 0, wo:1, o:0, dev:hda1
Apr 21 07:36:01 A2 kernel: disk 1, wo:0, o:1, dev:hdg1
Apr 21 07:36:01 A2 kernel: RAID1 conf printout:
Apr 21 07:36:01 A2 kernel: --- wd:1 rd:2
Apr 21 07:36:01 A2 kernel: disk 1, wo:0, o:1, dev:hdg1
Apr 21 07:36:01 A2 kernel: raid1: hdg1: redirecting sector 209715272 to another mirror
Apr 21 07:36:21 A2 kernel: hdg: dma_timer_expiry: dma status == 0x20
Apr 21 07:36:21 A2 kernel: hdg: DMA timeout retry
Apr 21 07:36:21 A2 kernel: hdg: timeout waiting for DMA
Apr 21 07:36:21 A2 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Apr 21 07:36:21 A2 kernel:
Apr 21 07:36:21 A2 kernel: hdg: drive not ready for command
Apr 21 07:36:21 A2 kernel: raid1: hdg1: rescheduling sector 209715272
Apr 21 07:36:21 A2 kernel: raid1: hdg1: redirecting sector 209715272 to another mirror
Apr 21 07:36:21 A2 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Apr 21 07:36:21 A2 kernel:
Apr 21 07:36:21 A2 kernel: hdg: drive not ready for command
Apr 21 07:36:41 A2 kernel: hdg: dma_timer_expiry: dma status == 0x20
Apr 21 07:36:41 A2 kernel: hdg: DMA timeout retry
Apr 21 07:36:41 A2 kernel: hdg: timeout waiting for DMA
Apr 21 07:36:41 A2 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Apr 21 07:36:41 A2 kernel:
Apr 21 07:36:41 A2 kernel: hdg: drive not ready for command
Apr 21 07:36:41 A2 kernel: raid1: hdg1: rescheduling sector 209715272
Apr 21 07:36:41 A2 kernel: raid1: hdg1: redirecting sector 209715272 to another mirror
Apr 21 07:36:41 A2 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Apr 21 07:36:41 A2 kernel:
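The numbers in the hda error can be cross-checked against each other. The
following is only a sketch of that arithmetic. It assumes that "high" carries
the bits above bit 23 of the sector number and "low" the bottom 24 bits, and
that the sector raid1 reports is counted from the start of the hda1
partition. Neither assumption is stated in the log itself, and the variable
names are made up for illustration.

```python
# Values copied from the hda dma_intr line and the raid1 line in the log.
high = 12
low = 8388743
lba_reported = 209715335      # LBAsect= / sector= as printed for hda
raid1_sector = 209715272      # sector raid1 rescheduled on hda1

# Assumption: the driver splits the LBA into two 24-bit halves.
lba = (high << 24) | low
print("LBA rebuilt from high/low:", lba)

# Assumption: the raid1 sector is relative to the start of hda1, so the
# difference is the partition's starting sector on the whole disk.
offset = lba_reported - raid1_sector
print("whole-disk sector minus raid1 sector:", offset)
```

Under those assumptions the rebuilt value matches the reported LBAsect, and
the difference of 63 sectors is consistent with a partition that starts at
sector 63 of the disk, i.e. the drive error and the raid1 message refer to
the same place on hda.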
Then /dev/hdg1 immediately began to spew forth error messages of the
following sort:
Apr 22 22:29:21 A2 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Apr 22 22:29:21 A2 kernel:
Apr 22 22:29:21 A2 kernel: hdg: drive not ready for command
Apr 22 22:29:21 A2 kernel: raid1: hdg1: rescheduling sector 209715272
Apr 22 22:29:21 A2 kernel: raid1: hdg1: redirecting sector 209715272 to another mirror
Apr 22 22:29:21 A2 kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Apr 22 22:29:21 A2 kernel:
Apr 22 22:29:21 A2 kernel: hdg: drive not ready for command
Apr 22 22:29:21 A2 kernel: raid1: hdg1: rescheduling sector 209715272
Apr 22 22:29:21 A2 kernel: raid1: hdg1: redirecting sector 209715272 to other
....
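The status bytes in these messages are bit fields, and the names in braces
are simply the bits that are set. A minimal sketch of the decoding follows.
The names for 0x40, 0x10, 0x08 and 0x01 are taken from the log lines above;
the names for the remaining bits are filled in from the usual ATA status
register layout and should be read as an assumption, as should the helper
name decode_status.

```python
# Status register bits, highest first. Only DriveReady, SeekComplete,
# DataRequest and Error appear in the log; the other names are assumed.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(status):
    """Return the names of the bits set in an IDE status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]

hda_status = decode_status(0x51)   # status logged for hda
hdg_status = decode_status(0x58)   # status logged for hdg

print("hda 0x51:", " ".join(hda_status))
print("hdg 0x58:", " ".join(hdg_status))
```

Decoded this way, the hda status has the Error bit set (and comes with
error=0x40, UncorrectableError), while every hdg status line shows
DataRequest without the Error bit, next to the DMA timeout messages.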
These errors continued nonstop all day and night until they filled the 6GB
/var partition and it ran out of space:
2.6GB of /var/log/kern.log,
2.6GB of /var/log/syslog and
1GB of /var/log/messages
were filled by these errors.
I then pulled the two drives out of the system,
put a pair of new drives in for /dev/hda1 and /dev/hdg1,
created /dev/md0 anew, and restored the data to my servers from backups.
I then took the two old drives (the ones holding /dev/hda1 and /dev/hdg1) to
another machine and ran the Western Digital drive diagnostics on both of them.
Both are fine. No errors.
I then took the old /dev/hda1 on the new system and did
modprobe raid1
mknod /dev/md0 b 9 0
mdadm -A /dev/md0 /dev/hda1
and then mount /dev/md0 /mnt, and I see my data, which looks intact.
Similarly, if I do that with the old /dev/hdg1, I see the same data.
(Note: if I then try mdadm -A /dev/md0 /dev/hda1 /dev/hdc1, where /dev/hdc1
is what was /dev/hdg1 on the other machine, I get a message saying, in
effect, that the two are not up to date with each other ...)
Has anyone else had this trouble? Could someone explain what happened?
What should I have done when I found these errors after my system failed?
Is it safe for me to continue to use raid1?
Thanks,
Mitchell