From: Giovanni Tessore <giotex@texsoft.it>
To: linux-raid@vger.kernel.org
Subject: Read errors on raid5 ignored, array still clean .. then disaster !!
Date: Tue, 26 Jan 2010 23:28:03 +0100
Message-ID: <4B5F6C73.30707@texsoft.it>

Hello everybody!

I'm not very experienced with software RAID, so I'd like some expert help.

I'm having a big problem with a RAID5 array of 6 SATA disks: /dev/md3,
made of /dev/sd[acbdef]4.
Kernel is 2.6.24 (Ubuntu 8.04, 2.6.24-21-server).
mdadm - v2.6.3 - 20th August 2007

Here is what happened, as read from the logs:
- since the beginning of December, a lot (hundreds) of read errors occurred
on /dev/sdb, but md3 silently recovered from them, WITHOUT marking the device
as faulty (see the errors reported below) or signaling the situation
- on 18 January a failure occurred on /dev/sdf, and md3 marked it as faulty
- after /dev/sdf was replaced with a new disk and re-added to the array, the
resync started
- at 98% of the resync, a read error occurred on /dev/sdb (as it was
clearly in bad shape) and the whole array became unusable !!!

Is this some kind of bug?
Is there any way to configure md so that devices are marked
faulty on read errors (at least when they clearly become too many)?
This could lead (and for me did lead) to big disasters!
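
To put a number on "too many", the corrected read errors can be counted per
member device from the kernel log. The following is only a minimal sketch: the
function name count_corrected_reads and the log path /var/log/kern.log are
assumptions for illustration, and it relies only on md's
"read error corrected (... on sdb4)" message format, as it appears in the log
excerpt at the end of this message.

```shell
# Read kernel log lines on stdin and print "<member> <count>" for every
# device named in an md "read error corrected" message.
count_corrected_reads() {
    awk '/read error corrected/ {
             dev = $NF              # last field, e.g. "sdb4)"
             sub(/[)]$/, "", dev)   # strip the closing parenthesis
             n[dev]++
         }
         END { for (d in n) print d, n[d] }' | sort
}
```

It could be run as "count_corrected_reads < /var/log/kern.log" (path assumed);
a member with a steadily growing count is a candidate for replacement before
another disk fails.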

Suppose you have a 4-disk raid with 2 spare disks ready for recovery:
- there are a lot of read errors on disk 1, but md silently recovers from them
without marking the disk as faulty (as it did for me)
- disk 3 fails
- md adds one of the spare disks and starts the resync
- the resync fails due to the read errors on disk 1
- everything is lost, despite having 2 spare disks!!!???
This is not fault tolerance ... it's fault creation!!!

In a post from some months ago by a person who had a similar problem, I
read in a reply that ignoring the read errors is the intended behaviour of
md ... but I can't believe this!!

I was able to recover something with

    mdadm --create /dev/md3 --assume-clean --level=5 --raid-devices=6 \
        --spare-devices=0 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4 missing

and by using md3 in degraded mode, re-running the command after each read
error on /dev/sdb.

Thanks in advance

The read errors on /dev/sdb reported in the log long before the failure of
/dev/sdf looked like this (notice the "read error corrected" message at the
bottom):

Dec 27 11:40:45 teroknor kernel: res 41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:45 teroknor kernel: ata2.00: configured for UDMA/133
Dec 27 11:40:45 teroknor kernel: ata2: EH complete
Dec 27 11:40:45 teroknor kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 27 11:40:45 teroknor kernel: sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:45 teroknor kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:48 teroknor kernel: res 41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:48 teroknor kernel: ata2.00: configured for UDMA/133
Dec 27 11:40:48 teroknor kernel: ata2: EH complete
Dec 27 11:40:48 teroknor kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 27 11:40:48 teroknor kernel: sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:48 teroknor kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:51 teroknor kernel: res 41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:51 teroknor kernel: ata2.00: configured for UDMA/133
Dec 27 11:40:51 teroknor kernel: ata2: EH complete
Dec 27 11:40:51 teroknor kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 27 11:40:51 teroknor kernel: sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:51 teroknor kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:54 teroknor kernel: res 41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:54 teroknor kernel: ata2.00: configured for UDMA/133
Dec 27 11:40:54 teroknor kernel: ata2: EH complete
Dec 27 11:40:54 teroknor kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 27 11:40:54 teroknor kernel: sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:54 teroknor kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:57 teroknor kernel: res 41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:57 teroknor kernel: ata2.00: configured for UDMA/133
Dec 27 11:40:57 teroknor kernel: ata2: EH complete
Dec 27 11:40:57 teroknor kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 27 11:40:57 teroknor kernel: sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:57 teroknor kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:59 teroknor kernel: res 41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:59 teroknor kernel: ata2.00: configured for UDMA/133
Dec 27 11:40:59 teroknor kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 27 11:40:59 teroknor kernel: sd 1:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Dec 27 11:40:59 teroknor kernel: Descriptor sense data with sense descriptors (in hex):
Dec 27 11:40:59 teroknor kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 27 11:40:59 teroknor kernel: 00 00 00 3b
Dec 27 11:40:59 teroknor kernel: sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 27 11:40:59 teroknor kernel: end_request: I/O error, dev sdb, sector 952349242
Dec 27 11:40:59 teroknor kernel: ata2: EH complete
Dec 27 11:40:59 teroknor kernel: sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Dec 27 11:40:59 teroknor kernel: sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:59 teroknor kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:41:00 teroknor kernel: raid5:md3: read error corrected (8 sectors at 942549592 on sdb4)
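
The two sector numbers in the last log lines use different address spaces:
end_request reports a sector on the whole disk sdb, while raid5 reports a
sector on the partition sdb4. A small sketch of the arithmetic, assuming the
difference is simply the start offset of sdb4's data on sdb (the helper name
sector_offset is hypothetical):

```shell
# Difference between a whole-disk sector and a partition-relative sector.
sector_offset() {
    echo $(( $1 - $2 ))
}

# Values copied from the log above.
offset=$(sector_offset 952349242 942549592)
echo "$offset"                # 9799650 sectors
echo $(( offset % 16065 ))    # 0
```

16065 is 255 heads x 63 sectors, the size of one legacy "cylinder". A remainder
of 0 would be consistent with sdb4 starting on a cylinder boundary and with
both messages pointing at the same spot on the disk, but this is an inference
from the log, not something checked against the partition table.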
--
Yours faithfully.
Giovanni Tessore