linux-raid.vger.kernel.org archive mirror
* Read errors on raid5 ignored, array still clean .. then disaster !!
@ 2010-01-26 22:28 Giovanni Tessore
  2010-01-27  7:41 ` Luca Berra
  2010-01-27  9:01 ` Asdo
  0 siblings, 2 replies; 29+ messages in thread
From: Giovanni Tessore @ 2010-01-26 22:28 UTC
  To: linux-raid

Hello everybody!
I don't know software raid in depth, so I'd like some expert help.

I'm having a big problem with a raid5 array with 6 sata disks: /dev/md3 
made of /dev/sd[acbdef]4
kernel is 2.6.24 (ubuntu 8.04 2.6.24-21-server)
mdadm - v2.6.3 - 20th August 2007

Here is what happened, as read from the logs:
- since the beginning of December, a lot (hundreds) of read errors occurred 
on /dev/sdb, but md3 silently recovered from them, WITHOUT setting the device 
as faulty (see the error reported below) or signalling the situation
- on 18 January a failure occurred on /dev/sdf, and md3 marked it as faulty
- after /dev/sdf was replaced with a new disk and re-added to the array, the 
resync started
- at 98% of the resync, a read error occurred on /dev/sdb (as it was 
clearly in bad shape) and the whole array became unusable !!!

Is this some kind of bug?
Is there any way to configure raid in order to have devices marked 
faulty on read errors (at least when they clearly become too many)?

This could (and for me did) lead to big disasters!
Suppose you have a 4-disk raid with 2 spare disks ready for recovery.
There are a lot of read errors on disk 1, but md silently recovers from them 
without marking the disk as faulty (as it did for me).
Disk 3 fails.
md adds one of the spare disks and starts the resync.
The resync fails due to the read errors on disk 1.
Everything is lost, despite having 2 spare disks!!!???
This is not fault tolerance ... it's fault creation!!!

In a post from some months ago by a person who had a similar problem, I 
read in a reply that ignoring read errors is the intended behaviour of 
md ... but I can't believe this!!

I was able to recover something with
mdadm --create /dev/md3 --assume-clean --level=5 --raid-devices=6 \
  --spare-devices=0 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4 /dev/sde4 missing
and by using md3 in degraded mode, reapplying the command on each read error 
on /dev/sdb

Thanks in advance


The read errors reported in the log for /dev/sdb, long before the failure of 
/dev/sdf, were like this (notice the data recovery message at the bottom):

Dec 27 11:40:45 teroknor kernel:           res 
41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:45 teroknor kernel: ata2.00: configured for UDMA/133
Dec 27 11:40:45 teroknor kernel:  ata2: EH complete
Dec 27 11:40:45 teroknor kernel:  sd 1:0:0:0: [sdb] 976773168 512-byte 
hardware sectors (500108 MB)
Dec 27 11:40:45 teroknor kernel:  sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:45 teroknor kernel:  sd 1:0:0:0: [sdb] Write cache: 
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:48 teroknor kernel:           res 
41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:48 teroknor kernel:  ata2.00: configured for UDMA/133
Dec 27 11:40:48 teroknor kernel:  ata2: EH complete
Dec 27 11:40:48 teroknor kernel:  sd 1:0:0:0: [sdb] 976773168 512-byte 
hardware sectors (500108 MB)
Dec 27 11:40:48 teroknor kernel:  sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:48 teroknor kernel:  sd 1:0:0:0: [sdb] Write cache: 
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:51 teroknor kernel:           res 
41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:51 teroknor kernel:  ata2.00: configured for UDMA/133
Dec 27 11:40:51 teroknor kernel:  ata2: EH complete
Dec 27 11:40:51 teroknor kernel:  sd 1:0:0:0: [sdb] 976773168 512-byte 
hardware sectors (500108 MB)
Dec 27 11:40:51 teroknor kernel:  sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:51 teroknor kernel:  sd 1:0:0:0: [sdb] Write cache: 
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:54 teroknor kernel:           res 
41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:54 teroknor kernel:  ata2.00: configured for UDMA/133
Dec 27 11:40:54 teroknor kernel:  ata2: EH complete
Dec 27 11:40:54 teroknor kernel:  sd 1:0:0:0: [sdb] 976773168 512-byte 
hardware sectors (500108 MB)
Dec 27 11:40:54 teroknor kernel:  sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:54 teroknor kernel:  sd 1:0:0:0: [sdb] Write cache: 
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:57 teroknor kernel:           res 
41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:57 teroknor kernel:  ata2.00: configured for UDMA/133
Dec 27 11:40:57 teroknor kernel:  ata2: EH complete
Dec 27 11:40:57 teroknor kernel:  sd 1:0:0:0: [sdb] 976773168 512-byte 
hardware sectors (500108 MB)
Dec 27 11:40:57 teroknor kernel:  sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:57 teroknor kernel:  sd 1:0:0:0: [sdb] Write cache: 
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:40:59 teroknor kernel:           res 
41/40:08:3b:b2:c3/14:00:38:00:00/00 Emask 0x409 (media error) <F>
Dec 27 11:40:59 teroknor kernel:  ata2.00: configured for UDMA/133
Dec 27 11:40:59 teroknor kernel:  sd 1:0:0:0: [sdb] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 27 11:40:59 teroknor kernel:  sd 1:0:0:0: [sdb] Sense Key : Medium 
Error [current] [descriptor]
Dec 27 11:40:59 teroknor kernel:  Descriptor sense data with sense 
descriptors (in hex):
Dec 27 11:40:59 teroknor kernel:          72 03 11 04 00 00 00 0c 00 0a 
80 00 00 00 00 00
Dec 27 11:40:59 teroknor kernel:          00 00 00 3b
Dec 27 11:40:59 teroknor kernel:  sd 1:0:0:0: [sdb] Add. Sense: 
Unrecovered read error - auto reallocate failed
Dec 27 11:40:59 teroknor kernel:  end_request: I/O error, dev sdb, 
sector 952349242
Dec 27 11:40:59 teroknor kernel:  ata2: EH complete
Dec 27 11:40:59 teroknor kernel:  sd 1:0:0:0: [sdb] 976773168 512-byte 
hardware sectors (500108 MB)
Dec 27 11:40:59 teroknor kernel:  sd 1:0:0:0: [sdb] Write Protect is off
Dec 27 11:40:59 teroknor kernel:  sd 1:0:0:0: [sdb] Write cache: 
enabled, read cache: enabled, doesn't support DPO or FUA
Dec 27 11:41:00 teroknor kernel:  raid5:md3: read error corrected (8 
sectors at 942549592 on sdb4)
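
The "read error corrected" message above is the only trace md leaves of these events, so counting such messages per device is one way to notice a degrading disk early. What follows is a minimal sketch in Python, assuming the message format shown in the excerpt above and unwrapped syslog lines (the excerpt here is wrapped by the mail client). The function name `count_corrected_errors` and the sample lines are illustrative only; they are not part of md or mdadm.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   raid5:md3: read error corrected (8 sectors at 942549592 on sdb4)
# The pattern is inferred from the log excerpt above; other kernel
# versions may word the message differently (assumption).
CORRECTED = re.compile(
    r"raid\d+:(?P<array>md\d+): read error corrected "
    r"\((?P<count>\d+) sectors at (?P<sector>\d+) on (?P<dev>\w+)\)"
)


def count_corrected_errors(lines):
    """Return a Counter mapping (array, device) to corrected-error events."""
    counts = Counter()
    for line in lines:
        match = CORRECTED.search(line)
        if match:
            counts[(match.group("array"), match.group("dev"))] += 1
    return counts


# Two sample lines: one corrected-error message and one unrelated line.
sample = [
    "Dec 27 11:41:00 teroknor kernel:  raid5:md3: read error corrected "
    "(8 sectors at 942549592 on sdb4)",
    "Dec 27 11:40:59 teroknor kernel:  ata2: EH complete",
]
counts = count_corrected_errors(sample)
for (array, dev), events in counts.items():
    print(f"{array}: {events} corrected read error event(s) on {dev}")
```

Feeding the whole syslog through such a counter would show hundreds of events for sdb4 alone, which is the kind of signal that is easy to miss when reading the log by eye.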

-- 
Cordiali saluti.
Yours faithfully.

Giovanni Tessore



* Re: Read errors on raid5 ignored, array still clean .. then disaster !!
@ 2010-01-27  9:56 Giovanni Tessore
  0 siblings, 0 replies; 29+ messages in thread
From: Giovanni Tessore @ 2010-01-27  9:56 UTC
  To: linux-raid

>
> > Is there any way to configure raid in order to have devices marked faulty 
> > on read errors (at least when they clearly become too many)?
> I don't think so
>   
I think it would be useful to be able to configure the number of 
recovered read errors allowed before the device goes faulty.

> > This could (and for me did) bring to big disasters!
> Don't agree with you, you had all the info from syslog
> You should have run smart tests on the disks and proactively replace a
> failing disk.
>   
It would be nice if md issued a warning on recovered read error events, 
as it does for other md events (device failure, etc.).

> it does _not_ ignore read errors 
> in case of read errors mdadm rewrites the erroring sector, and only if
> this fails it will kick the member out of the array.
> with modern drives it is possible to have some failed sector, which the
> drive firmware will reallocate on write (all modern drives have a range
> of sectors reserved for this very purpose)
> mdadm does not do any bookkeeping on reallocated_sector_count per drive
> the drive does. the data can be accessed with smartctl
> drives showing excessive reallocated_sector_count should be replaced.
>   
Sorry, by "ignore" I mean "it silently manages to recover from the read 
error, without alerting anybody".
Btw, as I see from the kernel sources, md does keep track of recovered 
read errors per device.
Only when there are more than 256 of them does it mark the device faulty 
(I'm preparing another post on it).
So why wait for a full 256 errors?
I think this should be configurable ... and a much lower limit would suit me.
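
To make the request concrete, here is a rough sketch in Python of a per-device counter with a configurable limit. It is a hypothetical illustration and not md's kernel code: the class name `ReadErrorPolicy`, the `max_corrected` parameter and the method name are assumptions, and the default of 256 is simply the figure mentioned above.

```python
class ReadErrorPolicy:
    """Track corrected read errors per device against a configurable limit.

    Hypothetical illustration only; this does not reproduce md's logic.
    """

    def __init__(self, max_corrected=256):
        # Limit on corrected read errors tolerated per device (assumed default).
        self.max_corrected = max_corrected
        self.corrected = {}

    def record_corrected(self, dev):
        """Record one corrected read error on dev.

        Returns True once the device has exceeded the limit and should be
        marked faulty, False otherwise.
        """
        self.corrected[dev] = self.corrected.get(dev, 0) + 1
        return self.corrected[dev] > self.max_corrected


# With a lower limit of 20, the 21st corrected error trips the policy.
policy = ReadErrorPolicy(max_corrected=20)
results = [policy.record_corrected("sdb4") for _ in range(21)]
print(results[-2], results[-1])
```

A lower limit trades earlier warning against the risk of kicking a still-usable member out of the array, which is why it would be better left to the admin than hard-coded.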

> Consider the following scenario:
> raid5 (sda,b,c,d)
> sda has a read error, mdadm kicks it immediately from the array
> a few minutes/hours later sdc fails completely
> lost data and no time to react, that is far worse than having 50 days of
> warnings and ignoring them.
>   
Yes, but suppose that sda already has 250 corrected read errors; it's 
still clean.
sdc fails and is kicked out.
The resync starts.
sda gets more than 6 read errors during the resync and is set as faulty 
(which is likely to happen, as the drive is clearly dying).
Data is lost the same way.
(This is my real scenario; it really happened.)

Much difference?

Personally I'd prefer to know as soon as possible that something is 
going wrong: if not by setting the device faulty, then with a warning (by 
mail, like other md events) saying "this is the n-th recovered error for 
this device".
IMHO the admin has to be made clearly aware *by md*, not by other 
monitoring tools, that the array is facing a possibly critical situation.
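
As a sketch of what such a warning could look like, the Python below builds the message text for the n-th recovered error. The function names `ordinal` and `warning_message` and the exact wording are assumptions made for illustration; md does not emit this message.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def warning_message(array, dev, n):
    """Build a warning line for the n-th recovered read error on dev."""
    return f"{array}: this is the {ordinal(n)} recovered read error for {dev}"


print(warning_message("md3", "sdb4", 250))
```

The text could then be delivered through whatever channel already carries device-failure notifications, so that the admin sees it in the same place.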

> I'm sorry for your data, hope you had backups.
>   
Thanks.
I am trying to recover by forcibly re-adding the drive that gives read 
errors and using the array in degraded mode ... it seems to work.

Giovanni


end of thread, other threads:[~2010-02-01 15:51 UTC | newest]

Thread overview: 29+ messages
2010-01-26 22:28 Read errors on raid5 ignored, array still clean .. then disaster !! Giovanni Tessore
2010-01-27  7:41 ` Luca Berra
2010-01-27  9:01   ` Goswin von Brederlow
2010-01-29 10:48   ` Neil Brown
2010-01-29 11:58     ` Goswin von Brederlow
2010-01-29 19:14     ` Giovanni Tessore
2010-01-30  7:58       ` Luca Berra
2010-01-30 15:52         ` Giovanni Tessore
2010-01-30  7:54     ` Luca Berra
2010-01-30 10:55     ` Giovanni Tessore
2010-01-30 18:44     ` Giovanni Tessore
2010-01-30 21:41       ` Asdo
2010-01-30 22:20         ` Giovanni Tessore
2010-01-31  1:23           ` Roger Heflin
2010-01-31 10:45             ` Giovanni Tessore
2010-01-31 14:08               ` Roger Heflin
2010-01-31 14:31         ` Asdo
2010-02-01 10:56           ` Giovanni Tessore
2010-02-01 12:45             ` Asdo
2010-02-01 15:11               ` Giovanni Tessore
2010-02-01 13:27             ` Luca Berra
2010-02-01 15:51               ` Giovanni Tessore
2010-01-27  9:01 ` Asdo
2010-01-27 10:09   ` Giovanni Tessore
2010-01-27 10:50     ` Asdo
2010-01-27 15:06       ` Goswin von Brederlow
2010-01-27 16:15       ` Giovanni Tessore
2010-01-27 19:33     ` Richard Scobie
  -- strict thread matches above, loose matches on Subject: below --
2010-01-27  9:56 Giovanni Tessore
