Feature Request

linux-raid.vger.kernel.org archive mirror

* Feature Request
@ 2010-02-09  8:43 Stefan *St0fF* Huebner
  2010-02-09 12:28 ` Michael Tokarev
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan *St0fF* Huebner @ 2010-02-09  8:43 UTC
  To: linux-raid

Hi Everybody,

I would like to propose a few probably hard-to-implement features for mdraid.

Background:
Today's hard disk drives (I am only talking about ATA/SATA drives; SCSI
devices are too expensive for me) do their own error correction.  Most
of them also have a feature called ERC (Error Recovery Control), which
lets you set timeouts for read/write error correction.  Desktop drives
are preset to run their error recovery to its fullest extent and do not
respond while this procedure is active.  RAID-edition/enterprise disks
are normally set to start error recovery but report back a media error
after 7 seconds of unsuccessful error recovery; this is where the
timeout takes effect.

Now imagine any RAID with some kind of redundancy, reading/writing
data.  One of the disks finds out "I cannot correctly read/write the
requested sector", starts its error correction, hits the respective
ERC-timeout and reports back a media error or unrecoverable error.  Now
mdraid would drop the disk.

But actually the data of the sector can be recreated through the
existing redundancy.  Wouldn't it be a smart thing if mdraid
recreated the sector and just tried to write it again?  And after a good
number of failed retries it may well drop the disk.

Prerequisites:
- when assembling/creating the array:
  - mdraid needs to find out whether the member devices are backed by
(S)ATA block devices
  - if they are, the ERC timeouts for read/write operations on each
device need to be set, as this setting is volatile (it gets reset to
factory defaults upon power-on reset).
  - if successful, some flag indicating the enabled feature shall be set
- error handling needs to be updated with the above-described
"intelligence" for devices that have the ERC feature set
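
The assembly-time steps listed above could be modelled roughly as
follows.  This is only a sketch: the Member class, set_erc() and
prepare_members() are hypothetical names, nothing here is md code, and
set_erc() merely updates the model instead of talking to a drive.  The
value 70 is the 7-second figure from above expressed in the 100 ms units
that SCT ERC uses; using it for writes as well is just a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Hypothetical model of one array member (not an md data structure)."""
    name: str
    is_ata: bool                            # backed by an (S)ATA block device?
    supports_erc: bool                      # does the drive offer ERC at all?
    read_timeout_ds: Optional[int] = None   # deciseconds; None = never set
    write_timeout_ds: Optional[int] = None
    erc_enabled: bool = False               # the "feature enabled" flag


def set_erc(member: Member, read_ds: int, write_ds: int) -> bool:
    """Stand-in for the real drive command; it only updates the model."""
    if not member.supports_erc:
        return False
    member.read_timeout_ds = read_ds
    member.write_timeout_ds = write_ds
    return True


def prepare_members(members, read_ds, write_ds):
    """Run the assembly-time steps on every member; return the flagged names."""
    for m in members:
        # The setting is volatile, so it is re-applied on every assembly.
        m.erc_enabled = m.is_ata and set_erc(m, read_ds, write_ds)
    return [m.name for m in members if m.erc_enabled]


members = [
    Member("sda", is_ata=True, supports_erc=True),
    Member("sdb", is_ata=True, supports_erc=False),   # ATA drive without ERC
    Member("sdc", is_ata=False, supports_erc=False),  # not an ATA device
]
flagged = prepare_members(members, read_ds=70, write_ds=70)
print(flagged)
```

Only members that end up with the flag set would then get the changed
error handling; all others keep the existing behaviour.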

This is a request for comments (and of course this feature).

All the best,
Stefan Hübner
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Feature Request
  2010-02-09  8:43 Feature Request Stefan *St0fF* Huebner
@ 2010-02-09 12:28 ` Michael Tokarev
  2010-02-09 14:19   ` Stefan Hübner
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Tokarev @ 2010-02-09 12:28 UTC
  To: st0ff; +Cc: linux-raid

Stefan *St0fF* Huebner wrote:
[]
> Now imagine any RAID with some kind of redundancy, reading/writing
> data.  One of the disks finds out "I cannot correctly read/write the
> requested sector", starts its error correction, hits the respective
> ERC-timeout and reports back a media error or unrecoverable error.  Now
> mdraid would drop the disk.
> 
> But actually the data of the sector can be recreated through the
> existing redundancy.  Wouldn't it be a smart thing if mdraid
> recreated the sector and just tried to write it again?  And after a good
> number of failed retries it may well drop the disk.

This is exactly what the md layer is doing.  On a failed _read_ it tries
to reconstruct the data from the other disk drives and writes the
reconstructed data back to the drive where the read failed.  If the
_write_ fails, md will drop the disk.
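
As a toy illustration of that behaviour (not the md code itself), the
sketch below assumes a single-parity layout in which the missing block
is the XOR of the blocks on all other members.  The Disk class,
read_sector() and the failed set are hypothetical names made up for the
example.

```python
from functools import reduce


class Disk:
    """Toy in-memory disk; bad_reads/bad_writes hold failing sector numbers."""

    def __init__(self, sectors, bad_reads=(), bad_writes=()):
        self.sectors = list(sectors)
        self.bad_reads = set(bad_reads)
        self.bad_writes = set(bad_writes)

    def read(self, n):
        if n in self.bad_reads:
            raise OSError("media error")
        return self.sectors[n]

    def write(self, n, data):
        if n in self.bad_writes:
            raise OSError("write error")
        self.sectors[n] = data
        self.bad_reads.discard(n)  # a successful rewrite makes it readable again


def xor(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_sector(disks, failed, idx, n):
    """Read sector n of disks[idx]; on a read error, rebuild and write back."""
    try:
        return disks[idx].read(n)
    except OSError:
        # Reconstruct from all other members (assumed healthy in this toy).
        data = xor([d.read(n) for i, d in enumerate(disks) if i != idx])
        try:
            disks[idx].write(n, data)   # write the reconstructed data back
        except OSError:
            failed.add(idx)             # only a failed write drops the disk
        return data


d0, d1 = b"\x01\x02", b"\x10\x20"
parity = xor([d0, d1])

# Case 1: the read fails, the write-back succeeds, the disk stays in the array.
disks = [Disk([b"\x00\x00"], bad_reads=[0]), Disk([d1]), Disk([parity])]
failed = set()
result_ok = read_sector(disks, failed, 0, 0)

# Case 2: the write-back fails as well, so the disk is dropped.
disks2 = [Disk([b"\x00\x00"], bad_reads=[0], bad_writes=[0]), Disk([d1]), Disk([parity])]
failed2 = set()
result_dropped = read_sector(disks2, failed2, 0, 0)
```

In both cases the caller still gets the correct data; the two cases
differ only in whether the member remains in the array afterwards.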

/mjt


* Re: Feature Request
  2010-02-09 12:28 ` Michael Tokarev
@ 2010-02-09 14:19   ` Stefan Hübner
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Hübner @ 2010-02-09 14:19 UTC
  To: linux-raid

On 09.02.2010 13:28, Michael Tokarev wrote:
> Stefan *St0fF* Huebner wrote:
> []
>> Now imagine any RAID with some kind of redundancy, reading/writing
>> data.  One of the disks finds out "I cannot correctly read/write the
>> requested sector", starts its error correction, hits the respective
>> ERC-timeout and reports back a media error or unrecoverable error.  Now
>> mdraid would drop the disk.
>>
>> But actually the data of the sector can be recreated through the
>> existing redundancy.  Wouldn't it be a smart thing if mdraid
>> recreated the sector and just tried to write it again?  And after a good
>> number of failed retries it may well drop the disk.
>
> This is exactly what the md layer is doing.  On a failed _read_ it tries
> to reconstruct the data from the other disk drives and writes the
> reconstructed data back to the drive where the read failed.  If the
> _write_ fails, md will drop the disk.
Hi Mjt,

I hoped so - great it is implemented like that.

Well, then all that's needed is the check at assembly/creation time:
- (is the drive an ATA drive) && (does it support SCT ERC)
-> and if it does, set some reasonable timeouts (such as the 7s that
enterprise-class drives use for reading; for writing I would suggest
14s, bearing in mind that overly quick reallocation makes the spare
sectors run out quickly).
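
For illustration, one way to apply such timeouts from userspace is
smartmontools' smartctl, whose "-l scterc,READTIME,WRITETIME" option
takes the limits in deciseconds (100 ms units).  This is a different
route from an SG_IO-based implementation.  The helper name scterc_args()
and the device path /dev/sdX are placeholders for the example.

```python
def scterc_args(device, read_seconds, write_seconds):
    """Build a smartctl command line that sets the SCT ERC timeouts.

    smartctl expects the two limits in deciseconds (units of 100 ms).
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


cmd = scterc_args("/dev/sdX", read_seconds=7, write_seconds=14)
print(" ".join(cmd))  # smartctl -l scterc,70,140 /dev/sdX
```

The resulting list can be handed to subprocess.run(cmd, check=True) with
sufficient privileges.  Because the setting is volatile, it would have
to be applied again after every power cycle of the drive.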

Writing back (I guess this is done with a reasonable number of retries)
does not make sense while the drive is still in its error recovery
procedure, since it does not react to any commands until that is done.

P.S.: I have already implemented the checks and setup, but in userspace
using SG_IO.

/st0ff

