* Feature Request
From: Stefan *St0fF* Huebner @ 2010-02-09 8:43 UTC
To: linux-raid
Hi Everybody,
I would like to propose a few (probably hard-to-implement) features for mdraid.
Background:
Today's hard disk drives (I am only talking about ATA/SATA drives; SCSI
devices are too expensive for me) do their own error correction. Most
of them also have a feature called ERC (Error Recovery Control), which
lets you set timeouts for read/write error correction. Desktop drives
are preset to run their error recovery to its fullest extent and do not
respond while this procedure is active. RAID-edition/enterprise disks
are normally set to start error recovery but to report a media error
after 7 seconds of unsuccessful recovery; that is the point at which
the timeout "happens".
Now imagine any RAID with some kind of redundancy, reading/writing
data. One of the disks finds out "I cannot correctly read/write the
requested sector", starts its error correction, hits the respective
ERC-timeout and reports back a media error or unrecoverable error. Now
mdraid would drop the disk.
But actually the data of the sector can be recreated through the
existing redundancy. Wouldn't it be smart if mdraid recreated the
sector and simply tried to write it again? Only after a good number of
failed retries should it drop the disk.
Prerequisites:
- upon assembly/creation of the array:
  - mdraid needs to find out whether the member devices are (S)ATA
    block devices
  - if they are, the ERC timeouts for read/write operations on each
    device need to be set, as this setting is volatile (it is reset to
    factory defaults on power-on reset)
  - if successful, a flag indicating the enabled feature shall be set
- error handling needs to be updated with the "intelligence" described
  above for devices that have the ERC feature set
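
A minimal sketch of the assembly-time steps listed above, written in
userspace Python rather than as md code. The Member class, the
set_erc callback and the rule "vendor string equals ATA" are
assumptions made for illustration only; they are not part of mdraid.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One array member (hypothetical model, not an md structure)."""
    name: str                  # e.g. "sda"
    vendor: str                # vendor string as reported for the device
    supports_erc: bool         # whether the drive advertises ERC
    erc_enabled: bool = False  # the "flag" from the list above


def is_ata(member):
    # Assumption: (S)ATA disks are recognised by the vendor string "ATA".
    return member.vendor.strip() == "ATA"


def configure_members(members, set_erc):
    """Try to enable ERC on every ATA member that supports it.

    set_erc is a caller-supplied callable (hypothetical) that issues the
    real command for a device name and returns True on success.
    Returns the names of the members where ERC was enabled.
    """
    enabled = []
    for m in members:
        if is_ata(m) and m.supports_erc and set_erc(m.name):
            m.erc_enabled = True
            enabled.append(m.name)
    return enabled
```

Because the setting is volatile, such a pass would have to run again
after every power-on reset of a member drive.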
This is a request for comments (and of course this feature).
All the best,
Stefan Hübner
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Feature Request
From: Michael Tokarev @ 2010-02-09 12:28 UTC
To: st0ff; +Cc: linux-raid
Stefan *St0fF* Huebner wrote:
[]
> Now imagine any RAID with some kind of redundancy, reading/writing
> data. One of the disks finds out "I cannot correctly read/write the
> requested sector", starts its error correction, hits the respective
> ERC-timeout and reports back a media error or unrecoverable error. Now
> mdraid would drop the disk.
>
> But actually the data of the sector can be recreated through the
> existing redundancy. Wouldn't it be smart if mdraid recreated the
> sector and simply tried to write it again? Only after a good number of
> failed retries should it drop the disk.
This is exactly what the md layer does. On a failed _read_ it tries to
reconstruct the data from the other disk drives and writes the
reconstructed data back to the drive where the read failed. If the
_write_ fails, md drops the disk.
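
The behaviour described above can be modelled with a toy sketch. It
assumes a single-parity stripe (RAID-5 style) held in memory; the
function names and the failed_reads/failed_writes sets are made up for
illustration, and this is not md's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_block(stripe, failed_reads, failed_writes, target):
    """Read stripe[target]; return (data, dropped).

    stripe        -- list of byte blocks: data blocks plus one XOR parity block
    failed_reads  -- indices of disks whose read returns a media error
    failed_writes -- indices of disks whose write fails
    """
    if target not in failed_reads:
        return stripe[target], False
    # Failed read: rebuild the block from all the other members.
    others = [blk for i, blk in enumerate(stripe) if i != target]
    data = xor_blocks(others)
    # Write the rebuilt data back; a failed write gets the disk dropped.
    dropped = target in failed_writes
    if not dropped:
        stripe[target] = data
    return data, dropped
```

In this model the caller still receives the correct data even when the
disk is dropped, because the redundancy covers the missing block.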
/mjt
* Re: Feature Request
From: Stefan Hübner @ 2010-02-09 14:19 UTC
To: linux-raid
On 09.02.2010 13:28, Michael Tokarev wrote:
> Stefan *St0fF* Huebner wrote:
> []
>> Now imagine any RAID with some kind of redundancy, reading/writing
>> data. One of the disks finds out "I cannot correctly read/write the
>> requested sector", starts its error correction, hits the respective
>> ERC-timeout and reports back a media error or unrecoverable error. Now
>> mdraid would drop the disk.
>>
>> But actually the data of the sector can be recreated through the
>> existing redundancy. Wouldn't it be smart if mdraid recreated the
>> sector and simply tried to write it again? Only after a good number of
>> failed retries should it drop the disk.
>
> This is exactly what the md layer does. On a failed _read_ it tries to
> reconstruct the data from the other disk drives and writes the
> reconstructed data back to the drive where the read failed. If the
> _write_ fails, md drops the disk.
Hi Mjt,
I hoped so. Great that it is implemented like that.
Well, then all that's needed is the check at assembly/creation time:
- (is the drive an ATA drive) && (does it support SCT ERC)
-> and if so, set some reasonable timeouts (such as the 7 s that
enterprise-class drives use for reading; for writing I would suggest
14 s, bearing in mind that reallocating too quickly makes the spare
sectors run out quickly).
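
As a rough sketch of what setting those timeouts could look like from
userspace: SCT ERC timers are given in units of 100 ms, so 7 s and 14 s
become 70 and 140. The helper names below are hypothetical, and the
sketch assumes a smartctl build from smartmontools that offers the
"-l scterc,READ,WRITE" option. It only builds the command line and does
not run it.

```python
def erc_deciseconds(seconds):
    """Convert seconds to the 100 ms units used by SCT ERC timers."""
    return int(round(seconds * 10))


def scterc_command(device, read_s=7.0, write_s=14.0):
    """Build a smartctl argument list that sets the read/write ERC timers.

    The defaults are the 7 s / 14 s values suggested above; the caller
    decides whether and how to execute the returned command.
    """
    setting = f"scterc,{erc_deciseconds(read_s)},{erc_deciseconds(write_s)}"
    return ["smartctl", "-l", setting, device]
```

For example, scterc_command("/dev/sda") yields the argument list for
"smartctl -l scterc,70,140 /dev/sda", which could then be passed to
subprocess.run by a script executed at array assembly time.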
Writing back (I guess this is done with a reasonable number of
retries) makes no sense while the drive is still in its error recovery
procedure, because it does not react to any commands until that is done.
P.S.: I have already implemented the checks and setup, but in userspace
using SG_IO.
/st0ff
>
> /mjt