linux-raid.vger.kernel.org archive mirror
* Re: End to end SMART to RAID repair
From: Arthur Britto @ 2008-06-24  0:05 UTC
  To: Neil Brown; +Cc: linux-raid

On Tue, 2008-06-24 at 08:58 +1000, Neil Brown wrote:
> On Sunday June 22, ahbritto@iat.com wrote:
> > smartmontools (http://smartmontools.sourceforge.net/) can be configured
> > to passively scan hard drives for defects in the background.  The block
> > numbers of pending unreadable sectors are logged via syslog.  These
> > sectors will be remapped when written to.
> > 
> > It would be great if this worked end to end with linux software raid to
> > automatically repair the bad sector.
> 
> Well, you can just get md to do a scan (echo check >
> /sys/block/mdXX/md/sync_action) and it will find any read errors and
> correct them.
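For illustration, the full check Neil describes could be driven from Python as below. This is a sketch and not part of md. It assumes root access, the md sysfs attributes sync_action and dev-*/errors (a per-member count of read errors md has detected), and the hypothetical helper names run_md_check and member_errors. Because the errors counters are cumulative, the sketch compares snapshots taken before and after the check.

```python
import time
from pathlib import Path
from typing import Callable, Dict


def member_errors(md_dir: Path) -> Dict[str, int]:
    """Read md's per-member read-error counters (md/dev-*/errors)."""
    return {d.name[len("dev-"):]: int((d / "errors").read_text())
            for d in sorted(md_dir.glob("dev-*"))}


def run_md_check(md: str, sysfs_root: str = "/sys", poll: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, int]:
    """Run a full 'check' on an array and return read errors seen per member."""
    md_dir = Path(sysfs_root) / "block" / md / "md"
    before = member_errors(md_dir)
    action = md_dir / "sync_action"
    action.write_text("check\n")  # same as: echo check > /sys/block/mdX/md/sync_action
    # A full check reads every member, which can take hours on large arrays.
    while action.read_text().strip() != "idle":
        sleep(poll)
    after = member_errors(md_dir)
    # The counters are cumulative, so report only what changed during this check.
    return {dev: after[dev] - before.get(dev, 0) for dev in after}
```

The sleep parameter exists so the polling loop can be exercised without a real array.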

True.  However, a SMART on-disk check requires no mainboard
resources, and some drives may do background checking when idle
anyway.  This would provide a way to correct the error without an md
check having to scan the whole volume and the other components.  The
drive's own background checking may also be less intrusive than a
normal attempted sector read, which can retry to the exclusion of
other work.  At least manufacturers have the option to give priority
to actual read requests over background defect checking.

> Extracting numbers from syslog is a fairly messy thing to try to do.
> Maybe if smartmontools could report these in some other way -
> e.g. run a program giving device and block number, we could write a
> script that feeds that info to md.
> We would need to map the device+offset to partition+offset, then find
> out if that is a member of an md array, then request a limited-range
> 'check', which I think is possible with current code...
> 
> Do you know if smartmontools can provide this info in a more
> controlled way?
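The mapping Neil outlines (device+offset to partition+offset, then md membership, then a limited-range check) might look roughly like this. It is a hedged sketch: the function names are hypothetical, it assumes 512-byte sectors throughout, and it assumes a level such as RAID1 where md's sync_min/sync_max are per-member sector positions counted from the member's data offset (md/dev-*/offset). The kernel may also require chunk-aligned bounds, which this does not handle.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_partition(disk: str, lba: int, sysfs_root: str = "/sys") -> Optional[Tuple[str, int]]:
    """Map a whole-disk LBA to (partition, offset within partition), or None."""
    for part in sorted((Path(sysfs_root) / "block" / disk).iterdir()):
        start_file = part / "start"
        if not part.name.startswith(disk) or not start_file.is_file():
            continue
        start = int(start_file.read_text())
        size = int((part / "size").read_text())
        if start <= lba < start + size:
            return part.name, lba - start
    return None


def md_holder(disk: str, part: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the md array (e.g. 'md0') holding this partition, if any."""
    holders = Path(sysfs_root) / "block" / disk / part / "holders"
    if holders.is_dir():
        for holder in sorted(holders.iterdir()):
            if holder.name.startswith("md"):
                return holder.name
    return None


def check_window(disk: str, lba: int, span: int = 8,
                 sysfs_root: str = "/sys") -> Optional[Tuple[str, int, int]]:
    """Translate a bad disk LBA into (md, sync_min, sync_max) for a small check."""
    found = find_partition(disk, lba, sysfs_root)
    if found is None:
        return None
    part, part_offset = found
    md = md_holder(disk, part, sysfs_root)
    if md is None:
        return None
    # md/dev-<part>/offset: where array data starts on this member, in sectors.
    offset_file = Path(sysfs_root) / "block" / md / "md" / f"dev-{part}" / "offset"
    member_sector = part_offset - int(offset_file.read_text())
    if member_sector < 0:
        return None  # the bad sector lies before the data area (e.g. metadata)
    return md, member_sector, member_sector + span


def request_range_check(md: str, lo: int, hi: int, sysfs_root: str = "/sys") -> None:
    """Limit md's next 'check' to roughly sectors lo..hi and start it (needs root)."""
    md_dir = Path(sysfs_root) / "block" / md / "md"
    # Reset the upper bound first so the new sync_min cannot exceed an old sync_max.
    (md_dir / "sync_max").write_text("max\n")
    (md_dir / "sync_min").write_text(f"{lo}\n")
    (md_dir / "sync_max").write_text(f"{hi}\n")
    (md_dir / "sync_action").write_text("check\n")
```

md is expected to pause at sync_max rather than finish, so afterwards writing idle to sync_action and max to sync_max should restore normal behaviour; treat that as an assumption to verify against the running kernel.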

I was thinking a method that is not specific to smartmontools would be
best.  That is: (1) some way for the md driver to request notification
about pending uncorrected read errors in a region of a block device, and
(2) some way for a trusted application to inform the kernel about
pending uncorrected read errors (e.g. echo "start-stop" > /sys/...).
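Arthur's two-part proposal can be modelled in user space to make the intended semantics concrete. This only models the idea: the names BadBlockNotifier, watch, report and parse_range are invented for illustration, and the "start-stop" string is assumed to hold inclusive sector numbers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# callback(device, first_sector, last_sector)
Callback = Callable[[str, int, int], None]


def parse_range(text: str) -> Tuple[int, int]:
    """Parse a 'start-stop' report (assumed inclusive sector numbers)."""
    start_s, stop_s = text.strip().split("-", 1)
    start, stop = int(start_s), int(stop_s)
    if start > stop:
        raise ValueError(f"bad range: {text!r}")
    return start, stop


@dataclass
class BadBlockNotifier:
    """Hypothetical model of the proposed interface, not an existing kernel API."""
    watchers: Dict[str, List[Tuple[int, int, Callback]]] = field(default_factory=dict)

    def watch(self, device: str, start: int, stop: int, callback: Callback) -> None:
        """Part (1): md asks to hear about pending errors in [start, stop]."""
        self.watchers.setdefault(device, []).append((start, stop, callback))

    def report(self, device: str, text: str) -> int:
        """Part (2): a trusted tool reports 'start-stop'; returns watchers notified."""
        start, stop = parse_range(text)
        notified = 0
        for w_start, w_stop, callback in self.watchers.get(device, []):
            lo, hi = max(start, w_start), min(stop, w_stop)
            if lo <= hi:  # the reported range overlaps this watch
                callback(device, lo, hi)
                notified += 1
        return notified
```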

-Arthur

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: End to end SMART to RAID repair
From: Ric Wheeler @ 2008-06-24 11:24 UTC
  To: Arthur Britto; +Cc: Neil Brown, linux-raid

Arthur Britto wrote:
> On Tue, 2008-06-24 at 08:58 +1000, Neil Brown wrote:
>   
>> On Sunday June 22, ahbritto@iat.com wrote:
>>     
>>> smartmontools (http://smartmontools.sourceforge.net/) can be configured
>>> to passively scan hard drives for defects in the background.  The block
>>> numbers of pending unreadable sectors are logged via syslog.  These
>>> sectors will be remapped when written to.
>>>
>>> It would be great if this worked end to end with linux software raid to
>>> automatically repair the bad sector.
>>>       
>> Well, you can just get md to do a scan (echo check >
>> /sys/block/mdXX/md/sync_action) and it will find any read errors and
>> correct them.
>>     
>
> True.  However, a SMART on-disk check requires no mainboard
> resources, and some drives may do background checking when idle
> anyway.  This would provide a way to correct the error without an md
> check having to scan the whole volume and the other components.  The
> drive's own background checking may also be less intrusive than a
> normal attempted sector read, which can retry to the exclusion of
> other work.  At least manufacturers have the option to give priority
> to actual read requests over background defect checking.
>   

This is almost always the case with disk arrays, for example.
>   
>> Extracting numbers from syslog is a fairly messy thing to try to do.
>> Maybe if smartmontools could report these in some other way -
>> e.g. run a program giving device and block number, we could write a
>> script that feeds that info to md.
>> We would need to map the device+offset to partition+offset, then find
>> out if that is a member of an md array, then request a limited-range
>> 'check', which I think is possible with current code...
>>
>> Do you know if smartmontools can provide this info in a more
>> controlled way?
>>     
>
> I was thinking a method that is not specific to smartmontools would be
> best.  That is: (1) some way for the md driver to request notification
> about pending uncorrected read errors in a region of a block device, and
> (2) some way for a trusted application to inform the kernel about
> pending uncorrected read errors (e.g. echo "start-stop" > /sys/...).
>
> -Arthur
>
>   

One thing that you can do that is much less invasive is to use the "read 
verify" command to scan the platter of the disk at a fairly low rate.

Read verify does not transfer data from the disk to the host, and you can
issue fairly large requests (say 1MB at a time) as a background task per
drive. What you get out of this is a validation that nothing has
failed at the disk sector level (i.e., each sector is still readable).
On detection of an error, you can go back and try to pinpoint the
failed sector with small I/Os and then try to repair the damage with a
write (say from the other mirror in a RAID1 device).
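The scan-then-pinpoint procedure could be sketched as follows. The verify callable is an assumption standing in for issuing the drive's read-verify command (for example through SG_IO), which is not shown; chunk=2048 corresponds to 1MB with 512-byte sectors, and pause provides the low scan rate. A failing chunk is bisected with progressively smaller verifies, and the pinpointed sectors are then rewritten from the other mirror so the drive can remap them. All function names are hypothetical.

```python
import time
from typing import Callable, List

# verify(lba, count) -> True if every sector in the range reads back cleanly.
Verify = Callable[[int, int], bool]


def pinpoint(lba: int, count: int, verify: Verify) -> List[int]:
    """Bisect a range that already failed verification down to bad sectors."""
    if count == 1:
        return [lba]
    half = count // 2
    bad: List[int] = []
    for sub_lba, sub_count in ((lba, half), (lba + half, count - half)):
        if not verify(sub_lba, sub_count):
            bad.extend(pinpoint(sub_lba, sub_count, verify))
    return bad


def background_verify(total: int, verify: Verify, chunk: int = 2048,
                      pause: float = 0.0) -> List[int]:
    """Verify the whole disk in large chunks, sleeping between requests."""
    bad: List[int] = []
    for lba in range(0, total, chunk):
        count = min(chunk, total - lba)
        if not verify(lba, count):
            bad.extend(pinpoint(lba, count, verify))
        if pause:
            time.sleep(pause)
    return bad


def repair_from_mirror(bad: List[int], read_mirror: Callable[[int], bytes],
                       write_local: Callable[[int, bytes], None]) -> None:
    """Rewrite each bad sector with the other mirror's copy."""
    for lba in bad:
        write_local(lba, read_mirror(lba))
```

Bisection keeps the number of small verifies proportional to the number of bad sectors times log2(chunk), rather than verifying every sector of a failed chunk individually.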

This is useful, but it is not an end-to-end data integrity check (like
Martin's T10 DIF work that was posted).
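To make the contrast concrete: in the T10 DIF (Type 1) format, each 512-byte sector carries 8 extra bytes of protection information: a 16-bit guard CRC over the data (polynomial 0x8BB7), a 16-bit application tag, and a 32-bit reference tag holding the low 32 bits of the LBA. A read-verify scan only proves that a sector is readable, while a check like dif_ok below also catches data that is readable but wrong or written to the wrong place. The function names are illustrative, and this sketches the format rather than Martin's patches.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x8BB7, zero init, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def dif_tuple(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte guard/app/reference tuple for one 512-byte sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def dif_ok(sector: bytes, lba: int, pi: bytes) -> bool:
    """Check both the data (guard tag) and its location (reference tag)."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)
```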

In general, it would also be really neat to figure out an API that
would let a higher-level application (or file system) inform the block
layer of an error and possibly ask for a read from another mirror.
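One possible shape for such an API, sketched in Python: the caller (say, a file system that checksums its blocks) supplies an is_valid check, the block layer tries each mirror in turn, and any copy that is unreadable or rejected is rewritten from the first accepted one. Mirror and read_validated are hypothetical names for illustration, not an existing md or block-layer interface.

```python
from typing import Callable, List, Optional


class Mirror:
    """Hypothetical stand-in for one leg of a RAID1 array."""

    def __init__(self, blocks: dict) -> None:
        self.blocks = blocks

    def read(self, lba: int) -> bytes:
        if lba not in self.blocks:
            raise OSError(f"read error at sector {lba}")
        return self.blocks[lba]

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


def read_validated(mirrors: List[Mirror], lba: int,
                   is_valid: Callable[[bytes], bool]) -> Optional[bytes]:
    """Return the first copy the caller accepts; rewrite rejected copies with it."""
    rejected: List[Mirror] = []
    for mirror in mirrors:
        try:
            data = mirror.read(lba)
        except OSError:
            rejected.append(mirror)  # a media error counts as a bad copy
            continue
        if is_valid(data):
            for bad in rejected:
                bad.write(lba, data)
            return data
        rejected.append(mirror)  # readable, but failed the caller's check
    return None
```

Returning None when no copy passes lets the caller report an unrecoverable error instead of silently accepting bad data.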

ric
