self healing of MD raid

linux-raid.vger.kernel.org archive mirror

* self healing of MD raid
@ 2015-06-02 17:22 keld
From: keld @ 2015-06-02 17:22 UTC
  To: linux-raid

Hi list

I wonder whether MD RAID software is self-healing. That is, if a
read operation gets an I/O error, can the logical sector of the RAID
be recreated from the other sector(s) of the array and then written
back to the block that gave the read error?

This could work both for the mirrored RAID types and for the
parity-oriented RAID types.

Is that implemented in MD RAID?

Similarly, the self-healing process could be part of the background
monitoring processes.

Best regards
keld

* Re: self healing of MD raid
@ 2015-06-02 17:53 ` Robin Hill
From: Robin Hill @ 2015-06-02 17:53 UTC
  To: keld; +Cc: linux-raid

On Tue Jun 02, 2015 at 07:22:36PM +0200, keld@keldix.com wrote:

> Hi list
> 
> I wonder whether MD RAID software is self-healing. That is, if a
> read operation gets an I/O error, can the logical sector of the RAID
> be recreated from the other sector(s) of the array and then written
> back to the block that gave the read error?
> 
> This could work both for the mirrored RAID types and for the
> parity-oriented RAID types.
> 
> Is that implemented in MD RAID?
> 
> Similarly, the self-healing process could be part of the background
> monitoring processes.
> 
> Best regards
> keld

Yes, this is implemented as standard for all forms of RAID with
redundant data (parity/mirror). A read error will automatically trigger
a rewrite of the faulty block with data recovered from the other
members. This rewrite should also trigger a remapping within the drive
if the original block proves to be unwritable as well.
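
For the parity RAID types, the recovery described above comes down to
XOR arithmetic: with a single parity chunk (RAID-5 style), the parity is
the XOR of the data chunks, so any one unreadable chunk equals the XOR
of the parity with the surviving chunks. The toy sketch below, in POSIX
sh, assumes each "chunk" is a single byte; md itself operates on whole
stripes, and the second syndrome used by RAID-6 is not modelled. For the
mirrored types no arithmetic is needed, since the block is simply copied
from another member.

```shell
# Toy model of single-parity (RAID-5 style) reconstruction.
# Each chunk is one byte held in a shell integer; the XOR
# relationship is the same one applied across a real stripe.

# parity CHUNK... : XOR of all the given chunks
parity() {
    p=0
    for d in "$@"; do
        p=$(( $p ^ $d ))
    done
    echo "$p"
}

# rebuild PARITY SURVIVOR... : recover the single missing chunk.
# XOR is its own inverse, so this is the same fold as parity().
rebuild() {
    parity "$@"
}

d0=0x5A d1=0x3C d2=0xF0
p=$(parity "$d0" "$d1" "$d2")

# Pretend the read of d1 failed; rebuild it from parity + d0 + d2.
r=$(rebuild "$p" "$d0" "$d2")
printf 'parity=0x%02X rebuilt d1=0x%02X\n' "$p" "$r"
# prints: parity=0x96 rebuilt d1=0x3C
```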

Running a regular check (echo check > /sys/block/mdX/md/sync_action)
will do a full read of all active members in an array and therefore
trigger rewrites for any unreadable blocks. This is often set up as part
of the standard distro cron jobs, but should be set up manually if not.
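
The same check can be scripted for every array at once. This is a
minimal sketch in POSIX sh, assuming the standard md sysfs attributes
`sync_action` (which reads `idle` when no resync, recovery or check is
running) and `mismatch_cnt` (the number of sectors the last check found
that would need rewriting). The function names and the `SYSFS_ROOT`
override, added only so the code can be tested against a fake tree, are
hypothetical. On a real system it must run as root.

```shell
# Hypothetical helpers: scrub all md arrays, then report mismatches.
# SYSFS_ROOT defaults to /sys; override it only for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Start a "check" on every array that is currently idle.
scrub_all_md() {
    for action in "$SYSFS_ROOT"/block/md*/md/sync_action; do
        [ -e "$action" ] || continue    # glob matched nothing: no arrays
        if [ "$(cat "$action")" != idle ]; then
            # a resync/recovery/check is already running; leave it alone
            echo "skipping $action (busy)" >&2
            continue
        fi
        echo check > "$action"
    done
}

# Wait for each array to return to idle, then print its mismatch_cnt.
report_mismatches() {
    for md in "$SYSFS_ROOT"/block/md*/md; do
        [ -d "$md" ] || continue
        while [ "$(cat "$md/sync_action")" != idle ]; do
            sleep 60
        done
        dev=${md%/md}
        dev=${dev##*/}
        echo "$dev mismatch_cnt=$(cat "$md/mismatch_cnt")"
    done
}
```

These could be run in sequence from a weekly or monthly cron job.
Before adding one, check whether the distribution already ships such a
job; Debian's mdadm package, for instance, includes a `checkarray`
script that is run from cron.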

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

* Re: self healing of MD raid
@ 2015-06-02 18:01   ` Alireza Haghdoost
From: Alireza Haghdoost @ 2015-06-02 18:01 UTC
  To: keld, Linux RAID

Robin,

Do you know what MD does if it cannot recover the
faulty block from the other members? Assuming not enough members are
online, does it just print a warning in dmesg? Does anything in
the MD layer keep track of the number of such corruption events?

--Alireza



* Re: self healing of MD raid
@ 2015-06-02 19:14     ` Robin Hill
From: Robin Hill @ 2015-06-02 19:14 UTC
  To: Alireza Haghdoost; +Cc: keld, Linux RAID

On Tue Jun 02, 2015 at 01:01:31PM -0500, Alireza Haghdoost wrote:
> Do you know what MD does if it cannot recover the
> faulty block from the other members? Assuming not enough members are
> online, does it just print a warning in dmesg? Does anything in
> the MD layer keep track of the number of such corruption events?
> 
> --Alireza
> 

If the faulty block cannot be rebuilt from the other members then a read
error is passed on to the application and the array keeps running (the
same way a normal block device would handle a read error).

If you have a bad block log on the array member (a relatively new
feature) then it will record that the block is invalid. Otherwise I
don't think there's any tracking within the md layer - you'd need to
fall back on whatever tracking there is on the underlying block device
(e.g. SMART data).
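
A sketch of where to look, in POSIX sh. It assumes the sysfs layout
described in the kernel's md documentation, where each array member
appears as `/sys/block/mdX/md/dev-<name>/` and its `bad_blocks` file
lists the recorded bad ranges as a start sector and a length. The
function names and the `SYSFS_ROOT` override (used only for testing) are
hypothetical; the SMART attribute names are the common ATA ones printed
by `smartctl -A`, and not every drive reports all of them.

```shell
# Hypothetical helpers for inspecting recorded bad sectors.
# SYSFS_ROOT defaults to /sys; override it only for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# md_bad_blocks ARRAY: print md's bad block log for every member of
# ARRAY (e.g. md0). Each line of dev-*/bad_blocks is "start length",
# both in sectors; an empty file produces no output.
md_bad_blocks() {
    for f in "$SYSFS_ROOT/block/$1/md"/dev-*/bad_blocks; do
        [ -e "$f" ] || continue          # no members or no such file
        member=${f%/bad_blocks}
        member=${member##*/dev-}
        while read -r start len; do
            echo "$1 $member start=$start sectors=$len"
        done < "$f"
    done
}

# smart_sector_counts: read `smartctl -A` output on stdin and print the
# raw value (10th column) of the sector-health attributes.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}
```

For example, `md_bad_blocks md0` lists the log for each member of md0,
and `smartctl -A /dev/sda | smart_sector_counts` summarises one drive.
mdadm can also read the log from a member's superblock with
`mdadm --examine-badblocks /dev/sda1`, for metadata formats that support
it (such as 1.x).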

Cheers,
    Robin