* self healing of MD raid
From: keld @ 2015-06-02 17:22 UTC
To: linux-raid

Hi list

I wonder if MD RAID software is kind of self healing.
That is, if a read operation gets an IO error, then the logical
sector of the RAID can be recreated from the other sector(s)
of the raid, and then written out on the block which gave a read error.

This could work both for the mirrored RAID types, and for the
parity oriented RAID types.

Is that implemented in MD RAID?

Similarly, the self healing process could be part of the monitoring
background processes.

Best regards
keld

* Re: self healing of MD raid
From: Robin Hill @ 2015-06-02 17:53 UTC
To: keld
Cc: linux-raid

On Tue Jun 02, 2015 at 07:22:36PM +0200, keld@keldix.com wrote:

> Hi list
>
> I wonder if MD RAID software is kind of self healing.
> That is, if a read operation gets an IO error, then the logical
> sector of the RAID can be recreated from the other sector(s)
> of the raid, and then written out on the block which gave a read error.
>
> This could work both for the mirrored RAID types, and for the
> parity oriented RAID types.
>
> Is that implemented in MD RAID?
>
> Similarly, the self healing process could be part of the monitoring
> background processes.
>
> Best regards
> keld

Yes, this is implemented as standard for all forms of RAID with
redundant data (parity/mirror). A read error will automatically trigger
a rewrite of the faulty block with data recovered from the other
members. This rewrite should also trigger a remapping within the drive
if the original block proves to be unwritable as well.

Running a regular check (echo check > /sys/block/mdX/md/sync_action)
will do a full read of all active members in an array and therefore
trigger rewrites for any unreadable blocks. This is often set up as part
of the standard distro cron jobs, but should be set up manually if not.

Cheers,
    Robin
-- 
     ___
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

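The check Robin describes can be wrapped in a small script for a cron
job. What follows is only a sketch under stated assumptions: the
function names and the SYSFS_ROOT variable are made up for illustration
(SYSFS_ROOT exists so the functions can be pointed at a scratch
directory), and the only kernel interfaces used are the sync_action and
mismatch_cnt files under /sys/block/mdX/md/. Note that mismatch_cnt
reports inconsistencies found between copies or parity during a check,
which is a separate matter from the rewriting of unreadable blocks.

```shell
# Sketch only: SYSFS_ROOT and the function names are illustrative,
# not part of mdadm or the kernel.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

# Start a "check" pass on one array, e.g.: md_start_check md0
# Returns 1 if sync_action is missing or not writable,
# and 2 if the array is already busy with another sync action.
md_start_check() {
    md_dir="$SYSFS_ROOT/$1/md"
    if [ ! -w "$md_dir/sync_action" ]; then
        echo "no writable sync_action for $1" >&2
        return 1
    fi
    current=$(cat "$md_dir/sync_action")
    if [ "$current" != "idle" ]; then
        echo "$1 is busy ($current), not starting a check" >&2
        return 2
    fi
    echo check > "$md_dir/sync_action"
}

# Print the mismatch counter left behind by the most recent check.
md_mismatch_count() {
    cat "$SYSFS_ROOT/$1/md/mismatch_cnt"
}
```

Run as root, something like `md_start_check md0` from a weekly cron
entry would do the same as the echo command above, with the extra
guard that it leaves an array alone while a resync or recovery is
already running.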
* Re: self healing of MD raid
From: Alireza Haghdoost @ 2015-06-02 18:01 UTC
To: keld, Linux RAID

Robin,

Do you know what MD would do if it cannot recover the faulty block from
the other members? Assuming not enough members are online, does it just
print a warning in dmesg? Does anything in the MD layer keep track of
the number of corruption events like this?

--Alireza

On Tue, Jun 2, 2015 at 12:53 PM, Robin Hill <robin@robinhill.me.uk> wrote:
> Yes, this is implemented as standard for all forms of RAID with
> redundant data (parity/mirror). A read error will automatically trigger
> a rewrite of the faulty block with data recovered from the other
> members. This rewrite should also trigger a remapping within the drive
> if the original block proves to be unwritable as well.
>
> Running a regular check (echo check > /sys/block/mdX/md/sync_action)
> will do a full read of all active members in an array and therefore
> trigger rewrites for any unreadable blocks. This is often set up as part
> of the standard distro cron jobs, but should be set up manually if not.
>
> Cheers,
>     Robin

* Re: self healing of MD raid
From: Robin Hill @ 2015-06-02 19:14 UTC
To: Alireza Haghdoost
Cc: keld, Linux RAID

On Tue Jun 02, 2015 at 01:01:31PM -0500, Alireza Haghdoost wrote:

> Do you know what would be the MD action if it cannot recover the
> faulty block from the other members? Assuming not enough members are
> online, does it just print a warning in the dmesg? Does anything in
> the MD layer keep track of the number of corruption events like this?
>
> --Alireza

If the faulty block cannot be rebuilt from the other members then a read
error is passed on to the application and the array keeps running (the
same way a normal block device would handle a read error). If you have a
bad block log on the array member (a relatively new feature) then it
will record that the block is invalid. Otherwise I don't think there's
any tracking within the md layer - you'd need to fall back on whatever
tracking there is on the underlying block device (e.g. SMART data).

Cheers,
    Robin
-- 
     ___
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

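For anyone wanting to look at what is recorded per member, here is a
rough sketch, not a definitive tool. It assumes the per-member sysfs
layout described in the kernel's md documentation, namely a dev-XXX
directory under /sys/block/mdX/md/ holding a bad_blocks file (one
"start length" pair in sectors per line) and an errors file (an
approximate count of read errors that did not get the device evicted).
Whether those files are present depends on kernel version and metadata
format, so the script prints "n/a" when one is missing. The function
name and SYSFS_ROOT are hypothetical.

```shell
# Sketch only: SYSFS_ROOT and md_member_report are illustrative names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

# For each member of an array, print one line of the form:
#   <member> errors=<count> bad_block_ranges=<count>
# Usage: md_member_report md0
md_member_report() {
    for dev_dir in "$SYSFS_ROOT/$1"/md/dev-*; do
        # Skip the literal pattern if the glob matched nothing.
        [ -d "$dev_dir" ] || continue
        member=${dev_dir##*/dev-}
        errors=$(cat "$dev_dir/errors" 2>/dev/null || echo "n/a")
        if [ -r "$dev_dir/bad_blocks" ]; then
            # Count non-empty lines; grep exits non-zero when the count is 0.
            ranges=$(grep -c . "$dev_dir/bad_blocks" || true)
        else
            ranges="n/a"
        fi
        echo "$member errors=$errors bad_block_ranges=$ranges"
    done
}
```

The SMART side that Robin mentions lives outside md entirely, so it is
left out here; the drive's own counters would have to be read with a
separate tool.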
Thread overview: 4+ messages
2015-06-02 17:22 self healing of MD raid keld
2015-06-02 17:53 ` Robin Hill
2015-06-02 18:01   ` Alireza Haghdoost
2015-06-02 19:14     ` Robin Hill