* Question about raid5 disk recovery logic
From: Alexander Lyakas @ 2012-07-01 7:08 UTC
To: linux-raid

Hi everybody,

I am trying to understand what happens when raid5 is recovering a disk and a write comes to a stripe that has not been recovered yet. Does md first reconstruct the missing chunk and then apply the write, or is the write first applied as if the array were still degraded (and not recovering), with the missing chunk reconstructed only later, when the md_do_sync() loop gets to this area?

I am looking at the stripe handling logic (kernel 2.6.38). Can anybody please point me at the path that handle_stripe5() takes in that case?

Thanks,
Alex.
* Re: Question about raid5 disk recovery logic
From: NeilBrown @ 2012-07-01 8:00 UTC
To: Alexander Lyakas; +Cc: linux-raid

Hi Alex,

The stripe is still degraded, so md/raid5 treats it like a write to a degraded array. Exactly what happens depends on which block is being written.

If the block being written would be stored on the recovering device, then md will perform a reconstruct-write. It will read the other data blocks, calculate the parity, and write out the parity and the changed data.

Similarly, if the parity block is on the recovering device, a reconstruct-write will be needed.

If some other block is being written, md will do a read-modify-write to calculate the new parity and then write out the parity and data. In this case the block on the recovering device will not be written.

I hope that clarifies the situation.

NeilBrown
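The two parity paths described in the reply above can be illustrated with a small model. This is only a sketch of the XOR arithmetic, assuming a stripe of three data blocks plus one parity block; it is not the kernel's handle_stripe5() logic, and the names xor_blocks, reconstruct_write and read_modify_write are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_write(other_data: list, new_block: bytes) -> bytes:
    """Hypothetical model of a reconstruct-write: the new parity is computed
    from the other data blocks plus the new block. The old contents of the
    block being replaced are never read."""
    return xor_blocks(new_block, *other_data)


def read_modify_write(old_block: bytes, old_parity: bytes, new_block: bytes) -> bytes:
    """Hypothetical model of a read-modify-write: only the old copy of the
    written block and the old parity are read."""
    return xor_blocks(old_parity, old_block, new_block)


# Logical stripe contents. Assume d1 lives on the recovering device, so its
# on-disk copy is not valid yet and its value exists only implicitly via parity.
d0, d1, d2 = b"\x11" * 4, b"\x22" * 4, b"\x44" * 4
parity = xor_blocks(d0, d1, d2)

# Case A: the write targets d1 (the recovering device): reconstruct-write.
new_d1 = b"\xaa" * 4
parity_a = reconstruct_write([d0, d2], new_d1)

# Case B: the write targets d0 (a healthy device): read-modify-write.
# The recovering device is not touched, yet d1 stays recoverable from parity.
new_d0 = b"\x0f" * 4
parity_b = read_modify_write(d0, parity, new_d0)
recovered_d1 = xor_blocks(parity_b, new_d0, d2)
```

In case B, recovered_d1 equals the original d1, which is why the later recovery pass can still rebuild that block even though the write never touched the recovering device.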
* Re: Question about raid5 disk recovery logic
From: Alexander Lyakas @ 2012-07-01 13:36 UTC
To: NeilBrown; +Cc: linux-raid

Thanks, Neil! That clarifies.

Does this also mean that when md_do_sync() gets to such an already-reconstructed stripe, it might reconstruct it once again, unless the stripe stays in the stripe cache?

Thanks for helping,
Alex.
* Re: Question about raid5 disk recovery logic
From: NeilBrown @ 2012-07-01 21:44 UTC
To: Alexander Lyakas; +Cc: linux-raid

On Sun, 1 Jul 2012 16:36:51 +0300 Alexander Lyakas <alex.bolshoy@gmail.com> wrote:

> Does this also mean, that when md_do_sync() gets to such
> already-reconstructed stripe, it might reconstruct it once again,
> unless the stripe stays in the stripe cache?

Yes, it will reconstruct it, and that might be "again" if the reconstructed block has already been written. If the stripe is still in the cache, I think it will still write that block out again, but won't need to reconstruct it.

NeilBrown
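The behaviour described in this reply can be modelled in a few lines. This is a toy sketch, not md's recovery code: resync_stripe and its cached argument are hypothetical names, and the cached-stripe case mirrors a statement the reply itself hedges with "I think".

```python
from functools import reduce
from typing import Optional, Sequence, Tuple


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def resync_stripe(
    surviving: Sequence[bytes], cached: Optional[bytes] = None
) -> Tuple[bytes, bool]:
    """Toy model of the recovery pass reaching one stripe.

    Returns the block to write to the recovering device and a flag saying
    whether it had to be recomputed. If an up-to-date copy is still cached
    (assumption: represented here by `cached`), it is written out again
    without recomputation. Otherwise it is rebuilt from the surviving data
    and parity blocks, even if an earlier write already put it on disk.
    """
    if cached is not None:
        return cached, False
    return xor_blocks(*surviving), True


# The surviving blocks are d0, d2 and the parity of a stripe whose missing
# block is 0x22 (0x11 ^ 0x22 ^ 0x44 == 0x77).
surviving = [b"\x11", b"\x44", b"\x77"]
rebuilt, recomputed = resync_stripe(surviving)
rewritten, recomputed_again = resync_stripe(surviving, cached=b"\x22")
```

Both calls produce the same block for the recovering device. The only difference is whether the XOR over the surviving blocks has to be repeated.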
* Re: Question about raid5 disk recovery logic
From: Alexander Lyakas @ 2012-07-02 8:32 UTC
To: NeilBrown; +Cc: linux-raid

Thanks, Neil, for the clear explanation.

Alex.