* using the raid6check report
  From: Eyal Lebedinsky @ 2016-12-23  0:56 UTC
  To: list linux-raid

From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
it is to run a check around the stripe (I have a background job printing the mismatch
count and /proc/mdstat regularly) which should report the same count.

I now drill into the fs to find which files use this area, deal with them and delete
the bad ones. I then run a repair on that small area.

I have now found out about raid6check, which can actually tell me which disk holds the bad data.
This is something raid6 should be able to do assuming a single error.
Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
that disk.

Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
bad data invisible to a 'check'? I recall this being the case in the past.

'man md' still says
	For RAID5/RAID6 new parity blocks are written
I think RAID6 can do better.

TIA

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
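For illustration, a minimal C sketch of the kind of narrow 'check' described above, driven through the md sysfs interface. The array name (md0), the sector window, and the missing wait-for-idle loop are assumptions made for the example; sync_min/sync_max take array sectors, and a real tool would wait for sync_action to return to idle before reading mismatch_cnt back.

/*
 * Illustrative sketch only: run a ranged "check" on one md array via
 * sysfs and read back mismatch_cnt.  The array name (md0), the sector
 * window and the missing wait-for-idle loop are assumptions.
 */
#include <stdio.h>
#include <stdlib.h>

static void write_md_attr(const char *attr, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/md0/md/%s", attr);
	f = fopen(path, "w");
	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

int main(void)
{
	char buf[64];
	FILE *f;

	/* limit the scrub window (values are array sectors) */
	write_md_attr("sync_min", "1000000");
	write_md_attr("sync_max", "1008192");
	write_md_attr("sync_action", "check");

	/* a real tool would now poll sync_action/sync_completed until idle */

	f = fopen("/sys/block/md0/md/mismatch_cnt", "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("mismatch_cnt: %s", buf);
	if (f)
		fclose(f);
	return 0;
}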
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 17:40 UTC
  To: Eyal Lebedinsky; +Cc: list linux-raid

On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
> From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
> it is to run a check around the stripe (I have a background job printing the mismatch
> count and /proc/mdstat regularly) which should report the same count.
>
> I now drill into the fs to find which files use this area, deal with them and delete
> the bad ones. I then run a repair on that small area.
>
> I have now found out about raid6check, which can actually tell me which disk holds the bad data.
> This is something raid6 should be able to do assuming a single error.
> Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
> that disk.
>
> Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
> bad data invisible to a 'check'? I recall this being the case in the past.

"repair" should fix the data which is assumed
to be wrong.
It should not simply correct P+Q, but really
find out which disk is not OK and fix it.

> 'man md' still says
> 	For RAID5/RAID6 new parity blocks are written
> I think RAID6 can do better.
>
> TIA
>
> --
> Eyal Lebedinsky (eyal@eyal.emu.id.au)

--

piergiorgio
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-08 20:36 UTC
  Cc: list linux-raid

On 09/01/17 04:40, Piergiorgio Sartor wrote:
> On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
>> From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
>> it is to run a check around the stripe (I have a background job printing the mismatch
>> count and /proc/mdstat regularly) which should report the same count.
>>
>> I now drill into the fs to find which files use this area, deal with them and delete
>> the bad ones. I then run a repair on that small area.
>>
>> I have now found out about raid6check, which can actually tell me which disk holds the bad data.
>> This is something raid6 should be able to do assuming a single error.
>> Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
>> that disk.
>>
>> Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
>> bad data invisible to a 'check'? I recall this being the case in the past.
>
> "repair" should fix the data which is assumed

You say "should", as in "it does today" or as in "it needs to change to do this"?
As I noted originally, the man page says it does the simple thing - should the
man page be fixed?

> to be wrong.
> It should not simply correct P+Q, but really
> find out which disk is not OK and fix it.
>
>> 'man md' still says
>> 	For RAID5/RAID6 new parity blocks are written
>> I think RAID6 can do better.
>>
>> TIA
>>
>> --
>> Eyal Lebedinsky (eyal@eyal.emu.id.au)

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 20:46 UTC
  To: Eyal Lebedinsky; +Cc: linux-raid

On Mon, Jan 09, 2017 at 07:36:59AM +1100, Eyal Lebedinsky wrote:
> On 09/01/17 04:40, Piergiorgio Sartor wrote:
> > On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
> > > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
> > > it is to run a check around the stripe (I have a background job printing the mismatch
> > > count and /proc/mdstat regularly) which should report the same count.
> > >
> > > I now drill into the fs to find which files use this area, deal with them and delete
> > > the bad ones. I then run a repair on that small area.
> > >
> > > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
> > > This is something raid6 should be able to do assuming a single error.
> > > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
> > > that disk.
> > >
> > > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
> > > bad data invisible to a 'check'? I recall this being the case in the past.
> >
> > "repair" should fix the data which is assumed
>
> You say "should", as in "it does today" or as in "it needs to change to do this"?
> As I noted originally, the man page says it does the simple thing - should the
> man page be fixed?

"should" as in "it is supposed to do it".

So, as far as I know, "raid6check" with "repair" will
check the parity and try to find errors.
If possible, it will find where the error is, then
re-compute the value and write the corrected data.

Now, this was somehow tested and *should* work.

Another option is just to check for the errors and
see if one drive is constantly at fault.
This will not write anything, so it is safer, but
it will help to see if there are strange things,
before writing to the disk(s).

bye,

pg

> > to be wrong.
> > It should not simply correct P+Q, but really
> > find out which disk is not OK and fix it.
> >
> > > 'man md' still says
> > > 	For RAID5/RAID6 new parity blocks are written
> > > I think RAID6 can do better.
> > >
> > > TIA
> > >
> > > --
> > > Eyal Lebedinsky (eyal@eyal.emu.id.au)
>
> --
> Eyal Lebedinsky (eyal@eyal.emu.id.au)

--

piergiorgio
* Re: using the raid6check report
  From: Wols Lists @ 2017-01-08 21:06 UTC
  To: Piergiorgio Sartor, Eyal Lebedinsky; +Cc: linux-raid

On 08/01/17 20:46, Piergiorgio Sartor wrote:
> "should" as in "it is supposed to do it".
>
> So, as far as I know, "raid6check" with "repair" will
> check the parity and try to find errors.
> If possible, it will find where the error is, then
> re-compute the value and write the corrected data.
>
> Now, this was somehow tested and *should* work.
>
> Another option is just to check for the errors and
> see if one drive is constantly at fault.
> This will not write anything, so it is safer, but
> it will help to see if there are strange things,
> before writing to the disk(s).

Hmmm ...

I've now been thinking about it, and actually I'm not sure it's possible
even with raid6, to correct a corrupt read. The thing is, raid protects
against a failure to read - if a sector fails, the parity will re-create
it. But if a data sector is corrupted, how is raid to know WHICH sector?

If one of the parity sectors is corrupted, it's easy. Calculate parity
from the data, and either P or Q will be wrong, so fix it. But if it's a
*data* sector that's corrupted, both P and Q will be wrong. How easy is
it to work back from that, and work out *which* data sector is wrong? My
fu makes me think you can't, though I could quite easily be wrong :-)

But should that even happen, unless a disk is on its way out, anyway? I
remember years ago, back in the 80s, our minicomputers had
error-correction in the drive. I don't remember the algorithm, but it
wrote 16-bit words to disk - each an 8-bit data byte. The first half was
the original data, and the second half was some parity pattern such that
for any single-bit corruption you knew which half was corrupt, and you
could throw away the corrupt parity, or recreate the correct data from
the parity. Even with a 2-bit error I think it was >90% detection and
recreation. I can't imagine something like that not being in drive
hardware today.

Cheers,
Wol
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-08 21:20 UTC
  To: linux-raid

On 09/01/17 08:06, Wols Lists wrote:
> On 08/01/17 20:46, Piergiorgio Sartor wrote:
[trim]
> If one of the parity sectors is corrupted, it's easy. Calculate parity
> from the data, and either P or Q will be wrong, so fix it. But if it's a
> *data* sector that's corrupted, both P and Q will be wrong. How easy is
> it to work back from that, and work out *which* data sector is wrong? My
> fu makes me think you can't, though I could quite easily be wrong :-)

My understanding of RAID6 is that you CAN say which of the data/P/Q is wrong
if one assumes only one is wrong. Is this not what raid6check claims to do?
	"In case of parity mismatches, raid6check reports, if possible,
	which component drive could be responsible."

> But should that even happen, unless a disk is on its way out, anyway?

Not so. I get, from time to time, non-zero mismatches where I saw no disk
errors of any sort in kernel messages or in smart status.

> I remember years ago, back in the 80s, our minicomputers had
> error-correction in the drive. I don't remember the algorithm, but it
> wrote 16-bit words to disk - each an 8-bit data byte. The first half was
> the original data, and the second half was some parity pattern such that
> for any single-bit corruption you knew which half was corrupt, and you
> could throw away the corrupt parity, or recreate the correct data from
> the parity. Even with a 2-bit error I think it was >90% detection and
> recreation. I can't imagine something like that not being in drive
> hardware today.

The disk thinks it has good data but md thinks not. Maybe bad data was written
due to some other bug? A corner case when the system rebooted unexpectedly?
Maybe the controller corrupted the data?

> Cheers,
> Wol

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 21:43 UTC
  To: Wols Lists; +Cc: Piergiorgio Sartor, Eyal Lebedinsky, linux-raid

On Sun, Jan 08, 2017 at 09:06:14PM +0000, Wols Lists wrote:
> On 08/01/17 20:46, Piergiorgio Sartor wrote:
> > "should" as in "it is supposed to do it".
> >
> > So, as far as I know, "raid6check" with "repair" will
> > check the parity and try to find errors.
> > If possible, it will find where the error is, then
> > re-compute the value and write the corrected data.
> >
> > Now, this was somehow tested and *should* work.
> >
> > Another option is just to check for the errors and
> > see if one drive is constantly at fault.
> > This will not write anything, so it is safer, but
> > it will help to see if there are strange things,
> > before writing to the disk(s).
>
> Hmmm ...
>
> I've now been thinking about it, and actually I'm not sure it's possible
> even with raid6, to correct a corrupt read. The thing is, raid protects
> against a failure to read - if a sector fails, the parity will re-create
> it. But if a data sector is corrupted, how is raid to know WHICH sector?

Here is all you need to know:

http://ftp.nluug.nl/ftp/ftp/os/Linux/system/kernel/people/hpa/raid6.pdf

bye,

pg

> If one of the parity sectors is corrupted, it's easy. Calculate parity
> from the data, and either P or Q will be wrong, so fix it. But if it's a
> *data* sector that's corrupted, both P and Q will be wrong. How easy is
> it to work back from that, and work out *which* data sector is wrong? My
> fu makes me think you can't, though I could quite easily be wrong :-)
>
> But should that even happen, unless a disk is on its way out, anyway? I
> remember years ago, back in the 80s, our minicomputers had
> error-correction in the drive. I don't remember the algorithm, but it
> wrote 16-bit words to disk - each an 8-bit data byte. The first half was
> the original data, and the second half was some parity pattern such that
> for any single-bit corruption you knew which half was corrupt, and you
> could throw away the corrupt parity, or recreate the correct data from
> the parity. Even with a 2-bit error I think it was >90% detection and
> recreation. I can't imagine something like that not being in drive
> hardware today.
>
> Cheers,
> Wol

--

piergiorgio
* Re: using the raid6check report
  From: Wols Lists @ 2017-01-08 20:52 UTC
  To: Piergiorgio Sartor, Eyal Lebedinsky; +Cc: list linux-raid

On 08/01/17 17:40, Piergiorgio Sartor wrote:
> On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
>> > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
>> > it is to run a check around the stripe (I have a background job printing the mismatch
>> > count and /proc/mdstat regularly) which should report the same count.
>> >
>> > I now drill into the fs to find which files use this area, deal with them and delete
>> > the bad ones. I then run a repair on that small area.
>> >
>> > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
>> > This is something raid6 should be able to do assuming a single error.
>> > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
>> > that disk.
>> >
>> > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
>> > bad data invisible to a 'check'? I recall this being the case in the past.
> "repair" should fix the data which is assumed
> to be wrong.
> It should not simply correct P+Q, but really
> find out which disk is not OK and fix it.
>
Having just looked at the man page and the source to raid6check as found
online ...

"man raid6check" says that it does not write to the disk. Looking at the
source, it appears to have code that is intended to write to the disk
and repair the stripe. So what's going on?

I can add it to the wiki as a little programming project, but it would
be nice to know the exact status of things - my raid-fu isn't good
enough at present to read the code and work out what's going on.

It would be nice to be able to write "parity-check" or somesuch to
sync_action, and then for raid5 it would check and update parity, or
raid6 it would check and correct data/parity.

Cheers,
Wol
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 21:41 UTC
  To: Wols Lists; +Cc: Piergiorgio Sartor, Eyal Lebedinsky, list linux-raid

On Sun, Jan 08, 2017 at 08:52:40PM +0000, Wols Lists wrote:
> On 08/01/17 17:40, Piergiorgio Sartor wrote:
> > On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
> >> > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
> >> > it is to run a check around the stripe (I have a background job printing the mismatch
> >> > count and /proc/mdstat regularly) which should report the same count.
> >> >
> >> > I now drill into the fs to find which files use this area, deal with them and delete
> >> > the bad ones. I then run a repair on that small area.
> >> >
> >> > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
> >> > This is something raid6 should be able to do assuming a single error.
> >> > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
> >> > that disk.
> >> >
> >> > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
> >> > bad data invisible to a 'check'? I recall this being the case in the past.
> > "repair" should fix the data which is assumed
> > to be wrong.
> > It should not simply correct P+Q, but really
> > find out which disk is not OK and fix it.
> >
> Having just looked at the man page and the source to raid6check as found
> online ...
>
> "man raid6check" says that it does not write to the disk. Looking at the
> source, it appears to have code that is intended to write to the disk
> and repair the stripe. So what's going on?

There was a patch adding the write capability,
but likely only for the C code, not the man page.

> I can add it to the wiki as a little programming project, but it would
> be nice to know the exact status of things - my raid-fu isn't good
> enough at present to read the code and work out what's going on.
>
> It would be nice to be able to write "parity-check" or somesuch to
> sync_action, and then for raid5 it would check and update parity, or
> raid6 it would check and correct data/parity.

At that time, the agreement with Neil was to do
such things in user space and not inside the
md raid "driver" (so to speak) in kernel space.

So, as far as I know, the kernel md code can
check the parity and, possibly, re-write.

"raid6check" can detect errors *and*, if only one,
where it is, so a "data repair" capability is possible.

bye,

pg

> Cheers,
> Wol

--

piergiorgio
* Re: using the raid6check report
  From: NeilBrown @ 2017-01-08 22:39 UTC
  To: Wols Lists; +Cc: Piergiorgio Sartor, Eyal Lebedinsky, list linux-raid

On Mon, Jan 09 2017, Piergiorgio Sartor wrote:
> On Sun, Jan 08, 2017 at 08:52:40PM +0000, Wols Lists wrote:
>> On 08/01/17 17:40, Piergiorgio Sartor wrote:
>> > On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
>> >> > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
>> >> > it is to run a check around the stripe (I have a background job printing the mismatch
>> >> > count and /proc/mdstat regularly) which should report the same count.
>> >> >
>> >> > I now drill into the fs to find which files use this area, deal with them and delete
>> >> > the bad ones. I then run a repair on that small area.
>> >> >
>> >> > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
>> >> > This is something raid6 should be able to do assuming a single error.
>> >> > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
>> >> > that disk.
>> >> >
>> >> > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
>> >> > bad data invisible to a 'check'? I recall this being the case in the past.
>> > "repair" should fix the data which is assumed
>> > to be wrong.
>> > It should not simply correct P+Q, but really
>> > find out which disk is not OK and fix it.
>> >
>> Having just looked at the man page and the source to raid6check as found
>> online ...
>>
>> "man raid6check" says that it does not write to the disk. Looking at the
>> source, it appears to have code that is intended to write to the disk
>> and repair the stripe. So what's going on?
>
> There was a patch adding the write capability,
> but likely only for the C code, not the man page.
>
>> I can add it to the wiki as a little programming project, but it would
>> be nice to know the exact status of things - my raid-fu isn't good
>> enough at present to read the code and work out what's going on.
>>
>> It would be nice to be able to write "parity-check" or somesuch to
>> sync_action, and then for raid5 it would check and update parity, or
>> raid6 it would check and correct data/parity.
>
> At that time, the agreement with Neil was to do
> such things in user space and not inside the
> md raid "driver" (so to speak) in kernel space.

This is correct.

With RAID6 it is possible to determine, with high reliability, if a
single device is corrupt. There is a mathematical function that can be
calculated over a set of bytes, one from each device. If the result is
a number less than the number of devices in the array (including P and
Q), then the device with that index number is corrupt (or at least, both
P and Q can be made correct again by simply changing that one byte). If
we compute that function over all 512 (or 4096) bytes in a stripe and
they all report the same device (or report that there are no errors for
some bytes) then it is reasonable to assume the block on the identified
device is corrupt.

raid6check does this and provides very useful functionality for a
sysadmin to determine which device is corrupt, and to then correct that
if they wish.

However, I am not comfortable with having that be done transparently
without any confirmation from the sysadmin. This is because I don't
have a credible threat model for how the corruption could have happened
in the first place. I understand how hardware failure can make a whole
device inaccessible, and how media errors can cause a single block to be
unreadable. But I don't see a "most likely way" that a single block can
become corrupt.

Without a clear model, I cannot determine what the correct response is.
The corruption might have happened on the write path ... so re-writing
the block could just cause more corruption. It could have happened on
the read path, so re-writing won't change anything. It could have
happened in memory, so nothing can be trusted. It could have happened
due to buggy code. Without knowing the cause with high probability, it
is not safe to try to fix anything.

The most likely cause for incorrect P and Q is if the machine crashed
while a stripe was being updated. In that case, simply updating P and Q
is the correct response. So that is the only response that the kernel
performs.

For more reading, see http://neil.brown.name/blog/20100211050355

NeilBrown

> So, as far as I know, the kernel md code can
> check the parity and, possibly, re-write.
>
> "raid6check" can detect errors *and*, if only one,
> where it is, so a "data repair" capability is possible.
>
> bye,
>
> pg
>
>> Cheers,
>> Wol
>
> --
>
> piergiorgio
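To illustrate the per-byte calculation described above, here is a small, self-contained C sketch (not code taken from raid6check) using the GF(2^8) arithmetic of the md RAID6 layout, with generator 0x02 and the polynomial 0x11d; the disk count and the demonstration values in main() are arbitrary assumptions.

/*
 * Illustrative sketch of RAID6 single-error location for one byte
 * position, not the raid6check implementation.  GF(2^8) with the
 * polynomial 0x11d and generator 0x02, as used by md RAID6.
 */
#include <stdint.h>
#include <stdio.h>

static uint8_t gf_log[256];	/* gf_log[g^i] = i; index 0 is never looked up */

static void gf_init(void)
{
	uint8_t x = 1;
	for (int i = 0; i < 255; i++) {
		gf_log[x] = (uint8_t)i;
		x = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));	/* x *= 0x02 */
	}
}

/* P (XOR) and Q (Reed-Solomon) syndromes over one byte per data disk */
static void compute_pq(const uint8_t *d, int ndata, uint8_t *p, uint8_t *q)
{
	uint8_t pp = 0, qq = 0;
	for (int i = ndata - 1; i >= 0; i--) {
		pp ^= d[i];
		qq = (uint8_t)(((qq << 1) ^ ((qq & 0x80) ? 0x1d : 0)) ^ d[i]);
	}
	*p = pp;
	*q = qq;
}

/*
 * Returns -1 if this byte position is consistent, 0..ndata-1 for a
 * suspected corrupt data disk, ndata if only P disagrees, ndata+1 if
 * only Q disagrees, -2 if no single disk can explain the mismatch.
 */
static int locate_error(const uint8_t *d, int ndata, uint8_t p, uint8_t q)
{
	uint8_t pc, qc, pd, qd;

	compute_pq(d, ndata, &pc, &qc);
	pd = pc ^ p;
	qd = qc ^ q;

	if (!pd && !qd)
		return -1;
	if (pd && !qd)
		return ndata;		/* P block suspect */
	if (!pd && qd)
		return ndata + 1;	/* Q block suspect */

	/* one bad data disk z implies qd = g^z * pd, so z = log(qd) - log(pd) */
	int z = (gf_log[qd] - gf_log[pd] + 255) % 255;
	return z < ndata ? z : -2;	/* -2: not a single-disk error */
}

int main(void)
{
	uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };	/* one byte per data disk */
	uint8_t p, q;

	gf_init();
	compute_pq(d, 4, &p, &q);	/* P and Q as they would sit on disk */
	d[2] ^= 0x5a;			/* corrupt the byte read from data disk 2 */
	printf("suspect: %d\n", locate_error(d, 4, p, q));	/* prints "suspect: 2" */
	return 0;
}

A checker would repeat this over every byte of the chunk and only trust the result when all byte positions point at the same slot (or show no error), which is the "all report the same device" condition described above.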
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-09  0:32 UTC
  To: list linux-raid

On 09/01/17 09:39, NeilBrown wrote:
> On Mon, Jan 09 2017, Piergiorgio Sartor wrote:
>
[trim]
>>
>> There was a patch adding the write capability,
>> but likely only for the C code, not the man page.
>>
>>> I can add it to the wiki as a little programming project, but it would
>>> be nice to know the exact status of things - my raid-fu isn't good
>>> enough at present to read the code and work out what's going on.
>>>
>>> It would be nice to be able to write "parity-check" or somesuch to
>>> sync_action, and then for raid5 it would check and update parity, or
>>> raid6 it would check and correct data/parity.
>>
>> At that time, the agreement with Neil was to do
>> such things in user space and not inside the
>> md raid "driver" (so to speak) in kernel space.
>
> This is correct.
>
> With RAID6 it is possible to determine, with high reliability, if a
> single device is corrupt. There is a mathematical function that can be
> calculated over a set of bytes, one from each device. If the result is
> a number less than the number of devices in the array (including P and
> Q), then the device with that index number is corrupt (or at least, both
> P and Q can be made correct again by simply changing that one byte). If
> we compute that function over all 512 (or 4096) bytes in a stripe and
> they all report the same device (or report that there are no errors for
> some bytes) then it is reasonable to assume the block on the identified
> device is corrupt.
>
> raid6check does this and provides very useful functionality for a
> sysadmin to determine which device is corrupt, and to then correct that
> if they wish.
>
> However, I am not comfortable with having that be done transparently
> without any confirmation from the sysadmin. This is because I don't
> have a credible threat model for how the corruption could have happened
> in the first place. I understand how hardware failure can make a whole
> device inaccessible, and how media errors can cause a single block to be
> unreadable. But I don't see a "most likely way" that a single block can
> become corrupt.
>
> Without a clear model, I cannot determine what the correct response is.
> The corruption might have happened on the write path ... so re-writing
> the block could just cause more corruption. It could have happened on
> the read path, so re-writing won't change anything. It could have
> happened in memory, so nothing can be trusted. It could have happened
> due to buggy code. Without knowing the cause with high probability, it
> is not safe to try to fix anything.
>
> The most likely cause for incorrect P and Q is if the machine crashed
> while a stripe was being updated. In that case, simply updating P and Q
> is the correct response. So that is the only response that the kernel
> performs.
>
> For more reading, see http://neil.brown.name/blog/20100211050355
>
> NeilBrown

[trim]

I am aware of that discussion and agree with the sentiment (fix in user space).

What I miss is a message from md when a 'check' mismatch is found. Not having
this means I have to run 'raid6check', then after looking at the situation
run 'raid6check autorepair' in the small sections reported as bad. This is time
consuming and risky.

What I resort to doing now is 'cat /proc/mdstat' repeatedly during md 'check'
and use the report as a clue to the location of problem stripes.

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: using the raid6check report
  From: NeilBrown @ 2017-01-09  1:56 UTC
  To: Eyal Lebedinsky, list linux-raid

On Mon, Jan 09 2017, Eyal Lebedinsky wrote:
>
> I am aware of that discussion and agree with the sentiment (fix in user space).

(I primarily provided it for the information of others)

> What I miss is a message from md when a 'check' mismatch is found. Not having
> this means I have to run 'raid6check', then after looking at the situation
> run 'raid6check autorepair' in the small sections reported as bad. This is time
> consuming and risky.

Something like this?

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 69b0a169e43d..f19c38baf2b2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2738,6 +2738,8 @@ static void handle_parity_checks5(raid5_conf_t *conf, struct stripe_head *sh,
 			conf->mddev->resync_mismatches += STRIPE_SECTORS;
 			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
 				/* don't try to repair!! */
+				pr_debug("%s: \"check\" found inconsistency near sector %llu\n",
+					 md_name(conf->mddev), sh->sector);
 				set_bit(STRIPE_INSYNC, &sh->state);
 			else {
 				sh->check_state = check_state_compute_run;

I chose pr_debug() because I didn't want to flood the logs if there are
lots of inconsistencies.
You can selectively enable pr_debug() messages by writing to
  /sys/kernel/debug/dynamic_debug/control
providing you have dynamic debugging compiled in.

Maybe use pr_info_ratelimited() instead??

NeilBrown
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-09  2:13 UTC
  To: list linux-raid

On 09/01/17 12:56, NeilBrown wrote:
> On Mon, Jan 09 2017, Eyal Lebedinsky wrote:
>
>> I am aware of that discussion and agree with the sentiment (fix in user space).
>
> (I primarily provided it for the information of others)
>
>> What I miss is a message from md when a 'check' mismatch is found. Not having
>> this means I have to run 'raid6check', then after looking at the situation
>> run 'raid6check autorepair' in the small sections reported as bad. This is time
>> consuming and risky.
>
> Something like this?
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 69b0a169e43d..f19c38baf2b2 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2738,6 +2738,8 @@ static void handle_parity_checks5(raid5_conf_t *conf, struct stripe_head *sh,
>  			conf->mddev->resync_mismatches += STRIPE_SECTORS;
>  			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
>  				/* don't try to repair!! */
> +				pr_debug("%s: \"check\" found inconsistency near sector %llu\n",
> +					 md_name(conf->mddev), sh->sector);
>  				set_bit(STRIPE_INSYNC, &sh->state);
>  			else {
>  				sh->check_state = check_state_compute_run;
>
> I chose pr_debug() because I didn't want to flood the logs if there are
> lots of inconsistencies.
> You can selectively enable pr_debug() messages by writing to
>   /sys/kernel/debug/dynamic_debug/control
> providing you have dynamic debugging compiled in.

I run Fedora and can see the dynamic debugging control file.

> Maybe use pr_info_ratelimited() instead??

Yes, rate limiting is probably a good idea when we have a really bad day.

> NeilBrown

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
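For completeness, a minimal C sketch of turning such a pr_debug() site on at runtime through the dynamic-debug control file mentioned above; the match expression ("file raid5.c +p") and the debugfs mount point are assumptions for the example, and the kernel must be built with CONFIG_DYNAMIC_DEBUG.

/*
 * Illustrative sketch: enable pr_debug() call sites in drivers/md/raid5.c
 * via dynamic debug.  The match expression and control path are
 * assumptions; requires CONFIG_DYNAMIC_DEBUG and a mounted debugfs.
 */
#include <stdio.h>

int main(void)
{
	const char *ctl = "/sys/kernel/debug/dynamic_debug/control";
	FILE *f = fopen(ctl, "w");

	if (!f) {
		perror(ctl);
		return 1;
	}
	/* "+p" enables printing for matching call sites; "-p" disables it again */
	fprintf(f, "file raid5.c +p\n");
	return fclose(f) == 0 ? 0 : 1;
}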