* using the raid6check report
  From: Eyal Lebedinsky @ 2016-12-23  0:56 UTC
  To: list linux-raid

From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
it is to run a check around the stripe (I have a background job printing the mismatch
count and /proc/mdstat regularly) which should report the same count.

I now drill into the fs to find which files use this area, deal with them and delete
the bad ones. I then run a repair on that small area.

I have now found out about raid6check, which can actually tell me which disk holds the bad data.
This is something raid6 should be able to do assuming a single error.
Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
that disk.

Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
bad data invisible to a 'check'? I recall this being the case in the past.

'man md' still says
	For RAID5/RAID6 new parity blocks are written
I think RAID6 can do better.

TIA

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
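For illustration, a minimal C sketch of the kind of narrow 'check' described above, driven through the md sysfs interface. The array name (md0), the sector window, and the missing wait-for-idle loop are assumptions made for the example; sync_min/sync_max take array sectors, and a real tool would wait for sync_action to return to idle before reading mismatch_cnt back.

/*
 * Illustrative sketch only: run a ranged "check" on one md array via
 * sysfs and read back mismatch_cnt.  The array name (md0), the sector
 * window and the missing wait-for-idle loop are assumptions.
 */
#include <stdio.h>
#include <stdlib.h>

static void write_md_attr(const char *attr, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/md0/md/%s", attr);
	f = fopen(path, "w");
	if (!f || fputs(val, f) == EOF) {
		perror(path);
		exit(1);
	}
	fclose(f);
}

int main(void)
{
	char buf[64];
	FILE *f;

	/* limit the scrub window (values are array sectors) */
	write_md_attr("sync_min", "1000000");
	write_md_attr("sync_max", "1008192");
	write_md_attr("sync_action", "check");

	/* a real tool would now poll sync_action/sync_completed until idle */

	f = fopen("/sys/block/md0/md/mismatch_cnt", "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("mismatch_cnt: %s", buf);
	if (f)
		fclose(f);
	return 0;
}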
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 17:40 UTC
  To: Eyal Lebedinsky; +Cc: list linux-raid

On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
> From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
> it is to run a check around the stripe (I have a background job printing the mismatch
> count and /proc/mdstat regularly) which should report the same count.
>
> I now drill into the fs to find which files use this area, deal with them and delete
> the bad ones. I then run a repair on that small area.
>
> I have now found out about raid6check, which can actually tell me which disk holds the bad data.
> This is something raid6 should be able to do assuming a single error.
> Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
> that disk.
>
> Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
> bad data invisible to a 'check'? I recall this being the case in the past.

"repair" should fix the data which is assumed
to be wrong.
It should not simply correct P+Q, but really
find out which disk is not OK and fix it.

> 'man md' still says
> 	For RAID5/RAID6 new parity blocks are written
> I think RAID6 can do better.
>
> TIA
>
> --
> Eyal Lebedinsky (eyal@eyal.emu.id.au)

--

piergiorgio
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-08 20:36 UTC
  Cc: list linux-raid

On 09/01/17 04:40, Piergiorgio Sartor wrote:
> On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
>> From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
>> it is to run a check around the stripe (I have a background job printing the mismatch
>> count and /proc/mdstat regularly) which should report the same count.
>>
>> I now drill into the fs to find which files use this area, deal with them and delete
>> the bad ones. I then run a repair on that small area.
>>
>> I have now found out about raid6check, which can actually tell me which disk holds the bad data.
>> This is something raid6 should be able to do assuming a single error.
>> Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
>> that disk.
>>
>> Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
>> bad data invisible to a 'check'? I recall this being the case in the past.
>
> "repair" should fix the data which is assumed

You say "should", as in "it does today" or as in "it needs to change to do this"?
As I noted originally, the man page says it does the simple thing - should the
man page be fixed?

> to be wrong.
> It should not simply correct P+Q, but really
> find out which disk is not OK and fix it.
>
>> 'man md' still says
>> 	For RAID5/RAID6 new parity blocks are written
>> I think RAID6 can do better.
>>
>> TIA
>>
>> --
>> Eyal Lebedinsky (eyal@eyal.emu.id.au)

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 20:46 UTC
  To: Eyal Lebedinsky; +Cc: linux-raid

On Mon, Jan 09, 2017 at 07:36:59AM +1100, Eyal Lebedinsky wrote:
> On 09/01/17 04:40, Piergiorgio Sartor wrote:
> > On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
> > > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
> > > it is to run a check around the stripe (I have a background job printing the mismatch
> > > count and /proc/mdstat regularly) which should report the same count.
> > >
> > > I now drill into the fs to find which files use this area, deal with them and delete
> > > the bad ones. I then run a repair on that small area.
> > >
> > > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
> > > This is something raid6 should be able to do assuming a single error.
> > > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
> > > that disk.
> > >
> > > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
> > > bad data invisible to a 'check'? I recall this being the case in the past.
> >
> > "repair" should fix the data which is assumed
>
> You say "should", as in "it does today" or as in "it needs to change to do this"?
> As I noted originally, the man page says it does the simple thing - should the
> man page be fixed?

"should" as in "it is supposed to do it".

So, as far as I know, "raid6check" with "repair" will
check the parity and try to find errors.
If possible, it will find where the error is, then
re-compute the value and write the corrected data.

Now, this was somehow tested and *should* work.

Another option is just to check for the errors and
see if one drive is constantly at fault.
This will not write anything, so it is safer, but
it will help to see if there are strange things,
before writing to the disk(s).

bye,

pg

> > to be wrong.
> > It should not simply correct P+Q, but really
> > find out which disk is not OK and fix it.
> >
> > > 'man md' still says
> > > 	For RAID5/RAID6 new parity blocks are written
> > > I think RAID6 can do better.
> > >
> > > TIA
> > >
> > > --
> > > Eyal Lebedinsky (eyal@eyal.emu.id.au)
>
> --
> Eyal Lebedinsky (eyal@eyal.emu.id.au)

--

piergiorgio
* Re: using the raid6check report
  From: Wols Lists @ 2017-01-08 21:06 UTC
  To: Piergiorgio Sartor, Eyal Lebedinsky; +Cc: linux-raid

On 08/01/17 20:46, Piergiorgio Sartor wrote:
> "should" as in "it is supposed to do it".
>
> So, as far as I know, "raid6check" with "repair" will
> check the parity and try to find errors.
> If possible, it will find where the error is, then
> re-compute the value and write the corrected data.
>
> Now, this was somehow tested and *should* work.
>
> Another option is just to check for the errors and
> see if one drive is constantly at fault.
> This will not write anything, so it is safer, but
> it will help to see if there are strange things,
> before writing to the disk(s).

Hmmm ...

I've now been thinking about it, and actually I'm not sure it's possible
even with raid6, to correct a corrupt read. The thing is, raid protects
against a failure to read - if a sector fails, the parity will re-create
it. But if a data sector is corrupted, how is raid to know WHICH sector?

If one of the parity sectors is corrupted, it's easy. Calculate parity
from the data, and either P or Q will be wrong, so fix it. But if it's a
*data* sector that's corrupted, both P and Q will be wrong. How easy is
it to work back from that, and work out *which* data sector is wrong? My
fu makes me think you can't, though I could quite easily be wrong :-)

But should that even happen, unless a disk is on its way out, anyway? I
remember years ago, back in the 80s, our minicomputers had
error-correction in the drive. I don't remember the algorithm, but it
wrote 16-bit words to disk - each an 8-bit data byte. The first half was
the original data, and the second half was some parity pattern such that
for any single-bit corruption you knew which half was corrupt, and you
could throw away the corrupt parity, or recreate the correct data from
the parity. Even with a 2-bit error I think it was >90% detection and
recreation. I can't imagine something like that not being in drive
hardware today.

Cheers,
Wol
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-08 21:20 UTC
  To: linux-raid

On 09/01/17 08:06, Wols Lists wrote:
> On 08/01/17 20:46, Piergiorgio Sartor wrote:
[trim]
> If one of the parity sectors is corrupted, it's easy. Calculate parity
> from the data, and either P or Q will be wrong, so fix it. But if it's a
> *data* sector that's corrupted, both P and Q will be wrong. How easy is
> it to work back from that, and work out *which* data sector is wrong? My
> fu makes me think you can't, though I could quite easily be wrong :-)

My understanding of RAID6 is that you CAN say which of the data/P/Q is wrong
if one assumes only one is wrong. Is this not what raid6check claims to do?
	"In case of parity mismatches, raid6check reports, if possible,
	which component drive could be responsible."

> But should that even happen, unless a disk is on its way out, anyway?

Not so. I get, from time to time, non-zero mismatches where I saw no disk
errors of any sort in kernel messages or in smart status.

> I remember years ago, back in the 80s, our minicomputers had
> error-correction in the drive. I don't remember the algorithm, but it
> wrote 16-bit words to disk - each an 8-bit data byte. The first half was
> the original data, and the second half was some parity pattern such that
> for any single-bit corruption you knew which half was corrupt, and you
> could throw away the corrupt parity, or recreate the correct data from
> the parity. Even with a 2-bit error I think it was >90% detection and
> recreation. I can't imagine something like that not being in drive
> hardware today.

The disk thinks it has good data but md thinks not. Maybe bad data was written
due to some other bug? A corner case when the system rebooted unexpectedly?
Maybe the controller corrupted the data?

> Cheers,
> Wol

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 21:43 UTC
  To: Wols Lists; +Cc: Piergiorgio Sartor, Eyal Lebedinsky, linux-raid

On Sun, Jan 08, 2017 at 09:06:14PM +0000, Wols Lists wrote:
> On 08/01/17 20:46, Piergiorgio Sartor wrote:
> > "should" as in "it is supposed to do it".
> >
> > So, as far as I know, "raid6check" with "repair" will
> > check the parity and try to find errors.
> > If possible, it will find where the error is, then
> > re-compute the value and write the corrected data.
> >
> > Now, this was somehow tested and *should* work.
> >
> > Another option is just to check for the errors and
> > see if one drive is constantly at fault.
> > This will not write anything, so it is safer, but
> > it will help to see if there are strange things,
> > before writing to the disk(s).
>
> Hmmm ...
>
> I've now been thinking about it, and actually I'm not sure it's possible
> even with raid6, to correct a corrupt read. The thing is, raid protects
> against a failure to read - if a sector fails, the parity will re-create
> it. But if a data sector is corrupted, how is raid to know WHICH sector?

Here is all you need to know:

http://ftp.nluug.nl/ftp/ftp/os/Linux/system/kernel/people/hpa/raid6.pdf

bye,

pg

> If one of the parity sectors is corrupted, it's easy. Calculate parity
> from the data, and either P or Q will be wrong, so fix it. But if it's a
> *data* sector that's corrupted, both P and Q will be wrong. How easy is
> it to work back from that, and work out *which* data sector is wrong? My
> fu makes me think you can't, though I could quite easily be wrong :-)
>
> But should that even happen, unless a disk is on its way out, anyway? I
> remember years ago, back in the 80s, our minicomputers had
> error-correction in the drive. I don't remember the algorithm, but it
> wrote 16-bit words to disk - each an 8-bit data byte. The first half was
> the original data, and the second half was some parity pattern such that
> for any single-bit corruption you knew which half was corrupt, and you
> could throw away the corrupt parity, or recreate the correct data from
> the parity. Even with a 2-bit error I think it was >90% detection and
> recreation. I can't imagine something like that not being in drive
> hardware today.
>
> Cheers,
> Wol

--

piergiorgio
* Re: using the raid6check report
  From: Wols Lists @ 2017-01-08 20:52 UTC
  To: Piergiorgio Sartor, Eyal Lebedinsky; +Cc: list linux-raid

On 08/01/17 17:40, Piergiorgio Sartor wrote:
> On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
>> > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
>> > it is to run a check around the stripe (I have a background job printing the mismatch
>> > count and /proc/mdstat regularly) which should report the same count.
>> >
>> > I now drill into the fs to find which files use this area, deal with them and delete
>> > the bad ones. I then run a repair on that small area.
>> >
>> > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
>> > This is something raid6 should be able to do assuming a single error.
>> > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
>> > that disk.
>> >
>> > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
>> > bad data invisible to a 'check'? I recall this being the case in the past.
> "repair" should fix the data which is assumed
> to be wrong.
> It should not simply correct P+Q, but really
> find out which disk is not OK and fix it.
>
Having just looked at the man page and the source to raid6check as found
online ...

"man raid6check" says that it does not write to the disk. Looking at the
source, it appears to have code that is intended to write to the disk
and repair the stripe. So what's going on?

I can add it to the wiki as a little programming project, but it would
be nice to know the exact status of things - my raid-fu isn't good
enough at present to read the code and work out what's going on.

It would be nice to be able to write "parity-check" or somesuch to
sync_action, and then for raid5 it would check and update parity, or
raid6 it would check and correct data/parity.

Cheers,
Wol
* Re: using the raid6check report
  From: Piergiorgio Sartor @ 2017-01-08 21:41 UTC
  To: Wols Lists; +Cc: Piergiorgio Sartor, Eyal Lebedinsky, list linux-raid

On Sun, Jan 08, 2017 at 08:52:40PM +0000, Wols Lists wrote:
> On 08/01/17 17:40, Piergiorgio Sartor wrote:
> > On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
> >> > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
> >> > it is to run a check around the stripe (I have a background job printing the mismatch
> >> > count and /proc/mdstat regularly) which should report the same count.
> >> >
> >> > I now drill into the fs to find which files use this area, deal with them and delete
> >> > the bad ones. I then run a repair on that small area.
> >> >
> >> > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
> >> > This is something raid6 should be able to do assuming a single error.
> >> > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
> >> > that disk.
> >> >
> >> > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
> >> > bad data invisible to a 'check'? I recall this being the case in the past.
> > "repair" should fix the data which is assumed
> > to be wrong.
> > It should not simply correct P+Q, but really
> > find out which disk is not OK and fix it.
> >
> Having just looked at the man page and the source to raid6check as found
> online ...
>
> "man raid6check" says that it does not write to the disk. Looking at the
> source, it appears to have code that is intended to write to the disk
> and repair the stripe. So what's going on?

There was a patch adding the write capability,
but likely only for the C code, not the man page.

> I can add it to the wiki as a little programming project, but it would
> be nice to know the exact status of things - my raid-fu isn't good
> enough at present to read the code and work out what's going on.
>
> It would be nice to be able to write "parity-check" or somesuch to
> sync_action, and then for raid5 it would check and update parity, or
> raid6 it would check and correct data/parity.

At that time, the agreement with Neil was to do
such things in user space and not inside the
md raid "driver" (so to speak) in kernel space.

So, as far as I know, the kernel md code can
check the parity and, possibly, re-write.

"raid6check" can detect errors *and*, if only one,
where it is, so a "data repair" capability is possible.

bye,

pg

> Cheers,
> Wol

--

piergiorgio
* Re: using the raid6check report
  From: NeilBrown @ 2017-01-08 22:39 UTC
  To: Wols Lists; +Cc: Piergiorgio Sartor, Eyal Lebedinsky, list linux-raid

On Mon, Jan 09 2017, Piergiorgio Sartor wrote:
> On Sun, Jan 08, 2017 at 08:52:40PM +0000, Wols Lists wrote:
>> On 08/01/17 17:40, Piergiorgio Sartor wrote:
>> > On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
>> >> > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
>> >> > it is to run a check around the stripe (I have a background job printing the mismatch
>> >> > count and /proc/mdstat regularly) which should report the same count.
>> >> >
>> >> > I now drill into the fs to find which files use this area, deal with them and delete
>> >> > the bad ones. I then run a repair on that small area.
>> >> >
>> >> > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
>> >> > This is something raid6 should be able to do assuming a single error.
>> >> > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
>> >> > that disk.
>> >> >
>> >> > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
>> >> > bad data invisible to a 'check'? I recall this being the case in the past.
>> > "repair" should fix the data which is assumed
>> > to be wrong.
>> > It should not simply correct P+Q, but really
>> > find out which disk is not OK and fix it.
>> >
>> Having just looked at the man page and the source to raid6check as found
>> online ...
>>
>> "man raid6check" says that it does not write to the disk. Looking at the
>> source, it appears to have code that is intended to write to the disk
>> and repair the stripe. So what's going on?
>
> There was a patch adding the write capability,
> but likely only for the C code, not the man page.
>
>> I can add it to the wiki as a little programming project, but it would
>> be nice to know the exact status of things - my raid-fu isn't good
>> enough at present to read the code and work out what's going on.
>>
>> It would be nice to be able to write "parity-check" or somesuch to
>> sync_action, and then for raid5 it would check and update parity, or
>> raid6 it would check and correct data/parity.
>
> At that time, the agreement with Neil was to do
> such things in user space and not inside the
> md raid "driver" (so to speak) in kernel space.

This is correct.

With RAID6 it is possible to determine, with high reliability, if a
single device is corrupt. There is a mathematical function that can be
calculated over a set of bytes, one from each device. If the result is
a number less than the number of devices in the array (including P and
Q), then the device with that index number is corrupt (or at least, both
P and Q can be made correct again by simply changing that one byte). If
we compute that function over all 512 (or 4096) bytes in a stripe and
they all report the same device (or report that there are no errors for
some bytes) then it is reasonable to assume the block on the identified
device is corrupt.

raid6check does this and provides very useful functionality for a
sysadmin to determine which device is corrupt, and to then correct that
if they wish.

However, I am not comfortable with having that be done transparently
without any confirmation from the sysadmin. This is because I don't
have a credible threat model for how the corruption could have happened
in the first place. I understand how hardware failure can make a whole
device inaccessible, and how media errors can cause a single block to be
unreadable. But I don't see a "most likely way" that a single block can
become corrupt.

Without a clear model, I cannot determine what the correct response is.
The corruption might have happened on the write path ... so re-writing
the block could just cause more corruption. It could have happened on
the read path, so re-writing won't change anything. It could have
happened in memory, so nothing can be trusted. It could have happened
due to buggy code. Without knowing the cause with high probability, it
is not safe to try to fix anything.

The most likely cause for incorrect P and Q is if the machine crashed
while a stripe was being updated. In that case, simply updating P and Q
is the correct response. So that is the only response that the kernel
performs.

For more reading, see http://neil.brown.name/blog/20100211050355

NeilBrown

> So, as far as I know, the kernel md code can
> check the parity and, possibly, re-write.
>
> "raid6check" can detect errors *and*, if only one,
> where it is, so a "data repair" capability is possible.
>
> bye,
>
> pg
>
>> Cheers,
>> Wol
>
> --
>
> piergiorgio
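To illustrate the per-byte calculation described above, here is a small, self-contained C sketch (not code taken from raid6check) using the GF(2^8) arithmetic of the md RAID6 layout, with generator 0x02 and the polynomial 0x11d; the disk count and the demonstration values in main() are arbitrary assumptions.

/*
 * Illustrative sketch of RAID6 single-error location for one byte
 * position, not the raid6check implementation.  GF(2^8) with the
 * polynomial 0x11d and generator 0x02, as used by md RAID6.
 */
#include <stdint.h>
#include <stdio.h>

static uint8_t gf_log[256];	/* gf_log[g^i] = i; index 0 is never looked up */

static void gf_init(void)
{
	uint8_t x = 1;
	for (int i = 0; i < 255; i++) {
		gf_log[x] = (uint8_t)i;
		x = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));	/* x *= 0x02 */
	}
}

/* P (XOR) and Q (Reed-Solomon) syndromes over one byte per data disk */
static void compute_pq(const uint8_t *d, int ndata, uint8_t *p, uint8_t *q)
{
	uint8_t pp = 0, qq = 0;
	for (int i = ndata - 1; i >= 0; i--) {
		pp ^= d[i];
		qq = (uint8_t)(((qq << 1) ^ ((qq & 0x80) ? 0x1d : 0)) ^ d[i]);
	}
	*p = pp;
	*q = qq;
}

/*
 * Returns -1 if this byte position is consistent, 0..ndata-1 for a
 * suspected corrupt data disk, ndata if only P disagrees, ndata+1 if
 * only Q disagrees, -2 if no single disk can explain the mismatch.
 */
static int locate_error(const uint8_t *d, int ndata, uint8_t p, uint8_t q)
{
	uint8_t pc, qc, pd, qd;

	compute_pq(d, ndata, &pc, &qc);
	pd = pc ^ p;
	qd = qc ^ q;

	if (!pd && !qd)
		return -1;
	if (pd && !qd)
		return ndata;		/* P block suspect */
	if (!pd && qd)
		return ndata + 1;	/* Q block suspect */

	/* one bad data disk z implies qd = g^z * pd, so z = log(qd) - log(pd) */
	int z = (gf_log[qd] - gf_log[pd] + 255) % 255;
	return z < ndata ? z : -2;	/* -2: not a single-disk error */
}

int main(void)
{
	uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };	/* one byte per data disk */
	uint8_t p, q;

	gf_init();
	compute_pq(d, 4, &p, &q);	/* P and Q as they would sit on disk */
	d[2] ^= 0x5a;			/* corrupt the byte read from data disk 2 */
	printf("suspect: %d\n", locate_error(d, 4, p, q));	/* prints "suspect: 2" */
	return 0;
}

A checker would repeat this over every byte of the chunk and only trust the result when all byte positions point at the same slot (or show no error), which is the "all report the same device" condition described above.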
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-09  0:32 UTC
  To: list linux-raid

On 09/01/17 09:39, NeilBrown wrote:
> On Mon, Jan 09 2017, Piergiorgio Sartor wrote:
>
[trim]
>>
>> There was a patch adding the write capability,
>> but likely only for the C code, not the man page.
>>
>>> I can add it to the wiki as a little programming project, but it would
>>> be nice to know the exact status of things - my raid-fu isn't good
>>> enough at present to read the code and work out what's going on.
>>>
>>> It would be nice to be able to write "parity-check" or somesuch to
>>> sync_action, and then for raid5 it would check and update parity, or
>>> raid6 it would check and correct data/parity.
>>
>> At that time, the agreement with Neil was to do
>> such things in user space and not inside the
>> md raid "driver" (so to speak) in kernel space.
>
> This is correct.
>
> With RAID6 it is possible to determine, with high reliability, if a
> single device is corrupt. There is a mathematical function that can be
> calculated over a set of bytes, one from each device. If the result is
> a number less than the number of devices in the array (including P and
> Q), then the device with that index number is corrupt (or at least, both
> P and Q can be made correct again by simply changing that one byte). If
> we compute that function over all 512 (or 4096) bytes in a stripe and
> they all report the same device (or report that there are no errors for
> some bytes) then it is reasonable to assume the block on the identified
> device is corrupt.
>
> raid6check does this and provides very useful functionality for a
> sysadmin to determine which device is corrupt, and to then correct that
> if they wish.
>
> However, I am not comfortable with having that be done transparently
> without any confirmation from the sysadmin. This is because I don't
> have a credible threat model for how the corruption could have happened
> in the first place. I understand how hardware failure can make a whole
> device inaccessible, and how media errors can cause a single block to be
> unreadable. But I don't see a "most likely way" that a single block can
> become corrupt.
>
> Without a clear model, I cannot determine what the correct response is.
> The corruption might have happened on the write path ... so re-writing
> the block could just cause more corruption. It could have happened on
> the read path, so re-writing won't change anything. It could have
> happened in memory, so nothing can be trusted. It could have happened
> due to buggy code. Without knowing the cause with high probability, it
> is not safe to try to fix anything.
>
> The most likely cause for incorrect P and Q is if the machine crashed
> while a stripe was being updated. In that case, simply updating P and Q
> is the correct response. So that is the only response that the kernel
> performs.
>
> For more reading, see http://neil.brown.name/blog/20100211050355
>
> NeilBrown

[trim]

I am aware of that discussion and agree with the sentiment (fix in user space).

What I miss is a message from md when a 'check' mismatch is found. Not having
this means I have to run 'raid6check', then after looking at the situation
run 'raid6check autorepair' in the small sections reported as bad. This is time
consuming and risky.

What I resort to doing now is 'cat /proc/mdstat' repeatedly during md 'check'
and use the report as a clue to the location of problem stripes.

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
* Re: using the raid6check report
  From: NeilBrown @ 2017-01-09  1:56 UTC
  To: Eyal Lebedinsky, list linux-raid

On Mon, Jan 09 2017, Eyal Lebedinsky wrote:
>
> I am aware of that discussion and agree with the sentiment (fix in user space).

(I primarily provided it for the information of others)

> What I miss is a message from md when a 'check' mismatch is found. Not having
> this means I have to run 'raid6check', then after looking at the situation
> run 'raid6check autorepair' in the small sections reported as bad. This is time
> consuming and risky.

Something like this?

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 69b0a169e43d..f19c38baf2b2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2738,6 +2738,8 @@ static void handle_parity_checks5(raid5_conf_t *conf, struct stripe_head *sh,
 			conf->mddev->resync_mismatches += STRIPE_SECTORS;
 			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
 				/* don't try to repair!! */
+				pr_debug("%s: \"check\" found inconsistency near sector %llu\n",
+					 md_name(conf->mddev), sh->sector);
 				set_bit(STRIPE_INSYNC, &sh->state);
 			else {
 				sh->check_state = check_state_compute_run;

I chose pr_debug() because I didn't want to flood the logs if there are
lots of inconsistencies.
You can selectively enable pr_debug() messages by writing to
  /sys/kernel/debug/dynamic_debug/control
providing you have dynamic debugging compiled in.

Maybe use pr_info_ratelimited() instead??

NeilBrown
* Re: using the raid6check report
  From: Eyal Lebedinsky @ 2017-01-09  2:13 UTC
  To: list linux-raid

On 09/01/17 12:56, NeilBrown wrote:
> On Mon, Jan 09 2017, Eyal Lebedinsky wrote:
>
>> I am aware of that discussion and agree with the sentiment (fix in user space).
>
> (I primarily provided it for the information of others)
>
>> What I miss is a message from md when a 'check' mismatch is found. Not having
>> this means I have to run 'raid6check', then after looking at the situation
>> run 'raid6check autorepair' in the small sections reported as bad. This is time
>> consuming and risky.
>
> Something like this?
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 69b0a169e43d..f19c38baf2b2 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2738,6 +2738,8 @@ static void handle_parity_checks5(raid5_conf_t *conf, struct stripe_head *sh,
>  			conf->mddev->resync_mismatches += STRIPE_SECTORS;
>  			if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
>  				/* don't try to repair!! */
> +				pr_debug("%s: \"check\" found inconsistency near sector %llu\n",
> +					 md_name(conf->mddev), sh->sector);
>  				set_bit(STRIPE_INSYNC, &sh->state);
>  			else {
>  				sh->check_state = check_state_compute_run;
>
> I chose pr_debug() because I didn't want to flood the logs if there are
> lots of inconsistencies.
> You can selectively enable pr_debug() messages by writing to
>   /sys/kernel/debug/dynamic_debug/control
> providing you have dynamic debugging compiled in.

I run Fedora and can see the dynamic debugging control file.

> Maybe use pr_info_ratelimited() instead??

Yes, rate limiting is probably a good idea when we have a really bad day.

> NeilBrown

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
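For completeness, a minimal C sketch of turning such a pr_debug() site on at runtime through the dynamic-debug control file mentioned above; the match expression ("file raid5.c +p") and the debugfs mount point are assumptions for the example, and the kernel must be built with CONFIG_DYNAMIC_DEBUG.

/*
 * Illustrative sketch: enable pr_debug() call sites in drivers/md/raid5.c
 * via dynamic debug.  The match expression and control path are
 * assumptions; requires CONFIG_DYNAMIC_DEBUG and a mounted debugfs.
 */
#include <stdio.h>

int main(void)
{
	const char *ctl = "/sys/kernel/debug/dynamic_debug/control";
	FILE *f = fopen(ctl, "w");

	if (!f) {
		perror(ctl);
		return 1;
	}
	/* "+p" enables printing for matching call sites; "-p" disables it again */
	fprintf(f, "file raid5.c +p\n");
	return fclose(f) == 0 ? 0 : 1;
}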