From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wols Lists <antlists@youngman.org.uk>
Subject: Re: using the raid6check report
Date: Sun, 8 Jan 2017 21:06:14 +0000
Message-ID: <5872A9C6.7010408@youngman.org.uk>
References: <14e8ec23-de4a-e90b-4b67-155e5e3cc228@eyal.emu.id.au>
 <20170108174010.GA3699@lazy.lzy>
 <a33dbf5c-465d-b5af-a9a9-a00664816489@eyal.emu.id.au>
 <20170108204659.GB7057@lazy.lzy>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Return-path: <linux-raid-owner@vger.kernel.org>
In-Reply-To: <20170108204659.GB7057@lazy.lzy>
Sender: linux-raid-owner@vger.kernel.org
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>, Eyal Lebedinsky <eyal@eyal.emu.id.au>
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 08/01/17 20:46, Piergiorgio Sartor wrote:
> "should" as in "it is supposed to do it".
> 
> So, as far as I know, "raid6check" with "repair" will
> check the parity and try to find errors.
> If possible, it will find where the error is, then
> re-compute the value and write the corrected data.
> 
> Now, this was somehow tested and *should* work.
> 
> An other option is just to check for the errors and
> see if one drive is constantly at fault.
> This will not write anything, so it is safer, but
> it will help to see if there are strange things,
> before writing to the disk(s).

Hmmm ...

I've now been thinking about it, and actually I'm not sure it's
possible, even with raid6, to correct a corrupt read. The thing is, raid
protects against a failure to read - if a sector fails, the parity will
re-create it. But if a data sector is silently corrupted and still reads
back without error, how is raid to know WHICH sector?

If one of the parity sectors is corrupted, it's easy. Calculate parity
from the data, and either P or Q will be wrong, so fix it. But if it's a
*data* sector that's corrupted, both P and Q will be wrong. How easy is
it to work back from that, and work out *which* data sector is wrong? My
fu makes me think you can't, though I could quite easily be wrong :-)
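
To make that question concrete, here is a rough Python sketch of the
arithmetic, not anything lifted from raid6check itself. It assumes the
usual RAID-6 construction: P is the plain XOR of the data bytes, and Q
is the XOR of g^i * D_i over GF(2^8), with generator g = 2 and
polynomial 0x11d. The names gf_mul, syndromes and diagnose are made up
for illustration. Under the further assumption that at most ONE block in
the stripe is corrupt, the two mismatches do seem to pin it down: a bad
data byte on disk z changes P by X and Q by g^z * X, so the ratio of the
two mismatches gives g^z.

```python
# Sketch only: per-byte RAID-6 syndromes over GF(2^8), polynomial 0x11d,
# generator 2. All names here are hypothetical, chosen for illustration.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

# Antilog/log tables for g = 2: EXP[i] = g^i, LOG[g^i] = i.
EXP = [1] * 255
for i in range(1, 255):
    EXP[i] = gf_mul(EXP[i - 1], 2)
LOG = {v: i for i, v in enumerate(EXP)}

def syndromes(data):
    """Return (P, Q) for one byte position across the data disks."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                      # P: plain XOR parity
        q ^= gf_mul(EXP[i], d)      # Q: XOR of g^i * D_i
    return p, q

def diagnose(data, p_stored, q_stored):
    """Classify one byte position, assuming AT MOST ONE corrupt block."""
    p_calc, q_calc = syndromes(data)
    dp = p_calc ^ p_stored          # mismatch in P
    dq = q_calc ^ q_stored          # mismatch in Q
    if dp == 0 and dq == 0:
        return ("ok", None)
    if dq == 0:
        return ("P", None)          # only P disagrees: P itself is bad
    if dp == 0:
        return ("Q", None)          # only Q disagrees: Q itself is bad
    # Both disagree: dq = g^z * dp, so z = log(dq) - log(dp) mod 255.
    z = (LOG[dq] - LOG[dp]) % 255
    if z >= len(data):
        return ("unknown", None)    # not explainable by one bad data block
    return ("data", z)              # data[z] ^ dp would restore the byte

# Usage: corrupt one data byte and ask which disk it was on.
data = [0x11, 0x22, 0x33, 0x44, 0x55]
p, q = syndromes(data)
bad = list(data)
bad[3] ^= 0x5A
print(diagnose(bad, p, q))
```

This only holds for a single bad block per stripe. If two blocks are
corrupt, the same arithmetic can point at the wrong disk or at a disk
number that doesn't exist, which is why checking first and looking for
one drive that is constantly at fault is the safer approach.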

But should that even happen, unless a disk is on its way out, anyway? I
remember years ago, back in the 80s, our minicomputers had
error-correction in the drive. I don't remember the algorithm, but it
wrote 16-bit words to disk, each carrying one 8-bit data byte. The first
half was the original data, and the second half was some parity pattern
such that for any single-bit corruption you knew which half was corrupt,
and you could either throw away the corrupt parity or recreate the
correct data from the parity. Even with a 2-bit error I think it was >90%
detection and recreation. I can't imagine something like that not being
in drive hardware today.

Cheers,
Wol