linux-btrfs.vger.kernel.org archive mirror
From: Goffredo Baroncelli <kreijack@libero.it>
To: David Brown <david.brown@hesbynett.no>
Cc: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>,
	Andrea Mazzoleni <amadvance@gmail.com>,
	linux-raid@vger.kernel.org, linux-btrfs@vger.kernel.org,
	hpa@zytor.com, creamyfish@gmail.com
Subject: Re: Triple parity and beyond
Date: Thu, 21 Nov 2013 18:37:23 +0100	[thread overview]
Message-ID: <528E44D3.8050705@libero.it> (raw)
In-Reply-To: <528DDCC9.7050701@hesbynett.no>

On 2013-11-21 11:13, David Brown wrote:
> On 20/11/13 22:59, Piergiorgio Sartor wrote:
>> On Wed, Nov 20, 2013 at 11:44:39AM +0100, David Brown wrote:
>> [...]
>>>> In RAID-6 (as per raid6check) there is an easy way
>>>> to verify where an HDD has incorrect data.
>>>>
>>>
>>> I think the way to do that is just to generate the parity blocks from
>>> the data blocks, and compare them to the existing parity blocks.
>>
>> Uhm, a generic RS decoder would have to try all
>> the possible combinations of erasures to
>> detect the error.
>> This is already infeasible with 3 parities,
>> so there are faster algorithms, I believe:
>>
>> Peterson–Gorenstein–Zierler algorithm
>> Berlekamp–Massey algorithm
>>
>> Nevertheless, I do not know too much about
>> those, so I cannot state if they apply to
>> the Cauchy matrix as explained here.
>>
>> bye,
>>
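The RAID-6 check mentioned in the quoted text (locating which disk holds incorrect data) can be sketched in a few lines. This is a minimal Python sketch, not the raid6check code. It assumes the Linux md RAID-6 convention (GF(2^8) with polynomial 0x11d, generator g = 2, P = XOR of the D_i, Q = sum of g^i * D_i), and the function names are hypothetical. If data block z carries error value e, the P syndrome is e and the Q syndrome is g^z * e, so z = log(Q syndrome) - log(P syndrome). A non-zero syndrome in only P, or only Q, points at that parity block instead.

```python
# Sketch: locate a single bad block in a RAID-6 stripe from its P and Q
# syndromes. Assumes the Linux md RAID-6 convention: GF(2^8) with
# polynomial 0x11d, generator g = 2, P = XOR of D_i, Q = sum of g^i * D_i.

# Exponent/log tables; EXP is doubled so LOG[a] + LOG[b] needs no modulo.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def compute_pq(data):
    """Compute the P and Q blocks for a list of equal-length data blocks."""
    n = len(data[0])
    p = bytearray(n)
    q = bytearray(n)
    for i, block in enumerate(data):
        coeff = EXP[i]  # g^i
        for k in range(n):
            p[k] ^= block[k]
            q[k] ^= gf_mul(coeff, block[k])
    return bytes(p), bytes(q)


def locate_single_error(data, p, q):
    """Return None if the stripe is consistent, "P" or "Q" if only that
    parity block is wrong, the index of the bad data block, or "unknown"
    if no single-block error explains the syndromes."""
    calc_p, calc_q = compute_pq(data)
    suspects = set()
    for k in range(len(p)):
        sp = p[k] ^ calc_p[k]  # P syndrome: the error value e
        sq = q[k] ^ calc_q[k]  # Q syndrome: g^z * e
        if sp == 0 and sq == 0:
            continue
        if sq == 0:
            suspects.add("P")
        elif sp == 0:
            suspects.add("Q")
        else:
            z = (LOG[sq] - LOG[sp]) % 255
            suspects.add(z if z < len(data) else "unknown")
    if not suspects:
        return None
    if len(suspects) == 1:
        return suspects.pop()
    # Different bytes blame different blocks: more than one error.
    return "unknown"
```

This only covers one bad block with two parities. With more parities and more than one bad block, trying every combination grows quickly, which is where syndrome decoders such as Peterson–Gorenstein–Zierler or Berlekamp–Massey come in.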
> 
> Ah, you are trying to find which disk has incorrect data so that you can
> change just that one disk?  There are dangers with that...
> 
> <http://neil.brown.name/blog/20100211050355>
> 
> If you disagree with this blog post (and I urge you to read it in full
> first), then this is how I would do a "smart" stripe recovery:
> 
> 
> First calculate the parities from the data blocks, and compare these
> with the existing parity blocks.
> 
> If they all match, the stripe is consistent.
> 
> Normal (detectable) disk errors and unrecoverable read errors get
> flagged by the disk and the IO system, and you /know/ there is a problem
> with that block.  Whether it is a data block or a parity block, you
> re-generate the correct data and store it - that's what your raid is for.
> 
> If you have no detected read errors, and there is one parity
> inconsistency, then /probably/ that block has had an undetected read
> error, or it simply has not been written completely before a crash.
> Either way, just re-write the correct parity.
> 
> If there are two or more parity inconsistencies, but not all parities
> are in error, then you either have multiple disk or block failures, or
> you have a partly-written stripe.  Any attempts at "smart" correction
> will almost certainly be worse than just re-writing the new parities and
> hoping that the filesystem's journal works.
> 
> If all the parities are inconsistent, then the "smart" thing is to look
> for a single incorrect disk block.  Just step through the blocks one by
> one - assume that block is wrong and replace it (in temporary memory,
> not on disk!) with a recovered version from the other data blocks and
> the parities (only the first parity is needed).  Re-calculate the other
> parities and compare.  If the other parities now match, then you have
> found a single inconsistent data block.  It /may/ be a good idea to
> re-write this - or maybe not (see the blog post linked above).
> 
> If you don't find any single data blocks that can be "corrected" in this
> way, then re-writing the parity blocks to match the disk data is
> probably the least harmful fix.
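The "smart" stripe recovery procedure quoted above can be written down as a small decision function. This is a hedged Python sketch, not code from the thread. It assumes GF(2^8) with polynomial 0x11d and, purely for illustration, that parity j uses coefficients (2^j)^i: parity 0 is plain XOR (P), parity 1 is the RAID-6 Q, and parity 2 is a hypothetical third parity. That is not the Cauchy construction discussed in this thread and is not claimed to be MDS for every array size. It also assumes the disks reported no read errors for the stripe, since flagged blocks are simply reconstructed by the normal RAID path. The function names are hypothetical.

```python
# Sketch of the "smart" stripe check from the quoted procedure.
# Assumptions (not from the thread): GF(2^8) with polynomial 0x11d, and
# parity j uses coefficients (2^j)^i, so parity 0 is XOR (P), parity 1 is
# the RAID-6 Q and parity 2 is a hypothetical third parity.
# It assumes the disks reported no read errors for this stripe.

EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def parities(data, count):
    """Compute `count` parity blocks for a list of equal-length data blocks."""
    n = len(data[0])
    out = []
    for j in range(count):
        par = bytearray(n)
        for i, block in enumerate(data):
            coeff = EXP[(i * j) % 255]  # (2^j)^i
            for k in range(n):
                par[k] ^= gf_mul(coeff, block[k])
        out.append(bytes(par))
    return out


def check_stripe(data, stored):
    """Classify a stripe given its data blocks and its stored parity blocks.

    Returns ("consistent", None), ("rewrite parity", [mismatching parity
    indices]) or ("data block inconsistent", index_of_suspect_block).
    """
    assert len(stored) >= 2, "a single parity cannot locate a bad block"
    calc = parities(data, len(stored))
    bad = [j for j in range(len(stored)) if calc[j] != stored[j]]
    if not bad:
        return ("consistent", None)
    if len(bad) < len(stored):
        # One mismatch: probably a stale or mis-read parity block.
        # Several, but not all: do not try to be clever.
        return ("rewrite parity", bad)
    # Every parity disagrees: test each data block as the single culprit.
    for z in range(len(data)):
        # Rebuild block z (in memory only) from P and the other data blocks.
        rebuilt = bytearray(stored[0])
        for i, block in enumerate(data):
            if i != z:
                for k in range(len(rebuilt)):
                    rebuilt[k] ^= block[k]
        trial = list(data)
        trial[z] = bytes(rebuilt)
        # P matches by construction; check the remaining parities.
        if parities(trial, len(stored))[1:] == list(stored[1:]):
            return ("data block inconsistent", z)
    # No single block explains it: rewriting the parities is least harmful.
    return ("rewrite parity", bad)
```

Note that the suspect data block is only identified, never written back; whether to rewrite it is exactly the question raised by the linked blog post.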

It should be pointed out that the main modern filesystems either are trying
to integrate, or have already integrated, some sort of checksumming, so that
they don't have to guess whether the data or the parity is wrong:
- btrfs is fully checksummed
- ZFS is fully checksummed
- ext4 and XFS are adding checksums to their metadata [1][2]
- ReFS (Windows) protects the metadata, and optionally the data, with
  checksums [3]
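With a checksum per block there is nothing to guess: the checksum says which block is bad, and the parity only has to rebuild it. A minimal Python sketch, assuming one stored checksum per data block and a single XOR parity. zlib.crc32 stands in for the crc32c that btrfs uses, and the function names are hypothetical.

```python
import zlib


def xor_blocks(blocks):
    """XOR a list of equal-length blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for k, byte in enumerate(b):
            out[k] ^= byte
    return bytes(out)


def repair_with_checksums(data, csums, parity):
    """Use per-block checksums to find the bad data block, then rebuild it
    from the XOR parity. Returns the repaired list of data blocks."""
    bad = [i for i, blk in enumerate(data) if zlib.crc32(blk) != csums[i]]
    if not bad:
        return list(data)
    if len(bad) > 1:
        raise ValueError("more bad blocks than a single XOR parity can rebuild")
    z = bad[0]
    others = [blk for i, blk in enumerate(data) if i != z]
    rebuilt = xor_blocks(others + [parity])
    # The checksum also verifies the rebuilt block, so a stale parity is caught.
    if zlib.crc32(rebuilt) != csums[z]:
        raise ValueError("rebuilt block fails its checksum; parity is stale too")
    repaired = list(data)
    repaired[z] = rebuilt
    return repaired
```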

We are talking about


[1] https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums
[2] http://xfs.org/images/d/d1/Xfs-scalability-lca2012.pdf
[3] http://en.wikipedia.org/wiki/ReFS

> 
> 
> Remember, this is not a general error detection and correction scheme -
> it is a system targeted for a particular type of use, with particular
> patterns of failure and failure causes, and particular mechanisms on top
> (journalled file systems) to consider.


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
