From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
Chris Murphy <lists@colorremedies.com>
Cc: Andrei Borzenkov <arvidjaar@gmail.com>,
Hugo Mills <hugo@carfax.org.uk>,
kreijack@inwind.it, Roman Mamedov <rm@romanrm.net>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Adventures in btrfs raid5 disk recovery
Date: Tue, 28 Jun 2016 08:05:34 -0400
Message-ID: <ab23dea9-4fee-feef-cc7a-5f58cfd4067f@gmail.com>
In-Reply-To: <20160627215726.GG14667@hungrycats.org>
On 2016-06-27 17:57, Zygo Blaxell wrote:
> On Mon, Jun 27, 2016 at 10:17:04AM -0600, Chris Murphy wrote:
>> On Mon, Jun 27, 2016 at 5:21 AM, Austin S. Hemmelgarn
>> <ahferroin7@gmail.com> wrote:
>>> On 2016-06-25 12:44, Chris Murphy wrote:
>>>> On Fri, Jun 24, 2016 at 12:19 PM, Austin S. Hemmelgarn
>>>> <ahferroin7@gmail.com> wrote:
>>>>
>>>> OK but hold on. During scrub, it should read data, compute checksums
>>>> *and* parity, and compare those to what's on-disk -> EXTENT_CSUM in
>>>> the checksum tree, and the parity strip in the chunk tree. And if
>>>> parity is wrong, then it should be replaced.
>>>
>>> Except that's horribly inefficient. With limited exceptions involving
>>> highly situational co-processors, computing a checksum of a parity block is
>>> always going to be faster than computing parity for the stripe. By using
>>> that to check parity, we can safely speed up the common case of near zero
>>> errors during a scrub by a pretty significant factor.
>>
>> OK I'm in favor of that. Although somehow md gets away with this by
>> computing and checking parity for its scrubs, and still manages to
>> keep drives saturated in the process - at least HDDs, I'm not sure how
>> it fares on SSDs.
>
> A modest desktop CPU can compute raid6 parity at 6GB/sec, a less-modest
> one at more than 10GB/sec. Maybe a bottleneck is within reach of an
> array of SSDs vs. a slow CPU.
OK, great for people who are using modern desktop or server CPUs. Not
everyone has that luxury, and even on many such CPUs, it's _still_
faster to compute CRC32c checksums than parity. On top of that, we
don't appear to be using the in-kernel parity-raid libraries (or if we
are, I haven't been able to find where we call them), so we don't
necessarily get assembly-optimized or co-processor-accelerated
computation of the parity itself. The other thing I didn't mention
above is that checksumming the parity will always take less time than
computing the parity, because you have to process significantly less
data. On a 4-disk RAID5 array, each stripe holds 3 data blocks and 1
parity block: recomputing parity means processing the 3 data blocks a
second time on top of checksumming them (6 blocks' worth of work per
stripe), while checksumming the parity only adds 1 block (4 blocks'
worth), so you're processing roughly 2/3 as much data in total. That
means the parity computation would need to run at 3x the speed of the
CRC32c computation (200% faster) just to break even, and that margin
gets bigger and bigger as you add more disks.
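To make that break-even point concrete, here's a trivial standalone
sketch (my own illustration, not anything from the btrfs code) that
works it out for a few array sizes:

/* break_even.c - per-stripe scrub work on an n-disk RAID5 array
 * (n - 1 data strips plus 1 parity strip per stripe).
 * Build with: cc -o break_even break_even.c
 */
#include <stdio.h>

int main(void)
{
	for (int n = 3; n <= 8; n++) {
		int data = n - 1;          /* data strips per stripe */
		int recompute = 2 * data;  /* csum data, reread it for parity */
		int csum_only = data + 1;  /* csum data, csum the parity strip */
		/*
		 * Break-even: data/csum_speed + data/parity_speed equals
		 * (data + 1)/csum_speed exactly when parity_speed is
		 * data times csum_speed.
		 */
		printf("%d disks: %d vs %d strips processed per stripe, "
		       "break-even at %dx parity speed\n",
		       n, recompute, csum_only, data);
	}
	return 0;
}

For the 4-disk case that prints 6 vs 4 strips per stripe and the 3x
break-even figure above.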
On small arrays, this obviously won't have much impact. Once you start
to scale past a few TB though, even a few hundred MB/s of extra
throughput means a significant decrease in processing time. Say you
have a CPU which gets about 12.0GB/s for RAID5 parity and about
12.25GB/s for CRC32c. That ~2% gap is a conservative ratio, assuming
you use the CRC32c instruction and assembly-optimized RAID5 parity
computations on a modern x86_64 processor (the ratio on both the
mobile Core i5 in my laptop and the Xeon E3 in my home server is
closer to 5%). Assuming those numbers, and that we're already checking
checksums on the non-parity blocks, processing 120TB of data in a
4-disk array (which gives 40TB of parity data, so 160TB total) gives:
For recomputing the parity to scrub:
120TB / 12.25GB/s = 9795.9 seconds for processing CRC32c csums of all
the regular data
120TB / 12.00GB/s = 10000.0 seconds for processing parity of all stripes
= 19795.9 seconds total
~ 5.5 hours total
For computing csums of the parity:
120TB / 12.25GB/s = 9795.9 seconds for processing CRC32c csums of all
the regular data
40TB / 12.25GB/s = 3265.3 seconds for processing CRC32c csums of all
the parity data
= 13061.2 seconds total
~ 3.6 hours total
The checksum-based computation takes roughly 34% less time than the
parity computation. Much of this, of course, is because the parity
computation method has to process the regular data twice (once for
csums, once for parity). You could probably compute both values in a
single pass, but that would need to be done carefully, and without
significant optimization it would likely not gain you much beyond
cutting the number of loads in half.
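For anyone who wants to check the arithmetic, here's the same example
as a trivial C program (the speeds are the assumed figures from above,
not measurements):

/* scrub_time.c - the 120TB / 4-disk RAID5 example worked above.
 * Build with: cc -o scrub_time scrub_time.c
 */
#include <stdio.h>

int main(void)
{
	const double TB = 1e12, GB = 1e9;
	const double data      = 120 * TB;   /* regular data             */
	const double parity    =  40 * TB;   /* parity on a 4-disk raid5 */
	const double crc_speed = 12.25 * GB; /* assumed CRC32c rate      */
	const double par_speed = 12.00 * GB; /* assumed parity rate      */

	/* csum all data, then reread and recompute parity */
	double recompute = data / crc_speed + data / par_speed;
	/* csum all data plus the parity blocks */
	double csum_only = (data + parity) / crc_speed;

	printf("recompute parity: %.1f s (~%.1f hours)\n",
	       recompute, recompute / 3600);
	printf("csum the parity:  %.1f s (~%.1f hours)\n",
	       csum_only, csum_only / 3600);
	printf("time saved:       %.0f%%\n",
	       100 * (1 - csum_only / recompute));
	return 0;
}

It prints 19795.9 s (~5.5 hours) vs 13061.2 s (~3.6 hours), i.e. the
~34% saving.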