From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: russell@coker.com.au, Martin Steigerwald <ms@teamix.de>,
	linux-btrfs@vger.kernel.org
Subject: Re: btrfs RAID with enterprise SATA or SAS drives
Date: Thu, 10 Jul 2014 07:28:34 -0400	[thread overview]
Message-ID: <53BE78E2.40005@gmail.com> (raw)
In-Reply-To: <1873412.YKCevnRR4J@russell.coker.com.au>


On 2014-07-09 22:10, Russell Coker wrote:
> On Wed, 9 Jul 2014 16:48:05 Martin Steigerwald wrote:
>>> - for someone using SAS or enterprise SATA drives with Linux, I
>>> understand btrfs gives the extra benefit of checksums, are there any
>>> other specific benefits over using mdadm or dmraid?
>>
>> I think I can answer this one.
>>
>> Most important advantage I think is BTRFS is aware of which blocks of the
>> RAID are in use and need to be synced:
>>
>> - Instant initialization of RAID regardless of size (unless at some
>> capacity mkfs.btrfs needs more time)
> 
> From mdadm(8):
> 
>        --assume-clean
>               Tell mdadm that the array pre-existed and is known to be  clean.
>               It  can be useful when trying to recover from a major failure as
>               you can be sure that no data will be affected unless  you  actu‐
>               ally  write  to  the array.  It can also be used when creating a
>               RAID1 or RAID10 if you want to avoid the initial resync, however
>               this  practice  — while normally safe — is not recommended.  Use
>               this only if you really know what you are doing.
> 
>               When the devices that will be part of a new  array  were  filled
>               with zeros before creation the operator knows the array is actu‐
>               ally clean. If that is the case,  such  as  after  running  bad‐
>               blocks,  this  argument  can be used to tell mdadm the facts the
>               operator knows.
> 
> While it might be regarded as a hack, it is possible to do a fairly instant 
> initialisation of a Linux software RAID-1.
>
This has the notable disadvantage, however, that the first scrub you run
will essentially perform a full resync unless you made sure the disks
had identical data to begin with.
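For reference, the instant initialisation Russell describes looks roughly
like this (device names are placeholders; zeroing the members first is what
makes --assume-clean actually safe, per the man page text above):

```shell
# Hypothetical device names -- adjust for your system. Zeroing both members
# first means the array genuinely is clean, so skipping the initial resync
# with --assume-clean is safe.
dd if=/dev/zero of=/dev/sdX bs=1M status=progress
dd if=/dev/zero of=/dev/sdY bs=1M status=progress

# Create the RAID-1 without the initial resync.
mdadm --create /dev/md0 --level=1 --raid-devices=2 --assume-clean /dev/sdX /dev/sdY
```

Of course, the zeroing itself takes about as long as the resync would have,
so this is only a real win when the disks are already known to be zeroed
(e.g. fresh from a destructive badblocks run, as mdadm(8) notes).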
>> - Rebuild after disk failure or disk replace will only copy *used* blocks
> 
> Have you done any benchmarks on this?  The down-side of copying used blocks is 
> that you first need to discover which blocks are used.  Given that seek time is 
> a major bottleneck, above some proportion of used space it will be faster to 
> just copy the entire disk.
> 
> I haven't done any tests on BTRFS in this regard, but I've seen a disk 
> replacement on ZFS run significantly slower than a dd of the block device 
> would.
> 
First of all, this isn't really a good comparison, for two reasons:
1. EVERYTHING on ZFS (or any filesystem that does that much work per I/O)
is slower than a dd of the raw block device.
2. Even if the throughput is lower, it only really matters if the disk is
more than half full, because you don't copy the unused blocks.

Also, while it isn't really a recovery situation, I recently upgraded
from a two-disk 1TB BTRFS RAID1 setup to a four-disk 1TB BTRFS RAID10
setup, and the performance of the re-balance really wasn't all that bad.
I have maybe 100GB of actual data, so the array started out roughly 10%
full, and the re-balance only took about 2 minutes.  Of course, it
probably helps that I make a point of keeping my filesystems
de-fragmented, scrubbing and balancing regularly, and not using a lot of
sub-volumes or snapshots, so the filesystem in question is not too
different from what it would have looked like if I had just wiped the FS
and restored from a backup.
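To make the trade-off concrete, here is a toy model of when a
used-blocks-only rebuild beats a whole-disk copy.  All the numbers in it
are illustrative assumptions (disk size, throughput, seek time, extent
sizes), not measurements:

```python
# Toy model: a whole-disk copy streams sequentially, while a used-blocks-only
# rebuild pays roughly one seek per extent but skips free space entirely.
# Every constant here is an assumed round number, not a benchmark result.
DISK_GB = 1000   # assumed disk size
SEQ_MBS = 150    # assumed sustained sequential throughput, MB/s
SEEK_S = 0.010   # assumed average seek time, seconds

def full_copy_seconds():
    """Time to dd the whole device, ignoring seeks."""
    return DISK_GB * 1024 / SEQ_MBS

def used_only_seconds(used_gb, avg_extent_mb):
    """Time to copy only used extents: one seek per extent plus the data."""
    extents = used_gb * 1024 / avg_extent_mb
    return extents * SEEK_S + used_gb * 1024 / SEQ_MBS

print(full_copy_seconds())          # whole-disk dd
print(used_only_seconds(100, 4))    # ~10% full, well de-fragmented
print(used_only_seconds(800, 0.1))  # ~80% full, badly fragmented
```

Under these assumptions the mostly-empty, de-fragmented case wins by a wide
margin (consistent with the quick rebalance above), while the full,
fragmented case is dominated by seeks and loses to a plain dd (consistent
with Russell's ZFS observation).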
>> Scrubbing can repair from the good disk if the RAID has redundancy, but
>> SoftRAID should be able to do this as well. But also for scrubbing: BTRFS
>> only checks and repairs used blocks.
> 
> When you scrub Linux Software RAID (and in fact pretty much every RAID) it 
> will only correct errors that the disks flag.  If a disk returns bad data and 
> says that it's good then the RAID scrub will happily copy the bad data over 
> the good data (for a RAID-1) or generate new valid parity blocks for bad data 
> (for RAID-5/6).
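The difference is easy to sketch: with a per-block checksum, a scrub can
tell *which* mirror holds the bad copy, whereas a checksum-less RAID-1
scrub can only see that the mirrors differ.  A minimal sketch, using
zlib's CRC-32 purely as a stand-in for the filesystem's block checksum:

```python
import zlib

def scrub_block(mirror_a: bytes, mirror_b: bytes, stored_csum: int):
    """Return the known-good copy (from which the other mirror would be
    rewritten), or None if both copies fail the checksum."""
    for copy in (mirror_a, mirror_b):
        if zlib.crc32(copy) == stored_csum:
            return copy
    return None  # both copies corrupt: unrecoverable, report an I/O error

data = b"important data"
csum = zlib.crc32(data)
assert scrub_block(data, b"importENT data", csum) == data  # bad mirror B: repairable
assert scrub_block(b"importENT data", data, csum) == data  # bad mirror A: repairable
assert scrub_block(b"garbage-copy-1", b"garbage-copy-2", csum) is None
```

A checksum-less scrub in the same situation has no `stored_csum` to consult,
so all it can do is pick one copy (typically the first) and propagate it,
good or bad.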
> 
> http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html
> 
> Page 12 of the above document says that "nearline" disks (IE the ones people 
> like me can afford for home use) have a 0.466% incidence of returning bad data 
> and claiming it's good in a year.  Currently I run about 20 such disks in a 
> variety of servers, workstations, and laptops.  Therefore the probability of 
> having no such errors on all those disks would be .99534^20=.91081.  The 
> probability of having no such errors over a period of 10 years would be 
> (.99534^20)^10=.39290 which means that over 10 years I should expect to have 
> such errors, which is why BTRFS RAID-1 and DUP metadata on single disks are 
> necessary features.
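Russell's arithmetic checks out; for anyone who wants to reproduce it (the
0.466% per-disk annual rate is the nearline figure from the FAST'08 study
cited above):

```python
# Per-disk, per-year probability of silently returning bad data
# (nearline disks, from the FAST'08 corruption study cited above).
p_bad = 0.00466

p_clean_year = (1 - p_bad) ** 20     # all 20 disks clean for one year
p_clean_decade = p_clean_year ** 10  # all 20 disks clean for ten years

print(round(p_clean_year, 5))    # ~0.91081
print(round(p_clean_decade, 4))  # ~0.3929
```

So over a decade there is roughly a 60% chance of at least one silent
corruption event across a fleet that size, which is the case for
checksummed redundancy.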
> 




Thread overview: 10+ messages
2012-05-09 22:01 btrfs RAID with enterprise SATA or SAS drives Daniel Pocock
2012-05-10 19:58 ` Hubert Kario
2012-05-18 16:19   ` btrfs RAID with RAID cards (thread renamed) Daniel Pocock
2012-05-11  2:18 ` btrfs RAID with enterprise SATA or SAS drives Duncan
2012-05-11 16:58   ` Martin Steigerwald
2012-05-14  8:38     ` Duncan
2014-07-09 14:48 ` Martin Steigerwald
2014-07-10  2:10   ` Russell Coker
2014-07-10  8:27     ` Martin Steigerwald
2014-07-10 11:28     ` Austin S Hemmelgarn [this message]
