Re: RAID1 3+ drives - Russell Coker

From: Russell Coker <russell@coker.com.au>
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: RAID1 3+ drives
Date: Sat, 28 Jun 2014 23:40 +1000
Message-ID: <1428358.y4e5pb3mAe@xev>
In-Reply-To: <pan$960c0$4afea28f$ba20e37d$eab767e1@cox.net>

On Sat, 28 Jun 2014 11:38:47 Duncan wrote:
> And with the size of disks we have today, the statistics on multiple
> whole device reliability are NOT good to us!  There's a VERY REAL chance,
> even likelihood, that at least one block on the device is going to be
> bad, and not be caught by its own error detection!

http://research.cs.wisc.edu/adsl/Publications/corruption-fast08.html

The above paper suggests that about 10% of SATA disks get such errors per 
year, and that a disk which has such a problem typically has it on ~50 
sectors.  The probability of 2 disks both getting such errors in the same 
year (if the errors are truly random and independent) would be something like 
1% per year.  The probability that the ~50 sectors on each of 2*3TB disks 
happen to match up is much lower.
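
As a rough sanity check of the arithmetic above, here is a small sketch.  It 
assumes that errors on the two disks are independent, that the bad sectors are 
placed uniformly at random, and that sectors are 512 bytes; none of those 
assumptions come from the paper, and the function names are only illustrative.

```python
# Back-of-envelope check of the figures quoted above.
# Assumptions (NOT from the paper): independent disks, uniformly random
# placement of bad sectors, 512-byte sectors, decimal terabytes.

P_DISK_PER_YEAR = 0.10       # ~10% of SATA disks affected per year
BAD_SECTORS = 50             # ~50 affected sectors on an affected disk
DISK_BYTES = 3 * 10**12      # one 3TB disk
SECTOR_BYTES = 512           # assumed sector size


def both_disks_affected(p: float) -> float:
    """Probability that two independent disks are both affected in a year."""
    return p * p


def overlap_probability(bad: int, total_sectors: int) -> float:
    """Probability that at least one bad sector position on disk B
    coincides with a bad sector position on disk A."""
    p_none = 1.0
    # Pick disk B's bad sectors one at a time; each pick must miss
    # disk A's bad set as well as the positions already picked.
    for i in range(bad):
        p_none *= (total_sectors - bad - i) / (total_sectors - i)
    return 1.0 - p_none


total_sectors = DISK_BYTES // SECTOR_BYTES
p_both = both_disks_affected(P_DISK_PER_YEAR)
p_overlap = overlap_probability(BAD_SECTORS, total_sectors)

print(f"both disks affected in a year: {p_both:.2%}")
print(f"given that, some bad sectors line up: {p_overlap:.2e}")
print(f"combined per-year estimate: {p_both * p_overlap:.2e}")
```

Under those assumptions the chance of a matching position is on the order of 
bad*bad/total_sectors, i.e. a few in ten million, before multiplying by the 
~1% chance that both disks are affected at all.  Real errors tend to cluster, 
so treat this as an order-of-magnitude estimate only.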

> > Also if you were REALLY paranoid you could have 2 BTRFS RAID-1
> > filesystems that each contain a single large file.  Those 2 large files
> > could be run via losetup and used for another BTRFS RAID-1 filesystem.
> > That gets you redundancy at both levels.  Of course if you had 2 disks
> > in one pair fail then the loopback BTRFS filesystem would still be OK.
> 
> But the COW and fragmentation issues on the bottom level... OUCH!  And
> you can't simply set NOCOW, because that turns off the checksumming as
> well, leaving you right back where you were without the integrity
> checking!

It really depends on how much performance you need.  I've got some virtual 
servers running BTRFS within BTRFS, and with modern hardware and a light load 
it works OK.

> *BUT* at a cost of essentially *CONSTANT* scrubbing.  Constant because at
> the multi-TBs we're talking, just completing a single scrub cycle could
> well take more than a standard 8-hour work-day, so by the time you
> finish, it's already about time to start the next scrub cycle.

Scrubbing my BTRFS RAID-1 filesystem with 2.4TB of data stored on a pair of 
3TB disks takes 5 hours.
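
For scale, those figures can be turned into an approximate scrub rate.  This 
sketch assumes that each disk holds one full copy of the 2.4TB, that both 
disks are read in parallel, and that TB means 10^12 bytes; the function name 
is only illustrative.

```python
# Rough scrub-rate estimate from the figures above.
# Assumptions: each disk holds one 2.4TB copy, both disks are read in
# parallel, and TB means 10**12 bytes.

DATA_BYTES = 2.4e12    # data per disk
SCRUB_HOURS = 5.0      # observed scrub duration
DISK_BYTES = 3.0e12    # capacity of each disk


def scrub_rate_mb_per_s(data_bytes: float, hours: float) -> float:
    """Average per-disk read rate during the scrub, in MB/s."""
    return data_bytes / (hours * 3600) / 1e6


rate = scrub_rate_mb_per_s(DATA_BYTES, SCRUB_HOURS)
# Extrapolated scrub time if the disks were completely full.
full_disk_hours = DISK_BYTES / (rate * 1e6) / 3600

print(f"average scrub rate: {rate:.0f} MB/s per disk")
print(f"estimated time for a full 3TB disk: {full_disk_hours:.2f} hours")
```

That works out to roughly 133 MB/s per disk, and a little over 6 hours if the 
disks were full, assuming the rate stays the same.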

> That sort of constant scrubbing is going to take its toll both on device
> life and on I/O thruput for whatever data you're actually storing on the
> device, since a good share of the time it's going to be scrubbing as
> well, slowing down the speed of the real I/O.

Some years ago I asked an executive from a company that manufactured hard 
drives about this.  The engineering manager who was directed to answer my 
question told me that the drives were designed to perform any sequence of 
legal operations continually for the warranty period.  So if a disk had a 3 
year warranty then it should be able to survive a scrubbing loop for 3 years.

But scrubbing a system that runs 24*7 is a problem.  Hopefully we will get a 
speed limit feature for BTRFS scrubbing as there is for Linux software RAID 
rebuild/scrub.

> > No.  I have a RAID-1 array of 3TB disks that is 2/3 full which I scrub
> > every Sunday night.  If I had an array of 4 disks then I could do scrubs
> > on Saturday night as well.
> 
> But are you scrubbing at both the btrfs and the md/dmraid level?  That'll
> effectively double the scrub-time.

It's a BTRFS RAID-1, there is no mdadm on that system.

> And while that might not take a full 24 hours, it's likely to take a
> significant enough portion of 24 hours, that if you're doing a full mdraid
> and btrfs level both scrub every two days, some significant fraction (say
> a third to a half) of the time will be spent scrubbing, during which
> normal I/O speeds will be significantly reduced, while also reducing
> device lifetime due to the relatively high duty cycle seek activity.

When the expected error rate for SATA disks is ~10% of disks having errors per 
year, a scrub every second day seems rather paranoid.
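
To put a number on the time spent scrubbing, here is a small sketch using the 
5 hour scrub time mentioned earlier in this message.  It assumes a BTRFS-only 
scrub with no md layer and a constant scrub duration; the function name is 
only illustrative.

```python
# Fraction of wall-clock time spent scrubbing for a given schedule.
# Assumptions: a single BTRFS-level scrub taking 5 hours, constant duration.

SCRUB_HOURS = 5.0


def scrub_duty_cycle(scrub_hours: float, interval_hours: float) -> float:
    """Fraction of each interval during which a scrub is running."""
    return scrub_hours / interval_hours


every_second_day = scrub_duty_cycle(SCRUB_HOURS, 2 * 24)
weekly = scrub_duty_cycle(SCRUB_HOURS, 7 * 24)

print(f"scrub every second day: {every_second_day:.1%} of the time")
print(f"scrub once a week:      {weekly:.1%} of the time")
```

With those figures a scrub every second day occupies about a tenth of the 
time, and a weekly scrub about 3%.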

But if you are that paranoid then the wisc.edu paper suggests that you should 
be buying "enterprise" disks that have a much lower error rate.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
