linux-raid.vger.kernel.org archive mirror
From: Stan Hoeppner <stan@hardwarefreak.com>
To: David Brown <david@westcontrol.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: high throughput storage server?
Date: Tue, 22 Feb 2011 23:52:02 -0600
Message-ID: <4D64A082.9000601@hardwarefreak.com>
In-Reply-To: <ik0gkm$nn$1@dough.gmane.org>

David Brown put forth on 2/22/2011 8:18 AM:

> Yes, this is definitely true - RAID10 is less affected by running
> degraded, and recovering is faster and involves less disk wear.  The
> disadvantage compared to RAID6 is, of course, if the other half of a
> disk pair dies during recovery then your raid is gone - with RAID6 you
> have better worst-case redundancy.

The odds of the mirror partner dying during a rebuild are very long,
and the odds of suffering a URE are very low.  However, in the case
of RAID5/6, and more so with RAID5, quite a bit is being written
these days about unrecoverable read error rates on modern very large
drives (1/2/3TB).  Use a sufficient number of these very large disks
and a URE during an array rebuild becomes all but guaranteed, which
will very likely cost you your entire array.  This is because every
block of every remaining disk (assuming full-disk RAID, not small
partitions on each disk) must be read during a RAID5/6 rebuild.  The
back-of-envelope math is sketched below.  IIRC this is one of the
reasons RAID6 is becoming more popular today: not just because it can
survive an additional disk failure, but because it's more resilient
to a URE during a rebuild.
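
(From memory: if the spec'd URE rate is 1 per 1e14 bits read--typical
for consumer SATA drives--then reading B bits gives roughly

  P(at least one URE) = 1 - (1 - 1e-14)^B

A full read of a 2TB drive is ~1.6e13 bits, so P is already ~15% per
drive, and a RAID5 rebuild compounds that across every surviving
member.)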

With a RAID10 rebuild, since you're only reading the entire contents
of a single disk, the odds of encountering a URE are much lower than
with a RAID5 of the same number of drives, simply due to the total
number of bits read.
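
A quick back-of-envelope comparison in Python, using my own assumed
numbers (8 drives of 2TB each, and the same 1-per-1e14-bits consumer
URE spec as above):

import math

# Spec'd unrecoverable read error rate: 1 URE per 1e14 bits read,
# typical of consumer SATA drives.
URE_RATE = 1e-14
DISK_BITS = 2e12 * 8   # one 2TB drive, in bits

def p_ure(bits_read):
    # P(at least one URE) = 1 - (1 - rate)^bits_read, computed
    # stably for tiny rates via log1p/expm1.
    return -math.expm1(bits_read * math.log1p(-URE_RATE))

drives = 8
# A RAID5 rebuild must read every surviving member in full.
print("RAID5 : %.0f%%" % (100 * p_ure((drives - 1) * DISK_BITS)))
# A RAID10 rebuild reads only the single mirror partner.
print("RAID10: %.0f%%" % (100 * p_ure(DISK_BITS)))

That prints roughly 67% vs 15% for the 8-drive case, and the RAID5
number only climbs as you add spindles; the RAID10 number doesn't.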

> Once md raid has support for bad block lists, hot replace, and non-sync
> lists, then the differences will be far less clear.  If a disk in a RAID
> 5/6 set has a few failures (rather than dying completely), then it will
> run as normal except when bad blocks are accessed.  This means for all
> but the few bad blocks, the degraded performance will be full speed. And

You're muddying the definition of a "degraded RAID".

> if you use "hot replace" to replace the partially failed drive, the
> rebuild will have almost exactly the same characteristics as RAID10
> rebuilds - apart from the bad blocks, which must be recovered by parity
> calculations, you have a straight disk-to-disk copy.

Are you saying you'd take a "partially failing" drive in a RAID5/6
and simply do a full disk copy onto the spare, except for the "bad
blocks", rebuilding those in the normal fashion, simply to
approximate the recovery speed of RAID10?

I think your logic is a tad flawed here.  If a drive is already
failing, why on earth would you trust it, period?  I think you'd be
asking for trouble doing this.  This is precisely one of the reasons
many hardware RAID controllers have historically kicked drives
offline at the first sign of trouble--if a drive is acting flaky we
don't want to trust it; we want to replace it as soon as possible.

The assumption is that the data on the array is far more valuable than
the cost of a single drive or the entire hardware for that matter.  In
most environments this is the case.  Everyone seems fond of the WD20EARS
drives (which I disdain).  I hear they're loved because Newegg has them
for less than $100.  What's your 2TB of data on that drive worth?  In
the case of a MythTV box, to the owner, that $100 is worth more than the
content.  In a business setting, I'd dare say the data on that drive
is worth far more than the $100 cost of the drive plus the admin time
($$) required to replace/rebuild it.

In the MythTV case what you propose might be a worthwhile risk.  In a
business environment, definitely not.

-- 
Stan
