From: Neil Brown <neilb@suse.de>
To: Bill Davidsen <davidsen@tmr.com>
Cc: David Lethe <david@santools.com>,
Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Distributed spares
Date: Mon, 20 Oct 2008 12:11:26 +1100
Message-ID: <18683.55998.145056.757423@notabene.brown>
In-Reply-To: message from Bill Davidsen on Friday October 17
On Friday October 17, davidsen@tmr.com wrote:
> David Lethe wrote:
> >
> > With all due respect, RAID5E isn't practical. Too many corner cases
> > dealing with performance implications, and with where you even put
> > the parity block, to ensure that when a disk fails you haven't put
> > yourself into a situation where the hot spare chunk is located on
> > the disk drive that just died.
> >
> >
> Having run 38 multi-TB machines for an ISP using RAID5e in the SCSI
> controller, I feel pretty sure that the practicality is established, and
> only the ability to reinvent that particular wheel is in question. The
> complexity is that the hot spare drive needs to be defined after the 1st
> drive failure, using the spare sectors on the functional drives.
I don't think that will be particularly complex. It will just be a
bit of code in raid5_compute_sector. The detail of 'which device has
failed' would be stored in ->algorithm somehow.
There is an interesting question of how general do we want the code to
be.
e.g. do we want to be able to configure an array with 2 distributed
spares? I suspect that people would rarely want 2, and never want 3,
so it would be worth making 2 work if the code didn't get too complex,
which I don't think it would (but I'm not certain).
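To make that concrete, here is a rough user-space sketch (Python, purely
illustrative; the helpers spare_dev/parity_dev/data_dev and the exact
rotation are made up for this example, not md's actual left-symmetric
layout code) of how a raid5_compute_sector-style mapping could redirect
chunks from a failed device into each stripe's rotating spare chunk:

```python
# Hypothetical sketch of a RAID5 layout with one distributed spare.
# An array of n devices has, per stripe, n-2 data chunks, one parity
# chunk, and one (initially empty) spare chunk; spare and parity
# rotate so every device carries a share of each.

def spare_dev(n, stripe):
    """Device holding the rotating spare chunk of this stripe."""
    return stripe % n

def parity_dev(n, stripe):
    """Device holding parity; offset by one so it never collides
    with the spare slot (requires n >= 3)."""
    return (stripe + 1) % n

def data_dev(n, stripe, i, failed=None):
    """Device holding the i-th data chunk (0 <= i < n-2) of a stripe.
    After a failure, chunks that lived on the failed device are
    redirected to the stripe's spare chunk -- the bit of logic that
    would be keyed off the failed-device detail stored in
    ->algorithm."""
    sp = spare_dev(n, stripe)
    d = parity_dev(n, stripe)
    for _ in range(i + 1):        # walk forward past parity,
        d = (d + 1) % n
        if d == sp:               # skipping the spare slot
            d = (d + 1) % n
    if failed is not None and d == failed:
        return sp                 # rebuilt into the spare chunk
    return d

def parity_dev_after_failure(n, stripe, failed):
    """Parity also moves to the spare slot if it was on the failed
    device."""
    pp = parity_dev(n, stripe)
    return spare_dev(n, stripe) if pp == failed else pp
```

Note that the corner case David raises takes care of itself in this
scheme: in stripes whose spare chunk sat on the failed device, that
chunk held no data, so there is nothing to redirect there.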
> > Algorithms dealing
> > with drive failures, unrecoverable read/write errors on normal
> > operations as well as rebuilds, expansions,
> > and journalization/optimization are not well understood. It is new
> > territory.
> >
>
> That's why I'm being quite cautious about saying I can do this: the
> coding is easy; it's finding out what to code that's hard. It appears
> that configuration decisions need to be made after the failure event,
> before the rebuild. Yes, it's complex. But from experience I can say
> that performance during rebuild is far better with a distributed spare
> than beating the snot out of one newly added spare with other RAID
> levels. So there's a performance benefit for both the normal case and
> the rebuild case, and a side benefit of faster rebuild time.
I cannot see why rebuilding a raid5e would be faster than rebuilding a
raid5 to a fresh device.
In each case, you need to read from n-1 devices, and write to 1
device. So all devices are constantly doing IO at the same rate.
In the raid5 case you could get better streaming as each device is
either "always reading" or "always writing", whereas in a raid5e
rebuild, devices will sometimes read and sometimes write. So if
anything, I would expect raid5e to rebuild more slowly, but you would
probably only notice this with small chunk sizes.
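The streaming argument can be put in toy-model form (again just an
illustration, not md code): during a rebuild onto a fresh raid5 spare,
every device's role is constant, while in a raid5e rebuild the writing
role rotates with the spare slot, so each survivor keeps turning around
between reads and writes:

```python
def rebuild_roles_raid5e(n, failed, stripes):
    """Per-stripe role of each surviving device in a raid5e rebuild:
    the device holding the stripe's spare chunk writes the rebuilt
    chunk, the others read.  Stripes whose spare chunk sat on the
    failed device lost nothing and are skipped."""
    roles = {d: [] for d in range(n) if d != failed}
    for s in range(stripes):
        sp = s % n                  # rotating spare slot
        if sp == failed:
            continue
        for d in roles:
            roles[d].append('W' if d == sp else 'R')
    return roles

def switches(seq):
    """Read<->write turnarounds a device makes over the rebuild."""
    return sum(a != b for a, b in zip(seq, seq[1:]))
```

A raid5 rebuild to a fresh device gives every device a constant role
sequence (zero turnarounds); here each survivor turns around a few
times per rotation period, which is the seek cost behind the
"slightly slower" expectation, and which large chunk sizes amortize
away.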
I agree that (with suitably large chunk sizes) you should be able to
get better throughput on raid5e.
NeilBrown
Thread overview: 22+ messages
2008-10-13 21:50 Distributed spares Bill Davidsen
2008-10-13 22:11 ` Justin Piszcz
2008-10-13 22:30 ` Billy Crook
2008-10-13 23:29 ` Keld Jørn Simonsen
2008-10-14 10:12 ` Martin K. Petersen
2008-10-14 13:06 ` Keld Jørn Simonsen
2008-10-14 13:20 ` David Lethe
2008-10-14 12:02 ` non-degraded component replacement was " David Greaves
2008-10-14 13:18 ` Billy Crook
2008-10-14 23:20 ` Bill Davidsen
2008-10-14 10:04 ` Neil Brown
2008-10-16 23:50 ` Bill Davidsen
2008-10-17 4:09 ` David Lethe
2008-10-17 13:46 ` Bill Davidsen
2008-10-20 1:11 ` Neil Brown [this message]
2008-10-17 13:09 ` Gabor Gombas