public inbox for linux-block@vger.kernel.org
From: Wols Lists <antlists@youngman.org.uk>
To: Johannes Thumshirn <jthumshirn@suse.de>,
	lsf-pc@lists.linux-foundation.org
Cc: linux-raid@vger.kernel.org, linux-block@vger.kernel.org,
	Hannes Reinecke <hare@suse.de>, Neil Brown <neilb@suse.de>
Subject: Re: [LSF/MM TOPIC] De-clustered RAID with MD
Date: Mon, 29 Jan 2018 16:32:38 +0000	[thread overview]
Message-ID: <5A6F4CA6.5060802@youngman.org.uk> (raw)
In-Reply-To: <mqdvafkhep0.fsf@linux-x5ow.site>

On 29/01/18 15:23, Johannes Thumshirn wrote:
> Hi linux-raid, lsf-pc
> 
> (If you've received this mail multiple times, I'm sorry, I'm having
> trouble with the mail setup).

My immediate reactions as a lay person (I edit the raid wiki) ...
> 
> With the rise of bigger and bigger disks, array rebuilding times start
> skyrocketing.

And? Yes, your data is at risk during a rebuild, but md-raid throttles
the i/o, so it doesn't hammer the system.
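Though to put a number on "skyrocketing" (my own back-of-envelope, with optimistic assumed figures, nothing measured):

```python
# Back-of-envelope rebuild time for a conventional (non-declustered)
# rebuild: the whole replacement disk must be rewritten sequentially.
# Both numbers below are illustrative assumptions, not measurements.
disk_bytes = 10 * 10**12          # 10 TB drive
rebuild_rate = 150 * 10**6        # 150 MB/s sustained, optimistic
hours = disk_bytes / rebuild_rate / 3600
print(f"{hours:.1f} hours")       # roughly 18.5 hours at full speed
```

And that's the best case with no competing user I/O; with throttling in effect it can stretch to days.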
> 
> In a paper from '92, Holland and Gibson [1] suggest a mapping
> algorithm similar to RAID5, but instead of utilizing all disks in the
> array for every I/O operation, they implement a per-I/O mapping
> function to only use a subset of the available disks.
> 
> This has at least two advantages:
> 1) If one disk has to be replaced, there's no need to read the data
>    from all disks to recover the one failed disk, so non-affected disks
>    can be used for real user I/O and not just recovery, and

Again, that's throttling, so that's not a problem ...

> 2) an efficient mapping function can improve parallel I/O submission, as
>    two different I/Os are not necessarily going to the same disks in the
>    array. 
> 
> For the mapping function used a hashing algorithm like Ceph's CRUSH [2]
> would be ideal, as it provides a pseudo random but deterministic mapping
> for the I/O onto the drives.
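Just to make the idea concrete - this is my own toy sketch of a pseudo-random but deterministic subset mapping, not CRUSH itself, and all the names and parameters are mine:

```python
import hashlib

def stripe_to_disks(stripe: int, n_disks: int, k: int) -> list[int]:
    """Map a stripe to k of n_disks disks, pseudo-randomly but
    deterministically: rank every disk by a per-(stripe, disk) hash
    and take the k best-ranked.  The same stripe always lands on the
    same subset, with no lookup table needed."""
    ranked = sorted(range(n_disks),
                    key=lambda d: hashlib.sha256(
                        f"{stripe}:{d}".encode()).digest())
    return sorted(ranked[:k])

# Two different stripes usually land on different subsets, so
# independent I/Os can proceed in parallel, and rebuilding one failed
# disk only touches the stripes whose subset contains that disk.
print(stripe_to_disks(0, 10, 4))
print(stripe_to_disks(1, 10, 4))
```

CRUSH proper adds weighting and hierarchy awareness on top of this, but the deterministic-hash core is the same idea.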
> 
> This whole declustering of course only makes sense for more than (at
> least) 4 drives, but we do have customers with several orders of
> magnitude more drives in an MD array.

If you have four drives or more - especially if they are multi-terabyte
drives - you should NOT be using raid-5 ...
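The usual arithmetic behind that advice (my numbers, using the common consumer-drive spec figure, nothing from the papers): a raid-5 rebuild has to read every surviving disk end to end, and at typical unrecoverable-read-error rates that read is no longer safe:

```python
# Expected unrecoverable read errors (UREs) while reading the three
# surviving disks of a 4 x 10 TB raid-5 during a rebuild.
# A URE rate of 1e-14 per bit is the usual consumer-drive spec figure.
bits_read = 3 * 10 * 10**12 * 8   # three surviving 10 TB disks, in bits
ure_rate = 1e-14                  # assumed errors per bit read
expected_ures = bits_read * ure_rate
print(f"{expected_ures:.1f}")     # 2.4 expected errors during the rebuild
```

With raid-5 a single URE during the rebuild loses data; raid-6's second parity is what lets the rebuild survive it.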
> 
> At LSF I'd like to discuss if:
> 1) The wider MD audience is interested in de-clustered RAID with MD

I haven't read the papers, so no comment, sorry.

> 2) de-clustered RAID should be implemented as a sublevel of RAID5 or
>    as a new personality

Neither! If you're going to do it, it should be raid-6.

> 3) CRUSH is a suitable algorithm for this (there's evidence in [3] that
>    the NetApp E-Series Arrays do use CRUSH for parity declustering)
> 
> [1] http://www.pdl.cmu.edu/PDL-FTP/Declustering/ASPLOS.pdf 
> [2] https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
> [3]
> https://www.snia.org/sites/default/files/files2/files2/SDC2013/presentations/DistributedStorage/Jibbe-Gwaltney_Method-to_Establish_High_Availability.pdf
> 
Okay - I've now skimmed the CRUSH paper [2]. It looks really interesting.
BUT. It feels more like btrfs than it does like raid.

Btrfs manages disks and does raid; it tries to be the "everything
between the hard drive and the file". This CRUSH thing reads to me like
it wants to be the same. There's nothing wrong with that, but md is
unix-y: "do one thing (raid) and do it well".

My knee-jerk reaction is that if you want to go for it, it sounds like a
good idea. It just doesn't really feel like a good fit for md.

Cheers,
Wol


Thread overview: 14+ messages
2018-01-29 15:23 [LSF/MM TOPIC] De-clustered RAID with MD Johannes Thumshirn
2018-01-29 16:32 ` Wols Lists [this message]
2018-01-29 21:50   ` [Lsf-pc] " NeilBrown
2018-01-30 10:43     ` Wols Lists
2018-01-30 11:24       ` NeilBrown
2018-01-30 17:40         ` Wol's lists
2018-02-03 15:53         ` Wols Lists
2018-02-03 17:16         ` Wols Lists
2018-01-31  9:58     ` [Lsf-pc] " David Brown
2018-01-31 10:58       ` Johannes Thumshirn
2018-01-31 14:27       ` Wols Lists
2018-01-31 14:41         ` David Brown
2018-01-30  9:40   ` [Lsf-pc] " Johannes Thumshirn
2018-01-31  8:03     ` David Brown
