From: Johannes Thumshirn <jthumshirn@suse.de>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-raid@vger.kernel.org, linux-block@vger.kernel.org,
Hannes Reinecke <hare@suse.de>, Neil Brown <neilb@suse.de>
Subject: [LSF/MM TOPIC] De-clustered RAID with MD
Date: Mon, 29 Jan 2018 16:23:07 +0100
Message-ID: <mqdvafkhep0.fsf@linux-x5ow.site>
Hi linux-raid, lsf-pc
(If you've received this mail multiple times, I'm sorry, I'm having
trouble with the mail setup).
With the rise of bigger and bigger disks, array rebuild times are
skyrocketing.
In a paper from '92, Holland and Gibson [1] suggest a mapping algorithm
similar to RAID5, but instead of utilizing all disks in an array for
every I/O operation, it implements a per-I/O mapping function that uses
only a subset of the available disks.
This has at least two advantages:
1) If one disk has to be replaced, we don't need to read the data from
all disks to recover the one failed disk, so the unaffected disks can
serve real user I/O and not just recovery reads (see the rough numbers
after this list), and
2) an efficient mapping function can improve parallel I/O submission,
as two different I/Os do not necessarily go to the same disks in the
array.
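As a purely illustrative back-of-the-envelope example for 1) (numbers
made up): with n = 100 disks and a stripe width of k = 8, rebuilding a
failed disk only needs reads from the other k - 1 = 7 members of each
affected stripe, and those stripes are spread over all n - 1 = 99
surviving disks, so each survivor has to read roughly 7/99, i.e. about
7%, of what it would have to read in a full-width RAID5 rebuild.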
For the mapping function, a hashing algorithm like Ceph's CRUSH [2]
would be ideal, as it provides a pseudo-random but deterministic mapping
of the I/O onto the drives.
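To make the idea more concrete, here is a minimal user-space sketch of
such a deterministic, pseudo-random per-stripe placement (toy code of
mine, not CRUSH and not anything that exists in MD; all names and
constants are made up). Every disk draws a hash-derived "straw" for a
given stripe and the k longest straws win, so each stripe lands on its
own subset of disks without needing a lookup table:

#include <stdint.h>
#include <stdio.h>

/* 64-bit mixer (SplitMix64 finalizer) used as a cheap, deterministic hash */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27; x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Pick k member disks out of ndisks for one stripe, deterministically. */
static void map_stripe(uint64_t stripe, int ndisks, int k, int *out)
{
	for (int chosen = 0; chosen < k; chosen++) {
		uint64_t best = 0;
		int best_disk = -1;

		for (int d = 0; d < ndisks; d++) {
			int taken = 0;

			for (int i = 0; i < chosen; i++)
				if (out[i] == d)
					taken = 1;
			if (taken)
				continue;

			/* deterministic "straw" for this (stripe, disk) pair */
			uint64_t straw = mix64((stripe << 16) ^ (uint64_t)d);

			if (best_disk < 0 || straw > best) {
				best = straw;
				best_disk = d;
			}
		}
		out[chosen] = best_disk;
	}
}

int main(void)
{
	int members[4];

	/* map a few stripes onto 4 of 12 disks (e.g. 3 data + 1 parity) */
	for (uint64_t s = 0; s < 4; s++) {
		map_stripe(s, 12, 4, members);
		printf("stripe %llu -> disks %d %d %d %d\n",
		       (unsigned long long)s,
		       members[0], members[1], members[2], members[3]);
	}
	return 0;
}

A real implementation would additionally have to weight disks, keep the
mapping stable when disks fail or are added, and do better than the
O(ndisks * k) scan above, which is where something like CRUSH comes in.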
This whole declustering of course only makes sense for more than (at
least) 4 drives, but we do have customers with several orders of
magnitude more drives in an MD array.
At LSF I'd like to discuss whether:
1) the wider MD audience is interested in de-clustered RAID with MD
2) de-clustered RAID should be implemented as a sublevel of RAID5 or
as a new personality
3) CRUSH is a suitable algorithm for this (there's evidence in [3] that
the NetApp E-Series Arrays do use CRUSH for parity declustering)
[1] http://www.pdl.cmu.edu/PDL-FTP/Declustering/ASPLOS.pdf
[2] https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf
[3]
https://www.snia.org/sites/default/files/files2/files2/SDC2013/presentations/DistributedStorage/Jibbe-Gwaltney_Method-to_Establish_High_Availability.pdf
Thanks,
Johannes
--
Johannes Thumshirn Storage
jthumshirn@suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850