From: Johannes Thumshirn <jthumshirn@suse.de>
To: David Brown <david.brown@hesbynett.no>
Cc: NeilBrown <neilb@suse.com>, Wols Lists <antlists@youngman.org.uk>,
lsf-pc@lists.linux-foundation.org, linux-raid@vger.kernel.org,
linux-block@vger.kernel.org, Hannes Reinecke <hare@suse.de>,
Neil Brown <neilb@suse.de>
Subject: Re: [LSF/MM TOPIC] De-clustered RAID with MD
Date: Wed, 31 Jan 2018 11:58:34 +0100 [thread overview]
Message-ID: <mqdy3kewazp.fsf@linux-x5ow.site> (raw)
In-Reply-To: <5A71933B.1050908@hesbynett.no> (David Brown's message of "Wed, 31 Jan 2018 10:58:19 +0100")
David Brown <david.brown@hesbynett.no> writes:
> That sounds smart. I don't see that you need anything particularly
> complicated for how you distribute your data and parity drives across
> the 100 disks - you just need a fairly even spread.
Exactly.
> I would be more concerned with how you could deal with resizing such an
> array. In particular, I think it is not unlikely that someone with a
> 100 drive array will one day want to add another bank of 24 disks (or
> whatever fits in a cabinet). Making that work nicely would, I believe,
> be more important than making sure the rebuild load distribution is
> balanced evenly across 99 drives.
I don't think rebuilding is such a big deal; let's consider the following
hypothetical scenario:
6 disks with 4 data blocks (3 replicas per block; these could be RAID1-like
duplicates or RAID5-like data + parity, it doesn't matter for this
example)
D1 D2 D3 D4 D5 D6
[A] [B] [C] [ ] [ ] [ ]
[ ] [ ] [ ] [A] [D] [B]
[ ] [A] [B] [ ] [C] [ ]
[C] [ ] [ ] [D] [ ] [D]
Now we're adding one disk and rebalance:
D1 D2 D3 D4 D5 D6 D7
[A] [B] [C] [ ] [ ] [ ] [A]
[ ] [ ] [ ] [ ] [D] [B] [ ]
[ ] [A] [B] [ ] [ ] [ ] [C]
[C] [ ] [ ] [D] [ ] [D] [ ]
This moved the "A" from D4 and the "C" from D5 to D7. The whole
rebalancing affected only three disks (reads from D4 and D5, writes to D7).
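To make the point concrete, here is a minimal sketch of why such a
rebalance stays cheap. This is illustrative Python using rendezvous (HRW)
hashing, not MD or CRUSH code, and the disk/block names are made up: a
block's placement is the set of highest-scoring disks, so adding a disk
only moves the few replicas that the new disk "wins" -- everything else
stays put.

```python
# Hypothetical illustration: minimal-movement placement via
# rendezvous hashing. Not MD code; names are invented.
import hashlib

def score(block: str, disk: str) -> int:
    # Deterministic pseudo-random score per (block, disk) pair.
    h = hashlib.sha256(f"{block}:{disk}".encode()).digest()
    return int.from_bytes(h[:8], "big")

def place(block: str, disks: list[str], replicas: int = 3) -> set[str]:
    # The 'replicas' highest-scoring disks hold the block.
    return set(sorted(disks, key=lambda d: score(block, d),
                      reverse=True)[:replicas])

disks6 = [f"D{i}" for i in range(1, 7)]
disks7 = disks6 + ["D7"]
blocks = ["A", "B", "C", "D"]

before = {b: place(b, disks6) for b in blocks}
after = {b: place(b, disks7) for b in blocks}

# Every replica that moved must have moved onto the new disk D7;
# the old disks only see reads, never a reshuffle among themselves.
moved = sum(len(after[b] - before[b]) for b in blocks)
print("replicas moved:", moved)
```

The same property is what the weighted decision tree gives you at scale:
adding a bank of disks moves roughly its proportional share of data, not
a full restripe.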
> I would also be interested in how the data and parities are distributed
> across cabinets and disk controllers. When you manually build from
> smaller raid sets, you can ensure that in each set the data disks and the
> parity are all in different cabinets - that way if an entire cabinet
> goes up in smoke, you have lost one drive from each set, and your data
> is still there. With a pseudo random layout, you have lost that. (I
> don't know how often entire cabinets of disks die, but I once lost both
> disks of a raid1 mirror when the disk controller card died.)
Well, this is something CRUSH takes care of. As I said earlier, it's a
weighted decision tree. One of the weights could be to spread all
blocks evenly across two cabinets.
Taking this into account would require a non-trivial user interface, and
I'm not sure the benefits of this outweigh the costs (at least for
an initial implementation).
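A toy sketch of that failure-domain weighting, assuming a made-up
two-cabinet topology (CRUSH itself uses weighted buckets and straw
draws; this only shows the choose-cabinet-then-disk idea):

```python
# Hypothetical illustration of a CRUSH-style rule: first choose
# cabinets, then choose one disk inside each chosen cabinet, so no
# single cabinet ever holds all replicas of a block.
import hashlib

def score(block: str, item: str) -> int:
    h = hashlib.sha256(f"{block}:{item}".encode()).digest()
    return int.from_bytes(h[:8], "big")

def choose(block: str, items: list[str], n: int) -> list[str]:
    return sorted(items, key=lambda i: score(block, i), reverse=True)[:n]

# Invented topology: two cabinets of three disks each.
cabinets = {"cab1": ["D1", "D2", "D3"], "cab2": ["D4", "D5", "D6"]}

def place(block: str, replicas: int = 2) -> list[str]:
    # Descend the tree: cabinets first, then disks within each.
    chosen_cabs = choose(block, list(cabinets), replicas)
    return [choose(block, cabinets[c], 1)[0] for c in chosen_cabs]

for b in "ABCD":
    print(b, place(b))
```

With two replicas and two cabinets, every block ends up with one copy
per cabinet, so losing an entire cabinet still leaves a surviving copy.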
Byte,
Johannes
--
Johannes Thumshirn Storage
jthumshirn@suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Thread overview: 14+ messages
2018-01-29 15:23 [LSF/MM TOPIC] De-clustered RAID with MD Johannes Thumshirn
2018-01-29 16:32 ` Wols Lists
2018-01-29 21:50 ` [Lsf-pc] " NeilBrown
2018-01-30 10:43 ` Wols Lists
2018-01-30 11:24 ` NeilBrown
2018-01-30 17:40 ` Wol's lists
2018-02-03 15:53 ` Wols Lists
2018-02-03 17:16 ` Wols Lists
2018-01-31 9:58 ` [Lsf-pc] " David Brown
2018-01-31 10:58 ` Johannes Thumshirn [this message]
2018-01-31 14:27 ` Wols Lists
2018-01-31 14:41 ` David Brown
2018-01-30 9:40 ` [Lsf-pc] " Johannes Thumshirn
2018-01-31 8:03 ` David Brown