From: Wol's lists <antlists@youngman.org.uk>
To: NeilBrown <neilb@suse.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	lsf-pc@lists.linux-foundation.org
Cc: linux-raid@vger.kernel.org, linux-block@vger.kernel.org,
	Hannes Reinecke <hare@suse.de>, Neil Brown <neilb@suse.de>
Subject: Re: [LSF/MM TOPIC] De-clustered RAID with MD
Date: Tue, 30 Jan 2018 17:40:02 +0000	[thread overview]
Message-ID: <337d2541-e8d1-1c2e-e61b-bcca1e7c7388@youngman.org.uk> (raw)
In-Reply-To: <87372n613s.fsf@notabene.neil.brown.name>

On 30/01/18 11:24, NeilBrown wrote:
> On Tue, Jan 30 2018, Wols Lists wrote:
> 
>> On 29/01/18 21:50, NeilBrown wrote:
>>> By doing declustered parity you can sanely do raid6 on 100 drives, using
>>> a logical stripe size that is much smaller than 100.
>>> When recovering a single drive, the 10-groups-of-10 would put heavy load
>>> on 9 other drives, while the decluster approach puts light load on 99
>>> other drives.  No matter how clever md is at throttling recovery, I
>>> would still rather distribute the load so that md has an easier job.
>>
>> Not offering to do it ... :-)
>>
>> But that sounds a bit like linux raid-10. Could a simple approach be to
>> do something like "raid-6,11,100", ie raid-6 with 9 data chunks, two
>> parity, striped across 100 drives? Okay, it's not as good as the
>> decluster approach, but it would spread the stress of a rebuild across
>> 20 drives, not 10. And probably be fairly easy to implement.
> 
> If you did that, I think you would be about 80% of the way to fully
> declustered-parity RAID.
> If you then tweak the math a bit so that one stripe was
> 
> A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 ....
> 
> and the next
> 
> A1 C1 A2 C2 A3 C3 A4 C4 B1 D1 B2 D2 ....
> 
> and then
> 
> A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 ....
> 
>                    XX
>                    
> Where Ax are a logical stripe and Bx are the next, you have a
> slightly better distribution.  If device XX fails then the reads needed
> for the first stripe mostly come from different drives than those for
> the second stripe, which are mostly different again for the 3rd stripe.
> 
> Presumably the CRUSH algorithm (which I only skim-read once about a year
> ago) formalizes how to do this, and does it better.
> Once you have the data handling in place for your proposal, it should be
> little more than replacing a couple of calculations to get the full
> solution.
> 
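
(To make that layout concrete for myself - a rough, untested userspace
sketch, nothing like real md code; the drive count, strides and stripe
width below are all made up:)

#include <stdio.h>

#define N_DEV             12	/* real drives                     */
#define CHUNKS_PER_STRIPE  4	/* logical stripe width, e.g. 2+2  */

/*
 * One stride per stripe row, each coprime to N_DEV, so that within a
 * row slot * stride (mod N_DEV) visits every device exactly once, and
 * consecutive rows tend to put a given logical stripe on different
 * devices.
 */
static const int stride[] = { 1, 5, 7 };

static int chunk_to_dev(int row, int slot)
{
	return (slot * stride[row % 3]) % N_DEV;
}

int main(void)
{
	int row, slot;

	for (row = 0; row < 3; row++) {
		for (slot = 0; slot < N_DEV; slot++) {
			int stripe = slot / CHUNKS_PER_STRIPE;	/* A, B, C ... */
			int chunk  = slot % CHUNKS_PER_STRIPE;	/* 1..4        */

			printf("row %d: %c%d -> dev %d\n",
			       row, 'A' + stripe, chunk + 1,
			       chunk_to_dev(row, slot));
		}
	}
	return 0;
}

With stride 1 the first row comes out A1 A2 A3 A4 B1 B2 ... as above;
with strides 5 and 7 the later rows put stripe A's chunks on quite
different drives, so a single failed device pulls rebuild reads from
mostly different drives in each row.
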
Okay. I think I have it - a definition for raid-16 (or is it raid-61?).
But I need a bit of help with the maths. And it might need a look-up
table :-(

Okay. Like raid-10, raid-16 would be spec'd as "--level 16,3,8", i.e. 3
mirrors emulating an 8-drive raid-6.
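
For concreteness, something like this purely hypothetical invocation
(mdadm knows nothing about a level 16 today; the 24-drive count is just
an example):

  mdadm --create /dev/md0 --level 16,3,8 --raid-devices=24 /dev/sd[b-y]

i.e. 24 real drives carrying 3-way mirrored chunks laid out as an
8-drive raid-6.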

What I'm looking for is a PRNG with the "bug" that it repeats over a 
short period, and over that period is guaranteed to produce each possible 
number the same number of times. I saw a wonderful video demonstration of this 
years ago - if you plot the generated number against the number of times 
it was generated, after a few rows it "filled up" a rectangle on the graph.

At which point the maths becomes very simple. I just need at least as 
many real drives as "mirrors times emulated".
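
Something with that property, say (a toy, not a serious PRNG suggestion -
just to show the behaviour I mean; the period and step are made up):

#include <stdio.h>

#define PERIOD 24	/* say, the number of real drives - made up */
#define STEP    7	/* anything coprime to PERIOD               */

int main(void)
{
	int count[PERIOD] = { 0 };
	int i, v;

	/*
	 * (i * STEP) % PERIOD visits every value 0..PERIOD-1 exactly
	 * once per period, so after four periods every value has been
	 * generated exactly four times - the "filled rectangle".
	 */
	for (i = 0; i < 4 * PERIOD; i++)
		count[(i * STEP) % PERIOD]++;

	for (v = 0; v < PERIOD; v++)
		printf("%2d generated %d times\n", v, count[v]);
	return 0;
}

Of course (i * STEP) % PERIOD is just one fixed permutation repeated over
and over, which gives the "same number of times" guarantee trivially -
whether something that regular spreads rebuild load well enough is
another question.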

If somebody can come up with the algorithm I want, I can spec it, and 
then maybe someone can implement it? It'll be fun testing - I'll need my 
new machine when I get it working :-)

Cheers,
Wol

Thread overview: 14+ messages
2018-01-29 15:23 [LSF/MM TOPIC] De-clustered RAID with MD Johannes Thumshirn
2018-01-29 16:32 ` Wols Lists
2018-01-29 21:50   ` [Lsf-pc] " NeilBrown
2018-01-30 10:43     ` Wols Lists
2018-01-30 11:24       ` NeilBrown
2018-01-30 17:40         ` Wol's lists [this message]
2018-02-03 15:53         ` Wols Lists
2018-02-03 17:16         ` Wols Lists
2018-01-31  9:58     ` [Lsf-pc] " David Brown
2018-01-31 10:58       ` Johannes Thumshirn
2018-01-31 14:27       ` Wols Lists
2018-01-31 14:41         ` David Brown
2018-01-30  9:40   ` [Lsf-pc] " Johannes Thumshirn
2018-01-31  8:03     ` David Brown
