From: Goswin von Brederlow <goswin-v-b@web.de>
To: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Cc: Goswin von Brederlow <goswin-v-b@web.de>,
Jon@eHardcastle.com, linux-raid@vger.kernel.org, neilb@suse.de
Subject: Re: Full use of varying drive sizes?---maybe a new raid mode is the answer?
Date: Mon, 05 Oct 2009 11:06:40 +0200
Message-ID: <87k4zanra7.fsf@frosties.localdomain>
In-Reply-To: <4AC0C3EB.30606@gmail.com> (Konstantinos Skarlatos's message of "Mon, 28 Sep 2009 17:10:51 +0300")
Konstantinos Skarlatos <k.skarlatos@gmail.com> writes:
> Goswin von Brederlow wrote:
>> Konstantinos Skarlatos <k.skarlatos@gmail.com> writes:
>>
>>
>>> Instead of doing all those things, I have a suggestion to make:
>>>
>>> Something that is like RAID 4 without striping.
>>>
>>> There are already 3 programs doing that, Unraid, Flexraid and
>>> disparity, but putting this functionality into linux-raid would be
>>> tremendous. (the first two work on linux and the third one is a
>>> command line windows program that works fine under wine).
>>>
>>> The basic idea is this: Take any number of drives, with any capacity
>>> and filesystem you like. Then provide the program with an empty disk
>>> at least as large as your largest disk. The program creates parity
>>> data by XORing together the disks sequentially, block by block (or file
>>> by file), until it reaches the end of the smallest one. (It XORs block
>>> 1 of disk A with block 1 of disk B, with block 1 of disk C... and
>>> writes the result to block 1 of the parity disk.) Then it continues with
>>> the rest of the drives, until it reaches the end of the last drive.
>>>
>>> Disk      A  B  C  D  E  P
>>> Block 1   1  1  1  1  1  1
>>> Block 2   2  2  2        2
>>> Block 3   3  3           3
>>> Block 4   4              4
>>>
>>> The great thing about this method is that when you lose one disk you
>>> can get all your data back. When you lose two disks you only lose the
>>> data on them, and not the whole array. New disks can be added and the
>>> parity recalculated by reading only the new disk and the parity disk.
>>>
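For illustration, here is a rough Python sketch of that construction (purely
illustrative: the block size, the file-based I/O and the function name are my
own assumptions, not what an md implementation would look like):

# Sketch: build a parity image for data disks of different sizes.
# Past the end of a shorter disk its blocks count as zero, so the
# parity disk only needs to be as large as the largest data disk.
BLOCK = 4096  # assumed block size

def build_parity(data_paths, parity_path):
    files = [open(p, "rb") for p in data_paths]
    try:
        with open(parity_path, "wb") as parity:
            while True:
                blocks = [f.read(BLOCK) for f in files]
                if not any(blocks):          # every data disk is exhausted
                    break
                out = bytearray(BLOCK)
                for blk in blocks:           # short or empty reads act as zeros
                    for i, byte in enumerate(blk):
                        out[i] ^= byte
                parity.write(out)
    finally:
        for f in files:
            f.close()

Losing one data disk, the missing blocks are recovered the same way: XOR the
surviving data blocks with the parity block at the same offset.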
>>
>> This has some problems, though:
>>
>> 1) every write is a read-modify-write
>> Well, for one thing this is slow.
>>
> Is that necessary? Why not read every other data disk at the same time
> and calculate new parity blocks on the fly? Granted, that would mean
> spinning up every disk, so maybe this mode could be an option?
Reading one parity block and updating it is faster than reading X data
blocks and recomputing the parity. Both in I/O and CPU terms.
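A small sketch of the difference (hypothetical helper names, only to
illustrate the arithmetic):

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Read-modify-write: two reads (old data block, old parity block) and
# two writes, no matter how many disks are in the set.
def update_parity_rmw(old_parity, old_data, new_data):
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Full recompute: one read per data disk plus one write, and every
# disk has to be spun up and read for it.
def recompute_parity(all_data_blocks):
    parity = bytes(len(all_data_blocks[0]))
    for blk in all_data_blocks:
        parity = xor_blocks(parity, blk)
    return parity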
>> 2) every write is a read-modify-write of the parity disk
>> Even worse, all writes to independent disks bottleneck at the
>> parity disk.
>> 3) every write is a read-modify-write of the parity disk
>> That poor parity disk. It can never catch a break, until it
>> breaks. It is likely that it will break first.
>>
> No problem, a failed parity disk with this method is a much smaller
> problem than a failed disk in a RAID 5.
But that is the reason they went from raid3/4 to raid5. :)
>> 4) if the parity disk is larger than the 2nd largest disk it will
>> waste space
>> 5) data at the start of a disk is more likely to be lost than data at
>> the end of a disk
>> (Say disks A and D fail: then block A1 is lost, but A2-A4 can still be
>> rebuilt from parity)
>>
>> As for adding a new disk there are 2 cases:
>>
>> 1) adding a small disk
>> zero out the new disk and then the parity does not need to be updated
>> 2) adding a large disk
>> zero out the new disk and then that becomes the parity disk
>>
> So the new disk gets a copy of the parity data of the previous parity disk?
No, the old parity disk becomes a data disk that happens to initially
contain the parity of A, B, C, D, E. The new parity disk becomes all
zero.
Look at it this way: XORing disks A, B, C, D, E together gives
P. XORing A, B, C, D, E, P together always gives 0. So by filling the
new parity disk with zero you are computing the parity of A, B, C, D,
E, P. Just more intelligently.
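A tiny sketch of that identity (illustrative only):

import os
from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 16
data = [os.urandom(BLOCK) for _ in range(5)]   # data disks A..E
parity = reduce(xor_blocks, data)              # the old parity disk P
# XOR of the data disks *and* P is zero at every offset, so an all-zero
# disk is already a valid parity for the set {A, B, C, D, E, P}.
assert reduce(xor_blocks, data + [parity]) == bytes(BLOCK)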
>>> Please consider adding this feature request, it would be a big plus
>>> for linux if such a functionality existed, bringing many users from
>>> WHS and ZFS here, as it especially caters to the needs of people who
>>> store their video and movie collections on a home server.
>>>
>>> Thanks for your time
>>>
>>>
>>> ABCDE for data drives, and P for parity
>>>
>>
>> As a side note I like the idea of not striping, despite the uneven
>> use. For home use the speed of a single disk is usually sufficient, but
>> the noise of concurrent access to multiple disks is bothersome.
> Have you tried the Seagate Barracuda LPs? Totally silent! I have 8 of
> them and I can assure you that they are perfect for large media
> storage in a silent computer.
I buy when I need space and have the money (unfortunately the two don't
always coincide), and I use what I have. But it is interesting to see
how much quieter newer disks are, and I don't believe that is just age
making the old disks louder.
>> Also
>> for movie archives a lot of access will be reading and then the parity
>> disk can rest. Disks can also be spun down more often. Only the disk
>> containing the movie one currently watches needs to be spinning. That
>> could translate into real money saved on the electric bill.
>>
>>
> I agree this is something mainly for home use, where reads exceed
> writes by a large margin, and when writes do happen they go to at most
> one or two disks at a time.
>> But I would still do this with my algorithm to get an even amount of
>> redundancy. One can then use partitions or LVM to split the overall
>> raid device back into separate drives if one wants to.
>>
> Yes, I think that an option for merging the disks into a large one
> would be nice, as long as data is still recoverable from individual
> disks if, for example, 2 disks fail. One of the main advantages of not
> striping is that when things go haywire some data is still
> recoverable, so please let's not lose that.
That is just a matter of placing the partitions/volumes so that they do
not span multiple disks. With partitionable raids one could implement
such a raid mode that would combine all disks into a single raid device
but export each data disk back as a partition. One wouldn't be able to
repartition that, though, so I am not sure I would want that in the
driver layer. Not everyone will care about having each data disk
separate.
MfG
Goswin