From: Goswin von Brederlow <goswin-v-b@web.de>
To: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Cc: Goswin von Brederlow <goswin-v-b@web.de>,
Jon@eHardcastle.com, linux-raid@vger.kernel.org, neilb@suse.de
Subject: Re: Full use of varying drive sizes?---maybe a new raid mode is the answer?
Date: Mon, 05 Oct 2009 11:06:40 +0200 [thread overview]
Message-ID: <87k4zanra7.fsf@frosties.localdomain> (raw)
In-Reply-To: <4AC0C3EB.30606@gmail.com> (Konstantinos Skarlatos's message of "Mon, 28 Sep 2009 17:10:51 +0300")
Konstantinos Skarlatos <k.skarlatos@gmail.com> writes:
> Goswin von Brederlow wrote:
>> Konstantinos Skarlatos <k.skarlatos@gmail.com> writes:
>>
>>
>>> Instead of doing all those things, I have a suggestion to make:
>>>
>>> Something that is like RAID 4 without striping.
>>>
>>> There are already 3 programs doing that, Unraid, Flexraid and
>>> disparity, but putting this functionality into linux-raid would be
>>> tremendous. (the first two work on linux and the third one is a
>>> command line windows program that works fine under wine).
>>>
>>> The basic idea is this: Take any number of drives, with any capacity
>>> and filesystem you like. Then provide the program with an empty disk
>>> at least as large as your largest disk. The program creates parity
>>> data by XORing together the disks sequentially block by block(or file
>>> by file), until it reaches the end of the smallest one.(It XORs block
>>> 1 of disk A with block1 of disk B, with block1 of disk C.... and
>>> writes the result to block1 of Parity disk) Then it continues with the
>>> rest of the drives, until it reaches the end of the last drive.
>>>
>>> Disk A B C D E P
>>> Block 1 1 1 1 1 1
>>> Block 2 2 2 2
>>> Block 3 3 3
>>> Block 4 4
>>>
>>> The great thing about this method is that when you lose one disk you
>>> can get all your data back. when you lose two disks you only lose the
>>> data on them, and not the whole array. New disks can be added and the
>>> parity recalculated by reading only the new disk and the parity disk.
>>>
>>
>> This has some problem though:
>>
>> 1) every wite is a read-modify-write
>> Well, for one thing this is slow.
>>
> Is that necessary? Why not read every other data disk at the same time
> and calculate new parity blocks on the fly? granted, that would mean
> spinning up every disk, so maybe this mode could be an option?
Reading one parity block and updating it is faster than reading X data
blocks and recomputing the parity. Both in I/O and CPU terms.
>> 2) every write is a read-modify-write of the parity disk
>> Even worse, all writes to independent disks bottleneck at the
>> parity disk.
>> 3) every write is a read-modify-write of the parity disk That
>> poor parity disk. It can never catch a break, untill it
>> breaks. It is likely that it will break first.
>>
> No problem, a failed parity disk on this method is a much smaller
> problem than a failed disk on a RAID 5
But the reason they went from raid3/4 to raid5. :)
>> 4) if the parity disk is larger than the 2nd largest disk it will
>> waste space
>> 5) data at the start of the disk is more likely to fail than at the
>> end of a disk
>> (Say disks A and D fail then Block A1 is lost but A2-A4 are still
>> there)
>>
>> As for adding a new disks there are 2 cases:
>>
>> 1) adding a small disk
>> zero out the new disk and then the parity does not need to be updated
>> 2) adding a large disk
>> zero out the new disk and then that becomes the parity disk
>>
> So the new disk gets a copy of the parity data of the previous parity disk?
No, the old parity disk becomes a data disk that happens to initialy
contain the parity of A, B, C, D, E. The new parity disk becomes all
zero.
Look at it this way: XORing disk A, B, C, D, E together gives
P. XORing A, B, C, D, E, P together always gives 0. So by filling the
new parity disk with zero you are computing the parity of A, B, C, D,
E, P. Just more intelligently.
>>> Please consider adding this feature request, it would be a big plus
>>> for linux if such a functionality existed, bringing many users from
>>> WHS and ZFS here, as it especially caters to the needs of people that
>>> store video and their movie collection at their home server.
>>>
>>> Thanks for your time
>>>
>>>
>>> ABCDE for data drives, and P for parity
>>>
>>
>> As a side note I like the idea of not striping, despide the uneven
>> use. For home use the speed of a single disk is usualy sufficient but
>> the noise of concurrent access to multiple disks is bothersome.
> Have you tried the Seagate Barracuda LP's? totally silent! I have 8 of
> them and i can assure you that they are perfect for large media
> storage in a silent computer.
I buy when I need space and have the money (unfortunately that doesn't
always coincide) and i use what I have. But it is interesting to see
how newer disks are much quieter and I don't believe that is just age
making old disks louder.
>> Also
>> for movie archives a lot of access will be reading and then the parity
>> disk can rest. Disks can also be spun down more often. Only the disk
>> containing the movie one currently watches need to be spinning. That
>> could translate into real money saved on the electric bill.
>>
>>
> I agree this is something mainly for home use, where reads exceed
> writes by a large margin and when writes are done, they are done to
> one or two disks at the same time at most.
>> But I would still do this with my algorithm to get even amount of
>> redunancy. One can then use partitions or lvm to split the overall
>> raid device back into seperate drives if one wants to.
>>
> Yes I think that an option for merging the disks into a large one
> would be nice, as long as data is still recoverable from individual
> disks if for example 2 disks fail. One of the main advantages of not
> stripping is that when things go haywire some data is still
> recoverable, so please lets not lose that.
That is just a matter of placing the partitions/volumes to not span
multiple disks. With partitionable raids one could implement such a
raid mode that would combine all disks into a single raid device but
export each data disk back as a partition. One wouldn't be able to
repartition that though. So not sure if I would want that in the
driver layer. Not everyone will care about having each data disk
seperate.
MfG
Goswin
prev parent reply other threads:[~2009-10-05 9:06 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-22 11:24 Full use of varying drive sizes? Jon Hardcastle
2009-09-22 11:52 ` Kristleifur Daðason
2009-09-22 12:58 ` John Robinson
2009-09-22 13:07 ` Majed B.
2009-09-22 15:38 ` Jon Hardcastle
2009-09-22 15:47 ` Majed B.
2009-09-22 15:48 ` Ryan Wagoner
2009-09-22 16:04 ` Robin Hill
2009-09-23 8:20 ` John Robinson
2009-09-23 10:15 ` Tapani Tarvainen
2009-09-23 12:42 ` Goswin von Brederlow
2009-09-22 13:05 ` Tapani Tarvainen
2009-09-23 10:07 ` Goswin von Brederlow
2009-09-23 14:57 ` Jon Hardcastle
2009-09-23 20:28 ` Full use of varying drive sizes?---maybe a new raid mode is the answer? Konstantinos Skarlatos
2009-09-23 21:29 ` Chris Green
2009-09-24 17:23 ` John Robinson
2009-09-25 6:09 ` Neil Brown
2009-09-27 12:26 ` Konstantinos Skarlatos
2009-09-28 10:53 ` Goswin von Brederlow
2009-09-28 14:10 ` Konstantinos Skarlatos
2009-10-05 9:06 ` Goswin von Brederlow [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87k4zanra7.fsf@frosties.localdomain \
--to=goswin-v-b@web.de \
--cc=Jon@eHardcastle.com \
--cc=k.skarlatos@gmail.com \
--cc=linux-raid@vger.kernel.org \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.