From: Goswin von Brederlow <goswin-v-b@web.de>
To: Konstantinos Skarlatos <k.skarlatos@gmail.com>
Cc: Goswin von Brederlow <goswin-v-b@web.de>,
Jon@eHardcastle.com, linux-raid@vger.kernel.org, neilb@suse.de
Subject: Re: Full use of varying drive sizes?---maybe a new raid mode is the answer?
Date: Mon, 05 Oct 2009 11:06:40 +0200
Message-ID: <87k4zanra7.fsf@frosties.localdomain>
In-Reply-To: <4AC0C3EB.30606@gmail.com> (Konstantinos Skarlatos's message of "Mon, 28 Sep 2009 17:10:51 +0300")
Konstantinos Skarlatos <k.skarlatos@gmail.com> writes:
> Goswin von Brederlow wrote:
>> Konstantinos Skarlatos <k.skarlatos@gmail.com> writes:
>>
>>
>>> Instead of doing all those things, I have a suggestion to make:
>>>
>>> Something that is like RAID 4 without striping.
>>>
>>> There are already 3 programs doing that, Unraid, Flexraid and
>>> disparity, but putting this functionality into linux-raid would be
>>> tremendous. (the first two work on linux and the third one is a
>>> command line windows program that works fine under wine).
>>>
>>> The basic idea is this: Take any number of drives, with any capacity
>>> and filesystem you like. Then provide the program with an empty disk
>>> at least as large as your largest disk. The program creates parity
>>> data by XORing together the disks sequentially, block by block (or file
>>> by file), until it reaches the end of the smallest one. (It XORs block
>>> 1 of disk A with block 1 of disk B, with block 1 of disk C... and
>>> writes the result to block 1 of the parity disk.) Then it continues with
>>> the rest of the drives, until it reaches the end of the last drive.
>>>
>>> Disk      A  B  C  D  E  P
>>> Block 1   1  1  1  1  1  1
>>> Block 2   2  2  2        2
>>> Block 3   3  3           3
>>> Block 4   4              4
>>>
>>> The great thing about this method is that when you lose one disk you
>>> can get all your data back. When you lose two disks you only lose the
>>> data on them, and not the whole array. New disks can be added and the
>>> parity recalculated by reading only the new disk and the parity disk.
>>>
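For illustration, here is a rough Python sketch of that construction (purely
illustrative: the block size, the file-based I/O and the function name are my
own assumptions, not what an md implementation would look like):

# Sketch: build a parity image for data disks of different sizes.
# Past the end of a shorter disk its blocks count as zero, so the
# parity disk only needs to be as large as the largest data disk.
BLOCK = 4096  # assumed block size

def build_parity(data_paths, parity_path):
    files = [open(p, "rb") for p in data_paths]
    try:
        with open(parity_path, "wb") as parity:
            while True:
                blocks = [f.read(BLOCK) for f in files]
                if not any(blocks):          # every data disk is exhausted
                    break
                out = bytearray(BLOCK)
                for blk in blocks:           # short or empty reads act as zeros
                    for i, byte in enumerate(blk):
                        out[i] ^= byte
                parity.write(out)
    finally:
        for f in files:
            f.close()

Losing one data disk, the missing blocks are recovered the same way: XOR the
surviving data blocks with the parity block at the same offset.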
>>
>> This has some problems, though:
>>
>> 1) every write is a read-modify-write
>> Well, for one thing this is slow.
>>
> Is that necessary? Why not read every other data disk at the same time
> and calculate new parity blocks on the fly? Granted, that would mean
> spinning up every disk, so maybe this mode could be an option?
Reading one parity block and updating it is faster than reading X data
blocks and recomputing the parity. Both in I/O and CPU terms.
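A small sketch of the difference (hypothetical helper names, only to
illustrate the arithmetic):

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Read-modify-write: two reads (old data block, old parity block) and
# two writes, no matter how many disks are in the set.
def update_parity_rmw(old_parity, old_data, new_data):
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Full recompute: one read per data disk plus one write, and every
# disk has to be spun up and read for it.
def recompute_parity(all_data_blocks):
    parity = bytes(len(all_data_blocks[0]))
    for blk in all_data_blocks:
        parity = xor_blocks(parity, blk)
    return parity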
>> 2) every write is a read-modify-write of the parity disk
>> Even worse, all writes to independent disks bottleneck at the
>> parity disk.
>> 3) every write is a read-modify-write of the parity disk
>> That poor parity disk. It can never catch a break, until it
>> breaks. It is likely that it will break first.
>>
> No problem, a failed parity disk with this method is a much smaller
> problem than a failed disk in a RAID 5.
But that is the reason they went from raid3/4 to raid5. :)
>> 4) if the parity disk is larger than the 2nd largest disk it will
>> waste space
>> 5) data at the start of a disk is more likely to be lost than data at
>> the end of a disk
>> (Say disks A and D fail: then block A1 is lost, but A2-A4 can still be
>> rebuilt from parity)
>>
>> As for adding a new disk there are 2 cases:
>>
>> 1) adding a small disk
>> zero out the new disk and then the parity does not need to be updated
>> 2) adding a large disk
>> zero out the new disk and then that becomes the parity disk
>>
> So the new disk gets a copy of the parity data of the previous parity disk?
No, the old parity disk becomes a data disk that happens to initially
contain the parity of A, B, C, D, E. The new parity disk becomes all
zero.
Look at it this way: XORing disks A, B, C, D, E together gives
P. XORing A, B, C, D, E, P together always gives 0. So by filling the
new parity disk with zero you are computing the parity of A, B, C, D,
E, P. Just more intelligently.
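A tiny sketch of that identity (illustrative only):

import os
from functools import reduce

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 16
data = [os.urandom(BLOCK) for _ in range(5)]   # data disks A..E
parity = reduce(xor_blocks, data)              # the old parity disk P
# XOR of the data disks *and* P is zero at every offset, so an all-zero
# disk is already a valid parity for the set {A, B, C, D, E, P}.
assert reduce(xor_blocks, data + [parity]) == bytes(BLOCK)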
>>> Please consider adding this feature request, it would be a big plus
>>> for linux if such a functionality existed, bringing many users from
>>> WHS and ZFS here, as it especially caters to the needs of people who
>>> store their video and movie collections on a home server.
>>>
>>> Thanks for your time
>>>
>>>
>>> ABCDE for data drives, and P for parity
>>>
>>
>> As a side note I like the idea of not striping, despite the uneven
>> use. For home use the speed of a single disk is usually sufficient, but
>> the noise of concurrent access to multiple disks is bothersome.
> Have you tried the Seagate Barracuda LPs? Totally silent! I have 8 of
> them and I can assure you that they are perfect for large media
> storage in a silent computer.
I buy when I need space and have the money (unfortunately the two don't
always coincide), and I use what I have. But it is interesting to see
how much quieter newer disks are, and I don't believe that is just age
making the old disks louder.
>> Also
>> for movie archives a lot of access will be reading and then the parity
>> disk can rest. Disks can also be spun down more often. Only the disk
>> containing the movie one currently watches needs to be spinning. That
>> could translate into real money saved on the electric bill.
>>
>>
> I agree this is something mainly for home use, where reads exceed
> writes by a large margin, and when writes do happen they go to at most
> one or two disks at a time.
>> But I would still do this with my algorithm to get an even amount of
>> redundancy. One can then use partitions or LVM to split the overall
>> raid device back into separate drives if one wants to.
>>
> Yes, I think that an option for merging the disks into a large one
> would be nice, as long as data is still recoverable from individual
> disks if, for example, 2 disks fail. One of the main advantages of not
> striping is that when things go haywire some data is still
> recoverable, so please let's not lose that.
That is just a matter of placing the partitions/volumes so that they do
not span multiple disks. With partitionable raids one could implement
such a raid mode that would combine all disks into a single raid device
but export each data disk back as a partition. One wouldn't be able to
repartition that, though, so I am not sure I would want that in the
driver layer. Not everyone will care about having each data disk
separate.
MfG
Goswin