linux-raid.vger.kernel.org archive mirror
From: Konstantinos Skarlatos <k.skarlatos@gmail.com>
To: Neil Brown <neilb@suse.de>
Cc: Jon@eHardcastle.com, Goswin von Brederlow <goswin-v-b@web.de>,
	linux-raid@vger.kernel.org
Subject: Re: Full use of varying drive sizes?---maybe a new raid mode is the answer?
Date: Sun, 27 Sep 2009 15:26:35 +0300	[thread overview]
Message-ID: <4ABF59FB.40908@gmail.com> (raw)
In-Reply-To: <19132.24233.852227.120095@notabene.brown>

Neil, thanks for your answer! I appreciate that you took the time to 
look into this.

So, what can people like me - who do not know how to program - do in 
order to make something like this more likely to happen? Create a new 
thread here? Post to forums like avsforums, which are full of people 
who would die to have something like this in Linux? Donate some money 
or equipment? Beg? :-)

FWIW I can be a tester for any code that comes out.


Best regards,
Konstantinos Skarlatos

Neil Brown wrote:
> On Wednesday September 23, k.skarlatos@gmail.com wrote:
>   
>> Instead of doing all those things, I have a suggestion to make:
>>
>> Something that is like RAID 4 without striping.
>>
>> There are already 3 programs doing that - Unraid, Flexraid and disparity - 
>> but putting this functionality into linux-raid would be tremendous. (The 
>> first two work on Linux and the third is a command-line Windows program 
>> that works fine under Wine.)
>>
>> The basic idea is this: take any number of drives, with any capacity and 
>> filesystem you like. Then provide the program with an empty disk at 
>> least as large as your largest disk. The program creates parity data by 
>> XORing the disks together sequentially, block by block (or file by 
>> file), until it reaches the end of the smallest one. (It XORs block 1 of 
>> disk A with block 1 of disk B, with block 1 of disk C..., and writes the 
>> result to block 1 of the parity disk.) Then it continues with the 
>> remaining drives, until it reaches the end of the largest drive.
>>
>> Disk    A    B    C    D    E    P
>> Block   1    1    1    1    1    1
>> Block   2    2    2              2
>> Block   3    3                   3
>> Block   4                        4
>>
>> (A-E for data drives, and P for parity)
>>
>> The great thing about this method is that when you lose one disk you can 
>> get all your data back; when you lose two disks you only lose the data 
>> on them, and not the whole array. New disks can be added and the parity 
>> recalculated by reading only the new disk and the parity disk.
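
To make the scheme concrete, here is a rough user-space sketch of the idea
(purely illustrative: the 4K block size, the plain-file "disks" and all the
function names are assumptions made up for this example, not how md would
implement it):

import os

BLOCK = 4096  # bytes per parity block; arbitrary choice for this sketch

def xor_blocks(blocks):
    """XOR a list of byte blocks together (shorter blocks are zero-padded)."""
    out = bytearray(max(len(b) for b in blocks))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

def build_parity(data_paths, parity_path):
    """Walk the data disks block by block, XORing whichever disks still
    have a block at the current offset; the result goes to the parity
    disk, which must be at least as large as the largest data disk."""
    sizes = [os.path.getsize(p) for p in data_paths]
    disks = [open(p, "rb") for p in data_paths]
    with open(parity_path, "wb") as parity:
        offset = 0
        while offset < max(sizes):
            live = [d.read(BLOCK) for d, s in zip(disks, sizes) if offset < s]
            parity.write(xor_blocks(live))
            offset += BLOCK
    for d in disks:
        d.close()

def recover_block(surviving_blocks, parity_block):
    """A failed disk's block is the XOR of the parity block with the blocks,
    at the same offset, of every surviving disk that is long enough."""
    return xor_blocks(surviving_blocks + [parity_block])

Adding a new data disk only needs the new disk and the parity disk to be
read: for every block of the new disk, XOR it into the existing parity
(P_new = P_old xor D_new), exactly as described above.
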
>>
>> Please consider adding this feature request; it would be a big plus for 
>> Linux if such functionality existed, bringing many users over from WHS 
>> and ZFS, as it especially caters to the needs of people who store video 
>> and their movie collections on a home server.
>>     
>
> This probably wouldn't be too hard.  There would be some awkwardnesses
> though.
> The whole array would be one device, so the 'obvious' way to present
> the separate non-parity drives would be as partitions of that device.
> However you would not then be able to re-partition the device.
> You could use dm to partition the partitions I suppose.
>
> Another awkwardness would be that you would need to record somewhere
> the size of each device so that when a device fails you can synthesise
> a partition/device of the right size.  The current md metadata doesn't
> have anywhere to store that sort of per-device data.  That is clearly
> a solvable problem but finding an elegant solution might be a
> challenge.
>
>
> However this is not something I am likely to work on in the
> foreseeable future.  If someone else would like to have a go I can
> certainly make suggestions and review code.
>
> NeilBrown
>
>
>
>   
>> Thanks for your time
>>
>>
>>
>> Jon Hardcastle wrote:
>>     
>>> --- On Wed, 23/9/09, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>>
>>>> From: Goswin von Brederlow <goswin-v-b@web.de>
>>>> Subject: Re: Full use of varying drive sizes?
>>>> To: Jon@eHardcastle.com
>>>> Cc: linux-raid@vger.kernel.org
>>>> Date: Wednesday, 23 September, 2009, 11:07 AM
>>>> Jon Hardcastle <jd_hardcastle@yahoo.com> writes:
>>>>
>>>>> Hey guys,
>>>>>
>>>>> I have an array made of many drive sizes ranging from 500GB to 1TB,
>>>>> and I appreciate that the array can only be a multiple of the
>>>>> smallest - I use the differing sizes as I just buy the best value
>>>>> drive at the time and hope that as I phase out the old drives I can
>>>>> '--grow' the array. That is all fine and dandy.
>>>>>
>>>>> But could someone tell me, did I dream that there might one day be
>>>>> support to allow you to actually use that unused space in the array?
>>>>> Because that would be awesome! (If a little hairy re: spare drives -
>>>>> they'd have to be at least the size of the largest drive in the
>>>>> array..?) I have 3x500GB, 2x750GB and 1x1TB, so I have 1TB of
>>>>> completely unused space!
>>>>>
>>>>> Cheers.
>>>>>
>>>>> Jon H
>>>>
>>>> I face the same problem, as I buy new disks whenever I need more
>>>> space and have the money.
>>>>
>>>> I found a rather simple way to organize disks of different sizes
>>>> into a set of software raids that gives the maximum size. The
>>>> reasoning for this algorithm is as follows:
>>>>
>>>> 1) Two partitions of the same disk must never be in the same raid
>>>>    set.
>>>>
>>>> 2) Put as many disks as possible in each raid set, to minimize the
>>>>    loss to parity.
>>>>
>>>> 3) The number of disks in each raid set should be equal, to give a
>>>>    uniform amount of redundancy (same safety for all data). Worst
>>>>    (and usual) case will be a difference of 1 disk.
>>>>
>>>>
>>>> So here is the algorithm:
>>>>
>>>> 1) Draw a box as wide as the largest disk, open ended towards the
>>>>    bottom.
>>>>
>>>> 2) Draw in each disk, in order of size, one right after the other.
>>>>    When you hit the right side of the box, continue on the next line.
>>>>
>>>> 3) Go through the box left to right and draw a vertical line every
>>>>    time one disk ends and another starts.
>>>>
>>>> 4) Each sub-box created this way represents one raid, using the disks
>>>>    drawn into it in the respective sizes present in the box.
>>>>
>>>> In your case you have 6 disks: A (1TB), BC (750G), DEF (500G)
>>>>
>>>> +----------+-----+-----+
>>>> |AAAAAAAAAA|AAAAA|AAAAA|
>>>> |BBBBBBBBBB|BBBBB|CCCCC|
>>>> |CCCCCCCCCC|DDDDD|DDDDD|
>>>> |EEEEEEEEEE|FFFFF|FFFFF|
>>>> |  md0     | md1 | md2 |
>>>>
>>>> For raid5 this would give you:
>>>>
>>>> md0: sda1, sdb1, sdc1, sde1 (500G)  -> 1500G
>>>> md1: sda2, sdb2, sdd1, sdf1 (250G)  ->  750G
>>>> md2: sda3, sdc2, sdd2, sdf2 (250G)  ->  750G
>>>>                                        -----
>>>>                                        3000G total
>>>>
>>>> As spare you would probably want to always use the largest disk,
>>>> as only then is it completely unused and can power down.
>>>>
>>>> Note that in your case the fit is perfect, with all raids having 4
>>>> disks. This is not always the case; worst case there is a
>>>> difference of 1 between raids, though.
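
To make the packing step concrete, here is a rough sketch of the layout
algorithm described above (sizes in GB, RAID5 assumed throughout; the
function and variable names are invented for the example):

def plan_arrays(disks):
    """disks: dict of name -> size.  Lay the disks out, largest first, in
    rows as wide as the largest disk, wrapping at the right edge; then cut
    the box at every point where some disk ends.  Each resulting column is
    one raid set, with partition size equal to the column width."""
    width = max(disks.values())
    cuts = {0, width}
    segments = []                       # (start, end, disk) within a row
    pos = 0
    for name, size in sorted(disks.items(), key=lambda kv: -kv[1]):
        left = size
        while left > 0:
            take = min(width - pos, left)
            segments.append((pos, pos + take, name))
            cuts.add(pos + take)
            left -= take
            pos = (pos + take) % width
    edges = sorted(cuts)
    return [(hi - lo, [n for s, e, n in segments if s <= lo and hi <= e])
            for lo, hi in zip(edges, edges[1:])]

disks = {"A": 1000, "B": 750, "C": 750, "D": 500, "E": 500, "F": 500}
for i, (part, members) in enumerate(plan_arrays(disks)):
    usable = part * (len(members) - 1)      # raid5 loses one member to parity
    print("md%d: %4dG x %s -> %dG" % (i, part, sorted(members), usable))

Run on the example disks it reproduces the 500G/250G/250G columns and the
3000G total shown above.
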
>>>>
>>>>
>>>>
>>>> As a side note: resizing when you get new disks might become
>>>> tricky and involve shuffling around a lot of data. You might want
>>>> to split md0 into 2 raids with 250G partitions each, assuming
>>>> future disks will continue to be multiples of 250G.
>>>>
>>>> MfG
>>>>         Goswin
>>>>
>>> Yes,
>>>
>>> This is a great system. I did think about this when I first created my array but I was young and lacked the confidence to do much...
>>>
>>> So assuming I then purchased a 1.5TB drive, the diagram would change to
>>>
>>> 7 disks: A (1TB), BC (750G), DEF (500G), G (1.5TB)
>>>
>>> i) So I'd partition the new drive up into 250GB chunks and add a chunk to each of md0-md3
>>>
>>> +-----+-----+-----+-----+-----+-----+
>>> |GGGGG|GGGGG|GGGGG|GGGGG|GGGGG|GGGGG|
>>> |AAAAA|AAAAA|AAAAA|AAAAA|     |     |
>>> |BBBBB|BBBBB|BBBBB|CCCCC|     |     |
>>> |CCCCC|CCCCC|DDDDD|DDDDD|     |     |
>>> |EEEEE|EEEEE|FFFFF|FFFFF|     |     |
>>> |  md0| md1 | md2 | md3 | md4 | md5 |
>>>
>>>
>>> ii) Then I guess I'd have to relieve the E's from md0 and md1 (which I can do by failing those partitions?), giving the layout below - this would then kick in the use of the newly added G's?
>>>
>>> +-----+-----+-----+-----+-----+-----+
>>> |GGGGG|GGGGG|GGGGG|GGGGG|GGGGG|GGGGG|
>>> |AAAAA|AAAAA|AAAAA|AAAAA|EEEEE|EEEEE|
>>> |BBBBB|BBBBB|BBBBB|CCCCC|FFFFF|FFFFF|
>>> |CCCCC|CCCCC|DDDDD|DDDDD|     |     |
>>> |XXXXX|XXXXX|XXXXX|XXXXX|     |     |
>>> |  md0| md1 | md2 | md3 | md4 | md5 |
>>>
>>> iii) Repeat for the F's, which would again trigger a rebuild using the G's.
>>>
>>> The end result is 6 arrays (md0-md3 with 4 members each, md4 and md5 with 3), i.e.
>>>
>>>    +--1--+--2--+--3--+--4--+--5--+--6--+
>>> sda|GGGGG|GGGGG|GGGGG|GGGGG|GGGGG|GGGGG|
>>> sdb|AAAAA|AAAAA|AAAAA|AAAAA|EEEEE|EEEEE|
>>> sdc|BBBBB|BBBBB|BBBBB|CCCCC|FFFFF|FFFFF|
>>> sdd|CCCCC|CCCCC|DDDDD|DDDDD|     |     |
>>> sde|  md0| md1 | md2 | md3 | md4 | md5 |
>>>
>>>
>>> md0: sda1, sdb1, sdc1, sdd1 (250G)  -> 750G
>>> md1: sda2, sdb2, sdc2, sdd2 (250G)  -> 750G
>>> md2: sda3, sdb3, sdc3, sdd3 (250G)  -> 750G
>>> md3: sda4, sdb4, sdc4, sdd4 (250G)  -> 750G
>>> md4: sda5, sdb5, sdc5               -> 500G
>>> md5: sda6, sdb6, sdc6               -> 500G
>>>
>>> Total                               -> 4000G
>>>
>>> I can't do the maths though, as my head hurts too much, but is this quite wasteful, with so many RAID 5 arrays each burning 1x250GB?
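
A quick back-of-the-envelope check (sizes in GB, numbers taken from the
listing above; purely illustrative):

drives = {"G": 1500, "A": 1000, "B": 750, "C": 750, "D": 500, "E": 500, "F": 500}
raw = sum(drives.values())   # 5500G of raw capacity
overhead = 6 * 250           # six RAID5 arrays, each giving up one 250G member
print(raw, raw - overhead)   # prints: 5500 4000 -> matches the 4000G total above

So the six small arrays cost 1500G of parity out of 5500G raw, which in this
particular example happens to equal dedicating the largest (1.5TB) drive
entirely to parity in the single-parity-disk scheme discussed earlier.
Feeding the same seven sizes back into the packing sketch earlier in the
thread gives (if I ran it right) a four-array layout with 250G, 250G, 500G
and 500G columns and the same 4000G total.
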
>>>
>>> Finally... I DID find a reference...
>>>
>>> check out: http://neil.brown.name/blog/20090817000931
>>>
>>> '
>>> ...
>>> It would also be nice to teach RAID5 to handle arrays with devices of different sizes. There are some complications there as you could have a hot spare that can replace some devices but not all. 
>>> ...
>>> '
>>>
>>>
>>> -----------------------
>>> N: Jon Hardcastle
>>> E: Jon@eHardcastle.com
>>> 'Do not worry about tomorrow, for tomorrow will bring worries of its own.'
>>> -----------------------
>>>
>>>
>>>
>>>       
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>   
>>>       


Thread overview: 22+ messages
2009-09-22 11:24 Full use of varying drive sizes? Jon Hardcastle
2009-09-22 11:52 ` Kristleifur Daðason
2009-09-22 12:58   ` John Robinson
2009-09-22 13:07     ` Majed B.
2009-09-22 15:38       ` Jon Hardcastle
2009-09-22 15:47         ` Majed B.
2009-09-22 15:48         ` Ryan Wagoner
2009-09-22 16:04         ` Robin Hill
2009-09-23  8:20       ` John Robinson
2009-09-23 10:15       ` Tapani Tarvainen
2009-09-23 12:42         ` Goswin von Brederlow
2009-09-22 13:05 ` Tapani Tarvainen
2009-09-23 10:07 ` Goswin von Brederlow
2009-09-23 14:57   ` Jon Hardcastle
2009-09-23 20:28     ` Full use of varying drive sizes?---maybe a new raid mode is the answer? Konstantinos Skarlatos
2009-09-23 21:29       ` Chris Green
2009-09-24 17:23       ` John Robinson
2009-09-25  6:09       ` Neil Brown
2009-09-27 12:26         ` Konstantinos Skarlatos [this message]
2009-09-28 10:53       ` Goswin von Brederlow
2009-09-28 14:10         ` Konstantinos Skarlatos
2009-10-05  9:06           ` Goswin von Brederlow
