linux-raid.vger.kernel.org archive mirror
* Proposal: non-striping RAID4
@ 2007-11-10  0:57 James Lee
  2007-11-12  1:29 ` Bill Davidsen
  0 siblings, 1 reply; 17+ messages in thread
From: James Lee @ 2007-11-10  0:57 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3566 bytes --]

I have come across an unusual RAID configuration type which differs
from any of the standard RAID 0/1/4/5 levels currently available in
the md driver, and has a couple of very useful properties (see below).
 I think it would be useful to have this code included in the main
kernel, as it allows for some use cases that aren't well catered for
with the standard RAID levels.  I was wondering what people's thoughts
on this might be?

The RAID type has been named "unRAID" by its author, and is basically
similar to RAID 4 but without data being striped across the drives in
the array.  In an n-drive array (where the drives need not have the
same capacity), n-1 of the drives appear as independent drives with
data written to them as with a single standalone drive, and the 1
remaining drive is a parity drive (this must be the largest capacity
drive), which stores the bitwise XOR of the data on the other n-1
drives (where the data being XORed is taken to be 0 if we're past the
end of that particular drive).  Data recovery then works as per normal
RAID 4/5 in the case of the failure of any one of the drives in the
array.
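
As a concrete example of that zero-padding: at an offset that lies within
the two larger data drives but past the end of a smaller one, data bytes
0x0F and 0x33 give a parity byte of 0x0F ^ 0x33 ^ 0x00 = 0x3C; if the
drive holding 0x33 later fails, its byte is recovered as
0x3C ^ 0x0F ^ 0x00 = 0x33.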

The advantages of this are:
- Drives need not be of the same size as each other; the only
requirement is that the parity drive must be the largest drive in the
array.  The available space of the array is the sum of the space of
all drives in the array, minus the size of the largest drive.
- Data protection is slightly better than with RAID 4/5 in that in the
event of multiple drive failures, only some data is lost (since the
data on any non-failed, non-parity drives is usable).

The disadvantages are:
- Performance:
    - As there is no striping, on a non-degraded array the read
performance will be identical to that of a single drive setup, and the
write performance will be comparable or somewhat worse than that of a
single-drive setup.
    - On a degraded array with many drives the read and write
performance could take further hits due to the PCI / PCI-E bus getting
saturated.

The company which has implemented this is "Lime technology" (website
here: http://www.lime-technology.com/); an overview of the technical
detail is given on their website here:
http://www.lime-technology.com/wordpress/?page_id=13.  The changes
made to the Linux md driver to support this have been released under
the GPL by the author - I've attached these to this email.

Now I'm guessing that the reason this hasn't been implemented before
is that in most cases the points above mean that this is a worse
option than RAID 5, however there is a strong use case for this
system.  For many home users who want data redundancy, the current
RAID levels are impractical because the user has many hard drives of
different sizes accumulated over the years.  Even for new setups, it
is generally not cost-effective to buy multiple identical sized hard
drives, compared with incrementally adding storage of the capacity
which is at the best price per GB at the time.  The fact that there is
a need for this type of flexibility is evidenced, for example, by
various forum threads such as this thread containing over
1500 posts in a specialized audio / video forum:
http://www.avsforum.com/avs-vb/showthread.php?t=573986, as well as the
active community in the forums on the Lime technology website.

Would there be interest in making this kind of addition to the md code?

PS: In case it wasn't clear, the attached code is simply the code the
author has released under GPL - it's intended just for reference, not
as proposed code for review.

[-- Attachment #2: unraid_code.zip --]
[-- Type: application/zip, Size: 31434 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2007-11-10  0:57 James Lee
@ 2007-11-12  1:29 ` Bill Davidsen
  2007-11-13 23:48   ` James Lee
  0 siblings, 1 reply; 17+ messages in thread
From: Bill Davidsen @ 2007-11-12  1:29 UTC (permalink / raw)
  To: James Lee; +Cc: linux-raid

James Lee wrote:
> I have come across an unusual RAID configuration type which differs
> from any of the standard RAID 0/1/4/5 levels currently available in
> the md driver, and has a couple of very useful properties (see below).
>  I think it would be useful to have this code included in the main
> kernel, as it allows for some use cases that aren't well catered for
> with the standard RAID levels.  I was wondering what people's thoughts
> on this might be?
>
> The RAID type has been named "unRAID" by its author, and is basically
> similar to RAID 4 but without data being striped across the drives in
> the array.  In an n-drive array (where the drives need not have the
> same capacity), n-1 of the drives appear as independent drives with
> data written to them as with a single standalone drive, and the 1
> remaining drive is a parity drive (this must be the largest capacity
> drive), which stores the bitwise XOR of the data on the other n-1
> drives (where the data being XORed is taken to be 0 if we're past the
> end of that particular drive).  Data recovery then works as per normal
> RAID 4/5 in the case of the failure of any one of the drives in the
> array.
>
> The advantages of this are:
> - Drives need not be of the same size as each other; the only
> requirement is that the parity drive must be the largest drive in the
> array.  The available space of the array is the sum of the space of
> all drives in the array, minus the size of the largest drive.
> - Data protection is slightly better than with RAID 4/5 in that in the
> event of multiple drive failures, only some data is lost (since the
> data on any non-failed, non-parity drives is usable).
>
> The disadvantages are:
> - Performance:
>     - As there is no striping, on a non-degraded array the read
> performance will be identical to that of a single drive setup, and the
> write performance will be comparable or somewhat worse than that of a
> single-drive setup.
>     - On a degraded array with many drives the read and write
> performance could take further hits due to the PCI / PCI-E bus getting
> saturated.
>   

I personally feel that "this still looks like a bunch of little drives" 
should be listed first...
> The company which has implemented this is "Lime technology" (website
> here: http://www.lime-technology.com/); an overview of the technical
> detail is given on their website here:
> http://www.lime-technology.com/wordpress/?page_id=13.  The changes
> made to the Linux md driver to support this have been released under
> the GPL by the author - I've attached these to this email.
>
> Now I'm guessing that the reason this hasn't been implemented before
> is that in most cases the points above mean that this is a worse
> option than RAID 5, however there is a strong use case for this
> system.  For many home users who want data redundancy, the current
> RAID levels are impractical because the user has many hard drives of
> different sizes accumulated over the years.  Even for new setups, it
>   

And over the years is just the problem. You have a bunch of tiny drives 
unsuited, or marginally suited, for the size of modern distributions, 
and using assorted old technology. There's a reason these are thrown out 
and available, they're pretty much useless. Also they're power-hungry, 
slow, and you would probably need more of these than would fit in a 
standard case to provide even minimal useful size. They're also probably 
PATA, meaning that many modern motherboards don't support them well (if 
at all).
> is generally not cost-effective to buy multiple identical sized hard
> drives, compared with incrementally adding storage of the capacity
> which is at the best price per GB at the time.  The fact that there is
> a need for this type of flexibility is evidenced, for example, by
> various forum threads such as this thread containing over
> 1500 posts in a specialized audio / video forum:
> http://www.avsforum.com/avs-vb/showthread.php?t=573986, as well as the
> active community in the forums on the Lime technology website.
>   

I can buy 500GB USB drives for $98+tax if I wait until Staples or Office 
Max have a sale, $120 anytime, anywhere. I see 250GB PATA drives being 
flogged for $50-70 for lack of demand. I simply can't imagine any case 
where this would be useful other than as a proof of concept.

Note: you can do this with existing code by setting up partitions of 
various sizes on multiple drives: the first partition the size of the 
smallest drive, the next the remaining space on the next drive, etc. On 
every set of >2 drives make a raid-5, on every set of two drives make 
raid-10, and have a bunch of smaller redundant drives which are faster. 
You can then combine them all into one linear array if it pleases you. I 
have a crate of drives from 360MB (six) to about 4GB, and they are going 
to be sold by the pound because they are garbage.
> Would there be interest in making this kind of addition to the md code?
>   

I can't see that the cost of maintaining it is justified by the benefit, 
but not my decision. If you were to set up such a thing using FUSE, 
keeping it out of the kernel but still providing the functionality, it 
might be worth doing. On the other hand, setting up the partitions and 
creating the arrays could probably be done by a perl script which would 
take only a few hours to write.
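
As a rough sketch of what such a script might issue - device names and
sizes below are purely illustrative, with three drives of 250GB, 400GB
and 500GB partitioned so that partition 1 is 250GB on every drive,
partition 2 is the next 150GB on the two larger drives, and partition 3
is the last 100GB of the largest drive (a plain mirror stands in for the
two-drive set):

  mdadm --create /dev/md1 --level=5 --raid-devices=3 \
        /dev/sda1 /dev/sdb1 /dev/sdc1        # 500GB redundant
  mdadm --create /dev/md2 --level=1 --raid-devices=2 \
        /dev/sdb2 /dev/sdc2                  # 150GB redundant
  mdadm --create /dev/md3 --level=linear --raid-devices=2 \
        /dev/md1 /dev/md2                    # 650GB as one device

The remaining 100GB on the largest drive (/dev/sdc3) is left over and
non-redundant, which matches the sum-minus-largest capacity rule.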
> PS: In case it wasn't clear, the attached code is simply the code the
> author has released under GPL - it's intended just for reference, not
> as proposed code for review.
>   
Much as I generally like adding functionality, I *really* can't see much 
in this idea. It seems to me to be in the "clever but not useful" category.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2007-11-12  1:29 ` Bill Davidsen
@ 2007-11-13 23:48   ` James Lee
  2007-11-14  1:06     ` James Lee
  0 siblings, 1 reply; 17+ messages in thread
From: James Lee @ 2007-11-13 23:48 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid

Thanks for the reply Bill, and on reflection I agree with a lot of it.

I do feel that the use case is a sensible valid one - though maybe I
didn't illustrate it well.  As an example suppose I want to, starting
from scratch, build up a cost-effective large redundant array to hold
data.

With standard RAID5 I might do as follows:
- Buy 3x 500GB SATA drives, setup as a single RAID5 array.
- Once I have run out of space on this array (say in 12 months, for
example), add another 1x500GB drive and expand the array.
- Another 6 months or so later, buy another 1x 500GB drive and expand
the array, etc.
This isn't that cost-efficient, as by the second or third iteration,
500GB drives are not good value per-GB (as the sweet-spot has moved
onto 1TB drives say).

With a scheme which gives a redundant array with the capacity being
the sum of the size of all drives minus the size of the largest drive,
the sequence can be something like:
- Buy 3x500GB drives.
- Once out of space, add a drive with size determined by current best
price/GB (eg. 1x750GB drive).
- Repeat as above (giving, say, 1TB, 1.5TB, ... drives).
(- When adding larger drives, potentially also start removing the
smallest drives from the array and selling them - to avoid having too
many drives.)

However, what I do agree with is that this is entirely achievable using
current RAID5 and RAID1, as you described (and ideally then creating a
linear array out of the resulting arrays).  All it would require, as you
say, is either a simple wrapper script issuing mdadm commands, or
ideally for this ability to be added to mdadm itself.  The create
command for this new "raid type" would just create all
the RAID5 and RAID1 arrays, and use them to make a linear array.  The
grow command (when adding a new drive to the array) would partition it
up, expand each of the RAID5 arrays onto it, convert the existing
RAID1 array to a RAID5 array using the new drive, create a new RAID1
array, and expand the linear array containing them all.  The only
thing I'm not entirely sure about is whether mdadm currently supports
online conversion of 2-drive RAID1 array --> 3-drive RAID5 array?
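
A rough sketch of that grow step, with hypothetical device names (the
RAID1 --> RAID5 conversion is exactly the open question above, and the
last command assumes mdadm/kernel support for adding a member to a
linear array):

  # new drive sdd, partitioned to match the existing layout
  mdadm /dev/md1 --add /dev/sdd1
  mdadm --grow /dev/md1 --raid-devices=4   # reshape this RAID5 onto it
  # ...repeat for each RAID5 layer, convert the old RAID1 if possible,
  # create a new RAID1 (say /dev/md5) from the two largest leftover
  # partitions, then extend the linear array with it:
  mdadm --grow /dev/md9 --add /dev/md5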

So thanks for the input, and I'll now ask a slightly different
question to my original one - would there be any interest in enhancing
mdadm to do the above?  By which I mean would patches which did this
be considered, or would this be deemed not to be useful / desirable?

Thanks,
James

On 12/11/2007, Bill Davidsen <davidsen@tmr.com> wrote:
> James Lee wrote:
> > I have come across an unusual RAID configuration type which differs
> > from any of the standard RAID 0/1/4/5 levels currently available in
> > the md driver, and has a couple of very useful properties (see below).
> >  I think it would be useful to have this code included in the main
> > kernel, as it allows for some use cases that aren't well catered for
> > with the standard RAID levels.  I was wondering what people's thoughts
> > on this might be?
> >
> > The RAID type has been named "unRAID" by its author, and is basically
> > similar to RAID 4 but without data being striped across the drives in
> > the array.  In an n-drive array (where the drives need not have the
> > same capacity), n-1 of the drives appear as independent drives with
> > data written to them as with a single standalone drive, and the 1
> > remaining drive is a parity drive (this must be the largest capacity
> > drive), which stores the bitwise XOR of the data on the other n-1
> > drives (where the data being XORed is taken to be 0 if we're past the
> > end of that particular drive).  Data recovery then works as per normal
> > RAID 4/5 in the case of the failure of any one of the drives in the
> > array.
> >
> > The advantages of this are:
> > - Drives need not be of the same size as each other; the only
> > requirement is that the parity drive must be the largest drive in the
> > array.  The available space of the array is the sum of the space of
> > all drives in the array, minus the size of the largest drive.
> > - Data protection is slightly better than with RAID 4/5 in that in the
> > event of multiple drive failures, only some data is lost (since the
> > data on any non-failed, non-parity drives is usable).
> >
> > The disadvantages are:
> > - Performance:
> >     - As there is no striping, on a non-degraded array the read
> > performance will be identical to that of a single drive setup, and the
> > write performance will be comparable or somewhat worse than that of a
> > single-drive setup.
> >     - On a degraded array with many drives the read and write
> > performance could take further hits due to the PCI / PCI-E bus getting
> > saturated.
> >
>
> I personally feel that "this still looks like a bunch of little drives"
> should be listed first...
> > The company which has implemented this is "Lime technology" (website
> > here: http://www.lime-technology.com/); an overview of the technical
> > detail is given on their website here:
> > http://www.lime-technology.com/wordpress/?page_id=13.  The changes
> > made to the Linux md driver to support this have been released under
> > the GPL by the author - I've attached these to this email.
> >
> > Now I'm guessing that the reason this hasn't been implemented before
> > is that in most cases the points above mean that this is a worse
> > option than RAID 5, however there is a strong use case for this
> > system.  For many home users who want data redundancy, the current
> > RAID levels are impractical because the user has many hard drives of
> > different sizes accumulated over the years.  Even for new setups, it
> >
>
> And over the years is just the problem. You have a bunch of tiny drives
> unsuited, or marginally suited, for the size of modern distributions,
> and using assorted old technology. There's a reason these are thrown out
> and available, they're pretty much useless. Also they're power-hungry,
> slow, and you would probably need more of these than would fit in a
> standard case to provide even minimal useful size. They're also probably
> PATA, meaning that many modern motherboards don't support them well (if
> at all).
> > is generally not cost-effective to buy multiple identical sized hard
> > drives, compared with incrementally adding storage of the capacity
> > which is at the best price per GB at the time.  The fact that there is
> > a need for this type of flexibility is evidenced, for example, by
> > various forum threads such as this thread containing over
> > 1500 posts in a specialized audio / video forum:
> > http://www.avsforum.com/avs-vb/showthread.php?t=573986, as well as the
> > active community in the forums on the Lime technology website.
> >
>
> I can buy 500GB USB drives for $98+tax if I wait until Staples or Office
> Max have a sale, $120 anytime, anywhere. I see 250GB PATA drives being
> flogged for $50-70 for lack of demand. I simply can't imagine any case
> where this would be useful other than as a proof of concept.
>
> Note: you can do this with existing code by setting up partitions of
> various sizes on multiple drives: the first partition the size of the
> smallest drive, the next the remaining space on the next drive, etc. On
> every set of >2 drives make a raid-5, on every set of two drives make
> raid-10, and have a bunch of smaller redundant drives which are faster.
> You can then combine them all into one linear array if it pleases you. I
> have a crate of drives from 360MB (six) to about 4GB, and they are going
> to be sold by the pound because they are garbage.
> > Would there be interest in making this kind of addition to the md code?
> >
>
> I can't see that the cost of maintaining it is justified by the benefit,
> but not my decision. If you were to set up such a thing using FUSE,
> keeping it out of the kernel but still providing the functionality, it
> might be worth doing. On the other hand, setting up the partitions and
> creating the arrays could probably be done by a perl script which would
> take only a few hours to write.
> > PS: In case it wasn't clear, the attached code is simply the code the
> > author has released under GPL - it's intended just for reference, not
> > as proposed code for review.
> >
> Much as I generally like adding functionality, I *really* can't see much
> in this idea. It seems to me to be in the "clever but not useful" category.
>
> --
> bill davidsen <davidsen@tmr.com>
>   CTO TMR Associates, Inc
>   Doing interesting things with small computers since 1979
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2007-11-13 23:48   ` James Lee
@ 2007-11-14  1:06     ` James Lee
  2007-11-14 23:16       ` Bill Davidsen
  0 siblings, 1 reply; 17+ messages in thread
From: James Lee @ 2007-11-14  1:06 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid

From a quick search through this mailing list, it looks like I can
answer my own question regarding RAID1 --> RAID5 conversion.  Instead
of creating a RAID1 array for the partitions on the two biggest
drives, it should just create a 2-drive RAID5 (which is identical, but
can be expanded as with any other RAID5 array).
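
For example (device names hypothetical):

  mdadm --create /dev/md2 --level=5 --raid-devices=2 /dev/sdb2 /dev/sdc2

A two-drive RAID5 keeps two copies of every block (the parity of a
single data block is the block itself), so it behaves like a mirror but
can later be reshaped with --grow --raid-devices like any other RAID5.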

So it looks like this should work I guess.

On 13/11/2007, James Lee <james.lee@cantab.net> wrote:
> Thanks for the reply Bill, and on reflection I agree with a lot of it.
>
> I do feel that the use case is a sensible valid one - though maybe I
> didn't illustrate it well.  As an example suppose I want to, starting
> from scratch, build up a cost-effective large redundant array to hold
> data.
>
> With standard RAID5 I might do as follows:
> - Buy 3x 500GB SATA drives, setup as a single RAID5 array.
> - Once I have run out of space on this array (say in 12 months, for
> example), add another 1x500GB drive and expand the array.
> - Another 6 months or so later, buy another 1x 500GB drive and expand
> the array, etc.
> This isn't that cost-efficient, as by the second or third iteration,
> 500GB drives are not good value per-GB (as the sweet-spot has moved
> onto 1TB drives say).
>
> With a scheme which gives a redundant array with the capacity being
> the sum of the size of all drives minus the size of the largest drive,
> the sequence can be something like:
> - Buy 3x500GB drives.
> - Once out of space, add a drive with size determined by current best
> price/GB (eg. 1x750GB drive).
> - Repeat as above (giving, say, 1TB, 1.5TB, ... drives).
> (- When adding larger drives, potentially also start removing the
> smallest drives from the array and selling them - to avoid having too
> many drives.)
>
> However, what I do agree with is that this is entirely achievable using
> current RAID5 and RAID1, as you described (and ideally then creating a
> linear array out of the resulting arrays).  All it would require, as you
> say, is either a simple wrapper script issuing mdadm commands, or
> ideally for this ability to be added to mdadm itself.  The create
> command for this new "raid type" would just create all
> the RAID5 and RAID1 arrays, and use them to make a linear array.  The
> grow command (when adding a new drive to the array) would partition it
> up, expand each of the RAID5 arrays onto it, convert the existing
> RAID1 array to a RAID5 array using the new drive, create a new RAID1
> array, and expand the linear array containing them all.  The only
> thing I'm not entirely sure about is whether mdadm currently supports
> online conversion of 2-drive RAID1 array --> 3-drive RAID5 array?
>
> So thanks for the input, and I'll now ask a slightly different
> question to my original one - would there be any interest in enhancing
> mdadm to do the above?  By which I mean would patches which did this
> be considered, or would this be deemed not to be useful / desirable?
>
> Thanks,
> James
>
> On 12/11/2007, Bill Davidsen <davidsen@tmr.com> wrote:
> > James Lee wrote:
> > > I have come across an unusual RAID configuration type which differs
> > > from any of the standard RAID 0/1/4/5 levels currently available in
> > > the md driver, and has a couple of very useful properties (see below).
> > >  I think it would be useful to have this code included in the main
> > > kernel, as it allows for some use cases that aren't well catered for
> > > with the standard RAID levels.  I was wondering what people's thoughts
> > > on this might be?
> > >
> > > The RAID type has been named "unRAID" by its author, and is basically
> > > similar to RAID 4 but without data being striped across the drives in
> > > the array.  In an n-drive array (where the drives need not have the
> > > same capacity), n-1 of the drives appear as independent drives with
> > > data written to them as with a single standalone drive, and the 1
> > > remaining drive is a parity drive (this must be the largest capacity
> > > drive), which stores the bitwise XOR of the data on the other n-1
> > > drives (where the data being XORed is taken to be 0 if we're past the
> > > end of that particular drive).  Data recovery then works as per normal
> > > RAID 4/5 in the case of the failure of any one of the drives in the
> > > array.
> > >
> > > The advantages of this are:
> > > - Drives need not be of the same size as each other; the only
> > > requirement is that the parity drive must be the largest drive in the
> > > array.  The available space of the array is the sum of the space of
> > > all drives in the array, minus the size of the largest drive.
> > > - Data protection is slightly better than with RAID 4/5 in that in the
> > > event of multiple drive failures, only some data is lost (since the
> > > data on any non-failed, non-parity drives is usable).
> > >
> > > The disadvantages are:
> > > - Performance:
> > >     - As there is no striping, on a non-degraded array the read
> > > performance will be identical to that of a single drive setup, and the
> > > write performance will be comparable or somewhat worse than that of a
> > > single-drive setup.
> > >     - On a degraded array with many drives the read and write
> > > performance could take further hits due to the PCI / PCI-E bus getting
> > > saturated.
> > >
> >
> > I personally feel that "this still looks like a bunch of little drives"
> > should be listed first...
> > > The company which has implemented this is "Lime technology" (website
> > > here: http://www.lime-technology.com/); an overview of the technical
> > > detail is given on their website here:
> > > http://www.lime-technology.com/wordpress/?page_id=13.  The changes
> > > made to the Linux md driver to support this have been released under
> > > the GPL by the author - I've attached these to this email.
> > >
> > > Now I'm guessing that the reason this hasn't been implemented before
> > > is that in most cases the points above mean that this is a worse
> > > option than RAID 5, however there is a strong use case for this
> > > system.  For many home users who want data redundancy, the current
> > > RAID levels are impractical because the user has many hard drives of
> > > different sizes accumulated over the years.  Even for new setups, it
> > >
> >
> > And over the years is just the problem. You have a bunch of tiny drives
> > unsuited, or marginally suited, for the size of modern distributions,
> > and using assorted old technology. There's a reason these are thrown out
> > and available, they're pretty much useless. Also they're power-hungry,
> > slow, and you would probably need more of these than would fit in a
> > standard case to provide even minimal useful size. They're also probably
> > PATA, meaning that many modern motherboards don't support them well (if
> > at all).
> > > is generally not cost-effective to buy multiple identical sized hard
> > > drives, compared with incrementally adding storage of the capacity
> > > which is at the best price per GB at the time.  The fact that there is
> > > a need for this type of flexibility is evidenced, for example, by
> > > various forum threads such as this thread containing over
> > > 1500 posts in a specialized audio / video forum:
> > > http://www.avsforum.com/avs-vb/showthread.php?t=573986, as well as the
> > > active community in the forums on the Lime technology website.
> > >
> >
> > I can buy 500GB USB drives for $98+tax if I wait until Staples or Office
> > Max have a sale, $120 anytime, anywhere. I see 250GB PATA drives being
> > flogged for $50-70 for lack of demand. I simply can't imagine any case
> > where this would be useful other than as a proof of concept.
> >
> > Note: you can do this with existing code by setting up partitions of
> > various sizes on multiple drives: the first partition the size of the
> > smallest drive, the next the remaining space on the next drive, etc. On
> > every set of >2 drives make a raid-5, on every set of two drives make
> > raid-10, and have a bunch of smaller redundant drives which are faster.
> > You can then combine them all into one linear array if it pleases you. I
> > have a crate of drives from 360MB (six) to about 4GB, and they are going
> > to be sold by the pound because they are garbage.
> > > Would there be interest in making this kind of addition to the md code?
> > >
> >
> > I can't see that the cost of maintaining it is justified by the benefit,
> > but not my decision. If you were to set up such a thing using FUSE,
> > keeping it out of the kernel but still providing the functionality, it
> > might be worth doing. On the other hand, setting up the partitions and
> > creating the arrays could probably be done by a perl script which would
> > take only a few hours to write.
> > > PS: In case it wasn't clear, the attached code is simply the code the
> > > author has released under GPL - it's intended just for reference, not
> > > as proposed code for review.
> > >
> > Much as I generally like adding functionality, I *really* can't see much
> > in this idea. It seems to me to be in the "clever but not useful" category.
> >
> > --
> > bill davidsen <davidsen@tmr.com>
> >   CTO TMR Associates, Inc
> >   Doing interesting things with small computers since 1979
> >
> >
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2007-11-14  1:06     ` James Lee
@ 2007-11-14 23:16       ` Bill Davidsen
  2007-11-15  0:24         ` James Lee
  0 siblings, 1 reply; 17+ messages in thread
From: Bill Davidsen @ 2007-11-14 23:16 UTC (permalink / raw)
  To: James Lee; +Cc: linux-raid

James Lee wrote:
> From a quick search through this mailing list, it looks like I can
> answer my own question regarding RAID1 --> RAID5 conversion.  Instead
> of creating a RAID1 array for the partitions on the two biggest
> drives, it should just create a 2-drive RAID5 (which is identical, but
> can be expanded as with any other RAID5 array).
>
> So it looks like this should work I guess.

I believe what you want to create might be a three drive raid-5 with one 
failed drive. That way you can just add a drive when you want.

  mdadm -C -c32 -l5 -n3 -amd /dev/md7 /dev/loop[12] missing

Then you can add another drive:

  mdadm --add /dev/md7 /dev/loop3

The output is at the end of this message.

But in general I think it would be really great to be able to have a 
format which would do raid-5 or raid-6 over all the available parts of 
multiple drives, and since there's some similar logic for raid-10 over a 
selection of drives it is clearly possible. But in terms of the benefit 
to be gained, unless it falls out of the code and someone feels the 
desire to do it, I can't see much joy in ever having such a thing.

The feature I would really like to have is raid5e, a distributed spare 
so that head motion is spread over all drives. I don't have time to look 
at that one either, but it really helps performance under load with 
small arrays.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2007-11-14 23:16       ` Bill Davidsen
@ 2007-11-15  0:24         ` James Lee
  2007-11-15  6:01           ` Neil Brown
  0 siblings, 1 reply; 17+ messages in thread
From: James Lee @ 2007-11-15  0:24 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid

But creating a 3-drive RAID5 with a missing device for the final two
drives wouldn't give me what I'm looking for, as that array would no
longer be fault-tolerant.  So I think what we'd have on an array of n
differently-sized drives is:
- One n drive RAID5 array.
- One (n-1) drive RAID5 array.
...
- One 2 drive RAID5 array.
- One non-RAIDed single partition.

All of these except for the non-RAIDed partition would then be used as
elements in a linear array (which would tolerate the failure of any
single drive, as each of its constituent arrays does).  This would
leave a single non-RAIDed partition which can be used for anything
else.
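
To make that concrete with hypothetical sizes - say 300GB, 500GB and
750GB drives - that would be a 3-drive RAID5 over three 300GB partitions
(600GB usable), a 2-drive RAID5 over the remaining 200GB of the two
larger drives (200GB usable), and a non-RAIDed 250GB partition left on
the 750GB drive.  The linear array over the two RAID5s then gives 800GB
that survives any single drive failure, matching the sum-minus-largest
figure (300 + 500 + 750 - 750).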

Thinking back over it, I think one potential issue might be how resync
works.  If all of the RAID5 arrays become in need of resync at the
same time (which is perfectly likely - e.g. if the system is powered
down abruptly, a drive is replaced, ...) will the md driver attempt to
resync each of the arrays sequentially or in parallel?  If the latter,
this is likely to be extremely slow, as it'll be trying to resync
multiple arrays on the same drives (and therefore doing huge amounts
of seeking, etc.).

The other issue is that it looks like (correct me if I'm wrong here)
mdadm doesn't support growing a linear array by increasing the size of
its constituent parts (which is what would be required here to be
able to expand the entire array when adding a new drive).  I don't
know how hard this would be to implement (I don't know how data gets
arranged in a linear array - does it start with all of the first
drive, then the second, and so on or does it write bits to each?).
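
(For reference, md's linear personality simply concatenates its members'
address space: offsets map onto the first device until it is exhausted,
then onto the second, and so on.)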

Neil: any comments on whether this would be desirable / useful / feasible?

James

PS: and as you say, all of the above could also be done with RAID6
arrays instead of RAID5.

On 14/11/2007, Bill Davidsen <davidsen@tmr.com> wrote:
> James Lee wrote:
> > From a quick search through this mailing list, it looks like I can
> > answer my own question regarding RAID1 --> RAID5 conversion.  Instead
> > of creating a RAID1 array for the partitions on the two biggest
> > drives, it should just create a 2-drive RAID5 (which is identical, but
> > can be expanded as with any other RAID5 array).
> >
> > So it looks like this should work I guess.
>
> I believe what you want to create might be a three drive raid-5 with one
> failed drive. That way you can just add a drive when you want.
>
>   mdadm -C -c32 -l5 -n3 -amd /dev/md7 /dev/loop[12] missing
>
> Then you can add another drive:
>
>   mdadm --add /dev/md7 /dev/loop3
>
> The output is at the end of this message.
>
> But in general I think it would be really great to be able to have a
> format which would do raid-5 or raid-6 over all the available parts of
> multiple drives, and since there's some similar logic for raid-10 over a
> selection of drives it is clearly possible. But in terms of the benefit
> to be gained, unless it falls out of the code and someone feels the
> desire to do it, I can't see much joy in ever having such a thing.
>
> The feature I would really like to have is raid5e, a distributed spare
> so that head motion is spread over all drives. I don't have time to look
> at that one either, but it really helps performance under load with
> small arrays.
>
> --
> bill davidsen <davidsen@tmr.com>
>   CTO TMR Associates, Inc
>   Doing interesting things with small computers since 1979
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2007-11-15  0:24         ` James Lee
@ 2007-11-15  6:01           ` Neil Brown
  0 siblings, 0 replies; 17+ messages in thread
From: Neil Brown @ 2007-11-15  6:01 UTC (permalink / raw)
  To: James Lee; +Cc: Bill Davidsen, linux-raid

On Thursday November 15, james.lee@cantab.net wrote:
> 
> Neil: any comments on whether this would be desirable / useful / feasible?

1/ Having a raid4 variant which arranges the data like 'linear' is
   something I am planning to do eventually.  If your filesystem knows
   about the geometry of the array, then it can distribute the data
   across the drives and can make up for a lot of the benefits of
   striping.  The big advantage of such an arrangement is that it is
   trivial to add a drive - just zero it and make it part of the
   array.  No need to re-arrange what is currently there.
   However, I was not thinking of supporting different sized devices
   in such a configuration.

2/ Having an array with redundancy where drives are of different sizes
   is awkward, primarily because if there was a spare that was not as
   large as the largest device, you may or may not be able to rebuild
   in that situation.  Certainly I could code up those decisions, but
   I'm not sure the scenario is worth the complexity.
   If you have drives of different sizes, use raid0 to combine pairs
   of smaller ones to match larger ones, and do raid5 across devices
   that look like the same size (a sketch follows below).

3/ If you really want to use exactly what you have, you can partition
   them into bits and make a variety of raid5 arrays as you suggest.
   md will notice and will resync in series so that you don't kill
   performance.
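
   As a rough sketch of the pairing in point 2 - device names are
   hypothetical, with sdc and sdd being two 250GB drives and sda and
   sdb two 500GB drives:

     mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
     mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/md0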

NeilBrown

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
@ 2007-11-23 15:58 Chris Green
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Green @ 2007-11-23 15:58 UTC (permalink / raw)
  To: linux-raid



Ignoring the issues of using mismatched drive sizes, is it possible to 
achieve the data resilience of this non-striped raid4 using the current
linux-raid?

I'm not so interested in being able to use mismatched drives, etc.,
but I am interested in being able to create a raid setup with the
enhanced data loss protection that this gives you. For a read-mostly
system that doesn't need the extra read speed that raid5 gets you, it's
pretty compelling. With a 5 drive raid5 system, if I lose 2 drives, I
lose all my data, while with this setup I would lose only a quarter to
half of it, depending on which drives fail.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
@ 2008-05-22 21:15 Tony Germano
  2008-05-22 22:10 ` David Lethe
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Tony Germano @ 2008-05-22 21:15 UTC (permalink / raw)
  To: linux-raid


I would like to bring this back to the attention of the group (from November 2007) since the conversation died off and it looks like a few key features important to me were left out of the discussion... *grin*

The original post was regarding "unRAID" developed by http://lime-technology.com/

I had an idea in my head, and "unRAID" has features almost identical to what I was thinking about with the exception of a couple deal breaking design decisions. These are due to the proprietary front end, not the modified driver.

Bad decision #1) Implementation is for a NAS Appliance. Files are only accessible through a Samba share. (Though this is great for the hordes of people that use it as network storage for their windows media center pcs.)

Bad decision #2) Imposed ReiserFS.

Oh yeah, and it's not free in either sense of the word.

The most relevant uses I can think of for this type of array are archive storage and low use media servers. Keeping that in mind...

Good Thing #1)
"JBOD with parity." Each usable disk is seen separately and has its own filesystem. This allows mixed sized disks and replacing older smaller drives with newer larger ones one at a time while utilizing the extra capacity right away (after expanding the filesystem.) In the event that two or more disks are lost, surviving non-parity disks still have 100% of their data. (Adding a new disk larger than the parity disk is possible, but takes multiple steps of converting it to the new parity disk and then adding the old parity disk back to the array as a regular disk... acceptable to me)

Good Thing #2)
You can spin down idle disks. Since there is no data striping and file systems don't [have to] span drives, reading a file only requires 1 disk to be spinning. Writing only requires 1 disk + parity disk. This is an important feature to the "GREEN" community. On my mythtv server, I only record a few shows each week. I would have disks in this setup possibly not accessed for weeks or even months at a time. They don't need to be spinning, and performance is of no importance to me as long as it can keep up with writing HD streams.
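
The spin-down itself can already be requested per drive with hdparm (the
timeout value below is only an example):

  hdparm -S 242 /dev/sdb   # spin down after about an hour of inactivity
  hdparm -y /dev/sdb       # or put the drive into standby right away
  hdparm -C /dev/sdb       # check the drive's current power state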

Hopefully this brings a new perspective to the idea.

Thanks,
Tony Germano

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Proposal: non-striping RAID4
  2008-05-22 21:15 Proposal: non-striping RAID4 Tony Germano
@ 2008-05-22 22:10 ` David Lethe
  2008-05-22 22:56   ` Tony Germano
  2008-05-23 15:12 ` Roger Heflin
  2008-05-23 15:47 ` Chris Green
  2 siblings, 1 reply; 17+ messages in thread
From: David Lethe @ 2008-05-22 22:10 UTC (permalink / raw)
  To: Tony Germano, linux-raid

I personally have a real problem with sleepy drives.  There is no ANSI
specification, and no drive vendors are making disks (today) that are
engineered for this.  Granted spinning down disks saves power & heat,
but since disks aren't yet engineered for frequent spin-ups, then there
are industry-wide concerns about disk life.

Without benefit of an ANSI spec for this mode (and why stop at sleep,
have several lower-RPM speeds that sacrifice performance for heat/power
savings), then I just see too many problems for general use, so it would
have to be limited to an appliance.  The appliance vendor would probably
have to carefully test & qualify disks, and ensure that applications
won't constantly spin disks up and have problems with 30+ sec timeouts
and such.

I think the best next step is to write a bunch of emails to the various ANSI
T10, T11, and T13 committee members and have them work out a spec so we
have rules, and disks that are designed for this purpose.

 Undoubtedly there is a need, but without standards it will be a kludge.


David lethe

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Tony Germano
Sent: Thursday, May 22, 2008 4:16 PM
To: linux-raid@vger.kernel.org
Subject: Re: Proposal: non-striping RAID4


I would like to bring this back to the attention of the group (from
November 2007) since the conversation died off and it looks like a few
key features important to me were left out of the discussion... *grin*

The original post was regarding "unRAID" developed by
http://lime-technology.com/

I had an idea in my head, and "unRAID" has features almost identical to
what I was thinking about with the exception of a couple deal breaking
design decisions. These are due to the proprietary front end, not the
modified driver.

Bad decision #1) Implementation is for a NAS Appliance. Files are only
accessible through a Samba share. (Though this is great for the hordes
of people that use it as network storage for their windows media center
pcs.)

Bad decision #2) Imposed ReiserFS.

Oh yeah, and it's not free in either sense of the word.

The most relevant uses I can think of for this type of array are archive
storage and low use media servers. Keeping that in mind...

Good Thing #1)
"JBOD with parity." Each usable disk is seen separately and has its own
filesystem. This allows mixed sized disks and replacing older smaller
drives with newer larger ones one at a time while utilizing the extra
capacity right away (after expanding the filesystem.) In the event that
two or more disks are lost, surviving non-parity disks still have 100%
of their data. (Adding a new disk larger than the parity disk is
possible, but takes multiple steps of converting it to the new parity
disk and then adding the old parity disk back to the array as a regular
disk... acceptable to me)

Good Thing #2)
You can spin down idle disks. Since there is no data striping and file
systems don't [have to] span drives, reading a file only requires 1 disk
to be spinning. Writing only requires 1 disk + parity disk. This is an
important feature to the "GREEN" community. On my mythtv server, I only
record a few shows each week. I would have disks in this setup possibly
not accessed for weeks or even months at a time. They don't need to be
spinning, and performance is of no importance to me as long as it can
keep up with writing HD streams.

Hopefully this brings a new perspective to the idea.

Thanks,
Tony Germano


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Proposal: non-striping RAID4
  2008-05-22 22:10 ` David Lethe
@ 2008-05-22 22:56   ` Tony Germano
  0 siblings, 0 replies; 17+ messages in thread
From: Tony Germano @ 2008-05-22 22:56 UTC (permalink / raw)
  To: linux-raid


First off, I totally agree with you that standards need to be defined. I also agree with you that in most situations you would not want to be frequently spinning drives down and up.

That being said, I was explaining particular applications where the drives are not frequently accessed, and that's where this would be useful.

Western Digital has green drives already in production (albeit without standards.) See http://www.wdc.com/en/products/Products.asp?DriveID=336

I'm glad I double checked because I almost said these drives have variable speeds. It looks like the different models spin at different rates depending on how many platters they have. However, they do give the sleep/standby power usage as ~3W less than idle. I would imagine that number would be even larger for "non-green" drives. That can make a huge difference over time and many drives.

Implementing this disk array type would not enforce drive spin down, but rather allow it where standard raid types do not (within reason.)

I think of this approach as a bunch of independent disks that sacrifice a little write performance for peace of mind knowing that you can rebuild one if necessary. It has limited application, but would be a terrific option where applicable (online archive that is infrequently accessed.)

Tony Germano


> Subject: RE: Proposal: non-striping RAID4
> Date: Thu, 22 May 2008 17:10:08 -0500
> From: david@santools.com
> To: tony_germano@hotmail.com; linux-raid@vger.kernel.org
>
> I personally have a real problem with sleepy drives. There is no ANSI
> specification, and no drive vendors are making disks (today) that are
> engineered for this. Granted spinning down disks saves power & heat,
> but since disks aren't yet engineered for frequent spin-ups, then there
> are industry-wide concerns about disk life.
>
> Without benefit of an ANSI spec for this mode (and why stop at sleep,
> have several lower-RPM speeds that sacrifice performance for heat/power
> savings), then I just see too many problems for general use, so it would
> have to be limited to an appliance. The appliance vendor would probably
> have to carefully test & qualify disks, and insure that applications
> won't constantly spin disks up and have problems with 30+ sec timeouts
> and such.
>
> I think the best next step is to write a bunch of emails to the various ANSI
> T10, T11, and T13 committee members and have them work out a spec so we
> have rules, and disks that are designed for this purpose.
>
> Undoubtedly there is a need, but without standards it will be a kludge.
>
>
> David lethe
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Tony Germano
> Sent: Thursday, May 22, 2008 4:16 PM
> To: linux-raid@vger.kernel.org
> Subject: Re: Proposal: non-striping RAID4
>
>
> I would like to bring this back to the attention of the group (from
> November 2007) since the conversation died off and it looks like a few
> key features important to me were left out of the discussion... *grin*
>
> The original post was regarding "unRAID" developed by
> http://lime-technology.com/
>
> I had an idea in my head, and "unRAID" has features almost identical to
> what I was thinking about with the exception of a couple deal breaking
> design decisions. These are due to the proprietary front end, not the
> modified driver.
>
> Bad decision #1) Implementation is for a NAS Appliance. Files are only
> accessible through a Samba share. (Though this is great for the hordes
> of people that use it as network storage for their windows media center
> pcs.)
>
> Bad decision #2) Imposed ReiserFS.
>
> Oh yeah, and it's not free in either sense of the word.
>
> The most relevant uses I can think of for this type of array are archive
> storage and low use media servers. Keeping that in mind...
>
> Good Thing #1)
> "JBOD with parity." Each usable disk is seen separately and has its own
> filesystem. This allows mixed sized disks and replacing older smaller
> drives with newer larger ones one at a time while utilizing the extra
> capacity right away (after expanding the filesystem.) In the event that
> two or more disks are lost, surviving non-parity disks still have 100%
> of their data. (Adding a new disk larger than the parity disk is
> possible, but takes multiple steps of converting it to the new parity
> disk and then adding the old parity disk back to the array as a regular
> disk... acceptable to me)
>
> Good Thing #2)
> You can spin down idle disks. Since there is no data striping and file
> systems don't [have to] span drives, reading a file only requires 1 disk
> to be spinning. Writing only requires 1 disk + parity disk. This is an
> important feature to the "GREEN" community. On my mythtv server, I only
> record a few shows each week. I would have disks in this setup possibly
> not accessed for weeks or even months at a time. They don't need to be
> spinning, and performance is of no importance to me as long as it can
> keep up with writing HD streams.
>
> Hopefully this brings a new perspective to the idea.
>
> Thanks,
> Tony Germano
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2008-05-22 21:15 Proposal: non-striping RAID4 Tony Germano
  2008-05-22 22:10 ` David Lethe
@ 2008-05-23 15:12 ` Roger Heflin
  2008-05-23 15:47 ` Chris Green
  2 siblings, 0 replies; 17+ messages in thread
From: Roger Heflin @ 2008-05-23 15:12 UTC (permalink / raw)
  To: Tony Germano; +Cc: linux-raid

Tony Germano wrote:
> I would like to bring this back to the attention of the group (from November 2007) since the conversation died off and it looks like a few key features important to me were left out of the discussion... *grin*
> 
> The original post was regarding "unRAID" developed by http://lime-technology.com/
> 
> I had an idea in my head, and "unRAID" has features almost identical to what I was thinking about with the exception of a couple deal breaking design decisions. These are due to the proprietary front end, not the modified driver.
> 
> Bad decision #1) Implementation is for a NAS Appliance. Files are only accessible through a Samba share. (Though this is great for the hordes of people that use it as network storage for their windows media center pcs.)
> 
> Bad decision #2) Imposed ReiserFS.
> 
> Oh yeah, and it's not free in either sense of the word.
> 
> The most relevant uses I can think of for this type of array are archive storage and low use media servers. Keeping that in mind...
> 
> Good Thing #1)
> "JBOD with parity." Each usable disk is seen separately and has its own filesystem. This allows mixed sized disks and replacing older smaller drives with newer larger ones one at a time while utilizing the extra capacity right away (after expanding the filesystem.) In the event that two or more disks are lost, surviving non-parity disks still have 100% of their data. (Adding a new disk larger than the parity disk is possible, but takes multiple steps of converting it to the new parity disk and then adding the old parity disk back to the array as a regular disk... acceptable to me)
> 
> Good Thing #2)
> You can spin down idle disks. Since there is no data striping and file systems don't [have to] span drives, reading a file only requires 1 disk to be spinning. Writing only requires 1 disk + parity disk. This is an important feature to the "GREEN" community. On my mythtv server, I only record a few shows each week. I would have disks in this setup possibly not accessed for weeks or even months at a time. They don't need to be spinning, and performance is of no importance to me as long as it can keep up with writing HD streams.
> 
> Hopefully this brings a new perspective to the idea.
> 

I would think (for mythtv and similar uses) that the way to handle this would 
be to set up the raid array similar to enterprise class offline storage systems: 
you have a local disk cache of say 10-20GB, and when something is accessed you 
spin up the array and copy the entire file in; on writing, every hour or so (or 
any time you have to spin up the array) you copy all or part of the file off of 
the cache onto the array.  This would require either proper hooks in the kernel 
to deal with the offline storage concept or software at the application level to 
do it.

I know most of the offline storage has all of the files showing on a filesystem 
with proper metadata, but the file data is actually elsewhere, and when an 
application accesses the files the offline system (behind the scenes) brings the 
file data back from the offline storage onto the cache.  With this on a myth 
system I would expect the array to be spun up at most once per hour for under 10 
minutes (my array does 35MB/s write, 90MB/s read - a 1.5GB recording would take 
45 seconds to copy from cache to array, and reading a recording would take <20 
seconds to copy from array to cache plus spinup time), so at most the array 
would actually be spun up for maybe 3-5 minutes per hour under heavy usage, and 
probably not spun up at all when things were not used.  My array uses about 
40W for the 4 disks, so being spun down 23 hours a day would save about 1 kWh 
per day.  At the low power rate I pay ($0.07/kWh) that comes to about $25 per 
year in power; in some more expensive states it would be 2-3 times that, and 
probably higher in Europe.

The big question is: would the disks survive being spun down that often?

                             Roger

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Proposal: non-striping RAID4
  2008-05-22 21:15 Proposal: non-striping RAID4 Tony Germano
  2008-05-22 22:10 ` David Lethe
  2008-05-23 15:12 ` Roger Heflin
@ 2008-05-23 15:47 ` Chris Green
  2008-05-24 14:21   ` Bill Davidsen
  2 siblings, 1 reply; 17+ messages in thread
From: Chris Green @ 2008-05-23 15:47 UTC (permalink / raw)
  To: Tony Germano, linux-raid

I would really like to have this functionality. Honestly, it's pretty
much perfect for the "home server" application (which I have several
of), where:

   - writes are far less common than reads,
   - The system goes hours without any reads and days without any
writes.
   - single drive read speed is plenty for the applications that are
sitting on the other side
   - a lot of the data is too voluminous to back up (media that can just
be re-ripped or downloaded).
   - you want some redundancy beyond a single drive copy, but don't want
to spend a lot of drives on it. The model of "if you lose 1 disk, you
lose nothing, if you lose 2 disks you lose a portion" is better than the
raid5 model of losing everything with a double-disk failure.
   - a common access pattern is to do a long sequential read at a slow
rate that takes hours to go through a few gigs (playing media).
 
-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Tony Germano
Sent: Thursday, May 22, 2008 2:16 PM
To: linux-raid@vger.kernel.org
Subject: Re: Proposal: non-striping RAID4


I would like to bring this back to the attention of the group (from
November 2007) since the conversation died off and it looks like a few
key features important to me were left out of the discussion... *grin*

The original post was regarding "unRAID" developed by
http://lime-technology.com/

I had an idea in my head, and "unRAID" has features almost identical to
what I was thinking about with the exception of a couple deal breaking
design decisions. These are due to the proprietary front end, not the
modified driver.

Bad decision #1) Implementation is for a NAS Appliance. Files are only
accessible through a Samba share. (Though this is great for the hordes
of people that use it as network storage for their windows media center
pcs.)

Bad decision #2) Imposed ReiserFS.

Oh yeah, and it's not free in either sense of the word.

The most relevant uses I can think of for this type of array are archive
storage and low-use media servers. Keeping that in mind...

Good Thing #1)
"JBOD with parity." Each usable disk is seen separately and has its own
filesystem. This allows mixed sized disks and replacing older smaller
drives with newer larger ones one at a time while utilizing the extra
capacity right away (after expanding the filesystem.) In the event that
two or more disks are lost, surviving non-parity disks still have 100%
of their data. (Adding a new disk larger than the parity disk is
possible, but takes multiple steps of converting it to the new parity
disk and then adding the old parity disk back to the array as a regular
disk... acceptable to me)
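
To make the "JBOD with parity" arithmetic concrete, here is a toy
Python sketch of the scheme as I understand it (this is only an
illustration, not the lime-technology code): parity at each offset is
the XOR of all the data disks at that offset, with disks shorter than
the parity disk treated as zero-padded, so any single lost disk can be
rebuilt from the survivors plus parity.

# Toy model: data disks of different sizes plus one parity disk at
# least as large as the biggest data disk.  Illustration only.
def xor_at(disks, i):
    x = 0
    for d in disks:
        if i < len(d):          # past a smaller disk's end counts as zero
            x ^= d[i]
    return x

def make_parity(data_disks, parity_size):
    return bytes(xor_at(data_disks, i) for i in range(parity_size))

def rebuild(surviving_data, parity, lost_size):
    # a lost data disk is the XOR of parity with the surviving data disks
    return bytes(xor_at(surviving_data, i) ^ parity[i]
                 for i in range(lost_size))

d0 = b"small disk"                      # disks may differ in size
d1 = b"a rather bigger data disk"
p = make_parity([d0, d1], len(d1))      # parity disk must be the largest
assert rebuild([d1], p, len(d0)) == d0  # recover d0 from d1 + parity
assert rebuild([d0], p, len(d1)) == d1  # recover d1 from d0 + parity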

Good Thing #2)
You can spin down idle disks. Since there is no data striping and file
systems don't [have to] span drives, reading a file only requires 1 disk
to be spinning. Writing only requires 1 disk + parity disk. This is an
important feature to the "GREEN" community. On my mythtv server, I only
record a few shows each week. I would have disks in this setup possibly
not accessed for weeks or even months at a time. They don't need to be
spinning, and performance is of no importance to me as long as it can
keep up with writing HD streams.
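
The reason a write only needs the data disk being written plus the
parity disk spinning is the usual RAID-4/5 read-modify-write parity
update; a quick sketch of the arithmetic (again, just an illustration
of how I understand it, not the actual driver):

# Rewriting a block on one data disk only touches that disk and parity:
#   new_parity = old_parity XOR old_data XOR new_data
# so every other data disk in the array can stay spun down.
def update_parity(old_parity, old_data, new_data):
    return bytes(p ^ o ^ n
                 for p, o, n in zip(old_parity, old_data, new_data))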

Hopefully this brings a new perspective to the idea.

Thanks,
Tony Germano

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Proposal: non-striping RAID4
  2008-05-24 14:21   ` Bill Davidsen
@ 2008-05-24 14:19     ` Chris Green
  2008-05-28 23:14       ` Bill Davidsen
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Green @ 2008-05-24 14:19 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Tony Germano, linux-raid

I don't think this quite does it. It sounds like that would give me the
spin-down capability, but not (to me) the most
interesting facility: the ability to have a system with
RAID5-equivalent redundancy
(i.e. I get N-1 drives' worth of storage and can recover perfectly from
the loss of 1 drive) that also lets me survive a multiple 
drive failure with only partial data loss.

-----Original Message-----
From: Bill Davidsen [mailto:davidsen@tmr.com] 
Sent: Saturday, May 24, 2008 7:22 AM
To: Chris Green
Cc: Tony Germano; linux-raid@vger.kernel.org
Subject: Re: Proposal: non-striping RAID4

Chris Green wrote:
> I would really like to have this functionality. Honestly, its pretty
> much perfect for the "home server" application (which I have several
> of), where:
>
>    - writes are far less common than reads,
>    - The system goes hours without any reads and days without any
> writes.
>    - single drive read speed is plenty for the applications that are
> sitting on the other side
>    - a lot of the data is too voluminous to backup (media that can just
> be re-ripped or downloaded).
>    - you want some redundancy beyond a single drive copy, but don't want
> to spend a lot of drives on it. The model of "if you lose 1 disk, you
> lose nothing, if you lose 2 disks you lose a portion" is better than the
> raid5 model of losing everything with a double-disk failure.
>    - a common access pattern is to do a long sequential read at a slow
> rate that takes hours to go through a few gigs (playing media).
>  

I think you can do this right now with a touch of cleverness...

Assume you create a raid-1 array, load your data, and call that
initialized.

From cron, daily or weekly, you set one drive of the array 
"write-mostly" and set the spin-down time (hdparm -S) to an hour or so. 
Now reads will go to one drive, the other will spin down, *and*, should 
you do one of those infrequent writes, the idle drive will spin back up 
and write the data (I want a bitmap of course). At the end of the time 
period you clear the write-mostly and spin-down time on the idle drive, 
put them on the other drive, and ideally you wind up with redundancy, 
splitting the disk wear evenly, and using existing capabilities.

Actually you can't quite use existing capabilities, write-mostly can 
only be used at inconvenient times, like build, create, or add, so it's 
not obviously possible to change without at least shutting the array 
down. Perhaps Neil will give us his thoughts on that. However, if you 
don't mind a *really* ugly script, you might be able to mark the active 
drive failed, which would force all i/o to the previously sleeping 
drive, then remove the previously active drive, and add it back in using
write-mostly. You would do a full sync (I think) but the change would be
made.

Better to make write-mostly a flag which can be enabled and disabled at 
will. That would be useful when a remote drive is normally operated over
a fast link and has to drop to a slow backup link. I'm sure other uses 
would be found.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2008-05-23 15:47 ` Chris Green
@ 2008-05-24 14:21   ` Bill Davidsen
  2008-05-24 14:19     ` Chris Green
  0 siblings, 1 reply; 17+ messages in thread
From: Bill Davidsen @ 2008-05-24 14:21 UTC (permalink / raw)
  To: Chris Green; +Cc: Tony Germano, linux-raid

Chris Green wrote:
> I would really like to have this functionality. Honestly, its pretty
> much perfect for the "home server" application (which I have several
> of), where:
>
>    - writes are far less common than reads,
>    - The system goes hours without any reads and days without any
> writes.
>    - single drive read speed is plenty for the applications that are
> sitting on the other side
>    - a lot of the data is too voluminous to backup (media that can just
> be re-ripped or downloaded).
>    - you want some redundancy beyond a single drive copy, but don't want
> to spend a lot of drives on it. The model of "if you lose 1 disk, you
> lose nothing, if you lose 2 disks you lose a portion" is better than the
> raid5 model of losing everything with a double-disk failure.
>    - a common access pattern is to do a long sequential read at a slow
> rate that takes hours to go through a few gigs (playing media).
>  

I think you can do this right now with a touch of cleverness...

Assume you create a raid-1 array, load your data, and call that initialized.

From cron, daily or weekly, you set one drive of the array 
"write-mostly" and set the spin-down time (hdparm -S) to an hour or so. 
Now reads will go to one drive, the other will spin down, *and*, should 
you do one of those infrequent writes, the idle drive will spin back up 
and write the data (I want a bitmap of course). At the end of the time 
period you clear the write-mostly and spin-down time on the idle drive, 
put them on the other drive, and ideally you wind up with redundancy, 
splitting the disk wear evenly, and using existing capabilities.

Actually, you can't quite use existing capabilities: write-mostly can 
only be set at inconvenient times, like build, create, or add, so it's 
not obviously possible to change it without at least shutting the array 
down. Perhaps Neil will give us his thoughts on that. However, if you 
don't mind a *really* ugly script, you might be able to mark the active 
drive failed, which would force all i/o to the previously sleeping 
drive, then remove the previously active drive, and add it back in using 
write-mostly. You would do a full sync (I think) but the change would be 
made.
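
Something like the following is what I have in mind - completely
untested, the device names are made up, and you would want to check
that your mdadm accepts --write-mostly on --add before trusting it
with real data:

#!/usr/bin/env python
# Untested sketch of the "really ugly script" above: flip which raid-1
# member serves reads by failing/removing the currently active disk and
# re-adding it write-mostly, then letting it spin down when idle.
# /dev/md0 and /dev/sda1 are example names only.
import subprocess

MD, ACTIVE = "/dev/md0", "/dev/sda1"

def run(*args):
    print("+ " + " ".join(args))
    subprocess.check_call(args)

run("mdadm", MD, "--fail", ACTIVE)       # force all i/o to the other member
run("mdadm", MD, "--remove", ACTIVE)
run("mdadm", MD, "--add", "--write-mostly", ACTIVE)   # re-add; expect a resync
run("hdparm", "-S", "242", ACTIVE)       # spin down after about an hour idle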

Better to make write-mostly a flag which can be enabled and disabled at 
will. That would be useful when a remote drive is normally operated over 
a fast link and has to drop to a slow backup link. I'm sure other uses 
would be found.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Proposal: non-striping RAID4
  2008-05-24 14:19     ` Chris Green
@ 2008-05-28 23:14       ` Bill Davidsen
  2008-05-30 17:23         ` Tony Germano
  0 siblings, 1 reply; 17+ messages in thread
From: Bill Davidsen @ 2008-05-28 23:14 UTC (permalink / raw)
  To: Chris Green; +Cc: Tony Germano, linux-raid

Chris Green wrote:
> I don't think this quite does it. It sounds like that would give me the
> spin down capability, but not (to me), the most
> interesting facility - the ability to have a system with
> RAID5-equivalent redundancy
> (i.e. I get N-1 drives worth of storage and can recover perfectly from
> the loss of 1 drive) but also lets me survive a multiple 
> drive failure with only partial data loss.
> 
Agreed, you can't get that with the current kernel, although I think the 
parts are there to actually make it work. All it would take is some 
clever use of the raid-4 operations to get it going. It's not trivial, 
but if you knew the code it might fall out of laying out the raid-4 as 
usual and just changing the mapping so that all of the sequential 
sectors fall on the same device. An interesting summer project for someone.
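
To make "changing the mapping" a bit more concrete, roughly this kind
of difference in the address math (a toy sketch, not md code; the chunk
and disk sizes are made up):

# raid-4 style: rotate through the data disks chunk by chunk.
def striped_map(lsector, ndata, chunk):
    c, off = divmod(lsector, chunk)
    return c % ndata, (c // ndata) * chunk + off        # (disk, sector)

# non-striped: fill each data disk before moving to the next, so a long
# sequential read keeps hitting the same (single, spinning) disk.
def linear_map(lsector, disk_sizes):
    for disk, size in enumerate(disk_sizes):
        if lsector < size:
            return disk, lsector
        lsector -= size
    raise ValueError("sector beyond end of array")

print(striped_map(1000, ndata=3, chunk=128))            # bounces across disks
print(linear_map(1000, [250000, 500000, 750000]))       # stays on disk 0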

> -----Original Message-----
> From: Bill Davidsen [mailto:davidsen@tmr.com] 
> Sent: Saturday, May 24, 2008 7:22 AM
> To: Chris Green
> Cc: Tony Germano; linux-raid@vger.kernel.org
> Subject: Re: Proposal: non-striping RAID4
> 
> Chris Green wrote:
>> I would really like to have this functionality. Honestly, its pretty
>> much perfect for the "home server" application (which I have several
>> of), where:
>>
>>    - writes are far less common than reads,
>>    - The system goes hours without any reads and days without any
>> writes.
>>    - single drive read speed is plenty for the applications that are
>> sitting on the other side
>>    - a lot of the data is too voluminous to backup (media that can just
>> be re-ripped or downloaded).
>>    - you want some redundancy beyond a single drive copy, but don't want
>> to spend a lot of drives on it. The model of "if you lose 1 disk, you
>> lose nothing, if you lose 2 disks you lose a portion" is better than the
>> raid5 model of losing everything with a double-disk failure.
>>    - a common access pattern is to do a long sequential read at a slow
>> rate that takes hours to go through a few gigs (playing media).
>>  
> 
> I think you can do this right now with a touch of cleverness...
> 
> Assume you create a raid-1 array, load your data, and call that
> initialized.
> 
>  From cron, daily or weekly, you set one drive of the array 
> "write-mostly" and set the spin-down time (hdparm -S) to an hour or so. 
> Now reads will go to one drive, the other will spin down, *and*, should 
> you do one of those infrequent writes, the idle drive will spin back up 
> and write the data (I want a bitmap of course). At the end of the time 
> period you clear the write-mostly and spin-down time on the idle drive, 
> put them on the other drive, and ideally you wind up with redundancy, 
> splitting the disk wear evenly, and using existing capabilities.
> 
> Actually you can't quite use existing capabilities, write-mostly can 
> only be used at inconvenient times, like build, create, or add, so it's 
> not obviously possible to change without at least shutting the array 
> down. Perhaps Neil will give us his thoughts on that. However, if you 
> don't mind a *really* ugly script, you might be able to mark the active 
> drive failed, which would force all i/o to the previously sleeping 
> drive, then remove the previously active drive, and add it back in using
> write-mostly. You would do a full sync (I think) but the change would be
> made.
> 
> Better to make write-mostly a flag which can be enabled and disabled at 
> will. That would be useful when a remote drive is normally operated over
> a fast link and has to drop to a slow backup link. I'm sure other uses 
> would be found.
> 


-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Proposal: non-striping RAID4
  2008-05-28 23:14       ` Bill Davidsen
@ 2008-05-30 17:23         ` Tony Germano
  0 siblings, 0 replies; 17+ messages in thread
From: Tony Germano @ 2008-05-30 17:23 UTC (permalink / raw)
  To: linux-raid


Unfortunately I'm not very familiar with writing driver code, and I'm
painfully out of practice with writing code in general. For anyone
interested in looking, the source for the modified md driver that
lime-technology uses to implement these features is available for
download. I don't know if I'm able to attach the files to this message,
so I'll just describe how to get them.

Download the unRAID Server install from
http://lime-technology.com/dnlds/ (I got version 4.2.4). I used 7zip
(I'm in Windows at work) to open the archive and then open the bzroot
image inside it. The source files are in
bzroot\[Content]\usr\src\linux-2.6.20\drivers\md\ . Hopefully that will
be useful to someone.



> Date: Wed, 28 May 2008 19:14:30 -0400
> From: davidsen@tmr.com
> To: cgreen@valvesoftware.com
> CC: tony_germano@hotmail.com; linux-raid@vger.kernel.org
> Subject: Re: Proposal: non-striping RAID4
>
> Agree, you can't get that with current kernel, although I think the
> parts are there to actually make it work. All it would take is some
> clever calling of raid-4 operations to get it going. It's not trivial,
> but if you knew the code it might fall out by laying the raid-4 out and
> just changing the mapping such that all of the sequential sectors fall
> on the same device. Interesting summer project for someone.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-05-30 17:23 UTC | newest]

Thread overview: 17+ messages
2008-05-22 21:15 Proposal: non-striping RAID4 Tony Germano
2008-05-22 22:10 ` David Lethe
2008-05-22 22:56   ` Tony Germano
2008-05-23 15:12 ` Roger Heflin
2008-05-23 15:47 ` Chris Green
2008-05-24 14:21   ` Bill Davidsen
2008-05-24 14:19     ` Chris Green
2008-05-28 23:14       ` Bill Davidsen
2008-05-30 17:23         ` Tony Germano
  -- strict thread matches above, loose matches on Subject: below --
2007-11-23 15:58 Chris Green
2007-11-10  0:57 James Lee
2007-11-12  1:29 ` Bill Davidsen
2007-11-13 23:48   ` James Lee
2007-11-14  1:06     ` James Lee
2007-11-14 23:16       ` Bill Davidsen
2007-11-15  0:24         ` James Lee
2007-11-15  6:01           ` Neil Brown
