* btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?
@ 2014-02-13 16:13 Jim Salter
  2014-02-13 16:21 ` Hugo Mills
  2014-02-13 20:22 ` Goffredo Baroncelli
  0 siblings, 2 replies; 6+ messages in thread
From: Jim Salter @ 2014-02-13 16:13 UTC
  To: linux-btrfs

This might be a stupid question but...

Are there any plans to make parity RAID levels in btrfs similar to the 
current implementation of btrfs-raid1?

It took me a while to realize how different and powerful btrfs-raid1 is 
from traditional raid1.  The ability to string together virtually any 
combination of "mutt" hard drives in arbitrary ways and yet 
maintain redundancy is POWERFUL, and is seriously going to be a killer 
feature advancing btrfs adoption in small environments.

The one real drawback to btrfs-raid1 is that you're committed to n/2 
storage efficiency, since you're using pure redundancy rather than 
parity on the array.  I was thinking about that this morning, and 
suddenly it occurred to me that you ought to be able to create a striped 
parity array in much the same way as a btrfs-raid1 array.

Let's say you have five disks, and you arbitrarily want to define a 
stripe length of four data blocks plus one parity block per "stripe".  
Right now, what you're looking at effectively amounts to a RAID3 array, 
like FreeBSD used to use.  But, what if we add two more disks? Or three 
more disks? Or ten more?  Is there any reason we can't keep our stripe 
length of four blocks + one parity block, and just distribute them 
relatively ad-hoc in the same way btrfs-raid1 distributes redundant data 
blocks across an ad-hoc array of disks?
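
To make that concrete, here's a toy sketch of the allocation rule I 
have in mind -- just my own illustration, NOT anything btrfs actually 
does, with made-up device names:

import heapq

# Place one fixed-width stripe (4 data + 1 parity blocks) on the
# five devices that currently have the most free space -- the same
# spirit as btrfs-raid1 picking the two emptiest devices per chunk.
def place_stripe(free, data_blocks=4, parity_blocks=1):
    width = data_blocks + parity_blocks
    chosen = heapq.nlargest(width, free, key=free.get)
    if len(chosen) < width or free[chosen[-1]] == 0:
        raise RuntimeError("not enough devices with free space")
    for dev in chosen:
        free[dev] -= 1      # one block of this stripe per device
    return chosen

free = {"sda": 500, "sdb": 300, "sdc": 300, "sdd": 200, "sde": 100}
print(place_stripe(free))  # the five emptiest devices get the stripe
free["sdf"] = 400          # add a sixth disk: no rebalance needed --
print(place_stripe(free))  # new stripes simply start landing on it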

This could be a pretty powerful setup IMO - if you implemented something 
like this, you'd be able to arbitrarily define your storage efficiency 
(percentage of parity blocks / data blocks) and your fault-tolerance 
level (how many drives you can afford to lose before failure) WITHOUT 
tying it directly to your underlying disks, or necessarily needing to 
rebalance as you add more disks to the array.  This would be a heck of a 
lot more flexible than ZFS' approach of adding more immutable vdevs.
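
Putting rough numbers on that (my arithmetic, not anything btrfs 
promises): with d data blocks and p parity blocks per stripe, storage 
efficiency is d/(d+p) and you can survive p lost drives, independent 
of how many disks are in the pool:

  4 data + 1 parity:  4/5  = 80% efficient, survives 1 failure
  8 data + 2 parity:  8/10 = 80% efficient, survives 2 failures
  4 data + 2 parity:  4/6  = 67% efficient, survives 2 failures

The same 4+1 stripe gives 80% efficiency whether the pool holds five 
disks or fifty.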

Please feel free to tell me why I'm dumb for either 1. not realizing the 
obvious flaw in this idea or 2. not realizing it's already being worked 
on in exactly this fashion. =)


* Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?
  2014-02-13 16:13 btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1? Jim Salter
@ 2014-02-13 16:21 ` Hugo Mills
  2014-02-13 16:32   ` Jim Salter
  2014-02-13 20:22 ` Goffredo Baroncelli
  1 sibling, 1 reply; 6+ messages in thread
From: Hugo Mills @ 2014-02-13 16:21 UTC
  To: Jim Salter; +Cc: linux-btrfs

On Thu, Feb 13, 2014 at 11:13:58AM -0500, Jim Salter wrote:
> This might be a stupid question but...
> 
> Are there any plans to make parity RAID levels in btrfs similar to
> the current implementation of btrfs-raid1?

   Yes.

> It took me a while to realize how different and powerful btrfs-raid1
> is from traditional raid1.  The ability to string together virtually
> any combination of "mutt" hard drives in arbitrary ways and
> yet maintain redundancy is POWERFUL, and is seriously going to be a
> killer feature advancing btrfs adoption in small environments.
> 
> The one real drawback to btrfs-raid1 is that you're committed to n/2
> storage efficiency, since you're using pure redundancy rather than
> parity on the array.  I was thinking about that this morning, and
> suddenly it occurred to me that you ought to be able to create a
> striped parity array in much the same way as a btrfs-raid1 array.
> 
> Let's say you have five disks, and you arbitrarily want to define a
> stripe length of four data blocks plus one parity block per
> "stripe".  Right now, what you're looking at effectively amounts to
> a RAID3 array, like FreeBSD used to use.  But, what if we add two
> more disks? Or three more disks? Or ten more?  Is there any reason
> we can't keep our stripe length of four blocks + one parity block,
> and just distribute them relatively ad-hoc in the same way
> btrfs-raid1 distributes redundant data blocks across an ad-hoc array
> of disks?

   None whatsoever.

> This could be a pretty powerful setup IMO - if you implemented
> something like this, you'd be able to arbitrarily define your
> storage efficiency (percentage of parity blocks / data blocks) and
> your fault-tolerance level (how many drives you can afford to lose
> before failure) WITHOUT tying it directly to your underlying disks,
> or necessarily needing to rebalance as you add more disks to the
> array.  This would be a heck of a lot more flexible than ZFS'
> approach of adding more immutable vdevs.
> 
> Please feel free to tell me why I'm dumb for either 1. not realizing
> the obvious flaw in this idea or 2. not realizing it's already being
> worked on in exactly this fashion. =)

   The latter. :)

   One of the (many) existing problems with the parity RAID
implementation as it stands is that with large numbers of devices, it
becomes quite inefficient to write data when you (may) need to modify
dozens of devices. It's been Chris's stated intention for a while now
to allow a bound to be placed on the maximum number of devices per
stripe, which allows a degree of control over the space-yield <->
performance knob.
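
   As a rough sketch of that trade-off (my illustrative arithmetic
only, assuming equal-sized devices and a single parity device per
stripe -- not the planner's actual code):

# Fraction of raw space holding data, RAID-5-style, when stripes
# are capped at `max_stripes` devices.
def raid5_yield(ndevs, max_stripes):
    s = min(ndevs, max_stripes)   # devices actually used per stripe
    return (s - 1) / s            # the remaining device is parity

for cap in (3, 5, 9):
    print(cap, raid5_yield(12, cap))
# 3 -> 0.67, 5 -> 0.80, 9 -> 0.89: narrower stripes touch fewer
# devices per write (cheaper small writes), but spend a larger
# fraction of the raw space on parity.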

   It's one of the reasons that the usage tool [1] has a "maximum
stripes" knob on it -- so that you can see the behaviour of the FS
once that feature's in place.

   Hugo.

[1] http://carfax.org.uk/btrfs-usage/

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- Nothing right in my left brain. Nothing left in ---         
                             my right brain.                             


* Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?
  2014-02-13 16:21 ` Hugo Mills
@ 2014-02-13 16:32   ` Jim Salter
  2014-02-13 18:23     ` Hugo Mills
  0 siblings, 1 reply; 6+ messages in thread
From: Jim Salter @ 2014-02-13 16:32 UTC
  To: Hugo Mills, linux-btrfs

That is FANTASTIC news.  Thank you for wielding the LART gently. =)

I do a fair amount of public speaking and writing about next-gen 
filesystems (example: 
http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/) 
and I will be VERY sure to talk about the upcoming divorce of stripe 
size from array size in future presentations.  This makes me positively 
giddy.

FWIW, after writing the above article I got contacted by a proprietary 
storage vendor who wanted to tell me all about his midmarket/enterprise 
product, and he was pretty audibly flummoxed when I explained how 
btrfs-RAID1 distributes data and redundancy - his product does something 
similar (to be fair, his product also does a lot of other things btrfs 
doesn't inherently do, like clustered storage and synchronous dedup), 
and he had no idea that anything freely available did anything vaguely 
like it.

I have a feeling the storage world - even the relatively well-informed 
part of it that's aware of ZFS - has little to no inkling of how 
gigantic a splash btrfs is going to make when it truly hits the 
mainstream.

>> This could be a pretty powerful setup IMO - if you implemented
>> something like this, you'd be able to arbitrarily define your
>> storage efficiency (percentage of parity blocks / data blocks) and
>> your fault-tolerance level (how many drives you can afford to lose
>> before failure) WITHOUT tying it directly to your underlying disks,
>> or necessarily needing to rebalance as you add more disks to the
>> array.  This would be a heck of a lot more flexible than ZFS'
>> approach of adding more immutable vdevs.
>>
>> Please feel free to tell me why I'm dumb for either 1. not realizing
>> the obvious flaw in this idea or 2. not realizing it's already being
>> worked on in exactly this fashion. =)
>     The latter. :)



* Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?
  2014-02-13 16:32   ` Jim Salter
@ 2014-02-13 18:23     ` Hugo Mills
  0 siblings, 0 replies; 6+ messages in thread
From: Hugo Mills @ 2014-02-13 18:23 UTC
  To: Jim Salter; +Cc: linux-btrfs

On Thu, Feb 13, 2014 at 11:32:03AM -0500, Jim Salter wrote:
> That is FANTASTIC news.  Thank you for wielding the LART gently. =)

   No LART necessary. :) Nobody knows everything, and it's not a
particularly heavily-documented or written-about feature at the moment
(mostly because it only exists in Chris's local git repo).

> I do a fair amount of public speaking and writing about next-gen
> filesystems (example: http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/)
> and I will be VERY sure to talk about the upcoming divorce of stripe
> size from array size in future presentations.  This makes me
> positively giddy.
> 
> FWIW, after writing the above article I got contacted by a
> proprietary storage vendor who wanted to tell me all about his
> midmarket/enterprise product, and he was pretty audibly flummoxed
> when I explained how btrfs-RAID1 distributes data and redundancy -
> his product does something similar (to be fair, his product also
> does a lot of other things btrfs doesn't inherently do, like
> clustered storage and synchronous dedup), and he had no idea that
> anything freely available did anything vaguely like it.

   That's quite entertaining for the bogglement factor. Although,
again, see my comment above...

   Hugo.

> I have a feeling the storage world - even the relatively
> well-informed part of it that's aware of ZFS - has little to no
> inkling of how gigantic a splash btrfs is going to make when it
> truly hits the mainstream.
> 
> >>This could be a pretty powerful setup IMO - if you implemented
> >>something like this, you'd be able to arbitrarily define your
> >>storage efficiency (percentage of parity blocks / data blocks) and
> >>your fault-tolerance level (how many drives you can afford to lose
> >>before failure) WITHOUT tying it directly to your underlying disks,
> >>or necessarily needing to rebalance as you add more disks to the
> >>array.  This would be a heck of a lot more flexible than ZFS'
> >>approach of adding more immutable vdevs.
> >>
> >>Please feel free to tell me why I'm dumb for either 1. not realizing
> >>the obvious flaw in this idea or 2. not realizing it's already being
> >>worked on in exactly this fashion. =)
> >    The latter. :)
> 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- Nothing right in my left brain. Nothing left in ---         
                             my right brain.                             


* Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?
  2014-02-13 16:13 btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1? Jim Salter
  2014-02-13 16:21 ` Hugo Mills
@ 2014-02-13 20:22 ` Goffredo Baroncelli
  2014-02-13 20:52   ` Hugo Mills
  1 sibling, 1 reply; 6+ messages in thread
From: Goffredo Baroncelli @ 2014-02-13 20:22 UTC
  To: Jim Salter, linux-btrfs

Hi Jim,
On 02/13/2014 05:13 PM, Jim Salter wrote:
> This might be a stupid question but...

There are no stupid questions, only stupid answers...

> 
> Are there any plans to make parity RAID levels in btrfs similar to
> the current implementation of btrfs-raid1?
> 
> It took me a while to realize how different and powerful btrfs-raid1
> is from traditional raid1.  The ability to string together virtually
> any combination of "mutt" hard drives in arbitrary ways and
> yet maintain redundancy is POWERFUL, and is seriously going to be a
> killer feature advancing btrfs adoption in small environments.
> 
> The one real drawback to btrfs-raid1 is that you're committed to n/2
> storage efficiency, since you're using pure redundancy rather than
> parity on the array.  I was thinking about that this morning, and
> suddenly it occurred to me that you ought to be able to create a
> striped parity array in much the same way as a btrfs-raid1 array.
> 
> Let's say you have five disks, and you arbitrarily want to define a
> stripe length of four data blocks plus one parity block per "stripe".

How is it different from a raid5 setup (which is supported by btrfs)?

> Right now, what you're looking at effectively amounts to a RAID3
> array, like FreeBSD used to use.  But, what if we add two more disks?
> Or three more disks? Or ten more?  Is there any reason we can't keep
> our stripe length of four blocks + one parity block, and just
> distribute them relatively ad-hoc in the same way btrfs-raid1
> distributes redundant data blocks across an ad-hoc array of disks?
> 
> This could be a pretty powerful setup IMO - if you implemented
> something like this, you'd be able to arbitrarily define your storage
> efficiency (percentage of parity blocks / data blocks) and your
> fault-tolerance level (how many drives you can afford to lose before
> failure) WITHOUT tying it directly to your underlying disks

Maybe it is a good idea, but what would be the advantage of 
using fewer drives than the available ones for a RAID?

Regarding the fault tolerance level, a few weeks ago there was a 
posting about a kernel library which would provide a generic
RAID framework capable of several degrees of fault tolerance 
(raid 5,6,7...) [have a look at 
"[RFC v4 2/3] fs: btrfs: Extends btrfs/raid56 to 
support up to six parities", 2014/1/25]. This would definitely be a
big leap forward.
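
Just to illustrate the single-parity case (a toy example of mine; 
the framework of that RFC uses more sophisticated codes to obtain 
multiple independent parities):

# XOR parity: any one lost block can be rebuilt from the others.
def xor_parity(blocks):
    p = 0
    for b in blocks:
        p ^= b
    return p

data = [0x0F, 0xA0, 0x33, 0x5C]           # four data "blocks"
parity = xor_parity(data)

survivors = data[:2] + data[3:]           # pretend block 2 is lost
rebuilt = xor_parity(survivors + [parity])
assert rebuilt == data[2]                 # recovered exactly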

BTW, the raid5/raid6 support in BTRFS is only for testing purposes. 
However, Chris Mason said a few weeks ago that he will work on these
issues.

[...]
> necessarily needing to rebalance as you add more disks to the array.
> This would be a heck of a lot more flexible than ZFS' approach of
> adding more immutable vdevs.

There is no need to re-balance if you add more drives. The next 
chunk allocation will span all the available drives anyway. A balance 
is only required when you want to spread data already written across 
all the drives.

Regards
Goffredo


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


* Re: btrfs-RAID(3 or 5/6/etc) like btrfs-RAID1?
  2014-02-13 20:22 ` Goffredo Baroncelli
@ 2014-02-13 20:52   ` Hugo Mills
  0 siblings, 0 replies; 6+ messages in thread
From: Hugo Mills @ 2014-02-13 20:52 UTC
  To: kreijack; +Cc: Btrfs mailing list

On Thu, Feb 13, 2014 at 09:22:07PM +0100, Goffredo Baroncelli wrote:
> Hi Jim,
> On 02/13/2014 05:13 PM, Jim Salter wrote:
> > Let's say you have five disks, and you arbitrarily want to define a
> > stripe length of four data blocks plus one parity block per "stripe".
> 
> How is it different from a raid5 setup (which is supported by btrfs)?

   With what's above, yes, that's the current RAID-5 code.

> > Right now, what you're looking at effectively amounts to a RAID3
> > array, like FreeBSD used to use.  But, what if we add two more disks?
> > Or three more disks? Or ten more?  Is there any reason we can't keep
> > our stripe length of four blocks + one parity block, and just
> > distribute them relatively ad-hoc in the same way btrfs-raid1
> > distributes redundant data blocks across an ad-hoc array of disks?
> > 
> > This could be a pretty powerful setup IMO - if you implemented
> > something like this, you'd be able to arbitrarily define your storage
> > efficiency (percentage of parity blocks / data blocks) and your
> > fault-tolerance level (how many drives you can afford to lose before
> > failure) WITHOUT tying it directly to your underlying disks
> 
> Maybe it is a good idea, but what would be the advantage of 
> using fewer drives than the available ones for a RAID?

   Performance, plus the ability to handle different sized drives.
Hmm... maybe I should do an "optimise" option for the space planner...

> Regarding the fault tolerance level, a few weeks ago there was a 
> posting about a kernel library which would provide a generic
> RAID framework capable of several degrees of fault tolerance 
> (raid 5,6,7...) [have a look at 
> "[RFC v4 2/3] fs: btrfs: Extends btrfs/raid56 to 
> support up to six parities", 2014/1/25]. This would definitely be a
> big leap forward.
> 
> BTW, the raid5/raid6 support in BTRFS is only for testing purposes. 
> However, Chris Mason said a few weeks ago that he will work on these
> issues.
> 
> [...]
> > necessarily needing to rebalance as you add more disks to the array.
> > This would be a heck of a lot more flexible than ZFS' approach of
> > adding more immutable vdevs.
> 
> There is no need to re-balance if you add more drives. The next 
> chunk allocation will span all the available drives anyway. A balance 
> is only required when you want to spread data already written across 
> all the drives.

   The balance opens up more usable space, unless the new device is
smaller than (some nasty function of) the remaining free space on the
other drives.
It's not necessarily about spanning the data, although that's an
effect, too.
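
   A concrete RAID-1 illustration of that (my own simplified
arithmetic, ignoring metadata and chunk granularity -- the real
computation is what the planner does):

# Extra usable space in btrfs-raid1, given per-device free space:
# every new chunk needs free space on two *different* devices.
def raid1_usable(free):
    total, largest = sum(free), max(free)
    rest = total - largest
    # one huge device can only pair with what the others offer
    return rest if largest > rest else total // 2

print(raid1_usable([0, 0]))           # 0   - two full drives
print(raid1_usable([0, 0, 1000]))     # 0   - new drive unusable
                                      #       until a balance...
print(raid1_usable([333, 333, 334]))  # 500 - ...moves data around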

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- It used to take a lot of talent and a certain type of ---      
        upbringing to be perfectly polite and have filthy manners        
            at the same time. Now all it needs is a computer.            

