Linux Btrfs filesystem development
* RAID0 extent sizes?
From: Robert White @ 2014-12-12 22:54 UTC
  To: Btrfs BTRFS

I've seen it mentioned here that generally data extents are 1G and 
metadata extents are 256M.

Is that per-drive or per-stripe in the case of RAID0?

That is, if I have data mode raid0 across N drives does the system 
allocate one 1G extent on each drive making the full stripe allocation 
N-gigs; or does it allocate 1/Nth(gig) on each drive making the total 
new allocation 1G?

Does the raid0 have any arity constraints (like how raid1 is always 
arity-2)?



* Re: RAID0 extent sizes?
From: Hugo Mills @ 2014-12-12 22:59 UTC
  To: Robert White; +Cc: Btrfs BTRFS

On Fri, Dec 12, 2014 at 02:54:24PM -0800, Robert White wrote:
> I've seen it mentioned here that generally data extents are 1G and
> metadata extents are 256M.
> 
> Is that per-drive or per-stripe in the case of RAID0?
> 
> That is, if I have data mode raid0 across N drives does the system
> allocate one 1G extent on each drive making the full stripe
> allocation N-gigs; or does it allocate 1/Nth(gig) on each drive
> making the total new allocation 1G?
> 
> Does the raid0 have any arity constraints (like how raid1 is always
> arity-2)?

   The 1 GiB (or 256 MiB for metadata) is the allocation unit. So for
striped RAID levels (like 0, 10, 5, 6), the FS will allocate as many
of those units as it can, one per device, across all the available
devices, and stripe within those.

   Now on to your question -- the stripes within the allocation unit
are 64 KiB in size, so the first 64k goes on the first device, the
next 64k on the second device, and so on.
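
   The round-robin layout described above can be sketched as follows.
This is a simplified model, not the kernel's actual mapping code: the
function name locate() and the assumption that devices are visited in
plain index order are illustrative only.

```python
STRIPE_LEN = 64 * 1024  # 64 KiB stripe element, as stated in the text


def locate(offset, num_devices, stripe_len=STRIPE_LEN):
    """Map a byte offset inside a RAID-0 allocation to
    (device index, byte offset within that device's piece).

    Assumes a plain round-robin order over devices 0..num_devices-1.
    """
    stripe_nr = offset // stripe_len   # which 64 KiB element overall
    device = stripe_nr % num_devices   # round-robin across the devices
    row = stripe_nr // num_devices     # full rows written before this one
    dev_offset = row * stripe_len + offset % stripe_len
    return device, dev_offset


# The first 64 KiB lands on device 0, the next on device 1, and so on.
for off in (0, 64 * 1024, 128 * 1024, 5 * 64 * 1024 + 100):
    print(off, locate(off, num_devices=5))
```

With 5 devices, the sixth 64 KiB element wraps back around to device 0,
64 KiB into that device's piece of the allocation.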

   The minimum stripe width (i.e. number of devices) is 2 for RAID-0,
4 for RAID-10, 2 for RAID-5 and 3 for RAID-6.
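
   As a quick reference, those minimums can be written down as a
lookup table. The names MIN_DEVICES and can_allocate() are made up for
this sketch, and the numbers are only the ones quoted in this message,
so they may not match other kernel versions.

```python
# Minimum device counts exactly as listed in the text above.
MIN_DEVICES = {"raid0": 2, "raid10": 4, "raid5": 2, "raid6": 3}


def can_allocate(profile, devices_with_space):
    """Return True if enough devices have room for one more allocation."""
    return devices_with_space >= MIN_DEVICES[profile]


print(can_allocate("raid0", 2))
print(can_allocate("raid10", 3))
```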

   Hugo.

-- 
Hugo Mills             | I get nervous when I see words like 'mayhaps' in a
hugo@... carfax.org.uk | novel, because I fear that just round the corner is
http://carfax.org.uk/  | lurking 'forsooth'
PGP: 65E74AC0          |                                      GRRM's UK editor


* Re: RAID0 extent sizes?
From: Robert White @ 2014-12-12 23:25 UTC
  To: Hugo Mills, Btrfs BTRFS

[So to check my understanding, and just sticking to RAID-0 data only].

So for RAID-0 data on 5 drives with ample space, the expected outcome of 
allocating more data space is 5GiB, one 1GiB allocated on each drive.

If one drive is too full (say it was smaller) and didn't have 1G of 
contiguous space available, the allocation would simply fail.

The net effect is to create an association of allocations, one on each 
available drive that had "enough space", each of which will contribute 
exactly 1GiB to the association. So every time the data space allocation 
expands it's going to expand by N-GiB total on an N-drive data=raid0 system.

Since data and metadata are separate you can end up being "out of space" 
for big files but still be able to create files small enough to fit into 
the metadata with the inode.

Am I correct?


* Re: RAID0 extent sizes?
From: Hugo Mills @ 2014-12-12 23:35 UTC
  To: Robert White; +Cc: Btrfs BTRFS

On Fri, Dec 12, 2014 at 03:25:19PM -0800, Robert White wrote:
> [So to check my understanding, and just sticking to RAID-0 data only].
> 
> So for RAID-0 data on 5 drives with ample space, the expected
> outcome of allocating more data space is 5GiB, one 1GiB allocated on
> each drive.

   Correct.

> If one drive is too full (say it was smaller) and didn't have 1G of
> contiguous space available, the allocation would simply fail.

   No, it would allocate on the remaining 4 devices instead, with a
total of 4 GiB of space. The allocation in these cases is the maximum
feasible, not precisely the number of devices.

> The net effect is to create an association of allocations, one on
> each available drive that had "enough space", each of which will
> contribute exactly 1GiB to the association.

   Yes.

> So every time the data
> space allocation expands it's going to expand by N-GiB total on an
> N-drive data=raid0 system.

   Not necessarily -- if one device is already full (because it's
smaller), then the number of devices will decrease as appropriate,
down to the minimum of 2.
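
   The behaviour described in the replies above (use every device that
still has room, and stop once fewer than 2 do) can be modelled with a toy
simulation. This is only a sketch: it works in whole GiB, the device
sizes are invented, and allocate_raid0() is a hypothetical helper, not
anything from the btrfs code.

```python
def allocate_raid0(free, min_devices=2, per_device=1):
    """Try one RAID-0 data allocation.

    free is a list of free GiB per device and is updated in place.
    Returns the indexes of the devices used, or [] if fewer than
    min_devices still have per_device GiB available.
    """
    usable = [i for i, f in enumerate(free) if f >= per_device]
    if len(usable) < min_devices:
        return []
    for i in usable:
        free[i] -= per_device
    return usable


# Hypothetical layout: four 3 GiB devices and one smaller 1 GiB device.
free = [3, 3, 3, 3, 1]
while True:
    used = allocate_raid0(free)
    if not used:
        break
    print(f"allocated {len(used)} GiB across devices {used}; free now {free}")
```

In this model the first allocation spans all 5 devices (5 GiB), and the
later ones span only the 4 larger devices (4 GiB each), since the small
device is already full.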

> Since data and metadata are separate you can end up being "out of
> space" for big files but still be able to create files small enough
> to fit into the metadata with the inode.

   Yes, but this isn't related to the number of devices in striped
RAID allocations.

> Am I correct?

   Partially. :)

   Hugo.

-- 
Hugo Mills             | "How deep will this sub go?"
hugo@... carfax.org.uk | "Oh, she'll go all the way to the bottom if we don't
http://carfax.org.uk/  | stop her."
PGP: 65E74AC0          |                                                  U571


* Re: RAID0 extent sizes?
From: Chris Murphy @ 2014-12-13  0:11 UTC
  To: Btrfs BTRFS

Based on looking at how identically sized, empty qcow2 files grow
when they're added to a Btrfs volume, the 1GiB Btrfs chunk (or
allocation unit) doesn't have an immediate physical allocation. It's
more of a virtual thing, but it has a physical manifestation.

Single profile, 5 disks: As data is copied, one drive has one chunk
allocated to it, and data is copied into that chunk, and thus into one
qcow2 file, until that qcow2 file is about 1GiB in size. Then it stops
growing and another qcow2 file starts to grow, again up to 1GiB in
size, and so on until all the qcow2s are 1GiB. At that point the
devices look identical but actually aren't: chances are one of them
holds some little bit of extra metadata, and the allocator is going to
pick the block device with the most free space next. That same rule is
how allocation plays out on unevenly sized devices.
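
A rough sketch of that "most free space next" rule for the single
profile, as I understand it from the observations above. The function
name, the GiB units, and the tie-break (lowest device index wins) are
assumptions of the sketch, not something confirmed from the allocator.

```python
def allocate_single(free, chunk=1):
    """Place one chunk on the device with the most free space.

    free is a list of free GiB per device and is updated in place.
    Returns the chosen device index, or None if no device has room.
    Ties go to the lowest index, which is an assumption of this model.
    """
    best = max(range(len(free)), key=lambda i: free[i])
    if free[best] < chunk:
        return None
    free[best] -= chunk
    return best


# Hypothetical sizes: four 10 GiB devices and one 12 GiB device.
free = [10, 10, 10, 10, 12]
order = [allocate_single(free) for _ in range(6)]
print(order, free)
```

In this model the larger device takes the first chunks until it has
drawn level with the others, and after that the chunks rotate across
the devices.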

For raid0,5,6 I'm not sure if my interpretation is correct. But what I
see is that, at the time the chunk is allocated, the block devices with
sufficient free space belong to it, and they grow in 64KB increments.
E.g. 5 qcow2's in a Btrfs data raid0 will grow to ~1GiB in size each as
I copy 5GiB of data to the volume. Since I used raid1 metadata in all
cases, the qcow2's are a bit uneven in practice. If I then add a 6th
qcow2, I don't immediately notice it grow. I *think* it's because the
most recent chunk is still only writing to the block devices available
at the time that chunk was created; shortly though I start seeing that
6th qcow2 grow. This suggests that this volume has both 5-strip
(5-device) chunks and 6-strip chunks.

*sigh* For what it's worth, a Btrfs chunk is not the same thing as an
mdadm chunk. What mdadm calls a chunk is a strip, since that's what
SNIA's dictionary calls it. A stripe is strip x numdevices. So if you
have a 5 device raid0, that's 5 strips of 64KB each, or a stripe size
of 320KB, meaning it takes a file of at least 320KB to write to all 5
disks at the same time.
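
The strip/stripe arithmetic above, as a small sketch. The helper names
are made up, and it assumes a write that starts at the beginning of a
stripe.

```python
STRIP = 64 * 1024  # one 64KB strip per device, as described above


def full_stripe_size(num_devices, strip=STRIP):
    """A stripe is one strip on every device: strip x numdevices."""
    return strip * num_devices


def full_stripes(file_size, num_devices, strip=STRIP):
    """Number of complete stripes a stripe-aligned write of file_size fills."""
    return file_size // full_stripe_size(num_devices, strip)


print(full_stripe_size(5) // 1024, "KB per full stripe on 5 devices")
print(full_stripes(320 * 1024, 5), "full stripe(s) filled by a 320KB file")
print(full_stripes(319 * 1024, 5), "full stripe(s) filled by a 319KB file")
```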

As a side note, in the latest Phoronix raid tests, Btrfs is
kicking ass compared to most everything else.

Chris Murphy
