Linux Btrfs filesystem development
* Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
@ 2023-10-25 20:29 Peter Wedder
  2023-10-25 21:08 ` Remi Gauvin
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Wedder @ 2023-10-25 20:29 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

Hello,

I had a RAID1 array on top of 4x4TB drives. Recently I removed one 4TB drive and added two 16TB drives to it. After running a full, unfiltered balance on the array, I am left in a situation where all the 4TB drives are completely empty, and all the data and metadata is on the 16TB drives. Is this normal? I was expecting to have at least some data on the smaller drives.

Using btrfs-progs v6.3.2 on kernel 6.3.11, Fedora Server 38.

# btrfs fi show
Label: none  uuid: 6f6bf357-774d-4e1f-8cad-a2ed801533a8
        Total devices 5 FS bytes used 5.57TiB
        devid    1 size 3.64TiB used 0.00B path /dev/sde
        devid    2 size 3.64TiB used 0.00B path /dev/sdd
        devid    3 size 3.64TiB used 0.00B path /dev/sda
        devid    5 size 14.55TiB used 5.58TiB path /dev/sdb
        devid    6 size 14.55TiB used 5.58TiB path /dev/sdf


# btrfs device usage /media/raid1
/dev/sde, ID: 1
   Device size:             3.64TiB
   Device slack:              0.00B
   Unallocated:             3.64TiB

/dev/sdd, ID: 2
   Device size:             3.64TiB
   Device slack:              0.00B
   Unallocated:             3.64TiB

/dev/sda, ID: 3
   Device size:             3.64TiB
   Device slack:              0.00B
   Unallocated:             3.64TiB

/dev/sdb, ID: 5
   Device size:            14.55TiB
   Device slack:              0.00B
   Data,RAID1:              5.58TiB
   Metadata,RAID1:          8.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             8.97TiB

/dev/sdf, ID: 6
   Device size:            14.55TiB
   Device slack:              0.00B
   Data,RAID1:              5.58TiB
   Metadata,RAID1:          8.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             8.97TiB

# btrfs filesystem usage /media/raid1
Overall:
    Device size:                  40.02TiB
    Device allocated:             11.17TiB
    Device unallocated:           28.85TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         11.14TiB
    Free (estimated):             14.44TiB      (min: 14.44TiB)
    Free (statfs, df):            12.62TiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:5.58TiB, Used:5.56TiB (99.71%)
   /dev/sdb        5.58TiB
   /dev/sdf        5.58TiB

Metadata,RAID1: Size:8.00GiB, Used:6.87GiB (85.93%)
   /dev/sdb        8.00GiB
   /dev/sdf        8.00GiB

System,RAID1: Size:32.00MiB, Used:816.00KiB (2.49%)
   /dev/sdb       32.00MiB
   /dev/sdf       32.00MiB

Unallocated:
   /dev/sde        3.64TiB
   /dev/sdd        3.64TiB
   /dev/sda        3.64TiB
   /dev/sdb        8.97TiB
   /dev/sdf        8.97TiB



* Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-10-25 20:29 Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty Peter Wedder
@ 2023-10-25 21:08 ` Remi Gauvin
  2023-10-25 21:15   ` Roman Mamedov
  0 siblings, 1 reply; 9+ messages in thread
From: Remi Gauvin @ 2023-10-25 21:08 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

On 2023-10-25 4:29 p.m., Peter Wedder wrote:
> Hello,
>
> I had a RAID1 array on top of 4x4TB drives. Recently I removed one 4TB drive and added two 16TB drives to it. After running a full, unfiltered balance on the array, I am left in a situation where all the 4TB drives are completely empty, and all the data and metadata is on the 16TB drives. Is this normal? I was expecting to have at least some data on the smaller drives.
>

Yes, this is normal.  Btrfs allocates space on the drives with the
most available free space.  The idea is to balance the 'unallocated'
space on each drive, so they can be filled evenly.  The 4TB drives will
be used when the 16TB drives have less than 4TB unallocated.




* Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-10-25 21:08 ` Remi Gauvin
@ 2023-10-25 21:15   ` Roman Mamedov
  2023-10-27  4:21     ` Anand Jain
  0 siblings, 1 reply; 9+ messages in thread
From: Roman Mamedov @ 2023-10-25 21:15 UTC (permalink / raw)
  To: Remi Gauvin; +Cc: linux-btrfs@vger.kernel.org

On Wed, 25 Oct 2023 17:08:08 -0400
Remi Gauvin <remi@georgianit.com> wrote:

> On 2023-10-25 4:29 p.m., Peter Wedder wrote:
> > Hello,
> >
> > I had a RAID1 array on top of 4x4TB drives. Recently I removed one 4TB drive and added two 16TB drives to it. After running a full, unfiltered balance on the array, I am left in a situation where all the 4TB drives are completely empty, and all the data and metadata is on the 16TB drives. Is this normal? I was expecting to have at least some data on the smaller drives.
> >
> 
> Yes, this is normal.  Btrfs allocates space on the drives with the
> most available free space.  The idea is to balance the 'unallocated'
> space on each drive, so they can be filled evenly.  The 4TB drives will
> be used when the 16TB drives have less than 4TB unallocated.

Interesting question and resolution. I'd be surprised by that as well.

Now, a great chance to "btrfs dev delete" all three remaining 4TB drives and
unplug them for the time being, to save on noise, heat and power consumption!

-- 
With respect,
Roman


* Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-10-25 21:15   ` Roman Mamedov
@ 2023-10-27  4:21     ` Anand Jain
  2023-11-01 19:20       ` Pedro Macedo
  0 siblings, 1 reply; 9+ messages in thread
From: Anand Jain @ 2023-10-27  4:21 UTC (permalink / raw)
  To: Roman Mamedov, Remi Gauvin; +Cc: linux-btrfs@vger.kernel.org

On 10/26/23 05:15, Roman Mamedov wrote:
> On Wed, 25 Oct 2023 17:08:08 -0400
> Remi Gauvin <remi@georgianit.com> wrote:
> 
>> On 2023-10-25 4:29 p.m., Peter Wedder wrote:
>>> Hello,
>>>
>>> I had a RAID1 array on top of 4x4TB drives. Recently I removed one 4TB drive and added two 16TB drives to it. After running a full, unfiltered balance on the array, I am left in a situation where all the 4TB drives are completely empty, and all the data and metadata is on the 16TB drives. Is this normal? I was expecting to have at least some data on the smaller drives.
>>>
>>
>> Yes, this is normal.  Btrfs allocates space on the drives with the
>> most available free space.  The idea is to balance the 'unallocated'
>> space on each drive, so they can be filled evenly.  The 4TB drives will
>> be used when the 16TB drives have less than 4TB unallocated.
> 

Correct. That's the only allocation method we have at the moment. Do you
have any feedback on whether there are any other allocation methods that
make sense?

Thanks, Anand

> Interesting question and resolution. I'd be surprised by that as well.
> 
> Now, a great chance to "btrfs dev delete" all three remaining 4TB drives and
> unplug them for the time being, to save on noise, heat and power consumption!



* Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-10-27  4:21     ` Anand Jain
@ 2023-11-01 19:20       ` Pedro Macedo
  2023-11-02  2:13         ` Zygo Blaxell
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Macedo @ 2023-11-01 19:20 UTC (permalink / raw)
  To: Anand Jain, Roman Mamedov, Remi Gauvin; +Cc: linux-btrfs@vger.kernel.org


On 27.10.23 06:21, Anand Jain wrote:
> On 10/26/23 05:15, Roman Mamedov wrote:
>> On Wed, 25 Oct 2023 17:08:08 -0400
>> Remi Gauvin <remi@georgianit.com> wrote:
>>
>>> On 2023-10-25 4:29 p.m., Peter Wedder wrote:
>>>> Hello,
>>>>
>>>> I had a RAID1 array on top of 4x4TB drives. Recently I removed one 
>>>> 4TB drive and added two 16TB drives to it. After running a full, 
>>>> unfiltered balance on the array, I am left in a situation where all 
>>>> the 4TB drives are completely empty, and all the data and metadata 
>>>> is on the 16TB drives. Is this normal? I was expecting to have at 
>>>> least some data on the smaller drives.
>>>>
>>>
>>> Yes, this is normal.  Btrfs allocates space on the drives with the
>>> most available free space.  The idea is to balance the 'unallocated'
>>> space on each drive, so they can be filled evenly.  The 4TB drives will
>>> be used when the 16TB drives have less than 4TB unallocated.
>>
>
> Correct. That's the only allocation method we have at the moment. Do you
> have any feedback on whether there are any other allocation methods that
> make sense?


IMHO, based on the frequency of this question appearing here/on 
reddit/other sites, perhaps allocation by absolute space used?  It 
should fit the expectations of most folks that if you have free space on 
a disk it will be utilised, plus has potential performance implications 
by always using as many devices as possible to write to as long as they 
have any space left.

Regards,

Pedro


>
> Thanks, Anand
>
>> Interesting question and resolution. I'd be surprised by that as well.
>>
>> Now, a great chance to "btrfs dev delete" all three remaining 4TB 
>> drives and
>> unplug them for the time being, to save on noise, heat and power 
>> consumption!
>


* Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-11-01 19:20       ` Pedro Macedo
@ 2023-11-02  2:13         ` Zygo Blaxell
  2023-11-02  5:11           ` Paul Jones
  0 siblings, 1 reply; 9+ messages in thread
From: Zygo Blaxell @ 2023-11-02  2:13 UTC (permalink / raw)
  To: Pedro Macedo
  Cc: Anand Jain, Roman Mamedov, Remi Gauvin,
	linux-btrfs@vger.kernel.org

On Wed, Nov 01, 2023 at 08:20:56PM +0100, Pedro Macedo wrote:
> 
> On 27.10.23 06:21, Anand Jain wrote:
> > On 10/26/23 05:15, Roman Mamedov wrote:
> > > On Wed, 25 Oct 2023 17:08:08 -0400
> > > Remi Gauvin <remi@georgianit.com> wrote:
> > > 
> > > > On 2023-10-25 4:29 p.m., Peter Wedder wrote:
> > > > > Hello,
> > > > > 
> > > > > I had a RAID1 array on top of 4x4TB drives. Recently I
> > > > > removed one 4TB drive and added two 16TB drives to it. After
> > > > > running a full, unfiltered balance on the array, I am left
> > > > > in a situation where all the 4TB drives are completely
> > > > > empty, and all the data and metadata is on the 16TB drives.
> > > > > Is this normal? I was expecting to have at least some data
> > > > > on the smaller drives.
> > > > > 
> > > > 
> > > > > Yes, this is normal.  Btrfs allocates space on the drives with the
> > > > > most available free space.  The idea is to balance the 'unallocated'
> > > > > space on each drive, so they can be filled evenly.  The 4TB drives will
> > > > > be used when the 16TB drives have less than 4TB unallocated.
> > > 
> > 
> > Correct. That's the only allocation method we have at the moment. Do you
> > have any feedback on whether there are any other allocation methods that
> > make sense?
> 
> 
> IMHO, based on the frequency of this question appearing here/on reddit/other
> sites, perhaps allocation by absolute space used?  It should fit the
> expectations of most folks that if you have free space on a disk it will be
> utilised, plus has potential performance implications by always using as
> many devices as possible to write to as long as they have any space left.

That is how allocation works with striped profiles:  chunks are allocated
using space from all non-full drives, in order to use space and iops
optimally.

For a non-striped profile like raid1, it's not possible to use all the
space without filling the larger devices first.  As the large devices
fill up, their free space becomes equal in size to the smaller devices,
and it's always possible to completely fill a raid1 array of equal-sized
devices.  If raid1 distributed data across the small devices at the same
time as the large devices, it would run out of space on small devices
before running out of space on the large ones, so significant space on
some devices would be wasted.
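
The wasted-space argument above can be checked with a small model.  This
is only a sketch under stated assumptions, not btrfs code: device sizes
are hypothetical (one 16-unit device and three 4-unit devices), every
chunk takes 1 unit on each of two devices, and "least used first" is just
one possible reading of the "allocation by absolute space used" proposal.
The name fill_raid1 is made up for illustration.

```python
def fill_raid1(sizes, rank):
    """Allocate 2-copy chunks (1 unit per copy) until fewer than two
    devices have room. `rank` returns a sort key; the two devices with
    the smallest keys receive the next chunk."""
    free = list(sizes)
    chunks = 0
    while True:
        candidates = [i for i, f in enumerate(free) if f >= 1]
        if len(candidates) < 2:
            return chunks, free
        a, b = sorted(candidates, key=lambda i: rank(i, free, sizes))[:2]
        free[a] -= 1
        free[b] -= 1
        chunks += 1


sizes = [16, 4, 4, 4]  # hypothetical: one large device, three small ones

# Policy described in this thread: most free space first.
most_free = fill_raid1(sizes, lambda i, free, sizes: (-free[i], i))

# Hypothetical alternative: least space used first (spread evenly).
least_used = fill_raid1(sizes, lambda i, free, sizes: (sizes[i] - free[i], i))

print("most free first :", most_free)
print("least used first:", least_used)
```

In this model "most free first" stores 12 chunks and strands 4 units on
the large device, while "least used first" stores only 8 chunks and
strands 12 units, because the small devices fill up before the large
one has found enough partners.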

In some cases you really do want the data distributed across all the small
devices first, even though some of the space can't be used at first.
e.g. you plan to replace the small devices with larger ones later,
and you don't want to have to do an expensive balance operation each
time you replace a small device with a large one.  In that case, you
can use 'btrfs fi resize' to set the larger devices to the same size
as the smaller devices.  That will provide the equal filling of the
devices you want as the small devices fill up, and it will run out of
space when the only free space remaining is all on the large device.
Before that happens, you can replace the small devices with larger ones,
resize all the devices to the same size as the large one, and fill the
devices equally until all available space is used.  You'd have to manage
the device sizes yourself, because there's no way btrfs could guess you
planned to do this in advance.
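
The effect of capping the large devices can be sketched with the same
kind of model.  Everything here is an assumption made for illustration:
the sizes are hypothetical round numbers in GiB, the function name
spread is made up, and the cap is modelled simply by passing a smaller
size, which is the state 'btrfs fi resize' would leave the device in.

```python
def spread(sizes_gib, chunks):
    """Place `chunks` RAID1 chunks (1 GiB on each of two devices),
    always choosing the two devices with the most free space.
    Returns GiB used per device, in input order."""
    free = list(sizes_gib)
    for _ in range(chunks):
        # Most free space first; list position breaks ties (assumption).
        order = sorted(range(len(free)), key=lambda i: (-free[i], i))
        a, b = order[:2]
        if free[b] < 1:
            raise RuntimeError("out of space")
        free[a] -= 1
        free[b] -= 1
    return [size - f for size, f in zip(sizes_gib, free)]


small, large = 4000, 16000  # hypothetical device sizes in GiB

# Three small devices plus two large ones at their full size.
uncapped = spread([small] * 3 + [large] * 2, 5000)

# Same array with the two large devices capped to the small size.
capped = spread([small] * 5, 5000)

print("uncapped:", uncapped)
print("capped  :", capped)
```

With the full sizes all 5000 chunks land on the two large devices; with
the cap each of the five devices ends up holding 2000 GiB.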



> Regards,
> 
> Pedro
> 
> 
> > 
> > Thanks, Anand
> > 
> > > Interesting question and resolution. I'd be surprised by that as well.
> > > 
> > > Now, a great chance to "btrfs dev delete" all three remaining 4TB
> > > drives and
> > > unplug them for the time being, to save on noise, heat and power
> > > consumption!
> > 


* RE: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-11-02  2:13         ` Zygo Blaxell
@ 2023-11-02  5:11           ` Paul Jones
  2023-11-02 13:50             ` Zygo Blaxell
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Jones @ 2023-11-02  5:11 UTC (permalink / raw)
  To: Zygo Blaxell, Pedro Macedo
  Cc: Anand Jain, Roman Mamedov, Remi Gauvin,
	linux-btrfs@vger.kernel.org

> -----Original Message-----
> From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Sent: Thursday, November 2, 2023 1:13 PM
> To: Pedro Macedo <pmacedo@pmacedo.com>
> Cc: Anand Jain <anand.jain@oracle.com>; Roman Mamedov <rm@romanrm.net>;
> Remi Gauvin <remi@georgianit.com>; linux-btrfs@vger.kernel.org
> Subject: Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest
> empty
> 
> On Wed, Nov 01, 2023 at 08:20:56PM +0100, Pedro Macedo wrote:
> >
> > On 27.10.23 06:21, Anand Jain wrote:
> > > On 10/26/23 05:15, Roman Mamedov wrote:
> > > > On Wed, 25 Oct 2023 17:08:08 -0400 Remi Gauvin
> > > > <remi@georgianit.com> wrote:
> > > >
> > > > > On 2023-10-25 4:29 p.m., Peter Wedder wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I had a RAID1 array on top of 4x4TB drives. Recently I removed
> > > > > > one 4TB drive and added two 16TB drives to it. After running a
> > > > > > full, unfiltered balance on the array, I am left in a
> > > > > > situation where all the 4TB drives are completely empty, and
> > > > > > all the data and metadata is on the 16TB drives.
> > > > > > Is this normal? I was expecting to have at least some data on
> > > > > > the smaller drives.
> > > > > >
> > > > >
> > > > > > Yes, this is normal.  Btrfs allocates space on the drives with
> > > > > > the most available free space.  The idea is to balance the
> > > > > > 'unallocated' space on each drive, so they can be filled evenly.
> > > > > > The 4TB drives will be used when the 16TB drives have less than
> > > > > > 4TB unallocated.
> > > >
> > >
> > > Correct. That's the only allocation method we have at the moment. Do
> > > you have any feedback on whether there are any other allocation
> > > methods that make sense?
> >
> >
> > IMHO, based on the frequency of this question appearing here/on
> > reddit/other sites, perhaps allocation by absolute space used?  It
> > should fit the expectations of most folks that if you have free space
> > on a disk it will be utilised, plus has potential performance
> > > implications by always using as many devices as possible to write to
> > > as long as they have any space left.
> 
> That is how allocation works with striped profiles:  chunks are allocated using
> space from all non-full drives, in order to use space and iops optimally.
> 
> For a non-striped profile like raid1, it's not possible to use all the space
> without filling the larger devices first.  As the large devices fill up, their free
> space becomes equal in size to the smaller devices, and it's always possible to
> completely fill a raid1 array of equal-sized devices.  If raid1 distributed data
> across the small devices at the same time as the large devices, it would run
> out of space on small devices before running out of space on the large ones,
> so significant space on some devices would be wasted.

I was always under the impression that space was allocated from the emptiest drive(s) on a percentage basis. Was that ever the case and has since changed? That seems like the most optimal way to do it.


Paul.


* Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-11-02  5:11           ` Paul Jones
@ 2023-11-02 13:50             ` Zygo Blaxell
  2023-11-02 23:57               ` waxhead
  0 siblings, 1 reply; 9+ messages in thread
From: Zygo Blaxell @ 2023-11-02 13:50 UTC (permalink / raw)
  To: Paul Jones
  Cc: Pedro Macedo, Anand Jain, Roman Mamedov, Remi Gauvin,
	linux-btrfs@vger.kernel.org

On Thu, Nov 02, 2023 at 05:11:00AM +0000, Paul Jones wrote:
> > -----Original Message-----
> > From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> > Sent: Thursday, November 2, 2023 1:13 PM
> > To: Pedro Macedo <pmacedo@pmacedo.com>
> > Cc: Anand Jain <anand.jain@oracle.com>; Roman Mamedov <rm@romanrm.net>;
> > Remi Gauvin <remi@georgianit.com>; linux-btrfs@vger.kernel.org
> > Subject: Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest
> > empty
> > 
> > On Wed, Nov 01, 2023 at 08:20:56PM +0100, Pedro Macedo wrote:
> > >
> > > On 27.10.23 06:21, Anand Jain wrote:
> > > > On 10/26/23 05:15, Roman Mamedov wrote:
> > > > > On Wed, 25 Oct 2023 17:08:08 -0400 Remi Gauvin
> > > > > <remi@georgianit.com> wrote:
> > > > >
> > > > > > On 2023-10-25 4:29 p.m., Peter Wedder wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > I had a RAID1 array on top of 4x4TB drives. Recently I removed
> > > > > > > one 4TB drive and added two 16TB drives to it. After running a
> > > > > > > full, unfiltered balance on the array, I am left in a
> > > > > > > situation where all the 4TB drives are completely empty, and
> > > > > > > all the data and metadata is on the 16TB drives.
> > > > > > > Is this normal? I was expecting to have at least some data on
> > > > > > > the smaller drives.
> > > > > > >
> > > > > >
> > > > > > > Yes, this is normal.  Btrfs allocates space on the drives with
> > > > > > > the most available free space.  The idea is to balance the
> > > > > > > 'unallocated' space on each drive, so they can be filled evenly.
> > > > > > > The 4TB drives will be used when the 16TB drives have less than
> > > > > > > 4TB unallocated.
> > > > >
> > > >
> > > > Correct. That's the only allocation method we have at the moment. Do
> > > > you have any feedback on whether there are any other allocation
> > > > methods that make sense?
> > >
> > >
> > > IMHO, based on the frequency of this question appearing here/on
> > > reddit/other sites, perhaps allocation by absolute space used?  It
> > > should fit the expectations of most folks that if you have free space
> > > on a disk it will be utilised, plus has potential performance
> > > > implications by always using as many devices as possible to write to
> > > > as long as they have any space left.
> > 
> > That is how allocation works with striped profiles:  chunks are allocated using
> > space from all non-full drives, in order to use space and iops optimally.
> > 
> > For a non-striped profile like raid1, it's not possible to use all the space
> > without filling the larger devices first.  As the large devices fill up, their free
> > space becomes equal in size to the smaller devices, and it's always possible to
> > completely fill a raid1 array of equal-sized devices.  If raid1 distributed data
> > across the small devices at the same time as the large devices, it would run
> > out of space on small devices before running out of space on the large ones,
> > so significant space on some devices would be wasted.
> 
> I was always under the impression that space was allocated from the
> emptiest drive(s) on a percentage basis. Was that ever the case and
> has since changed? That seems like the most optimal way to do it.

The current behavior was introduced in 2011, and hasn't changed since
except for regressions in 2015, 2022, and 2023 (now fixed).  Support for
zoned devices was added in 2020, but it doesn't affect regular device
behavior.

btrfs finds the largest contiguous free space block >= 1 GiB on each
device (using the lowest offset to break ties), then creates a chunk
using up to 1 GiB from each of the top N devices with the largest free
byte count (using devid to break ties), where N is the maximum number
of devices supported by the profile.
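
The rule above can be replayed against the numbers from the original
report.  This is a simplified sketch, not the kernel allocator: each
device's free space is treated as one contiguous block, only 1 GiB data
chunks are placed (no metadata or system chunks), ties are assumed to go
to the lowest devid, and sizes are the TiB figures from 'btrfs fi show'
rounded to whole GiB.  The name allocate_raid1 is made up.

```python
def allocate_raid1(free_gib, chunks):
    """Place `chunks` RAID1 chunks, 1 GiB on each of the N=2 devices
    with the largest free byte count. Returns GiB used per devid."""
    free = dict(free_gib)
    used = {devid: 0 for devid in free}
    for _ in range(chunks):
        # Largest free space first; devid breaks ties (ascending: assumption).
        ranked = sorted((d for d in free if free[d] >= 1),
                        key=lambda d: (-free[d], d))
        if len(ranked) < 2:
            raise RuntimeError("out of space")
        for devid in ranked[:2]:
            free[devid] -= 1
            used[devid] += 1
    return used


# devid -> size in GiB (3.64 TiB and 14.55 TiB, rounded)
devices = {1: 3727, 2: 3727, 3: 3727, 5: 14899, 6: 14899}

# 5.58 TiB of RAID1 data is roughly 5714 chunks of 1 GiB.
used = allocate_raid1(devices, 5714)
print(used)
```

In this model devids 5 and 6 each receive all 5714 GiB and devids 1-3
stay empty, matching the reported layout.  The small devices are first
touched at chunk 11173, once the large devices are down to the same
3727 GiB of free space, i.e. after about 10.9 TiB of data.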

You could replace "largest free byte count" with "largest proportion
of free space" in the above, but that would only make sense if the
filesystem had never had drives added or replaced.  e.g. in cases where
you had already filled some devices, then replaced them with larger ones,
the space available on a device would not be correlated to its size
at all.

> 
> Paul.


* Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest empty
  2023-11-02 13:50             ` Zygo Blaxell
@ 2023-11-02 23:57               ` waxhead
  0 siblings, 0 replies; 9+ messages in thread
From: waxhead @ 2023-11-02 23:57 UTC (permalink / raw)
  To: Zygo Blaxell, Paul Jones
  Cc: Pedro Macedo, Anand Jain, Roman Mamedov, Remi Gauvin,
	linux-btrfs@vger.kernel.org

Zygo Blaxell wrote:
> On Thu, Nov 02, 2023 at 05:11:00AM +0000, Paul Jones wrote:
>>> -----Original Message-----
>>> From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
>>> Sent: Thursday, November 2, 2023 1:13 PM
>>> To: Pedro Macedo <pmacedo@pmacedo.com>
>>> Cc: Anand Jain <anand.jain@oracle.com>; Roman Mamedov <rm@romanrm.net>;
>>> Remi Gauvin <remi@georgianit.com>; linux-btrfs@vger.kernel.org
>>> Subject: Re: Balance on 5-disk RAID1 put all data on 2 disks, leaving the rest
>>> empty
>>>
>>> On Wed, Nov 01, 2023 at 08:20:56PM +0100, Pedro Macedo wrote:
>>>>
>>>> On 27.10.23 06:21, Anand Jain wrote:
>>>>> On 10/26/23 05:15, Roman Mamedov wrote:
>>>>>> On Wed, 25 Oct 2023 17:08:08 -0400 Remi Gauvin
>>>>>> <remi@georgianit.com> wrote:
>>>>>>
>>>>>>> On 2023-10-25 4:29 p.m., Peter Wedder wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I had a RAID1 array on top of 4x4TB drives. Recently I removed
>>>>>>>> one 4TB drive and added two 16TB drives to it. After running a
>>>>>>>> full, unfiltered balance on the array, I am left in a
>>>>>>>> situation where all the 4TB drives are completely empty, and
>>>>>>>> all the data and metadata is on the 16TB drives.
>>>>>>>> Is this normal? I was expecting to have at least some data on
>>>>>>>> the smaller drives.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, this is normal.  Btrfs allocates space on the drives with
>>>>>>> the most available free space.  The idea is to balance the
>>>>>>> 'unallocated' space on each drive, so they can be filled evenly.
>>>>>>> The 4TB drives will be used when the 16TB drives have less than
>>>>>>> 4TB unallocated.
>>>>>>
>>>>>
>>>>> Correct. That's the only allocation method we have at the moment. Do
>>>>> you have any feedback on whether there are any other allocation
>>>>> methods that make sense?
>>>>
>>>>
>>>> IMHO, based on the frequency of this question appearing here/on
>>>> reddit/other sites, perhaps allocation by absolute space used?  It
>>>> should fit the expectations of most folks that if you have free space
>>>> on a disk it will be utilised, plus has potential performance
>>>> implications by always using as many devices as possible to write to
>>>> as long as they have any space left.
>>>
>>> That is how allocation works with striped profiles:  chunks are allocated using
>>> space from all non-full drives, in order to use space and iops optimally.
>>>
>>> For a non-striped profile like raid1, it's not possible to use all the space
>>> without filling the larger devices first.  As the large devices fill up, their free
>>> space becomes equal in size to the smaller devices, and it's always possible to
>>> completely fill a raid1 array of equal-sized devices.  If raid1 distributed data
>>> across the small devices at the same time as the large devices, it would run
>>> out of space on small devices before running out of space on the large ones,
>>> so significant space on some devices would be wasted.
>>
>> I was always under the impression that space was allocated from the
>> emptiest drive(s) on a percentage basis. Was that ever the case and
>> has since changed? That seems like the most optimal way to do it.
> 
> The current behavior was introduced in 2011, and hasn't changed since
> except for regressions in 2015, 2022, and 2023 (now fixed).  Support for
> zoned devices was added in 2020, but it doesn't affect regular device
> behavior.
> 
> btrfs finds the largest contiguous free space block >= 1 GiB on each
> device (using the lowest offset to break ties), then creates a chunk
> using up to 1 GiB from each of the top N devices with the largest free
> byte count (using devid to break ties), where N is the maximum number
> of devices supported by the profile.
> 
> You could replace "largest free byte count" with "largest proportion
> of free space" in the above, but that would only make sense if the
> filesystem had never had drives added or replaced.  e.g. in cases where
> you had already filled some devices, then replaced them with larger ones,
> the space available on a device would not be correlated to its size
> at all.
> 
>>
>> Paul.

I am surprised that nobody mentioned RAID10 to Peter. It will try to 
fill up all devices first and "degrade" to RAID1 when it has to. So in 
rough terms it works in reverse of RAID1 as far as filling up devices 
goes. And BTRFS' version of RAID10 does not offer better redundancy than 
RAID1 anyway.

Also kind of off topic, but kind of related. I feel like mentioning 
something I have talked about before. With the new raid stripe tree (and 
extent v2 tree?) in the pipeline it is perhaps worth bringing up my old 
idea of assigning storage devices to groups where one could do more 
advanced stuff like assigning weight to certain devices, allocation / 
redundancy policies, and/or assignment of subvolumes to them etc...
I believe the case mentioned here is something that could have been 
solved with such an ability. Albeit it might create more complexity than 
it solves if too advanced. (sorry for getting carried away).

