* Re: Rebalancing raid1 after adding a device
2019-06-18 18:26 Rebalancing raid1 after adding a device Stéphane Lesimple
@ 2019-06-18 18:45 ` Hugo Mills
2019-06-18 18:50 ` Austin S. Hemmelgarn
` (2 more replies)
2019-06-18 19:06 ` Austin S. Hemmelgarn
` (2 subsequent siblings)
3 siblings, 3 replies; 16+ messages in thread
From: Hugo Mills @ 2019-06-18 18:45 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: linux-btrfs
On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
> I've been a btrfs user for quite a number of years now, but it seems
> I need the wisdom of the btrfs gurus on this one!
>
> I have a 5-hdd btrfs raid1 setup with 4x3T+1x10T drives.
> A few days ago, I replaced one of the 3T by a new 10T, running btrfs
> replace and then resizing the FS to use all the available space of
> the new device.
>
> The filesystem was 90% full before I expanded it so, as expected,
> most of the space on the new device wasn't actually allocatable in
> raid1, as very little space was available on the 4 other
> devs.
>
> Of course the solution is to run a balance, but as the filesystem is
> now quite big, I'd like to avoid running a full rebalance. This
> would be quite I/O intensive, would run for several days, and
> would put unnecessary stress on the drives. It also seems
> excessive as in theory only a few TB would need to be moved: if I'm
> correct, only one of the two copies of a sufficient number of
> block groups needs to be moved to the new device, so that the sum
> of the available space on the 4 preexisting devices at least equals
> the available space on the new device; ~7TB instead of moving ~22TB.
> I don't need to have a perfectly balanced FS, I just want all the
> space to be allocatable.
>
> I tried using the -ddevid option, but it only instructs btrfs to work
> on the block groups allocated on said device; as it happens, it
> tends to move data between the 4 preexisting devices and doesn't fix
> my problem. A full balance with -dlimit=100 did no better.
-dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
be a pretty limited change. You'll need to use a larger number than
that if you want it to have a significant visible effect.
The -ddevid=<old_10T> option would be my recommendation. It's got
more chunks on it, so they're likely to have their copies spread
across the other four devices. This should help with the
balance.
Alternatively, just do a full balance and then cancel it when the
amount of unallocated space is reasonably well spread across the
devices (specifically, the new device's unallocated space is less than
the sum of the unallocated space on the other devices).
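In concrete terms, that would look roughly like this (the devid and
mount point below are only placeholders; check the real devid of the
old 10T with "btrfs filesystem show"):

  btrfs filesystem show /mnt          # note the devid of the old 10T
  btrfs balance start -ddevid=2 /mnt  # only block groups with a chunk on that device
  btrfs balance status /mnt           # watch progress from another shell
  btrfs device usage /mnt             # watch per-device unallocated space
  btrfs balance cancel /mnt           # stop early once it's spread out enough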
> Is there a way to ask the block group allocator to prefer writing to
> a specific device during a balance? Something like -ddestdevid=N?
> This would just be a hint to the allocator and the usual constraints
> would always apply (and prevail over the hint when needed).
No, there isn't. Having control over the allocator (or bypassing
it) would be pretty difficult to implement, I think.
It would be really great if there was an ioctl that allowed you to
say things like "take the chunks of this block group and put them on
devices 2, 4 and 5 in RAID-5", because you could do a load of
optimisation with reshaping the FS in userspace with that. But I
suspect it's a long way down the list of things to do.
> Or is there any obvious solution I'm completely missing?
I don't think so.
Hugo.
--
Hugo Mills | Great films about cricket: Umpire of the Rising Sun
hugo@... carfax.org.uk |
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:45 ` Hugo Mills
@ 2019-06-18 18:50 ` Austin S. Hemmelgarn
2019-06-18 18:57 ` Hugo Mills
2019-06-18 18:57 ` Chris Murphy
2019-06-19 3:27 ` Andrei Borzenkov
2019-06-19 11:59 ` Supercilious Dude
2 siblings, 2 replies; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2019-06-18 18:50 UTC (permalink / raw)
To: Hugo Mills, Stéphane Lesimple, linux-btrfs
On 2019-06-18 14:45, Hugo Mills wrote:
> On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
>> [...]
>> I tried using the -ddevid option but it only instructs btrfs to work
>> on the block groups allocated on said device, as it happens, it
>> tends to move data between the 4 preexisting devices and doesn't fix
>> my problem. A full balance with -dlimit=100 did no better.
>
> -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
> be a pretty limited change. You'll need to use a larger number than
> that if you want it to have a significant visible effect.
Last I checked, that's not how the limit filter works. AFAIUI, it's an
upper limit on how full a chunk can be to be considered for the balance
operation. So, balancing with only `-dlimit=100` should actually
balance all data chunks (but only data chunks, because you haven't asked
for metadata balancing).
>
> The -ddevid=<old_10T> option would be my recommendation. It's got
> more chunks on it, so they're likely to have their copies spread
> across the other four devices. This should help with the
> balance.
>
> Alternatively, just do a full balance and then cancel it when the
> amount of unallocated space is reasonably well spread across the
> devices (specifically, the new device's unallocated space is less than
> the sum of the unallocated space on the other devices).
>
>> Is there a way to ask the block group allocator to prefer writing to
>> a specific device during a balance? Something like -ddestdevid=N?
>> This would just be a hint to the allocator and the usual constraints
>> would always apply (and prevail over the hint when needed).
>
> No, there isn't. Having control over the allocator (or bypassing
> it) would be pretty difficult to implement, I think.
>
> It would be really great if there was an ioctl that allowed you to
> say things like "take the chunks of this block group and put them on
> devices 2, 4 and 5 in RAID-5", because you could do a load of
> optimisation with reshaping the FS in userspace with that. But I
> suspect it's a long way down the list of things to do.
>
>> Or is there any obvious solution I'm completely missing?
>
> I don't think so.
>
> Hugo.
>
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:50 ` Austin S. Hemmelgarn
@ 2019-06-18 18:57 ` Hugo Mills
2019-06-18 18:58 ` Austin S. Hemmelgarn
2019-06-18 18:57 ` Chris Murphy
1 sibling, 1 reply; 16+ messages in thread
From: Hugo Mills @ 2019-06-18 18:57 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: Stéphane Lesimple, linux-btrfs
On Tue, Jun 18, 2019 at 02:50:34PM -0400, Austin S. Hemmelgarn wrote:
> On 2019-06-18 14:45, Hugo Mills wrote:
> >On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
> >> [...]
> >>I tried using the -ddevid option but it only instructs btrfs to work
> >>on the block groups allocated on said device, as it happens, it
> >>tends to move data between the 4 preexisting devices and doesn't fix
> >>my problem. A full balance with -dlimit=100 did no better.
> >
> > -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
> >be a pretty limited change. You'll need to use a larger number than
> >that if you want it to have a significant visible effect.
> Last I checked, that's not how the limit filter works. AFAIUI, it's
> an upper limit on how full a chunk can be to be considered for the
> balance operation. So, balancing with only `-dlimit=100` should
> actually balance all data chunks (but only data chunks, because you
> haven't asked for metadata balancing).
That's usage, not limit. limit simply counts the number of
block groups to move.
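In command form, that means something like this (the mount point is
just an example):

  # relocate at most 100 data block groups, regardless of how full each one is
  btrfs balance start -dlimit=100 /mnt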
Hugo.
--
Hugo Mills | Great films about cricket: Umpire of the Rising Sun
hugo@... carfax.org.uk |
http://carfax.org.uk/ |
PGP: E2AB1DE4 |
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:57 ` Hugo Mills
@ 2019-06-18 18:58 ` Austin S. Hemmelgarn
2019-06-18 19:03 ` Chris Murphy
0 siblings, 1 reply; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2019-06-18 18:58 UTC (permalink / raw)
To: Hugo Mills, Stéphane Lesimple, linux-btrfs
On 2019-06-18 14:57, Hugo Mills wrote:
> On Tue, Jun 18, 2019 at 02:50:34PM -0400, Austin S. Hemmelgarn wrote:
>> On 2019-06-18 14:45, Hugo Mills wrote:
>>> On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
>>>> [...]
>>>> I tried using the -ddevid option but it only instructs btrfs to work
>>>> on the block groups allocated on said device, as it happens, it
>>>> tends to move data between the 4 preexisting devices and doesn't fix
>>>> my problem. A full balance with -dlimit=100 did no better.
>>>
>>> -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
>>> be a pretty limited change. You'll need to use a larger number than
>>> that if you want it to have a significant visible effect.
>> Last I checked, that's not how the limit filter works. AFAIUI, it's
>> an upper limit on how full a chunk can be to be considered for the
>> balance operation. So, balancing with only `-dlimit=100` should
>> actually balance all data chunks (but only data chunks, because you
>> haven't asked for metadata balancing).
>
> That's usage, not limit. limit is simply counting the number of
> block groups to move.
Realized that I got the two mixed up right after I hit send.
That said, given the size of the FS, it may well move more than 100GB
worth of data (pre-replication), as the FS itself is getting into the
range where chunk sizes start to scale up.
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:58 ` Austin S. Hemmelgarn
@ 2019-06-18 19:03 ` Chris Murphy
0 siblings, 0 replies; 16+ messages in thread
From: Chris Murphy @ 2019-06-18 19:03 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Stéphane Lesimple, Btrfs BTRFS
On Tue, Jun 18, 2019 at 12:58 PM Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
>
> On 2019-06-18 14:57, Hugo Mills wrote:
> > On Tue, Jun 18, 2019 at 02:50:34PM -0400, Austin S. Hemmelgarn wrote:
> >> On 2019-06-18 14:45, Hugo Mills wrote:
> >>> On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
> >>>> [...]
> >>>> I tried using the -ddevid option but it only instructs btrfs to work
> >>>> on the block groups allocated on said device, as it happens, it
> >>>> tends to move data between the 4 preexisting devices and doesn't fix
> >>>> my problem. A full balance with -dlimit=100 did no better.
> >>>
> >>> -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
> >>> be a pretty limited change. You'll need to use a larger number than
> >>> that if you want it to have a significant visible effect.
> >> Last I checked, that's not how the limit filter works. AFAIUI, it's
> >> an upper limit on how full a chunk can be to be considered for the
> >> balance operation. So, balancing with only `-dlimit=100` should
> >> actually balance all data chunks (but only data chunks, because you
> >> haven't asked for metadata balancing).
> >
> > That's usage, not limit. limit is simply counting the number of
> > block groups to move.
>
> Realized that I got the two mixed up right after I hit send.
No one's ever done that! :D
> That said, given the size of the FS, it's not unlikely that it may move
> more than 100GB worth of data (pre-replication), as the FS itself is
> getting into the range where chunk sizes start to scale up.
It's a good point. The limit filter could cause non-deterministic
results if there are mixed-size block groups, created at various sizes
of the filesystem over time.
--
Chris Murphy
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:50 ` Austin S. Hemmelgarn
2019-06-18 18:57 ` Hugo Mills
@ 2019-06-18 18:57 ` Chris Murphy
1 sibling, 0 replies; 16+ messages in thread
From: Chris Murphy @ 2019-06-18 18:57 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: Hugo Mills, Stéphane Lesimple, Btrfs BTRFS
On Tue, Jun 18, 2019 at 12:50 PM Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
>
> On 2019-06-18 14:45, Hugo Mills wrote:
> > -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
> > be a pretty limited change. You'll need to use a larger number than
> > that if you want it to have a significant visible effect.
> Last I checked, that's not how the limit filter works. AFAIUI, it's an
> upper limit on how full a chunk can be to be considered for the balance
> operation. So, balancing with only `-dlimit=100` should actually
> balance all data chunks (but only data chunks, because you haven't asked
> for metadata balancing).
You're thinking of the -dusage filter, which is what I mostly use, and is
a percentage value (or a range), whereas the limit filter is a
quantity of block groups (or a range).
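Roughly, the two look like this (the mount point is a placeholder):

  btrfs balance start -dusage=50 /mnt     # data block groups at most 50% full
  btrfs balance start -dusage=10..30 /mnt # range form, on reasonably recent kernels
  btrfs balance start -dlimit=100 /mnt    # limit: cap the number of block groups processed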
--
Chris Murphy
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:45 ` Hugo Mills
2019-06-18 18:50 ` Austin S. Hemmelgarn
@ 2019-06-19 3:27 ` Andrei Borzenkov
2019-06-19 8:58 ` Stéphane Lesimple
2019-06-19 11:59 ` Supercilious Dude
2 siblings, 1 reply; 16+ messages in thread
From: Andrei Borzenkov @ 2019-06-19 3:27 UTC (permalink / raw)
To: Hugo Mills, Stéphane Lesimple, linux-btrfs
18.06.2019 21:45, Hugo Mills wrote:
...
>
>> Is there a way to ask the block group allocator to prefer writing to
>> a specific device during a balance? Something like -ddestdevid=N?
>> This would just be a hint to the allocator and the usual constraints
>> would always apply (and prevail over the hint when needed).
>
> No, there isn't. Having control over the allocator (or bypassing
> it) would be pretty difficult to implement, I think.
>
> It would be really great if there was an ioctl that allowed you to
> say things like "take the chunks of this block group and put them on
> devices 2, 4 and 5 in RAID-5", because you could do a load of
> optimisation with reshaping the FS in userspace with that. But I
> suspect it's a long way down the list of things to do.
>
It really sounds like "btrfs replace -ddrange=x..y". Replace already
knows how to move chunks from one device and put them on another. Now it
"just" needs to skip the "replace" part and ignore chunks not covered by
the filter ...
* Re: Rebalancing raid1 after adding a device
2019-06-19 3:27 ` Andrei Borzenkov
@ 2019-06-19 8:58 ` Stéphane Lesimple
0 siblings, 0 replies; 16+ messages in thread
From: Stéphane Lesimple @ 2019-06-19 8:58 UTC (permalink / raw)
To: Andrei Borzenkov, Hugo Mills, linux-btrfs
On 19 June 2019 05:27:21, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> 18.06.2019 21:45, Hugo Mills wrote:
> ...
>>
>>> Is there a way to ask the block group allocator to prefer writing to
>>> a specific device during a balance? Something like -ddestdevid=N?
>>> This would just be a hint to the allocator and the usual constraints
>>> would always apply (and prevail over the hint when needed).
>>
>> No, there isn't. Having control over the allocator (or bypassing
>> it) would be pretty difficult to implement, I think.
>>
>> It would be really great if there was an ioctl that allowed you to
>> say things like "take the chunks of this block group and put them on
>> devices 2, 4 and 5 in RAID-5", because you could do a load of
>> optimisation with reshaping the FS in userspace with that. But I
>> suspect it's a long way down the list of things to do.
>
> It really sounds like "btrfs replace -ddrange=x..y". Replace already
> knows how to move chunks from one device and put them on another. Now it
> "just" needs to skip the "replace" part and ignore chunks not covered by
> the filter ...
Yes, having btrfs balance able to "empty" a device as replace does, without
actually removing the device from the array, would be nice.
There's a way to mimic that: running a btrfs device remove, and rebooting
when it's almost done. The operation itself is not cancellable, but btrfs
forgets about the pending remove after the reboot, and the device is still
part of the FS. It's ugly of course, and not really advisable, but it works.
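For reference, that trick is just the normal removal command abandoned
partway through (device name and mount point are placeholders, and
again, this is not something I'd recommend):

  btrfs device remove /dev/sdX /mnt   # starts relocating every chunk off that device
  # reboot before it completes; after the reboot the device is still a
  # member of the filesystem, minus whatever had already been moved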
However, what I would need here seems a bit different: I need block groups
moved from any device(s) (I don't care which) to one specific device.
I don't think anything like that exists (even counting hacky ways).
--
Stéphane.
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:45 ` Hugo Mills
2019-06-18 18:50 ` Austin S. Hemmelgarn
2019-06-19 3:27 ` Andrei Borzenkov
@ 2019-06-19 11:59 ` Supercilious Dude
2 siblings, 0 replies; 16+ messages in thread
From: Supercilious Dude @ 2019-06-19 11:59 UTC (permalink / raw)
To: Hugo Mills, Stéphane Lesimple, linux-btrfs
On Tue, 18 Jun 2019 at 19:45, Hugo Mills <hugo@carfax.org.uk> wrote:
>
> It would be really great if there was an ioctl that allowed you to
> say things like "take the chunks of this block group and put them on
> devices 2, 4 and 5 in RAID-5", because you could do a load of
> optimisation with reshaping the FS in userspace with that. But I
> suspect it's a long way down the list of things to do.
>
This, combined with a flag that prevents btrfs from automatically
allocating on specific devices, would finally enable my dream
filesystem, where all writes always go to SSDs and are gradually
migrated to slower storage after the fact (and eventually to RAID-6 if
the data is really cold), with hot data migrated back to faster
storage. All of this could be done by a userspace daemon with a
controllable policy, given this ioctl() and flag.
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:26 Rebalancing raid1 after adding a device Stéphane Lesimple
2019-06-18 18:45 ` Hugo Mills
@ 2019-06-18 19:06 ` Austin S. Hemmelgarn
2019-06-18 19:15 ` Stéphane Lesimple
2019-06-18 19:37 ` Stéphane Lesimple
3 siblings, 0 replies; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2019-06-18 19:06 UTC (permalink / raw)
To: Stéphane Lesimple, linux-btrfs
On 2019-06-18 14:26, Stéphane Lesimple wrote:
> Hello,
>
> I've been a btrfs user for quite a number of years now, but it seems I
> need the wisdom of the btrfs gurus on this one!
>
> I have a 5-hdd btrfs raid1 setup with 4x3T+1x10T drives.
> A few days ago, I replaced one of the 3T by a new 10T, running btrfs
> replace and then resizing the FS to use all the available space of the
> new device.
>
> The filesystem was 90% full before I expanded it so, as expected, most
> of the space on the new device wasn't actually allocatable in raid1, as
> very little space was available on the 4 other devs.
>
> Of course the solution is to run a balance, but as the filesystem is now
> quite big, I'd like to avoid running a full rebalance. This would be
> quite I/O intensive, would run for several days, and would put
> unnecessary stress on the drives. This also seems excessive as in theory
> only a few TB would need to be moved: if I'm correct, only one of the two
> copies of a sufficient number of block groups needs to be moved to the
> new device, so that the sum of the available space on the 4 preexisting
> devices at least equals the available space on the new device; ~7TB
> instead of moving ~22TB.
> I don't need to have a perfectly balanced FS, I just want all the space
> to be allocatable.
>
> I tried using the -ddevid option, but it only instructs btrfs to work on
> the block groups allocated on said device; as it happens, it tends to
> move data between the 4 preexisting devices and doesn't fix my problem.
> A full balance with -dlimit=100 did no better.
>
> Is there a way to ask the block group allocator to prefer writing to a
> specific device during a balance? Something like -ddestdevid=N? This
> would just be a hint to the allocator and the usual constraints would
> always apply (and prevail over the hint when needed).
>
> Or is there any obvious solution I'm completely missing?
Based on what you've said, you may actually not have enough free space
that can be allocated to balance things properly.
When a chunk gets balanced, you need to have enough space to create a
new instance of that type of chunk before the old one is removed. As
such, if you can't allocate new chunks at all, you can't balance those
chunks either.
So, that brings up the question of how to deal with your situation.
The first thing I would do is multiple compaction passes using the
`usage` filter. Start with:
btrfs balance start -dusage=0 -musage=0 /wherever
That will clear out any empty chunks which haven't been removed (there
shouldn't be any if you're on a recent kernel, but it's good practice
anyway). After that, repeat the same command, but with a value of 10
instead of 0, and then keep repeating in increments of 10 up until 50.
Doing this will clean up chunks that are more than half empty (making
multiple passes like this is a bit more reliable, and in some cases also
more efficient), which should free up enough space for balance to work
with (as well as probably moving most of the block groups it touches to
use the new disk).
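As a sketch, the whole sequence of passes can be scripted (the mount
point is whatever yours is, of course):

  # one pass per usage threshold, from empty chunks up to half-full ones
  for pct in 0 10 20 30 40 50; do
      btrfs balance start -dusage=$pct -musage=$pct /wherever
  done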
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:26 Rebalancing raid1 after adding a device Stéphane Lesimple
2019-06-18 18:45 ` Hugo Mills
2019-06-18 19:06 ` Austin S. Hemmelgarn
@ 2019-06-18 19:15 ` Stéphane Lesimple
2019-06-18 19:22 ` Hugo Mills
2019-06-18 19:37 ` Stéphane Lesimple
3 siblings, 1 reply; 16+ messages in thread
From: Stéphane Lesimple @ 2019-06-18 19:15 UTC (permalink / raw)
To: Hugo Mills; +Cc: linux-btrfs
June 18, 2019 8:45 PM, "Hugo Mills" <hugo@carfax.org.uk> wrote:
> On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
>> [...]
>>
>> I tried using the -ddevid option but it only instructs btrfs to work
>> on the block groups allocated on said device, as it happens, it
>> tends to move data between the 4 preexisting devices and doesn't fix
>> my problem. A full balance with -dlimit=100 did no better.
>
> -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
> be a pretty limited change. You'll need to use a larger number than
> that if you want it to have a significant visible effect.
Yes, of course. I wasn't clear here, but what I meant to do when starting
a full balance with -dlimit=100 was to test, within a reasonable amount of
time, whether the allocator would prefer to fill the new drive. I observed,
after those 100G (200G) of data had moved, that it wasn't the case at all.
Specifically, not a single allocation happened on the new drive. I know it
would happen at some point, after terabytes of data had been moved, but
that's exactly what I'm trying to avoid.
> The -ddevid=<old_10T> option would be my recommendation. It's got
> more chunks on it, so they're likely to have their copies spread
> across the other four devices. This should help with the
> balance.
Makes sense. That's probably what I'm going to do if I don't find
a better solution. That's a bit frustrating because I know exactly
what I want btrfs to do, but I have no way to make it do that.
> Alternatively, just do a full balance and then cancel it when the
> amount of unallocated space is reasonably well spread across the
> devices (specifically, the new device's unallocated space is less than
> the sum of the unallocated space on the other devices).
I'll try with the old 10T and cancel it when I get 0 unallocatable
space, if that happens before all the data is moved around.
>> Is there a way to ask the block group allocator to prefer writing to
>> a specific device during a balance? Something like -ddestdevid=N?
>> This would just be a hint to the allocator and the usual constraints
>> would always apply (and prevail over the hint when needed).
>
> No, there isn't. Having control over the allocator (or bypassing
> it) would be pretty difficult to implement, I think.
>
> It would be really great if there was an ioctl that allowed you to
> say things like "take the chunks of this block group and put them on
> devices 2, 4 and 5 in RAID-5", because you could do a load of
> optimisation with reshaping the FS in userspace with that. But I
> suspect it's a long way down the list of things to do.
Exactly, that would be awesome. I would probably even go as far as
writing some C code myself to call this ioctl to do this "intelligent"
balance on my system!
--
Stéphane.
* Re: Rebalancing raid1 after adding a device
2019-06-18 19:15 ` Stéphane Lesimple
@ 2019-06-18 19:22 ` Hugo Mills
0 siblings, 0 replies; 16+ messages in thread
From: Hugo Mills @ 2019-06-18 19:22 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: linux-btrfs
On Tue, Jun 18, 2019 at 07:14:26PM +0000, DO NOT USE wrote:
> June 18, 2019 8:45 PM, "Hugo Mills" <hugo@carfax.org.uk> wrote:
>
> > On Tue, Jun 18, 2019 at 08:26:32PM +0200, Stéphane Lesimple wrote:
> >> [...]
> >> I tried using the -ddevid option but it only instructs btrfs to work
> >> on the block groups allocated on said device, as it happens, it
> >> tends to move data between the 4 preexisting devices and doesn't fix
> >> my problem. A full balance with -dlimit=100 did no better.
> >
> > -dlimit=100 will only move 100 GiB of data (i.e. 200 GiB), so it'll
> > be a pretty limited change. You'll need to use a larger number than
> > that if you want it to have a significant visible effect.
>
> Yes of course, I wasn't clear here but what I meant to do when starting
> a full balance with -dlimit=100 was to test under a reasonable amount of
> time whether the allocator would prefer to fill the new drive. I observed
> after those 100G (200G) of data moved that it wasn't the case at all.
> Specifically, not a single allocation happened on the new drive. I know it
> would happen at some point, after terabytes of data had been
> moved, but that's exactly what I'm trying to avoid.
It's probably putting the data into empty space first. The solution
here would, as Austin said in his reply to your original post, be to
run some compaction on the FS, which will move data from chunks with
little data in them into existing chunks with space. When that's done,
you'll be able to see the chunks moving onto the new device.
[snip]
> > It would be really great if there was an ioctl that allowed you to
> > say things like "take the chunks of this block group and put them on
> > devices 2, 4 and 5 in RAID-5", because you could do a load of
> > optimisation with reshaping the FS in userspace with that. But I
> > suspect it's a long way down the list of things to do.
>
> Exactly, that would be awesome. I would probably even go as far as
> writing some C code myself to call this ioctl to do this "intelligent"
> balance on my system!
You wouldn't need to. I'd be at the head of the queue to write the
tool. :)
Hugo.
--
Hugo Mills | How do you become King? You stand in the marketplace
hugo@... carfax.org.uk | and announce you're going to tax everyone. If you
http://carfax.org.uk/ | get out alive, you're King.
PGP: E2AB1DE4 | Harry Harrison
* Re: Rebalancing raid1 after adding a device
2019-06-18 18:26 Rebalancing raid1 after adding a device Stéphane Lesimple
` (2 preceding siblings ...)
2019-06-18 19:15 ` Stéphane Lesimple
@ 2019-06-18 19:37 ` Stéphane Lesimple
2019-06-18 19:42 ` Austin S. Hemmelgarn
2019-06-18 20:03 ` Stéphane Lesimple
3 siblings, 2 replies; 16+ messages in thread
From: Stéphane Lesimple @ 2019-06-18 19:37 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
June 18, 2019 9:06 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> On 2019-06-18 14:26, Stéphane Lesimple wrote:
>
> [...]
>
>> I don't need to have a perfectly balanced FS, I just want all the space to be allocatable.
>> I tried using the -ddevid option, but it only instructs btrfs to work on the block groups
>> allocated on said device; as it happens, it tends to move data between the 4 preexisting devices
>> and doesn't fix my problem. A full balance with -dlimit=100 did no better.
>> Is there a way to ask the block group allocator to prefer writing to a specific device during a
>> balance? Something like -ddestdevid=N? This would just be a hint to the allocator and the usual
>> constraints would always apply (and prevail over the hint when needed).
>> Or is there any obvious solution I'm completely missing?
>
> Based on what you've said, you may actually not have enough free space that can be allocated to
> balance things properly.
>
> When a chunk gets balanced, you need to have enough space to create a new instance of that type of
> chunk before the old one is removed. As such, if you can't allocate new chunks at all, you can't
> balance those chunks either.
>
> So, that brings up the question of how to deal with your situation.
>
> The first thing I would do is multiple compaction passes using the `usage` filter. Start with:
>
> btrfs balance start -dusage=0 -musage=0 /wherever
>
> That will clear out any empty chunks which haven't been removed (there shouldn't be any if you're
> on a recent kernel, but it's good practice anyway). After that, repeat the same command, but with a
> value of 10 instead of 0, and then keep repeating in increments of 10 up until 50. Doing this will
> clean up chunks that are more than half empty (making multiple passes like this is a bit more
> reliable, and in some cases also more efficient), which should free up enough space for balance to
> work with (as well as probably moving most of the block groups it touches to use the new disk).
Fair point. I do run some balances with -dusage=20 from time to time; the current state of the FS
is actually as follows:
btrfs d u /tank | grep Unallocated:
Unallocated: 57.45GiB
Unallocated: 4.58TiB <= new 10T
Unallocated: 16.03GiB
Unallocated: 63.49GiB
Unallocated: 69.52GiB
As you can see, I was able to move some data to the new 10T drive in the last few days, mainly by
trial and error with several -ddevid and -dlimit parameters. As of now I still have 4.38T that are
unallocatable, out of the 4.58T that are unallocated on the new drive. I was looking for a better
solution than just running a full balance (with or without -ddevid=old10T) by asking btrfs to
balance data to the new drive, but it seems there's no way to instruct btrfs to do that.
I think I'll still run a -dusage pass before doing the full balance anyway; it can't hurt.
--
Stéphane.
* Re: Rebalancing raid1 after adding a device
2019-06-18 19:37 ` Stéphane Lesimple
@ 2019-06-18 19:42 ` Austin S. Hemmelgarn
2019-06-18 20:03 ` Stéphane Lesimple
1 sibling, 0 replies; 16+ messages in thread
From: Austin S. Hemmelgarn @ 2019-06-18 19:42 UTC (permalink / raw)
To: Stéphane Lesimple; +Cc: linux-btrfs
On 2019-06-18 15:37, Stéphane Lesimple wrote:
> June 18, 2019 9:06 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>
>> On 2019-06-18 14:26, Stéphane Lesimple wrote:
>>
>> [...]
>>
>>> I don't need to have a perfectly balanced FS, I just want all the space to be allocatable.
>>> I tried using the -ddevid option, but it only instructs btrfs to work on the block groups
>>> allocated on said device; as it happens, it tends to move data between the 4 preexisting devices
>>> and doesn't fix my problem. A full balance with -dlimit=100 did no better.
>>> Is there a way to ask the block group allocator to prefer writing to a specific device during a
>>> balance? Something like -ddestdevid=N? This would just be a hint to the allocator and the usual
>>> constraints would always apply (and prevail over the hint when needed).
>>> Or is there any obvious solution I'm completely missing?
>>
>> Based on what you've said, you may actually not have enough free space that can be allocated to
>> balance things properly.
>>
>> When a chunk gets balanced, you need to have enough space to create a new instance of that type of
>> chunk before the old one is removed. As such, if you can't allocate new chunks at all, you can't
>> balance those chunks either.
>>
>> So, that brings up the question of how to deal with your situation.
>>
>> The first thing I would do is multiple compaction passes using the `usage` filter. Start with:
>>
>> btrfs balance start -dusage=0 -musage=0 /wherever
>>
>> That will clear out any empty chunks which haven't been removed (there shouldn't be any if you're
>> on a recent kernel, but it's good practice anyway). After that, repeat the same command, but with a
>> value of 10 instead of 0, and then keep repeating in increments of 10 up until 50. Doing this will
>> clean up chunks that are more than half empty (making multiple passes like this is a bit more
>> reliable, and in some cases also more efficient), which should free up enough space for balance to
>> work with (as well as probably moving most of the block groups it touches to use the new disk).
>
> Fair point, I do run some balances with -dusage=20 from time to time, the current state of the FS
> is actually as follows:
>
> btrfs d u /tank | grep Unallocated:
> Unallocated: 57.45GiB
> Unallocated: 4.58TiB <= new 10T
> Unallocated: 16.03GiB
> Unallocated: 63.49GiB
> Unallocated: 69.52GiB
>
> As you can see I was able to move some data to the new 10T drive in the last few days, mainly by
> trial/error with several -ddevid and -dlimit parameters. As of now I still have 4.38T that are
> unallocatable, out of the 4.58T that are unallocated on the new drive. I was looking for a better
> solution than just running a full balance (with or without -ddevid=old10T) by asking btrfs to
> balance data to the new drive, but it seems there's no way to instruct btrfs to do that.
>
> I think I'll still run a -dusage pass before doing the full balance indeed, can't hurt.
>
I would specifically make a point to go all the way up to `-dusage=50`
on that pass though. It will, of course, take longer than a run with
`-dusage=20` would, but it will also do a much better job.
That said, it looks like you should have more than enough space for
balance to be doing its job correctly here, so I suspect you may have a
lot of partially full chunks around and the balance is repacking into
those instead of allocating new chunks.
Regardless though, I suspect that just doing a balance pass with the
devid filter and only balancing chunks that are on the old 10TB disk as
Hugo suggested is probably going to get you the best results
proportionate to the time it takes.
* Re: Rebalancing raid1 after adding a device
2019-06-18 19:37 ` Stéphane Lesimple
2019-06-18 19:42 ` Austin S. Hemmelgarn
@ 2019-06-18 20:03 ` Stéphane Lesimple
1 sibling, 0 replies; 16+ messages in thread
From: Stéphane Lesimple @ 2019-06-18 20:03 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
June 18, 2019 9:42 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> On 2019-06-18 15:37, Stéphane Lesimple wrote:
>
>> June 18, 2019 9:06 PM, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>> On 2019-06-18 14:26, Stéphane Lesimple wrote:
>>> [...]
>>
>> I don't need to have a perfectly balanced FS, I just want all the space to be allocatable.
>> I tried using the -ddevid option, but it only instructs btrfs to work on the block groups
>> allocated on said device; as it happens, it tends to move data between the 4 preexisting devices
>> and doesn't fix my problem. A full balance with -dlimit=100 did no better.
>> Is there a way to ask the block group allocator to prefer writing to a specific device during a
>> balance? Something like -ddestdevid=N? This would just be a hint to the allocator and the usual
>> constraints would always apply (and prevail over the hint when needed).
>> Or is there any obvious solution I'm completely missing?
>>> Based on what you've said, you may actually not have enough free space that can be allocated to
>>> balance things properly.
>>>
>>> When a chunk gets balanced, you need to have enough space to create a new instance of that type of
>>> chunk before the old one is removed. As such, if you can't allocate new chunks at all, you can't
>>> balance those chunks either.
>>>
>>> So, that brings up the question of how to deal with your situation.
>>>
>>> The first thing I would do is multiple compaction passes using the `usage` filter. Start with:
>>>
>>> btrfs balance start -dusage=0 -musage=0 /wherever
>>>
>>> That will clear out any empty chunks which haven't been removed (there shouldn't be any if you're
>>> on a recent kernel, but it's good practice anyway). After that, repeat the same command, but with a
>>> value of 10 instead of 0, and then keep repeating in increments of 10 up until 50. Doing this will
>>> clean up chunks that are more than half empty (making multiple passes like this is a bit more
>>> reliable, and in some cases also more efficient), which should free up enough space for balance to
>>> work with (as well as probably moving most of the block groups it touches to use the new disk).
>>
>> Fair point, I do run some balances with -dusage=20 from time to time, the current state of the FS
>> is actually as follows:
>> btrfs d u /tank | grep Unallocated:
>> Unallocated: 57.45GiB
>> Unallocated: 4.58TiB <= new 10T
>> Unallocated: 16.03GiB
>> Unallocated: 63.49GiB
>> Unallocated: 69.52GiB
>> As you can see I was able to move some data to the new 10T drive in the last few days, mainly by
>> trial/error with several -ddevid and -dlimit parameters. As of now I still have 4.38T that are
>> unallocatable, out of the 4.58T that are unallocated on the new drive. I was looking for a better
>> solution than just running a full balance (with or without -ddevid=old10T) by asking btrfs to
>> balance data to the new drive, but it seems there's no way to instruct btrfs to do that.
>> I think I'll still run a -dusage pass before doing the full balance indeed, can't hurt.
> I would specifically make a point to go all the way up to `-dusage=50` on that pass though. It
> will, of course, take longer than a run with `-dusage=20` would, but it will also do a much better
> job.
> That said, it looks like you should have more than enough space for balance to be doing its job
> correctly here, so I suspect you may have a lot of partially full chunks around and the balance is
> repacking into those instead of allocating new chunks.
> Regardless though, I suspect that just doing a balance pass with the devid filter and only
> balancing chunks that are on the old 10TB disk as Hugo suggested is probably going to get you the
> best results proportionate to the time it takes.
About the chunks, that's entirely possible.
I'll run some passes up to -dusage=50 before launching the balance then.
Thanks!
--
Stéphane.