public inbox for linux-btrfs@vger.kernel.org
* Adding a 4TB disk to a 2x4TB btrfs (data:single) filesystem and balancing takes extremely long (over a month). Filesystem has been deduped with bees
From: Konstantinos Skarlatos @ 2022-04-01 11:13 UTC
  To: linux-btrfs; +Cc: ce3g8jdj

Hello,
I am running btrfs on 2x 4TB HDDs (data: single, metadata: raid1) and I 
added another 4TB disk.
According to the btrfs wiki, I should run balance after adding the new device.
My problem is that this balance takes extremely long: it has been running for 
4 days and it still has 91% left.
Is this normal, and can I do anything to fix this?

The kernel is linux-5.17.1; I have also tried 5.16 kernels.
Mount options are: rw,relatime,compress-force=zstd:11,space_cache=v2
I have been using bees for dedup, but it is disabled for the balance.
I am not doing any IO on the disks, they have no SMART errors, and none 
of them are SMR (2x WD40EFRX and 1x ST4000DM000).
Autodefrag is disabled, and I have also checked that the disks are in 
stable drive cages, to be sure I have no problems with vibration.
Benchmarking them gives normal speeds. Quotas have never been enabled.


* Re: Adding a 4TB disk to a 2x4TB btrfs (data:single) filesystem and balancing takes extremely long (over a month). Filesystem has been deduped with bees
From: Hugo Mills @ 2022-04-01 13:17 UTC
  To: Konstantinos Skarlatos; +Cc: linux-btrfs, ce3g8jdj

On Fri, Apr 01, 2022 at 02:13:58PM +0300, Konstantinos Skarlatos wrote:
> Hello,
> I am running btrfs on 2x 4TB HDDs (data: single, metadata: raid1) and I
> added another 4TB disk.
> According to the btrfs wiki, I should run balance after adding the new device.
> My problem is that this balance takes extremely long: it has been running for 4
> days and it still has 91% left.
> Is this normal, and can I do anything to fix this?

   It's not normal for it to take that long, no. Do you have lots of
snapshots (like, thousands), and lots of small or heavily fragmented
files?

   Is the balance actually progressing, or has it got stuck? Are there
regular messages in dmesg about it balancing block groups? If not,
when was the last one?
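A rough way to answer "when was the last one?" from the kernel log is to collect the timestamps of the relocation messages and look at the gaps between them. This is a sketch, not part of btrfs-progs. It assumes the kernel prints lines of the form "BTRFS info (device sdX): relocating block group N flags ..." and that dmesg is run without -T, so timestamps are seconds since boot. The function names are hypothetical.

```python
import re
from typing import List

# Assumed dmesg line shape (default monotonic timestamps, not `dmesg -T`), e.g.
#   [1234.567890] BTRFS info (device sdb): relocating block group 1104150528 flags data
# Some kernels add text inside the parentheses, so that part is matched loosely.
RELOC_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+BTRFS info \(device [^)]*\): "
    r"relocating block group (?P<bg>\d+) flags (?P<flags>\S+)"
)


def relocation_timestamps(lines: List[str]) -> List[float]:
    """Kernel timestamps (seconds since boot) of each relocation message."""
    stamps = []
    for line in lines:
        m = RELOC_RE.match(line)
        if m:
            stamps.append(float(m.group("ts")))
    return stamps


def relocation_intervals(lines: List[str]) -> List[float]:
    """Seconds elapsed between consecutive block group relocations."""
    stamps = relocation_timestamps(lines)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Feed it the lines of a saved dmesg output (reading dmesg often needs root). "btrfs balance status <mountpoint>" already reports chunk counts and the percentage left; the intervals add how long each individual block group takes, and a very large gap since the last message would suggest the balance is stuck rather than slow.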

   If your data is single, it's not really necessary to do the balance
anyway, so you may want to cancel it. The balance in this situation is
more about ensuring that all the space on the new disk is usable by
the higher RAID levels. For example, adding a single disk to a
nearly-full RAID-1 without balancing would leave you in a state where
you couldn't add more data chunks because there's only space on one
disk to do so, and RAID-1 needs two disks with space to allocate a
data chunk in. With single data, that's not a problem.
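The RAID-1 versus single argument above comes down to how many devices must have unallocated space before a new data chunk can be created. The toy model below makes that concrete. It is a simplification (fixed 1 GiB chunks, only the two profiles discussed, invented device names), not btrfs's actual allocator code.

```python
from typing import Dict

GIB = 1024 ** 3

# Simplified model: number of distinct devices, each with at least one chunk's
# worth of unallocated space, needed to allocate one new data chunk.
DEVICES_NEEDED = {"single": 1, "raid1": 2}


def can_allocate(unallocated: Dict[str, int], profile: str, chunk_bytes: int = GIB) -> bool:
    """True if a new chunk of `profile` fits, given unallocated bytes per device."""
    usable = [dev for dev, free in unallocated.items() if free >= chunk_bytes]
    return len(usable) >= DEVICES_NEEDED[profile]


# Two nearly-full disks plus one freshly added, empty 4 TB disk.
after_add = {"old1": 100 * 1024 ** 2, "old2": 100 * 1024 ** 2, "new": 4 * 10 ** 12}
print(can_allocate(after_add, "raid1"))   # False: only one device has room
print(can_allocate(after_add, "single"))  # True: one device is enough
```

In this model, balancing a nearly-full RAID-1 moves chunks around until at least two devices have free space again, which is the case Hugo describes; with single data the empty new disk is usable immediately.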

> The kernel is linux-5.17.1; I have also tried 5.16 kernels.
> Mount options are: rw,relatime,compress-force=zstd:11,space_cache=v2
> I have been using bees for dedup, but it is disabled for the balance.
> I am not doing any IO on the disks, they have no SMART errors, and none of
> them are SMR (2x WD40EFRX and 1x ST4000DM000).
> Autodefrag is disabled, and I have also checked that the disks are in stable
> drive cages, to be sure I have no problems with vibration.
> Benchmarking them gives normal speeds. Quotas have never been enabled.

-- 
Hugo Mills             | "Damn and blast British Telecom!" said Dirk,
hugo@... carfax.org.uk | the words coming easily from force of habit.
http://carfax.org.uk/  |                                        Douglas Adams,
PGP: E2AB1DE4          |               Dirk Gently's Holistic Detective Agency


* Re: Adding a 4TB disk to a 2x4TB btrfs (data:single) filesystem and balancing takes extremely long (over a month). Filesystem has been deduped with bees
From: Zygo Blaxell @ 2022-04-01 14:11 UTC
  To: Konstantinos Skarlatos; +Cc: linux-btrfs

On Fri, Apr 01, 2022 at 02:13:58PM +0300, Konstantinos Skarlatos wrote:
> Hello,
> I am running btrfs on 2x 4TB HDDs (data: single, metadata: raid1) and I
> added another 4TB disk.
> According to the btrfs wiki, I should run balance after adding the new device.
> My problem is that this balance takes extremely long: it has been running for 4
> days and it still has 91% left.
> Is this normal, and can I do anything to fix this?
>
> The kernel is linux-5.17.1; I have also tried 5.16 kernels.
> Mount options are: rw,relatime,compress-force=zstd:11,space_cache=v2
> mount options are: rw,relatime,compress-force=zstd:11,space_cache=v2
> I have been using bees for dedup, but it is disabled for the balance.

Deduplication increases the reflink count and increases relocation time.
Snapshots increase the time as well, but not as directly:  creating the
snapshot doesn't increase balance time, but the snapshot will convert
into reflinks over time as the snapshot diverges from its origin subvol,
and those reflinks do increase relocation time.

2x4TB with single profile works out to about 8000 block groups.
Each block group will take between 1 and 60 minutes on 7200 rpm spinning
drives, mostly dependent on the number of reflinks in the block group
(relocating the data takes only 5-10 seconds; the reflink updates are
the vast majority of the relocation time).

The expected range of balance times will be between 8000 minutes (5.5
days) and 8000 hours (333 days or 48 weeks).
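The estimate above is plain arithmetic, reproduced here so the numbers can be rechecked or rerun with other inputs. Assumptions: about 1 GB per data block group (the round figure behind "about 8000"), and 1 to 60 minutes per block group as stated. The last line applies a linear extrapolation to the original report (9% done after 4 days), giving roughly 44 days, consistent with "over a month" in the subject. The function names are illustrative.

```python
# Round figure used in the estimate: about 1 GB per data block group.
# (btrfs data block groups are typically 1 GiB, which gives ~7,450 for 8 TB.)
BG_BYTES = 10 ** 9
MINUTES_PER_DAY = 60 * 24


def block_group_count(total_bytes: int, bg_bytes: int = BG_BYTES) -> int:
    return total_bytes // bg_bytes


def balance_days(n_block_groups: int, minutes_per_bg: float) -> float:
    return n_block_groups * minutes_per_bg / MINUTES_PER_DAY


def reflink_share(bg_seconds: float, copy_seconds: float) -> float:
    """Fraction of one block group's relocation time not spent copying data."""
    return (bg_seconds - copy_seconds) / bg_seconds


def projected_total_days(elapsed_days: float, fraction_done: float) -> float:
    """Linear extrapolation from the balance progress observed so far."""
    return elapsed_days / fraction_done


n = block_group_count(2 * 4 * 10 ** 12)        # 8000
fast = balance_days(n, 1)                       # ~5.6 days
slow = balance_days(n, 60)                      # ~333 days, ~47.6 weeks
print(n, round(fast, 1), round(slow), round(slow / 7, 1))
print(round(reflink_share(60, 10), 3))          # 1-minute group, 10 s copying
print(round(projected_total_days(4, 0.09), 1))  # 9% done after 4 days
```

Linear extrapolation is only a rough guide: per-block-group time depends on the reflink count, so the rate changes as the balance reaches more or less heavily deduplicated block groups.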

As Hugo pointed out, it's not necessary to balance more than a few
block groups in this situation.  You have to ensure that the amount
of unallocated space on all the disks is large enough to contain one
mirror copy of the metadata.  For most users that means no more than
0.5% of each disk needs to be unallocated.  If you've already balanced
9% of the disk, then you've already done 18x more balancing than needed
and you can stop now.
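A sketch of turning "balance only a few block groups" into numbers and commands, under stated assumptions: the metadata size (16 GiB here) is hypothetical and should be read from "btrfs filesystem usage" on the real filesystem; devids 1 and 2 stand for the two original, full disks; "/mnt/data" is a placeholder mountpoint. The devid and limit balance filters are documented in btrfs-balance(8); the helper functions are hypothetical. The running full balance would first be stopped with "btrfs balance cancel <mountpoint>".

```python
import math
from typing import Iterable, List

GIB = 1024 ** 3


def needed_fraction(metadata_copy_bytes: int, disk_bytes: int) -> float:
    """Share of each disk that must stay unallocated for one metadata mirror copy."""
    return metadata_copy_bytes / disk_bytes


def data_groups_to_move(metadata_copy_bytes: int, bg_bytes: int = GIB) -> int:
    """Data block groups to relocate off a disk to free that much space (rounded up)."""
    return math.ceil(metadata_copy_bytes / bg_bytes)


def limited_balance_commands(mountpoint: str, full_devids: Iterable[int], limit: int) -> List[str]:
    # devid= selects block groups with a chunk on that device; limit= caps how
    # many of them one balance run processes.
    return [f"btrfs balance start -ddevid={d},limit={limit} {mountpoint}" for d in full_devids]


meta = 16 * GIB                                              # hypothetical metadata size (one copy)
print(round(100 * needed_fraction(meta, 4 * 10 ** 12), 2))   # percent of a 4 TB disk
print(round(0.09 / 0.005))                                   # 9% balanced vs 0.5% needed
for cmd in limited_balance_commands("/mnt/data", [1, 2], data_groups_to_move(meta)):
    print(cmd)
```

New single-profile chunks normally go to the device with the most unallocated space, so the relocated data should land on the new disk and free room on the old ones; "btrfs device usage <mountpoint>" afterwards shows whether each disk has enough unallocated space.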

> I am not doing any IO on the disks, they have no SMART errors, and none of
> them are SMR (2x WD40EFRX and 1x ST4000DM000).
> Autodefrag is disabled, and I have also checked that the disks are in stable
> drive cages, to be sure I have no problems with vibration.
> Benchmarking them gives normal speeds. Quotas have never been enabled.
> 


* Re: Adding a 4TB disk to a 2x4TB btrfs (data:single) filesystem and balancing takes extremely long (over a month). Filesystem has been deduped with bees
From: Konstantinos Skarlatos @ 2022-04-02 18:27 UTC
  To: Zygo Blaxell; +Cc: linux-btrfs, Hugo Mills

On 1/4/2022 5:11 PM, Zygo Blaxell wrote:
> As Hugo pointed out, it's not necessary to balance more than a few
> block groups in this situation.  You have to ensure that the amount
> of unallocated space on all the disks is large enough to contain one
> mirror copy of the metadata.  For most users that means no more than
> 0.5% of each disk needs to be unallocated.  If you've already balanced
> 9% of the disk, then you've already done 18x more balancing than needed
> and you can stop now.
Thank you for your answer. I think this should be documented somewhere,
in the wiki or in the btrfs balance documentation, or, even better, there
should be a "btrfs balance-after-device-add" command that does the right
thing, because right now it is very easy to assume that after adding a
device one should wait for a complete balance to finish.


* Re: Adding a 4TB disk to a 2x4TB btrfs (data:single) filesystem and balancing takes extremely long (over a month). Filesystem has been deduped with bees
From: Zygo Blaxell @ 2022-04-02 23:16 UTC
  To: Konstantinos Skarlatos; +Cc: linux-btrfs, Hugo Mills

On Sat, Apr 02, 2022 at 09:27:00PM +0300, Konstantinos Skarlatos wrote:
> On 1/4/2022 5:11 PM, Zygo Blaxell wrote:
> > As Hugo pointed out, it's not necessary to balance more than a few
> > block groups in this situation.  You have to ensure that the amount
> > of unallocated space on all the disks is large enough to contain one
> > mirror copy of the metadata.  For most users that means no more than
> > 0.5% of each disk needs to be unallocated.  If you've already balanced
> > 9% of the disk, then you've already done 18x more balancing than needed
> > and you can stop now.
> Thank you for your answer. I think this should be documented somewhere,
> in the wiki or in the btrfs balance documentation, or, even better, there
> should be a "btrfs balance-after-device-add" command that does the right
> thing, because right now it is very easy to assume that after adding a
> device one should wait for a complete balance to finish.

It would be a full-sized book to describe all the possible situations.
It's definitely not a solved problem on btrfs.

With knowledge of how the allocator algorithms work, we can develop
balance plans for specific situations.  In this case we can take a
shortcut based on a special case, but every situation is different, and
a balance plan has to be tailored to each case from first principles.
In some cases specialized software must be developed, as the stock btrfs
balance algorithm cannot handle all cases.

For example, in your case, if you intend to stay with these RAID profiles,
then it's sufficient to balance 0.5%; however, if you want to move to
a striped data profile (e.g. raid0 or raid10) in the future, you are
better off doing a full balance now, as the stock balance code will
not be able to do a conversion to striped profiles if you allow the new
drive to fill now.  Full balance is recommended because it has the fewest
long-term surprises (but not zero!).

I'm on day 605 of balancing a 65TB filesystem, or day 30 of the 6th time
the balance has been restarted due to drive replacements.  The balances
don't have time to finish between drive replacements, so that filesystem
has been running balance continuously since it grew larger than 33TB.
The stock btrfs balance algorithm can no longer make forward progress
with the current block group layout.  I've had to develop custom software
to continue growing the filesystem.


* Re: Adding a 4TB disk to a 2x4TB btrfs (data:single) filesystem and balancing takes extremely long (over a month). Filesystem has been deduped with bees
From: Konstantinos Skarlatos @ 2022-04-03 13:43 UTC
  To: Hugo Mills, linux-btrfs, ce3g8jdj

On 1/4/2022 4:17 PM, Hugo Mills wrote:
> On Fri, Apr 01, 2022 at 02:13:58PM +0300, Konstantinos Skarlatos wrote:
>> Hello,
>> I am running btrfs on 2x 4TB HDDs (data: single, metadata: raid1) and I
>> added another 4TB disk.
>> According to the btrfs wiki, I should run balance after adding the new device.
>> My problem is that this balance takes extremely long: it has been running for 4
>> days and it still has 91% left.
>> Is this normal, and can I do anything to fix this?
>     It's not normal for it to take that long, no. Do you have lots of
> snapshots (like, thousands), and lots of small or heavily fragmented
> files?
Hi, sorry for missing your reply.
I only have 12 subvolumes and no snapshots. There are about 10 million 
files in the filesystem.

>
>     Is the balance actually progressing, or has it got stuck? Are there
> regular messages in dmesg about it balancing block groups? If not,
> when was the last one?
It is progressing without getting stuck: every few minutes (sometimes 
sooner, sometimes later) a new block group gets balanced.

>
>     If your data is single, it's not really necessary to do the balance
> anyway, so you may want to cancel it. The balance in this situation is
> more about ensuring that all the space on the new disk is usable by
> the higher RAID levels. For example, adding a single disk to a
> nearly-full RAID-1 without balancing would leave you in a state where
> you couldn't add more data chunks because there's only space on one
> disk to do so, and RAID-1 needs two disks with space to allocate a
> data chunk in. With single data, that's not a problem.
Thanks for the advice. I think this is something that should be better 
documented, as the official wiki says that this must be done after adding 
a new disk.

>
>> The kernel is linux-5.17.1; I have also tried 5.16 kernels.
>> Mount options are: rw,relatime,compress-force=zstd:11,space_cache=v2
>> I have been using bees for dedup, but it is disabled for the balance.
>> I am not doing any IO on the disks, they have no SMART errors, and none of
>> them are SMR (2x WD40EFRX and 1x ST4000DM000).
>> Autodefrag is disabled, and I have also checked that the disks are in stable
>> drive cages, to be sure I have no problems with vibration.
>> Benchmarking them gives normal speeds. Quotas have never been enabled.

