Linux Btrfs filesystem development
 help / color / mirror / Atom feed
* 97% full system, dusage didn't help, musage strange
@ 2024-12-14 17:55 Leszek Dubiel
  2024-12-14 18:35 ` Roman Mamedov
  2024-12-14 18:47 ` Andrei Borzenkov
  0 siblings, 2 replies; 11+ messages in thread
From: Leszek Dubiel @ 2024-12-14 17:55 UTC (permalink / raw)
  To: Btrfs BTRFS



My system is almost full:


root@zefir:~# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2       8.2T  7.9T  256G  97% /




root@zefir:~# btrfs fi df /

Data, RAID1: total=8.08TiB, used=7.84TiB
System, RAID1: total=32.00MiB, used=1.47MiB
Metadata, RAID1: total=44.00GiB, used=36.26GiB
GlobalReserve, single: total=512.00MiB, used=0.00B




I have 256 GB free space, but almost no unallocated space:


root@zefir:~# btrfs dev usa /
/dev/sdb2, ID: 1
    Device size:             5.43TiB
    Device slack:              0.00B
    Data,RAID1:              5.38TiB
    Metadata,RAID1:         31.00GiB
    System,RAID1:           32.00MiB
    Unallocated:            11.00GiB

/dev/sdc2, ID: 2
    Device size:             5.43TiB
    Device slack:              0.00B
    Data,RAID1:              5.39TiB
    Metadata,RAID1:         22.00GiB
    Unallocated:            10.03GiB

/dev/sda3, ID: 3
    Device size:             5.43TiB
    Device slack:              0.00B
    Data,RAID1:              5.38TiB
    Metadata,RAID1:         35.00GiB
    System,RAID1:           32.00MiB
    Unallocated:            11.00GiB




I've been running, for a whole day,

           btrfs balance start -dusage=xxx,limit=8 /

with increasing values of xxx, until I reached dusage=90:


root@zefir:~# btrfs bala start -dusage=20,limit=8 /
Done, had to relocate 0 out of 8319 chunks

root@zefir:~# btrfs bala start -dusage=50,limit=8 /
Done, had to relocate 0 out of 8319 chunks

root@zefir:~# btrfs bala start -dusage=80,limit=8 /
Done, had to relocate 0 out of 8319 chunks

root@zefir:~# btrfs bala start -dusage=90,limit=8 /





I ran with -dusage=90 (90%) for a whole day, but
unallocated space didn't increase.

In the logs I can see:

2024-12-09T08:46:13.001188+01:00 zefir kernel: [431476.446252] BTRFS 
info (device sda2): balance: start -dusage=90,limit=8
2024-12-09T08:46:13.013180+01:00 zefir kernel: [431476.458060] BTRFS 
info (device sda2): relocating block group 34750669520896 flags data|raid1
2024-12-09T08:46:40.389168+01:00 zefir kernel: [431503.832191] BTRFS 
info (device sda2): found 6 extents, stage: move data extents
2024-12-09T08:46:44.193216+01:00 zefir kernel: [431507.636729] BTRFS 
info (device sda2): found 6 extents, stage: update data pointers
2024-12-09T08:46:47.113166+01:00 zefir kernel: [431510.558009] BTRFS 
info (device sda2): relocating block group 34748522037248 flags data|raid1
2024-12-09T08:47:22.241196+01:00 zefir kernel: [431545.684216] BTRFS 
info (device sda2): found 11 extents, stage: move data extents
2024-12-09T08:47:23.933198+01:00 zefir kernel: [431547.378516] BTRFS 
info (device sda2): found 11 extents, stage: update data pointers
2024-12-09T08:47:25.137176+01:00 zefir kernel: [431548.582508] BTRFS 
info (device sda2): relocating block group 34731342168064 flags data|raid1
2024-12-09T08:48:01.897151+01:00 zefir kernel: [431585.342544] BTRFS 
info (device sda2): found 8 extents, stage: move data extents
2024-12-09T08:48:07.949185+01:00 zefir kernel: [431591.393774] BTRFS 
info (device sda2): found 8 extents, stage: update data pointers
2024-12-09T08:48:10.169177+01:00 zefir kernel: [431593.614676] BTRFS 
info (device sda2): relocating block group 34723825975296 flags data|raid1
2024-12-09T08:48:33.781190+01:00 zefir kernel: [431617.225031] BTRFS 
info (device sda2): found 10 extents, stage: move data extents
2024-12-09T08:48:44.353165+01:00 zefir kernel: [431627.799342] BTRFS 
info (device sda2): found 10 extents, stage: update data pointers
2024-12-09T08:48:47.453174+01:00 zefir kernel: [431630.899246] BTRFS 
info (device sda2): relocating block group 34721678491648 flags data|raid1

But unallocated space didn't increase.





So I started to play with balancing metadata, that is, with musage.



When I set limit=0, no chunks are relocated.
When I set limit=1 or limit=2, exactly one chunk is relocated every time.
When I set limit to anything greater, no chunks are relocated.


See the test:

root@zefir:~# for lim in 0 1 2 3 4 5 6; do echo "lim=$lim"; for f in 
$(seq 5); do btrfs bala start -musage=30,limit=$lim /; done; done
lim=0
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
lim=1
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
lim=2
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
Done, had to relocate 1 out of 8318 chunks
lim=3
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
lim=4
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
lim=5
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
lim=6
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks
Done, had to relocate 0 out of 8318 chunks




root@zefir:~# btrfs bala start -musage=30,limit=1 /
Done, had to relocate 1 out of 8318 chunks

root@zefir:~# dmesg -T | tail
[Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: start 
-musage=30,limit=6 -susage=30,limit=6
[Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: ended with 
status: 0
[Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: start 
-musage=30,limit=6 -susage=30,limit=6
[Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: ended with 
status: 0
[Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: start 
-musage=30,limit=6 -susage=30,limit=6
[Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: ended with 
status: 0
[Sat Dec 14 18:50:42 2024] BTRFS info (device sdb2): balance: start 
-musage=30,limit=1 -susage=30,limit=1
[Sat Dec 14 18:50:42 2024] BTRFS info (device sdb2): relocating block 
group 38091650760704 flags system|raid1
[Sat Dec 14 18:50:43 2024] BTRFS info (device sdb2): found 91 extents, 
stage: move data extents
[Sat Dec 14 18:50:44 2024] BTRFS info (device sdb2): balance: ended with 
status: 0




During all of those operations the amount of unallocated space did not increase.
What should I do next?











^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-14 17:55 97% full system, dusage didn't help, musage strange Leszek Dubiel
@ 2024-12-14 18:35 ` Roman Mamedov
  2024-12-14 18:47 ` Andrei Borzenkov
  1 sibling, 0 replies; 11+ messages in thread
From: Roman Mamedov @ 2024-12-14 18:35 UTC (permalink / raw)
  To: Leszek Dubiel; +Cc: Btrfs BTRFS

On Sat, 14 Dec 2024 18:55:06 +0100
Leszek Dubiel <leszek@dubiel.pl> wrote:

> What should i do next?

Logically speaking, it seems the next thing to try would be -musage=40, 50 and
so on. But even so, the current physical space taken up by metadata is 88 GB,
with 36 GB of actual metadata (72 GB in RAID1). So the potential gain here is
only 16 GB at best.

It looks like the rest of the "free" 256 GB is really scattered across data
chunks, even those beyond 90% utilization. To reclaim that into unallocated
space it could take balancing with a 95-98 filter, or no filter at all, and a
very long time.
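
The 16 GB figure can be checked against the "btrfs fi df /" output quoted
above (a back-of-the-envelope check only; RAID1 stores every byte twice):

```python
# Numbers from "btrfs fi df /" above:
#   Metadata, RAID1: total=44.00GiB, used=36.26GiB (logical sizes)
raid1_copies = 2
physical_allocated = 44 * raid1_copies         # 88 GiB on disk
physical_used = int(36.26 * raid1_copies)      # ~72 GiB on disk
best_case_gain = physical_allocated - physical_used
print(best_case_gain)  # 16 -> at most 16 GiB returned to unallocated
```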

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-14 17:55 97% full system, dusage didn't help, musage strange Leszek Dubiel
  2024-12-14 18:35 ` Roman Mamedov
@ 2024-12-14 18:47 ` Andrei Borzenkov
  2024-12-14 20:13   ` Leszek Dubiel
  1 sibling, 1 reply; 11+ messages in thread
From: Andrei Borzenkov @ 2024-12-14 18:47 UTC (permalink / raw)
  To: Leszek Dubiel, Btrfs BTRFS

14.12.2024 20:55, Leszek Dubiel wrote:
> 
> 
> My system is almost full:
> 
> 
> root@zefir:~# df -h
> 
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb2       8.2T  7.9T  256G  97% /
> 
> 
> 
> 
> root@zefir:~# btrfs fi df /
> 
> Data, RAID1: total=8.08TiB, used=7.84TiB
> System, RAID1: total=32.00MiB, used=1.47MiB
> Metadata, RAID1: total=44.00GiB, used=36.26GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> 
> 
> 
> I have 256 GB free space, but almost no unallocated space:
> 
> 
> root@zefir:~# btrfs dev usa /
> /dev/sdb2, ID: 1
>      Device size:             5.43TiB
>      Device slack:              0.00B
>      Data,RAID1:              5.38TiB
>      Metadata,RAID1:         31.00GiB
>      System,RAID1:           32.00MiB
>      Unallocated:            11.00GiB
> 
> /dev/sdc2, ID: 2
>      Device size:             5.43TiB
>      Device slack:              0.00B
>      Data,RAID1:              5.39TiB
>      Metadata,RAID1:         22.00GiB
>      Unallocated:            10.03GiB
> 
> /dev/sda3, ID: 3
>      Device size:             5.43TiB
>      Device slack:              0.00B
>      Data,RAID1:              5.38TiB
>      Metadata,RAID1:         35.00GiB
>      System,RAID1:           32.00MiB
>      Unallocated:            11.00GiB
> 
> 
> 

Show

btrfs filesystem usage -T

> 
> I've been running whole day
> 
>             btrfs balance start -dusage=xxx,limit=8 /
> 
> with increasing numbers of xxx, until I reached dusage=90:
> 
> 
> root@zefir:~# btrfs bala start -dusage=20,limit=8 /
> Done, had to relocate 0 out of 8319 chunks
> 
> root@zefir:~# btrfs bala start -dusage=50,limit=8 /
> Done, had to relocate 0 out of 8319 chunks
> 
> root@zefir:~# btrfs bala start -dusage=80,limit=8 /
> Done, had to relocate 0 out of 8319 chunks
> 
> root@zefir:~# btrfs bala start -dusage=90,limit=8 /
> 
> 
> 
> 
> 
> I was running with -dusage=90 (90%) whole day, but
> unallocated space didn't increase.
> 
> On logs i can see:
> 
> 2024-12-09T08:46:13.001188+01:00 zefir kernel: [431476.446252] BTRFS
> info (device sda2): balance: start -dusage=90,limit=8
> 2024-12-09T08:46:13.013180+01:00 zefir kernel: [431476.458060] BTRFS
> info (device sda2): relocating block group 34750669520896 flags data|raid1
> 2024-12-09T08:46:40.389168+01:00 zefir kernel: [431503.832191] BTRFS
> info (device sda2): found 6 extents, stage: move data extents
> 2024-12-09T08:46:44.193216+01:00 zefir kernel: [431507.636729] BTRFS
> info (device sda2): found 6 extents, stage: update data pointers
> 2024-12-09T08:46:47.113166+01:00 zefir kernel: [431510.558009] BTRFS
> info (device sda2): relocating block group 34748522037248 flags data|raid1
> 2024-12-09T08:47:22.241196+01:00 zefir kernel: [431545.684216] BTRFS
> info (device sda2): found 11 extents, stage: move data extents
> 2024-12-09T08:47:23.933198+01:00 zefir kernel: [431547.378516] BTRFS
> info (device sda2): found 11 extents, stage: update data pointers
> 2024-12-09T08:47:25.137176+01:00 zefir kernel: [431548.582508] BTRFS
> info (device sda2): relocating block group 34731342168064 flags data|raid1
> 2024-12-09T08:48:01.897151+01:00 zefir kernel: [431585.342544] BTRFS
> info (device sda2): found 8 extents, stage: move data extents
> 2024-12-09T08:48:07.949185+01:00 zefir kernel: [431591.393774] BTRFS
> info (device sda2): found 8 extents, stage: update data pointers
> 2024-12-09T08:48:10.169177+01:00 zefir kernel: [431593.614676] BTRFS
> info (device sda2): relocating block group 34723825975296 flags data|raid1
> 2024-12-09T08:48:33.781190+01:00 zefir kernel: [431617.225031] BTRFS
> info (device sda2): found 10 extents, stage: move data extents
> 2024-12-09T08:48:44.353165+01:00 zefir kernel: [431627.799342] BTRFS
> info (device sda2): found 10 extents, stage: update data pointers
> 2024-12-09T08:48:47.453174+01:00 zefir kernel: [431630.899246] BTRFS
> info (device sda2): relocating block group 34721678491648 flags data|raid1
> 
> But unallocated space didn't increase.
> 
> 

Why did you expect it to increase? To free space, balance needs to pack
more extents into fewer chunks. In your case the chunks are nearly full and
the extents are relatively large, so the chunks simply may not have enough
free space to accommodate more extents. You are just moving extents around.

> 
> 
> 
> So I started to play with metadata optimization, that is with musage.
> 
> 
> 
> When I put limit=0, no blocks are reallocated.
> When I put limit=1 or limit=2 always one block is reallocated.
> When I put limit greater then no blocks are reallocated.
> 
> 
> See the test:
> 
> root@zefir:~# for lim in 0 1 2 3 4 5 6; do echo "lim=$lim"; for f in
> $(seq 5); do btrfs bala start -musage=30,limit=$lim /; done; done
> lim=0
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> lim=1
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> lim=2
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> Done, had to relocate 1 out of 8318 chunks
> lim=3
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> lim=4
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> lim=5
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> lim=6
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> Done, had to relocate 0 out of 8318 chunks
> 
> 
> 
> 
> root@zefir:~# btrfs bala start -musage=30,limit=1 /
> Done, had to relocate 1 out of 8318 chunks
> 
> root@zefir:~# dmesg -T | tail
> [Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: start
> -musage=30,limit=6 -susage=30,limit=6
> [Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: ended with
> status: 0
> [Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: start
> -musage=30,limit=6 -susage=30,limit=6
> [Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: ended with
> status: 0
> [Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: start
> -musage=30,limit=6 -susage=30,limit=6
> [Sat Dec 14 18:50:00 2024] BTRFS info (device sdb2): balance: ended with
> status: 0
> [Sat Dec 14 18:50:42 2024] BTRFS info (device sdb2): balance: start
> -musage=30,limit=1 -susage=30,limit=1
> [Sat Dec 14 18:50:42 2024] BTRFS info (device sdb2): relocating block
> group 38091650760704 flags system|raid1
> [Sat Dec 14 18:50:43 2024] BTRFS info (device sdb2): found 91 extents,
> stage: move data extents
> [Sat Dec 14 18:50:44 2024] BTRFS info (device sdb2): balance: ended with
> status: 0
> 
> 
> 
> 
> During those all operations level of Unallocated space is not increasing.

Relocating one chunk simply moves the extents from that chunk to another
location. It does not by itself free any chunk. You can only get more
unallocated space when you are able to pack extents from two (or more)
chunks into one chunk, which is only possible if those chunks are at most
50% full.
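
That constraint can be illustrated with a toy calculation (a sketch only;
real balance does not repack optimally, so this is an upper bound on what
it could ever free):

```python
import math

def chunks_freeable(usages):
    """Upper bound on whole chunks an ideal repack could return to
    unallocated. usages: per-chunk fill fractions, chunk size = 1."""
    total_data = sum(usages)
    # Even a perfect packer needs at least ceil(total_data) chunks.
    return len(usages) - math.ceil(total_data)

# Chunks ~97% full, as in this thread: nothing can be freed.
print(chunks_freeable([0.97] * 10))  # 0
# Chunks at 40% full: more than half of them could be merged away.
print(chunks_freeable([0.40] * 10))  # 6
```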

> What should i do next?
> 
> 

It looks like your filesystem is simply full. Do you have any reason to
believe that is not the case?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-14 18:47 ` Andrei Borzenkov
@ 2024-12-14 20:13   ` Leszek Dubiel
  2024-12-14 21:14     ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Leszek Dubiel @ 2024-12-14 20:13 UTC (permalink / raw)
  To: Andrei Borzenkov, Btrfs BTRFS

>
>
> btrfs filesystem usage -T
>

Overall:
     Device size:          16.28TiB
     Device allocated:          16.24TiB
     Device unallocated:          38.02GiB
     Device missing:             0.00B
     Device slack:             0.00B
     Used:              15.75TiB
     Free (estimated):         264.00GiB    (min: 264.00GiB)
     Free (statfs, df):         257.98GiB
     Data ratio:                  2.00
     Metadata ratio:              2.00
     Global reserve:         512.00MiB    (used: 160.00KiB)
     Multiple profiles:                no

              Data    Metadata System
Id Path      RAID1   RAID1    RAID1    Unallocated Total    Slack
-- --------- ------- -------- -------- ----------- -------- -----
  1 /dev/sdb2 5.39TiB 28.00GiB 32.00MiB    13.00GiB  5.43TiB     -
  2 /dev/sdc2 5.39TiB 20.00GiB        -    13.03GiB  5.43TiB     -
  3 /dev/sda3 5.38TiB 34.00GiB 32.00MiB    12.00GiB  5.43TiB     -
-- --------- ------- -------- -------- ----------- -------- -----
    Total     8.08TiB 41.00GiB 32.00MiB    38.02GiB 16.28TiB 0.00B
    Used      7.84TiB 37.67GiB  1.47MiB



>>
>> But unallocated space didn't increase.
>>
>>
>
> Why did you expect it to increase? To free space balance need to pack 
> more extents into less chunks. In your case chunks are near to full 
> and extents are relatively large, so chunks simply may not have enough 
> free space to accommodate more extents. You just move extents around.
>

OK. I didn't know exactly how chunks relate to extents.





> Relocating one chunk simply moves extents from this chunk to another 
> location. It does not free any chunk. You can only get more 
> unallocated space when you are able to pack extents from two (or more) 
> chunks into one chunk. Which is only possible if chunks are filled to 
> 50%.
>

Thank you for the explanation.




>> What should i do next?
>
> It looks like your filesystem is simply full. Do you have reasons to 
> believe that it is not true?


It is a backup server, so it is expected to be almost full.
I have a procedure that:

— removes old snapshots if free space drops below 250 GB
— starts balancing if there is less than 8 GB of unallocated space on any
disk


It failed this time — there is 258 GB free, but balancing didn't help to
restore unallocated space.
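
For what it's worth, the decision logic of those two procedures boils down
to something like this (a sketch with the thresholds from above; the
function and variable names are mine, and the real scripts of course shell
out to btrfs and the snapshot tooling):

```python
def next_action(free_gb, min_unallocated_gb):
    """Pick the next maintenance step from the two thresholds above."""
    if free_gb < 250:
        return "delete-old-snapshots"   # first procedure
    if min_unallocated_gb < 8:
        return "balance"                # second procedure
    return "idle"

# The stuck state from this thread: 258 GB free, but only a few GB
# unallocated per disk, so it balances over and over with no effect.
print(next_action(258, 5))  # balance
```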
















^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-14 20:13   ` Leszek Dubiel
@ 2024-12-14 21:14     ` Qu Wenruo
  2024-12-16 17:12       ` Leszek Dubiel
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2024-12-14 21:14 UTC (permalink / raw)
  To: Leszek Dubiel, Andrei Borzenkov, Btrfs BTRFS



On 2024/12/15 06:43, Leszek Dubiel wrote:
>>
>>
>> btrfs filesystem usage -T
>>
>
> Overall:
>      Device size:          16.28TiB
>      Device allocated:          16.24TiB
>      Device unallocated:          38.02GiB
>      Device missing:             0.00B
>      Device slack:             0.00B
>      Used:              15.75TiB
>      Free (estimated):         264.00GiB    (min: 264.00GiB)
>      Free (statfs, df):         257.98GiB
>      Data ratio:                  2.00
>      Metadata ratio:              2.00
>      Global reserve:         512.00MiB    (used: 160.00KiB)
>      Multiple profiles:                no
>
>               Data    Metadata System
> Id Path      RAID1   RAID1    RAID1    Unallocated Total    Slack
> -- --------- ------- -------- -------- ----------- -------- -----
>   1 /dev/sdb2 5.39TiB 28.00GiB 32.00MiB    13.00GiB  5.43TiB     -
>   2 /dev/sdc2 5.39TiB 20.00GiB        -    13.03GiB  5.43TiB     -
>   3 /dev/sda3 5.38TiB 34.00GiB 32.00MiB    12.00GiB  5.43TiB     -
> -- --------- ------- -------- -------- ----------- -------- -----
>     Total     8.08TiB 41.00GiB 32.00MiB    38.02GiB 16.28TiB 0.00B
>     Used      7.84TiB 37.67GiB  1.47MiB
>
>
>
>>>
>>> But unallocated space didn't increase.
>>>
>>>
>>
>> Why did you expect it to increase? To free space balance need to pack
>> more extents into less chunks. In your case chunks are near to full
>> and extents are relatively large, so chunks simply may not have enough
>> free space to accommodate more extents. You just move extents around.
>>
>
> Ok. I didn't exactly know what chunks are compared to extents.
>
>
>
>
>
>> Relocating one chunk simply moves extents from this chunk to another
>> location. It does not free any chunk. You can only get more
>> unallocated space when you are able to pack extents from two (or more)
>> chunks into one chunk. Which is only possible if chunks are filled to
>> 50%.
>>
>
> Thank you for explanation.
>
>
>
>
>>> What should i do next?
>>
>> It looks like your filesystem is simply full. Do you have reasons to
>> believe that it is not true?
>
>
> It is backup server. It should be almost full.
> I have a procedure that:
>
> — removes old snapshots if free space is less then 250 GB
> — starts balancing if there is less then 8GB of unallocated space on any
> disk
>
>
> It failed now — there is 258 GB free, but balancing didn't help to
> restore unallocated space.

Because there isn't that much free space to reclaim in the first place.

Your data and metadata chunks are already very highly utilized; you can
increase dusage/musage and retry, but that will only bring marginal
gains, if any.

The only way forward is to start deleting more
snapshots/subvolumes/files/etc.

With more data/metadata space released, try musage/dusage again,
which can then free up some space.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-14 21:14     ` Qu Wenruo
@ 2024-12-16 17:12       ` Leszek Dubiel
  2024-12-16 21:01         ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Leszek Dubiel @ 2024-12-16 17:12 UTC (permalink / raw)
  To: Btrfs BTRFS


>>
>> It failed now — there is 258 GB free, but balancing didn't help to
>> restore unallocated space.
>
> Because there isn't that much free space to reclaim in the first place.
>
> Your data and metadata chunks are already very highly utilized, you can
> increase the dusage/musage and retry, but they will only bring marginal
> gain if any.
>
> The only way to go next is start deleting more
> snapshots/subvolumes/files/etc.
>
> With more data/metadata space released, then try musage/dusage again
> which can free up some space.

This helped.
Thank you.



My system was deleting snapshots automatically when there was less than
250 GB of free space.
That procedure worked fine — 258 GB was free.

The second procedure balanced the filesystem when there was less than 8 GB
of unallocated space.
That procedure couldn't reclaim free space to make it unallocated.




How can I improve this?

Should I tell the first procedure to keep 500 GB of free space? 800 GB?

Or should the procedure look at the percentage of free disk space instead?
What is optimal? Should 90% be the maximum occupancy?

This is a system for backups, so it is expected to be almost full.







^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-16 17:12       ` Leszek Dubiel
@ 2024-12-16 21:01         ` Qu Wenruo
  2024-12-17 21:44           ` Leszek Dubiel
  2025-01-03 22:52           ` Leszek Dubiel
  0 siblings, 2 replies; 11+ messages in thread
From: Qu Wenruo @ 2024-12-16 21:01 UTC (permalink / raw)
  To: Leszek Dubiel, Btrfs BTRFS



On 2024/12/17 03:42, Leszek Dubiel wrote:
> 
>>>
>>> It failed now — there is 258 GB free, but balancing didn't help to
>>> restore unallocated space.
>>
>> Because there isn't that much free space to reclaim in the first place.
>>
>> Your data and metadata chunks are already very highly utilized, you can
>> increase the dusage/musage and retry, but they will only bring marginal
>> gain if any.
>>
>> The only way to go next is start deleting more
>> snapshots/subvolumes/files/etc.
>>
>> With more data/metadata space released, then try musage/dusage again
>> which can free up some space.
> 
> This helped.
> Thank you.
> 
> 
> 
> My system was deleting shapshots automatically if there was less then 
> 250 GB free space.
> This procedure worked fine — 258 GB was free.
> 
> Second procedure was balancing system if there was less then 8GB of 
> Unallocated space.
> This procedure couldn't reclaim free space to makie it unallocated.
> 
> 
> 
> 
> How to improve?
> 
> Tell the first procedure to keep 500GB free space? 800GB?

Increasing the kept-free space will definitely increase the chance of
reclaiming space using balance.

But it only increases the chance; it can never guarantee it.

You can keep 800 GiB of free space, but since your fs has 8 TiB of data
space, in the worst-case scenario where all of that space is freed evenly
among all chunks, each 1 GiB chunk will only have around 100 MiB
(10%) freed.

That is not really going to change the result of balance.
But that is the worst-case scenario.
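
Qu's worst-case arithmetic, spelled out (assuming 1 GiB data chunks and
taking the data space as roughly 8 TiB = 8192 GiB):

```python
def worst_case_chunk_usage(free_gib, data_gib):
    """If freed space is spread perfectly evenly over 1 GiB chunks,
    every chunk keeps this fill fraction."""
    return 1 - free_gib / data_gib

# 800 GiB free spread over 8 TiB of data: every chunk stays ~90% full,
# so even a -dusage=80 balance would match nothing.
usage = worst_case_chunk_usage(800, 8192)
print(round(usage * 100))  # 90
```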


> 
> Or change the procedure to look at percentage of free disk space?
> What is optimal? 90% should be maximum of occupied?
> 
> This is system for backups, so this could be almost full.

If you want to be extra safe, the best solution is to use a tool that can
report the usage percentage of each block group.

You would need a procedure something like this:

start:
	if (unallocated space >= 8GiB)
		return;
check_usage_percentage:
	if (no block group has usage percentage < 30%) {
		delete_files;
		goto check_usage_percentage;
	}
	balance dusage=30
	goto start;
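
As a runnable sketch, the goto-style pseudocode above translates to a loop
like the following; the probe and action callbacks are hypothetical
stand-ins for "btrfs fi usage", the per-block-group tool Qu mentions, file
deletion, and "btrfs balance start -dusage=30":

```python
def reclaim(unallocated_gib, block_group_usages, delete_files, balance):
    """Direct translation of the pseudocode above (callback-based)."""
    while unallocated_gib() < 8:
        # Delete until at least one block group drops below 30% usage,
        # so that a -dusage=30 balance has something to relocate.
        while not any(u < 0.30 for u in block_group_usages()):
            delete_files()
        balance()

# Dry run against a toy in-memory "filesystem": deleting shaves 10% off
# every block group; balancing frees every group below 30% usage.
state = {"unalloc": 2, "bgs": [0.95] * 20}
def delete(): state["bgs"] = [round(u - 0.10, 2) for u in state["bgs"]]
def balance():
    freed = [u for u in state["bgs"] if u < 0.30]
    state["bgs"] = [u for u in state["bgs"] if u >= 0.30]
    state["unalloc"] += len(freed)

reclaim(lambda: state["unalloc"], lambda: state["bgs"], delete, balance)
print(state["unalloc"])
```

In this worst-case toy run every group empties at the same rate, matching
Qu's third concern below: most of the data (here, roughly 74% of it) has
to be deleted before balance can reclaim anything.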

There are some concerns, though. First, the tool: sorry, I don't
remember its name, but there is a tool outside of btrfs-progs that can do
exactly that.

Second, the tool may need to go through all 8000+ block groups, so
just checking the block group usage itself will take some time.
(IIRC, if it uses the tree search ioctl and the fs doesn't have the
block-group-tree feature, it may spend one or two minutes, just like
mounting a large fs.)

Third, since you have no control over where the data is, at the
file-deleting stage you may in the worst case have to delete 70% of your
data so that all block groups drop to 30% usage...

But at least that should be the most precise way to keep your
unallocated space safe...

Thanks,
Qu

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-16 21:01         ` Qu Wenruo
@ 2024-12-17 21:44           ` Leszek Dubiel
  2025-01-03 22:52           ` Leszek Dubiel
  1 sibling, 0 replies; 11+ messages in thread
From: Leszek Dubiel @ 2024-12-17 21:44 UTC (permalink / raw)
  To: Btrfs BTRFS

>
> You need something procedure like this:
>
> start:
>     if (unallocated space >= 8GiB)
>         return;
> check_usage_percentage:
>     if (no block group has usage percentage < 30%) {
>         delete_files;
>         goto check_usage_percentage;
>     }
>     balance dusage=30
>     goto start;
>
> Although there are some concerns, firstly the tool, sorry I didn't 
> remember the name but there is an out-of-btrfs-progs tool can do 
> exactly that.


Thank you for the hints. I'll search for that program and try it.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2024-12-16 21:01         ` Qu Wenruo
  2024-12-17 21:44           ` Leszek Dubiel
@ 2025-01-03 22:52           ` Leszek Dubiel
  2025-01-04  5:32             ` Andrei Borzenkov
  1 sibling, 1 reply; 11+ messages in thread
From: Leszek Dubiel @ 2025-01-03 22:52 UTC (permalink / raw)
  To: Btrfs BTRFS










On 16.12.2024 at 22:01, Qu Wenruo wrote:



 > If you want to be extra safe, the best solution is to use tools that 
can report the usage percentage of each block group.
 >
 > You need something procedure like this:
 >
 > start:
 >     if (unallocated space >= 8GiB)
 >         return;
 > check_usage_percentage:
 >     if (no block group has usage percentage < 30%) {
 >         delete_files;
 >         goto check_usage_percentage;
 >     }
 >     balance dusage=30
 >     goto start;
 >
 > Although there are some concerns, firstly the tool, sorry I didn't 
remember the name but there is an out-of-btrfs-progs tool can do exactly 
that.

In the btrfs-progs package I didn't find any such tool.

There is "btrfsmaintenance" by kdave:

                      https://github.com/kdave/btrfsmaintenance

but it starts a normal balance; it doesn't analyze the block-group usage
percentage.




 >> How to improve?
 >>
 >> Tell the first procedure to keep 500GB free space? 800GB?
 >
 > Increasing the free space will definitely increase the chance of 
reclaiming space using balance.
 >
 > But that's only increasing the chance, never to ensure that.
 >
 > As you can keep 800GiB free space, but since your fs have 8TiB data 
space, for the worst case scenario where all space are freed evenly 
among all chunks, it means each 1GiB chunk will only have around 100MiB 
(10%) freed.
 >
 > That is not really going to chance the result of balance.
 > But that's for the worst case scenario.


I think I'm hitting that worst-case scenario again and again.

It looks as if my BTRFS filesystem always drifts towards the situation
where "all space is freed evenly among all chunks".





I wrote a script that ensures:

— at least 200 GB of free disk space
— at least 5% free disk space

Then another script balances btrfs when "Unallocated" space runs low.




It got stuck:

# df -h ./sdc3/

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc3       2.7T  2.5T  226G  92% /mnt/leszek/sdc3


# btrfs filesystem usage ./sdc3 -T

Overall:
     Device size:           2.70TiB
     Device allocated:           2.68TiB
     Device unallocated:          19.45GiB
     Device missing:             0.00B
     Device slack:           3.50KiB
     Used:               2.47TiB
     Free (estimated):         225.94GiB    (min: 216.21GiB)
     Free (statfs, df):         225.87GiB
     Data ratio:                  1.00
     Metadata ratio:              2.00
     Global reserve:         512.00MiB    (used: 32.00KiB)
     Multiple profiles:                no

              Data    Metadata System
Id Path      single  DUP      DUP       Unallocated Total   Slack
-- --------- ------- -------- --------- ----------- ------- -------
  1 /dev/sdc3 2.64TiB 36.00GiB  64.00MiB    19.45GiB 2.70TiB 3.50KiB
-- --------- ------- -------- --------- ----------- ------- -------
    Total     2.64TiB 18.00GiB  32.00MiB    19.45GiB 2.70TiB 3.50KiB
    Used      2.44TiB 17.08GiB 432.00KiB





I have tried different combinations of musage, dusage...



btrfs balance start -dusage=30,limit=5 ./sdc3/

btrfs balance start -dusage=50,limit=5 ./sdc3/

btrfs balance start -dusage=90,limit=5 ./sdc3/

btrfs balance start -dusage=99,limit=5 ./sdc3/

[Fri Jan  3 23:06:57 2025] BTRFS info (device sdc3): balance: start 
-dusage=99,limit=5
[Fri Jan  3 23:06:57 2025] BTRFS info (device sdc3): relocating block 
group 10938102710272 flags data
[Fri Jan  3 23:07:20 2025] BTRFS info (device sdc3): found 10 extents, 
stage: move data extents
[Fri Jan  3 23:07:22 2025] BTRFS info (device sdc3): found 10 extents, 
stage: update data pointers
[Fri Jan  3 23:07:22 2025] BTRFS info (device sdc3): relocating block 
group 10937028968448 flags data
[Fri Jan  3 23:07:43 2025] BTRFS info (device sdc3): found 11 extents, 
stage: move data extents
[Fri Jan  3 23:07:44 2025] BTRFS info (device sdc3): found 11 extents, 
stage: update data pointers
[Fri Jan  3 23:07:45 2025] BTRFS info (device sdc3): relocating block 
group 10935955226624 flags data
[Fri Jan  3 23:08:04 2025] BTRFS info (device sdc3): found 9 extents, 
stage: move data extents
[Fri Jan  3 23:08:06 2025] BTRFS info (device sdc3): found 9 extents, 
stage: update data pointers





btrfs balance start -musage=30,limit=5 ./sdc3/

btrfs balance start -musage=50,limit=5 ./sdc3/

btrfs balance start -musage=90,limit=5 ./sdc3/

btrfs balance start -musage=99,limit=5 ./sdc3/


What is strange about "musage"?

When I run with "musage", btrfs finds "24 extents", but only for "system|dup".

[Fri Jan  3 22:30:59 2025] BTRFS info (device sdc3): balance: start 
-musage=30,limit=1 -susage=30,limit=1
[Fri Jan  3 22:30:59 2025] BTRFS info (device sdc3): relocating block 
group 10975113707520 flags system|dup
[Fri Jan  3 22:31:02 2025] BTRFS info (device sdc3): found 24 extents, 
stage: move data extents
[Fri Jan  3 22:31:14 2025] BTRFS info (device sdc3): balance: ended with 
status: 0
[Fri Jan  3 22:31:20 2025] BTRFS info (device sdc3): balance: start 
-dusage=50,limit=1
[Fri Jan  3 22:31:20 2025] BTRFS info (device sdc3): relocating block 
group 10975168233472 flags data
[Fri Jan  3 22:31:20 2025] BTRFS info (device sdc3): balance: ended with 
status: 0
[Fri Jan  3 22:31:21 2025] BTRFS info (device sdc3): balance: start 
-musage=50,limit=1 -susage=50,limit=1
[Fri Jan  3 22:31:21 2025] BTRFS info (device sdc3): relocating block 
group 10975159844864 flags system|dup
[Fri Jan  3 22:31:26 2025] BTRFS info (device sdc3): found 24 extents, 
stage: move data extents
[Fri Jan  3 22:31:28 2025] BTRFS info (device sdc3): balance: ended with 
status: 0



Only sometimes is there a "metadata|dup" group, and then the number of 
"found extents" is huge (!!!), in the tens of thousands:

[Fri Jan  3 18:09:49 2025] BTRFS info (device sdc3): balance: start 
-musage=90,limit=3 -susage=90,limit=3
[Fri Jan  3 18:09:49 2025] BTRFS info (device sdc3): relocating block 
group 10632086290432 flags metadata|dup
[Fri Jan  3 18:11:48 2025] BTRFS info (device sdc3): found 21714 
extents, stage: move data extents
^^^
[Fri Jan  3 18:11:52 2025] BTRFS info (device sdc3): relocating block 
group 2695122386944 flags metadata|dup
[Fri Jan  3 18:49:37 2025] BTRFS info (device sdc3): found 47432 
extents, stage: move data extents
[Fri Jan  3 18:51:06 2025] BTRFS info (device sdc3): balance: ended with 
status: 0





My command line uses only "musage":

             btrfs balance start -musage=99,limit=1 ./sdc3/

but it relocates both "system|dup" and "metadata|dup" block groups.

[Fri Jan  3 21:52:55 2025] BTRFS info (device sdc3): balance: start 
-musage=99,limit=1 -susage=99,limit=1
[Fri Jan  3 21:52:55 2025] BTRFS info (device sdc3): relocating block 
group 10945618903040 flags system|dup
[Fri Jan  3 21:52:59 2025] BTRFS info (device sdc3): found 24 extents, 
stage: move data extents
[Fri Jan  3 21:53:00 2025] BTRFS info (device sdc3): relocating block 
group 10939176452096 flags metadata|dup
[Fri Jan  3 21:58:04 2025] BTRFS info (device sdc3): found 22488 
extents, stage: move data extents
[Fri Jan  3 21:59:02 2025] BTRFS info (device sdc3): balance: ended with 
status: 0

22488 found extents?




With musage=30:

[Fri Jan  3 22:54:35 2025] BTRFS info (device sdc3): balance: start 
-musage=30,limit=1 -susage=30,limit=1
[Fri Jan  3 22:54:35 2025] BTRFS info (device sdc3): relocating block 
group 10977475100672 flags metadata|dup
[Fri Jan  3 22:55:09 2025] BTRFS info (device sdc3): found 11710 
extents, stage: move data extents
[Fri Jan  3 22:55:11 2025] BTRFS info (device sdc3): relocating block 
group 10975306645504 flags system|dup
[Fri Jan  3 22:55:14 2025] BTRFS info (device sdc3): found 26 extents, 
stage: move data extents
[Fri Jan  3 22:55:18 2025] BTRFS info (device sdc3): balance: ended with 
status: 0

11710 found extents for musage=30?




I know that the standard solution is to

1. free space
2. run a balance



But now I have 226 GB of free space, the disk is only 92% full, and yet 
there is only 19 GiB of unallocated space.


How can I reclaim unallocated space out of those 226 GB of free space?














^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 97% full system, dusage didn't help, musage strange
  2025-01-03 22:52           ` Leszek Dubiel
@ 2025-01-04  5:32             ` Andrei Borzenkov
  2025-01-04  7:11               ` Leszek Dubiel
  0 siblings, 1 reply; 11+ messages in thread
From: Andrei Borzenkov @ 2025-01-04  5:32 UTC (permalink / raw)
  To: Leszek Dubiel, Btrfs BTRFS

04.01.2025 01:52, Leszek Dubiel wrote:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On 16.12.2024 at 22:01, Qu Wenruo wrote:
> 
> 
> 
>   > If you want to be extra safe, the best solution is to use tools that
> can report the usage percentage of each block group.
>   >
>   > You need something procedure like this:
>   >
>   > start:
>   >     if (unallocated space >= 8GiB)
>   >         return;
>   > check_usage_percentage:
>   >     if (no block group has usage percentage < 30%) {
>   >         delete_files;
>   >         goto check_usage_percentage;
>   >     }
>   >     balance dusage=30
>   >     goto start;
>   >
>   > Although there are some concerns. Firstly, the tool: sorry, I don't
> remember the name, but there is a tool outside of btrfs-progs that can do
> exactly that.
> 
> In the btrfs-progs package I didn't find any such tool.
> 
> There is "btrfs maintenance" by kdave:
> 
>                        https://github.com/kdave/btrfsmaintenance
> 
> but it starts a normal balance; it doesn't analyze "block usage percentage".
> 

https://github.com/knorrie/btrfs-heatmap

https://github.com/knorrie/python-btrfs

The latter is a general Python library for working with btrfs, shipped with 
various (sample) tools such as btrfs-balance-least-used or btrfs-usage-report.
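As an aside, Qu's quoted goto-based procedure reads more naturally as a loop. Here is a minimal Python sketch; the `fs` adapter object and all of its method names are hypothetical stand-ins for `btrfs filesystem usage`, per-block-group inspection (such as python-btrfs provides), file deletion, and `btrfs balance start -dusage=N`.

```python
def maintain_unallocated(fs, min_unallocated_gib=8.0, usage_threshold=30):
    """Loop form of the quoted procedure: balance low-usage block groups
    until enough space is unallocated again, deleting files whenever no
    block group is sparse enough for a balance to repack."""
    while fs.unallocated_gib() < min_unallocated_gib:
        # Balance only helps when some block group is mostly empty;
        # otherwise free space first so that one drops below the threshold.
        while not fs.has_block_group_below(usage_threshold):
            fs.delete_files()
        fs.balance(dusage=usage_threshold)
```

The key point the procedure encodes is that balance cannot create unallocated space out of fully-used block groups; deleting files is what makes some block groups sparse enough for balance to consolidate them.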



* Re: 97% full system, dusage didn't help, musage strange
  2025-01-04  5:32             ` Andrei Borzenkov
@ 2025-01-04  7:11               ` Leszek Dubiel
  0 siblings, 0 replies; 11+ messages in thread
From: Leszek Dubiel @ 2025-01-04  7:11 UTC (permalink / raw)
  To: Btrfs BTRFS



>>
>>   > If you want to be extra safe, the best solution is to use tools that
>> can report the usage percentage of each block group.
>>   >
>>   > You need something procedure like this:
>>   >
>>   > start:
>>   >     if (unallocated space >= 8GiB)
>>   >         return;
>>   > check_usage_percentage:
>>   >     if (no block group has usage percentage < 30%) {
>>   >         delete_files;
>>   >         goto check_usage_percentage;
>>   >     }
>>   >     balance dusage=30
>>   >     goto start;
>>   >
>>   > Although there are some concerns. Firstly, the tool: sorry, I don't
>> remember the name, but there is a tool outside of btrfs-progs that can do
>> exactly that.
>>
>> In the btrfs-progs package I didn't find any such tool.
>>
>> There is "btrfs maintenance" by kdave:
>>
>>                        https://github.com/kdave/btrfsmaintenance
>>
>> but it starts a normal balance; it doesn't analyze "block usage 
>> percentage".
>>
>
> https://github.com/knorrie/btrfs-heatmap
>
> https://github.com/knorrie/python-btrfs
>
> The latter is a general Python library for working with btrfs, shipped with 
> various (sample) tools such as btrfs-balance-least-used or btrfs-usage-report.
>

OK, thank you for the support. I will look into those tools and try them.



PS.

For other people searching for this information:



— on GitHub, "'no space left on device' despite only 59% fill", with good 
help from kakra:

https://github.com/btrfs/btrfs-todo/issues/61



— an older post on Marc's Blog, "Fixing Btrfs Filesystem Full Problems":

https://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html



— a Super User thread covering clear_cache and skip_balance:

https://superuser.com/questions/1419067/btrfs-root-no-space-left-on-device-auto-remount-read-only-cant-balance-cant












end of thread

Thread overview: 11+ messages
2024-12-14 17:55 97% full system, dusage didn't help, musage strange Leszek Dubiel
2024-12-14 18:35 ` Roman Mamedov
2024-12-14 18:47 ` Andrei Borzenkov
2024-12-14 20:13   ` Leszek Dubiel
2024-12-14 21:14     ` Qu Wenruo
2024-12-16 17:12       ` Leszek Dubiel
2024-12-16 21:01         ` Qu Wenruo
2024-12-17 21:44           ` Leszek Dubiel
2025-01-03 22:52           ` Leszek Dubiel
2025-01-04  5:32             ` Andrei Borzenkov
2025-01-04  7:11               ` Leszek Dubiel
