* clear space cache v1 fails with Unable to find block group for 0
From: j4nn @ 2024-12-08 16:02 UTC
To: linux-btrfs
Hi,

I am trying to switch an 8TB raid1 btrfs from space cache v1 to v2,
but clearing the v1 space cache fails as follows:
gentoo ~ # btrfs filesystem df /mnt/data
Data, RAID1: total=7.36TiB, used=7.00TiB
System, RAID1: total=64.00MiB, used=1.11MiB
Metadata, RAID1: total=63.00GiB, used=57.37GiB
Metadata, DUP: total=5.00GiB, used=1.18GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'
WARNING: Metadata: raid1, dup
gentoo ~ # umount /mnt/data
gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
ERROR: failed to clear free space cache
extent buffer leak: start 9587384418304 len 16384
real 7m8.174s
user 0m6.883s
sys 0m9.322s
Here is some info:
gentoo ~ # uname -a
Linux gentoo 6.12.3-gentoo-x86_64 #1 SMP PREEMPT_DYNAMIC Sun Dec 8 00:12:56 CET 2024 x86_64 AMD Ryzen 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
gentoo ~ # btrfs --version
btrfs-progs v6.12
-EXPERIMENTAL -INJECT -STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin
gentoo ~ # btrfs filesystem show /mnt/data
Label: 'rdata' uuid: 1dfac20a-3f84-4149-aba0-f160ab633373
Total devices 2 FS bytes used 7.06TiB
devid 1 size 8.00TiB used 7.26TiB path /dev/mapper/wdrb-bdata
devid 2 size 8.00TiB used 7.25TiB path /dev/mapper/wdrc-cdata
gentoo ~ # dmesg | tail -n 6
[31008.980706] BTRFS info (device dm-0): first mount of filesystem 1dfac20a-3f84-4149-aba0-f160ab633373
[31008.980726] BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm
[31008.980731] BTRFS info (device dm-0): disk space caching is enabled
[31008.980734] BTRFS warning (device dm-0): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2
[31009.994687] BTRFS info (device dm-0): bdev /dev/mapper/wdrb-bdata errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[31009.994696] BTRFS info (device dm-0): bdev /dev/mapper/wdrc-cdata errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
A scrub completed (and corrected 4 errors), and btrfs check completed
without errors:
gentoo ~ # btrfs scrub status /mnt/data
UUID: 1dfac20a-3f84-4149-aba0-f160ab633373
Scrub started: Fri Dec 6 13:12:36 2024
Status: finished
Duration: 16:11:22
Total to scrub: 14.92TiB
Rate: 268.35MiB/s
Error summary: verify=4
Corrected: 4
Uncorrectable: 0
Unverified: 0
gentoo ~ # umount /mnt/data
gentoo ~ # time btrfs check -p /dev/mapper/wdrb-bdata
Opening filesystem to check...
Checking filesystem on /dev/mapper/wdrb-bdata
UUID: 1dfac20a-3f84-4149-aba0-f160ab633373
[1/7] checking root items (0:06:57 elapsed, 34253945 items checked)
[2/7] checking extents (0:23:08 elapsed, 3999596 items checked)
[3/7] checking free space cache (0:04:25 elapsed, 7868 items checked)
[4/7] checking fs roots (1:03:46 elapsed, 3215533 items checked)
[5/7] checking csums (without verifying data) (0:11:58 elapsed, 15418322 items checked)
[6/7] checking root refs (0:00:00 elapsed, 52 items checked)
[7/7] checking quota groups skipped (not enabled on this FS)
[7/7] checking quota groups skipped (not enabled on this FS)
found 8199989936128 bytes used, no error found
total csum bytes: 7940889876
total tree bytes: 65528446976
total fs tree bytes: 52856799232
total extent tree bytes: 3578331136
btree space waste bytes: 10797983857
file data blocks allocated: 21632555483136
referenced 9547690319872
real 111m10.370s
user 10m28.442s
sys 6m44.888s
I tried a balance, following an example I found posted, though I was
not sure whether it would help:
gentoo ~ # btrfs balance start -dusage=10 /mnt/data
Done, had to relocate 32 out of 7467 chunks
gentoo ~ # btrfs filesystem df /mnt/data
Data, RAID1: total=7.19TiB, used=7.00TiB
System, RAID1: total=64.00MiB, used=1.08MiB
Metadata, RAID1: total=63.00GiB, used=57.36GiB
Metadata, DUP: total=5.00GiB, used=1.18GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
WARNING: Multiple block group profiles detected, see 'man btrfs(5)'
WARNING: Metadata: raid1, dup
But it did not help:
gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
ERROR: failed to clear free space cache
extent buffer leak: start 7995086045184 len 16384
real 6m58.515s
user 0m6.270s
sys 0m9.586s
Any idea how to fix this?
Thanks.

* Re: clear space cache v1 fails with Unable to find block group for 0
From: Qu Wenruo @ 2024-12-08 20:26 UTC
To: j4nn, linux-btrfs

On 2024/12/9 02:32, j4nn wrote:
> Hi,
>
> I am trying to switch an 8TB raid1 btrfs from space cache v1 to v2,
> but clearing the v1 space cache fails as follows:
>
> gentoo ~ # btrfs filesystem df /mnt/data
> Data, RAID1: total=7.36TiB, used=7.00TiB
> System, RAID1: total=64.00MiB, used=1.11MiB
> Metadata, RAID1: total=63.00GiB, used=57.37GiB
> Metadata, DUP: total=5.00GiB, used=1.18GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> WARNING: Multiple block group profiles detected, see 'man btrfs(5)'
> WARNING: Metadata: raid1, dup
> gentoo ~ # umount /mnt/data
>
> gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0

This is a common indicator of -ENOSPC.

But according to the fi df output, we should have quite a lot of
metadata space left.

The only concern is the DUP metadata, which may cause the space
reservation code in progs not to work.

Have you tried converting the DUP metadata first?

And please share the `btrfs fi usage` output.

> ERROR: failed to clear free space cache
> extent buffer leak: start 9587384418304 len 16384
>
> real 7m8.174s
> user 0m6.883s
> sys 0m9.322s
>
> Here is some info:
>
> gentoo ~ # uname -a
> Linux gentoo 6.12.3-gentoo-x86_64 #1 SMP PREEMPT_DYNAMIC Sun Dec 8 00:12:56 CET 2024 x86_64 AMD Ryzen 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
> gentoo ~ # btrfs --version
> btrfs-progs v6.12
> -EXPERIMENTAL -INJECT -STATIC +LZO +ZSTD +UDEV +FSVERITY +ZONED CRYPTO=builtin
> gentoo ~ # btrfs filesystem show /mnt/data
> Label: 'rdata' uuid: 1dfac20a-3f84-4149-aba0-f160ab633373
> Total devices 2 FS bytes used 7.06TiB
> devid 1 size 8.00TiB used 7.26TiB path /dev/mapper/wdrb-bdata
> devid 2 size 8.00TiB used 7.25TiB path /dev/mapper/wdrc-cdata
> gentoo ~ # dmesg | tail -n 6
> [31008.980706] BTRFS info (device dm-0): first mount of filesystem 1dfac20a-3f84-4149-aba0-f160ab633373
> [31008.980726] BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm
> [31008.980731] BTRFS info (device dm-0): disk space caching is enabled
> [31008.980734] BTRFS warning (device dm-0): space cache v1 is being deprecated and will be removed in a future release, please use -o space_cache=v2
> [31009.994687] BTRFS info (device dm-0): bdev /dev/mapper/wdrb-bdata errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> [31009.994696] BTRFS info (device dm-0): bdev /dev/mapper/wdrc-cdata errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
>
> A scrub completed (and corrected 4 errors), and btrfs check completed
> without errors:
>
> gentoo ~ # btrfs scrub status /mnt/data
> UUID: 1dfac20a-3f84-4149-aba0-f160ab633373
> Scrub started: Fri Dec 6 13:12:36 2024
> Status: finished
> Duration: 16:11:22
> Total to scrub: 14.92TiB
> Rate: 268.35MiB/s
> Error summary: verify=4
> Corrected: 4
> Uncorrectable: 0
> Unverified: 0
> gentoo ~ # umount /mnt/data
>
> gentoo ~ # time btrfs check -p /dev/mapper/wdrb-bdata
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/wdrb-bdata
> UUID: 1dfac20a-3f84-4149-aba0-f160ab633373
> [1/7] checking root items (0:06:57 elapsed, 34253945 items checked)
> [2/7] checking extents (0:23:08 elapsed, 3999596 items checked)
> [3/7] checking free space cache (0:04:25 elapsed, 7868 items checked)
> [4/7] checking fs roots (1:03:46 elapsed, 3215533 items checked)
> [5/7] checking csums (without verifying data) (0:11:58 elapsed, 15418322 items checked)
> [6/7] checking root refs (0:00:00 elapsed, 52 items checked)
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 8199989936128 bytes used, no error found
> total csum bytes: 7940889876
> total tree bytes: 65528446976
> total fs tree bytes: 52856799232
> total extent tree bytes: 3578331136
> btree space waste bytes: 10797983857
> file data blocks allocated: 21632555483136
> referenced 9547690319872
>
> real 111m10.370s
> user 10m28.442s
> sys 6m44.888s
>
> I tried a balance, following an example I found posted, though I was
> not sure whether it would help:
>
> gentoo ~ # btrfs balance start -dusage=10 /mnt/data
> Done, had to relocate 32 out of 7467 chunks

The balance doesn't do much; the overall chunk layout is still the same.

> gentoo ~ # btrfs filesystem df /mnt/data
> Data, RAID1: total=7.19TiB, used=7.00TiB
> System, RAID1: total=64.00MiB, used=1.08MiB
> Metadata, RAID1: total=63.00GiB, used=57.36GiB
> Metadata, DUP: total=5.00GiB, used=1.18GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> WARNING: Multiple block group profiles detected, see 'man btrfs(5)'
> WARNING: Metadata: raid1, dup
>
> But it did not help:
>
> gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0
> ERROR: failed to clear free space cache
> extent buffer leak: start 7995086045184 len 16384
>
> real 6m58.515s
> user 0m6.270s
> sys 0m9.586s

Migrating to the v2 cache doesn't really require manually clearing the
v1 cache.

Just mounting with the "space_cache=v2" option will automatically purge
the v1 cache, as explained in the man page:

  If v2 is enabled, the v1 space cache will be cleared (at the first
  mount)

If you want to dig deeper, the implementation is in
btrfs_set_free_space_cache_v1_active(), which calls
cleanup_free_space_cache_v1() if @active is false.

Thanks,
Qu

> Any idea how to fix this?
> Thanks.
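
For reference, the mount-based migration Qu describes can be sketched
as follows; this is a minimal outline, not a sequence taken from the
thread itself - the device and mount point are reused from the report
above, and the dump-super grep is just one way to verify the result:

    # the first mount with the v2 option builds the free space tree and
    # clears the v1 cache; later mounts no longer need the option
    mount -o space_cache=v2 /dev/mapper/wdrb-bdata /mnt/data

    # confirm the free space tree (v2 cache) is now active
    dmesg | grep -i 'free.space.tree'
    btrfs inspect-internal dump-super -f /dev/mapper/wdrb-bdata | grep -i free_space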

* Re: clear space cache v1 fails with Unable to find block group for 0
From: j4nn @ 2024-12-08 21:25 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Sun, 8 Dec 2024 at 21:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2024/12/9 02:32, j4nn wrote:
> > gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
> > Unable to find block group for 0
> > Unable to find block group for 0
> > Unable to find block group for 0
>
> This is a common indicator of -ENOSPC.
>
> But according to the fi df output, we should have quite a lot of
> metadata space left.
>
> The only concern is the DUP metadata, which may cause the space
> reservation code in progs not to work.
>
> Have you tried converting the DUP metadata first?

I am not sure how to do that.
I see the "Multiple block group profiles detected" warning and assumed
it is about metadata in RAID1 and DUP.
But I am not sure how that got created or whether it has any benefit.
And what should that DUP be converted into?

> And please share the `btrfs fi usage` output.

gentoo ~ # btrfs fi usage /mnt/data
Overall:
    Device size:                  16.00TiB
    Device allocated:             14.51TiB
    Device unallocated:            1.48TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         14.18TiB
    Free (estimated):            923.95GiB  (min: 923.95GiB)
    Free (statfs, df):           918.95GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)
    Multiple profiles:                 yes  (metadata)

Data,RAID1: Size:7.19TiB, Used:7.03TiB (97.77%)
   /dev/mapper/wdrb-bdata   7.19TiB
   /dev/mapper/wdrc-cdata   7.19TiB

Metadata,RAID1: Size:63.00GiB, Used:58.56GiB (92.95%)
   /dev/mapper/wdrb-bdata  63.00GiB
   /dev/mapper/wdrc-cdata  63.00GiB

Metadata,DUP: Size:5.00GiB, Used:1.18GiB (23.60%)
   /dev/mapper/wdrb-bdata  10.00GiB

System,RAID1: Size:32.00MiB, Used:1.08MiB (3.37%)
   /dev/mapper/wdrb-bdata  32.00MiB
   /dev/mapper/wdrc-cdata  32.00MiB

Unallocated:
   /dev/mapper/wdrb-bdata 755.00GiB
   /dev/mapper/wdrc-cdata 765.00GiB

gentoo ~ # lvs
  LV    VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  bdata wdrb -wi-ao---- 8.00t
  cdata wdrc -wi-ao---- 8.00t
gentoo ~ # vgs
  VG   #PV #LV #SN Attr   VSize  VFree
  wdrb   1   1   0 wz--n- <9.10t <1.10t
  wdrc   1   3   0 wz--n-  9.09t     0
gentoo ~ # pvs
  PV        VG   Fmt  Attr PSize  PFree
  /dev/sdb1 wdrc lvm2 a--   9.09t     0
  /dev/sdd1 wdrb lvm2 a--  <9.10t <1.10t

> > I tried a balance, following an example I found posted, though I was
> > not sure whether it would help:
> >
> > gentoo ~ # btrfs balance start -dusage=10 /mnt/data
> > Done, had to relocate 32 out of 7467 chunks
>
> The balance doesn't do much; the overall chunk layout is still the same.
>
> > gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
> > Unable to find block group for 0
> > Unable to find block group for 0
> > Unable to find block group for 0
> > ERROR: failed to clear free space cache
> > extent buffer leak: start 7995086045184 len 16384
>
> Migrating to the v2 cache doesn't really require manually clearing the
> v1 cache.
>
> Just mounting with the "space_cache=v2" option will automatically purge
> the v1 cache, as explained in the man page:
>
>   If v2 is enabled, the v1 space cache will be cleared (at the first
>   mount)
>
> If you want to dig deeper, the implementation is in
> btrfs_set_free_space_cache_v1_active(), which calls
> cleanup_free_space_cache_v1() if @active is false.

Ok, I just followed a howto for the switch.
I did not know the mount option alone is enough.
Should it be safe to try that even though I get the errors from
"btrfs rescue clear-space-cache v1"?

Thank you.

* Re: clear space cache v1 fails with Unable to find block group for 0
From: Qu Wenruo @ 2024-12-08 21:36 UTC
To: j4nn; +Cc: linux-btrfs

On 2024/12/9 07:55, j4nn wrote:
> On Sun, 8 Dec 2024 at 21:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2024/12/9 02:32, j4nn wrote:
>>> gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
>>> Unable to find block group for 0
>>> Unable to find block group for 0
>>> Unable to find block group for 0
>>
>> This is a common indicator of -ENOSPC.
>>
>> But according to the fi df output, we should have quite a lot of
>> metadata space left.
>>
>> The only concern is the DUP metadata, which may cause the space
>> reservation code in progs not to work.
>>
>> Have you tried converting the DUP metadata first?
>
> I am not sure how to do that.
> I see the "Multiple block group profiles detected" warning and assumed
> it is about metadata in RAID1 and DUP.
> But I am not sure how that got created or whether it has any benefit.
> And what should that DUP be converted into?

I am not sure either. But my guess is that in the past you mounted the
filesystem with one disk missing and did some writes.
Those writes incidentally created a new chunk, and since only one
writable disk was visible at that point, the new chunk went DUP.

To remove it, you need a specific balance filter, e.g.:

# btrfs balance start -mprofiles=dup,convert=raid1 /mnt/data

>> And please share the `btrfs fi usage` output.
>
> gentoo ~ # btrfs fi usage /mnt/data
> Overall:
>     Device size:                  16.00TiB
>     Device allocated:             14.51TiB
>     Device unallocated:            1.48TiB
>     Device missing:                  0.00B
>     Device slack:                    0.00B
>     Used:                         14.18TiB
>     Free (estimated):            923.95GiB  (min: 923.95GiB)
>     Free (statfs, df):           918.95GiB
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB  (used: 0.00B)
>     Multiple profiles:                 yes  (metadata)
>
> Data,RAID1: Size:7.19TiB, Used:7.03TiB (97.77%)
>    /dev/mapper/wdrb-bdata   7.19TiB
>    /dev/mapper/wdrc-cdata   7.19TiB
>
> Metadata,RAID1: Size:63.00GiB, Used:58.56GiB (92.95%)
>    /dev/mapper/wdrb-bdata  63.00GiB
>    /dev/mapper/wdrc-cdata  63.00GiB
>
> Metadata,DUP: Size:5.00GiB, Used:1.18GiB (23.60%)
>    /dev/mapper/wdrb-bdata  10.00GiB
>
> System,RAID1: Size:32.00MiB, Used:1.08MiB (3.37%)
>    /dev/mapper/wdrb-bdata  32.00MiB
>    /dev/mapper/wdrc-cdata  32.00MiB
>
> Unallocated:
>    /dev/mapper/wdrb-bdata 755.00GiB
>    /dev/mapper/wdrc-cdata 765.00GiB

You have more than enough space to remove the DUP chunks.

> gentoo ~ # lvs
>   LV    VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   bdata wdrb -wi-ao---- 8.00t
>   cdata wdrc -wi-ao---- 8.00t
> gentoo ~ # vgs
>   VG   #PV #LV #SN Attr   VSize  VFree
>   wdrb   1   1   0 wz--n- <9.10t <1.10t
>   wdrc   1   3   0 wz--n-  9.09t     0
> gentoo ~ # pvs
>   PV        VG   Fmt  Attr PSize  PFree
>   /dev/sdb1 wdrc lvm2 a--   9.09t     0
>   /dev/sdd1 wdrb lvm2 a--  <9.10t <1.10t
>
>>> I tried a balance, following an example I found posted, though I was
>>> not sure whether it would help:
>>>
>>> gentoo ~ # btrfs balance start -dusage=10 /mnt/data
>>> Done, had to relocate 32 out of 7467 chunks
>>
>> The balance doesn't do much; the overall chunk layout is still the same.
>>
>>> gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
>>> Unable to find block group for 0
>>> Unable to find block group for 0
>>> Unable to find block group for 0
>>> ERROR: failed to clear free space cache
>>> extent buffer leak: start 7995086045184 len 16384
>>
>> Migrating to the v2 cache doesn't really require manually clearing the
>> v1 cache.
>>
>> Just mounting with the "space_cache=v2" option will automatically purge
>> the v1 cache, as explained in the man page:
>>
>>   If v2 is enabled, the v1 space cache will be cleared (at the first
>>   mount)
>>
>> If you want to dig deeper, the implementation is in
>> btrfs_set_free_space_cache_v1_active(), which calls
>> cleanup_free_space_cache_v1() if @active is false.
>
> Ok, I just followed a howto for the switch.
> I did not know the mount option alone is enough.
> Should it be safe to try that even though I get the errors from
> "btrfs rescue clear-space-cache v1"?

Since progs and the kernel have different implementations of the space
reservation code, it's not that rare to hit cases where btrfs-progs
raises false alerts.

If the balance removed the DUP profile, then you can try "btrfs rescue"
again, just to see if it works; I would really appreciate the extra
feedback to help debug the progs bug.

Otherwise I believe it should be pretty safe to just use the
"space_cache=v2" mount option.

Thanks,
Qu

> Thank you.
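
As a sketch of the same kind of conversion with the generic 'convert'
filter: the 'soft' modifier is a standard balance option that skips
chunks already in the target profile, so an interrupted run can be
restarted without rewriting everything. The paths below are the ones
used in this thread; this is an illustration, not a command Qu gave:

    # convert any remaining DUP metadata chunks to raid1; 'soft' skips
    # chunks that already have the target profile
    btrfs balance start -mconvert=raid1,soft /mnt/data

    # afterwards the multiple-profiles warning should be gone
    btrfs filesystem usage /mnt/data | grep -i 'multiple profiles'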

* Re: clear space cache v1 fails with Unable to find block group for 0
From: j4nn @ 2024-12-08 22:19 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Sun, 8 Dec 2024 at 22:36, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2024/12/9 07:55, j4nn wrote:
> > On Sun, 8 Dec 2024 at 21:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >> On 2024/12/9 02:32, j4nn wrote:
> >>> gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
> >>> Unable to find block group for 0
> >>> Unable to find block group for 0
> >>> Unable to find block group for 0
> >>
> >> This is a common indicator of -ENOSPC.
> >>
> >> But according to the fi df output, we should have quite a lot of
> >> metadata space left.
> >>
> >> The only concern is the DUP metadata, which may cause the space
> >> reservation code in progs not to work.
> >>
> >> Have you tried converting the DUP metadata first?
> >
> > I am not sure how to do that.
> > I see the "Multiple block group profiles detected" warning and assumed
> > it is about metadata in RAID1 and DUP.
> > But I am not sure how that got created or whether it has any benefit.
> > And what should that DUP be converted into?
>
> I am not sure either. But my guess is that in the past you mounted the
> filesystem with one disk missing and did some writes.
> Those writes incidentally created a new chunk, and since only one
> writable disk was visible at that point, the new chunk went DUP.
>
> To remove it, you need a specific balance filter, e.g.:
>
> # btrfs balance start -mprofiles=dup,convert=raid1 /mnt/data

Thank you very much - it worked!

gentoo ~ # btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
Free space cache cleared

[55392.407421] BTRFS info (device dm-0): balance: start -mconvert=raid1,profiles=dup -sconvert=raid1,profiles=dup
[55392.480051] BTRFS info (device dm-0): relocating block group 3216960913408 flags metadata|dup
[55431.499765] BTRFS info (device dm-0): found 27769 extents, stage: move data extents
[55434.081160] BTRFS info (device dm-0): relocating block group 1142491709440 flags metadata|dup
[55464.555860] BTRFS info (device dm-0): found 9152 extents, stage: move data extents
[55466.614881] BTRFS info (device dm-0): relocating block group 1112426938368 flags metadata|dup
[55497.322148] BTRFS info (device dm-0): found 9925 extents, stage: move data extents
[55499.938966] BTRFS info (device dm-0): relocating block group 1079140941824 flags metadata|dup
[55520.415801] BTRFS info (device dm-0): found 14986 extents, stage: move data extents
[55521.796855] BTRFS info (device dm-0): relocating block group 746280976384 flags metadata|dup
[55548.800335] BTRFS info (device dm-0): found 15430 extents, stage: move data extents
[55550.291120] BTRFS info (device dm-0): balance: ended with status: 0
[55849.661892] BTRFS info (device dm-0): last unmount of filesystem 1dfac20a-3f84-4149-aba0-f160ab633373

gentoo ~ # vi /etc/fstab
gentoo ~ # systemctl daemon-reload
gentoo ~ # mount /mnt/data

[56068.187456] BTRFS info (device dm-0): first mount of filesystem 1dfac20a-3f84-4149-aba0-f160ab633373
[56068.187473] BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm
[56068.187477] BTRFS info (device dm-0): using free-space-tree
[56068.823725] BTRFS info (device dm-0): bdev /dev/mapper/wdrb-bdata errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[56068.823733] BTRFS info (device dm-0): bdev /dev/mapper/wdrc-cdata errs: wr 7, rd 0, flush 0, corrupt 0, gen 0

Just wondering what the "errs: wr 8" and "errs: wr 7" mean here?

> You have more than enough space to remove the DUP chunks.
>
> > Ok, I just followed a howto for the switch.
> > I did not know the mount option alone is enough.
> > Should it be safe to try that even though I get the errors from
> > "btrfs rescue clear-space-cache v1"?
>
> Since progs and the kernel have different implementations of the space
> reservation code, it's not that rare to hit cases where btrfs-progs
> raises false alerts.
>
> If the balance removed the DUP profile, then you can try "btrfs rescue"
> again, just to see if it works; I would really appreciate the extra
> feedback to help debug the progs bug.

Yes, it worked - thanks again.
Here is some extra feedback:

gentoo ~ # btrfs fi usage /mnt/data
Overall:
    Device size:                  16.00TiB
    Device allocated:             14.55TiB
    Device unallocated:            1.45TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         14.18TiB
    Free (estimated):            906.41GiB  (min: 906.41GiB)
    Free (statfs, df):           906.41GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:7.19TiB, Used:7.03TiB (97.77%)
   /dev/mapper/wdrb-bdata   7.19TiB
   /dev/mapper/wdrc-cdata   7.19TiB

Metadata,RAID1: Size:86.00GiB, Used:59.74GiB (69.46%)
   /dev/mapper/wdrb-bdata  86.00GiB
   /dev/mapper/wdrc-cdata  86.00GiB

System,RAID1: Size:32.00MiB, Used:1.08MiB (3.37%)
   /dev/mapper/wdrb-bdata  32.00MiB
   /dev/mapper/wdrc-cdata  32.00MiB

Unallocated:
   /dev/mapper/wdrb-bdata 742.00GiB
   /dev/mapper/wdrc-cdata 742.00GiB

gentoo ~ # btrfs filesystem df /mnt/data
Data, RAID1: total=7.19TiB, used=7.03TiB
System, RAID1: total=32.00MiB, used=1.08MiB
Metadata, RAID1: total=86.00GiB, used=59.74GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

gentoo ~ # mount -o rw,remount /mnt/data

[56709.279574] BTRFS info (device dm-0 state M): creating free space tree
[56893.200005] INFO: task btrfs-transacti:2510362 blocked for more than 122 seconds.
[56893.200011] Tainted: G W O 6.12.3-gentoo-x86_64 #1
...
[57061.541085] BTRFS info (device dm-0 state M): setting compat-ro feature flag for FREE_SPACE_TREE (0x1)
[57061.541091] BTRFS info (device dm-0 state M): setting compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
[57062.642266] BTRFS info (device dm-0 state M): cleaning free space cache v1

gentoo ~ # btrfs filesystem df /mnt/data
Data, RAID1: total=7.19TiB, used=7.03TiB
System, RAID1: total=32.00MiB, used=1.08MiB
Metadata, RAID1: total=64.00GiB, used=59.75GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

gentoo ~ # btrfs fi usage /mnt/data
Overall:
    Device size:                  16.00TiB
    Device allocated:             14.51TiB
    Device unallocated:            1.49TiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         14.18TiB
    Free (estimated):            928.41GiB  (min: 928.41GiB)
    Free (statfs, df):           928.41GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:7.19TiB, Used:7.03TiB (97.77%)
   /dev/mapper/wdrb-bdata   7.19TiB
   /dev/mapper/wdrc-cdata   7.19TiB

Metadata,RAID1: Size:64.00GiB, Used:59.75GiB (93.35%)
   /dev/mapper/wdrb-bdata  64.00GiB
   /dev/mapper/wdrc-cdata  64.00GiB

System,RAID1: Size:32.00MiB, Used:1.08MiB (3.37%)
   /dev/mapper/wdrb-bdata  32.00MiB
   /dev/mapper/wdrc-cdata  32.00MiB

Unallocated:
   /dev/mapper/wdrb-bdata 764.00GiB
   /dev/mapper/wdrc-cdata 764.00GiB

> Otherwise I believe it should be pretty safe to just use the
> "space_cache=v2" mount option.
>
> Thanks,
> Qu
>
> > Thank you.

* Re: clear space cache v1 fails with Unable to find block group for 0
From: Qu Wenruo @ 2024-12-08 22:32 UTC
To: j4nn; +Cc: linux-btrfs

On 2024/12/9 08:49, j4nn wrote:
> On Sun, 8 Dec 2024 at 22:36, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2024/12/9 07:55, j4nn wrote:
>>> On Sun, 8 Dec 2024 at 21:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>> On 2024/12/9 02:32, j4nn wrote:
>>>>> gentoo ~ # time btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
>>>>> Unable to find block group for 0
>>>>> Unable to find block group for 0
>>>>> Unable to find block group for 0
>>>>
>>>> This is a common indicator of -ENOSPC.
>>>>
>>>> But according to the fi df output, we should have quite a lot of
>>>> metadata space left.
>>>>
>>>> The only concern is the DUP metadata, which may cause the space
>>>> reservation code in progs not to work.
>>>>
>>>> Have you tried converting the DUP metadata first?
>>>
>>> I am not sure how to do that.
>>> I see the "Multiple block group profiles detected" warning and assumed
>>> it is about metadata in RAID1 and DUP.
>>> But I am not sure how that got created or whether it has any benefit.
>>> And what should that DUP be converted into?
>>
>> I am not sure either. But my guess is that in the past you mounted the
>> filesystem with one disk missing and did some writes.
>> Those writes incidentally created a new chunk, and since only one
>> writable disk was visible at that point, the new chunk went DUP.
>>
>> To remove it, you need a specific balance filter, e.g.:
>>
>> # btrfs balance start -mprofiles=dup,convert=raid1 /mnt/data
>
> Thank you very much - it worked!
>
> gentoo ~ # btrfs rescue clear-space-cache v1 /dev/mapper/wdrb-bdata
> Free space cache cleared
>
> [55392.407421] BTRFS info (device dm-0): balance: start -mconvert=raid1,profiles=dup -sconvert=raid1,profiles=dup
> [55392.480051] BTRFS info (device dm-0): relocating block group 3216960913408 flags metadata|dup
> [55431.499765] BTRFS info (device dm-0): found 27769 extents, stage: move data extents
> [55434.081160] BTRFS info (device dm-0): relocating block group 1142491709440 flags metadata|dup
> [55464.555860] BTRFS info (device dm-0): found 9152 extents, stage: move data extents
> [55466.614881] BTRFS info (device dm-0): relocating block group 1112426938368 flags metadata|dup
> [55497.322148] BTRFS info (device dm-0): found 9925 extents, stage: move data extents
> [55499.938966] BTRFS info (device dm-0): relocating block group 1079140941824 flags metadata|dup
> [55520.415801] BTRFS info (device dm-0): found 14986 extents, stage: move data extents
> [55521.796855] BTRFS info (device dm-0): relocating block group 746280976384 flags metadata|dup
> [55548.800335] BTRFS info (device dm-0): found 15430 extents, stage: move data extents
> [55550.291120] BTRFS info (device dm-0): balance: ended with status: 0
> [55849.661892] BTRFS info (device dm-0): last unmount of filesystem 1dfac20a-3f84-4149-aba0-f160ab633373
>
> gentoo ~ # vi /etc/fstab
> gentoo ~ # systemctl daemon-reload
> gentoo ~ # mount /mnt/data
>
> [56068.187456] BTRFS info (device dm-0): first mount of filesystem 1dfac20a-3f84-4149-aba0-f160ab633373
> [56068.187473] BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm
> [56068.187477] BTRFS info (device dm-0): using free-space-tree
> [56068.823725] BTRFS info (device dm-0): bdev /dev/mapper/wdrb-bdata errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> [56068.823733] BTRFS info (device dm-0): bdev /dev/mapper/wdrc-cdata errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
>
> Just wondering what the "errs: wr 8" and "errs: wr 7" mean here?

It means that in the past there were 8 write failures on device
wdrb-bdata and 7 write failures on device wdrc-cdata.

The error comes from the lower layer, so btrfs has no idea what
happened. You may need to check the logs to find out why.

>> You have more than enough space to remove the DUP chunks.
>>
>>> Ok, I just followed a howto for the switch.
>>> I did not know the mount option alone is enough.
>>> Should it be safe to try that even though I get the errors from
>>> "btrfs rescue clear-space-cache v1"?
>>
>> Since progs and the kernel have different implementations of the space
>> reservation code, it's not that rare to hit cases where btrfs-progs
>> raises false alerts.
>>
>> If the balance removed the DUP profile, then you can try "btrfs rescue"
>> again, just to see if it works; I would really appreciate the extra
>> feedback to help debug the progs bug.
>
> Yes, it worked - thanks again.
> Here is some extra feedback:
>
> gentoo ~ # btrfs fi usage /mnt/data
> Overall:
>     Device size:                  16.00TiB
>     Device allocated:             14.55TiB
>     Device unallocated:            1.45TiB
>     Device missing:                  0.00B
>     Device slack:                    0.00B
>     Used:                         14.18TiB
>     Free (estimated):            906.41GiB  (min: 906.41GiB)
>     Free (statfs, df):           906.41GiB
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB  (used: 0.00B)
>     Multiple profiles:                  no
>
> Data,RAID1: Size:7.19TiB, Used:7.03TiB (97.77%)
>    /dev/mapper/wdrb-bdata   7.19TiB
>    /dev/mapper/wdrc-cdata   7.19TiB
>
> Metadata,RAID1: Size:86.00GiB, Used:59.74GiB (69.46%)
>    /dev/mapper/wdrb-bdata  86.00GiB
>    /dev/mapper/wdrc-cdata  86.00GiB
>
> System,RAID1: Size:32.00MiB, Used:1.08MiB (3.37%)
>    /dev/mapper/wdrb-bdata  32.00MiB
>    /dev/mapper/wdrc-cdata  32.00MiB
>
> Unallocated:
>    /dev/mapper/wdrb-bdata 742.00GiB
>    /dev/mapper/wdrc-cdata 742.00GiB
>
> gentoo ~ # btrfs filesystem df /mnt/data
> Data, RAID1: total=7.19TiB, used=7.03TiB
> System, RAID1: total=32.00MiB, used=1.08MiB
> Metadata, RAID1: total=86.00GiB, used=59.74GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> gentoo ~ # mount -o rw,remount /mnt/data
>
> [56709.279574] BTRFS info (device dm-0 state M): creating free space tree
> [56893.200005] INFO: task btrfs-transacti:2510362 blocked for more than 122 seconds.
> [56893.200011] Tainted: G W O 6.12.3-gentoo-x86_64 #1

I'm a little concerned about this.

Would you mind sharing the full dmesg?

My current guess is that the v1 cache clearing is taking too much time
during the initial mount.

We should handle it properly, or it may cause false alerts for future
migrations.

Thanks,
Qu

> ...
> [57061.541085] BTRFS info (device dm-0 state M): setting compat-ro feature flag for FREE_SPACE_TREE (0x1)
> [57061.541091] BTRFS info (device dm-0 state M): setting compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
> [57062.642266] BTRFS info (device dm-0 state M): cleaning free space cache v1
>
> gentoo ~ # btrfs filesystem df /mnt/data
> Data, RAID1: total=7.19TiB, used=7.03TiB
> System, RAID1: total=32.00MiB, used=1.08MiB
> Metadata, RAID1: total=64.00GiB, used=59.75GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> gentoo ~ # btrfs fi usage /mnt/data
> Overall:
>     Device size:                  16.00TiB
>     Device allocated:             14.51TiB
>     Device unallocated:            1.49TiB
>     Device missing:                  0.00B
>     Device slack:                    0.00B
>     Used:                         14.18TiB
>     Free (estimated):            928.41GiB  (min: 928.41GiB)
>     Free (statfs, df):           928.41GiB
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB  (used: 0.00B)
>     Multiple profiles:                  no
>
> Data,RAID1: Size:7.19TiB, Used:7.03TiB (97.77%)
>    /dev/mapper/wdrb-bdata   7.19TiB
>    /dev/mapper/wdrc-cdata   7.19TiB
>
> Metadata,RAID1: Size:64.00GiB, Used:59.75GiB (93.35%)
>    /dev/mapper/wdrb-bdata  64.00GiB
>    /dev/mapper/wdrc-cdata  64.00GiB
>
> System,RAID1: Size:32.00MiB, Used:1.08MiB (3.37%)
>    /dev/mapper/wdrb-bdata  32.00MiB
>    /dev/mapper/wdrc-cdata  32.00MiB
>
> Unallocated:
>    /dev/mapper/wdrb-bdata 764.00GiB
>    /dev/mapper/wdrc-cdata 764.00GiB
>
>> Otherwise I believe it should be pretty safe to just use the
>> "space_cache=v2" mount option.
>>
>> Thanks,
>> Qu
>>
>>> Thank you.
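
The per-device error counters Qu explains above can also be inspected
and reset from userspace; a small sketch using standard btrfs-progs
subcommands and the mount point from this thread:

    # print the persistent counters (write/read/flush/corruption/generation)
    btrfs device stats /mnt/data

    # once the root cause is understood, zero the counters
    btrfs device stats --reset /mnt/data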

* Re: clear space cache v1 fails with Unable to find block group for 0
From: j4nn @ 2024-12-08 22:50 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Sun, 8 Dec 2024 at 23:32, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2024/12/9 08:49, j4nn wrote:
> > On Sun, 8 Dec 2024 at 22:36, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >> On 2024/12/9 07:55, j4nn wrote:
> >>> On Sun, 8 Dec 2024 at 21:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>> On 2024/12/9 02:32, j4nn wrote:
> > [56068.823725] BTRFS info (device dm-0): bdev /dev/mapper/wdrb-bdata errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> > [56068.823733] BTRFS info (device dm-0): bdev /dev/mapper/wdrc-cdata errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
> >
> > Just wondering what the "errs: wr 8" and "errs: wr 7" mean here?
>
> It means that in the past there were 8 write failures on device
> wdrb-bdata and 7 write failures on device wdrc-cdata.
>
> The error comes from the lower layer, so btrfs has no idea what
> happened. You may need to check the logs to find out why.

Thank you for the explanation. I guess that must have been from the
time when I was replacing the CPU - my motherboard has flaky sata
connectors and it's really a pain to "position" the sata cables in a
way that does not cause sata errors in kernel logs.

> > gentoo ~ # mount -o rw,remount /mnt/data
> >
> > [56709.279574] BTRFS info (device dm-0 state M): creating free space tree
> > [56893.200005] INFO: task btrfs-transacti:2510362 blocked for more than 122 seconds.
> > [56893.200011] Tainted: G W O 6.12.3-gentoo-x86_64 #1
>
> I'm a little concerned about this.
>
> Would you mind sharing the full dmesg?

[56068.187456] BTRFS info (device dm-0): first mount of filesystem 1dfac20a-3f84-4149-aba0-f160ab633373
[56068.187473] BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm
[56068.187477] BTRFS info (device dm-0): using free-space-tree
[56068.823725] BTRFS info (device dm-0): bdev /dev/mapper/wdrb-bdata errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[56068.823733] BTRFS info (device dm-0): bdev /dev/mapper/wdrc-cdata errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[56709.279574] BTRFS info (device dm-0 state M): creating free space tree
[56893.200005] INFO: task btrfs-transacti:2510362 blocked for more than 122 seconds.
[56893.200011] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[56893.200014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[56893.200016] task:btrfs-transacti state:D stack:0 pid:2510362 tgid:2510362 ppid:2 flags:0x00004000
[56893.200021] Call Trace:
[56893.200023] <TASK>
[56893.200028] __schedule+0x3f0/0xbd0
[56893.200037] schedule+0x27/0xf0
[56893.200043] btrfs_commit_transaction+0xc27/0xe80 [btrfs]
[56893.200086] ? start_transaction+0xc0/0x820 [btrfs]
[56893.200120] ? __pfx_autoremove_wake_function+0x10/0x10
[56893.200126] transaction_kthread+0x159/0x1c0 [btrfs]
[56893.200160] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[56893.200198] kthread+0xd2/0x100
[56893.200204] ? __pfx_kthread+0x10/0x10
[56893.200209] ret_from_fork+0x34/0x50
[56893.200214] ? __pfx_kthread+0x10/0x10
[56893.200217] ret_from_fork_asm+0x1a/0x30
[56893.200224] </TASK>
[57061.541085] BTRFS info (device dm-0 state M): setting compat-ro feature flag for FREE_SPACE_TREE (0x1)
[57061.541091] BTRFS info (device dm-0 state M): setting compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
[57062.642266] BTRFS info (device dm-0 state M): cleaning free space cache v1

> My current guess is that the v1 cache clearing is taking too much time
> during the initial mount.

I assumed the same, so I used the suggested echo to avoid repeating
that backtrace.
The initial mount after removing v1 was read-only; the backtrace
happened when remounting rw, which started creating the free space tree.

> We should handle it properly, or it may cause false alerts for future
> migrations.

Hope that helps...
Thank you.
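
The "suggested echo" refers to the knob printed in the hung-task
warning itself; as a sketch (a value of 0 disables the check entirely,
while a larger value only raises the threshold):

    # silence hung-task warnings for known-slow but healthy I/O
    echo 0 > /proc/sys/kernel/hung_task_timeout_secs

    # or raise the timeout instead of disabling the check
    sysctl -w kernel.hung_task_timeout_secs=600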

* Re: clear space cache v1 fails with Unable to find block group for 0
From: j4nn @ 2024-12-16 18:52 UTC
To: Qu Wenruo; +Cc: linux-btrfs

This is unrelated, but as you have been interested in the hung task
backtrace, I got two more when using "btrfs send ... | btrfs receive
..." to copy 7TB of data from one btrfs disk to another (still in
progress, both rotational hard drives):

[81837.347137] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[81837.347144] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[81837.347147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[81837.347149] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[81837.347154] Call Trace:
[81837.347156] <TASK>
[81837.347161] __schedule+0x3f0/0xbd0
[81837.347170] schedule+0x27/0xf0
[81837.347174] io_schedule+0x46/0x70
[81837.347177] folio_wait_bit_common+0x123/0x340
[81837.347184] ? __pfx_wake_page_function+0x10/0x10
[81837.347189] folio_wait_writeback+0x2b/0x80
[81837.347193] __filemap_fdatawait_range+0x7d/0xd0
[81837.347201] filemap_fdatawait_range+0x12/0x20
[81837.347206] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[81837.347250] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[81837.347286] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[81837.347319] ? start_transaction+0xc0/0x820 [btrfs]
[81837.347353] transaction_kthread+0x159/0x1c0 [btrfs]
[81837.347386] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[81837.347418] kthread+0xd2/0x100
[81837.347423] ? __pfx_kthread+0x10/0x10
[81837.347427] ret_from_fork+0x34/0x50
[81837.347430] ? __pfx_kthread+0x10/0x10
[81837.347434] ret_from_fork_asm+0x1a/0x30
[81837.347440] </TASK>
[82205.983491] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[82205.983498] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[82205.983501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[82205.983503] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[82205.983508] Call Trace:
[82205.983511] <TASK>
[82205.983515] __schedule+0x3f0/0xbd0
[82205.983524] schedule+0x27/0xf0
[82205.983528] io_schedule+0x46/0x70
[82205.983531] folio_wait_bit_common+0x123/0x340
[82205.983538] ? __pfx_wake_page_function+0x10/0x10
[82205.983543] folio_wait_writeback+0x2b/0x80
[82205.983546] __filemap_fdatawait_range+0x7d/0xd0
[82205.983551] ? srso_alias_return_thunk+0x5/0xfbef5
[82205.983555] ? __slab_free+0xbf/0x2c0
[82205.983560] ? srso_alias_return_thunk+0x5/0xfbef5
[82205.983563] ? kmem_cache_alloc_noprof+0x201/0x2a0
[82205.983567] ? clear_state_bit+0xfc/0x160 [btrfs]
[82205.983609] ? srso_alias_return_thunk+0x5/0xfbef5
[82205.983612] ? __clear_extent_bit+0x160/0x490 [btrfs]
[82205.983646] filemap_fdatawait_range+0x12/0x20
[82205.983650] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[82205.983689] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[82205.983725] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[82205.983759] ? start_transaction+0xc0/0x820 [btrfs]
[82205.983793] transaction_kthread+0x159/0x1c0 [btrfs]
[82205.983827] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[82205.983859] kthread+0xd2/0x100
[82205.983864] ? __pfx_kthread+0x10/0x10
[82205.983867] ret_from_fork+0x34/0x50
[82205.983871] ? __pfx_kthread+0x10/0x10
[82205.983874] ret_from_fork_asm+0x1a/0x30
[82205.983881] </TASK>

Please let me know if you need more info.
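
The elided send/receive pipeline would typically look like the sketch
below; the snapshot name and the /mnt/new destination are hypothetical,
since the actual paths are not given in the thread (btrfs send requires
a read-only snapshot):

    # create a read-only snapshot of the source subvolume
    btrfs subvolume snapshot -r /mnt/data /mnt/data/snap

    # stream it into the (hypothetical) destination filesystem
    btrfs send /mnt/data/snap | btrfs receive /mnt/new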

* Re: clear space cache v1 fails with Unable to find block group for 0
From: j4nn @ 2024-12-17 5:31 UTC
To: Qu Wenruo; +Cc: linux-btrfs

On Mon, 16 Dec 2024 at 19:52, j4nn <j4nn.xda@gmail.com> wrote:
>
> This is unrelated, but as you have been interested in the hung task
> backtrace, I got two more when using "btrfs send ... | btrfs receive
> ..." to copy 7TB of data from one btrfs disk to another (still in
> progress, both rotational hard drives):
>
> [81837.347137] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
> [81837.347144] Tainted: G W O 6.12.3-gentoo-x86_64 #1
> [81837.347147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [81837.347149] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
> [81837.347154] Call Trace:
> [81837.347156] <TASK>
> [81837.347161] __schedule+0x3f0/0xbd0
> [81837.347170] schedule+0x27/0xf0
> [81837.347174] io_schedule+0x46/0x70
> [81837.347177] folio_wait_bit_common+0x123/0x340
> [81837.347184] ? __pfx_wake_page_function+0x10/0x10
> [81837.347189] folio_wait_writeback+0x2b/0x80
> [81837.347193] __filemap_fdatawait_range+0x7d/0xd0
> [81837.347201] filemap_fdatawait_range+0x12/0x20
> [81837.347206] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
> [81837.347250] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
> [81837.347286] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
> [81837.347319] ? start_transaction+0xc0/0x820 [btrfs]
> [81837.347353] transaction_kthread+0x159/0x1c0 [btrfs]
> [81837.347386] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
> [81837.347418] kthread+0xd2/0x100
> [81837.347423] ? __pfx_kthread+0x10/0x10
> [81837.347427] ret_from_fork+0x34/0x50
> [81837.347430] ? __pfx_kthread+0x10/0x10
> [81837.347434] ret_from_fork_asm+0x1a/0x30
> [81837.347440] </TASK>
> [82205.983491] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
> [82205.983498] Tainted: G W O 6.12.3-gentoo-x86_64 #1
> [82205.983501] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [82205.983503] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
> [82205.983508] Call Trace:
> [82205.983511] <TASK>
> [82205.983515] __schedule+0x3f0/0xbd0
> [82205.983524] schedule+0x27/0xf0
> [82205.983528] io_schedule+0x46/0x70
> [82205.983531] folio_wait_bit_common+0x123/0x340
> [82205.983538] ? __pfx_wake_page_function+0x10/0x10
> [82205.983543] folio_wait_writeback+0x2b/0x80
> [82205.983546] __filemap_fdatawait_range+0x7d/0xd0
> [82205.983551] ? srso_alias_return_thunk+0x5/0xfbef5
> [82205.983555] ? __slab_free+0xbf/0x2c0
> [82205.983560] ? srso_alias_return_thunk+0x5/0xfbef5
> [82205.983563] ? kmem_cache_alloc_noprof+0x201/0x2a0
> [82205.983567] ? clear_state_bit+0xfc/0x160 [btrfs]
> [82205.983609] ? srso_alias_return_thunk+0x5/0xfbef5
> [82205.983612] ? __clear_extent_bit+0x160/0x490 [btrfs]
> [82205.983646] filemap_fdatawait_range+0x12/0x20
> [82205.983650] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
> [82205.983689] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
> [82205.983725] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
> [82205.983759] ? start_transaction+0xc0/0x820 [btrfs]
> [82205.983793] transaction_kthread+0x159/0x1c0 [btrfs]
> [82205.983827] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
> [82205.983859] kthread+0xd2/0x100
> [82205.983864] ? __pfx_kthread+0x10/0x10
> [82205.983867] ret_from_fork+0x34/0x50
> [82205.983871] ? __pfx_kthread+0x10/0x10
> [82205.983874] ret_from_fork_asm+0x1a/0x30
> [82205.983881] </TASK>
>
> Please let me know if you need more info.

I got a few more of those during the above-mentioned transfer:

[101497.950425] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[101497.950432] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[101497.950435] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[101497.950437] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[101497.950442] Call Trace:
[101497.950444] <TASK>
[101497.950449] __schedule+0x3f0/0xbd0
[101497.950458] schedule+0x27/0xf0
[101497.950461] io_schedule+0x46/0x70
[101497.950465] folio_wait_bit_common+0x123/0x340
[101497.950472] ? __pfx_wake_page_function+0x10/0x10
[101497.950477] folio_wait_writeback+0x2b/0x80
[101497.950480] __filemap_fdatawait_range+0x7d/0xd0
[101497.950489] filemap_fdatawait_range+0x12/0x20
[101497.950493] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[101497.950536] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[101497.950572] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[101497.950606] ? start_transaction+0xc0/0x820 [btrfs]
[101497.950640] transaction_kthread+0x159/0x1c0 [btrfs]
[101497.950674] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[101497.950705] kthread+0xd2/0x100
[101497.950710] ? __pfx_kthread+0x10/0x10
[101497.950714] ret_from_fork+0x34/0x50
[101497.950718] ? __pfx_kthread+0x10/0x10
[101497.950721] ret_from_fork_asm+0x1a/0x30
[101497.950727] </TASK>
[102358.101888] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[102358.101895] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[102358.101898] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102358.101899] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[102358.101904] Call Trace:
[102358.101907] <TASK>
[102358.101912] __schedule+0x3f0/0xbd0
[102358.101920] schedule+0x27/0xf0
[102358.101924] io_schedule+0x46/0x70
[102358.101928] folio_wait_bit_common+0x123/0x340
[102358.101934] ? __pfx_wake_page_function+0x10/0x10
[102358.101939] folio_wait_writeback+0x2b/0x80
[102358.101943] __filemap_fdatawait_range+0x7d/0xd0
[102358.101951] filemap_fdatawait_range+0x12/0x20
[102358.101956] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[102358.102000] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[102358.102035] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[102358.102069] ? start_transaction+0xc0/0x820 [btrfs]
[102358.102107] transaction_kthread+0x159/0x1c0 [btrfs]
[102358.102142] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[102358.102174] kthread+0xd2/0x100
[102358.102178] ? __pfx_kthread+0x10/0x10
[102358.102182] ret_from_fork+0x34/0x50
[102358.102186] ? __pfx_kthread+0x10/0x10
[102358.102189] ret_from_fork_asm+0x1a/0x30
[102358.102195] </TASK>
[102849.617090] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[102849.617097] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[102849.617100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102849.617102] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[102849.617107] Call Trace:
[102849.617109] <TASK>
[102849.617113] __schedule+0x3f0/0xbd0
[102849.617122] schedule+0x27/0xf0
[102849.617126] io_schedule+0x46/0x70
[102849.617130] folio_wait_bit_common+0x123/0x340
[102849.617137] ? __pfx_wake_page_function+0x10/0x10
[102849.617141] folio_wait_writeback+0x2b/0x80
[102849.617145] __filemap_fdatawait_range+0x7d/0xd0
[102849.617151] ? srso_alias_return_thunk+0x5/0xfbef5
[102849.617155] ? kmem_cache_alloc_noprof+0x201/0x2a0
[102849.617160] ? clear_state_bit+0xfc/0x160 [btrfs]
[102849.617202] ? srso_alias_return_thunk+0x5/0xfbef5
[102849.617206] ? __clear_extent_bit+0x160/0x490 [btrfs]
[102849.617240] filemap_fdatawait_range+0x12/0x20
[102849.617244] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[102849.617283] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[102849.617319] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[102849.617353] ? start_transaction+0xc0/0x820 [btrfs]
[102849.617387] transaction_kthread+0x159/0x1c0 [btrfs]
[102849.617421] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[102849.617452] kthread+0xd2/0x100
[102849.617457] ? __pfx_kthread+0x10/0x10
[102849.617461] ret_from_fork+0x34/0x50
[102849.617465] ? __pfx_kthread+0x10/0x10
[102849.617468] ret_from_fork_asm+0x1a/0x30
[102849.617474] </TASK>
[103464.011016] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[103464.011023] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[103464.011025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[103464.011027] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[103464.011031] Call Trace:
[103464.011033] <TASK>
[103464.011037] __schedule+0x3f0/0xbd0
[103464.011045] schedule+0x27/0xf0
[103464.011048] io_schedule+0x46/0x70
[103464.011051] folio_wait_bit_common+0x123/0x340
[103464.011057] ? __pfx_wake_page_function+0x10/0x10
[103464.011061] folio_wait_writeback+0x2b/0x80
[103464.011064] __filemap_fdatawait_range+0x7d/0xd0
[103464.011072] filemap_fdatawait_range+0x12/0x20
[103464.011076] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[103464.011113] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[103464.011143] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[103464.011172] ? start_transaction+0xc0/0x820 [btrfs]
[103464.011200] transaction_kthread+0x159/0x1c0 [btrfs]
[103464.011228] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[103464.011255] kthread+0xd2/0x100
[103464.011259] ? __pfx_kthread+0x10/0x10
[103464.011262] ret_from_fork+0x34/0x50
[103464.011266] ? __pfx_kthread+0x10/0x10
[103464.011268] ret_from_fork_asm+0x1a/0x30
[103464.011274] </TASK>
[103832.647363] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[103832.647370] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[103832.647372] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[103832.647374] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[103832.647378] Call Trace:
[103832.647380] <TASK>
[103832.647384] __schedule+0x3f0/0xbd0
[103832.647392] schedule+0x27/0xf0
[103832.647396] io_schedule+0x46/0x70
[103832.647399] folio_wait_bit_common+0x123/0x340
[103832.647405] ? __pfx_wake_page_function+0x10/0x10
[103832.647410] folio_wait_writeback+0x2b/0x80
[103832.647413] __filemap_fdatawait_range+0x7d/0xd0
[103832.647420] ? srso_alias_return_thunk+0x5/0xfbef5
[103832.647425] ? __clear_extent_bit+0x160/0x490 [btrfs]
[103832.647466] filemap_fdatawait_range+0x12/0x20
[103832.647470] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[103832.647509] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[103832.647545] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[103832.647579] ? start_transaction+0xc0/0x820 [btrfs]
[103832.647613] transaction_kthread+0x159/0x1c0 [btrfs]
[103832.647648] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[103832.647680] kthread+0xd2/0x100
[103832.647685] ? __pfx_kthread+0x10/0x10
[103832.647688] ret_from_fork+0x34/0x50
[103832.647692] ? __pfx_kthread+0x10/0x10
[103832.647695] ret_from_fork_asm+0x1a/0x30
[103832.647700] </TASK>
[104078.404940] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[104078.404947] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[104078.404950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[104078.404952] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[104078.404957] Call Trace:
[104078.404959] <TASK>
[104078.404964] __schedule+0x3f0/0xbd0
[104078.404973] schedule+0x27/0xf0
[104078.404976] io_schedule+0x46/0x70
[104078.404980] folio_wait_bit_common+0x123/0x340
[104078.404987] ? __pfx_wake_page_function+0x10/0x10
[104078.404992] folio_wait_writeback+0x2b/0x80
[104078.404995] __filemap_fdatawait_range+0x7d/0xd0
[104078.405001] ? srso_alias_return_thunk+0x5/0xfbef5
[104078.405005] ? kmem_cache_alloc_noprof+0x201/0x2a0
[104078.405010] ? clear_state_bit+0xfc/0x160 [btrfs]
[104078.405051] ? srso_alias_return_thunk+0x5/0xfbef5
[104078.405055] ? __clear_extent_bit+0x160/0x490 [btrfs]
[104078.405089] filemap_fdatawait_range+0x12/0x20
[104078.405093] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[104078.405132] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[104078.405168] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[104078.405202] ? start_transaction+0xc0/0x820 [btrfs]
[104078.405236] transaction_kthread+0x159/0x1c0 [btrfs]
[104078.405270] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[104078.405301] kthread+0xd2/0x100
[104078.405306] ? __pfx_kthread+0x10/0x10
[104078.405310] ret_from_fork+0x34/0x50
[104078.405313] ? __pfx_kthread+0x10/0x10
[104078.405317] ret_from_fork_asm+0x1a/0x30
[104078.405323] </TASK>
[104324.162492] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[104324.162499] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[104324.162502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[104324.162504] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[104324.162509] Call Trace:
[104324.162511] <TASK>
[104324.162515] __schedule+0x3f0/0xbd0
[104324.162525] schedule+0x27/0xf0
[104324.162528] io_schedule+0x46/0x70
[104324.162532] folio_wait_bit_common+0x123/0x340
[104324.162539] ? __pfx_wake_page_function+0x10/0x10
[104324.162544] folio_wait_writeback+0x2b/0x80
[104324.162547] __filemap_fdatawait_range+0x7d/0xd0
[104324.162553] ? srso_alias_return_thunk+0x5/0xfbef5
[104324.162557] ? kmem_cache_alloc_noprof+0x201/0x2a0
[104324.162562] ? clear_state_bit+0xfc/0x160 [btrfs]
[104324.162604] ? srso_alias_return_thunk+0x5/0xfbef5
[104324.162608] ? __clear_extent_bit+0x160/0x490 [btrfs]
[104324.162641] filemap_fdatawait_range+0x12/0x20
[104324.162645] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[104324.162685] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[104324.162721] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[104324.162755] ? start_transaction+0xc0/0x820 [btrfs]
[104324.162789] transaction_kthread+0x159/0x1c0 [btrfs]
[104324.162823] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[104324.162854] kthread+0xd2/0x100
[104324.162859] ? __pfx_kthread+0x10/0x10
[104324.162863] ret_from_fork+0x34/0x50
[104324.162866] ? __pfx_kthread+0x10/0x10
[104324.162870] ret_from_fork_asm+0x1a/0x30
[104324.162876] </TASK>
[105798.707833] INFO: task btrfs-transacti:29385 blocked for more than 122 seconds.
[105798.707840] Tainted: G W O 6.12.3-gentoo-x86_64 #1
[105798.707843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[105798.707845] task:btrfs-transacti state:D stack:0 pid:29385 tgid:29385 ppid:2 flags:0x00004000
[105798.707850] Call Trace:
[105798.707853] <TASK>
[105798.707857] __schedule+0x3f0/0xbd0
[105798.707866] schedule+0x27/0xf0
[105798.707870] io_schedule+0x46/0x70
[105798.707874] folio_wait_bit_common+0x123/0x340
[105798.707881] ? __pfx_wake_page_function+0x10/0x10
[105798.707886] folio_wait_writeback+0x2b/0x80
[105798.707890] __filemap_fdatawait_range+0x7d/0xd0
[105798.707896] ? srso_alias_return_thunk+0x5/0xfbef5
[105798.707900] ? kmem_cache_alloc_noprof+0x201/0x2a0
[105798.707906] ? clear_state_bit+0xfc/0x160 [btrfs]
[105798.707949] ? srso_alias_return_thunk+0x5/0xfbef5
[105798.707953] ? __clear_extent_bit+0x160/0x490 [btrfs]
[105798.707989] filemap_fdatawait_range+0x12/0x20
[105798.707993] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
[105798.708034] btrfs_write_and_wait_transaction+0x5e/0xd0 [btrfs]
[105798.708071] btrfs_commit_transaction+0x8d9/0xe80 [btrfs]
[105798.708107] ? start_transaction+0xc0/0x820 [btrfs]
[105798.708142] transaction_kthread+0x159/0x1c0 [btrfs]
[105798.708177] ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[105798.708211] kthread+0xd2/0x100
[105798.708215] ? __pfx_kthread+0x10/0x10
[105798.708219] ret_from_fork+0x34/0x50
[105798.708223] ? __pfx_kthread+0x10/0x10
[105798.708227] ret_from_fork_asm+0x1a/0x30
[105798.708233] </TASK>
[105798.708235] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

The transfer has completed; it took the following 'time':

real 812m30.231s
user 6m7.214s
sys 169m1.938s

The destination btrfs filesystem had been freshly created (single
device, no raid) and was thus empty before starting the transfer.
This is the output of the destination 'btrfs filesystem df' after the
transfer:

Data, single: total=7.10TiB, used=7.06TiB
System, DUP: total=8.00MiB, used=768.00KiB
Metadata, DUP: total=8.00GiB, used=7.53GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

The btrfs is on a luks device (cryptsetup -h sha512 -c aes-xts-plain64
-s 256), but the cpu has been idle the whole time (ryzen 5950x).

Hope that helps.
Thank you.
* Re: clear space cache v1 fails with Unable to find block group for 0
2024-12-17 5:31 ` j4nn
@ 2024-12-17 5:50 ` Qu Wenruo
2024-12-17 6:10 ` j4nn
0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2024-12-17 5:50 UTC (permalink / raw)
To: j4nn, Qu Wenruo; +Cc: linux-btrfs
On 2024/12/17 16:01, j4nn wrote:
> On Mon, 16 Dec 2024 at 19:52, j4nn <j4nn.xda@gmail.com> wrote:
>>
>> This is unrelated, but as you have been interested in the hung task
>> backtrace, I got two more when using "btrfs send ... | btrfs receive
>> ..." to copy 7TB of data from one btrfs disk to another one (still in
>> progress, both rotational hard drives):
>> [81837.347137] INFO: task btrfs-transacti:29385 blocked for more than
>> 122 seconds.
>> [81837.347144] Tainted: G W O 6.12.3-gentoo-x86_64 #1
>> [81837.347147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [81837.347149] task:btrfs-transacti state:D stack:0 pid:29385
>> tgid:29385 ppid:2 flags:0x00004000
>> [81837.347154] Call Trace:
>> [81837.347156] <TASK>
>> [81837.347161] __schedule+0x3f0/0xbd0
>> [81837.347170] schedule+0x27/0xf0
>> [81837.347174] io_schedule+0x46/0x70
>> [81837.347177] folio_wait_bit_common+0x123/0x340
>> [81837.347184] ? __pfx_wake_page_function+0x10/0x10
>> [81837.347189] folio_wait_writeback+0x2b/0x80
>> [81837.347193] __filemap_fdatawait_range+0x7d/0xd0
>> [81837.347201] filemap_fdatawait_range+0x12/0x20
>> [81837.347206] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]

This is from the metadata writeback.

My guess is, since you're transferring a lot of data, the metadata also
grows very large very fast.

And the default commit interval is 30s; considering your hardware, your
system memory may also be pretty large (at least 32G, I believe?).

That means we can have several gigabytes of metadata waiting to be
written back.

What may make things worse is the fact that metadata writeback nowadays
is always done in nodesize units, with no merging at all.

Furthermore, since your storage device is an HDD, the low IOPS
performance makes it even worse.

Combining all those things together, we're writing several gigabytes of
metadata, all in 16K nodesize with no merging, resulting in very bad
write performance on rotating disks...

IIRC there are some ways to limit how many bytes can be utilized by the
page cache (btrfs metadata also goes through the page cache), so that
may improve the situation by not writing too much metadata in one go.

[...]

> The destination btrfs filesystem had been freshly created (single
> device, no raid) and was thus empty before starting the transfer.
> This is the output of 'btrfs filesystem df' on the destination after
> the transfer:
> Data, single: total=7.10TiB, used=7.06TiB
> System, DUP: total=8.00MiB, used=768.00KiB
> Metadata, DUP: total=8.00GiB, used=7.53GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> The btrfs is on a luks device (cryptsetup -h sha512 -c aes-xts-plain64
> -s 256), but the cpu has been idle the whole time (ryzen 5950x).

Considering the transfer finished and you can unmount the fs, it should
really be a false alert. Would you mind sharing how large your RAM is?
32G or even 64G?

Thanks,
Qu

> Hope that helps.
> Thank you.
^ permalink raw reply	[flat|nested] 13+ messages in thread
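The page-cache limiting Qu alludes to can be approximated with the VM
dirty thresholds, and the commit interval is a btrfs mount option. A
minimal sketch; the byte values and the mount point are illustrative
assumptions, not tuned recommendations:

# Cap how much dirty page cache may accumulate, so each transaction
# commit has a smaller metadata backlog to flush to the slow HDD.
sysctl -w vm.dirty_background_bytes=$((512 * 1024 * 1024))  # background writeback above 512MiB
sysctl -w vm.dirty_bytes=$((2 * 1024 * 1024 * 1024))        # throttle writers above 2GiB dirty
# A shorter btrfs commit interval (default 30s) also shrinks each burst:
mount -o remount,commit=15 /mnt/newdata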
* Re: clear space cache v1 fails with Unable to find block group for 0
2024-12-17 5:50 ` Qu Wenruo
@ 2024-12-17 6:10 ` j4nn
2024-12-17 6:34 ` j4nn
0 siblings, 1 reply; 13+ messages in thread
From: j4nn @ 2024-12-17 6:10 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On Tue, 17 Dec 2024 at 06:50, Qu Wenruo <wqu@suse.com> wrote:
> On 2024/12/17 16:01, j4nn wrote:
> > On Mon, 16 Dec 2024 at 19:52, j4nn <j4nn.xda@gmail.com> wrote:
> >>
> >> This is unrelated, but as you have been interested in the hung task
> >> backtrace, I got two more when using "btrfs send ... | btrfs receive
> >> ..." to copy 7TB of data from one btrfs disk to another one (still in
> >> progress, both rotational hard drives):
> >> [81837.347137] INFO: task btrfs-transacti:29385 blocked for more than
> >> 122 seconds.
> >> [81837.347144] Tainted: G W O 6.12.3-gentoo-x86_64 #1
> >> [81837.347147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >> disables this message.
> >> [81837.347149] task:btrfs-transacti state:D stack:0 pid:29385
> >> tgid:29385 ppid:2 flags:0x00004000
> >> [81837.347154] Call Trace:
> >> [81837.347156] <TASK>
> >> [81837.347161] __schedule+0x3f0/0xbd0
> >> [81837.347170] schedule+0x27/0xf0
> >> [81837.347174] io_schedule+0x46/0x70
> >> [81837.347177] folio_wait_bit_common+0x123/0x340
> >> [81837.347184] ? __pfx_wake_page_function+0x10/0x10
> >> [81837.347189] folio_wait_writeback+0x2b/0x80
> >> [81837.347193] __filemap_fdatawait_range+0x7d/0xd0
> >> [81837.347201] filemap_fdatawait_range+0x12/0x20
> >> [81837.347206] __btrfs_wait_marked_extents.isra.0+0xb8/0xf0 [btrfs]
>
> This is from the metadata writeback.
>
> My guess is, since you're transferring a lot of data, the metadata also
> grows very large very fast.
>
> And the default commit interval is 30s; considering your hardware, your
> system memory may also be pretty large (at least 32G, I believe?).
>
> That means we can have several gigabytes of metadata waiting to be
> written back.
>
> What may make things worse is the fact that metadata writeback nowadays
> is always done in nodesize units, with no merging at all.
>
> Furthermore, since your storage device is an HDD, the low IOPS
> performance makes it even worse.
>
> Combining all those things together, we're writing several gigabytes of
> metadata, all in 16K nodesize with no merging, resulting in very bad
> write performance on rotating disks...
>
> IIRC there are some ways to limit how many bytes can be utilized by the
> page cache (btrfs metadata also goes through the page cache), so that
> may improve the situation by not writing too much metadata in one go.
>
> [...]
>
> > The destination btrfs filesystem had been freshly created (single
> > device, no raid) and was thus empty before starting the transfer.
> > This is the output of 'btrfs filesystem df' on the destination after
> > the transfer:
> > Data, single: total=7.10TiB, used=7.06TiB
> > System, DUP: total=8.00MiB, used=768.00KiB
> > Metadata, DUP: total=8.00GiB, used=7.53GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> > The btrfs is on a luks device (cryptsetup -h sha512 -c aes-xts-plain64
> > -s 256), but the cpu has been idle the whole time (ryzen 5950x).
>
> Considering the transfer finished and you can unmount the fs, it should
> really be a false alert. Would you mind sharing how large your RAM is?
> 32G or even 64G?

Yes, you are right, using both :-)
That is 96GB of RAM...

Thank you for the explanations, particularly the "16K nodesize, no
merging" metadata chunks.
The reported 'time' of the transfer included a 'sync' command after
the btrfs receive.
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: clear space cache v1 fails with Unable to find block group for 0
2024-12-17 6:10 ` j4nn
@ 2024-12-17 6:34 ` j4nn
2024-12-17 6:55 ` Qu Wenruo
0 siblings, 1 reply; 13+ messages in thread
From: j4nn @ 2024-12-17 6:34 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On Tue, 17 Dec 2024 at 07:10, j4nn <j4nn.xda@gmail.com> wrote:
>
> On Tue, 17 Dec 2024 at 06:50, Qu Wenruo <wqu@suse.com> wrote:
> > On 2024/12/17 16:01, j4nn wrote:
> > > On Mon, 16 Dec 2024 at 19:52, j4nn <j4nn.xda@gmail.com> wrote:
> > >>
> >
> > This is from the metadata writeback.
> >
> > My guess is, since you're transferring a lot of data, the metadata also
> > grows very large very fast.
> >
> > And the default commit interval is 30s; considering your hardware, your
> > system memory may also be pretty large (at least 32G, I believe?).
> >
> > That means we can have several gigabytes of metadata waiting to be
> > written back.
> >
> > What may make things worse is the fact that metadata writeback nowadays
> > is always done in nodesize units, with no merging at all.
> >
> > Furthermore, since your storage device is an HDD, the low IOPS
> > performance makes it even worse.
> >
> > Combining all those things together, we're writing several gigabytes of
> > metadata, all in 16K nodesize with no merging, resulting in very bad
> > write performance on rotating disks...
> >
> > IIRC there are some ways to limit how many bytes can be utilized by the
> > page cache (btrfs metadata also goes through the page cache), so that
> > may improve the situation by not writing too much metadata in one go.
> >
> > [...]
> > Considering the transfer finished and you can unmount the fs, it should
> > really be a false alert. Would you mind sharing how large your RAM is?
> > 32G or even 64G?
>
> Yes, you are right, using both :-)
> That is 96GB of RAM...
>
> Thank you for the explanations, particularly the "16K nodesize, no
> merging" metadata chunks.
> The reported 'time' of the transfer included a 'sync' command after
> the btrfs receive.

I guess the hung task backtrace that appeared during the creation of
the free space tree cache had the same cause as this simple btrfs send
and receive?
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: clear space cache v1 fails with Unable to find block group for 0
2024-12-17 6:34 ` j4nn
@ 2024-12-17 6:55 ` Qu Wenruo
0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2024-12-17 6:55 UTC (permalink / raw)
To: j4nn, Qu Wenruo; +Cc: linux-btrfs
On 2024/12/17 17:04, j4nn wrote:
> On Tue, 17 Dec 2024 at 07:10, j4nn <j4nn.xda@gmail.com> wrote:
>>
>> On Tue, 17 Dec 2024 at 06:50, Qu Wenruo <wqu@suse.com> wrote:
>>> On 2024/12/17 16:01, j4nn wrote:
>>>> On Mon, 16 Dec 2024 at 19:52, j4nn <j4nn.xda@gmail.com> wrote:
>>>>>
>>>
>>> This is from the metadata writeback.
>>>
>>> My guess is, since you're transferring a lot of data, the metadata also
>>> grows very large very fast.
>>>
>>> And the default commit interval is 30s; considering your hardware, your
>>> system memory may also be pretty large (at least 32G, I believe?).
>>>
>>> That means we can have several gigabytes of metadata waiting to be
>>> written back.
>>>
>>> What may make things worse is the fact that metadata writeback nowadays
>>> is always done in nodesize units, with no merging at all.
>>>
>>> Furthermore, since your storage device is an HDD, the low IOPS
>>> performance makes it even worse.
>>>
>>> Combining all those things together, we're writing several gigabytes of
>>> metadata, all in 16K nodesize with no merging, resulting in very bad
>>> write performance on rotating disks...
>>>
>>> IIRC there are some ways to limit how many bytes can be utilized by the
>>> page cache (btrfs metadata also goes through the page cache), so that
>>> may improve the situation by not writing too much metadata in one go.
>>>
>>> [...]
>>> Considering the transfer finished and you can unmount the fs, it should
>>> really be a false alert. Would you mind sharing how large your RAM is?
>>> 32G or even 64G?
>>
>> Yes, you are right, using both :-)
>> That is 96GB of RAM...
>>
>> Thank you for the explanations, particularly the "16K nodesize, no
>> merging" metadata chunks.
>> The reported 'time' of the transfer included a 'sync' command after
>> the btrfs receive.
>
> I guess the hung task backtrace that appeared during the creation of
> the free space tree cache had the same cause as this simple btrfs send
> and receive?

I can only be more or less certain about the receive end (receive is
mostly just writing data into the fs, as most of the work is done in
user space with buffered writes).

The v2 cache rebuilding process is indeed problematic, but for a
different reason.

Rebuilding the v2 cache can cause a huge hang, that's for sure.
We are using a single transaction to build the new v2 cache for all
the block groups; no wonder it hangs.

Anyway, I'll change the rebuilding process to at least do
multi-transactional updates, to avoid holding one transaction for too
long. But the rebuilding itself can still be very time consuming.

Thanks,
Qu
^ permalink raw reply	[flat|nested] 13+ messages in thread
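For completeness, once the v1 cache has been cleared, the switch this
thread set out to make is a mount-time conversion. A minimal sketch
reusing the device path and mount point from earlier in the thread;
as discussed above, building the tree on first mount can itself take a
long time on a filesystem this large:

# The first mount with space_cache=v2 builds the free space tree;
# later mounts detect and use it automatically.
umount /mnt/data
mount -o space_cache=v2 /dev/mapper/wdrb-bdata /mnt/data
dmesg | grep -i 'free space tree'   # confirm the kernel enabled it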
end of thread, other threads:[~2024-12-17 6:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-08 16:02 clear space cache v1 fails with Unable to find block group for 0 j4nn
2024-12-08 20:26 ` Qu Wenruo
2024-12-08 21:25 ` j4nn
2024-12-08 21:36 ` Qu Wenruo
2024-12-08 22:19 ` j4nn
2024-12-08 22:32 ` Qu Wenruo
2024-12-08 22:50 ` j4nn
2024-12-16 18:52 ` j4nn
2024-12-17 5:31 ` j4nn
2024-12-17 5:50 ` Qu Wenruo
2024-12-17 6:10 ` j4nn
2024-12-17 6:34 ` j4nn
2024-12-17 6:55 ` Qu Wenruo