* Out of space loop: skip_balance not working
@ 2023-06-12 4:47 Stefan N
2023-06-12 5:20 ` Qu Wenruo
0 siblings, 1 reply; 22+ messages in thread
From: Stefan N @ 2023-06-12 4:47 UTC (permalink / raw)
To: linux-btrfs
Hi,
I'm having trouble trying to break my array out of an out of space loop.
On reboot I'm able to mount the filesystem and read files fine but as
soon as I try to delete/write it hangs until the mount is made read
only when it then fails.
The following command (immediately after boot, no fstab) suggests
perhaps the skip_balance is not working as expected:
$ mount -o skip_balance -t btrfs /dev/sde /mnt/point && btrfs device
add /dev/loop12 /mnt/point/
ERROR: unable to start device add, another exclusive operation
'balance' in progress
and ps shows a [btrfs-balance] process.
If I perform a rm or truncate during this window it fails to perform
any action before being marked read only. The same applies if I
attempt to cancel the balance.
How can I get out of this cycle? I've previously run out of space and
been able to recover by deleting a few files etc without needing to
invoke skip_balance, but that was likely on older versions.
Any help would be greatly appreciated.
- Stefan
$ uname -a
Linux my.host 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC
2023 x86_64 x86_64 x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v5.16.2
$ btrfs fi show
Label: none uuid: ---
Total devices 8 FS bytes used 64.67TiB
devid 1 size 10.91TiB used 10.91TiB path /dev/sdk
devid 2 size 10.91TiB used 10.91TiB path /dev/sdh
devid 3 size 10.91TiB used 10.91TiB path /dev/sdj
devid 4 size 10.91TiB used 10.91TiB path /dev/sdi
devid 5 size 10.91TiB used 10.91TiB path /dev/sdf
devid 6 size 10.91TiB used 10.91TiB path /dev/sdg
devid 7 size 10.91TiB used 10.91TiB path /dev/sdd
devid 8 size 10.91TiB used 10.91TiB path /dev/sde
$ btrfs fi df /mnt/point/
Data, RAID6: total=64.76TiB, used=64.59TiB
System, RAID1C4: total=37.00MiB, used=5.11MiB
Metadata, RAID1C4: total=77.79GiB, used=77.10GiB
GlobalReserve, single: total=512.00MiB, used=387.11MiB
$
BTRFS: Transaction aborted (error -28)
BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left
BTRFS info (device sdk): forced readonly
BTRFS error (device sdk): failed to run delayed ref for logical
101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: -28
WARNING: CPU: 2 PID: 7851 at fs/btrfs/extent-tree.c:3180
__btrfs_free_extent+0x7e4/0x950 [btrfs]
BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No
space left
BTRFS warning (device sdk): btrfs_uuid_scan_kthread failed -28
Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat
xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat
nf_tables nfnetlink br_netfilter bridge stp llc ipmi_devintf
ipmi_msghandler overlay binfmt_misc intel_rapl_msr intel_rapl_common
edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi kvm_amd nls_iso8859_1 kvm snd_hda_intel rapl
snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core
wmi_bmof input_leds snd_hwdep snd_pcm k10temp snd_timer snd ccp
soundcore mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc
scsi_dh_alua bonding tls ramoops pstore_blk msr reed_solomon
pstore_zone efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid uas usb_storage amdgpu iommu_v2 gpu_sched
drm_ttm_helper crct10dif_pclmul ttm drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops crc32_pclmul cec ghash_clmulni_intel
aesni_intel mpt3sas rc_core raid_class crypto_simd drm nvme i2c_piix4
cryptd scsi_transport_sas igb dca ahci libahci xhci_pci qlcnic
i2c_algo_bit nvme_core xhci_pci_renesas wmi video
CPU: 2 PID: 7851 Comm: btrfs-transacti Not tainted 5.15.0-73-generic #80-Ubuntu
Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS
P3.70 02/23/2022
RIP: 0010:__btrfs_free_extent+0x7e4/0x950 [btrfs]
Code: a0 48 05 50 0a 00 00 f0 48 0f ba 28 03 72 1d 8b 45 84 83 f8 fb
74 32 83 f8 e2 74 2d 89 c6 48 c7 c7 98 f6 34 c1 e8 ed 42 a9 e6 <0f> 0b
8b 4d 84 48 8b 7d 90 ba 6c 0c 00 00 48 c7 c6 60 39 34 c1 e8
RSP: 0018:ffffb63581c9fb68 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 00000000000000d1 RCX: 0000000000000027
RDX: ffff8ceda0aa0588 RSI: 0000000000000001 RDI: ffff8ceda0aa0580
RBP: ffffb63581c9fc10 R08: 0000000000000003 R09: fffffffffffe2710
R10: 000000002938322d R11: 00000000322d2072 R12: 00005cb02659c000
R13: 00000000000014ce R14: ffff8ce8ab3fb7e0 R15: ffff8ce8de433800
FS: 0000000000000000(0000) GS:ffff8ceda0a80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f2f46bb4c8 CR3: 000000010814c000 CR4: 00000000003506e0
Call Trace:
<TASK>
run_delayed_data_ref+0x93/0x160 [btrfs]
btrfs_run_delayed_refs_for_head+0x193/0x520 [btrfs]
__btrfs_run_delayed_refs+0x8c/0x1d0 [btrfs]
btrfs_run_delayed_refs+0x73/0x200 [btrfs]
btrfs_start_dirty_block_groups+0x296/0x4f0 [btrfs]
btrfs_commit_transaction+0x716/0xaa0 [btrfs]
? start_transaction+0xd1/0x5b0 [btrfs]
? __bpf_trace_hrtimer_init+0x20/0x20
transaction_kthread+0x13c/0x1b0 [btrfs]
? btrfs_cleanup_transaction.isra.0+0x3c0/0x3c0 [btrfs]
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
---[ end trace 8a20922ac453f776 ]---
BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left
BTRFS error (device sdk): failed to run delayed ref for logical
101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: -28
BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No
space left
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: Out of space loop: skip_balance not working 2023-06-12 4:47 Out of space loop: skip_balance not working Stefan N @ 2023-06-12 5:20 ` Qu Wenruo 2023-06-12 10:31 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-12 5:20 UTC (permalink / raw) To: Stefan N, linux-btrfs On 2023/6/12 12:47, Stefan N wrote: > Hi, > > I'm having trouble trying to break my array out of an out of space loop. > > On reboot I'm able to mount the filesystem and read files fine but as > soon as I try to delete/write it hangs until the mount is made read > only when it then fails. > > The following command (immediately after boot, no fstab) suggests > perhaps the skip_balance is not working as expected: > $ mount -o skip_balance -t btrfs /dev/sde /mnt/point && btrfs device > add /dev/loop12 /mnt/point/ > ERROR: unable to start device add, another exclusive operation > 'balance' in progress skip_balance makes the balance into the paused status. You still need to cancel it first. > and ps shows a [btrfs-balance] process. Furthermore, balance won't help for your case. Both metadata and data are almost full. > > If I perform a rm or truncate during this window it fails to perform > any action before being marked read only. The same applies if I > attempt to cancel the balance. > > How can I get out of this cycle? I've previously run out of space and > been able to recover by deleting a few files etc without needing to > invoke skip_balance, but that was likely on older versions. > > Any help would be greatly appreciated. > > - Stefan > > $ uname -a > Linux my.host 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC > 2023 x86_64 x86_64 x86_64 GNU/Linux > $ btrfs --version > btrfs-progs v5.16.2 > $ btrfs fi show > Label: none uuid: --- > Total devices 8 FS bytes used 64.67TiB > devid 1 size 10.91TiB used 10.91TiB path /dev/sdk > devid 2 size 10.91TiB used 10.91TiB path /dev/sdh > devid 3 size 10.91TiB used 10.91TiB path /dev/sdj > devid 4 size 10.91TiB used 10.91TiB path /dev/sdi > devid 5 size 10.91TiB used 10.91TiB path /dev/sdf > devid 6 size 10.91TiB used 10.91TiB path /dev/sdg > devid 7 size 10.91TiB used 10.91TiB path /dev/sdd > devid 8 size 10.91TiB used 10.91TiB path /dev/sde > $ btrfs fi df /mnt/point/ > Data, RAID6: total=64.76TiB, used=64.59TiB > System, RAID1C4: total=37.00MiB, used=5.11MiB > Metadata, RAID1C4: total=77.79GiB, used=77.10GiB > GlobalReserve, single: total=512.00MiB, used=387.11MiB > $ > My recommendation is, try some newer kernel (easier with a rolling distro liveCD). Still with skip_balance, cancel the balance, and delete a small file first, then sync, and check if the fs is still fine. Then start with larger and larger files/subvolumes. Thanks, Qu > BTRFS: Transaction aborted (error -28) > BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left > BTRFS info (device sdk): forced readonly > BTRFS error (device sdk): failed to run delayed ref for logical > 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: -28 > WARNING: CPU: 2 PID: 7851 at fs/btrfs/extent-tree.c:3180 > __btrfs_free_extent+0x7e4/0x950 [btrfs] > BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No > space left > BTRFS warning (device sdk): btrfs_uuid_scan_kthread failed -28 > Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat > xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat > nf_tables nfnetlink br_netfilter bridge stp llc ipmi_devintf > ipmi_msghandler overlay binfmt_misc intel_rapl_msr intel_rapl_common > edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > snd_hda_codec_hdmi kvm_amd nls_iso8859_1 kvm snd_hda_intel rapl > snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core > wmi_bmof input_leds snd_hwdep snd_pcm k10temp snd_timer snd ccp > soundcore mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc > scsi_dh_alua bonding tls ramoops pstore_blk msr reed_solomon > pstore_zone efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc > ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 > raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 raid0 multipath linear > hid_generic usbhid hid uas usb_storage amdgpu iommu_v2 gpu_sched > drm_ttm_helper crct10dif_pclmul ttm drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops crc32_pclmul cec ghash_clmulni_intel > aesni_intel mpt3sas rc_core raid_class crypto_simd drm nvme i2c_piix4 > cryptd scsi_transport_sas igb dca ahci libahci xhci_pci qlcnic > i2c_algo_bit nvme_core xhci_pci_renesas wmi video > CPU: 2 PID: 7851 Comm: btrfs-transacti Not tainted 5.15.0-73-generic #80-Ubuntu > Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS > P3.70 02/23/2022 > RIP: 0010:__btrfs_free_extent+0x7e4/0x950 [btrfs] > Code: a0 48 05 50 0a 00 00 f0 48 0f ba 28 03 72 1d 8b 45 84 83 f8 fb > 74 32 83 f8 e2 74 2d 89 c6 48 c7 c7 98 f6 34 c1 e8 ed 42 a9 e6 <0f> 0b > 8b 4d 84 48 8b 7d 90 ba 6c 0c 00 00 48 c7 c6 60 39 34 c1 e8 > RSP: 0018:ffffb63581c9fb68 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 00000000000000d1 RCX: 0000000000000027 > RDX: ffff8ceda0aa0588 RSI: 0000000000000001 RDI: ffff8ceda0aa0580 > RBP: ffffb63581c9fc10 R08: 0000000000000003 R09: fffffffffffe2710 > R10: 000000002938322d R11: 00000000322d2072 R12: 00005cb02659c000 > R13: 00000000000014ce R14: ffff8ce8ab3fb7e0 R15: ffff8ce8de433800 > FS: 0000000000000000(0000) GS:ffff8ceda0a80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055f2f46bb4c8 CR3: 000000010814c000 CR4: 00000000003506e0 > Call Trace: > <TASK> > run_delayed_data_ref+0x93/0x160 [btrfs] > btrfs_run_delayed_refs_for_head+0x193/0x520 [btrfs] > __btrfs_run_delayed_refs+0x8c/0x1d0 [btrfs] > btrfs_run_delayed_refs+0x73/0x200 [btrfs] > btrfs_start_dirty_block_groups+0x296/0x4f0 [btrfs] > btrfs_commit_transaction+0x716/0xaa0 [btrfs] > ? start_transaction+0xd1/0x5b0 [btrfs] > ? __bpf_trace_hrtimer_init+0x20/0x20 > transaction_kthread+0x13c/0x1b0 [btrfs] > ? btrfs_cleanup_transaction.isra.0+0x3c0/0x3c0 [btrfs] > kthread+0x12a/0x150 > ? set_kthread_struct+0x50/0x50 > ret_from_fork+0x22/0x30 > </TASK> > ---[ end trace 8a20922ac453f776 ]--- > BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left > BTRFS error (device sdk): failed to run delayed ref for logical > 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: -28 > BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No > space left ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-12 5:20 ` Qu Wenruo @ 2023-06-12 10:31 ` Stefan N 2023-06-12 10:46 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-12 10:31 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs Hi Qu, Thanks for the quick helpful response, though perhaps it may not be sufficient in my case. I've tried using the latest ubuntu livecd which has btrfs-progs v6.2 on kernel 6.20.0-20 Unfortunately I haven't been able to get any further as even when doing a rm, truncate, btrfs fi sync or btrfs dev add immediately after mounting it still results in i/o error or read only. I tried removing a small file or two or directories with no difference. The stack trace below still suggests no space left but space_info does clarify that metadata is -124M but data has 149G and system 31M free. When I mount I get: using crc32c (crc32c-intel) checksum algorithm disk space caching is enabled bdev /dev/sdf errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 bdev /dev/sdg errs: wr 41089, rd 1556, flush 0, corrupt 0, gen 0 bdev /dev/sdc errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 bdev /dev/sde errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 balance: resume skipped checking UUID tree Then when it inevitably crashes, something like: BTRFS: Transaction aborted (error -28) WARNING: CPU: 3 PID: 24859 at fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 [btrfs] Modules linked in: nfnetlink ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cfg80211 snd_seq_dummy snd_hrtimer binfmt_misc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) snd_hda_codec_realtek spl(O) snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core intel_rapl_msr snd_hwdep intel_rapl_common snd_pcm snd_seq_midi snd_seq_midi_event edac_mce_amd snd_rawmidi kvm_amd snd_seq snd_seq_device snd_timer kvm snd irqbypass soundcore rapl k10temp wmi_bmof ccp input_leds mac_hid msr parport_pc ppdev lp parport efi_pstore dmi_sysfs ip_tables x_tables autofs4 overlay isofs nls_iso8859_1 btrfs blake2b_generic xor raid6_pq libcrc32c dm_mirror dm_region_hash dm_loghid_generic usbhid hid amdgpu iommu_v2 drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper cec rc_core uas usb_storage mpt3sas drm_kms_helper raid_class crct10dif_pclmul nvme crc32_pclmul syscopyarea polyval_clmulni polyval_generic sysfillrect ghash_clmulni_intel sha512_ssse3 sysimgblt aesni_intel crypto_simd xhci_pci drm igb cryptd i2c_piix4 qlcnic xhci_pci_renesas scsi_transport_sas nvme_core ahci libahci dca nvme_common i2c_algo_bit videowmi CPU: 3 PID: 24859 Comm: kworker/u8:8 Tainted: P W O 6.2.0-20-generic #20-Ubuntu Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P3.70 02/23/2022 Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] Code: f4 0f 0b eb b8 44 89 e6 48 c7 c7 20 99 8d c1 e8 7c 74 b1 f4 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 20 99 8d c1 e8 66 74 b1 f4 <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffb4374719fb58 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9babc07adf08 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb4374719fb80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffe4 R13: 00005cd6df95f000 R14: 000000000001f000 R15: ffff9baad11628c0 FS: 0000000000000000(0000) GS:ffff9bb1e1180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000306c962cc000 CR3: 0000000126e4e000 CR4: 00000000003506e0 Call Trace: <TASK> __btrfs_free_extent+0x6bc/0xf50 [btrfs] run_delayed_data_ref+0x8b/0x180 [btrfs] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] flush_space+0x23c/0x2c0 [btrfs] btrfs_async_reclaim_metadata_space+0x1d4/0x300 [btrfs] process_one_work+0x225/0x430 worker_thread+0x50/0x3e0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe9/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> ---[ end trace 0000000000000000 ]--- BTRFS info (device sdi: state A): dumping space info: BTRFS info (device sdi: state A): space_info DATA has 160778199040 free, is not full BTRFS info (device sdi: state A): space_info total=71201958395904, used=71018527428608, pinned=22649229312, reserved=0, may_use=0, readonly=3538944 zone_unusable=0 BTRFS info (device sdi: state A): space_info METADATA has -130809856 free, is full BTRFS info (device sdi: state A): space_info total=83530612736, used=82789154816, pinned=245710848, reserved=495747072, may_use=130809856, readonly=0 zone_unusable=0 BTRFS info (device sdi: state A): space_info SYSTEM has 33439744 free, is not full BTRFS info (device sdi: state A): space_info total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 BTRFS info (device sdi: state A): global_block_rsv: size 536870912 reserved 130809856 BTRFS info (device sdi: state A): trans_block_rsv: size 0 reserved 0 BTRFS info (device sdi: state A): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sdi: state A): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sdi: state A): delayed_refs_rsv: size 220645556224 reserved 0 BTRFS: error (device sdi: state A) in do_free_extent_accounting:2847: errno=-28 No space left BTRFS info (device sdi: state EA): forced readonly On Mon, 12 Jun 2023 at 14:50, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2023/6/12 12:47, Stefan N wrote: > > Hi, > > > > I'm having trouble trying to break my array out of an out of space loop. > > > > On reboot I'm able to mount the filesystem and read files fine but as > > soon as I try to delete/write it hangs until the mount is made read > > only when it then fails. > > > > The following command (immediately after boot, no fstab) suggests > > perhaps the skip_balance is not working as expected: > > $ mount -o skip_balance -t btrfs /dev/sde /mnt/point && btrfs device > > add /dev/loop12 /mnt/point/ > > ERROR: unable to start device add, another exclusive operation > > 'balance' in progress > > skip_balance makes the balance into the paused status. > You still need to cancel it first. > > > and ps shows a [btrfs-balance] process. > > Furthermore, balance won't help for your case. > > Both metadata and data are almost full. > > > > > If I perform a rm or truncate during this window it fails to perform > > any action before being marked read only. The same applies if I > > attempt to cancel the balance. > > > > How can I get out of this cycle? I've previously run out of space and > > been able to recover by deleting a few files etc without needing to > > invoke skip_balance, but that was likely on older versions. > > > > Any help would be greatly appreciated. > > > > - Stefan > > > > $ uname -a > > Linux my.host 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC > > 2023 x86_64 x86_64 x86_64 GNU/Linux > > $ btrfs --version > > btrfs-progs v5.16.2 > > $ btrfs fi show > > Label: none uuid: --- > > Total devices 8 FS bytes used 64.67TiB > > devid 1 size 10.91TiB used 10.91TiB path /dev/sdk > > devid 2 size 10.91TiB used 10.91TiB path /dev/sdh > > devid 3 size 10.91TiB used 10.91TiB path /dev/sdj > > devid 4 size 10.91TiB used 10.91TiB path /dev/sdi > > devid 5 size 10.91TiB used 10.91TiB path /dev/sdf > > devid 6 size 10.91TiB used 10.91TiB path /dev/sdg > > devid 7 size 10.91TiB used 10.91TiB path /dev/sdd > > devid 8 size 10.91TiB used 10.91TiB path /dev/sde > > $ btrfs fi df /mnt/point/ > > Data, RAID6: total=64.76TiB, used=64.59TiB > > System, RAID1C4: total=37.00MiB, used=5.11MiB > > Metadata, RAID1C4: total=77.79GiB, used=77.10GiB > > GlobalReserve, single: total=512.00MiB, used=387.11MiB > > $ > > > > My recommendation is, try some newer kernel (easier with a rolling > distro liveCD). > > Still with skip_balance, cancel the balance, and delete a small file > first, then sync, and check if the fs is still fine. > > Then start with larger and larger files/subvolumes. > > Thanks, > Qu > > > BTRFS: Transaction aborted (error -28) > > BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left > > BTRFS info (device sdk): forced readonly > > BTRFS error (device sdk): failed to run delayed ref for logical > > 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: -28 > > WARNING: CPU: 2 PID: 7851 at fs/btrfs/extent-tree.c:3180 > > __btrfs_free_extent+0x7e4/0x950 [btrfs] > > BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No > > space left > > BTRFS warning (device sdk): btrfs_uuid_scan_kthread failed -28 > > Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat > > xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 > > nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat > > nf_tables nfnetlink br_netfilter bridge stp llc ipmi_devintf > > ipmi_msghandler overlay binfmt_misc intel_rapl_msr intel_rapl_common > > edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > > snd_hda_codec_hdmi kvm_amd nls_iso8859_1 kvm snd_hda_intel rapl > > snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core > > wmi_bmof input_leds snd_hwdep snd_pcm k10temp snd_timer snd ccp > > soundcore mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc > > scsi_dh_alua bonding tls ramoops pstore_blk msr reed_solomon > > pstore_zone efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc > > ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 > > raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor > > raid6_pq libcrc32c raid1 raid0 multipath linear > > hid_generic usbhid hid uas usb_storage amdgpu iommu_v2 gpu_sched > > drm_ttm_helper crct10dif_pclmul ttm drm_kms_helper syscopyarea > > sysfillrect sysimgblt fb_sys_fops crc32_pclmul cec ghash_clmulni_intel > > aesni_intel mpt3sas rc_core raid_class crypto_simd drm nvme i2c_piix4 > > cryptd scsi_transport_sas igb dca ahci libahci xhci_pci qlcnic > > i2c_algo_bit nvme_core xhci_pci_renesas wmi video > > CPU: 2 PID: 7851 Comm: btrfs-transacti Not tainted 5.15.0-73-generic #80-Ubuntu > > Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS > > P3.70 02/23/2022 > > RIP: 0010:__btrfs_free_extent+0x7e4/0x950 [btrfs] > > Code: a0 48 05 50 0a 00 00 f0 48 0f ba 28 03 72 1d 8b 45 84 83 f8 fb > > 74 32 83 f8 e2 74 2d 89 c6 48 c7 c7 98 f6 34 c1 e8 ed 42 a9 e6 <0f> 0b > > 8b 4d 84 48 8b 7d 90 ba 6c 0c 00 00 48 c7 c6 60 39 34 c1 e8 > > RSP: 0018:ffffb63581c9fb68 EFLAGS: 00010286 > > RAX: 0000000000000000 RBX: 00000000000000d1 RCX: 0000000000000027 > > RDX: ffff8ceda0aa0588 RSI: 0000000000000001 RDI: ffff8ceda0aa0580 > > RBP: ffffb63581c9fc10 R08: 0000000000000003 R09: fffffffffffe2710 > > R10: 000000002938322d R11: 00000000322d2072 R12: 00005cb02659c000 > > R13: 00000000000014ce R14: ffff8ce8ab3fb7e0 R15: ffff8ce8de433800 > > FS: 0000000000000000(0000) GS:ffff8ceda0a80000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000055f2f46bb4c8 CR3: 000000010814c000 CR4: 00000000003506e0 > > Call Trace: > > <TASK> > > run_delayed_data_ref+0x93/0x160 [btrfs] > > btrfs_run_delayed_refs_for_head+0x193/0x520 [btrfs] > > __btrfs_run_delayed_refs+0x8c/0x1d0 [btrfs] > > btrfs_run_delayed_refs+0x73/0x200 [btrfs] > > btrfs_start_dirty_block_groups+0x296/0x4f0 [btrfs] > > btrfs_commit_transaction+0x716/0xaa0 [btrfs] > > ? start_transaction+0xd1/0x5b0 [btrfs] > > ? __bpf_trace_hrtimer_init+0x20/0x20 > > transaction_kthread+0x13c/0x1b0 [btrfs] > > ? btrfs_cleanup_transaction.isra.0+0x3c0/0x3c0 [btrfs] > > kthread+0x12a/0x150 > > ? set_kthread_struct+0x50/0x50 > > ret_from_fork+0x22/0x30 > > </TASK> > > ---[ end trace 8a20922ac453f776 ]--- > > BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left > > BTRFS error (device sdk): failed to run delayed ref for logical > > 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: -28 > > BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No > > space left ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-12 10:31 ` Stefan N @ 2023-06-12 10:46 ` Qu Wenruo 2023-06-12 13:02 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-12 10:46 UTC (permalink / raw) To: Stefan N; +Cc: linux-btrfs On 2023/6/12 18:31, Stefan N wrote: > Hi Qu, > > Thanks for the quick helpful response, though perhaps it may not be > sufficient in my case. > > I've tried using the latest ubuntu livecd which has btrfs-progs v6.2 > on kernel 6.20.0-20 I guess you mean 6.2? In v6.2 kernel Josef introduced a new mechanism called FLUSH_EMERGENCY to try our best to squish out any extra metadata space. If that doesn't work, I'm running out of ideas. > > Unfortunately I haven't been able to get any further as even when > doing a rm, truncate, btrfs fi sync or btrfs dev add immediately after > mounting it still results in i/o error or read only. I tried removing > a small file or two or directories with no difference. > [...] > __btrfs_free_extent+0x6bc/0xf50 [btrfs] > run_delayed_data_ref+0x8b/0x180 [btrfs] > btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > flush_space+0x23c/0x2c0 [btrfs] > btrfs_async_reclaim_metadata_space+0x1d4/0x300 [btrfs] > process_one_work+0x225/0x430 > worker_thread+0x50/0x3e0 > ? __pfx_worker_thread+0x10/0x10 > kthread+0xe9/0x110 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x2c/0x50 > </TASK> > ---[ end trace 0000000000000000 ]--- > BTRFS info (device sdi: state A): dumping space info: > BTRFS info (device sdi: state A): space_info DATA has 160778199040 > free, is not full > BTRFS info (device sdi: state A): space_info total=71201958395904, > used=71018527428608, pinned=22649229312, reserved=0, may_use=0, > readonly=3538944 zone_unusable=0 > BTRFS info (device sdi: state A): space_info METADATA has -130809856 > free, is full That minus number is from the global RSV. Not a big deal to worry. > BTRFS info (device sdi: state A): space_info total=83530612736, > used=82789154816, pinned=245710848, reserved=495747072, > may_use=130809856, readonly=0 zone_unusable=0 The big concern here is, we have hundreds of MiBs for pinned/reserved/may_use. Which looks too large. My concern is either free space tree or extent tree updates are taking too much space. Have you tried to cancel the balance and sync? That doesn't work either? Considering you have quite some data space left, you may want to migrate to space cache v1. Unlike v2 cache, v1 cache only takes data space, thus may squish out some precious metadata space. Thanks, Qu > BTRFS info (device sdi: state A): space_info SYSTEM has 33439744 free, > is not full > BTRFS info (device sdi: state A): space_info total=38797312, > used=5357568, pinned=0, reserved=0, may_use=0, readonly=0 > zone_unusable=0 > BTRFS info (device sdi: state A): global_block_rsv: size 536870912 > reserved 130809856 > BTRFS info (device sdi: state A): trans_block_rsv: size 0 reserved 0 > BTRFS info (device sdi: state A): chunk_block_rsv: size 0 reserved 0 > BTRFS info (device sdi: state A): delayed_block_rsv: size 0 reserved 0 > BTRFS info (device sdi: state A): delayed_refs_rsv: size 220645556224 reserved 0 > BTRFS: error (device sdi: state A) in do_free_extent_accounting:2847: > errno=-28 No space left > BTRFS info (device sdi: state EA): forced readonly > > On Mon, 12 Jun 2023 at 14:50, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> >> On 2023/6/12 12:47, Stefan N wrote: >>> Hi, >>> >>> I'm having trouble trying to break my array out of an out of space loop. >>> >>> On reboot I'm able to mount the filesystem and read files fine but as >>> soon as I try to delete/write it hangs until the mount is made read >>> only when it then fails. >>> >>> The following command (immediately after boot, no fstab) suggests >>> perhaps the skip_balance is not working as expected: >>> $ mount -o skip_balance -t btrfs /dev/sde /mnt/point && btrfs device >>> add /dev/loop12 /mnt/point/ >>> ERROR: unable to start device add, another exclusive operation >>> 'balance' in progress >> >> skip_balance makes the balance into the paused status. >> You still need to cancel it first. >> >>> and ps shows a [btrfs-balance] process. >> >> Furthermore, balance won't help for your case. >> >> Both metadata and data are almost full. >> >>> >>> If I perform a rm or truncate during this window it fails to perform >>> any action before being marked read only. The same applies if I >>> attempt to cancel the balance. >>> >>> How can I get out of this cycle? I've previously run out of space and >>> been able to recover by deleting a few files etc without needing to >>> invoke skip_balance, but that was likely on older versions. >>> >>> Any help would be greatly appreciated. >>> >>> - Stefan >>> >>> $ uname -a >>> Linux my.host 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC >>> 2023 x86_64 x86_64 x86_64 GNU/Linux >>> $ btrfs --version >>> btrfs-progs v5.16.2 >>> $ btrfs fi show >>> Label: none uuid: --- >>> Total devices 8 FS bytes used 64.67TiB >>> devid 1 size 10.91TiB used 10.91TiB path /dev/sdk >>> devid 2 size 10.91TiB used 10.91TiB path /dev/sdh >>> devid 3 size 10.91TiB used 10.91TiB path /dev/sdj >>> devid 4 size 10.91TiB used 10.91TiB path /dev/sdi >>> devid 5 size 10.91TiB used 10.91TiB path /dev/sdf >>> devid 6 size 10.91TiB used 10.91TiB path /dev/sdg >>> devid 7 size 10.91TiB used 10.91TiB path /dev/sdd >>> devid 8 size 10.91TiB used 10.91TiB path /dev/sde >>> $ btrfs fi df /mnt/point/ >>> Data, RAID6: total=64.76TiB, used=64.59TiB >>> System, RAID1C4: total=37.00MiB, used=5.11MiB >>> Metadata, RAID1C4: total=77.79GiB, used=77.10GiB >>> GlobalReserve, single: total=512.00MiB, used=387.11MiB >>> $ >>> >> >> My recommendation is, try some newer kernel (easier with a rolling >> distro liveCD). >> >> Still with skip_balance, cancel the balance, and delete a small file >> first, then sync, and check if the fs is still fine. >> >> Then start with larger and larger files/subvolumes. >> >> Thanks, >> Qu >> >>> BTRFS: Transaction aborted (error -28) >>> BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left >>> BTRFS info (device sdk): forced readonly >>> BTRFS error (device sdk): failed to run delayed ref for logical >>> 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: -28 >>> WARNING: CPU: 2 PID: 7851 at fs/btrfs/extent-tree.c:3180 >>> __btrfs_free_extent+0x7e4/0x950 [btrfs] >>> BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No >>> space left >>> BTRFS warning (device sdk): btrfs_uuid_scan_kthread failed -28 >>> Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat >>> xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 >>> nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat >>> nf_tables nfnetlink br_netfilter bridge stp llc ipmi_devintf >>> ipmi_msghandler overlay binfmt_misc intel_rapl_msr intel_rapl_common >>> edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio >>> snd_hda_codec_hdmi kvm_amd nls_iso8859_1 kvm snd_hda_intel rapl >>> snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core >>> wmi_bmof input_leds snd_hwdep snd_pcm k10temp snd_timer snd ccp >>> soundcore mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc >>> scsi_dh_alua bonding tls ramoops pstore_blk msr reed_solomon >>> pstore_zone efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc >>> ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 >>> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor >>> raid6_pq libcrc32c raid1 raid0 multipath linear >>> hid_generic usbhid hid uas usb_storage amdgpu iommu_v2 gpu_sched >>> drm_ttm_helper crct10dif_pclmul ttm drm_kms_helper syscopyarea >>> sysfillrect sysimgblt fb_sys_fops crc32_pclmul cec ghash_clmulni_intel >>> aesni_intel mpt3sas rc_core raid_class crypto_simd drm nvme i2c_piix4 >>> cryptd scsi_transport_sas igb dca ahci libahci xhci_pci qlcnic >>> i2c_algo_bit nvme_core xhci_pci_renesas wmi video >>> CPU: 2 PID: 7851 Comm: btrfs-transacti Not tainted 5.15.0-73-generic #80-Ubuntu >>> Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS >>> P3.70 02/23/2022 >>> RIP: 0010:__btrfs_free_extent+0x7e4/0x950 [btrfs] >>> Code: a0 48 05 50 0a 00 00 f0 48 0f ba 28 03 72 1d 8b 45 84 83 f8 fb >>> 74 32 83 f8 e2 74 2d 89 c6 48 c7 c7 98 f6 34 c1 e8 ed 42 a9 e6 <0f> 0b >>> 8b 4d 84 48 8b 7d 90 ba 6c 0c 00 00 48 c7 c6 60 39 34 c1 e8 >>> RSP: 0018:ffffb63581c9fb68 EFLAGS: 00010286 >>> RAX: 0000000000000000 RBX: 00000000000000d1 RCX: 0000000000000027 >>> RDX: ffff8ceda0aa0588 RSI: 0000000000000001 RDI: ffff8ceda0aa0580 >>> RBP: ffffb63581c9fc10 R08: 0000000000000003 R09: fffffffffffe2710 >>> R10: 000000002938322d R11: 00000000322d2072 R12: 00005cb02659c000 >>> R13: 00000000000014ce R14: ffff8ce8ab3fb7e0 R15: ffff8ce8de433800 >>> FS: 0000000000000000(0000) GS:ffff8ceda0a80000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 000055f2f46bb4c8 CR3: 000000010814c000 CR4: 00000000003506e0 >>> Call Trace: >>> <TASK> >>> run_delayed_data_ref+0x93/0x160 [btrfs] >>> btrfs_run_delayed_refs_for_head+0x193/0x520 [btrfs] >>> __btrfs_run_delayed_refs+0x8c/0x1d0 [btrfs] >>> btrfs_run_delayed_refs+0x73/0x200 [btrfs] >>> btrfs_start_dirty_block_groups+0x296/0x4f0 [btrfs] >>> btrfs_commit_transaction+0x716/0xaa0 [btrfs] >>> ? start_transaction+0xd1/0x5b0 [btrfs] >>> ? __bpf_trace_hrtimer_init+0x20/0x20 >>> transaction_kthread+0x13c/0x1b0 [btrfs] >>> ? btrfs_cleanup_transaction.isra.0+0x3c0/0x3c0 [btrfs] >>> kthread+0x12a/0x150 >>> ? set_kthread_struct+0x50/0x50 >>> ret_from_fork+0x22/0x30 >>> </TASK> >>> ---[ end trace 8a20922ac453f776 ]--- >>> BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left >>> BTRFS error (device sdk): failed to run delayed ref for logical >>> 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: -28 >>> BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No >>> space left ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-12 10:46 ` Qu Wenruo @ 2023-06-12 13:02 ` Stefan N 2023-06-13 1:29 ` Paul Jones 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-12 13:02 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Mon, 12 Jun 2023 at 20:16, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > I've tried using the latest ubuntu livecd which has btrfs-progs v6.2 > > on kernel 6.20.0-20 > > I guess you mean 6.2? Sorry yes kernel 6.2.0-20 (Ubuntu) > In v6.2 kernel Josef introduced a new mechanism called FLUSH_EMERGENCY > to try our best to squish out any extra metadata space. > > If that doesn't work, I'm running out of ideas. How do I go about forcing this to engage? Currently the array never stays in write mode long enough to do anything, so I'd need to trigger something immediately after mount to have a chance that it syncs before it goes into read only mode. > > BTRFS info (device sdi: state A): space_info total=83530612736, > > used=82789154816, pinned=245710848, reserved=495747072, > > may_use=130809856, readonly=0 zone_unusable=0 > > The big concern here is, we have hundreds of MiBs for > pinned/reserved/may_use. > > Which looks too large. > > My concern is either free space tree or extent tree updates are taking > too much space. > > Have you tried to cancel the balance and sync? That doesn't work either? The balance cancels ok, and there's no sync running except the auto UUID tree check on mount. > Considering you have quite some data space left, you may want to migrate > to space cache v1. > Unlike v2 cache, v1 cache only takes data space, thus may squish out > some precious metadata space. From the 'disk space caching is enabled' in the log it must still be using space cache v1, and forcing it as a flag doesn't appear to change anything. With many remount cycles, the best I've been able to achieve has been to rm some small files, but they never synced and were back in btrfs on remount. I'm running out of ideas, and given the size I really don't want to have to replace/rebuild if I can help it! Any other ideas would be greatly appreciated ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Out of space loop: skip_balance not working 2023-06-12 13:02 ` Stefan N @ 2023-06-13 1:29 ` Paul Jones 2023-06-13 1:54 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Paul Jones @ 2023-06-13 1:29 UTC (permalink / raw) To: Stefan N, Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org > -----Original Message----- > From: Stefan N <stefannnau@gmail.com> > Sent: Monday, June 12, 2023 11:03 PM > To: Qu Wenruo <quwenruo.btrfs@gmx.com> > Cc: linux-btrfs@vger.kernel.org > Subject: Re: Out of space loop: skip_balance not working > > On Mon, 12 Jun 2023 at 20:16, Qu Wenruo <quwenruo.btrfs@gmx.com> > wrote: > > > I've tried using the latest ubuntu livecd which has btrfs-progs v6.2 > > > on kernel 6.20.0-20 > > > > I guess you mean 6.2? > > Sorry yes kernel 6.2.0-20 (Ubuntu) > > > In v6.2 kernel Josef introduced a new mechanism called > FLUSH_EMERGENCY > > to try our best to squish out any extra metadata space. > > > > If that doesn't work, I'm running out of ideas. > > How do I go about forcing this to engage? Currently the array never stays in > write mode long enough to do anything, so I'd need to trigger something > immediately after mount to have a chance that it syncs before it goes into > read only mode. > > > > BTRFS info (device sdi: state A): space_info total=83530612736, > > > used=82789154816, pinned=245710848, reserved=495747072, > > > may_use=130809856, readonly=0 zone_unusable=0 > > > > The big concern here is, we have hundreds of MiBs for > > pinned/reserved/may_use. > > > > Which looks too large. > > > > My concern is either free space tree or extent tree updates are taking > > too much space. > > > > Have you tried to cancel the balance and sync? That doesn't work either? > > The balance cancels ok, and there's no sync running except the auto UUID > tree check on mount. > > > Considering you have quite some data space left, you may want to > > migrate to space cache v1. > > Unlike v2 cache, v1 cache only takes data space, thus may squish out > > some precious metadata space. > > From the 'disk space caching is enabled' in the log it must still be using space > cache v1, and forcing it as a flag doesn't appear to change anything. > > With many remount cycles, the best I've been able to achieve has been to rm > some small files, but they never synced and were back in btrfs on remount. > > I'm running out of ideas, and given the size I really don't want to have to > replace/rebuild if I can help it! > > Any other ideas would be greatly appreciated When I've had similar issues in the past I've managed to create some space by adding a usb drive (or two) to the filesystem, which then gives enough of a buffer to remove some files, and when btrfs will let you remove the extra drive you know everything is back under control. Paul. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-13 1:29 ` Paul Jones @ 2023-06-13 1:54 ` Stefan N 2023-06-13 1:58 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-13 1:54 UTC (permalink / raw) To: Paul Jones; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org Hi Paul, Thanks for the suggestion, I've had similar success in the past, but unfortunately not this time. I'm guessing this is because the metadata is full rather than the data. If I do `mount && btrfs dev add` as a single command on a small loop device it still doesn't perform in time before it turns to read only. The same goes if I try to rm, truncate, btrfs fi sync or btrfs balance. - Stefan On Tue, 13 Jun 2023 at 10:59, Paul Jones <paul@pauljones.id.au> wrote: > > > > -----Original Message----- > > From: Stefan N <stefannnau@gmail.com> > > Sent: Monday, June 12, 2023 11:03 PM > > To: Qu Wenruo <quwenruo.btrfs@gmx.com> > > Cc: linux-btrfs@vger.kernel.org > > Subject: Re: Out of space loop: skip_balance not working > > > > On Mon, 12 Jun 2023 at 20:16, Qu Wenruo <quwenruo.btrfs@gmx.com> > > wrote: > > > > I've tried using the latest ubuntu livecd which has btrfs-progs v6.2 > > > > on kernel 6.20.0-20 > > > > > > I guess you mean 6.2? > > > > Sorry yes kernel 6.2.0-20 (Ubuntu) > > > > > In v6.2 kernel Josef introduced a new mechanism called > > FLUSH_EMERGENCY > > > to try our best to squish out any extra metadata space. > > > > > > If that doesn't work, I'm running out of ideas. > > > > How do I go about forcing this to engage? Currently the array never stays in > > write mode long enough to do anything, so I'd need to trigger something > > immediately after mount to have a chance that it syncs before it goes into > > read only mode. > > > > > > BTRFS info (device sdi: state A): space_info total=83530612736, > > > > used=82789154816, pinned=245710848, reserved=495747072, > > > > may_use=130809856, readonly=0 zone_unusable=0 > > > > > > The big concern here is, we have hundreds of MiBs for > > > pinned/reserved/may_use. > > > > > > Which looks too large. > > > > > > My concern is either free space tree or extent tree updates are taking > > > too much space. > > > > > > Have you tried to cancel the balance and sync? That doesn't work either? > > > > The balance cancels ok, and there's no sync running except the auto UUID > > tree check on mount. > > > > > Considering you have quite some data space left, you may want to > > > migrate to space cache v1. > > > Unlike v2 cache, v1 cache only takes data space, thus may squish out > > > some precious metadata space. > > > > From the 'disk space caching is enabled' in the log it must still be using space > > cache v1, and forcing it as a flag doesn't appear to change anything. > > > > With many remount cycles, the best I've been able to achieve has been to rm > > some small files, but they never synced and were back in btrfs on remount. > > > > I'm running out of ideas, and given the size I really don't want to have to > > replace/rebuild if I can help it! > > > > Any other ideas would be greatly appreciated > > > When I've had similar issues in the past I've managed to create some space by adding a usb drive (or two) to the filesystem, which then gives enough of a buffer to remove some files, and when btrfs will let you remove the extra drive you know everything is back under control. > > Paul. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-13 1:54 ` Stefan N @ 2023-06-13 1:58 ` Qu Wenruo 2023-06-17 5:11 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-13 1:58 UTC (permalink / raw) To: Stefan N, Paul Jones; +Cc: linux-btrfs@vger.kernel.org On 2023/6/13 09:54, Stefan N wrote: > Hi Paul, > > Thanks for the suggestion, I've had similar success in the past, but > unfortunately not this time. > > I'm guessing this is because the metadata is full rather than the data. > > If I do `mount && btrfs dev add` as a single command on a small loop > device it still doesn't perform in time before it turns to read only. This is because btrfs_init_new_device() would commit transaction. In your particular case, since you're running RAID1C4 you need to add 4 devices in one transaction. I can easily craft a patch to avoid commit transaction, but still you'll need to add at least 4 disks, and then sync to see if things would work. Furthermore this means you need a liveCD with full kernel compiling environment. If you want to go this path, I can send you the patch when you've prepared the needed environment. Thanks, Qu > The same goes if I try to rm, truncate, btrfs fi sync or btrfs > balance. > > - Stefan > > On Tue, 13 Jun 2023 at 10:59, Paul Jones <paul@pauljones.id.au> wrote: >> >> >>> -----Original Message----- >>> From: Stefan N <stefannnau@gmail.com> >>> Sent: Monday, June 12, 2023 11:03 PM >>> To: Qu Wenruo <quwenruo.btrfs@gmx.com> >>> Cc: linux-btrfs@vger.kernel.org >>> Subject: Re: Out of space loop: skip_balance not working >>> >>> On Mon, 12 Jun 2023 at 20:16, Qu Wenruo <quwenruo.btrfs@gmx.com> >>> wrote: >>>>> I've tried using the latest ubuntu livecd which has btrfs-progs v6.2 >>>>> on kernel 6.20.0-20 >>>> >>>> I guess you mean 6.2? >>> >>> Sorry yes kernel 6.2.0-20 (Ubuntu) >>> >>>> In v6.2 kernel Josef introduced a new mechanism called >>> FLUSH_EMERGENCY >>>> to try our best to squish out any extra metadata space. >>>> >>>> If that doesn't work, I'm running out of ideas. >>> >>> How do I go about forcing this to engage? Currently the array never stays in >>> write mode long enough to do anything, so I'd need to trigger something >>> immediately after mount to have a chance that it syncs before it goes into >>> read only mode. >>> >>>>> BTRFS info (device sdi: state A): space_info total=83530612736, >>>>> used=82789154816, pinned=245710848, reserved=495747072, >>>>> may_use=130809856, readonly=0 zone_unusable=0 >>>> >>>> The big concern here is, we have hundreds of MiBs for >>>> pinned/reserved/may_use. >>>> >>>> Which looks too large. >>>> >>>> My concern is either free space tree or extent tree updates are taking >>>> too much space. >>>> >>>> Have you tried to cancel the balance and sync? That doesn't work either? >>> >>> The balance cancels ok, and there's no sync running except the auto UUID >>> tree check on mount. >>> >>>> Considering you have quite some data space left, you may want to >>>> migrate to space cache v1. >>>> Unlike v2 cache, v1 cache only takes data space, thus may squish out >>>> some precious metadata space. >>> >>> From the 'disk space caching is enabled' in the log it must still be using space >>> cache v1, and forcing it as a flag doesn't appear to change anything. >>> >>> With many remount cycles, the best I've been able to achieve has been to rm >>> some small files, but they never synced and were back in btrfs on remount. >>> >>> I'm running out of ideas, and given the size I really don't want to have to >>> replace/rebuild if I can help it! >>> >>> Any other ideas would be greatly appreciated >> >> >> When I've had similar issues in the past I've managed to create some space by adding a usb drive (or two) to the filesystem, which then gives enough of a buffer to remove some files, and when btrfs will let you remove the extra drive you know everything is back under control. >> >> Paul. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-13 1:58 ` Qu Wenruo @ 2023-06-17 5:11 ` Stefan N 2023-06-17 5:30 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-17 5:11 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs@vger.kernel.org Hi Qu, I believe I've got this environment ready, with the 6.2.0 kernel as before using the Ubuntu kernel, but can switch to vanilla if required. I've not done anything kernel modifications for a solid decade, so would be keen for a bit of guidance. I will recover a 1tb SSD and partition it into 4 in a USB enclosure, but failing this will use 4x loop devices. On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > In your particular case, since you're running RAID1C4 you need to add 4 > devices in one transaction. > > I can easily craft a patch to avoid commit transaction, but still you'll > need to add at least 4 disks, and then sync to see if things would work. > > Furthermore this means you need a liveCD with full kernel compiling > environment. > > If you want to go this path, I can send you the patch when you've > prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-17 5:11 ` Stefan N @ 2023-06-17 5:30 ` Qu Wenruo 2023-06-22 8:33 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-17 5:30 UTC (permalink / raw) To: Stefan N, Qu Wenruo, linux-btrfs@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 2505 bytes --] On 2023/6/17 13:11, Stefan N wrote: > Hi Qu, > > I believe I've got this environment ready, with the 6.2.0 kernel as > before using the Ubuntu kernel, but can switch to vanilla if required. > > I've not done anything kernel modifications for a solid decade, so > would be keen for a bit of guidance. Sure no problem. Please fetch the kernel source tar ball (6.2.x) first, decompress, then apply the attached one-line patch by: $ tar czf linux*.tar.xz $ cd linux* $ patch -np1 -i <the patch file> Then use your running system kernel config if possible: $ cp /proc/config.gz . $ gunzip config.gz $ mv config .config $ make olddefconfig Then you can start your kernel compiling, and considering you're using your distro's default, it would include tons of drivers, thus would be very slow. (Replace the number to something more suitable to your system, using all CPU cores can be very hot) $ make -j12 Finally you need to install the modules/kernel. Unfortunately this is distro specific, but if you're using Ubuntu, it may be much easier: $ make bindeb-pkg Then install the generated dpkg I guess? I have never tried kernel building using deb/rpm, but only manual installation, which is also distro dependent in the initramfs generation part. # cp arch/x86/boot/bzImage /boot/vmlinuz-custom # make modules_install # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img The last step is to update your bootloader to add the new kernel, which is not only distro dependent but also bootloader dependent. In my case, I go with systemd-boot with manually crafted entries. But if you go Ubuntu I believe just installing the kernel dpkg would have everything handled? Finally you can try reboot into the newer kernel, and try device add (need to add 4 disks), then sync and see if things work as expected. Thanks, Qu > > I will recover a 1tb SSD and partition it into 4 in a USB enclosure, > but failing this will use 4x loop devices. > > On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> In your particular case, since you're running RAID1C4 you need to add 4 >> devices in one transaction. >> >> I can easily craft a patch to avoid commit transaction, but still you'll >> need to add at least 4 disks, and then sync to see if things would work. >> >> Furthermore this means you need a liveCD with full kernel compiling >> environment. >> >> If you want to go this path, I can send you the patch when you've >> prepared the needed environment. [-- Attachment #2: 0001-btrfs-do-not-commit-transaction-when-adding-a-new-de.patch --] [-- Type: text/x-patch, Size: 1044 bytes --] From b55b383150a185d367dd3b7a820dbe1efc5cfc9d Mon Sep 17 00:00:00 2001 Message-ID: <b55b383150a185d367dd3b7a820dbe1efc5cfc9d.1686979163.git.wqu@suse.com> From: Qu Wenruo <wqu@suse.com> Date: Sat, 17 Jun 2023 13:15:30 +0800 Subject: [PATCH] btrfs: do not commit transaction when adding a new device This is to address a ENOSPC situation where one has to add more than one disks before having enough space to commit the current transaction. (Including the one to add a new device). Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/volumes.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index df43093b7a46..b8f68d58f498 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2773,7 +2773,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path btrfs_sysfs_update_sprout_fsid(fs_devices); } - ret = btrfs_commit_transaction(trans); + ret = btrfs_end_transaction(trans); if (seeding_dev) { mutex_unlock(&uuid_mutex); -- 2.41.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-17 5:30 ` Qu Wenruo @ 2023-06-22 8:33 ` Stefan N 2023-06-22 9:18 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-22 8:33 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org Hi Qu, Many thanks for the detailed instructions and your patience. I got it working combined with https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system OS instead, tagged +btrfix $ uname -vr 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 However, I've not had luck with the commands suggested, and would appreciate any further ideas. Outputs follow below, with /mnt/data as the btrfs mount point that currently contains 8x disks sd[a-j] with an additional 4x 64gb USB flash drives being added sd[l-o] $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: error adding device '/dev/sdl': Input/output error ERROR: error adding device '/dev/sdm': Read-only file system ERROR: error adding device '/dev/sdn': Read-only file system ERROR: error adding device '/dev/sdo': Read-only file system ERROR: Could not sync filesystem: Read-only file system $ The same occurs if I try to add 4x 100mb loop devices (on a ssd so they're super quick to zero); $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: error adding device '/dev/loop16': Input/output error ERROR: error adding device '/dev/loop17': Read-only file system ERROR: error adding device '/dev/loop18': Read-only file system ERROR: error adding device '/dev/loop19': Read-only file system ERROR: Could not sync filesystem: Read-only file system $ I confirmed before both these kernel builds that the replaced line was btrfs_end_transaction rather than btrfs_commit_transaction (anyone else following, I needed to remove the -n in the patch command earlier) $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* linux-6.2.0-dist/fs/btrfs/volumes.c: btrfs_sysfs_update_sprout_fsid(fs_devices); linux-6.2.0-dist/fs/btrfs/volumes.c- } linux-6.2.0-dist/fs/btrfs/volumes.c- linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); -- linux-6.2.0-v2/fs/btrfs/volumes.c: btrfs_sysfs_update_sprout_fsid(fs_devices); linux-6.2.0-v2/fs/btrfs/volumes.c- } linux-6.2.0-v2/fs/btrfs/volumes.c- linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); -- linux-6.2.0-v3/fs/btrfs/volumes.c: btrfs_sysfs_update_sprout_fsid(fs_devices); linux-6.2.0-v3/fs/btrfs/volumes.c- } linux-6.2.0-v3/fs/btrfs/volumes.c- linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); $ $ btrfs fi usage /mnt/data Overall: Device size: 87.31TiB Device allocated: 87.31TiB Device unallocated: 1.94GiB Device missing: 0.00B Device slack: 0.00B Used: 87.08TiB Free (estimated): 173.29GiB (min: 172.33GiB) Free (statfs, df): 171.84GiB Data ratio: 1.34 Metadata ratio: 4.00 Global reserve: 512.00MiB (used: 371.25MiB) Multiple profiles: no Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) /dev/sdc 10.90TiB /dev/sdf 10.90TiB /dev/sda 10.86TiB /dev/sdg 10.87TiB /dev/sdh 10.86TiB /dev/sdd 10.87TiB /dev/sde 10.88TiB /dev/sdb 10.88TiB Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) /dev/sdc 15.33GiB /dev/sdf 18.41GiB /dev/sda 49.63GiB /dev/sdg 49.50GiB /dev/sdh 51.52GiB /dev/sdd 48.70GiB /dev/sde 39.09GiB /dev/sdb 39.01GiB System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) /dev/sdc 1.00MiB /dev/sda 37.00MiB /dev/sdg 37.00MiB /dev/sdh 36.00MiB /dev/sdd 37.00MiB Unallocated: /dev/sdc 1.00MiB /dev/sdf 1.00MiB /dev/sda 1.27GiB /dev/sdg 1.00MiB /dev/sdh 1.00MiB /dev/sdd 687.00MiB /dev/sde 1.00MiB /dev/sdb 1.00MiB $ This first attempt generated the following syslog output: kernel: [ 868.435387] BTRFS info (device sde): using crc32c (crc32c-intel) checksum algorithm kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr 41089, rd 1556, flush 0, corrupt 0, gen 0 kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) kernel: [ 1267.280553] BTRFS: error (device sde: state A) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to run delayed ref for logical 102255404044288 num_bytes 294912 type 184 action 2 ref_mod 1: -28 kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left kernel: [ 1267.280695] BTRFS warning (device sde: state EA): btrfs_uuid_scan_kthread failed -5 kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti Tainted: G W O 6.2.0-23-generic #23+btrfix kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info DATA has 160777674752 free, is not full kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info total=71201958395904, used=71018191273984, pinned=22985908224, reserved=0, may_use=0, readonly=3538944 zone_unusable=0 kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info METADATA has -124944384 free, is full kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info total=83530612736, used=82791497728, pinned=242745344, reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info SYSTEM has 33439744 free, is not full kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 kernel: [ 1267.282588] BTRFS info (device sde: state EA): global_block_rsv: size 536870912 reserved 124944384 kernel: [ 1267.282592] BTRFS info (device sde: state EA): trans_block_rsv: size 0 reserved 0 kernel: [ 1267.282595] BTRFS info (device sde: state EA): chunk_block_rsv: size 0 reserved 0 kernel: [ 1267.282599] BTRFS info (device sde: state EA): delayed_block_rsv: size 0 reserved 0 kernel: [ 1267.282602] BTRFS info (device sde: state EA): delayed_refs_rsv: size 251322957824 reserved 0 kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to run delayed ref for logical 102255401897984 num_bytes 126976 type 184 action 2 ref_mod 1: -28 kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left A couple of kernel recompiles later, the second attempt on the SSD generated similar: kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c (crc32c-intel) checksum algorithm kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr 41089, rd 1556, flush 0, corrupt 0, gen 0 kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid amdgpu uas hid iommu_v2 kernel: [ 1919.452839] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info DATA has 160778723328 free, is not full kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info total=71201958395904, used=71017442181120, pinned=23733952512, reserved=0, may_use=0, readonly=3538944 zone_unusable=0 kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info METADATA has -147570688 free, is full kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info total=83530612736, used=82792185856, pinned=238059520, reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info SYSTEM has 33439744 free, is not full kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 kernel: [ 1919.454070] BTRFS info (device sdc: state A): global_block_rsv: size 536870912 reserved 147570688 kernel: [ 1919.454074] BTRFS info (device sdc: state A): trans_block_rsv: size 0 reserved 0 kernel: [ 1919.454077] BTRFS info (device sdc: state A): chunk_block_rsv: size 0 reserved 0 kernel: [ 1919.454080] BTRFS info (device sdc: state A): delayed_block_rsv: size 0 reserved 0 kernel: [ 1919.454083] BTRFS info (device sdc: state A): delayed_refs_rsv: size 254292787200 reserved 0 kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to run delayed ref for logical 102538713931776 num_bytes 245760 type 184 action 2 ref_mod 1: -28 kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): btrfs_uuid_scan_kthread failed -5 kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in __btrfs_free_extent:3077: errno=-28 No space left kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to run delayed ref for logical 102538732191744 num_bytes 245760 type 184 action 2 ref_mod 1: -28 kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: > > > > On 2023/6/17 13:11, Stefan N wrote: > > Hi Qu, > > > > I believe I've got this environment ready, with the 6.2.0 kernel as > > before using the Ubuntu kernel, but can switch to vanilla if required. > > > > I've not done anything kernel modifications for a solid decade, so > > would be keen for a bit of guidance. > > Sure no problem. > > Please fetch the kernel source tar ball (6.2.x) first, decompress, then > apply the attached one-line patch by: > > $ tar czf linux*.tar.xz > $ cd linux* > $ patch -np1 -i <the patch file> > > Then use your running system kernel config if possible: > > $ cp /proc/config.gz . > $ gunzip config.gz > $ mv config .config > $ make olddefconfig > > Then you can start your kernel compiling, and considering you're using > your distro's default, it would include tons of drivers, thus would be > very slow. (Replace the number to something more suitable to your > system, using all CPU cores can be very hot) > > $ make -j12 > > Finally you need to install the modules/kernel. > > Unfortunately this is distro specific, but if you're using Ubuntu, it > may be much easier: > > $ make bindeb-pkg > > Then install the generated dpkg I guess? I have never tried kernel > building using deb/rpm, but only manual installation, which is also > distro dependent in the initramfs generation part. > > # cp arch/x86/boot/bzImage /boot/vmlinuz-custom > # make modules_install > # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img > > > The last step is to update your bootloader to add the new kernel, which > is not only distro dependent but also bootloader dependent. > > In my case, I go with systemd-boot with manually crafted entries. > But if you go Ubuntu I believe just installing the kernel dpkg would > have everything handled? > > Finally you can try reboot into the newer kernel, and try device add > (need to add 4 disks), then sync and see if things work as expected. > > Thanks, > Qu > > > > I will recover a 1tb SSD and partition it into 4 in a USB enclosure, > > but failing this will use 4x loop devices. > > > > On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >> In your particular case, since you're running RAID1C4 you need to add 4 > >> devices in one transaction. > >> > >> I can easily craft a patch to avoid commit transaction, but still you'll > >> need to add at least 4 disks, and then sync to see if things would work. > >> > >> Furthermore this means you need a liveCD with full kernel compiling > >> environment. > >> > >> If you want to go this path, I can send you the patch when you've > >> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-22 8:33 ` Stefan N @ 2023-06-22 9:18 ` Qu Wenruo 2023-06-22 22:18 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-22 9:18 UTC (permalink / raw) To: Stefan N, Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org On 2023/6/22 16:33, Stefan N wrote: > Hi Qu, > > Many thanks for the detailed instructions and your patience. I got it > working combined with > https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system > OS instead, tagged +btrfix > $ uname -vr > 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 > > However, I've not had luck with the commands suggested, and would > appreciate any further ideas. > > Outputs follow below, with /mnt/data as the btrfs mount point that > currently contains 8x disks sd[a-j] with an additional 4x 64gb USB > flash drives being added sd[l-o] > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > fi sync /mnt/data > ERROR: error adding device '/dev/sdl': Input/output error > ERROR: error adding device '/dev/sdm': Read-only file system > ERROR: error adding device '/dev/sdn': Read-only file system > ERROR: error adding device '/dev/sdo': Read-only file system > ERROR: Could not sync filesystem: Read-only file system > $ > > The same occurs if I try to add 4x 100mb loop devices (on a ssd so > they're super quick to zero); > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 > /mnt/data ; sudo btrfs fi sync /mnt/data > ERROR: error adding device '/dev/loop16': Input/output error This is the interesting part, this means we're erroring out due to -EIO (not -ENOSPC) during the first device add. And by somehow, after the first device add, we already got the trans abort. Would you please try the following branch? https://github.com/adam900710/linux/tree/dev_add_no_commit It has not only the patch to skip the commit, but also extra debug output for the situation. Thanks, Qu > ERROR: error adding device '/dev/loop17': Read-only file system > ERROR: error adding device '/dev/loop18': Read-only file system > ERROR: error adding device '/dev/loop19': Read-only file system > ERROR: Could not sync filesystem: Read-only file system > $ > > I confirmed before both these kernel builds that the replaced line was > btrfs_end_transaction rather than btrfs_commit_transaction (anyone > else following, I needed to remove the -n in the patch command > earlier) > $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* > linux-6.2.0-dist/fs/btrfs/volumes.c: > btrfs_sysfs_update_sprout_fsid(fs_devices); > linux-6.2.0-dist/fs/btrfs/volumes.c- } > linux-6.2.0-dist/fs/btrfs/volumes.c- > linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); > -- > linux-6.2.0-v2/fs/btrfs/volumes.c: > btrfs_sysfs_update_sprout_fsid(fs_devices); > linux-6.2.0-v2/fs/btrfs/volumes.c- } > linux-6.2.0-v2/fs/btrfs/volumes.c- > linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > -- > linux-6.2.0-v3/fs/btrfs/volumes.c: > btrfs_sysfs_update_sprout_fsid(fs_devices); > linux-6.2.0-v3/fs/btrfs/volumes.c- } > linux-6.2.0-v3/fs/btrfs/volumes.c- > linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > $ > > $ btrfs fi usage /mnt/data > Overall: > Device size: 87.31TiB > Device allocated: 87.31TiB > Device unallocated: 1.94GiB > Device missing: 0.00B > Device slack: 0.00B > Used: 87.08TiB > Free (estimated): 173.29GiB (min: 172.33GiB) > Free (statfs, df): 171.84GiB > Data ratio: 1.34 > Metadata ratio: 4.00 > Global reserve: 512.00MiB (used: 371.25MiB) > Multiple profiles: no > > Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) > /dev/sdc 10.90TiB > /dev/sdf 10.90TiB > /dev/sda 10.86TiB > /dev/sdg 10.87TiB > /dev/sdh 10.86TiB > /dev/sdd 10.87TiB > /dev/sde 10.88TiB > /dev/sdb 10.88TiB > > Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) > /dev/sdc 15.33GiB > /dev/sdf 18.41GiB > /dev/sda 49.63GiB > /dev/sdg 49.50GiB > /dev/sdh 51.52GiB > /dev/sdd 48.70GiB > /dev/sde 39.09GiB > /dev/sdb 39.01GiB > > System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) > /dev/sdc 1.00MiB > /dev/sda 37.00MiB > /dev/sdg 37.00MiB > /dev/sdh 36.00MiB > /dev/sdd 37.00MiB > > Unallocated: > /dev/sdc 1.00MiB > /dev/sdf 1.00MiB > /dev/sda 1.27GiB > /dev/sdg 1.00MiB > /dev/sdh 1.00MiB > /dev/sdd 687.00MiB > /dev/sde 1.00MiB > /dev/sdb 1.00MiB > $ > > > This first attempt generated the following syslog output: > kernel: [ 868.435387] BTRFS info (device sde): using crc32c > (crc32c-intel) checksum algorithm > kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled > kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr > 0, rd 0, flush 0, corrupt 845, gen 0 > kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr > 41089, rd 1556, flush 0, corrupt 0, gen 0 > kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr > 3, rd 7, flush 0, corrupt 0, gen 0 > kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr > 41, rd 0, flush 0, corrupt 0, gen 0 > kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) > kernel: [ 1267.280553] BTRFS: error (device sde: state A) in > do_free_extent_accounting:2847: errno=-28 No space left > kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly > kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to > run delayed ref for logical 102255404044288 num_bytes 294912 type 184 > action 2 ref_mod 1: -28 > kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at > fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > [btrfs] > kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in > btrfs_run_delayed_refs:2151: errno=-28 No space left > kernel: [ 1267.280695] BTRFS warning (device sde: state EA): > btrfs_uuid_scan_kthread failed -5 > kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth > xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd > snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm > snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer > wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath > scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd > auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables > autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov > async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 > raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage > kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti > Tainted: G W O 6.2.0-23-generic #23+btrfix > kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] > kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] > kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] > kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: > kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info > DATA has 160777674752 free, is not full > kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info > total=71201958395904, used=71018191273984, pinned=22985908224, > reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info > METADATA has -124944384 free, is full > kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info > total=83530612736, used=82791497728, pinned=242745344, > reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 > kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info > SYSTEM has 33439744 free, is not full > kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info > total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > readonly=0 zone_unusable=0 > kernel: [ 1267.282588] BTRFS info (device sde: state EA): > global_block_rsv: size 536870912 reserved 124944384 > kernel: [ 1267.282592] BTRFS info (device sde: state EA): > trans_block_rsv: size 0 reserved 0 > kernel: [ 1267.282595] BTRFS info (device sde: state EA): > chunk_block_rsv: size 0 reserved 0 > kernel: [ 1267.282599] BTRFS info (device sde: state EA): > delayed_block_rsv: size 0 reserved 0 > kernel: [ 1267.282602] BTRFS info (device sde: state EA): > delayed_refs_rsv: size 251322957824 reserved 0 > kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in > do_free_extent_accounting:2847: errno=-28 No space left > kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to > run delayed ref for logical 102255401897984 num_bytes 126976 type 184 > action 2 ref_mod 1: -28 > kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in > btrfs_run_delayed_refs:2151: errno=-28 No space left > > A couple of kernel recompiles later, the second attempt on the SSD > generated similar: > kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c > (crc32c-intel) checksum algorithm > kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled > kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr > 0, rd 0, flush 0, corrupt 845, gen 0 > kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr > 41089, rd 1556, flush 0, corrupt 0, gen 0 > kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr > 3, rd 7, flush 0, corrupt 0, gen 0 > kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr > 41, rd 0, flush 0, corrupt 0, gen 0 > kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped > kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree > kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) > kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at > fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > [btrfs] > kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth > xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic > ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg > snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core > intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm > irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid > dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd > msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs > ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid > amdgpu uas hid iommu_v2 > kernel: [ 1919.452839] Workqueue: events_unbound > btrfs_async_reclaim_metadata_space [btrfs] > kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] > kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] > kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] > kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: > kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info > DATA has 160778723328 free, is not full > kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info > total=71201958395904, used=71017442181120, pinned=23733952512, > reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info > METADATA has -147570688 free, is full > kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info > total=83530612736, used=82792185856, pinned=238059520, > reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 > kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info > SYSTEM has 33439744 free, is not full > kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info > total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > readonly=0 zone_unusable=0 > kernel: [ 1919.454070] BTRFS info (device sdc: state A): > global_block_rsv: size 536870912 reserved 147570688 > kernel: [ 1919.454074] BTRFS info (device sdc: state A): > trans_block_rsv: size 0 reserved 0 > kernel: [ 1919.454077] BTRFS info (device sdc: state A): > chunk_block_rsv: size 0 reserved 0 > kernel: [ 1919.454080] BTRFS info (device sdc: state A): > delayed_block_rsv: size 0 reserved 0 > kernel: [ 1919.454083] BTRFS info (device sdc: state A): > delayed_refs_rsv: size 254292787200 reserved 0 > kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in > do_free_extent_accounting:2847: errno=-28 No space left > kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly > kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to > run delayed ref for logical 102538713931776 num_bytes 245760 type 184 > action 2 ref_mod 1: -28 > kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in > btrfs_run_delayed_refs:2151: errno=-28 No space left > kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): > btrfs_uuid_scan_kthread failed -5 > kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in > __btrfs_free_extent:3077: errno=-28 No space left > kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to > run delayed ref for logical 102538732191744 num_bytes 245760 type 184 > action 2 ref_mod 1: -28 > kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in > btrfs_run_delayed_refs:2151: errno=-28 No space left > > > On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: >> >> >> >> On 2023/6/17 13:11, Stefan N wrote: >>> Hi Qu, >>> >>> I believe I've got this environment ready, with the 6.2.0 kernel as >>> before using the Ubuntu kernel, but can switch to vanilla if required. >>> >>> I've not done anything kernel modifications for a solid decade, so >>> would be keen for a bit of guidance. >> >> Sure no problem. >> >> Please fetch the kernel source tar ball (6.2.x) first, decompress, then >> apply the attached one-line patch by: >> >> $ tar czf linux*.tar.xz >> $ cd linux* >> $ patch -np1 -i <the patch file> >> >> Then use your running system kernel config if possible: >> >> $ cp /proc/config.gz . >> $ gunzip config.gz >> $ mv config .config >> $ make olddefconfig >> >> Then you can start your kernel compiling, and considering you're using >> your distro's default, it would include tons of drivers, thus would be >> very slow. (Replace the number to something more suitable to your >> system, using all CPU cores can be very hot) >> >> $ make -j12 >> >> Finally you need to install the modules/kernel. >> >> Unfortunately this is distro specific, but if you're using Ubuntu, it >> may be much easier: >> >> $ make bindeb-pkg >> >> Then install the generated dpkg I guess? I have never tried kernel >> building using deb/rpm, but only manual installation, which is also >> distro dependent in the initramfs generation part. >> >> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom >> # make modules_install >> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img >> >> >> The last step is to update your bootloader to add the new kernel, which >> is not only distro dependent but also bootloader dependent. >> >> In my case, I go with systemd-boot with manually crafted entries. >> But if you go Ubuntu I believe just installing the kernel dpkg would >> have everything handled? >> >> Finally you can try reboot into the newer kernel, and try device add >> (need to add 4 disks), then sync and see if things work as expected. >> >> Thanks, >> Qu >>> >>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, >>> but failing this will use 4x loop devices. >>> >>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>> In your particular case, since you're running RAID1C4 you need to add 4 >>>> devices in one transaction. >>>> >>>> I can easily craft a patch to avoid commit transaction, but still you'll >>>> need to add at least 4 disks, and then sync to see if things would work. >>>> >>>> Furthermore this means you need a liveCD with full kernel compiling >>>> environment. >>>> >>>> If you want to go this path, I can send you the patch when you've >>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-22 9:18 ` Qu Wenruo @ 2023-06-22 22:18 ` Stefan N 2023-06-23 0:57 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-22 22:18 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org Hi Qu, I got one new line this time, but it doesn't seem to match your commit ERROR: zoned: unable to stat /dev/loop/13 I tried it on the USB flash drives too and didn't get any extra line In context $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: error adding device '/dev/loop12': Input/output error ERROR: zoned: unable to stat /dev/loop/13 ERROR: checking status of /dev/loop/13: No such file or directory ERROR: error adding device '/dev/loop14': Read-only file system ERROR: error adding device '/dev/loop15': Read-only file system ERROR: Could not sync filesystem: Read-only file system $ $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: error adding device '/dev/sdl': Input/output error ERROR: error adding device '/dev/sdm': Read-only file system ERROR: error adding device '/dev/sdn': Read-only file system ERROR: error adding device '/dev/sdo': Read-only file system ERROR: Could not sync filesystem: Read-only file system $ On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2023/6/22 16:33, Stefan N wrote: > > Hi Qu, > > > > Many thanks for the detailed instructions and your patience. I got it > > working combined with > > https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system > > OS instead, tagged +btrfix > > $ uname -vr > > 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 > > > > However, I've not had luck with the commands suggested, and would > > appreciate any further ideas. > > > > Outputs follow below, with /mnt/data as the btrfs mount point that > > currently contains 8x disks sd[a-j] with an additional 4x 64gb USB > > flash drives being added sd[l-o] > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > > fi sync /mnt/data > > ERROR: error adding device '/dev/sdl': Input/output error > > ERROR: error adding device '/dev/sdm': Read-only file system > > ERROR: error adding device '/dev/sdn': Read-only file system > > ERROR: error adding device '/dev/sdo': Read-only file system > > ERROR: Could not sync filesystem: Read-only file system > > $ > > > > The same occurs if I try to add 4x 100mb loop devices (on a ssd so > > they're super quick to zero); > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 > > /mnt/data ; sudo btrfs fi sync /mnt/data > > ERROR: error adding device '/dev/loop16': Input/output error > > This is the interesting part, this means we're erroring out due to -EIO > (not -ENOSPC) during the first device add. > > And by somehow, after the first device add, we already got the trans abort. > > Would you please try the following branch? > > https://github.com/adam900710/linux/tree/dev_add_no_commit > > It has not only the patch to skip the commit, but also extra debug > output for the situation. > > Thanks, > Qu > > > ERROR: error adding device '/dev/loop17': Read-only file system > > ERROR: error adding device '/dev/loop18': Read-only file system > > ERROR: error adding device '/dev/loop19': Read-only file system > > ERROR: Could not sync filesystem: Read-only file system > > $ > > > > I confirmed before both these kernel builds that the replaced line was > > btrfs_end_transaction rather than btrfs_commit_transaction (anyone > > else following, I needed to remove the -n in the patch command > > earlier) > > $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* > > linux-6.2.0-dist/fs/btrfs/volumes.c: > > btrfs_sysfs_update_sprout_fsid(fs_devices); > > linux-6.2.0-dist/fs/btrfs/volumes.c- } > > linux-6.2.0-dist/fs/btrfs/volumes.c- > > linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); > > -- > > linux-6.2.0-v2/fs/btrfs/volumes.c: > > btrfs_sysfs_update_sprout_fsid(fs_devices); > > linux-6.2.0-v2/fs/btrfs/volumes.c- } > > linux-6.2.0-v2/fs/btrfs/volumes.c- > > linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > > -- > > linux-6.2.0-v3/fs/btrfs/volumes.c: > > btrfs_sysfs_update_sprout_fsid(fs_devices); > > linux-6.2.0-v3/fs/btrfs/volumes.c- } > > linux-6.2.0-v3/fs/btrfs/volumes.c- > > linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > > $ > > > > $ btrfs fi usage /mnt/data > > Overall: > > Device size: 87.31TiB > > Device allocated: 87.31TiB > > Device unallocated: 1.94GiB > > Device missing: 0.00B > > Device slack: 0.00B > > Used: 87.08TiB > > Free (estimated): 173.29GiB (min: 172.33GiB) > > Free (statfs, df): 171.84GiB > > Data ratio: 1.34 > > Metadata ratio: 4.00 > > Global reserve: 512.00MiB (used: 371.25MiB) > > Multiple profiles: no > > > > Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) > > /dev/sdc 10.90TiB > > /dev/sdf 10.90TiB > > /dev/sda 10.86TiB > > /dev/sdg 10.87TiB > > /dev/sdh 10.86TiB > > /dev/sdd 10.87TiB > > /dev/sde 10.88TiB > > /dev/sdb 10.88TiB > > > > Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) > > /dev/sdc 15.33GiB > > /dev/sdf 18.41GiB > > /dev/sda 49.63GiB > > /dev/sdg 49.50GiB > > /dev/sdh 51.52GiB > > /dev/sdd 48.70GiB > > /dev/sde 39.09GiB > > /dev/sdb 39.01GiB > > > > System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) > > /dev/sdc 1.00MiB > > /dev/sda 37.00MiB > > /dev/sdg 37.00MiB > > /dev/sdh 36.00MiB > > /dev/sdd 37.00MiB > > > > Unallocated: > > /dev/sdc 1.00MiB > > /dev/sdf 1.00MiB > > /dev/sda 1.27GiB > > /dev/sdg 1.00MiB > > /dev/sdh 1.00MiB > > /dev/sdd 687.00MiB > > /dev/sde 1.00MiB > > /dev/sdb 1.00MiB > > $ > > > > > > This first attempt generated the following syslog output: > > kernel: [ 868.435387] BTRFS info (device sde): using crc32c > > (crc32c-intel) checksum algorithm > > kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled > > kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr > > 0, rd 0, flush 0, corrupt 845, gen 0 > > kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr > > 41089, rd 1556, flush 0, corrupt 0, gen 0 > > kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr > > 3, rd 7, flush 0, corrupt 0, gen 0 > > kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr > > 41, rd 0, flush 0, corrupt 0, gen 0 > > kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > > kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > > kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > > kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > > kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) > > kernel: [ 1267.280553] BTRFS: error (device sde: state A) in > > do_free_extent_accounting:2847: errno=-28 No space left > > kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly > > kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to > > run delayed ref for logical 102255404044288 num_bytes 294912 type 184 > > action 2 ref_mod 1: -28 > > kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at > > fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > > [btrfs] > > kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > kernel: [ 1267.280695] BTRFS warning (device sde: state EA): > > btrfs_uuid_scan_kthread failed -5 > > kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth > > xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > > xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > > ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > > nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd > > snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm > > snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > > snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer > > wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath > > scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd > > auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables > > autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov > > async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 > > raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage > > kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti > > Tainted: G W O 6.2.0-23-generic #23+btrfix > > kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > > kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > > kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] > > kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > > kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > > kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > > kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > > kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > > kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] > > kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] > > kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > > kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: > > kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info > > DATA has 160777674752 free, is not full > > kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info > > total=71201958395904, used=71018191273984, pinned=22985908224, > > reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > > kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info > > METADATA has -124944384 free, is full > > kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info > > total=83530612736, used=82791497728, pinned=242745344, > > reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 > > kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info > > SYSTEM has 33439744 free, is not full > > kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info > > total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > > readonly=0 zone_unusable=0 > > kernel: [ 1267.282588] BTRFS info (device sde: state EA): > > global_block_rsv: size 536870912 reserved 124944384 > > kernel: [ 1267.282592] BTRFS info (device sde: state EA): > > trans_block_rsv: size 0 reserved 0 > > kernel: [ 1267.282595] BTRFS info (device sde: state EA): > > chunk_block_rsv: size 0 reserved 0 > > kernel: [ 1267.282599] BTRFS info (device sde: state EA): > > delayed_block_rsv: size 0 reserved 0 > > kernel: [ 1267.282602] BTRFS info (device sde: state EA): > > delayed_refs_rsv: size 251322957824 reserved 0 > > kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in > > do_free_extent_accounting:2847: errno=-28 No space left > > kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to > > run delayed ref for logical 102255401897984 num_bytes 126976 type 184 > > action 2 ref_mod 1: -28 > > kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > > > A couple of kernel recompiles later, the second attempt on the SSD > > generated similar: > > kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c > > (crc32c-intel) checksum algorithm > > kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled > > kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr > > 0, rd 0, flush 0, corrupt 845, gen 0 > > kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr > > 41089, rd 1556, flush 0, corrupt 0, gen 0 > > kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr > > 3, rd 7, flush 0, corrupt 0, gen 0 > > kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr > > 41, rd 0, flush 0, corrupt 0, gen 0 > > kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped > > kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree > > kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) > > kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at > > fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > > [btrfs] > > kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth > > xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > > xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > > ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > > nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic > > ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg > > snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core > > intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm > > irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid > > dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd > > msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs > > ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > > raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid > > amdgpu uas hid iommu_v2 > > kernel: [ 1919.452839] Workqueue: events_unbound > > btrfs_async_reclaim_metadata_space [btrfs] > > kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > > kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > > kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] > > kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > > kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > > kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > > kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] > > kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] > > kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: > > kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info > > DATA has 160778723328 free, is not full > > kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info > > total=71201958395904, used=71017442181120, pinned=23733952512, > > reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > > kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info > > METADATA has -147570688 free, is full > > kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info > > total=83530612736, used=82792185856, pinned=238059520, > > reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 > > kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info > > SYSTEM has 33439744 free, is not full > > kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info > > total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > > readonly=0 zone_unusable=0 > > kernel: [ 1919.454070] BTRFS info (device sdc: state A): > > global_block_rsv: size 536870912 reserved 147570688 > > kernel: [ 1919.454074] BTRFS info (device sdc: state A): > > trans_block_rsv: size 0 reserved 0 > > kernel: [ 1919.454077] BTRFS info (device sdc: state A): > > chunk_block_rsv: size 0 reserved 0 > > kernel: [ 1919.454080] BTRFS info (device sdc: state A): > > delayed_block_rsv: size 0 reserved 0 > > kernel: [ 1919.454083] BTRFS info (device sdc: state A): > > delayed_refs_rsv: size 254292787200 reserved 0 > > kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in > > do_free_extent_accounting:2847: errno=-28 No space left > > kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly > > kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to > > run delayed ref for logical 102538713931776 num_bytes 245760 type 184 > > action 2 ref_mod 1: -28 > > kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): > > btrfs_uuid_scan_kthread failed -5 > > kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in > > __btrfs_free_extent:3077: errno=-28 No space left > > kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to > > run delayed ref for logical 102538732191744 num_bytes 245760 type 184 > > action 2 ref_mod 1: -28 > > kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > > > > > On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: > >> > >> > >> > >> On 2023/6/17 13:11, Stefan N wrote: > >>> Hi Qu, > >>> > >>> I believe I've got this environment ready, with the 6.2.0 kernel as > >>> before using the Ubuntu kernel, but can switch to vanilla if required. > >>> > >>> I've not done anything kernel modifications for a solid decade, so > >>> would be keen for a bit of guidance. > >> > >> Sure no problem. > >> > >> Please fetch the kernel source tar ball (6.2.x) first, decompress, then > >> apply the attached one-line patch by: > >> > >> $ tar czf linux*.tar.xz > >> $ cd linux* > >> $ patch -np1 -i <the patch file> > >> > >> Then use your running system kernel config if possible: > >> > >> $ cp /proc/config.gz . > >> $ gunzip config.gz > >> $ mv config .config > >> $ make olddefconfig > >> > >> Then you can start your kernel compiling, and considering you're using > >> your distro's default, it would include tons of drivers, thus would be > >> very slow. (Replace the number to something more suitable to your > >> system, using all CPU cores can be very hot) > >> > >> $ make -j12 > >> > >> Finally you need to install the modules/kernel. > >> > >> Unfortunately this is distro specific, but if you're using Ubuntu, it > >> may be much easier: > >> > >> $ make bindeb-pkg > >> > >> Then install the generated dpkg I guess? I have never tried kernel > >> building using deb/rpm, but only manual installation, which is also > >> distro dependent in the initramfs generation part. > >> > >> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom > >> # make modules_install > >> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img > >> > >> > >> The last step is to update your bootloader to add the new kernel, which > >> is not only distro dependent but also bootloader dependent. > >> > >> In my case, I go with systemd-boot with manually crafted entries. > >> But if you go Ubuntu I believe just installing the kernel dpkg would > >> have everything handled? > >> > >> Finally you can try reboot into the newer kernel, and try device add > >> (need to add 4 disks), then sync and see if things work as expected. > >> > >> Thanks, > >> Qu > >>> > >>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, > >>> but failing this will use 4x loop devices. > >>> > >>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>> In your particular case, since you're running RAID1C4 you need to add 4 > >>>> devices in one transaction. > >>>> > >>>> I can easily craft a patch to avoid commit transaction, but still you'll > >>>> need to add at least 4 disks, and then sync to see if things would work. > >>>> > >>>> Furthermore this means you need a liveCD with full kernel compiling > >>>> environment. > >>>> > >>>> If you want to go this path, I can send you the patch when you've > >>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-22 22:18 ` Stefan N @ 2023-06-23 0:57 ` Qu Wenruo 2023-06-23 9:00 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-23 0:57 UTC (permalink / raw) To: Stefan N; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org On 2023/6/23 06:18, Stefan N wrote: > Hi Qu, > > I got one new line this time, but it doesn't seem to match your commit > ERROR: zoned: unable to stat /dev/loop/13 Please provide the dmesg of that attempt, as all the extra debug info is inside dmesg. With that info provided, we can determine what to do next. Thanks, Qu > > I tried it on the USB flash drives too and didn't get any extra line > > In context > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 > /mnt/data ; sudo btrfs fi sync /mnt/data > ERROR: error adding device '/dev/loop12': Input/output error > ERROR: zoned: unable to stat /dev/loop/13 > ERROR: checking status of /dev/loop/13: No such file or directory > ERROR: error adding device '/dev/loop14': Read-only file system > ERROR: error adding device '/dev/loop15': Read-only file system > ERROR: Could not sync filesystem: Read-only file system > $ > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > fi sync /mnt/data > ERROR: error adding device '/dev/sdl': Input/output error > ERROR: error adding device '/dev/sdm': Read-only file system > ERROR: error adding device '/dev/sdn': Read-only file system > ERROR: error adding device '/dev/sdo': Read-only file system > ERROR: Could not sync filesystem: Read-only file system > $ > > On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> >> On 2023/6/22 16:33, Stefan N wrote: >>> Hi Qu, >>> >>> Many thanks for the detailed instructions and your patience. I got it >>> working combined with >>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system >>> OS instead, tagged +btrfix >>> $ uname -vr >>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 >>> >>> However, I've not had luck with the commands suggested, and would >>> appreciate any further ideas. >>> >>> Outputs follow below, with /mnt/data as the btrfs mount point that >>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB >>> flash drives being added sd[l-o] >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>> fi sync /mnt/data >>> ERROR: error adding device '/dev/sdl': Input/output error >>> ERROR: error adding device '/dev/sdm': Read-only file system >>> ERROR: error adding device '/dev/sdn': Read-only file system >>> ERROR: error adding device '/dev/sdo': Read-only file system >>> ERROR: Could not sync filesystem: Read-only file system >>> $ >>> >>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so >>> they're super quick to zero); >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 >>> /mnt/data ; sudo btrfs fi sync /mnt/data >>> ERROR: error adding device '/dev/loop16': Input/output error >> >> This is the interesting part, this means we're erroring out due to -EIO >> (not -ENOSPC) during the first device add. >> >> And by somehow, after the first device add, we already got the trans abort. >> >> Would you please try the following branch? >> >> https://github.com/adam900710/linux/tree/dev_add_no_commit >> >> It has not only the patch to skip the commit, but also extra debug >> output for the situation. >> >> Thanks, >> Qu >> >>> ERROR: error adding device '/dev/loop17': Read-only file system >>> ERROR: error adding device '/dev/loop18': Read-only file system >>> ERROR: error adding device '/dev/loop19': Read-only file system >>> ERROR: Could not sync filesystem: Read-only file system >>> $ >>> >>> I confirmed before both these kernel builds that the replaced line was >>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone >>> else following, I needed to remove the -n in the patch command >>> earlier) >>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* >>> linux-6.2.0-dist/fs/btrfs/volumes.c: >>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>> linux-6.2.0-dist/fs/btrfs/volumes.c- } >>> linux-6.2.0-dist/fs/btrfs/volumes.c- >>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); >>> -- >>> linux-6.2.0-v2/fs/btrfs/volumes.c: >>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>> linux-6.2.0-v2/fs/btrfs/volumes.c- } >>> linux-6.2.0-v2/fs/btrfs/volumes.c- >>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>> -- >>> linux-6.2.0-v3/fs/btrfs/volumes.c: >>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>> linux-6.2.0-v3/fs/btrfs/volumes.c- } >>> linux-6.2.0-v3/fs/btrfs/volumes.c- >>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>> $ >>> >>> $ btrfs fi usage /mnt/data >>> Overall: >>> Device size: 87.31TiB >>> Device allocated: 87.31TiB >>> Device unallocated: 1.94GiB >>> Device missing: 0.00B >>> Device slack: 0.00B >>> Used: 87.08TiB >>> Free (estimated): 173.29GiB (min: 172.33GiB) >>> Free (statfs, df): 171.84GiB >>> Data ratio: 1.34 >>> Metadata ratio: 4.00 >>> Global reserve: 512.00MiB (used: 371.25MiB) >>> Multiple profiles: no >>> >>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) >>> /dev/sdc 10.90TiB >>> /dev/sdf 10.90TiB >>> /dev/sda 10.86TiB >>> /dev/sdg 10.87TiB >>> /dev/sdh 10.86TiB >>> /dev/sdd 10.87TiB >>> /dev/sde 10.88TiB >>> /dev/sdb 10.88TiB >>> >>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) >>> /dev/sdc 15.33GiB >>> /dev/sdf 18.41GiB >>> /dev/sda 49.63GiB >>> /dev/sdg 49.50GiB >>> /dev/sdh 51.52GiB >>> /dev/sdd 48.70GiB >>> /dev/sde 39.09GiB >>> /dev/sdb 39.01GiB >>> >>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) >>> /dev/sdc 1.00MiB >>> /dev/sda 37.00MiB >>> /dev/sdg 37.00MiB >>> /dev/sdh 36.00MiB >>> /dev/sdd 37.00MiB >>> >>> Unallocated: >>> /dev/sdc 1.00MiB >>> /dev/sdf 1.00MiB >>> /dev/sda 1.27GiB >>> /dev/sdg 1.00MiB >>> /dev/sdh 1.00MiB >>> /dev/sdd 687.00MiB >>> /dev/sde 1.00MiB >>> /dev/sdb 1.00MiB >>> $ >>> >>> >>> This first attempt generated the following syslog output: >>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c >>> (crc32c-intel) checksum algorithm >>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled >>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr >>> 0, rd 0, flush 0, corrupt 845, gen 0 >>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr >>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr >>> 3, rd 7, flush 0, corrupt 0, gen 0 >>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr >>> 41, rd 0, flush 0, corrupt 0, gen 0 >>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) >>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in >>> do_free_extent_accounting:2847: errno=-28 No space left >>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly >>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to >>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 >>> action 2 ref_mod 1: -28 >>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at >>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>> [btrfs] >>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in >>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): >>> btrfs_uuid_scan_kthread failed -5 >>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth >>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd >>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm >>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi >>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer >>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath >>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd >>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables >>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov >>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 >>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage >>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti >>> Tainted: G W O 6.2.0-23-generic #23+btrfix >>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] >>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] >>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] >>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] >>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] >>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] >>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: >>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info >>> DATA has 160777674752 free, is not full >>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info >>> total=71201958395904, used=71018191273984, pinned=22985908224, >>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info >>> METADATA has -124944384 free, is full >>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info >>> total=83530612736, used=82791497728, pinned=242745344, >>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 >>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info >>> SYSTEM has 33439744 free, is not full >>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info >>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>> readonly=0 zone_unusable=0 >>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): >>> global_block_rsv: size 536870912 reserved 124944384 >>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): >>> trans_block_rsv: size 0 reserved 0 >>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): >>> chunk_block_rsv: size 0 reserved 0 >>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): >>> delayed_block_rsv: size 0 reserved 0 >>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): >>> delayed_refs_rsv: size 251322957824 reserved 0 >>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in >>> do_free_extent_accounting:2847: errno=-28 No space left >>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to >>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 >>> action 2 ref_mod 1: -28 >>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in >>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>> >>> A couple of kernel recompiles later, the second attempt on the SSD >>> generated similar: >>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c >>> (crc32c-intel) checksum algorithm >>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled >>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr >>> 0, rd 0, flush 0, corrupt 845, gen 0 >>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr >>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr >>> 3, rd 7, flush 0, corrupt 0, gen 0 >>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr >>> 41, rd 0, flush 0, corrupt 0, gen 0 >>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped >>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree >>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) >>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at >>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>> [btrfs] >>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth >>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic >>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg >>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core >>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm >>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid >>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd >>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs >>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 >>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor >>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid >>> amdgpu uas hid iommu_v2 >>> kernel: [ 1919.452839] Workqueue: events_unbound >>> btrfs_async_reclaim_metadata_space [btrfs] >>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] >>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] >>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] >>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: >>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info >>> DATA has 160778723328 free, is not full >>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info >>> total=71201958395904, used=71017442181120, pinned=23733952512, >>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info >>> METADATA has -147570688 free, is full >>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info >>> total=83530612736, used=82792185856, pinned=238059520, >>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 >>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info >>> SYSTEM has 33439744 free, is not full >>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info >>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>> readonly=0 zone_unusable=0 >>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): >>> global_block_rsv: size 536870912 reserved 147570688 >>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): >>> trans_block_rsv: size 0 reserved 0 >>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): >>> chunk_block_rsv: size 0 reserved 0 >>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): >>> delayed_block_rsv: size 0 reserved 0 >>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): >>> delayed_refs_rsv: size 254292787200 reserved 0 >>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in >>> do_free_extent_accounting:2847: errno=-28 No space left >>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly >>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to >>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 >>> action 2 ref_mod 1: -28 >>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in >>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): >>> btrfs_uuid_scan_kthread failed -5 >>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in >>> __btrfs_free_extent:3077: errno=-28 No space left >>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to >>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 >>> action 2 ref_mod 1: -28 >>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in >>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>> >>> >>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: >>>> >>>> >>>> >>>> On 2023/6/17 13:11, Stefan N wrote: >>>>> Hi Qu, >>>>> >>>>> I believe I've got this environment ready, with the 6.2.0 kernel as >>>>> before using the Ubuntu kernel, but can switch to vanilla if required. >>>>> >>>>> I've not done anything kernel modifications for a solid decade, so >>>>> would be keen for a bit of guidance. >>>> >>>> Sure no problem. >>>> >>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then >>>> apply the attached one-line patch by: >>>> >>>> $ tar czf linux*.tar.xz >>>> $ cd linux* >>>> $ patch -np1 -i <the patch file> >>>> >>>> Then use your running system kernel config if possible: >>>> >>>> $ cp /proc/config.gz . >>>> $ gunzip config.gz >>>> $ mv config .config >>>> $ make olddefconfig >>>> >>>> Then you can start your kernel compiling, and considering you're using >>>> your distro's default, it would include tons of drivers, thus would be >>>> very slow. (Replace the number to something more suitable to your >>>> system, using all CPU cores can be very hot) >>>> >>>> $ make -j12 >>>> >>>> Finally you need to install the modules/kernel. >>>> >>>> Unfortunately this is distro specific, but if you're using Ubuntu, it >>>> may be much easier: >>>> >>>> $ make bindeb-pkg >>>> >>>> Then install the generated dpkg I guess? I have never tried kernel >>>> building using deb/rpm, but only manual installation, which is also >>>> distro dependent in the initramfs generation part. >>>> >>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom >>>> # make modules_install >>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img >>>> >>>> >>>> The last step is to update your bootloader to add the new kernel, which >>>> is not only distro dependent but also bootloader dependent. >>>> >>>> In my case, I go with systemd-boot with manually crafted entries. >>>> But if you go Ubuntu I believe just installing the kernel dpkg would >>>> have everything handled? >>>> >>>> Finally you can try reboot into the newer kernel, and try device add >>>> (need to add 4 disks), then sync and see if things work as expected. >>>> >>>> Thanks, >>>> Qu >>>>> >>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, >>>>> but failing this will use 4x loop devices. >>>>> >>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>> In your particular case, since you're running RAID1C4 you need to add 4 >>>>>> devices in one transaction. >>>>>> >>>>>> I can easily craft a patch to avoid commit transaction, but still you'll >>>>>> need to add at least 4 disks, and then sync to see if things would work. >>>>>> >>>>>> Furthermore this means you need a liveCD with full kernel compiling >>>>>> environment. >>>>>> >>>>>> If you want to go this path, I can send you the patch when you've >>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-23 0:57 ` Qu Wenruo @ 2023-06-23 9:00 ` Stefan N 2023-06-23 9:46 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-23 9:00 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org Apologies, I thought I included the log output too, though I can't see any additional output From a fresh run, still using the same kernel $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: error adding device '/dev/sdl': Input/output error ERROR: error adding device '/dev/sdm': Read-only file system ERROR: error adding device '/dev/sdn': Read-only file system ERROR: error adding device '/dev/sdo': Read-only file system ERROR: Could not sync filesystem: Read-only file system $ Output from kern.log, syslog or dmesg -k kernel: [ 384.993736] BTRFS info (device sde): using crc32c (crc32c-intel) checksum algorithm kernel: [ 384.993749] BTRFS info (device sde): disk space caching is enabled kernel: [ 390.851902] BTRFS info (device sde): bdev /dev/sdf errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 kernel: [ 390.851919] BTRFS info (device sde): bdev /dev/sdc errs: wr 41089, rd 1556, flush 0, corrupt 0, gen 0 kernel: [ 390.851931] BTRFS info (device sde): bdev /dev/sdi errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 kernel: [ 390.851939] BTRFS info (device sde): bdev /dev/sdg errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 kernel: [ 598.443937] BTRFS info (device sde): balance: resume skipped kernel: [ 598.443951] BTRFS info (device sde): checking UUID tree kernel: [ 781.098478] ------------[ cut here ]------------ kernel: [ 781.098484] BTRFS: Transaction aborted (error -28) kernel: [ 781.098555] WARNING: CPU: 3 PID: 4168 at fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 781.098690] Modules linked in: ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep kvm irqbypass snd_pcm snd_timer raplwmi_bmof snd k10temp ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd msr auth_rpcgss nfs_acl lockd efi_pstore grace sunrpc dmi_sysfs ip_tables x_tables autofs4btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid amdgpu iommu_v2 drm_buddy uasgpu_sched drm_ttm_helper usb_storage ttm drm_display_helper cec rc_core drm_kms_helper mpt3sas crct10dif_pclmul syscopyarea crc32_pclmul polyval_clmulni polyval_generic sysfillrect ghash_clmulni_intel sysimgblt kernel: [ 781.098819] sha512_ssse3 aesni_intel nvme crypto_simd cryptd drm raid_class scsi_transport_sas i2c_piix4 igb nvme_core nvme_common ahci libahci qlcnic xhci_pci dca xhci_pci_renesas i2c_algo_bit video wmi kernel: [ 781.098853] CPU: 3 PID: 4168 Comm: kworker/u64:2 Tainted: G W O 6.2.0-23-generic #23+btrdebug kernel: [ 781.098860] Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P3.70 02/23/2022 kernel: [ 781.098864] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] kernel: [ 781.099027] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 781.099151] Code: e0 0f 0b eb b8 44 89 e6 48 c7 c7 38 d9 91 c1 e8 2c 35 4d e0 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 38 d9 91 c1 e8 16 35 4d e0 <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 kernel: [ 781.099156] RSP: 0018:ffffa27fc59cbb58 EFLAGS: 00010246 kernel: [ 781.099162] RAX: 0000000000000000 RBX: ffff921a6c859e38 RCX: 0000000000000000 kernel: [ 781.099166] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 kernel: [ 781.099169] RBP: ffffa27fc59cbb80 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 781.099172] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffe4 kernel: [ 781.099175] R13: 00005cba077b2000 R14: 000000000002c000 R15: ffff921bdc8420e0 kernel: [ 781.099179] FS: 0000000000000000(0000) GS:ffff922120ac0000(0000) knlGS:0000000000000000 kernel: [ 781.099183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: [ 781.099187] CR2: 00007ffdf0d6ceba CR3: 000000014ae5a000 CR4: 00000000003506e0 kernel: [ 781.099191] Call Trace: kernel: [ 781.099194] <TASK> kernel: [ 781.099200] __btrfs_free_extent+0x6bc/0xf50 [btrfs] kernel: [ 781.099328] run_delayed_data_ref+0x8b/0x180 [btrfs] kernel: [ 781.099453] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] kernel: [ 781.099577] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] kernel: [ 781.099701] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] kernel: [ 781.099825] flush_space+0x23c/0x2c0 [btrfs] kernel: [ 781.099983] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] kernel: [ 781.100140] process_one_work+0x225/0x430 kernel: [ 781.100150] worker_thread+0x50/0x3e0 kernel: [ 781.100156] ? __pfx_worker_thread+0x10/0x10 kernel: [ 781.100162] kthread+0xe9/0x110 kernel: [ 781.100169] ? __pfx_kthread+0x10/0x10 kernel: [ 781.100176] ret_from_fork+0x2c/0x50 kernel: [ 781.100186] </TASK> kernel: [ 781.100188] ---[ end trace 0000000000000000 ]--- kernel: [ 781.100192] BTRFS info (device sde: state A): dumping space info: kernel: [ 781.100197] BTRFS info (device sde: state A): space_info DATA has 160770072576 free, is not full kernel: [ 781.100203] BTRFS info (device sde: state A): space_info total=71201958395904, used=71019661762560, pinned=21523021824, reserved=0, may_use=0, readonly=3538944 zone_unusable=0 kernel: [ 781.100211] BTRFS info (device sde: state A): space_info METADATA has -136544256 free, is full kernel: [ 781.100215] BTRFS info (device sde: state A): space_info total=83530612736, used=82787270656, pinned=254132224, reserved=489209856, may_use=136544256, readonly=0 zone_unusable=0 kernel: [ 781.100222] BTRFS info (device sde: state A): space_info SYSTEM has 33439744 free, is not full kernel: [ 781.100227] BTRFS info (device sde: state A): space_info total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 kernel: [ 781.100233] BTRFS info (device sde: state A): global_block_rsv: size 536870912 reserved 136544256 kernel: [ 781.100237] BTRFS info (device sde: state A): trans_block_rsv: size 0 reserved 0 kernel: [ 781.100241] BTRFS info (device sde: state A): chunk_block_rsv: size 0 reserved 0 kernel: [ 781.100244] BTRFS info (device sde: state A): delayed_block_rsv: size 0 reserved 0 kernel: [ 781.100247] BTRFS info (device sde: state A): delayed_refs_rsv: size 191623069696 reserved 0 kernel: [ 781.100251] BTRFS: error (device sde: state A) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 781.100292] BTRFS info (device sde: state EA): forced readonly kernel: [ 781.100296] BTRFS error (device sde: state EA): failed to run delayed ref for logical 101954059182080 num_bytes 180224 type 184 action 2 ref_mod 1: -28 kernel: [ 781.100351] BTRFS: error (device sde: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left kernel: [ 781.100423] BTRFS warning (device sde: state EA): btrfs_uuid_scan_kthread failed -5 kernel: [ 781.109776] BTRFS: error (device sde: state EA) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 781.109839] BTRFS error (device sde: state EA): failed to run delayed ref for logical 101954133950464 num_bytes 126976 type 184 action 2 ref_mod 1: -28 kernel: [ 781.109897] BTRFS: error (device sde: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left However, now I started digging into logs to check I hadn't missed where the errors were being logged, I've found this from roughly a week before I started having issues, which I had not previously noticed [ 1990.495861] BTRFS error (device sdh): failed to run delayed ref for logical 107988943355904 num_bytes 245760 type 184 action 2 ref_mod 1: -28 [ 1990.518282] BTRFS error (device sdh): failed to run delayed ref for logical 107989043494912 num_bytes 245760 type 184 action 2 ref_mod 1: -28 [ 620.104065] BTRFS error (device sdk): failed to run delayed ref for logical 123187655077888 num_bytes 176128 type 184 action 2 ref_mod 1: -28 [ 620.126209] BTRFS error (device sdk): failed to run delayed ref for logical 123190279929856 num_bytes 134217728 type 184 action 2 ref_mod 1: -28 [ 620.126241] BTRFS error (device sdk): failed to run delayed ref for logical 123189970468864 num_bytes 134217728 type 184 action 2 ref_mod 1: -28 [ 620.126271] BTRFS error (device sdk): failed to run delayed ref for logical 123190414409728 num_bytes 134217728 type 184 action 2 ref_mod 1: -28 [ 476.565308] BTRFS error (device sdh): failed to run delayed ref for logical 101906434228224 num_bytes 651264 type 184 action 2 ref_mod 1: -28 [ 476.565932] BTRFS error (device sdh): failed to run delayed ref for logical 101906434031616 num_bytes 180224 type 184 action 2 ref_mod 1: -28 [ 447.371754] BTRFS error (device sdh): failed to run delayed ref for logical 101946151927808 num_bytes 262144 type 184 action 2 ref_mod 1: -28 [ 447.372362] BTRFS error (device sdh): failed to run delayed ref for logical 101946083725312 num_bytes 245760 type 184 action 2 ref_mod 1: -28 [ 439.839007] BTRFS error (device sdj): failed to run delayed ref for logical 101923102179328 num_bytes 192512 type 184 action 2 ref_mod 1: -28 [ 439.839578] BTRFS error (device sdj): failed to run delayed ref for logical 101923401629696 num_bytes 245760 type 184 action 2 ref_mod 1: -28 [ 466.393884] BTRFS error (device sdh): failed to run delayed ref for logical 101981116137472 num_bytes 245760 type 184 action 2 ref_mod 1: -28 [ 466.394451] BTRFS error (device sdh): failed to run delayed ref for logical 101981122854912 num_bytes 1720320 type 184 action 2 ref_mod 1: -28 [ 431.541367] BTRFS error (device sdh): failed to run delayed ref for logical 101876426952704 num_bytes 126976 type 184 action 2 ref_mod 1: -28 [ 431.542010] BTRFS error (device sdh): failed to run delayed ref for logical 101876427780096 num_bytes 126976 type 184 action 2 ref_mod 1: -28 [ 597.487948] BTRFS error (device sdj): failed to run delayed ref for logical 108127459409920 num_bytes 196608 type 184 action 2 ref_mod 1: -28 [ 597.488539] BTRFS error (device sdj): failed to run delayed ref for logical 108124677865472 num_bytes 126976 type 184 action 2 ref_mod 1: -28 [ 534.717509] BTRFS error (device sdh): failed to run delayed ref for logical 101958618710016 num_bytes 1597440 type 184 action 2 ref_mod 1: -28 [ 534.718494] BTRFS error (device sdh): failed to run delayed ref for logical 101958756335616 num_bytes 368640 type 184 action 2 ref_mod 1: -28 [ 508.089394] BTRFS error (device sdk): failed to run delayed ref for logical 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: -28 [ 508.090007] BTRFS error (device sdk): failed to run delayed ref for logical 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: -28 [ 1632.112084] BTRFS error (device sdh): failed to run delayed ref for logical 102203759886336 num_bytes 229376 type 184 action 2 ref_mod 1: -28 [ 1632.112885] BTRFS error (device sdh): failed to run delayed ref for logical 102203764379648 num_bytes 126976 type 184 action 2 ref_mod 1: -28 and today, when leaving the disks mounted read-only for a while, I found many occurances similar to: BTRFS error (device sdc: state EA): level verify failed on logical 201329754554368 mirror 1 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201329754554368 mirror 2 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201329754554368 mirror 3 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201329754554368 mirror 4 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201329754554368 mirror 1 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201329754554368 mirror 2 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201329754554368 mirror 3 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201350830227456 mirror 4 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201350830227456 mirror 1 wanted 2 found 0 BTRFS error (device sdc: state EA): level verify failed on logical 201350830227456 mirror 2 wanted 2 found 0 On Fri, 23 Jun 2023 at 10:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2023/6/23 06:18, Stefan N wrote: > > Hi Qu, > > > > I got one new line this time, but it doesn't seem to match your commit > > ERROR: zoned: unable to stat /dev/loop/13 > > Please provide the dmesg of that attempt, as all the extra debug info is > inside dmesg. > > With that info provided, we can determine what to do next. > > Thanks, > Qu > > > > > I tried it on the USB flash drives too and didn't get any extra line > > > > In context > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 > > /mnt/data ; sudo btrfs fi sync /mnt/data > > ERROR: error adding device '/dev/loop12': Input/output error > > ERROR: zoned: unable to stat /dev/loop/13 > > ERROR: checking status of /dev/loop/13: No such file or directory > > ERROR: error adding device '/dev/loop14': Read-only file system > > ERROR: error adding device '/dev/loop15': Read-only file system > > ERROR: Could not sync filesystem: Read-only file system > > $ > > > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > > fi sync /mnt/data > > ERROR: error adding device '/dev/sdl': Input/output error > > ERROR: error adding device '/dev/sdm': Read-only file system > > ERROR: error adding device '/dev/sdn': Read-only file system > > ERROR: error adding device '/dev/sdo': Read-only file system > > ERROR: Could not sync filesystem: Read-only file system > > $ > > > > On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >> > >> > >> > >> On 2023/6/22 16:33, Stefan N wrote: > >>> Hi Qu, > >>> > >>> Many thanks for the detailed instructions and your patience. I got it > >>> working combined with > >>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system > >>> OS instead, tagged +btrfix > >>> $ uname -vr > >>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 > >>> > >>> However, I've not had luck with the commands suggested, and would > >>> appreciate any further ideas. > >>> > >>> Outputs follow below, with /mnt/data as the btrfs mount point that > >>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB > >>> flash drives being added sd[l-o] > >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > >>> fi sync /mnt/data > >>> ERROR: error adding device '/dev/sdl': Input/output error > >>> ERROR: error adding device '/dev/sdm': Read-only file system > >>> ERROR: error adding device '/dev/sdn': Read-only file system > >>> ERROR: error adding device '/dev/sdo': Read-only file system > >>> ERROR: Could not sync filesystem: Read-only file system > >>> $ > >>> > >>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so > >>> they're super quick to zero); > >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 > >>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>> ERROR: error adding device '/dev/loop16': Input/output error > >> > >> This is the interesting part, this means we're erroring out due to -EIO > >> (not -ENOSPC) during the first device add. > >> > >> And by somehow, after the first device add, we already got the trans abort. > >> > >> Would you please try the following branch? > >> > >> https://github.com/adam900710/linux/tree/dev_add_no_commit > >> > >> It has not only the patch to skip the commit, but also extra debug > >> output for the situation. > >> > >> Thanks, > >> Qu > >> > >>> ERROR: error adding device '/dev/loop17': Read-only file system > >>> ERROR: error adding device '/dev/loop18': Read-only file system > >>> ERROR: error adding device '/dev/loop19': Read-only file system > >>> ERROR: Could not sync filesystem: Read-only file system > >>> $ > >>> > >>> I confirmed before both these kernel builds that the replaced line was > >>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone > >>> else following, I needed to remove the -n in the patch command > >>> earlier) > >>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* > >>> linux-6.2.0-dist/fs/btrfs/volumes.c: > >>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>> linux-6.2.0-dist/fs/btrfs/volumes.c- } > >>> linux-6.2.0-dist/fs/btrfs/volumes.c- > >>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); > >>> -- > >>> linux-6.2.0-v2/fs/btrfs/volumes.c: > >>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>> linux-6.2.0-v2/fs/btrfs/volumes.c- } > >>> linux-6.2.0-v2/fs/btrfs/volumes.c- > >>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > >>> -- > >>> linux-6.2.0-v3/fs/btrfs/volumes.c: > >>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>> linux-6.2.0-v3/fs/btrfs/volumes.c- } > >>> linux-6.2.0-v3/fs/btrfs/volumes.c- > >>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > >>> $ > >>> > >>> $ btrfs fi usage /mnt/data > >>> Overall: > >>> Device size: 87.31TiB > >>> Device allocated: 87.31TiB > >>> Device unallocated: 1.94GiB > >>> Device missing: 0.00B > >>> Device slack: 0.00B > >>> Used: 87.08TiB > >>> Free (estimated): 173.29GiB (min: 172.33GiB) > >>> Free (statfs, df): 171.84GiB > >>> Data ratio: 1.34 > >>> Metadata ratio: 4.00 > >>> Global reserve: 512.00MiB (used: 371.25MiB) > >>> Multiple profiles: no > >>> > >>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) > >>> /dev/sdc 10.90TiB > >>> /dev/sdf 10.90TiB > >>> /dev/sda 10.86TiB > >>> /dev/sdg 10.87TiB > >>> /dev/sdh 10.86TiB > >>> /dev/sdd 10.87TiB > >>> /dev/sde 10.88TiB > >>> /dev/sdb 10.88TiB > >>> > >>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) > >>> /dev/sdc 15.33GiB > >>> /dev/sdf 18.41GiB > >>> /dev/sda 49.63GiB > >>> /dev/sdg 49.50GiB > >>> /dev/sdh 51.52GiB > >>> /dev/sdd 48.70GiB > >>> /dev/sde 39.09GiB > >>> /dev/sdb 39.01GiB > >>> > >>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) > >>> /dev/sdc 1.00MiB > >>> /dev/sda 37.00MiB > >>> /dev/sdg 37.00MiB > >>> /dev/sdh 36.00MiB > >>> /dev/sdd 37.00MiB > >>> > >>> Unallocated: > >>> /dev/sdc 1.00MiB > >>> /dev/sdf 1.00MiB > >>> /dev/sda 1.27GiB > >>> /dev/sdg 1.00MiB > >>> /dev/sdh 1.00MiB > >>> /dev/sdd 687.00MiB > >>> /dev/sde 1.00MiB > >>> /dev/sdb 1.00MiB > >>> $ > >>> > >>> > >>> This first attempt generated the following syslog output: > >>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c > >>> (crc32c-intel) checksum algorithm > >>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled > >>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr > >>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr > >>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr > >>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr > >>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > >>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > >>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > >>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > >>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) > >>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in > >>> do_free_extent_accounting:2847: errno=-28 No space left > >>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly > >>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to > >>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 > >>> action 2 ref_mod 1: -28 > >>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at > >>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > >>> [btrfs] > >>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in > >>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): > >>> btrfs_uuid_scan_kthread failed -5 > >>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth > >>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > >>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > >>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > >>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > >>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd > >>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm > >>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > >>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer > >>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath > >>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd > >>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables > >>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov > >>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 > >>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage > >>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti > >>> Tainted: G W O 6.2.0-23-generic #23+btrfix > >>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > >>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] > >>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > >>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > >>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] > >>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] > >>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > >>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: > >>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info > >>> DATA has 160777674752 free, is not full > >>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info > >>> total=71201958395904, used=71018191273984, pinned=22985908224, > >>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info > >>> METADATA has -124944384 free, is full > >>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info > >>> total=83530612736, used=82791497728, pinned=242745344, > >>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 > >>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info > >>> SYSTEM has 33439744 free, is not full > >>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info > >>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > >>> readonly=0 zone_unusable=0 > >>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): > >>> global_block_rsv: size 536870912 reserved 124944384 > >>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): > >>> trans_block_rsv: size 0 reserved 0 > >>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): > >>> chunk_block_rsv: size 0 reserved 0 > >>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): > >>> delayed_block_rsv: size 0 reserved 0 > >>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): > >>> delayed_refs_rsv: size 251322957824 reserved 0 > >>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in > >>> do_free_extent_accounting:2847: errno=-28 No space left > >>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to > >>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 > >>> action 2 ref_mod 1: -28 > >>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in > >>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>> > >>> A couple of kernel recompiles later, the second attempt on the SSD > >>> generated similar: > >>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c > >>> (crc32c-intel) checksum algorithm > >>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled > >>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr > >>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr > >>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr > >>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr > >>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped > >>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree > >>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) > >>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at > >>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > >>> [btrfs] > >>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth > >>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > >>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > >>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > >>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > >>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic > >>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg > >>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core > >>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm > >>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid > >>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd > >>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs > >>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > >>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor > >>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid > >>> amdgpu uas hid iommu_v2 > >>> kernel: [ 1919.452839] Workqueue: events_unbound > >>> btrfs_async_reclaim_metadata_space [btrfs] > >>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > >>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] > >>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] > >>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] > >>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: > >>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info > >>> DATA has 160778723328 free, is not full > >>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info > >>> total=71201958395904, used=71017442181120, pinned=23733952512, > >>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info > >>> METADATA has -147570688 free, is full > >>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info > >>> total=83530612736, used=82792185856, pinned=238059520, > >>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 > >>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info > >>> SYSTEM has 33439744 free, is not full > >>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info > >>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > >>> readonly=0 zone_unusable=0 > >>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): > >>> global_block_rsv: size 536870912 reserved 147570688 > >>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): > >>> trans_block_rsv: size 0 reserved 0 > >>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): > >>> chunk_block_rsv: size 0 reserved 0 > >>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): > >>> delayed_block_rsv: size 0 reserved 0 > >>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): > >>> delayed_refs_rsv: size 254292787200 reserved 0 > >>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in > >>> do_free_extent_accounting:2847: errno=-28 No space left > >>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly > >>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to > >>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 > >>> action 2 ref_mod 1: -28 > >>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in > >>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): > >>> btrfs_uuid_scan_kthread failed -5 > >>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in > >>> __btrfs_free_extent:3077: errno=-28 No space left > >>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to > >>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 > >>> action 2 ref_mod 1: -28 > >>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in > >>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>> > >>> > >>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2023/6/17 13:11, Stefan N wrote: > >>>>> Hi Qu, > >>>>> > >>>>> I believe I've got this environment ready, with the 6.2.0 kernel as > >>>>> before using the Ubuntu kernel, but can switch to vanilla if required. > >>>>> > >>>>> I've not done anything kernel modifications for a solid decade, so > >>>>> would be keen for a bit of guidance. > >>>> > >>>> Sure no problem. > >>>> > >>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then > >>>> apply the attached one-line patch by: > >>>> > >>>> $ tar czf linux*.tar.xz > >>>> $ cd linux* > >>>> $ patch -np1 -i <the patch file> > >>>> > >>>> Then use your running system kernel config if possible: > >>>> > >>>> $ cp /proc/config.gz . > >>>> $ gunzip config.gz > >>>> $ mv config .config > >>>> $ make olddefconfig > >>>> > >>>> Then you can start your kernel compiling, and considering you're using > >>>> your distro's default, it would include tons of drivers, thus would be > >>>> very slow. (Replace the number to something more suitable to your > >>>> system, using all CPU cores can be very hot) > >>>> > >>>> $ make -j12 > >>>> > >>>> Finally you need to install the modules/kernel. > >>>> > >>>> Unfortunately this is distro specific, but if you're using Ubuntu, it > >>>> may be much easier: > >>>> > >>>> $ make bindeb-pkg > >>>> > >>>> Then install the generated dpkg I guess? I have never tried kernel > >>>> building using deb/rpm, but only manual installation, which is also > >>>> distro dependent in the initramfs generation part. > >>>> > >>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom > >>>> # make modules_install > >>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img > >>>> > >>>> > >>>> The last step is to update your bootloader to add the new kernel, which > >>>> is not only distro dependent but also bootloader dependent. > >>>> > >>>> In my case, I go with systemd-boot with manually crafted entries. > >>>> But if you go Ubuntu I believe just installing the kernel dpkg would > >>>> have everything handled? > >>>> > >>>> Finally you can try reboot into the newer kernel, and try device add > >>>> (need to add 4 disks), then sync and see if things work as expected. > >>>> > >>>> Thanks, > >>>> Qu > >>>>> > >>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, > >>>>> but failing this will use 4x loop devices. > >>>>> > >>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>>>> In your particular case, since you're running RAID1C4 you need to add 4 > >>>>>> devices in one transaction. > >>>>>> > >>>>>> I can easily craft a patch to avoid commit transaction, but still you'll > >>>>>> need to add at least 4 disks, and then sync to see if things would work. > >>>>>> > >>>>>> Furthermore this means you need a liveCD with full kernel compiling > >>>>>> environment. > >>>>>> > >>>>>> If you want to go this path, I can send you the patch when you've > >>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-23 9:00 ` Stefan N @ 2023-06-23 9:46 ` Qu Wenruo 2023-06-24 15:29 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-23 9:46 UTC (permalink / raw) To: Stefan N, Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org On 2023/6/23 17:00, Stefan N wrote: > Apologies, I thought I included the log output too, though I can't see > any additional output > > From a fresh run, still using the same kernel > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > fi sync /mnt/data > ERROR: error adding device '/dev/sdl': Input/output error > ERROR: error adding device '/dev/sdm': Read-only file system > ERROR: error adding device '/dev/sdn': Read-only file system > ERROR: error adding device '/dev/sdo': Read-only file system > ERROR: Could not sync filesystem: Read-only file system > $ > > Output from kern.log, syslog or dmesg -k > [...] None of the newly added debug lines triggered, so there is something else causing the problem. And furthermore the backtrace is not that helpful, it only shows it's some async metadata reclaim kthread causing the problem. Although I guess the async metadata reclaim is triggered by the btrfs_start_transaction() call when adding a device. So I updated my github branch to go btrfs_join_transaction() which would not flush any metadata, thus avoid the problem. Would you please give it a try again? > > However, now I started digging into logs to check I hadn't missed > where the errors were being logged, I've found this from roughly a > week before I started having issues, which I had not previously > noticed You don't need to bother most error messages after the fs flipped RO. As it's known to have some false alerts. Thanks, Qu > [ 1990.495861] BTRFS error (device sdh): failed to run delayed ref for > logical 107988943355904 num_bytes 245760 type 184 action 2 ref_mod 1: > -28 > [ 1990.518282] BTRFS error (device sdh): failed to run delayed ref for > logical 107989043494912 num_bytes 245760 type 184 action 2 ref_mod 1: > -28 > [ 620.104065] BTRFS error (device sdk): failed to run delayed ref for > logical 123187655077888 num_bytes 176128 type 184 action 2 ref_mod 1: > -28 > [ 620.126209] BTRFS error (device sdk): failed to run delayed ref for > logical 123190279929856 num_bytes 134217728 type 184 action 2 ref_mod > 1: -28 > [ 620.126241] BTRFS error (device sdk): failed to run delayed ref for > logical 123189970468864 num_bytes 134217728 type 184 action 2 ref_mod > 1: -28 > [ 620.126271] BTRFS error (device sdk): failed to run delayed ref for > logical 123190414409728 num_bytes 134217728 type 184 action 2 ref_mod > 1: -28 > [ 476.565308] BTRFS error (device sdh): failed to run delayed ref for > logical 101906434228224 num_bytes 651264 type 184 action 2 ref_mod 1: > -28 > [ 476.565932] BTRFS error (device sdh): failed to run delayed ref for > logical 101906434031616 num_bytes 180224 type 184 action 2 ref_mod 1: > -28 > [ 447.371754] BTRFS error (device sdh): failed to run delayed ref for > logical 101946151927808 num_bytes 262144 type 184 action 2 ref_mod 1: > -28 > [ 447.372362] BTRFS error (device sdh): failed to run delayed ref for > logical 101946083725312 num_bytes 245760 type 184 action 2 ref_mod 1: > -28 > [ 439.839007] BTRFS error (device sdj): failed to run delayed ref for > logical 101923102179328 num_bytes 192512 type 184 action 2 ref_mod 1: > -28 > [ 439.839578] BTRFS error (device sdj): failed to run delayed ref for > logical 101923401629696 num_bytes 245760 type 184 action 2 ref_mod 1: > -28 > [ 466.393884] BTRFS error (device sdh): failed to run delayed ref for > logical 101981116137472 num_bytes 245760 type 184 action 2 ref_mod 1: > -28 > [ 466.394451] BTRFS error (device sdh): failed to run delayed ref for > logical 101981122854912 num_bytes 1720320 type 184 action 2 ref_mod 1: > -28 > [ 431.541367] BTRFS error (device sdh): failed to run delayed ref for > logical 101876426952704 num_bytes 126976 type 184 action 2 ref_mod 1: > -28 > [ 431.542010] BTRFS error (device sdh): failed to run delayed ref for > logical 101876427780096 num_bytes 126976 type 184 action 2 ref_mod 1: > -28 > [ 597.487948] BTRFS error (device sdj): failed to run delayed ref for > logical 108127459409920 num_bytes 196608 type 184 action 2 ref_mod 1: > -28 > [ 597.488539] BTRFS error (device sdj): failed to run delayed ref for > logical 108124677865472 num_bytes 126976 type 184 action 2 ref_mod 1: > -28 > [ 534.717509] BTRFS error (device sdh): failed to run delayed ref for > logical 101958618710016 num_bytes 1597440 type 184 action 2 ref_mod 1: > -28 > [ 534.718494] BTRFS error (device sdh): failed to run delayed ref for > logical 101958756335616 num_bytes 368640 type 184 action 2 ref_mod 1: > -28 > [ 508.089394] BTRFS error (device sdk): failed to run delayed ref for > logical 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: > -28 > [ 508.090007] BTRFS error (device sdk): failed to run delayed ref for > logical 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: > -28 > [ 1632.112084] BTRFS error (device sdh): failed to run delayed ref for > logical 102203759886336 num_bytes 229376 type 184 action 2 ref_mod 1: > -28 > [ 1632.112885] BTRFS error (device sdh): failed to run delayed ref for > logical 102203764379648 num_bytes 126976 type 184 action 2 ref_mod 1: > -28 > > and today, when leaving the disks mounted read-only for a while, I > found many occurances similar to: > BTRFS error (device sdc: state EA): level verify failed on logical > 201329754554368 mirror 1 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201329754554368 mirror 2 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201329754554368 mirror 3 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201329754554368 mirror 4 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201329754554368 mirror 1 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201329754554368 mirror 2 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201329754554368 mirror 3 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201350830227456 mirror 4 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201350830227456 mirror 1 wanted 2 found 0 > BTRFS error (device sdc: state EA): level verify failed on logical > 201350830227456 mirror 2 wanted 2 found 0 > > On Fri, 23 Jun 2023 at 10:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> >> On 2023/6/23 06:18, Stefan N wrote: >>> Hi Qu, >>> >>> I got one new line this time, but it doesn't seem to match your commit >>> ERROR: zoned: unable to stat /dev/loop/13 >> >> Please provide the dmesg of that attempt, as all the extra debug info is >> inside dmesg. >> >> With that info provided, we can determine what to do next. >> >> Thanks, >> Qu >> >>> >>> I tried it on the USB flash drives too and didn't get any extra line >>> >>> In context >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>> dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 >>> /mnt/data ; sudo btrfs fi sync /mnt/data >>> ERROR: error adding device '/dev/loop12': Input/output error >>> ERROR: zoned: unable to stat /dev/loop/13 >>> ERROR: checking status of /dev/loop/13: No such file or directory >>> ERROR: error adding device '/dev/loop14': Read-only file system >>> ERROR: error adding device '/dev/loop15': Read-only file system >>> ERROR: Could not sync filesystem: Read-only file system >>> $ >>> >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>> fi sync /mnt/data >>> ERROR: error adding device '/dev/sdl': Input/output error >>> ERROR: error adding device '/dev/sdm': Read-only file system >>> ERROR: error adding device '/dev/sdn': Read-only file system >>> ERROR: error adding device '/dev/sdo': Read-only file system >>> ERROR: Could not sync filesystem: Read-only file system >>> $ >>> >>> On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>> >>>> >>>> >>>> On 2023/6/22 16:33, Stefan N wrote: >>>>> Hi Qu, >>>>> >>>>> Many thanks for the detailed instructions and your patience. I got it >>>>> working combined with >>>>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system >>>>> OS instead, tagged +btrfix >>>>> $ uname -vr >>>>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 >>>>> >>>>> However, I've not had luck with the commands suggested, and would >>>>> appreciate any further ideas. >>>>> >>>>> Outputs follow below, with /mnt/data as the btrfs mount point that >>>>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB >>>>> flash drives being added sd[l-o] >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>>>> fi sync /mnt/data >>>>> ERROR: error adding device '/dev/sdl': Input/output error >>>>> ERROR: error adding device '/dev/sdm': Read-only file system >>>>> ERROR: error adding device '/dev/sdn': Read-only file system >>>>> ERROR: error adding device '/dev/sdo': Read-only file system >>>>> ERROR: Could not sync filesystem: Read-only file system >>>>> $ >>>>> >>>>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so >>>>> they're super quick to zero); >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 >>>>> /mnt/data ; sudo btrfs fi sync /mnt/data >>>>> ERROR: error adding device '/dev/loop16': Input/output error >>>> >>>> This is the interesting part, this means we're erroring out due to -EIO >>>> (not -ENOSPC) during the first device add. >>>> >>>> And by somehow, after the first device add, we already got the trans abort. >>>> >>>> Would you please try the following branch? >>>> >>>> https://github.com/adam900710/linux/tree/dev_add_no_commit >>>> >>>> It has not only the patch to skip the commit, but also extra debug >>>> output for the situation. >>>> >>>> Thanks, >>>> Qu >>>> >>>>> ERROR: error adding device '/dev/loop17': Read-only file system >>>>> ERROR: error adding device '/dev/loop18': Read-only file system >>>>> ERROR: error adding device '/dev/loop19': Read-only file system >>>>> ERROR: Could not sync filesystem: Read-only file system >>>>> $ >>>>> >>>>> I confirmed before both these kernel builds that the replaced line was >>>>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone >>>>> else following, I needed to remove the -n in the patch command >>>>> earlier) >>>>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c: >>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- } >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); >>>>> -- >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c: >>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- } >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>>>> -- >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c: >>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- } >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>>>> $ >>>>> >>>>> $ btrfs fi usage /mnt/data >>>>> Overall: >>>>> Device size: 87.31TiB >>>>> Device allocated: 87.31TiB >>>>> Device unallocated: 1.94GiB >>>>> Device missing: 0.00B >>>>> Device slack: 0.00B >>>>> Used: 87.08TiB >>>>> Free (estimated): 173.29GiB (min: 172.33GiB) >>>>> Free (statfs, df): 171.84GiB >>>>> Data ratio: 1.34 >>>>> Metadata ratio: 4.00 >>>>> Global reserve: 512.00MiB (used: 371.25MiB) >>>>> Multiple profiles: no >>>>> >>>>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) >>>>> /dev/sdc 10.90TiB >>>>> /dev/sdf 10.90TiB >>>>> /dev/sda 10.86TiB >>>>> /dev/sdg 10.87TiB >>>>> /dev/sdh 10.86TiB >>>>> /dev/sdd 10.87TiB >>>>> /dev/sde 10.88TiB >>>>> /dev/sdb 10.88TiB >>>>> >>>>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) >>>>> /dev/sdc 15.33GiB >>>>> /dev/sdf 18.41GiB >>>>> /dev/sda 49.63GiB >>>>> /dev/sdg 49.50GiB >>>>> /dev/sdh 51.52GiB >>>>> /dev/sdd 48.70GiB >>>>> /dev/sde 39.09GiB >>>>> /dev/sdb 39.01GiB >>>>> >>>>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) >>>>> /dev/sdc 1.00MiB >>>>> /dev/sda 37.00MiB >>>>> /dev/sdg 37.00MiB >>>>> /dev/sdh 36.00MiB >>>>> /dev/sdd 37.00MiB >>>>> >>>>> Unallocated: >>>>> /dev/sdc 1.00MiB >>>>> /dev/sdf 1.00MiB >>>>> /dev/sda 1.27GiB >>>>> /dev/sdg 1.00MiB >>>>> /dev/sdh 1.00MiB >>>>> /dev/sdd 687.00MiB >>>>> /dev/sde 1.00MiB >>>>> /dev/sdb 1.00MiB >>>>> $ >>>>> >>>>> >>>>> This first attempt generated the following syslog output: >>>>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c >>>>> (crc32c-intel) checksum algorithm >>>>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled >>>>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr >>>>> 0, rd 0, flush 0, corrupt 845, gen 0 >>>>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr >>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>>>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr >>>>> 3, rd 7, flush 0, corrupt 0, gen 0 >>>>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr >>>>> 41, rd 0, flush 0, corrupt 0, gen 0 >>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>>>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) >>>>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in >>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly >>>>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to >>>>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 >>>>> action 2 ref_mod 1: -28 >>>>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at >>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>>>> [btrfs] >>>>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): >>>>> btrfs_uuid_scan_kthread failed -5 >>>>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth >>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>>>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd >>>>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm >>>>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi >>>>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer >>>>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath >>>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd >>>>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables >>>>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov >>>>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 >>>>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage >>>>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti >>>>> Tainted: G W O 6.2.0-23-generic #23+btrfix >>>>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>>>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>>>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] >>>>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>>>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>>>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>>>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] >>>>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] >>>>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] >>>>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] >>>>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] >>>>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: >>>>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info >>>>> DATA has 160777674752 free, is not full >>>>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info >>>>> total=71201958395904, used=71018191273984, pinned=22985908224, >>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>>>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info >>>>> METADATA has -124944384 free, is full >>>>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info >>>>> total=83530612736, used=82791497728, pinned=242745344, >>>>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 >>>>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info >>>>> SYSTEM has 33439744 free, is not full >>>>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info >>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>>>> readonly=0 zone_unusable=0 >>>>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): >>>>> global_block_rsv: size 536870912 reserved 124944384 >>>>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): >>>>> trans_block_rsv: size 0 reserved 0 >>>>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): >>>>> chunk_block_rsv: size 0 reserved 0 >>>>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): >>>>> delayed_block_rsv: size 0 reserved 0 >>>>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): >>>>> delayed_refs_rsv: size 251322957824 reserved 0 >>>>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in >>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to >>>>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 >>>>> action 2 ref_mod 1: -28 >>>>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>> >>>>> A couple of kernel recompiles later, the second attempt on the SSD >>>>> generated similar: >>>>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c >>>>> (crc32c-intel) checksum algorithm >>>>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled >>>>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr >>>>> 0, rd 0, flush 0, corrupt 845, gen 0 >>>>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr >>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>>>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr >>>>> 3, rd 7, flush 0, corrupt 0, gen 0 >>>>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr >>>>> 41, rd 0, flush 0, corrupt 0, gen 0 >>>>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped >>>>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree >>>>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) >>>>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at >>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>>>> [btrfs] >>>>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth >>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>>>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic >>>>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg >>>>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core >>>>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm >>>>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid >>>>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd >>>>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs >>>>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 >>>>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor >>>>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid >>>>> amdgpu uas hid iommu_v2 >>>>> kernel: [ 1919.452839] Workqueue: events_unbound >>>>> btrfs_async_reclaim_metadata_space [btrfs] >>>>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>>>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>>>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] >>>>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>>>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>>>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>>>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] >>>>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] >>>>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: >>>>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info >>>>> DATA has 160778723328 free, is not full >>>>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info >>>>> total=71201958395904, used=71017442181120, pinned=23733952512, >>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>>>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info >>>>> METADATA has -147570688 free, is full >>>>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info >>>>> total=83530612736, used=82792185856, pinned=238059520, >>>>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 >>>>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info >>>>> SYSTEM has 33439744 free, is not full >>>>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info >>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>>>> readonly=0 zone_unusable=0 >>>>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): >>>>> global_block_rsv: size 536870912 reserved 147570688 >>>>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): >>>>> trans_block_rsv: size 0 reserved 0 >>>>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): >>>>> chunk_block_rsv: size 0 reserved 0 >>>>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): >>>>> delayed_block_rsv: size 0 reserved 0 >>>>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): >>>>> delayed_refs_rsv: size 254292787200 reserved 0 >>>>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in >>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly >>>>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to >>>>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 >>>>> action 2 ref_mod 1: -28 >>>>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): >>>>> btrfs_uuid_scan_kthread failed -5 >>>>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in >>>>> __btrfs_free_extent:3077: errno=-28 No space left >>>>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to >>>>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 >>>>> action 2 ref_mod 1: -28 >>>>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>> >>>>> >>>>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2023/6/17 13:11, Stefan N wrote: >>>>>>> Hi Qu, >>>>>>> >>>>>>> I believe I've got this environment ready, with the 6.2.0 kernel as >>>>>>> before using the Ubuntu kernel, but can switch to vanilla if required. >>>>>>> >>>>>>> I've not done anything kernel modifications for a solid decade, so >>>>>>> would be keen for a bit of guidance. >>>>>> >>>>>> Sure no problem. >>>>>> >>>>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then >>>>>> apply the attached one-line patch by: >>>>>> >>>>>> $ tar czf linux*.tar.xz >>>>>> $ cd linux* >>>>>> $ patch -np1 -i <the patch file> >>>>>> >>>>>> Then use your running system kernel config if possible: >>>>>> >>>>>> $ cp /proc/config.gz . >>>>>> $ gunzip config.gz >>>>>> $ mv config .config >>>>>> $ make olddefconfig >>>>>> >>>>>> Then you can start your kernel compiling, and considering you're using >>>>>> your distro's default, it would include tons of drivers, thus would be >>>>>> very slow. (Replace the number to something more suitable to your >>>>>> system, using all CPU cores can be very hot) >>>>>> >>>>>> $ make -j12 >>>>>> >>>>>> Finally you need to install the modules/kernel. >>>>>> >>>>>> Unfortunately this is distro specific, but if you're using Ubuntu, it >>>>>> may be much easier: >>>>>> >>>>>> $ make bindeb-pkg >>>>>> >>>>>> Then install the generated dpkg I guess? I have never tried kernel >>>>>> building using deb/rpm, but only manual installation, which is also >>>>>> distro dependent in the initramfs generation part. >>>>>> >>>>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom >>>>>> # make modules_install >>>>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img >>>>>> >>>>>> >>>>>> The last step is to update your bootloader to add the new kernel, which >>>>>> is not only distro dependent but also bootloader dependent. >>>>>> >>>>>> In my case, I go with systemd-boot with manually crafted entries. >>>>>> But if you go Ubuntu I believe just installing the kernel dpkg would >>>>>> have everything handled? >>>>>> >>>>>> Finally you can try reboot into the newer kernel, and try device add >>>>>> (need to add 4 disks), then sync and see if things work as expected. >>>>>> >>>>>> Thanks, >>>>>> Qu >>>>>>> >>>>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, >>>>>>> but failing this will use 4x loop devices. >>>>>>> >>>>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>>>> In your particular case, since you're running RAID1C4 you need to add 4 >>>>>>>> devices in one transaction. >>>>>>>> >>>>>>>> I can easily craft a patch to avoid commit transaction, but still you'll >>>>>>>> need to add at least 4 disks, and then sync to see if things would work. >>>>>>>> >>>>>>>> Furthermore this means you need a liveCD with full kernel compiling >>>>>>>> environment. >>>>>>>> >>>>>>>> If you want to go this path, I can send you the patch when you've >>>>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-23 9:46 ` Qu Wenruo @ 2023-06-24 15:29 ` Stefan N 2023-06-26 10:18 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-24 15:29 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org Whoops, I had left --dry-run on the first debug patch you commited, so that didn't run correctly. I've included the output from both patches, as they result in different output. Rerunning the older patch first, with loop devices (I tried both 4x100mb and 4x1gb) I get the following: $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: error adding device '/dev/loop16': Input/output error ERROR: error adding device '/dev/loop17': Read-only file system ERROR: error adding device '/dev/loop18': Read-only file system ERROR: error adding device '/dev/loop19': Read-only file system ERROR: Could not sync filesystem: Read-only file system $ kernel: [ 299.846627] BTRFS info (device sdd): using crc32c (crc32c-intel) checksum algorithm kernel: [ 299.846648] BTRFS info (device sdd): disk space caching is enabled kernel: [ 304.864437] BTRFS info (device sdd): bdev /dev/sdh errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 kernel: [ 304.864454] BTRFS info (device sdd): bdev /dev/sdb errs: wr 41089, rd 1556, flush 0, corrupt 0, gen 0 kernel: [ 304.864465] BTRFS info (device sdd): bdev /dev/sdi errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 kernel: [ 304.864473] BTRFS info (device sdd): bdev /dev/sde errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 kernel: [ 516.646032] BTRFS info (device sdd): balance: resume skipped kernel: [ 516.646046] BTRFS info (device sdd): checking UUID tree kernel: [ 722.307267] ------------[ cut here ]------------ kernel: [ 722.307274] BTRFS: Transaction aborted (error -28) kernel: [ 722.307352] WARNING: CPU: 3 PID: 3984 at fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 722.307507] Modules linked in: ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm snd_hda_core irqbypass snd_hwdep rapl snd_pcm wmi_bmof snd_timer k10temp snd soundcore ccp input_leds mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua nfsd bonding auth_rpcgss nfs_acl tls lockd grace msr efi_pstore sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage iommu_v2 drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper cec rc_core mpt3sas drm_kms_helper crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic syscopyarea sysfillrect ghash_clmulni_intel sha512_ssse3 kernel: [ 722.307642] sysimgblt aesni_intel nvme crypto_simd raid_class cryptd ahci drm i2c_piix4 scsi_transport_sas igb libahci xhci_pci nvme_core xhci_pci_renesas qlcnic dca nvme_common i2c_algo_bit video wmi kernel: [ 722.307677] CPU: 3 PID: 3984 Comm: kworker/u64:1 Tainted: G W O 6.2.0-23-generic #23+btrdebug1b kernel: [ 722.307685] Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P3.70 02/23/2022 kernel: [ 722.307690] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs] kernel: [ 722.307853] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 722.307977] Code: d3 0f 0b eb b8 44 89 e6 48 c7 c7 a0 49 88 c1 e8 2c c5 96 d3 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 a0 49 88 c1 e8 16 c5 96 d3 <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 kernel: [ 722.307982] RSP: 0018:ffffbde605a4bb58 EFLAGS: 00010246 kernel: [ 722.307988] RAX: 0000000000000000 RBX: ffff9b276c9d9d68 RCX: 0000000000000000 kernel: [ 722.307992] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 kernel: [ 722.307995] RBP: ffffbde605a4bb80 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 722.307998] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffe4 kernel: [ 722.308002] R13: 00005cd8260e0000 R14: 00000000001a0000 R15: ffff9b28dde0bcb0 kernel: [ 722.308006] FS: 0000000000000000(0000) GS:ffff9b2e20ac0000(0000) knlGS:0000000000000000 kernel: [ 722.308010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: [ 722.308014] CR2: 00007f45b173a0e0 CR3: 0000000143f8e000 CR4: 00000000003506e0 kernel: [ 722.308018] Call Trace: kernel: [ 722.308022] <TASK> kernel: [ 722.308029] __btrfs_free_extent+0x6bc/0xf50 [btrfs] kernel: [ 722.308156] run_delayed_data_ref+0x8b/0x180 [btrfs] kernel: [ 722.308281] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] kernel: [ 722.308406] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] kernel: [ 722.308530] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] kernel: [ 722.308653] flush_space+0x23c/0x2c0 [btrfs] kernel: [ 722.308812] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] kernel: [ 722.308969] process_one_work+0x225/0x430 kernel: [ 722.308980] worker_thread+0x50/0x3e0 kernel: [ 722.308986] ? __pfx_worker_thread+0x10/0x10 kernel: [ 722.308992] kthread+0xe9/0x110 kernel: [ 722.309000] ? __pfx_kthread+0x10/0x10 kernel: [ 722.309008] ret_from_fork+0x2c/0x50 kernel: [ 722.309018] </TASK> kernel: [ 722.309020] ---[ end trace 0000000000000000 ]--- kernel: [ 722.309024] BTRFS info (device sdd: state A): dumping space info: kernel: [ 722.309029] BTRFS info (device sdd: state A): space_info DATA has 160778199040 free, is not full kernel: [ 722.309034] BTRFS info (device sdd: state A): space_info total=71201958395904, used=71018502324224, pinned=22674333696, reserved=0, may_use=0, readonly=3538944 zone_unusable=0 kernel: [ 722.309042] BTRFS info (device sdd: state A): space_info METADATA has -124960768 free, is full kernel: [ 722.309047] BTRFS info (device sdd: state A): space_info total=83530612736, used=82790334464, pinned=242483200, reserved=497795072, may_use=124960768, readonly=0 zone_unusable=0 kernel: [ 722.309054] BTRFS info (device sdd: state A): space_info SYSTEM has 33439744 free, is not full kernel: [ 722.309058] BTRFS info (device sdd: state A): space_info total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 kernel: [ 722.309065] BTRFS info (device sdd: state A): global_block_rsv: size 536870912 reserved 124960768 kernel: [ 722.309069] BTRFS info (device sdd: state A): trans_block_rsv: size 0 reserved 0 kernel: [ 722.309072] BTRFS info (device sdd: state A): chunk_block_rsv: size 0 reserved 0 kernel: [ 722.309076] BTRFS info (device sdd: state A): delayed_block_rsv: size 0 reserved 0 kernel: [ 722.309079] BTRFS info (device sdd: state A): delayed_refs_rsv: size 255294439424 reserved 0 kernel: [ 722.309083] BTRFS: error (device sdd: state A) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 722.309128] BTRFS info (device sdd: state EA): forced readonly kernel: [ 722.309132] BTRFS error (device sdd: state EA): failed to run delayed ref for logical 102083421143040 num_bytes 1703936 type 184 action 2 ref_mod 1: -28 kernel: [ 722.309188] BTRFS: error (device sdd: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left kernel: [ 722.309348] BTRFS warning (device sdd: state EA): btrfs_uuid_scan_kthread failed -5 kernel: [ 722.309350] BTRFS error (device sdd: state EA): failed to start trans: -5 kernel: [ 722.309585] BTRFS error (device sdd: state EA): failed to add disk /dev/loop16: -5 kernel: [ 722.324775] BTRFS: error (device sdd: state EA) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 722.324825] BTRFS error (device sdd: state EA): failed to run delayed ref for logical 102084102217728 num_bytes 245760 type 184 action 2 ref_mod 1: -28 kernel: [ 722.324870] BTRFS: error (device sdd: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left kernel: [ 722.329276] BTRFS error (device sdd: state EA): failed to add disk /dev/loop17: -30 kernel: [ 722.332629] BTRFS error (device sdd: state EA): failed to add disk /dev/loop18: -30 kernel: [ 722.336018] BTRFS error (device sdd: state EA): failed to add disk /dev/loop19: -30 *** The above is using the original patch as follows: $ diff fs/btrfs/ ../linux-6.2.0-dist/fs/btrfs/ diff fs/btrfs/ioctl.c ../linux-6.2.0-dist/fs/btrfs/ioctl.c 2656,2658d2655 < else < btrfs_err(fs_info, "failed to add disk %s: %d", < vol_args->name, ret); diff fs/btrfs/transaction.c ../linux-6.2.0-dist/fs/btrfs/transaction.c 1029d1028 < /* 1031d1029 < */ diff fs/btrfs/volumes.c ../linux-6.2.0-dist/fs/btrfs/volumes.c 2680d2679 < btrfs_err(fs_info, "failed to start trans: %d", ret); 2769d2767 < btrfs_err(fs_info, "failed to add dev item: %d", ret); 2787,2789c2785 < ret = btrfs_end_transaction(trans); < if (ret < 0) < btrfs_err(fs_info, "failed to end trans: %d", ret); --- > ret = btrfs_commit_transaction(trans); $ *** The below is using the newer patch as follows: $ diff fs/btrfs/ ../linux-6.2.0-dist/fs/btrfs/ diff fs/btrfs/ioctl.c ../linux-6.2.0-dist/fs/btrfs/ioctl.c 2656,2658d2655 < else < btrfs_err(fs_info, "failed to add disk %s: %d", < vol_args->name, ret); diff fs/btrfs/transaction.c ../linux-6.2.0-dist/fs/btrfs/transaction.c 1029d1028 < /* 1031d1029 < */ diff fs/btrfs/volumes.c ../linux-6.2.0-dist/fs/btrfs/volumes.c 2677c2677 < trans = btrfs_join_transaction(root); --- > trans = btrfs_start_transaction(root, 0); 2680d2679 < btrfs_err(fs_info, "failed to start trans: %d", ret); 2769d2767 < btrfs_err(fs_info, "failed to add dev item: %d", ret); 2787,2789c2785 < ret = btrfs_end_transaction(trans); < if (ret < 0) < btrfs_err(fs_info, "failed to end trans: %d", ret); --- > ret = btrfs_commit_transaction(trans); $ $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -K -f /dev/loop12 /dev/loop13 /dev/loop14 /dev/loop15 /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: Could not sync filesystem: No space left on device $ kernel: [ 1811.846087] BTRFS info (device sdc): using crc32c (crc32c-intel) checksum algorithm kernel: [ 1811.846107] BTRFS info (device sdc): disk space caching is enabled kernel: [ 1817.852850] BTRFS info (device sdc): bdev /dev/sde errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 kernel: [ 1817.852866] BTRFS info (device sdc): bdev /dev/sda errs: wr 41089, rd 1556, flush 0, corrupt 0, gen 0 kernel: [ 1817.852877] BTRFS info (device sdc): bdev /dev/sdh errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 kernel: [ 1817.852884] BTRFS info (device sdc): bdev /dev/sdd errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 kernel: [ 2037.562050] BTRFS info (device sdc): balance: resume skipped kernel: [ 2037.562064] BTRFS info (device sdc): checking UUID tree kernel: [ 2037.581550] BTRFS info (device sdc): disk added /dev/loop12 kernel: [ 2037.591163] BTRFS info (device sdc): disk added /dev/loop13 kernel: [ 2037.599477] BTRFS info (device sdc): disk added /dev/loop14 kernel: [ 2037.607064] BTRFS info (device sdc): disk added /dev/loop15 kernel: [ 2176.124630] INFO: task btrfs:7783 blocked for more than 120 seconds. kernel: [ 2176.124678] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 2176.124710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 2176.124742] task:btrfs state:D stack:0 pid:7783 ppid:7782 flags:0x00004002 kernel: [ 2176.124753] Call Trace: kernel: [ 2176.124758] <TASK> kernel: [ 2176.124765] __schedule+0x2aa/0x610 kernel: [ 2176.124780] schedule+0x63/0x110 kernel: [ 2176.124788] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] kernel: [ 2176.124929] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 2176.124941] btrfs_sync_fs+0x5a/0x1b0 [btrfs] kernel: [ 2176.125060] btrfs_ioctl+0x643/0x14d0 [btrfs] kernel: [ 2176.125225] __x64_sys_ioctl+0xa0/0xe0 kernel: [ 2176.125235] do_syscall_64+0x5b/0x90 kernel: [ 2176.125242] ? do_sys_openat2+0xab/0x180 kernel: [ 2176.125251] ? exit_to_user_mode_prepare+0x30/0xb0 kernel: [ 2176.125260] ? syscall_exit_to_user_mode+0x29/0x50 kernel: [ 2176.125268] ? do_syscall_64+0x67/0x90 kernel: [ 2176.125275] entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: [ 2176.125282] RIP: 0033:0x7f2e8eb119ef kernel: [ 2176.125288] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 kernel: [ 2176.125295] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2e8eb119ef kernel: [ 2176.125300] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 kernel: [ 2176.125303] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 2176.125306] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2e8ebf642c kernel: [ 2176.125310] R13: 0000000000000001 R14: 000055cdb7940578 R15: 0000000000000000 kernel: [ 2176.125318] </TASK> kernel: [ 2296.956781] INFO: task btrfs:7783 blocked for more than 241 seconds. kernel: [ 2296.956824] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 2296.956856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 2296.956887] task:btrfs state:D stack:0 pid:7783 ppid:7782 flags:0x00004002 kernel: [ 2296.956898] Call Trace: kernel: [ 2296.956902] <TASK> kernel: [ 2296.956908] __schedule+0x2aa/0x610 kernel: [ 2296.956921] schedule+0x63/0x110 kernel: [ 2296.956928] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] kernel: [ 2296.957069] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 2296.957080] btrfs_sync_fs+0x5a/0x1b0 [btrfs] kernel: [ 2296.957200] btrfs_ioctl+0x643/0x14d0 [btrfs] kernel: [ 2296.957366] __x64_sys_ioctl+0xa0/0xe0 kernel: [ 2296.957375] do_syscall_64+0x5b/0x90 kernel: [ 2296.957383] ? do_sys_openat2+0xab/0x180 kernel: [ 2296.957391] ? exit_to_user_mode_prepare+0x30/0xb0 kernel: [ 2296.957399] ? syscall_exit_to_user_mode+0x29/0x50 kernel: [ 2296.957407] ? do_syscall_64+0x67/0x90 kernel: [ 2296.957414] entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: [ 2296.957420] RIP: 0033:0x7f2e8eb119ef kernel: [ 2296.957426] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 kernel: [ 2296.957433] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2e8eb119ef kernel: [ 2296.957438] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 kernel: [ 2296.957441] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 2296.957444] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2e8ebf642c kernel: [ 2296.957448] R13: 0000000000000001 R14: 000055cdb7940578 R15: 0000000000000000 kernel: [ 2296.957468] </TASK> kernel: [ 2314.043258] ------------[ cut here ]------------ kernel: [ 2314.043264] BTRFS: Transaction aborted (error -28) kernel: [ 2314.043334] WARNING: CPU: 2 PID: 7739 at fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 2314.043467] Modules linked in: ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl_common snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg kvm_amd snd_intel_sdw_acpi snd_hda_codec kvm snd_hda_core snd_hwdep snd_pcm snd_timer irqbypass rapl wmi_bmof snd k10temp ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear amdgpu iommu_v2 drm_buddy gpu_sched drm_ttm_helper hid_generic ttm drm_display_helper cec uas rc_core usbhid hid drm_kms_helper crct10dif_pclmul syscopyarea usb_storage crc32_pclmul polyval_clmulni sysfillrect polyval_generic sysimgblt nvme ghash_clmulni_intel sha512_ssse3 kernel: [ 2314.043599] nvme_core aesni_intel crypto_simd mpt3sas drm cryptd raid_class ahci i2c_piix4 scsi_transport_sas nvme_common igb xhci_pci qlcnic dca xhci_pci_renesas libahci i2c_algo_bit video wmi kernel: [ 2314.043631] CPU: 2 PID: 7739 Comm: btrfs-transacti Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 2314.043638] Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P3.70 02/23/2022 kernel: [ 2314.043641] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] kernel: [ 2314.043766] Code: ce 0f 0b eb b8 44 89 e6 48 c7 c7 a8 39 a0 c1 e8 2c d5 1e ce 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 a8 39 a0 c1 e8 16 d5 1e ce <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 kernel: [ 2314.043771] RSP: 0018:ffffad0b11b7bb38 EFLAGS: 00010246 kernel: [ 2314.043777] RAX: 0000000000000000 RBX: ffff9c80e40e8f08 RCX: 0000000000000000 kernel: [ 2314.043781] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 kernel: [ 2314.043784] RBP: ffffad0b11b7bb60 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 2314.043787] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffe4 kernel: [ 2314.043790] R13: 00005e4c359ba000 R14: 0000000000020000 R15: ffff9c824d9a58c0 kernel: [ 2314.043794] FS: 0000000000000000(0000) GS:ffff9c87a0a80000(0000) knlGS:0000000000000000 kernel: [ 2314.043798] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: [ 2314.043802] CR2: 00007f54adc86000 CR3: 00000001471d8000 CR4: 00000000003506e0 kernel: [ 2314.043806] Call Trace: kernel: [ 2314.043809] <TASK> kernel: [ 2314.043815] __btrfs_free_extent+0x6bc/0xf50 [btrfs] kernel: [ 2314.043943] run_delayed_data_ref+0x8b/0x180 [btrfs] kernel: [ 2314.044068] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] kernel: [ 2314.044192] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] kernel: [ 2314.044316] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] kernel: [ 2314.044439] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] kernel: [ 2314.044598] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] kernel: [ 2314.044754] ? start_transaction+0xc8/0x600 [btrfs] kernel: [ 2314.044890] transaction_kthread+0x14b/0x1c0 [btrfs] kernel: [ 2314.045021] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] kernel: [ 2314.045151] kthread+0xe9/0x110 kernel: [ 2314.045162] ? __pfx_kthread+0x10/0x10 kernel: [ 2314.045170] ret_from_fork+0x2c/0x50 kernel: [ 2314.045180] </TASK> kernel: [ 2314.045182] ---[ end trace 0000000000000000 ]--- kernel: [ 2314.045186] BTRFS info (device sdc: state A): dumping space info: kernel: [ 2314.045191] BTRFS info (device sdc: state A): space_info DATA has 160777674752 free, is not full kernel: [ 2314.045197] BTRFS info (device sdc: state A): space_info total=71201958395904, used=71013439856640, pinned=27737325568, reserved=0, may_use=0, readonly=3538944 zone_unusable=0 kernel: [ 2314.045205] BTRFS info (device sdc: state A): space_info METADATA has -429047808 free, is full kernel: [ 2314.045209] BTRFS info (device sdc: state A): space_info total=83634421760, used=82789777408, pinned=244891648, reserved=599687168, may_use=429047808, readonly=65536 zone_unusable=0 kernel: [ 2314.045217] BTRFS info (device sdc: state A): space_info SYSTEM has 33390592 free, is not full kernel: [ 2314.045221] BTRFS info (device sdc: state A): space_info total=38797312, used=5373952, pinned=16384, reserved=16384, may_use=0, readonly=0 zone_unusable=0 kernel: [ 2314.045227] BTRFS info (device sdc: state A): global_block_rsv: size 536870912 reserved 428523520 kernel: [ 2314.045231] BTRFS info (device sdc: state A): trans_block_rsv: size 524288 reserved 524288 kernel: [ 2314.045235] BTRFS info (device sdc: state A): chunk_block_rsv: size 0 reserved 0 kernel: [ 2314.045239] BTRFS info (device sdc: state A): delayed_block_rsv: size 0 reserved 0 kernel: [ 2314.045242] BTRFS info (device sdc: state A): delayed_refs_rsv: size 249756909568 reserved 0 kernel: [ 2314.045251] BTRFS: error (device sdc: state A) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 2314.045265] BTRFS warning (device sdc: state A): btrfs_uuid_scan_kthread failed -28 kernel: [ 2314.045295] BTRFS info (device sdc: state EA): forced readonly kernel: [ 2314.045300] BTRFS error (device sdc: state EA): failed to run delayed ref for logical 103681409916928 num_bytes 131072 type 184 action 2 ref_mod 1: -28 kernel: [ 2314.045360] BTRFS: error (device sdc: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left kernel: [ 2314.049204] BTRFS: error (device sdc: state EA) in btrfs_create_pending_block_groups:2487: errno=-28 No space left kernel: [ 2314.049331] BTRFS: error (device sdc: state EA) in btrfs_create_pending_block_groups:2499: errno=-28 No space left kernel: [ 2314.053259] BTRFS: error (device sdc: state EA) in do_free_extent_accounting:2847: errno=-28 No space left kernel: [ 2314.053318] BTRFS error (device sdc: state EA): failed to run delayed ref for logical 103681419366400 num_bytes 131072 type 184 action 2 ref_mod 1: -28 kernel: [ 2314.053375] BTRFS: error (device sdc: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left kernel: [ 2314.053430] BTRFS warning (device sdc: state EA): Skipping commit of aborted transaction. kernel: [ 2314.053435] BTRFS: error (device sdc: state EA) in cleanup_transaction:1986: errno=-28 No space left On Fri, 23 Jun 2023 at 19:16, Qu Wenruo <wqu@suse.com> wrote: > > > > On 2023/6/23 17:00, Stefan N wrote: > > Apologies, I thought I included the log output too, though I can't see > > any additional output > > > > From a fresh run, still using the same kernel > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > > fi sync /mnt/data > > ERROR: error adding device '/dev/sdl': Input/output error > > ERROR: error adding device '/dev/sdm': Read-only file system > > ERROR: error adding device '/dev/sdn': Read-only file system > > ERROR: error adding device '/dev/sdo': Read-only file system > > ERROR: Could not sync filesystem: Read-only file system > > $ > > > > Output from kern.log, syslog or dmesg -k > > > [...] > > None of the newly added debug lines triggered, so there is something > else causing the problem. > > And furthermore the backtrace is not that helpful, it only shows it's > some async metadata reclaim kthread causing the problem. > > Although I guess the async metadata reclaim is triggered by the > btrfs_start_transaction() call when adding a device. > So I updated my github branch to go btrfs_join_transaction() which would > not flush any metadata, thus avoid the problem. > > Would you please give it a try again? > > > > > However, now I started digging into logs to check I hadn't missed > > where the errors were being logged, I've found this from roughly a > > week before I started having issues, which I had not previously > > noticed > > You don't need to bother most error messages after the fs flipped RO. > As it's known to have some false alerts. > > Thanks, > Qu > > > [ 1990.495861] BTRFS error (device sdh): failed to run delayed ref for > > logical 107988943355904 num_bytes 245760 type 184 action 2 ref_mod 1: > > -28 > > [ 1990.518282] BTRFS error (device sdh): failed to run delayed ref for > > logical 107989043494912 num_bytes 245760 type 184 action 2 ref_mod 1: > > -28 > > [ 620.104065] BTRFS error (device sdk): failed to run delayed ref for > > logical 123187655077888 num_bytes 176128 type 184 action 2 ref_mod 1: > > -28 > > [ 620.126209] BTRFS error (device sdk): failed to run delayed ref for > > logical 123190279929856 num_bytes 134217728 type 184 action 2 ref_mod > > 1: -28 > > [ 620.126241] BTRFS error (device sdk): failed to run delayed ref for > > logical 123189970468864 num_bytes 134217728 type 184 action 2 ref_mod > > 1: -28 > > [ 620.126271] BTRFS error (device sdk): failed to run delayed ref for > > logical 123190414409728 num_bytes 134217728 type 184 action 2 ref_mod > > 1: -28 > > [ 476.565308] BTRFS error (device sdh): failed to run delayed ref for > > logical 101906434228224 num_bytes 651264 type 184 action 2 ref_mod 1: > > -28 > > [ 476.565932] BTRFS error (device sdh): failed to run delayed ref for > > logical 101906434031616 num_bytes 180224 type 184 action 2 ref_mod 1: > > -28 > > [ 447.371754] BTRFS error (device sdh): failed to run delayed ref for > > logical 101946151927808 num_bytes 262144 type 184 action 2 ref_mod 1: > > -28 > > [ 447.372362] BTRFS error (device sdh): failed to run delayed ref for > > logical 101946083725312 num_bytes 245760 type 184 action 2 ref_mod 1: > > -28 > > [ 439.839007] BTRFS error (device sdj): failed to run delayed ref for > > logical 101923102179328 num_bytes 192512 type 184 action 2 ref_mod 1: > > -28 > > [ 439.839578] BTRFS error (device sdj): failed to run delayed ref for > > logical 101923401629696 num_bytes 245760 type 184 action 2 ref_mod 1: > > -28 > > [ 466.393884] BTRFS error (device sdh): failed to run delayed ref for > > logical 101981116137472 num_bytes 245760 type 184 action 2 ref_mod 1: > > -28 > > [ 466.394451] BTRFS error (device sdh): failed to run delayed ref for > > logical 101981122854912 num_bytes 1720320 type 184 action 2 ref_mod 1: > > -28 > > [ 431.541367] BTRFS error (device sdh): failed to run delayed ref for > > logical 101876426952704 num_bytes 126976 type 184 action 2 ref_mod 1: > > -28 > > [ 431.542010] BTRFS error (device sdh): failed to run delayed ref for > > logical 101876427780096 num_bytes 126976 type 184 action 2 ref_mod 1: > > -28 > > [ 597.487948] BTRFS error (device sdj): failed to run delayed ref for > > logical 108127459409920 num_bytes 196608 type 184 action 2 ref_mod 1: > > -28 > > [ 597.488539] BTRFS error (device sdj): failed to run delayed ref for > > logical 108124677865472 num_bytes 126976 type 184 action 2 ref_mod 1: > > -28 > > [ 534.717509] BTRFS error (device sdh): failed to run delayed ref for > > logical 101958618710016 num_bytes 1597440 type 184 action 2 ref_mod 1: > > -28 > > [ 534.718494] BTRFS error (device sdh): failed to run delayed ref for > > logical 101958756335616 num_bytes 368640 type 184 action 2 ref_mod 1: > > -28 > > [ 508.089394] BTRFS error (device sdk): failed to run delayed ref for > > logical 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: > > -28 > > [ 508.090007] BTRFS error (device sdk): failed to run delayed ref for > > logical 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: > > -28 > > [ 1632.112084] BTRFS error (device sdh): failed to run delayed ref for > > logical 102203759886336 num_bytes 229376 type 184 action 2 ref_mod 1: > > -28 > > [ 1632.112885] BTRFS error (device sdh): failed to run delayed ref for > > logical 102203764379648 num_bytes 126976 type 184 action 2 ref_mod 1: > > -28 > > > > and today, when leaving the disks mounted read-only for a while, I > > found many occurances similar to: > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201329754554368 mirror 1 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201329754554368 mirror 2 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201329754554368 mirror 3 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201329754554368 mirror 4 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201329754554368 mirror 1 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201329754554368 mirror 2 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201329754554368 mirror 3 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201350830227456 mirror 4 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201350830227456 mirror 1 wanted 2 found 0 > > BTRFS error (device sdc: state EA): level verify failed on logical > > 201350830227456 mirror 2 wanted 2 found 0 > > > > On Fri, 23 Jun 2023 at 10:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >> > >> > >> > >> On 2023/6/23 06:18, Stefan N wrote: > >>> Hi Qu, > >>> > >>> I got one new line this time, but it doesn't seem to match your commit > >>> ERROR: zoned: unable to stat /dev/loop/13 > >> > >> Please provide the dmesg of that attempt, as all the extra debug info is > >> inside dmesg. > >> > >> With that info provided, we can determine what to do next. > >> > >> Thanks, > >> Qu > >> > >>> > >>> I tried it on the USB flash drives too and didn't get any extra line > >>> > >>> In context > >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>> dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 > >>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>> ERROR: error adding device '/dev/loop12': Input/output error > >>> ERROR: zoned: unable to stat /dev/loop/13 > >>> ERROR: checking status of /dev/loop/13: No such file or directory > >>> ERROR: error adding device '/dev/loop14': Read-only file system > >>> ERROR: error adding device '/dev/loop15': Read-only file system > >>> ERROR: Could not sync filesystem: Read-only file system > >>> $ > >>> > >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > >>> fi sync /mnt/data > >>> ERROR: error adding device '/dev/sdl': Input/output error > >>> ERROR: error adding device '/dev/sdm': Read-only file system > >>> ERROR: error adding device '/dev/sdn': Read-only file system > >>> ERROR: error adding device '/dev/sdo': Read-only file system > >>> ERROR: Could not sync filesystem: Read-only file system > >>> $ > >>> > >>> On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2023/6/22 16:33, Stefan N wrote: > >>>>> Hi Qu, > >>>>> > >>>>> Many thanks for the detailed instructions and your patience. I got it > >>>>> working combined with > >>>>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system > >>>>> OS instead, tagged +btrfix > >>>>> $ uname -vr > >>>>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 > >>>>> > >>>>> However, I've not had luck with the commands suggested, and would > >>>>> appreciate any further ideas. > >>>>> > >>>>> Outputs follow below, with /mnt/data as the btrfs mount point that > >>>>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB > >>>>> flash drives being added sd[l-o] > >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > >>>>> fi sync /mnt/data > >>>>> ERROR: error adding device '/dev/sdl': Input/output error > >>>>> ERROR: error adding device '/dev/sdm': Read-only file system > >>>>> ERROR: error adding device '/dev/sdn': Read-only file system > >>>>> ERROR: error adding device '/dev/sdo': Read-only file system > >>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>> $ > >>>>> > >>>>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so > >>>>> they're super quick to zero); > >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>>>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 > >>>>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>>>> ERROR: error adding device '/dev/loop16': Input/output error > >>>> > >>>> This is the interesting part, this means we're erroring out due to -EIO > >>>> (not -ENOSPC) during the first device add. > >>>> > >>>> And by somehow, after the first device add, we already got the trans abort. > >>>> > >>>> Would you please try the following branch? > >>>> > >>>> https://github.com/adam900710/linux/tree/dev_add_no_commit > >>>> > >>>> It has not only the patch to skip the commit, but also extra debug > >>>> output for the situation. > >>>> > >>>> Thanks, > >>>> Qu > >>>> > >>>>> ERROR: error adding device '/dev/loop17': Read-only file system > >>>>> ERROR: error adding device '/dev/loop18': Read-only file system > >>>>> ERROR: error adding device '/dev/loop19': Read-only file system > >>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>> $ > >>>>> > >>>>> I confirmed before both these kernel builds that the replaced line was > >>>>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone > >>>>> else following, I needed to remove the -n in the patch command > >>>>> earlier) > >>>>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* > >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c: > >>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- } > >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- > >>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); > >>>>> -- > >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c: > >>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- } > >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- > >>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > >>>>> -- > >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c: > >>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- } > >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- > >>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > >>>>> $ > >>>>> > >>>>> $ btrfs fi usage /mnt/data > >>>>> Overall: > >>>>> Device size: 87.31TiB > >>>>> Device allocated: 87.31TiB > >>>>> Device unallocated: 1.94GiB > >>>>> Device missing: 0.00B > >>>>> Device slack: 0.00B > >>>>> Used: 87.08TiB > >>>>> Free (estimated): 173.29GiB (min: 172.33GiB) > >>>>> Free (statfs, df): 171.84GiB > >>>>> Data ratio: 1.34 > >>>>> Metadata ratio: 4.00 > >>>>> Global reserve: 512.00MiB (used: 371.25MiB) > >>>>> Multiple profiles: no > >>>>> > >>>>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) > >>>>> /dev/sdc 10.90TiB > >>>>> /dev/sdf 10.90TiB > >>>>> /dev/sda 10.86TiB > >>>>> /dev/sdg 10.87TiB > >>>>> /dev/sdh 10.86TiB > >>>>> /dev/sdd 10.87TiB > >>>>> /dev/sde 10.88TiB > >>>>> /dev/sdb 10.88TiB > >>>>> > >>>>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) > >>>>> /dev/sdc 15.33GiB > >>>>> /dev/sdf 18.41GiB > >>>>> /dev/sda 49.63GiB > >>>>> /dev/sdg 49.50GiB > >>>>> /dev/sdh 51.52GiB > >>>>> /dev/sdd 48.70GiB > >>>>> /dev/sde 39.09GiB > >>>>> /dev/sdb 39.01GiB > >>>>> > >>>>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) > >>>>> /dev/sdc 1.00MiB > >>>>> /dev/sda 37.00MiB > >>>>> /dev/sdg 37.00MiB > >>>>> /dev/sdh 36.00MiB > >>>>> /dev/sdd 37.00MiB > >>>>> > >>>>> Unallocated: > >>>>> /dev/sdc 1.00MiB > >>>>> /dev/sdf 1.00MiB > >>>>> /dev/sda 1.27GiB > >>>>> /dev/sdg 1.00MiB > >>>>> /dev/sdh 1.00MiB > >>>>> /dev/sdd 687.00MiB > >>>>> /dev/sde 1.00MiB > >>>>> /dev/sdb 1.00MiB > >>>>> $ > >>>>> > >>>>> > >>>>> This first attempt generated the following syslog output: > >>>>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c > >>>>> (crc32c-intel) checksum algorithm > >>>>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled > >>>>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr > >>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>>>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr > >>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>>>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr > >>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>>>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr > >>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > >>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > >>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > >>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > >>>>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) > >>>>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in > >>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly > >>>>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to > >>>>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 > >>>>> action 2 ref_mod 1: -28 > >>>>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at > >>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > >>>>> [btrfs] > >>>>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in > >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): > >>>>> btrfs_uuid_scan_kthread failed -5 > >>>>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth > >>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > >>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > >>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > >>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > >>>>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd > >>>>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm > >>>>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > >>>>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer > >>>>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath > >>>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd > >>>>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables > >>>>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov > >>>>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 > >>>>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage > >>>>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti > >>>>> Tainted: G W O 6.2.0-23-generic #23+btrfix > >>>>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>>>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > >>>>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] > >>>>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>>>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>>>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>>>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > >>>>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > >>>>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] > >>>>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] > >>>>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > >>>>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: > >>>>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info > >>>>> DATA has 160777674752 free, is not full > >>>>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info > >>>>> total=71201958395904, used=71018191273984, pinned=22985908224, > >>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>>>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info > >>>>> METADATA has -124944384 free, is full > >>>>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info > >>>>> total=83530612736, used=82791497728, pinned=242745344, > >>>>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 > >>>>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info > >>>>> SYSTEM has 33439744 free, is not full > >>>>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info > >>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > >>>>> readonly=0 zone_unusable=0 > >>>>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): > >>>>> global_block_rsv: size 536870912 reserved 124944384 > >>>>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): > >>>>> trans_block_rsv: size 0 reserved 0 > >>>>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): > >>>>> chunk_block_rsv: size 0 reserved 0 > >>>>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): > >>>>> delayed_block_rsv: size 0 reserved 0 > >>>>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): > >>>>> delayed_refs_rsv: size 251322957824 reserved 0 > >>>>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in > >>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to > >>>>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 > >>>>> action 2 ref_mod 1: -28 > >>>>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in > >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>> > >>>>> A couple of kernel recompiles later, the second attempt on the SSD > >>>>> generated similar: > >>>>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c > >>>>> (crc32c-intel) checksum algorithm > >>>>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled > >>>>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr > >>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>>>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr > >>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>>>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr > >>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>>>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr > >>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>>>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped > >>>>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree > >>>>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) > >>>>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at > >>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > >>>>> [btrfs] > >>>>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth > >>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > >>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > >>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > >>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > >>>>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic > >>>>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg > >>>>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core > >>>>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm > >>>>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid > >>>>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd > >>>>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs > >>>>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > >>>>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor > >>>>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid > >>>>> amdgpu uas hid iommu_v2 > >>>>> kernel: [ 1919.452839] Workqueue: events_unbound > >>>>> btrfs_async_reclaim_metadata_space [btrfs] > >>>>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>>>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > >>>>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] > >>>>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>>>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>>>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>>>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] > >>>>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] > >>>>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: > >>>>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info > >>>>> DATA has 160778723328 free, is not full > >>>>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info > >>>>> total=71201958395904, used=71017442181120, pinned=23733952512, > >>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>>>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info > >>>>> METADATA has -147570688 free, is full > >>>>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info > >>>>> total=83530612736, used=82792185856, pinned=238059520, > >>>>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 > >>>>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info > >>>>> SYSTEM has 33439744 free, is not full > >>>>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info > >>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > >>>>> readonly=0 zone_unusable=0 > >>>>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): > >>>>> global_block_rsv: size 536870912 reserved 147570688 > >>>>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): > >>>>> trans_block_rsv: size 0 reserved 0 > >>>>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): > >>>>> chunk_block_rsv: size 0 reserved 0 > >>>>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): > >>>>> delayed_block_rsv: size 0 reserved 0 > >>>>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): > >>>>> delayed_refs_rsv: size 254292787200 reserved 0 > >>>>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in > >>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly > >>>>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to > >>>>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 > >>>>> action 2 ref_mod 1: -28 > >>>>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in > >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): > >>>>> btrfs_uuid_scan_kthread failed -5 > >>>>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in > >>>>> __btrfs_free_extent:3077: errno=-28 No space left > >>>>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to > >>>>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 > >>>>> action 2 ref_mod 1: -28 > >>>>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in > >>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>> > >>>>> > >>>>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2023/6/17 13:11, Stefan N wrote: > >>>>>>> Hi Qu, > >>>>>>> > >>>>>>> I believe I've got this environment ready, with the 6.2.0 kernel as > >>>>>>> before using the Ubuntu kernel, but can switch to vanilla if required. > >>>>>>> > >>>>>>> I've not done anything kernel modifications for a solid decade, so > >>>>>>> would be keen for a bit of guidance. > >>>>>> > >>>>>> Sure no problem. > >>>>>> > >>>>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then > >>>>>> apply the attached one-line patch by: > >>>>>> > >>>>>> $ tar czf linux*.tar.xz > >>>>>> $ cd linux* > >>>>>> $ patch -np1 -i <the patch file> > >>>>>> > >>>>>> Then use your running system kernel config if possible: > >>>>>> > >>>>>> $ cp /proc/config.gz . > >>>>>> $ gunzip config.gz > >>>>>> $ mv config .config > >>>>>> $ make olddefconfig > >>>>>> > >>>>>> Then you can start your kernel compiling, and considering you're using > >>>>>> your distro's default, it would include tons of drivers, thus would be > >>>>>> very slow. (Replace the number to something more suitable to your > >>>>>> system, using all CPU cores can be very hot) > >>>>>> > >>>>>> $ make -j12 > >>>>>> > >>>>>> Finally you need to install the modules/kernel. > >>>>>> > >>>>>> Unfortunately this is distro specific, but if you're using Ubuntu, it > >>>>>> may be much easier: > >>>>>> > >>>>>> $ make bindeb-pkg > >>>>>> > >>>>>> Then install the generated dpkg I guess? I have never tried kernel > >>>>>> building using deb/rpm, but only manual installation, which is also > >>>>>> distro dependent in the initramfs generation part. > >>>>>> > >>>>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom > >>>>>> # make modules_install > >>>>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img > >>>>>> > >>>>>> > >>>>>> The last step is to update your bootloader to add the new kernel, which > >>>>>> is not only distro dependent but also bootloader dependent. > >>>>>> > >>>>>> In my case, I go with systemd-boot with manually crafted entries. > >>>>>> But if you go Ubuntu I believe just installing the kernel dpkg would > >>>>>> have everything handled? > >>>>>> > >>>>>> Finally you can try reboot into the newer kernel, and try device add > >>>>>> (need to add 4 disks), then sync and see if things work as expected. > >>>>>> > >>>>>> Thanks, > >>>>>> Qu > >>>>>>> > >>>>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, > >>>>>>> but failing this will use 4x loop devices. > >>>>>>> > >>>>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>>>>>> In your particular case, since you're running RAID1C4 you need to add 4 > >>>>>>>> devices in one transaction. > >>>>>>>> > >>>>>>>> I can easily craft a patch to avoid commit transaction, but still you'll > >>>>>>>> need to add at least 4 disks, and then sync to see if things would work. > >>>>>>>> > >>>>>>>> Furthermore this means you need a liveCD with full kernel compiling > >>>>>>>> environment. > >>>>>>>> > >>>>>>>> If you want to go this path, I can send you the patch when you've > >>>>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-24 15:29 ` Stefan N @ 2023-06-26 10:18 ` Qu Wenruo 2023-06-26 12:58 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-06-26 10:18 UTC (permalink / raw) To: Stefan N, Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org On 2023/6/24 23:29, Stefan N wrote: > Whoops, I had left --dry-run on the first debug patch you commited, so > that didn't run correctly. > > I've included the output from both patches, as they result in different output. > > Rerunning the older patch first, with loop devices (I tried both > 4x100mb and 4x1gb) I get the following: > [...] > *** The below is using the newer patch as follows: > $ diff fs/btrfs/ ../linux-6.2.0-dist/fs/btrfs/ > diff fs/btrfs/ioctl.c ../linux-6.2.0-dist/fs/btrfs/ioctl.c > 2656,2658d2655 > < else > < btrfs_err(fs_info, "failed to add disk %s: %d", > < vol_args->name, ret); > diff fs/btrfs/transaction.c ../linux-6.2.0-dist/fs/btrfs/transaction.c > 1029d1028 > < /* > 1031d1029 > < */ > diff fs/btrfs/volumes.c ../linux-6.2.0-dist/fs/btrfs/volumes.c > 2677c2677 > < trans = btrfs_join_transaction(root); > --- >> trans = btrfs_start_transaction(root, 0); > 2680d2679 > < btrfs_err(fs_info, "failed to start trans: %d", ret); > 2769d2767 > < btrfs_err(fs_info, "failed to add dev item: %d", ret); > 2787,2789c2785 > < ret = btrfs_end_transaction(trans); > < if (ret < 0) > < btrfs_err(fs_info, "failed to end trans: %d", ret); > --- >> ret = btrfs_commit_transaction(trans); > $ > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > dev add -K -f /dev/loop12 /dev/loop13 /dev/loop14 /dev/loop15 > /mnt/data ; sudo btrfs fi sync /mnt/data > ERROR: Could not sync filesystem: No space left on device Is it the same even with 4x1GiB loopback devices? > $ > > kernel: [ 1811.846087] BTRFS info (device sdc): using crc32c > (crc32c-intel) checksum algorithm > kernel: [ 1811.846107] BTRFS info (device sdc): disk space caching is enabled > kernel: [ 1817.852850] BTRFS info (device sdc): bdev /dev/sde errs: wr > 0, rd 0, flush 0, corrupt 845, gen 0 > kernel: [ 1817.852866] BTRFS info (device sdc): bdev /dev/sda errs: wr > 41089, rd 1556, flush 0, corrupt 0, gen 0 > kernel: [ 1817.852877] BTRFS info (device sdc): bdev /dev/sdh errs: wr > 3, rd 7, flush 0, corrupt 0, gen 0 > kernel: [ 1817.852884] BTRFS info (device sdc): bdev /dev/sdd errs: wr > 41, rd 0, flush 0, corrupt 0, gen 0 > kernel: [ 2037.562050] BTRFS info (device sdc): balance: resume skipped > kernel: [ 2037.562064] BTRFS info (device sdc): checking UUID tree > kernel: [ 2037.581550] BTRFS info (device sdc): disk added /dev/loop12 > kernel: [ 2037.591163] BTRFS info (device sdc): disk added /dev/loop13 > kernel: [ 2037.599477] BTRFS info (device sdc): disk added /dev/loop14 > kernel: [ 2037.607064] BTRFS info (device sdc): disk added /dev/loop15 > kernel: [ 2176.124630] INFO: task btrfs:7783 blocked for more than 120 seconds. > kernel: [ 2176.124678] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 2176.124710] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 2176.124742] task:btrfs state:D stack:0 > pid:7783 ppid:7782 flags:0x00004002 > kernel: [ 2176.124753] Call Trace: > kernel: [ 2176.124758] <TASK> > kernel: [ 2176.124765] __schedule+0x2aa/0x610 > kernel: [ 2176.124780] schedule+0x63/0x110 > kernel: [ 2176.124788] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] This means we're doing the real work, but it seems to take too long. In fact this is already looking promising as we have when through the whole device add part. Just need to let the final commit to finish. > kernel: [ 2176.124929] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 2176.124941] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > kernel: [ 2176.125060] btrfs_ioctl+0x643/0x14d0 [btrfs] > kernel: [ 2176.125225] __x64_sys_ioctl+0xa0/0xe0 > kernel: [ 2176.125235] do_syscall_64+0x5b/0x90 > kernel: [ 2176.125242] ? do_sys_openat2+0xab/0x180 > kernel: [ 2176.125251] ? exit_to_user_mode_prepare+0x30/0xb0 > kernel: [ 2176.125260] ? syscall_exit_to_user_mode+0x29/0x50 > kernel: [ 2176.125268] ? do_syscall_64+0x67/0x90 > kernel: [ 2176.125275] entry_SYSCALL_64_after_hwframe+0x72/0xdc > kernel: [ 2176.125282] RIP: 0033:0x7f2e8eb119ef > kernel: [ 2176.125288] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > ORIG_RAX: 0000000000000010 > kernel: [ 2176.125295] RAX: ffffffffffffffda RBX: 0000000000000003 > RCX: 00007f2e8eb119ef > kernel: [ 2176.125300] RDX: 0000000000000000 RSI: 0000000000009408 > RDI: 0000000000000003 > kernel: [ 2176.125303] RBP: 0000000000000007 R08: 0000000000000000 > R09: 0000000000000000 > kernel: [ 2176.125306] R10: 0000000000000000 R11: 0000000000000246 > R12: 00007f2e8ebf642c > kernel: [ 2176.125310] R13: 0000000000000001 R14: 000055cdb7940578 > R15: 0000000000000000 > kernel: [ 2176.125318] </TASK> > kernel: [ 2296.956781] INFO: task btrfs:7783 blocked for more than 241 seconds. > kernel: [ 2296.956824] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 2296.956856] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 2296.956887] task:btrfs state:D stack:0 > pid:7783 ppid:7782 flags:0x00004002 > kernel: [ 2296.956898] Call Trace: > kernel: [ 2296.956902] <TASK> > kernel: [ 2296.956908] __schedule+0x2aa/0x610 > kernel: [ 2296.956921] schedule+0x63/0x110 > kernel: [ 2296.956928] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > kernel: [ 2296.957069] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 2296.957080] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > kernel: [ 2296.957200] btrfs_ioctl+0x643/0x14d0 [btrfs] > kernel: [ 2296.957366] __x64_sys_ioctl+0xa0/0xe0 > kernel: [ 2296.957375] do_syscall_64+0x5b/0x90 > kernel: [ 2296.957383] ? do_sys_openat2+0xab/0x180 > kernel: [ 2296.957391] ? exit_to_user_mode_prepare+0x30/0xb0 > kernel: [ 2296.957399] ? syscall_exit_to_user_mode+0x29/0x50 > kernel: [ 2296.957407] ? do_syscall_64+0x67/0x90 > kernel: [ 2296.957414] entry_SYSCALL_64_after_hwframe+0x72/0xdc > kernel: [ 2296.957420] RIP: 0033:0x7f2e8eb119ef > kernel: [ 2296.957426] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > ORIG_RAX: 0000000000000010 > kernel: [ 2296.957433] RAX: ffffffffffffffda RBX: 0000000000000003 > RCX: 00007f2e8eb119ef > kernel: [ 2296.957438] RDX: 0000000000000000 RSI: 0000000000009408 > RDI: 0000000000000003 > kernel: [ 2296.957441] RBP: 0000000000000007 R08: 0000000000000000 > R09: 0000000000000000 > kernel: [ 2296.957444] R10: 0000000000000000 R11: 0000000000000246 > R12: 00007f2e8ebf642c > kernel: [ 2296.957448] R13: 0000000000000001 R14: 000055cdb7940578 > R15: 0000000000000000 > kernel: [ 2296.957468] </TASK> > kernel: [ 2314.043258] ------------[ cut here ]------------ > kernel: [ 2314.043264] BTRFS: Transaction aborted (error -28) > kernel: [ 2314.043334] WARNING: CPU: 2 PID: 7739 at > fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > [btrfs] > kernel: [ 2314.043467] Modules linked in: ipmi_devintf ipmi_msghandler > overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr > snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > intel_rapl_common snd_hda_codec_hdmi edac_mce_amd snd_hda_intel > snd_intel_dspcfg kvm_amd snd_intel_sdw_acpi snd_hda_codec kvm > snd_hda_core snd_hwdep snd_pcm snd_timer irqbypass rapl wmi_bmof snd > k10temp ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac > scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore auth_rpcgss > nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs > blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq > async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear > amdgpu iommu_v2 drm_buddy gpu_sched drm_ttm_helper hid_generic ttm > drm_display_helper cec uas rc_core usbhid hid drm_kms_helper > crct10dif_pclmul syscopyarea usb_storage crc32_pclmul polyval_clmulni > sysfillrect polyval_generic sysimgblt nvme ghash_clmulni_intel > sha512_ssse3 > kernel: [ 2314.043599] nvme_core aesni_intel crypto_simd mpt3sas drm > cryptd raid_class ahci i2c_piix4 scsi_transport_sas nvme_common igb > xhci_pci qlcnic dca xhci_pci_renesas libahci i2c_algo_bit video wmi > kernel: [ 2314.043631] CPU: 2 PID: 7739 Comm: btrfs-transacti Tainted: > G W O 6.2.0-23-generic #23+btrdebug2c > kernel: [ 2314.043638] Hardware name: To Be Filled By O.E.M. X570M > Pro4/X570M Pro4, BIOS P3.70 02/23/2022 > kernel: [ 2314.043641] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > kernel: [ 2314.043766] Code: ce 0f 0b eb b8 44 89 e6 48 c7 c7 a8 39 a0 > c1 e8 2c d5 1e ce 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 a8 39 a0 c1 > e8 16 d5 1e ce <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 > 90 90 90 90 > kernel: [ 2314.043771] RSP: 0018:ffffad0b11b7bb38 EFLAGS: 00010246 > kernel: [ 2314.043777] RAX: 0000000000000000 RBX: ffff9c80e40e8f08 > RCX: 0000000000000000 > kernel: [ 2314.043781] RDX: 0000000000000000 RSI: 0000000000000000 > RDI: 0000000000000000 > kernel: [ 2314.043784] RBP: ffffad0b11b7bb60 R08: 0000000000000000 > R09: 0000000000000000 > kernel: [ 2314.043787] R10: 0000000000000000 R11: 0000000000000000 > R12: 00000000ffffffe4 > kernel: [ 2314.043790] R13: 00005e4c359ba000 R14: 0000000000020000 > R15: ffff9c824d9a58c0 > kernel: [ 2314.043794] FS: 0000000000000000(0000) > GS:ffff9c87a0a80000(0000) knlGS:0000000000000000 > kernel: [ 2314.043798] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > kernel: [ 2314.043802] CR2: 00007f54adc86000 CR3: 00000001471d8000 > CR4: 00000000003506e0 > kernel: [ 2314.043806] Call Trace: > kernel: [ 2314.043809] <TASK> > kernel: [ 2314.043815] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > kernel: [ 2314.043943] run_delayed_data_ref+0x8b/0x180 [btrfs] > kernel: [ 2314.044068] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > kernel: [ 2314.044192] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > kernel: [ 2314.044316] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > kernel: [ 2314.044439] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > kernel: [ 2314.044598] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > kernel: [ 2314.044754] ? start_transaction+0xc8/0x600 [btrfs] > kernel: [ 2314.044890] transaction_kthread+0x14b/0x1c0 [btrfs] > kernel: [ 2314.045021] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > kernel: [ 2314.045151] kthread+0xe9/0x110 > kernel: [ 2314.045162] ? __pfx_kthread+0x10/0x10 > kernel: [ 2314.045170] ret_from_fork+0x2c/0x50 > kernel: [ 2314.045180] </TASK> > kernel: [ 2314.045182] ---[ end trace 0000000000000000 ]--- > kernel: [ 2314.045186] BTRFS info (device sdc: state A): dumping space info: > kernel: [ 2314.045191] BTRFS info (device sdc: state A): space_info > DATA has 160777674752 free, is not full > kernel: [ 2314.045197] BTRFS info (device sdc: state A): space_info > total=71201958395904, used=71013439856640, pinned=27737325568, > reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > kernel: [ 2314.045205] BTRFS info (device sdc: state A): space_info > METADATA has -429047808 free, is full This means we need at least 500+ MiB metadata space. Thus you may want to try 4x1GiB to see if this makes any difference. Thanks, Qu > kernel: [ 2314.045209] BTRFS info (device sdc: state A): space_info > total=83634421760, used=82789777408, pinned=244891648, > reserved=599687168, may_use=429047808, readonly=65536 zone_unusable=0 > kernel: [ 2314.045217] BTRFS info (device sdc: state A): space_info > SYSTEM has 33390592 free, is not full > kernel: [ 2314.045221] BTRFS info (device sdc: state A): space_info > total=38797312, used=5373952, pinned=16384, reserved=16384, may_use=0, > readonly=0 zone_unusable=0 > kernel: [ 2314.045227] BTRFS info (device sdc: state A): > global_block_rsv: size 536870912 reserved 428523520 > kernel: [ 2314.045231] BTRFS info (device sdc: state A): > trans_block_rsv: size 524288 reserved 524288 > kernel: [ 2314.045235] BTRFS info (device sdc: state A): > chunk_block_rsv: size 0 reserved 0 > kernel: [ 2314.045239] BTRFS info (device sdc: state A): > delayed_block_rsv: size 0 reserved 0 > kernel: [ 2314.045242] BTRFS info (device sdc: state A): > delayed_refs_rsv: size 249756909568 reserved 0 > kernel: [ 2314.045251] BTRFS: error (device sdc: state A) in > do_free_extent_accounting:2847: errno=-28 No space left > kernel: [ 2314.045265] BTRFS warning (device sdc: state A): > btrfs_uuid_scan_kthread failed -28 > kernel: [ 2314.045295] BTRFS info (device sdc: state EA): forced readonly > kernel: [ 2314.045300] BTRFS error (device sdc: state EA): failed to > run delayed ref for logical 103681409916928 num_bytes 131072 type 184 > action 2 ref_mod 1: -28 > kernel: [ 2314.045360] BTRFS: error (device sdc: state EA) in > btrfs_run_delayed_refs:2151: errno=-28 No space left > kernel: [ 2314.049204] BTRFS: error (device sdc: state EA) in > btrfs_create_pending_block_groups:2487: errno=-28 No space left > kernel: [ 2314.049331] BTRFS: error (device sdc: state EA) in > btrfs_create_pending_block_groups:2499: errno=-28 No space left > kernel: [ 2314.053259] BTRFS: error (device sdc: state EA) in > do_free_extent_accounting:2847: errno=-28 No space left > kernel: [ 2314.053318] BTRFS error (device sdc: state EA): failed to > run delayed ref for logical 103681419366400 num_bytes 131072 type 184 > action 2 ref_mod 1: -28 > kernel: [ 2314.053375] BTRFS: error (device sdc: state EA) in > btrfs_run_delayed_refs:2151: errno=-28 No space left > kernel: [ 2314.053430] BTRFS warning (device sdc: state EA): Skipping > commit of aborted transaction. > kernel: [ 2314.053435] BTRFS: error (device sdc: state EA) in > cleanup_transaction:1986: errno=-28 No space left > > > > On Fri, 23 Jun 2023 at 19:16, Qu Wenruo <wqu@suse.com> wrote: >> >> >> >> On 2023/6/23 17:00, Stefan N wrote: >>> Apologies, I thought I included the log output too, though I can't see >>> any additional output >>> >>> From a fresh run, still using the same kernel >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>> fi sync /mnt/data >>> ERROR: error adding device '/dev/sdl': Input/output error >>> ERROR: error adding device '/dev/sdm': Read-only file system >>> ERROR: error adding device '/dev/sdn': Read-only file system >>> ERROR: error adding device '/dev/sdo': Read-only file system >>> ERROR: Could not sync filesystem: Read-only file system >>> $ >>> >>> Output from kern.log, syslog or dmesg -k >>> >> [...] >> >> None of the newly added debug lines triggered, so there is something >> else causing the problem. >> >> And furthermore the backtrace is not that helpful, it only shows it's >> some async metadata reclaim kthread causing the problem. >> >> Although I guess the async metadata reclaim is triggered by the >> btrfs_start_transaction() call when adding a device. >> So I updated my github branch to go btrfs_join_transaction() which would >> not flush any metadata, thus avoid the problem. >> >> Would you please give it a try again? >> >>> >>> However, now I started digging into logs to check I hadn't missed >>> where the errors were being logged, I've found this from roughly a >>> week before I started having issues, which I had not previously >>> noticed >> >> You don't need to bother most error messages after the fs flipped RO. >> As it's known to have some false alerts. >> >> Thanks, >> Qu >> >>> [ 1990.495861] BTRFS error (device sdh): failed to run delayed ref for >>> logical 107988943355904 num_bytes 245760 type 184 action 2 ref_mod 1: >>> -28 >>> [ 1990.518282] BTRFS error (device sdh): failed to run delayed ref for >>> logical 107989043494912 num_bytes 245760 type 184 action 2 ref_mod 1: >>> -28 >>> [ 620.104065] BTRFS error (device sdk): failed to run delayed ref for >>> logical 123187655077888 num_bytes 176128 type 184 action 2 ref_mod 1: >>> -28 >>> [ 620.126209] BTRFS error (device sdk): failed to run delayed ref for >>> logical 123190279929856 num_bytes 134217728 type 184 action 2 ref_mod >>> 1: -28 >>> [ 620.126241] BTRFS error (device sdk): failed to run delayed ref for >>> logical 123189970468864 num_bytes 134217728 type 184 action 2 ref_mod >>> 1: -28 >>> [ 620.126271] BTRFS error (device sdk): failed to run delayed ref for >>> logical 123190414409728 num_bytes 134217728 type 184 action 2 ref_mod >>> 1: -28 >>> [ 476.565308] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101906434228224 num_bytes 651264 type 184 action 2 ref_mod 1: >>> -28 >>> [ 476.565932] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101906434031616 num_bytes 180224 type 184 action 2 ref_mod 1: >>> -28 >>> [ 447.371754] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101946151927808 num_bytes 262144 type 184 action 2 ref_mod 1: >>> -28 >>> [ 447.372362] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101946083725312 num_bytes 245760 type 184 action 2 ref_mod 1: >>> -28 >>> [ 439.839007] BTRFS error (device sdj): failed to run delayed ref for >>> logical 101923102179328 num_bytes 192512 type 184 action 2 ref_mod 1: >>> -28 >>> [ 439.839578] BTRFS error (device sdj): failed to run delayed ref for >>> logical 101923401629696 num_bytes 245760 type 184 action 2 ref_mod 1: >>> -28 >>> [ 466.393884] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101981116137472 num_bytes 245760 type 184 action 2 ref_mod 1: >>> -28 >>> [ 466.394451] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101981122854912 num_bytes 1720320 type 184 action 2 ref_mod 1: >>> -28 >>> [ 431.541367] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101876426952704 num_bytes 126976 type 184 action 2 ref_mod 1: >>> -28 >>> [ 431.542010] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101876427780096 num_bytes 126976 type 184 action 2 ref_mod 1: >>> -28 >>> [ 597.487948] BTRFS error (device sdj): failed to run delayed ref for >>> logical 108127459409920 num_bytes 196608 type 184 action 2 ref_mod 1: >>> -28 >>> [ 597.488539] BTRFS error (device sdj): failed to run delayed ref for >>> logical 108124677865472 num_bytes 126976 type 184 action 2 ref_mod 1: >>> -28 >>> [ 534.717509] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101958618710016 num_bytes 1597440 type 184 action 2 ref_mod 1: >>> -28 >>> [ 534.718494] BTRFS error (device sdh): failed to run delayed ref for >>> logical 101958756335616 num_bytes 368640 type 184 action 2 ref_mod 1: >>> -28 >>> [ 508.089394] BTRFS error (device sdk): failed to run delayed ref for >>> logical 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: >>> -28 >>> [ 508.090007] BTRFS error (device sdk): failed to run delayed ref for >>> logical 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: >>> -28 >>> [ 1632.112084] BTRFS error (device sdh): failed to run delayed ref for >>> logical 102203759886336 num_bytes 229376 type 184 action 2 ref_mod 1: >>> -28 >>> [ 1632.112885] BTRFS error (device sdh): failed to run delayed ref for >>> logical 102203764379648 num_bytes 126976 type 184 action 2 ref_mod 1: >>> -28 >>> >>> and today, when leaving the disks mounted read-only for a while, I >>> found many occurances similar to: >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201329754554368 mirror 1 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201329754554368 mirror 2 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201329754554368 mirror 3 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201329754554368 mirror 4 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201329754554368 mirror 1 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201329754554368 mirror 2 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201329754554368 mirror 3 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201350830227456 mirror 4 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201350830227456 mirror 1 wanted 2 found 0 >>> BTRFS error (device sdc: state EA): level verify failed on logical >>> 201350830227456 mirror 2 wanted 2 found 0 >>> >>> On Fri, 23 Jun 2023 at 10:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>> >>>> >>>> >>>> On 2023/6/23 06:18, Stefan N wrote: >>>>> Hi Qu, >>>>> >>>>> I got one new line this time, but it doesn't seem to match your commit >>>>> ERROR: zoned: unable to stat /dev/loop/13 >>>> >>>> Please provide the dmesg of that attempt, as all the extra debug info is >>>> inside dmesg. >>>> >>>> With that info provided, we can determine what to do next. >>>> >>>> Thanks, >>>> Qu >>>> >>>>> >>>>> I tried it on the USB flash drives too and didn't get any extra line >>>>> >>>>> In context >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>> dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 >>>>> /mnt/data ; sudo btrfs fi sync /mnt/data >>>>> ERROR: error adding device '/dev/loop12': Input/output error >>>>> ERROR: zoned: unable to stat /dev/loop/13 >>>>> ERROR: checking status of /dev/loop/13: No such file or directory >>>>> ERROR: error adding device '/dev/loop14': Read-only file system >>>>> ERROR: error adding device '/dev/loop15': Read-only file system >>>>> ERROR: Could not sync filesystem: Read-only file system >>>>> $ >>>>> >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>>>> fi sync /mnt/data >>>>> ERROR: error adding device '/dev/sdl': Input/output error >>>>> ERROR: error adding device '/dev/sdm': Read-only file system >>>>> ERROR: error adding device '/dev/sdn': Read-only file system >>>>> ERROR: error adding device '/dev/sdo': Read-only file system >>>>> ERROR: Could not sync filesystem: Read-only file system >>>>> $ >>>>> >>>>> On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2023/6/22 16:33, Stefan N wrote: >>>>>>> Hi Qu, >>>>>>> >>>>>>> Many thanks for the detailed instructions and your patience. I got it >>>>>>> working combined with >>>>>>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system >>>>>>> OS instead, tagged +btrfix >>>>>>> $ uname -vr >>>>>>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 >>>>>>> >>>>>>> However, I've not had luck with the commands suggested, and would >>>>>>> appreciate any further ideas. >>>>>>> >>>>>>> Outputs follow below, with /mnt/data as the btrfs mount point that >>>>>>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB >>>>>>> flash drives being added sd[l-o] >>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>>>>>> fi sync /mnt/data >>>>>>> ERROR: error adding device '/dev/sdl': Input/output error >>>>>>> ERROR: error adding device '/dev/sdm': Read-only file system >>>>>>> ERROR: error adding device '/dev/sdn': Read-only file system >>>>>>> ERROR: error adding device '/dev/sdo': Read-only file system >>>>>>> ERROR: Could not sync filesystem: Read-only file system >>>>>>> $ >>>>>>> >>>>>>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so >>>>>>> they're super quick to zero); >>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>>>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 >>>>>>> /mnt/data ; sudo btrfs fi sync /mnt/data >>>>>>> ERROR: error adding device '/dev/loop16': Input/output error >>>>>> >>>>>> This is the interesting part, this means we're erroring out due to -EIO >>>>>> (not -ENOSPC) during the first device add. >>>>>> >>>>>> And by somehow, after the first device add, we already got the trans abort. >>>>>> >>>>>> Would you please try the following branch? >>>>>> >>>>>> https://github.com/adam900710/linux/tree/dev_add_no_commit >>>>>> >>>>>> It has not only the patch to skip the commit, but also extra debug >>>>>> output for the situation. >>>>>> >>>>>> Thanks, >>>>>> Qu >>>>>> >>>>>>> ERROR: error adding device '/dev/loop17': Read-only file system >>>>>>> ERROR: error adding device '/dev/loop18': Read-only file system >>>>>>> ERROR: error adding device '/dev/loop19': Read-only file system >>>>>>> ERROR: Could not sync filesystem: Read-only file system >>>>>>> $ >>>>>>> >>>>>>> I confirmed before both these kernel builds that the replaced line was >>>>>>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone >>>>>>> else following, I needed to remove the -n in the patch command >>>>>>> earlier) >>>>>>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c: >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- } >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); >>>>>>> -- >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c: >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- } >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>>>>>> -- >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c: >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- } >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>>>>>> $ >>>>>>> >>>>>>> $ btrfs fi usage /mnt/data >>>>>>> Overall: >>>>>>> Device size: 87.31TiB >>>>>>> Device allocated: 87.31TiB >>>>>>> Device unallocated: 1.94GiB >>>>>>> Device missing: 0.00B >>>>>>> Device slack: 0.00B >>>>>>> Used: 87.08TiB >>>>>>> Free (estimated): 173.29GiB (min: 172.33GiB) >>>>>>> Free (statfs, df): 171.84GiB >>>>>>> Data ratio: 1.34 >>>>>>> Metadata ratio: 4.00 >>>>>>> Global reserve: 512.00MiB (used: 371.25MiB) >>>>>>> Multiple profiles: no >>>>>>> >>>>>>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) >>>>>>> /dev/sdc 10.90TiB >>>>>>> /dev/sdf 10.90TiB >>>>>>> /dev/sda 10.86TiB >>>>>>> /dev/sdg 10.87TiB >>>>>>> /dev/sdh 10.86TiB >>>>>>> /dev/sdd 10.87TiB >>>>>>> /dev/sde 10.88TiB >>>>>>> /dev/sdb 10.88TiB >>>>>>> >>>>>>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) >>>>>>> /dev/sdc 15.33GiB >>>>>>> /dev/sdf 18.41GiB >>>>>>> /dev/sda 49.63GiB >>>>>>> /dev/sdg 49.50GiB >>>>>>> /dev/sdh 51.52GiB >>>>>>> /dev/sdd 48.70GiB >>>>>>> /dev/sde 39.09GiB >>>>>>> /dev/sdb 39.01GiB >>>>>>> >>>>>>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) >>>>>>> /dev/sdc 1.00MiB >>>>>>> /dev/sda 37.00MiB >>>>>>> /dev/sdg 37.00MiB >>>>>>> /dev/sdh 36.00MiB >>>>>>> /dev/sdd 37.00MiB >>>>>>> >>>>>>> Unallocated: >>>>>>> /dev/sdc 1.00MiB >>>>>>> /dev/sdf 1.00MiB >>>>>>> /dev/sda 1.27GiB >>>>>>> /dev/sdg 1.00MiB >>>>>>> /dev/sdh 1.00MiB >>>>>>> /dev/sdd 687.00MiB >>>>>>> /dev/sde 1.00MiB >>>>>>> /dev/sdb 1.00MiB >>>>>>> $ >>>>>>> >>>>>>> >>>>>>> This first attempt generated the following syslog output: >>>>>>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c >>>>>>> (crc32c-intel) checksum algorithm >>>>>>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled >>>>>>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr >>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 >>>>>>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr >>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>>>>>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr >>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 >>>>>>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr >>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 >>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>>>>>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) >>>>>>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>>>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly >>>>>>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to >>>>>>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 >>>>>>> action 2 ref_mod 1: -28 >>>>>>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at >>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>>>>>> [btrfs] >>>>>>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): >>>>>>> btrfs_uuid_scan_kthread failed -5 >>>>>>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth >>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>>>>>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd >>>>>>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm >>>>>>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi >>>>>>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer >>>>>>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath >>>>>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd >>>>>>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables >>>>>>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov >>>>>>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 >>>>>>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage >>>>>>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti >>>>>>> Tainted: G W O 6.2.0-23-generic #23+btrfix >>>>>>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>>>>>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>>>>>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] >>>>>>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>>>>>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>>>>>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>>>>>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] >>>>>>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] >>>>>>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] >>>>>>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] >>>>>>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] >>>>>>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: >>>>>>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info >>>>>>> DATA has 160777674752 free, is not full >>>>>>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info >>>>>>> total=71201958395904, used=71018191273984, pinned=22985908224, >>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>>>>>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info >>>>>>> METADATA has -124944384 free, is full >>>>>>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info >>>>>>> total=83530612736, used=82791497728, pinned=242745344, >>>>>>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 >>>>>>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info >>>>>>> SYSTEM has 33439744 free, is not full >>>>>>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info >>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>>>>>> readonly=0 zone_unusable=0 >>>>>>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): >>>>>>> global_block_rsv: size 536870912 reserved 124944384 >>>>>>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): >>>>>>> trans_block_rsv: size 0 reserved 0 >>>>>>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): >>>>>>> chunk_block_rsv: size 0 reserved 0 >>>>>>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): >>>>>>> delayed_block_rsv: size 0 reserved 0 >>>>>>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): >>>>>>> delayed_refs_rsv: size 251322957824 reserved 0 >>>>>>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>>>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to >>>>>>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 >>>>>>> action 2 ref_mod 1: -28 >>>>>>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>> >>>>>>> A couple of kernel recompiles later, the second attempt on the SSD >>>>>>> generated similar: >>>>>>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c >>>>>>> (crc32c-intel) checksum algorithm >>>>>>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled >>>>>>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr >>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 >>>>>>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr >>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>>>>>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr >>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 >>>>>>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr >>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 >>>>>>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped >>>>>>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree >>>>>>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) >>>>>>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at >>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>>>>>> [btrfs] >>>>>>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth >>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>>>>>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic >>>>>>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg >>>>>>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core >>>>>>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm >>>>>>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid >>>>>>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd >>>>>>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs >>>>>>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 >>>>>>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor >>>>>>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid >>>>>>> amdgpu uas hid iommu_v2 >>>>>>> kernel: [ 1919.452839] Workqueue: events_unbound >>>>>>> btrfs_async_reclaim_metadata_space [btrfs] >>>>>>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>>>>>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>>>>>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] >>>>>>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>>>>>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>>>>>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>>>>>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] >>>>>>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] >>>>>>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: >>>>>>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info >>>>>>> DATA has 160778723328 free, is not full >>>>>>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info >>>>>>> total=71201958395904, used=71017442181120, pinned=23733952512, >>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>>>>>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info >>>>>>> METADATA has -147570688 free, is full >>>>>>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info >>>>>>> total=83530612736, used=82792185856, pinned=238059520, >>>>>>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 >>>>>>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info >>>>>>> SYSTEM has 33439744 free, is not full >>>>>>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info >>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>>>>>> readonly=0 zone_unusable=0 >>>>>>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): >>>>>>> global_block_rsv: size 536870912 reserved 147570688 >>>>>>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): >>>>>>> trans_block_rsv: size 0 reserved 0 >>>>>>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): >>>>>>> chunk_block_rsv: size 0 reserved 0 >>>>>>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): >>>>>>> delayed_block_rsv: size 0 reserved 0 >>>>>>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): >>>>>>> delayed_refs_rsv: size 254292787200 reserved 0 >>>>>>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>>>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly >>>>>>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to >>>>>>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 >>>>>>> action 2 ref_mod 1: -28 >>>>>>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): >>>>>>> btrfs_uuid_scan_kthread failed -5 >>>>>>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in >>>>>>> __btrfs_free_extent:3077: errno=-28 No space left >>>>>>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to >>>>>>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 >>>>>>> action 2 ref_mod 1: -28 >>>>>>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>> >>>>>>> >>>>>>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2023/6/17 13:11, Stefan N wrote: >>>>>>>>> Hi Qu, >>>>>>>>> >>>>>>>>> I believe I've got this environment ready, with the 6.2.0 kernel as >>>>>>>>> before using the Ubuntu kernel, but can switch to vanilla if required. >>>>>>>>> >>>>>>>>> I've not done anything kernel modifications for a solid decade, so >>>>>>>>> would be keen for a bit of guidance. >>>>>>>> >>>>>>>> Sure no problem. >>>>>>>> >>>>>>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then >>>>>>>> apply the attached one-line patch by: >>>>>>>> >>>>>>>> $ tar czf linux*.tar.xz >>>>>>>> $ cd linux* >>>>>>>> $ patch -np1 -i <the patch file> >>>>>>>> >>>>>>>> Then use your running system kernel config if possible: >>>>>>>> >>>>>>>> $ cp /proc/config.gz . >>>>>>>> $ gunzip config.gz >>>>>>>> $ mv config .config >>>>>>>> $ make olddefconfig >>>>>>>> >>>>>>>> Then you can start your kernel compiling, and considering you're using >>>>>>>> your distro's default, it would include tons of drivers, thus would be >>>>>>>> very slow. (Replace the number to something more suitable to your >>>>>>>> system, using all CPU cores can be very hot) >>>>>>>> >>>>>>>> $ make -j12 >>>>>>>> >>>>>>>> Finally you need to install the modules/kernel. >>>>>>>> >>>>>>>> Unfortunately this is distro specific, but if you're using Ubuntu, it >>>>>>>> may be much easier: >>>>>>>> >>>>>>>> $ make bindeb-pkg >>>>>>>> >>>>>>>> Then install the generated dpkg I guess? I have never tried kernel >>>>>>>> building using deb/rpm, but only manual installation, which is also >>>>>>>> distro dependent in the initramfs generation part. >>>>>>>> >>>>>>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom >>>>>>>> # make modules_install >>>>>>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img >>>>>>>> >>>>>>>> >>>>>>>> The last step is to update your bootloader to add the new kernel, which >>>>>>>> is not only distro dependent but also bootloader dependent. >>>>>>>> >>>>>>>> In my case, I go with systemd-boot with manually crafted entries. >>>>>>>> But if you go Ubuntu I believe just installing the kernel dpkg would >>>>>>>> have everything handled? >>>>>>>> >>>>>>>> Finally you can try reboot into the newer kernel, and try device add >>>>>>>> (need to add 4 disks), then sync and see if things work as expected. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Qu >>>>>>>>> >>>>>>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, >>>>>>>>> but failing this will use 4x loop devices. >>>>>>>>> >>>>>>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>>>>>> In your particular case, since you're running RAID1C4 you need to add 4 >>>>>>>>>> devices in one transaction. >>>>>>>>>> >>>>>>>>>> I can easily craft a patch to avoid commit transaction, but still you'll >>>>>>>>>> need to add at least 4 disks, and then sync to see if things would work. >>>>>>>>>> >>>>>>>>>> Furthermore this means you need a liveCD with full kernel compiling >>>>>>>>>> environment. >>>>>>>>>> >>>>>>>>>> If you want to go this path, I can send you the patch when you've >>>>>>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-26 10:18 ` Qu Wenruo @ 2023-06-26 12:58 ` Stefan N 2023-07-22 5:28 ` Stefan N 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-06-26 12:58 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org Hi Qu, Thanks for all the help, I managed to get it mounted and synced with 5G loops (2G allocated to metadata, 3G unallocated on each). I'm able to read existing files, write new files, and any changes remain after an unmount and remount. $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs dev add -K -f /dev/loop20 /dev/loop21 /dev/loop22 /dev/loop23 /mnt/data ; sudo btrfs fi sync /mnt/data $ sudo btrfs fi show Label: none uuid: abc123 Total devices 12 FS bytes used 64.52TiB devid 1 size 10.91TiB used 10.89TiB path /dev/sdd devid 2 size 10.91TiB used 10.89TiB path /dev/sdh devid 3 size 10.91TiB used 10.89TiB path /dev/sdb devid 4 size 10.91TiB used 10.89TiB path /dev/sdg devid 5 size 10.91TiB used 10.89TiB path /dev/sdi devid 6 size 10.91TiB used 10.89TiB path /dev/sde devid 7 size 10.91TiB used 10.89TiB path /dev/sdf devid 8 size 10.91TiB used 10.89TiB path /dev/sdc devid 9 size 5.00GiB used 2.00GiB path /dev/loop20 devid 10 size 5.00GiB used 2.00GiB path /dev/loop21 devid 11 size 5.00GiB used 2.00GiB path /dev/loop22 devid 12 size 5.00GiB used 2.00GiB path /dev/loop23 $ I'd be keen to know what you'd suggest for next steps. I have two 18T disks to upgrade two of the existing 12T disks, which could be a substitute or add them over USB for a while. While a random sample of files seem to be perfectly intact, I'd be keen to verify the integrity to track down any corrupted files. Should I perform a scrub before adding/replacing the new disks, or can this be safely done afterwards? e.g. can I safely add 2x18tb, remove loops, begin scrub, and then remove 2x 12tb when scrub completes? See kernel log below: kernel: [ 399.272458] BTRFS info (device sdd): using crc32c (crc32c-intel) checksum algorithm kernel: [ 399.272476] BTRFS info (device sdd): disk space caching is enabled kernel: [ 404.855750] BTRFS info (device sdd): bdev /dev/sdh errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 kernel: [ 404.855766] BTRFS info (device sdd): bdev /dev/sdb errs: wr 41089, rd 1556, flush 0, corrupt 0, gen 0 kernel: [ 404.855778] BTRFS info (device sdd): bdev /dev/sdi errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 kernel: [ 404.855785] BTRFS info (device sdd): bdev /dev/sde errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 kernel: [ 630.844173] BTRFS info (device sdd): balance: resume skipped kernel: [ 630.844185] BTRFS info (device sdd): checking UUID tree kernel: [ 630.871787] BTRFS info (device sdd): disk added /dev/loop20 kernel: [ 630.881223] BTRFS info (device sdd): disk added /dev/loop21 kernel: [ 630.888817] BTRFS info (device sdd): disk added /dev/loop22 kernel: [ 630.896302] BTRFS info (device sdd): disk added /dev/loop23 kernel: [ 846.849616] INFO: task btrfs-uuid:4834 blocked for more than 120 seconds. kernel: [ 846.849660] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 846.849693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 846.849725] task:btrfs-uuid state:D stack:0 pid:4834 ppid:2 flags:0x00004000 kernel: [ 846.849735] Call Trace: kernel: [ 846.849739] <TASK> kernel: [ 846.849747] __schedule+0x2aa/0x610 kernel: [ 846.849761] schedule+0x63/0x110 kernel: [ 846.849769] wait_current_trans+0x100/0x160 [btrfs] kernel: [ 846.849908] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 846.849920] start_transaction+0x28b/0x600 [btrfs] kernel: [ 846.850057] btrfs_start_transaction+0x1e/0x30 [btrfs] kernel: [ 846.850191] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] kernel: [ 846.850359] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] kernel: [ 846.850487] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] kernel: [ 846.850614] kthread+0xe9/0x110 kernel: [ 846.850623] ? __pfx_kthread+0x10/0x10 kernel: [ 846.850631] ret_from_fork+0x2c/0x50 kernel: [ 846.850642] </TASK> kernel: [ 846.850645] INFO: task btrfs:4850 blocked for more than 120 seconds. kernel: [ 846.850676] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 846.850707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 846.850738] task:btrfs state:D stack:0 pid:4850 ppid:4849 flags:0x00000002 kernel: [ 846.850746] Call Trace: kernel: [ 846.850749] <TASK> kernel: [ 846.850752] __schedule+0x2aa/0x610 kernel: [ 846.850760] schedule+0x63/0x110 kernel: [ 846.850765] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] kernel: [ 846.850899] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 846.850908] btrfs_sync_fs+0x5a/0x1b0 [btrfs] kernel: [ 846.851027] btrfs_ioctl+0x643/0x14d0 [btrfs] kernel: [ 846.851186] ? putname+0x5d/0x80 kernel: [ 846.851195] ? do_sys_openat2+0xab/0x180 kernel: [ 846.851203] ? exit_to_user_mode_prepare+0x30/0xb0 kernel: [ 846.851213] __x64_sys_ioctl+0xa0/0xe0 kernel: [ 846.851221] do_syscall_64+0x5b/0x90 kernel: [ 846.851229] ? exc_page_fault+0x91/0x1b0 kernel: [ 846.851236] entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: [ 846.851243] RIP: 0033:0x7fbf339119ef kernel: [ 846.851249] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 kernel: [ 846.851255] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbf339119ef kernel: [ 846.851259] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 kernel: [ 846.851263] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 846.851266] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fbf339f642c kernel: [ 846.851269] R13: 0000000000000001 R14: 0000557384b29578 R15: 0000000000000000 kernel: [ 846.851277] </TASK> kernel: [ 967.681770] INFO: task btrfs-uuid:4834 blocked for more than 241 seconds. kernel: [ 967.681818] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 967.681852] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 967.681884] task:btrfs-uuid state:D stack:0 pid:4834 ppid:2 flags:0x00004000 kernel: [ 967.681895] Call Trace: kernel: [ 967.681899] <TASK> kernel: [ 967.681907] __schedule+0x2aa/0x610 kernel: [ 967.681922] schedule+0x63/0x110 kernel: [ 967.681931] wait_current_trans+0x100/0x160 [btrfs] kernel: [ 967.682070] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 967.682082] start_transaction+0x28b/0x600 [btrfs] kernel: [ 967.682219] btrfs_start_transaction+0x1e/0x30 [btrfs] kernel: [ 967.682353] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] kernel: [ 967.682519] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] kernel: [ 967.682645] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] kernel: [ 967.682728] kthread+0xe9/0x110 kernel: [ 967.682734] ? __pfx_kthread+0x10/0x10 kernel: [ 967.682739] ret_from_fork+0x2c/0x50 kernel: [ 967.682746] </TASK> kernel: [ 967.682749] INFO: task btrfs:4850 blocked for more than 241 seconds. kernel: [ 967.682771] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 967.682793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 967.682815] task:btrfs state:D stack:0 pid:4850 ppid:4849 flags:0x00000002 kernel: [ 967.682820] Call Trace: kernel: [ 967.682822] <TASK> kernel: [ 967.682824] __schedule+0x2aa/0x610 kernel: [ 967.682829] schedule+0x63/0x110 kernel: [ 967.682832] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] kernel: [ 967.682918] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 967.682923] btrfs_sync_fs+0x5a/0x1b0 [btrfs] kernel: [ 967.682999] btrfs_ioctl+0x643/0x14d0 [btrfs] kernel: [ 967.683085] ? putname+0x5d/0x80 kernel: [ 967.683091] ? do_sys_openat2+0xab/0x180 kernel: [ 967.683096] ? exit_to_user_mode_prepare+0x30/0xb0 kernel: [ 967.683103] __x64_sys_ioctl+0xa0/0xe0 kernel: [ 967.683107] do_syscall_64+0x5b/0x90 kernel: [ 967.683112] ? exc_page_fault+0x91/0x1b0 kernel: [ 967.683116] entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: [ 967.683121] RIP: 0033:0x7fbf339119ef kernel: [ 967.683124] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 kernel: [ 967.683128] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbf339119ef kernel: [ 967.683130] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 kernel: [ 967.683132] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 967.683134] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fbf339f642c kernel: [ 967.683136] R13: 0000000000000001 R14: 0000557384b29578 R15: 0000000000000000 kernel: [ 967.683141] </TASK> kernel: [ 1088.519959] INFO: task btrfs-uuid:4834 blocked for more than 362 seconds. kernel: [ 1088.520006] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 1088.520039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 1088.520071] task:btrfs-uuid state:D stack:0 pid:4834 ppid:2 flags:0x00004000 kernel: [ 1088.520082] Call Trace: kernel: [ 1088.520087] <TASK> kernel: [ 1088.520094] __schedule+0x2aa/0x610 kernel: [ 1088.520108] schedule+0x63/0x110 kernel: [ 1088.520117] wait_current_trans+0x100/0x160 [btrfs] kernel: [ 1088.520257] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 1088.520269] start_transaction+0x28b/0x600 [btrfs] kernel: [ 1088.520406] btrfs_start_transaction+0x1e/0x30 [btrfs] kernel: [ 1088.520539] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] kernel: [ 1088.520706] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] kernel: [ 1088.520834] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] kernel: [ 1088.520961] kthread+0xe9/0x110 kernel: [ 1088.520969] ? __pfx_kthread+0x10/0x10 kernel: [ 1088.520977] ret_from_fork+0x2c/0x50 kernel: [ 1088.520987] </TASK> kernel: [ 1088.520990] INFO: task btrfs:4850 blocked for more than 362 seconds. kernel: [ 1088.521021] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 1088.521052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 1088.521084] task:btrfs state:D stack:0 pid:4850 ppid:4849 flags:0x00000002 kernel: [ 1088.521092] Call Trace: kernel: [ 1088.521095] <TASK> kernel: [ 1088.521098] __schedule+0x2aa/0x610 kernel: [ 1088.521106] schedule+0x63/0x110 kernel: [ 1088.521111] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] kernel: [ 1088.521245] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 1088.521254] btrfs_sync_fs+0x5a/0x1b0 [btrfs] kernel: [ 1088.521372] btrfs_ioctl+0x643/0x14d0 [btrfs] kernel: [ 1088.521530] ? putname+0x5d/0x80 kernel: [ 1088.521539] ? do_sys_openat2+0xab/0x180 kernel: [ 1088.521548] ? exit_to_user_mode_prepare+0x30/0xb0 kernel: [ 1088.521559] __x64_sys_ioctl+0xa0/0xe0 kernel: [ 1088.521567] do_syscall_64+0x5b/0x90 kernel: [ 1088.521575] ? exc_page_fault+0x91/0x1b0 kernel: [ 1088.521582] entry_SYSCALL_64_after_hwframe+0x72/0xdc kernel: [ 1088.521589] RIP: 0033:0x7fbf339119ef kernel: [ 1088.521595] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 kernel: [ 1088.521602] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbf339119ef kernel: [ 1088.521606] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003 kernel: [ 1088.521610] RBP: 0000000000000007 R08: 0000000000000000 R09: 0000000000000000 kernel: [ 1088.521613] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fbf339f642c kernel: [ 1088.521616] R13: 0000000000000001 R14: 0000557384b29578 R15: 0000000000000000 kernel: [ 1088.521626] </TASK> kernel: [ 1209.357423] INFO: task btrfs-uuid:4834 blocked for more than 483 seconds. kernel: [ 1209.357473] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 1209.357507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 1209.357540] task:btrfs-uuid state:D stack:0 pid:4834 ppid:2 flags:0x00004000 kernel: [ 1209.357551] Call Trace: kernel: [ 1209.357555] <TASK> kernel: [ 1209.357563] __schedule+0x2aa/0x610 kernel: [ 1209.357577] schedule+0x63/0x110 kernel: [ 1209.357597] wait_current_trans+0x100/0x160 [btrfs] kernel: [ 1209.357738] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 1209.357750] start_transaction+0x28b/0x600 [btrfs] kernel: [ 1209.357887] btrfs_start_transaction+0x1e/0x30 [btrfs] kernel: [ 1209.358021] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] kernel: [ 1209.358187] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] kernel: [ 1209.358315] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] kernel: [ 1209.358442] kthread+0xe9/0x110 kernel: [ 1209.358451] ? __pfx_kthread+0x10/0x10 kernel: [ 1209.358458] ret_from_fork+0x2c/0x50 kernel: [ 1209.358468] </TASK> kernel: [ 1330.195147] INFO: task btrfs-transacti:4088 blocked for more than 120 seconds. kernel: [ 1330.195192] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 1330.195221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 1330.195250] task:btrfs-transacti state:D stack:0 pid:4088 ppid:2 flags:0x00004000 kernel: [ 1330.195259] Call Trace: kernel: [ 1330.195263] <TASK> kernel: [ 1330.195269] __schedule+0x2aa/0x610 kernel: [ 1330.195281] schedule+0x63/0x110 kernel: [ 1330.195288] wait_for_commit+0x14c/0x1b0 [btrfs] kernel: [ 1330.195413] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 1330.195424] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] kernel: [ 1330.195552] ? start_transaction+0xc8/0x600 [btrfs] kernel: [ 1330.195676] transaction_kthread+0x14b/0x1c0 [btrfs] kernel: [ 1330.195795] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] kernel: [ 1330.195912] kthread+0xe9/0x110 kernel: [ 1330.195920] ? __pfx_kthread+0x10/0x10 kernel: [ 1330.195927] ret_from_fork+0x2c/0x50 kernel: [ 1330.195937] </TASK> kernel: [ 1330.195939] INFO: task btrfs-uuid:4834 blocked for more than 604 seconds. kernel: [ 1330.195968] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 1330.195997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 1330.196026] task:btrfs-uuid state:D stack:0 pid:4834 ppid:2 flags:0x00004000 kernel: [ 1330.196033] Call Trace: kernel: [ 1330.196036] <TASK> kernel: [ 1330.196039] __schedule+0x2aa/0x610 kernel: [ 1330.196046] schedule+0x63/0x110 kernel: [ 1330.196051] wait_current_trans+0x100/0x160 [btrfs] kernel: [ 1330.196169] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 1330.196177] start_transaction+0x28b/0x600 [btrfs] kernel: [ 1330.196298] btrfs_start_transaction+0x1e/0x30 [btrfs] kernel: [ 1330.196416] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] kernel: [ 1330.196565] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] kernel: [ 1330.196680] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] kernel: [ 1330.196794] kthread+0xe9/0x110 kernel: [ 1330.196800] ? __pfx_kthread+0x10/0x10 kernel: [ 1330.196807] ret_from_fork+0x2c/0x50 kernel: [ 1330.196814] </TASK> kernel: [ 1451.031238] INFO: task btrfs-transacti:4088 blocked for more than 241 seconds. kernel: [ 1451.031286] Tainted: G W O 6.2.0-23-generic #23+btrdebug2c kernel: [ 1451.031319] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [ 1451.031352] task:btrfs-transacti state:D stack:0 pid:4088 ppid:2 flags:0x00004000 kernel: [ 1451.031362] Call Trace: kernel: [ 1451.031366] <TASK> kernel: [ 1451.031373] __schedule+0x2aa/0x610 kernel: [ 1451.031388] schedule+0x63/0x110 kernel: [ 1451.031396] wait_for_commit+0x14c/0x1b0 [btrfs] kernel: [ 1451.031535] ? __pfx_autoremove_wake_function+0x10/0x10 kernel: [ 1451.031548] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] kernel: [ 1451.031684] ? start_transaction+0xc8/0x600 [btrfs] kernel: [ 1451.031819] transaction_kthread+0x14b/0x1c0 [btrfs] kernel: [ 1451.031951] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] kernel: [ 1451.032082] kthread+0xe9/0x110 kernel: [ 1451.032091] ? __pfx_kthread+0x10/0x10 kernel: [ 1451.032098] ret_from_fork+0x2c/0x50 kernel: [ 1451.032108] </TASK> On Mon, 26 Jun 2023 at 19:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2023/6/24 23:29, Stefan N wrote: > > Whoops, I had left --dry-run on the first debug patch you commited, so > > that didn't run correctly. > > > > I've included the output from both patches, as they result in different output. > > > > Rerunning the older patch first, with loop devices (I tried both > > 4x100mb and 4x1gb) I get the following: > > > [...] > > *** The below is using the newer patch as follows: > > $ diff fs/btrfs/ ../linux-6.2.0-dist/fs/btrfs/ > > diff fs/btrfs/ioctl.c ../linux-6.2.0-dist/fs/btrfs/ioctl.c > > 2656,2658d2655 > > < else > > < btrfs_err(fs_info, "failed to add disk %s: %d", > > < vol_args->name, ret); > > diff fs/btrfs/transaction.c ../linux-6.2.0-dist/fs/btrfs/transaction.c > > 1029d1028 > > < /* > > 1031d1029 > > < */ > > diff fs/btrfs/volumes.c ../linux-6.2.0-dist/fs/btrfs/volumes.c > > 2677c2677 > > < trans = btrfs_join_transaction(root); > > --- > >> trans = btrfs_start_transaction(root, 0); > > 2680d2679 > > < btrfs_err(fs_info, "failed to start trans: %d", ret); > > 2769d2767 > > < btrfs_err(fs_info, "failed to add dev item: %d", ret); > > 2787,2789c2785 > > < ret = btrfs_end_transaction(trans); > > < if (ret < 0) > > < btrfs_err(fs_info, "failed to end trans: %d", ret); > > --- > >> ret = btrfs_commit_transaction(trans); > > $ > > > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > dev add -K -f /dev/loop12 /dev/loop13 /dev/loop14 /dev/loop15 > > /mnt/data ; sudo btrfs fi sync /mnt/data > > ERROR: Could not sync filesystem: No space left on device > > Is it the same even with 4x1GiB loopback devices? > > > $ > > > > kernel: [ 1811.846087] BTRFS info (device sdc): using crc32c > > (crc32c-intel) checksum algorithm > > kernel: [ 1811.846107] BTRFS info (device sdc): disk space caching is enabled > > kernel: [ 1817.852850] BTRFS info (device sdc): bdev /dev/sde errs: wr > > 0, rd 0, flush 0, corrupt 845, gen 0 > > kernel: [ 1817.852866] BTRFS info (device sdc): bdev /dev/sda errs: wr > > 41089, rd 1556, flush 0, corrupt 0, gen 0 > > kernel: [ 1817.852877] BTRFS info (device sdc): bdev /dev/sdh errs: wr > > 3, rd 7, flush 0, corrupt 0, gen 0 > > kernel: [ 1817.852884] BTRFS info (device sdc): bdev /dev/sdd errs: wr > > 41, rd 0, flush 0, corrupt 0, gen 0 > > kernel: [ 2037.562050] BTRFS info (device sdc): balance: resume skipped > > kernel: [ 2037.562064] BTRFS info (device sdc): checking UUID tree > > kernel: [ 2037.581550] BTRFS info (device sdc): disk added /dev/loop12 > > kernel: [ 2037.591163] BTRFS info (device sdc): disk added /dev/loop13 > > kernel: [ 2037.599477] BTRFS info (device sdc): disk added /dev/loop14 > > kernel: [ 2037.607064] BTRFS info (device sdc): disk added /dev/loop15 > > kernel: [ 2176.124630] INFO: task btrfs:7783 blocked for more than 120 seconds. > > kernel: [ 2176.124678] Tainted: G W O > > 6.2.0-23-generic #23+btrdebug2c > > kernel: [ 2176.124710] "echo 0 > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > kernel: [ 2176.124742] task:btrfs state:D stack:0 > > pid:7783 ppid:7782 flags:0x00004002 > > kernel: [ 2176.124753] Call Trace: > > kernel: [ 2176.124758] <TASK> > > kernel: [ 2176.124765] __schedule+0x2aa/0x610 > > kernel: [ 2176.124780] schedule+0x63/0x110 > > kernel: [ 2176.124788] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > > This means we're doing the real work, but it seems to take too long. > > In fact this is already looking promising as we have when through the > whole device add part. > > Just need to let the final commit to finish. > > > kernel: [ 2176.124929] ? __pfx_autoremove_wake_function+0x10/0x10 > > kernel: [ 2176.124941] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > > kernel: [ 2176.125060] btrfs_ioctl+0x643/0x14d0 [btrfs] > > kernel: [ 2176.125225] __x64_sys_ioctl+0xa0/0xe0 > > kernel: [ 2176.125235] do_syscall_64+0x5b/0x90 > > kernel: [ 2176.125242] ? do_sys_openat2+0xab/0x180 > > kernel: [ 2176.125251] ? exit_to_user_mode_prepare+0x30/0xb0 > > kernel: [ 2176.125260] ? syscall_exit_to_user_mode+0x29/0x50 > > kernel: [ 2176.125268] ? do_syscall_64+0x67/0x90 > > kernel: [ 2176.125275] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > kernel: [ 2176.125282] RIP: 0033:0x7f2e8eb119ef > > kernel: [ 2176.125288] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000010 > > kernel: [ 2176.125295] RAX: ffffffffffffffda RBX: 0000000000000003 > > RCX: 00007f2e8eb119ef > > kernel: [ 2176.125300] RDX: 0000000000000000 RSI: 0000000000009408 > > RDI: 0000000000000003 > > kernel: [ 2176.125303] RBP: 0000000000000007 R08: 0000000000000000 > > R09: 0000000000000000 > > kernel: [ 2176.125306] R10: 0000000000000000 R11: 0000000000000246 > > R12: 00007f2e8ebf642c > > kernel: [ 2176.125310] R13: 0000000000000001 R14: 000055cdb7940578 > > R15: 0000000000000000 > > kernel: [ 2176.125318] </TASK> > > kernel: [ 2296.956781] INFO: task btrfs:7783 blocked for more than 241 seconds. > > kernel: [ 2296.956824] Tainted: G W O > > 6.2.0-23-generic #23+btrdebug2c > > kernel: [ 2296.956856] "echo 0 > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > kernel: [ 2296.956887] task:btrfs state:D stack:0 > > pid:7783 ppid:7782 flags:0x00004002 > > kernel: [ 2296.956898] Call Trace: > > kernel: [ 2296.956902] <TASK> > > kernel: [ 2296.956908] __schedule+0x2aa/0x610 > > kernel: [ 2296.956921] schedule+0x63/0x110 > > kernel: [ 2296.956928] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > > kernel: [ 2296.957069] ? __pfx_autoremove_wake_function+0x10/0x10 > > kernel: [ 2296.957080] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > > kernel: [ 2296.957200] btrfs_ioctl+0x643/0x14d0 [btrfs] > > kernel: [ 2296.957366] __x64_sys_ioctl+0xa0/0xe0 > > kernel: [ 2296.957375] do_syscall_64+0x5b/0x90 > > kernel: [ 2296.957383] ? do_sys_openat2+0xab/0x180 > > kernel: [ 2296.957391] ? exit_to_user_mode_prepare+0x30/0xb0 > > kernel: [ 2296.957399] ? syscall_exit_to_user_mode+0x29/0x50 > > kernel: [ 2296.957407] ? do_syscall_64+0x67/0x90 > > kernel: [ 2296.957414] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > kernel: [ 2296.957420] RIP: 0033:0x7f2e8eb119ef > > kernel: [ 2296.957426] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > > ORIG_RAX: 0000000000000010 > > kernel: [ 2296.957433] RAX: ffffffffffffffda RBX: 0000000000000003 > > RCX: 00007f2e8eb119ef > > kernel: [ 2296.957438] RDX: 0000000000000000 RSI: 0000000000009408 > > RDI: 0000000000000003 > > kernel: [ 2296.957441] RBP: 0000000000000007 R08: 0000000000000000 > > R09: 0000000000000000 > > kernel: [ 2296.957444] R10: 0000000000000000 R11: 0000000000000246 > > R12: 00007f2e8ebf642c > > kernel: [ 2296.957448] R13: 0000000000000001 R14: 000055cdb7940578 > > R15: 0000000000000000 > > kernel: [ 2296.957468] </TASK> > > kernel: [ 2314.043258] ------------[ cut here ]------------ > > kernel: [ 2314.043264] BTRFS: Transaction aborted (error -28) > > kernel: [ 2314.043334] WARNING: CPU: 2 PID: 7739 at > > fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > > [btrfs] > > kernel: [ 2314.043467] Modules linked in: ipmi_devintf ipmi_msghandler > > overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr > > snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > > intel_rapl_common snd_hda_codec_hdmi edac_mce_amd snd_hda_intel > > snd_intel_dspcfg kvm_amd snd_intel_sdw_acpi snd_hda_codec kvm > > snd_hda_core snd_hwdep snd_pcm snd_timer irqbypass rapl wmi_bmof snd > > k10temp ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac > > scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore auth_rpcgss > > nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs > > blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq > > async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear > > amdgpu iommu_v2 drm_buddy gpu_sched drm_ttm_helper hid_generic ttm > > drm_display_helper cec uas rc_core usbhid hid drm_kms_helper > > crct10dif_pclmul syscopyarea usb_storage crc32_pclmul polyval_clmulni > > sysfillrect polyval_generic sysimgblt nvme ghash_clmulni_intel > > sha512_ssse3 > > kernel: [ 2314.043599] nvme_core aesni_intel crypto_simd mpt3sas drm > > cryptd raid_class ahci i2c_piix4 scsi_transport_sas nvme_common igb > > xhci_pci qlcnic dca xhci_pci_renesas libahci i2c_algo_bit video wmi > > kernel: [ 2314.043631] CPU: 2 PID: 7739 Comm: btrfs-transacti Tainted: > > G W O 6.2.0-23-generic #23+btrdebug2c > > kernel: [ 2314.043638] Hardware name: To Be Filled By O.E.M. X570M > > Pro4/X570M Pro4, BIOS P3.70 02/23/2022 > > kernel: [ 2314.043641] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > > kernel: [ 2314.043766] Code: ce 0f 0b eb b8 44 89 e6 48 c7 c7 a8 39 a0 > > c1 e8 2c d5 1e ce 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 a8 39 a0 c1 > > e8 16 d5 1e ce <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 > > 90 90 90 90 > > kernel: [ 2314.043771] RSP: 0018:ffffad0b11b7bb38 EFLAGS: 00010246 > > kernel: [ 2314.043777] RAX: 0000000000000000 RBX: ffff9c80e40e8f08 > > RCX: 0000000000000000 > > kernel: [ 2314.043781] RDX: 0000000000000000 RSI: 0000000000000000 > > RDI: 0000000000000000 > > kernel: [ 2314.043784] RBP: ffffad0b11b7bb60 R08: 0000000000000000 > > R09: 0000000000000000 > > kernel: [ 2314.043787] R10: 0000000000000000 R11: 0000000000000000 > > R12: 00000000ffffffe4 > > kernel: [ 2314.043790] R13: 00005e4c359ba000 R14: 0000000000020000 > > R15: ffff9c824d9a58c0 > > kernel: [ 2314.043794] FS: 0000000000000000(0000) > > GS:ffff9c87a0a80000(0000) knlGS:0000000000000000 > > kernel: [ 2314.043798] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > kernel: [ 2314.043802] CR2: 00007f54adc86000 CR3: 00000001471d8000 > > CR4: 00000000003506e0 > > kernel: [ 2314.043806] Call Trace: > > kernel: [ 2314.043809] <TASK> > > kernel: [ 2314.043815] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > > kernel: [ 2314.043943] run_delayed_data_ref+0x8b/0x180 [btrfs] > > kernel: [ 2314.044068] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > > kernel: [ 2314.044192] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > > kernel: [ 2314.044316] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > > kernel: [ 2314.044439] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > > kernel: [ 2314.044598] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > > kernel: [ 2314.044754] ? start_transaction+0xc8/0x600 [btrfs] > > kernel: [ 2314.044890] transaction_kthread+0x14b/0x1c0 [btrfs] > > kernel: [ 2314.045021] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > > kernel: [ 2314.045151] kthread+0xe9/0x110 > > kernel: [ 2314.045162] ? __pfx_kthread+0x10/0x10 > > kernel: [ 2314.045170] ret_from_fork+0x2c/0x50 > > kernel: [ 2314.045180] </TASK> > > kernel: [ 2314.045182] ---[ end trace 0000000000000000 ]--- > > kernel: [ 2314.045186] BTRFS info (device sdc: state A): dumping space info: > > kernel: [ 2314.045191] BTRFS info (device sdc: state A): space_info > > DATA has 160777674752 free, is not full > > kernel: [ 2314.045197] BTRFS info (device sdc: state A): space_info > > total=71201958395904, used=71013439856640, pinned=27737325568, > > reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > > kernel: [ 2314.045205] BTRFS info (device sdc: state A): space_info > > METADATA has -429047808 free, is full > > This means we need at least 500+ MiB metadata space. > > Thus you may want to try 4x1GiB to see if this makes any difference. > > Thanks, > Qu > > kernel: [ 2314.045209] BTRFS info (device sdc: state A): space_info > > total=83634421760, used=82789777408, pinned=244891648, > > reserved=599687168, may_use=429047808, readonly=65536 zone_unusable=0 > > kernel: [ 2314.045217] BTRFS info (device sdc: state A): space_info > > SYSTEM has 33390592 free, is not full > > kernel: [ 2314.045221] BTRFS info (device sdc: state A): space_info > > total=38797312, used=5373952, pinned=16384, reserved=16384, may_use=0, > > readonly=0 zone_unusable=0 > > kernel: [ 2314.045227] BTRFS info (device sdc: state A): > > global_block_rsv: size 536870912 reserved 428523520 > > kernel: [ 2314.045231] BTRFS info (device sdc: state A): > > trans_block_rsv: size 524288 reserved 524288 > > kernel: [ 2314.045235] BTRFS info (device sdc: state A): > > chunk_block_rsv: size 0 reserved 0 > > kernel: [ 2314.045239] BTRFS info (device sdc: state A): > > delayed_block_rsv: size 0 reserved 0 > > kernel: [ 2314.045242] BTRFS info (device sdc: state A): > > delayed_refs_rsv: size 249756909568 reserved 0 > > kernel: [ 2314.045251] BTRFS: error (device sdc: state A) in > > do_free_extent_accounting:2847: errno=-28 No space left > > kernel: [ 2314.045265] BTRFS warning (device sdc: state A): > > btrfs_uuid_scan_kthread failed -28 > > kernel: [ 2314.045295] BTRFS info (device sdc: state EA): forced readonly > > kernel: [ 2314.045300] BTRFS error (device sdc: state EA): failed to > > run delayed ref for logical 103681409916928 num_bytes 131072 type 184 > > action 2 ref_mod 1: -28 > > kernel: [ 2314.045360] BTRFS: error (device sdc: state EA) in > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > kernel: [ 2314.049204] BTRFS: error (device sdc: state EA) in > > btrfs_create_pending_block_groups:2487: errno=-28 No space left > > kernel: [ 2314.049331] BTRFS: error (device sdc: state EA) in > > btrfs_create_pending_block_groups:2499: errno=-28 No space left > > kernel: [ 2314.053259] BTRFS: error (device sdc: state EA) in > > do_free_extent_accounting:2847: errno=-28 No space left > > kernel: [ 2314.053318] BTRFS error (device sdc: state EA): failed to > > run delayed ref for logical 103681419366400 num_bytes 131072 type 184 > > action 2 ref_mod 1: -28 > > kernel: [ 2314.053375] BTRFS: error (device sdc: state EA) in > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > kernel: [ 2314.053430] BTRFS warning (device sdc: state EA): Skipping > > commit of aborted transaction. > > kernel: [ 2314.053435] BTRFS: error (device sdc: state EA) in > > cleanup_transaction:1986: errno=-28 No space left > > > > > > > > On Fri, 23 Jun 2023 at 19:16, Qu Wenruo <wqu@suse.com> wrote: > >> > >> > >> > >> On 2023/6/23 17:00, Stefan N wrote: > >>> Apologies, I thought I included the log output too, though I can't see > >>> any additional output > >>> > >>> From a fresh run, still using the same kernel > >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > >>> fi sync /mnt/data > >>> ERROR: error adding device '/dev/sdl': Input/output error > >>> ERROR: error adding device '/dev/sdm': Read-only file system > >>> ERROR: error adding device '/dev/sdn': Read-only file system > >>> ERROR: error adding device '/dev/sdo': Read-only file system > >>> ERROR: Could not sync filesystem: Read-only file system > >>> $ > >>> > >>> Output from kern.log, syslog or dmesg -k > >>> > >> [...] > >> > >> None of the newly added debug lines triggered, so there is something > >> else causing the problem. > >> > >> And furthermore the backtrace is not that helpful, it only shows it's > >> some async metadata reclaim kthread causing the problem. > >> > >> Although I guess the async metadata reclaim is triggered by the > >> btrfs_start_transaction() call when adding a device. > >> So I updated my github branch to go btrfs_join_transaction() which would > >> not flush any metadata, thus avoid the problem. > >> > >> Would you please give it a try again? > >> > >>> > >>> However, now I started digging into logs to check I hadn't missed > >>> where the errors were being logged, I've found this from roughly a > >>> week before I started having issues, which I had not previously > >>> noticed > >> > >> You don't need to bother most error messages after the fs flipped RO. > >> As it's known to have some false alerts. > >> > >> Thanks, > >> Qu > >> > >>> [ 1990.495861] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 107988943355904 num_bytes 245760 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 1990.518282] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 107989043494912 num_bytes 245760 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 620.104065] BTRFS error (device sdk): failed to run delayed ref for > >>> logical 123187655077888 num_bytes 176128 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 620.126209] BTRFS error (device sdk): failed to run delayed ref for > >>> logical 123190279929856 num_bytes 134217728 type 184 action 2 ref_mod > >>> 1: -28 > >>> [ 620.126241] BTRFS error (device sdk): failed to run delayed ref for > >>> logical 123189970468864 num_bytes 134217728 type 184 action 2 ref_mod > >>> 1: -28 > >>> [ 620.126271] BTRFS error (device sdk): failed to run delayed ref for > >>> logical 123190414409728 num_bytes 134217728 type 184 action 2 ref_mod > >>> 1: -28 > >>> [ 476.565308] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101906434228224 num_bytes 651264 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 476.565932] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101906434031616 num_bytes 180224 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 447.371754] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101946151927808 num_bytes 262144 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 447.372362] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101946083725312 num_bytes 245760 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 439.839007] BTRFS error (device sdj): failed to run delayed ref for > >>> logical 101923102179328 num_bytes 192512 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 439.839578] BTRFS error (device sdj): failed to run delayed ref for > >>> logical 101923401629696 num_bytes 245760 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 466.393884] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101981116137472 num_bytes 245760 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 466.394451] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101981122854912 num_bytes 1720320 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 431.541367] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101876426952704 num_bytes 126976 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 431.542010] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101876427780096 num_bytes 126976 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 597.487948] BTRFS error (device sdj): failed to run delayed ref for > >>> logical 108127459409920 num_bytes 196608 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 597.488539] BTRFS error (device sdj): failed to run delayed ref for > >>> logical 108124677865472 num_bytes 126976 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 534.717509] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101958618710016 num_bytes 1597440 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 534.718494] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 101958756335616 num_bytes 368640 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 508.089394] BTRFS error (device sdk): failed to run delayed ref for > >>> logical 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 508.090007] BTRFS error (device sdk): failed to run delayed ref for > >>> logical 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 1632.112084] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 102203759886336 num_bytes 229376 type 184 action 2 ref_mod 1: > >>> -28 > >>> [ 1632.112885] BTRFS error (device sdh): failed to run delayed ref for > >>> logical 102203764379648 num_bytes 126976 type 184 action 2 ref_mod 1: > >>> -28 > >>> > >>> and today, when leaving the disks mounted read-only for a while, I > >>> found many occurances similar to: > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201329754554368 mirror 1 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201329754554368 mirror 2 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201329754554368 mirror 3 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201329754554368 mirror 4 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201329754554368 mirror 1 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201329754554368 mirror 2 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201329754554368 mirror 3 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201350830227456 mirror 4 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201350830227456 mirror 1 wanted 2 found 0 > >>> BTRFS error (device sdc: state EA): level verify failed on logical > >>> 201350830227456 mirror 2 wanted 2 found 0 > >>> > >>> On Fri, 23 Jun 2023 at 10:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2023/6/23 06:18, Stefan N wrote: > >>>>> Hi Qu, > >>>>> > >>>>> I got one new line this time, but it doesn't seem to match your commit > >>>>> ERROR: zoned: unable to stat /dev/loop/13 > >>>> > >>>> Please provide the dmesg of that attempt, as all the extra debug info is > >>>> inside dmesg. > >>>> > >>>> With that info provided, we can determine what to do next. > >>>> > >>>> Thanks, > >>>> Qu > >>>> > >>>>> > >>>>> I tried it on the USB flash drives too and didn't get any extra line > >>>>> > >>>>> In context > >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>>>> dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 > >>>>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>>>> ERROR: error adding device '/dev/loop12': Input/output error > >>>>> ERROR: zoned: unable to stat /dev/loop/13 > >>>>> ERROR: checking status of /dev/loop/13: No such file or directory > >>>>> ERROR: error adding device '/dev/loop14': Read-only file system > >>>>> ERROR: error adding device '/dev/loop15': Read-only file system > >>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>> $ > >>>>> > >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > >>>>> fi sync /mnt/data > >>>>> ERROR: error adding device '/dev/sdl': Input/output error > >>>>> ERROR: error adding device '/dev/sdm': Read-only file system > >>>>> ERROR: error adding device '/dev/sdn': Read-only file system > >>>>> ERROR: error adding device '/dev/sdo': Read-only file system > >>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>> $ > >>>>> > >>>>> On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 2023/6/22 16:33, Stefan N wrote: > >>>>>>> Hi Qu, > >>>>>>> > >>>>>>> Many thanks for the detailed instructions and your patience. I got it > >>>>>>> working combined with > >>>>>>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system > >>>>>>> OS instead, tagged +btrfix > >>>>>>> $ uname -vr > >>>>>>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 > >>>>>>> > >>>>>>> However, I've not had luck with the commands suggested, and would > >>>>>>> appreciate any further ideas. > >>>>>>> > >>>>>>> Outputs follow below, with /mnt/data as the btrfs mount point that > >>>>>>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB > >>>>>>> flash drives being added sd[l-o] > >>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > >>>>>>> fi sync /mnt/data > >>>>>>> ERROR: error adding device '/dev/sdl': Input/output error > >>>>>>> ERROR: error adding device '/dev/sdm': Read-only file system > >>>>>>> ERROR: error adding device '/dev/sdn': Read-only file system > >>>>>>> ERROR: error adding device '/dev/sdo': Read-only file system > >>>>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>>>> $ > >>>>>>> > >>>>>>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so > >>>>>>> they're super quick to zero); > >>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > >>>>>>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 > >>>>>>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>>>>>> ERROR: error adding device '/dev/loop16': Input/output error > >>>>>> > >>>>>> This is the interesting part, this means we're erroring out due to -EIO > >>>>>> (not -ENOSPC) during the first device add. > >>>>>> > >>>>>> And by somehow, after the first device add, we already got the trans abort. > >>>>>> > >>>>>> Would you please try the following branch? > >>>>>> > >>>>>> https://github.com/adam900710/linux/tree/dev_add_no_commit > >>>>>> > >>>>>> It has not only the patch to skip the commit, but also extra debug > >>>>>> output for the situation. > >>>>>> > >>>>>> Thanks, > >>>>>> Qu > >>>>>> > >>>>>>> ERROR: error adding device '/dev/loop17': Read-only file system > >>>>>>> ERROR: error adding device '/dev/loop18': Read-only file system > >>>>>>> ERROR: error adding device '/dev/loop19': Read-only file system > >>>>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>>>> $ > >>>>>>> > >>>>>>> I confirmed before both these kernel builds that the replaced line was > >>>>>>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone > >>>>>>> else following, I needed to remove the -n in the patch command > >>>>>>> earlier) > >>>>>>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c: > >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- } > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); > >>>>>>> -- > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c: > >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- } > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > >>>>>>> -- > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c: > >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- } > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > >>>>>>> $ > >>>>>>> > >>>>>>> $ btrfs fi usage /mnt/data > >>>>>>> Overall: > >>>>>>> Device size: 87.31TiB > >>>>>>> Device allocated: 87.31TiB > >>>>>>> Device unallocated: 1.94GiB > >>>>>>> Device missing: 0.00B > >>>>>>> Device slack: 0.00B > >>>>>>> Used: 87.08TiB > >>>>>>> Free (estimated): 173.29GiB (min: 172.33GiB) > >>>>>>> Free (statfs, df): 171.84GiB > >>>>>>> Data ratio: 1.34 > >>>>>>> Metadata ratio: 4.00 > >>>>>>> Global reserve: 512.00MiB (used: 371.25MiB) > >>>>>>> Multiple profiles: no > >>>>>>> > >>>>>>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) > >>>>>>> /dev/sdc 10.90TiB > >>>>>>> /dev/sdf 10.90TiB > >>>>>>> /dev/sda 10.86TiB > >>>>>>> /dev/sdg 10.87TiB > >>>>>>> /dev/sdh 10.86TiB > >>>>>>> /dev/sdd 10.87TiB > >>>>>>> /dev/sde 10.88TiB > >>>>>>> /dev/sdb 10.88TiB > >>>>>>> > >>>>>>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) > >>>>>>> /dev/sdc 15.33GiB > >>>>>>> /dev/sdf 18.41GiB > >>>>>>> /dev/sda 49.63GiB > >>>>>>> /dev/sdg 49.50GiB > >>>>>>> /dev/sdh 51.52GiB > >>>>>>> /dev/sdd 48.70GiB > >>>>>>> /dev/sde 39.09GiB > >>>>>>> /dev/sdb 39.01GiB > >>>>>>> > >>>>>>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) > >>>>>>> /dev/sdc 1.00MiB > >>>>>>> /dev/sda 37.00MiB > >>>>>>> /dev/sdg 37.00MiB > >>>>>>> /dev/sdh 36.00MiB > >>>>>>> /dev/sdd 37.00MiB > >>>>>>> > >>>>>>> Unallocated: > >>>>>>> /dev/sdc 1.00MiB > >>>>>>> /dev/sdf 1.00MiB > >>>>>>> /dev/sda 1.27GiB > >>>>>>> /dev/sdg 1.00MiB > >>>>>>> /dev/sdh 1.00MiB > >>>>>>> /dev/sdd 687.00MiB > >>>>>>> /dev/sde 1.00MiB > >>>>>>> /dev/sdb 1.00MiB > >>>>>>> $ > >>>>>>> > >>>>>>> > >>>>>>> This first attempt generated the following syslog output: > >>>>>>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c > >>>>>>> (crc32c-intel) checksum algorithm > >>>>>>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled > >>>>>>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr > >>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>>>>>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr > >>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>>>>>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr > >>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>>>>>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr > >>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > >>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > >>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > >>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > >>>>>>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) > >>>>>>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in > >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>>>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly > >>>>>>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to > >>>>>>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 > >>>>>>> action 2 ref_mod 1: -28 > >>>>>>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at > >>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > >>>>>>> [btrfs] > >>>>>>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): > >>>>>>> btrfs_uuid_scan_kthread failed -5 > >>>>>>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth > >>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > >>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > >>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > >>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > >>>>>>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd > >>>>>>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm > >>>>>>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > >>>>>>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer > >>>>>>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath > >>>>>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd > >>>>>>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables > >>>>>>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov > >>>>>>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 > >>>>>>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage > >>>>>>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti > >>>>>>> Tainted: G W O 6.2.0-23-generic #23+btrfix > >>>>>>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>>>>>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > >>>>>>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] > >>>>>>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>>>>>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>>>>>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>>>>>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > >>>>>>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > >>>>>>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] > >>>>>>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] > >>>>>>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > >>>>>>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: > >>>>>>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info > >>>>>>> DATA has 160777674752 free, is not full > >>>>>>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info > >>>>>>> total=71201958395904, used=71018191273984, pinned=22985908224, > >>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>>>>>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info > >>>>>>> METADATA has -124944384 free, is full > >>>>>>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info > >>>>>>> total=83530612736, used=82791497728, pinned=242745344, > >>>>>>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 > >>>>>>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info > >>>>>>> SYSTEM has 33439744 free, is not full > >>>>>>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info > >>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > >>>>>>> readonly=0 zone_unusable=0 > >>>>>>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): > >>>>>>> global_block_rsv: size 536870912 reserved 124944384 > >>>>>>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): > >>>>>>> trans_block_rsv: size 0 reserved 0 > >>>>>>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): > >>>>>>> chunk_block_rsv: size 0 reserved 0 > >>>>>>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): > >>>>>>> delayed_block_rsv: size 0 reserved 0 > >>>>>>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): > >>>>>>> delayed_refs_rsv: size 251322957824 reserved 0 > >>>>>>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in > >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>>>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to > >>>>>>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 > >>>>>>> action 2 ref_mod 1: -28 > >>>>>>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>> > >>>>>>> A couple of kernel recompiles later, the second attempt on the SSD > >>>>>>> generated similar: > >>>>>>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c > >>>>>>> (crc32c-intel) checksum algorithm > >>>>>>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled > >>>>>>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr > >>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>>>>>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr > >>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>>>>>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr > >>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>>>>>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr > >>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>>>>>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped > >>>>>>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree > >>>>>>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) > >>>>>>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at > >>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > >>>>>>> [btrfs] > >>>>>>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth > >>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > >>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > >>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > >>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > >>>>>>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic > >>>>>>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg > >>>>>>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core > >>>>>>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm > >>>>>>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid > >>>>>>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd > >>>>>>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs > >>>>>>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > >>>>>>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor > >>>>>>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid > >>>>>>> amdgpu uas hid iommu_v2 > >>>>>>> kernel: [ 1919.452839] Workqueue: events_unbound > >>>>>>> btrfs_async_reclaim_metadata_space [btrfs] > >>>>>>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>>>>>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > >>>>>>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] > >>>>>>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>>>>>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>>>>>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>>>>>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] > >>>>>>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] > >>>>>>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: > >>>>>>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info > >>>>>>> DATA has 160778723328 free, is not full > >>>>>>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info > >>>>>>> total=71201958395904, used=71017442181120, pinned=23733952512, > >>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>>>>>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info > >>>>>>> METADATA has -147570688 free, is full > >>>>>>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info > >>>>>>> total=83530612736, used=82792185856, pinned=238059520, > >>>>>>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 > >>>>>>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info > >>>>>>> SYSTEM has 33439744 free, is not full > >>>>>>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info > >>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > >>>>>>> readonly=0 zone_unusable=0 > >>>>>>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): > >>>>>>> global_block_rsv: size 536870912 reserved 147570688 > >>>>>>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): > >>>>>>> trans_block_rsv: size 0 reserved 0 > >>>>>>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): > >>>>>>> chunk_block_rsv: size 0 reserved 0 > >>>>>>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): > >>>>>>> delayed_block_rsv: size 0 reserved 0 > >>>>>>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): > >>>>>>> delayed_refs_rsv: size 254292787200 reserved 0 > >>>>>>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in > >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>>>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly > >>>>>>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to > >>>>>>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 > >>>>>>> action 2 ref_mod 1: -28 > >>>>>>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): > >>>>>>> btrfs_uuid_scan_kthread failed -5 > >>>>>>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in > >>>>>>> __btrfs_free_extent:3077: errno=-28 No space left > >>>>>>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to > >>>>>>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 > >>>>>>> action 2 ref_mod 1: -28 > >>>>>>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>> > >>>>>>> > >>>>>>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 2023/6/17 13:11, Stefan N wrote: > >>>>>>>>> Hi Qu, > >>>>>>>>> > >>>>>>>>> I believe I've got this environment ready, with the 6.2.0 kernel as > >>>>>>>>> before using the Ubuntu kernel, but can switch to vanilla if required. > >>>>>>>>> > >>>>>>>>> I've not done anything kernel modifications for a solid decade, so > >>>>>>>>> would be keen for a bit of guidance. > >>>>>>>> > >>>>>>>> Sure no problem. > >>>>>>>> > >>>>>>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then > >>>>>>>> apply the attached one-line patch by: > >>>>>>>> > >>>>>>>> $ tar czf linux*.tar.xz > >>>>>>>> $ cd linux* > >>>>>>>> $ patch -np1 -i <the patch file> > >>>>>>>> > >>>>>>>> Then use your running system kernel config if possible: > >>>>>>>> > >>>>>>>> $ cp /proc/config.gz . > >>>>>>>> $ gunzip config.gz > >>>>>>>> $ mv config .config > >>>>>>>> $ make olddefconfig > >>>>>>>> > >>>>>>>> Then you can start your kernel compiling, and considering you're using > >>>>>>>> your distro's default, it would include tons of drivers, thus would be > >>>>>>>> very slow. (Replace the number to something more suitable to your > >>>>>>>> system, using all CPU cores can be very hot) > >>>>>>>> > >>>>>>>> $ make -j12 > >>>>>>>> > >>>>>>>> Finally you need to install the modules/kernel. > >>>>>>>> > >>>>>>>> Unfortunately this is distro specific, but if you're using Ubuntu, it > >>>>>>>> may be much easier: > >>>>>>>> > >>>>>>>> $ make bindeb-pkg > >>>>>>>> > >>>>>>>> Then install the generated dpkg I guess? I have never tried kernel > >>>>>>>> building using deb/rpm, but only manual installation, which is also > >>>>>>>> distro dependent in the initramfs generation part. > >>>>>>>> > >>>>>>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom > >>>>>>>> # make modules_install > >>>>>>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img > >>>>>>>> > >>>>>>>> > >>>>>>>> The last step is to update your bootloader to add the new kernel, which > >>>>>>>> is not only distro dependent but also bootloader dependent. > >>>>>>>> > >>>>>>>> In my case, I go with systemd-boot with manually crafted entries. > >>>>>>>> But if you go Ubuntu I believe just installing the kernel dpkg would > >>>>>>>> have everything handled? > >>>>>>>> > >>>>>>>> Finally you can try reboot into the newer kernel, and try device add > >>>>>>>> (need to add 4 disks), then sync and see if things work as expected. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Qu > >>>>>>>>> > >>>>>>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, > >>>>>>>>> but failing this will use 4x loop devices. > >>>>>>>>> > >>>>>>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>>>>>>>> In your particular case, since you're running RAID1C4 you need to add 4 > >>>>>>>>>> devices in one transaction. > >>>>>>>>>> > >>>>>>>>>> I can easily craft a patch to avoid commit transaction, but still you'll > >>>>>>>>>> need to add at least 4 disks, and then sync to see if things would work. > >>>>>>>>>> > >>>>>>>>>> Furthermore this means you need a liveCD with full kernel compiling > >>>>>>>>>> environment. > >>>>>>>>>> > >>>>>>>>>> If you want to go this path, I can send you the patch when you've > >>>>>>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-06-26 12:58 ` Stefan N @ 2023-07-22 5:28 ` Stefan N 2023-07-22 10:08 ` Qu Wenruo 0 siblings, 1 reply; 22+ messages in thread From: Stefan N @ 2023-07-22 5:28 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org Hi again Qu, Thanks for all your help last month, I managed to get things going again and have been slowly adding new disks, but have now ended up with a similar but slightly more complicated problem I need some more assistance with. Since last time: I used loop devices to get the fs operational again, then deleted some files to create space, removed the loop devices, successfully used btrfs replace to replace 3x 12tb disks with 18tbs, and moved to space cache v2 in the hope it'd prevent future issues. The problem: during the 4th replace operation the metadata issue has recurred, the first time self correcting when remounted, but this second time has resulted in a similar paradox to last time. I've rebooted into the patched kernel from last month, but the same solution is now ineffective due to the system failing to detect the replace target, despite no disks having been removed nor changing from /dev/sda and /dev/sdl during the reboots. During the replace process the disks were in use, and while after there's plenty of space for data it seems enough was written to fill metadata again. In hindsight I should have left the 4 loop devices in place until the replaces had completed to satisfy the RAID1C4 requirement for the metadata, as despite deleting files data has not been freed from the existing 12tb disks. The 'missing' replace target is: Disk /dev/sda: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors $ btrfs fi show Label: none uuid: 3cde0d85-f53e-4db6-ac2c-a0e6528c5ced Total devices 8 FS bytes used 65.22TiB devid 1 size 16.37TiB used 11.50TiB path /dev/sdf devid 2 size 10.91TiB used 10.91TiB path /dev/sdg devid 3 size 16.37TiB used 11.50TiB path /dev/sdd devid 4 size 10.91TiB used 10.91TiB path /dev/sdl devid 5 size 10.91TiB used 10.91TiB path /dev/sde devid 6 size 10.91TiB used 10.91TiB path /dev/sdc devid 7 size 16.37TiB used 11.50TiB path /dev/sdh devid 8 size 10.91TiB used 10.91TiB path /dev/sdb $ sudo btrfs fi usage /mnt/data/ Overall: Device size: 103.68TiB Device allocated: 89.06TiB Device unallocated: 14.62TiB Device missing: 0.00B Device slack: 0.00B Used: 88.79TiB Free (estimated): 10.96TiB (min: 3.85TiB) Free (statfs, df): 0.00B Data ratio: 1.36 Metadata ratio: 4.00 Global reserve: 512.00MiB (used: 48.00KiB) Multiple profiles: no Data,RAID6: Size:65.34TiB, Used:65.14TiB (99.69%) /dev/sdf 11.48TiB /dev/sdg 10.90TiB /dev/sdd 11.45TiB /dev/sdl 10.87TiB /dev/sde 10.86TiB /dev/sdc 10.87TiB /dev/sdh 11.46TiB /dev/sdb 10.88TiB Metadata,RAID1C4: Size:77.75GiB, Used:77.69GiB (99.93%) /dev/sdf 15.30GiB /dev/sdg 18.40GiB /dev/sdd 49.58GiB /dev/sdl 49.47GiB /dev/sde 51.50GiB /dev/sdc 48.65GiB /dev/sdh 39.07GiB /dev/sdb 39.01GiB System,RAID1C4: Size:25.00MiB, Used:5.28MiB (21.12%) /dev/sdf 21.00MiB /dev/sdd 25.00MiB /dev/sdl 5.00MiB /dev/sde 4.00MiB /dev/sdc 25.00MiB /dev/sdh 20.00MiB Unallocated: /dev/sdf 4.87TiB /dev/sdg 1.00MiB /dev/sdd 4.87TiB /dev/sdl 1.00MiB /dev/sde 1.00MiB /dev/sdc 1.00MiB /dev/sdh 4.87TiB /dev/sdb 1.00MiB $ $ sudo mount /mnt/data ; sudo btrfs replace cancel /mnt/data ; sudo btrfs dev add -K -f /dev/loop20 /dev/loop21 /dev/loop22 /dev/loop23 /mnt/data ; sudo btrfs fi sync /mnt/data mount: /mnt/data: wrong fs type, bad option, bad superblock on /dev/sde, missing codepage or helper program, or other error. dmesg(1) may have more information after failed mount system call. ERROR: not a btrfs filesystem: /mnt/data ERROR: not a btrfs filesystem: /mnt/data ERROR: Could not sync filesystem: Inappropriate ioctl for device $ syslog: BTRFS info (device sdf): using crc32c (crc32c-intel) checksum algorithm BTRFS info (device sdf): using free space tree BTRFS info (device sdf): bdev /dev/sdg errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 BTRFS info (device sdf): bdev /dev/sde errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 BTRFS info (device sdf): bdev /dev/sdc errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 BTRFS warning (device sdf): cannot mount because device replace operation is ongoing and BTRFS warning (device sdf): tgtdev (devid 0) is missing, need to run 'btrfs dev scan'? BTRFS error (device sdf): failed to init dev_replace: -5 BTRFS error (device sdf): open_ctree failed $ sudo mount -o degraded /mnt/data ; sudo btrfs replace cancel /mnt/data ; sudo btrfs dev add -K -f /dev/loop20 /dev/loop21 /dev/loop22 /dev/loop23 /mnt/data ; sudo btrfs fi sync /mnt/data ERROR: error adding device '/dev/loop20': Read-only file system ERROR: error adding device '/dev/loop21': Read-only file system ERROR: error adding device '/dev/loop22': Read-only file system ERROR: error adding device '/dev/loop23': Read-only file system ERROR: Could not sync filesystem: Read-only file system $ syslog: BTRFS info (device sdf): using crc32c (crc32c-intel) checksum algorithm BTRFS info (device sdf): allowing degraded mounts BTRFS info (device sdf): using free space tree BTRFS info (device sdf): bdev /dev/sdg errs: wr 0, rd 0, flush 0, corrupt 845, gen 0 BTRFS info (device sdf): bdev /dev/sde errs: wr 3, rd 7, flush 0, corrupt 0, gen 0 BTRFS info (device sdf): bdev /dev/sdc errs: wr 41, rd 0, flush 0, corrupt 0, gen 0 BTRFS info (device sdf): cannot continue dev_replace, tgtdev is missing BTRFS info (device sdf): you may cancel the operation after 'mount -o degraded' BTRFS: Transaction aborted (error -28) WARNING: CPU: 0 PID: 6659 at fs/btrfs/extent-tree.c:3077 __btrfs_free_extent+0xa18/0xf50 [btrfs] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc rpcsec_gss_krb5 nfsv4 nfs fscache netfs ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi edac_mce_amd snd_hda_codec kvm_amd snd_hda_core kvm snd_hwdep irqbypass snd_pcm rapl wmi_bmof snd_timer k10temp snd ccp soundcore joydev input_leds mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_logitech_hidpp hid_logitech_dj amdgpu hid_generic iommu_v2 drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper uas cec rc_core usbhid hid usb_storage drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_pclmul igb crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel dca sha512_ssse3 aesni_intel crypto_simd drm nvme ahci cryptd libahci qlcnic i2c_algo_bit nvme_core mpt3sas xhci_pci video raid_class scsi_transport_sas xhci_pci_renesas nvme_common i2c_piix4 wmi CPU: 0 PID: 6659 Comm: btrfs Tainted: G W O 6.2.0-23-generic #23+btrdebug2c Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P3.70 02/23/2022 RIP: 0010:__btrfs_free_extent+0xa18/0xf50 [btrfs] Code: 48 c7 c6 80 19 71 c1 48 8b 78 50 e8 82 57 0e 00 41 b8 01 00 00 00 e9 58 fe ff ff 8b 75 94 48 c7 c7 a8 19 71 c1 e8 d8 92 4d c7 <0f> 0b e9 64 fb ff ff 8b 7d 90 e8 b9 04 ff ff 84 c0 0f 85 f1 01 00 RSP: 0018:ffffb05e4746fa38 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000b711db1d0000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb05e4746fad8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000000 R14: ffff88edc031ea90 R15: ffff88edc3ba0230 FS: 00007f2b14740d40(0000) GS:ffff88f4e0a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c000253000 CR3: 00000001e7cc8000 CR4: 00000000003506f0 Call Trace: <TASK> run_delayed_tree_ref+0x69/0x1b0 [btrfs] btrfs_run_delayed_refs_for_head+0x3aa/0x520 [btrfs] ? btrfs_create_pending_block_groups+0x280/0x4d0 [btrfs] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] commit_cowonly_roots+0x1e7/0x240 [btrfs] btrfs_commit_transaction+0x5d2/0xbc0 [btrfs] ? start_transaction+0xc8/0x600 [btrfs] btrfs_dev_replace_cancel+0x168/0x2e0 [btrfs] btrfs_ioctl+0x12ed/0x14d0 [btrfs] ? __handle_mm_fault+0x661/0x720 __x64_sys_ioctl+0xa0/0xe0 do_syscall_64+0x5b/0x90 ? do_user_addr_fault+0x1e8/0x720 ? exit_to_user_mode_prepare+0x30/0xb0 ? irqentry_exit_to_user_mode+0x9/0x20 ? irqentry_exit+0x43/0x50 ? exc_page_fault+0x91/0x1b0 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f2b145119ef Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 RSP: 002b:00007ffcda96ca10 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2b145119ef RDX: 00007ffcda96ca80 RSI: 00000000ca289435 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000021001 R09: 0000000000000000 R10: fffffffffffff000 R11: 0000000000000246 R12: 00007ffcda96e7eb R13: 000056092aafbe60 R14: 000056092aab3578 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- BTRFS info (device sdf: state A): dumping space info: BTRFS info (device sdf: state A): space_info DATA has 219646795776 free, is not full BTRFS info (device sdf: state A): space_info total=71845742116864, used=71626091782144, pinned=0, reserved=0, may_use=0, readonly=3538944 zone_unusable=0 BTRFS info (device sdf: state A): space_info METADATA has -536821760 free, is full BTRFS info (device sdf: state A): space_info total=83481329664, used=83421233152, pinned=57606144, reserved=2490368, may_use=536821760, readonly=0 zone_unusable=0 BTRFS info (device sdf: state A): space_info SYSTEM has 20676608 free, is not full BTRFS info (device sdf: state A): space_info total=26214400, used=5537792, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0 BTRFS info (device sdf: state A): global_block_rsv: size 536870912 reserved 536805376 BTRFS info (device sdf: state A): trans_block_rsv: size 0 reserved 0 BTRFS info (device sdf: state A): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sdf: state A): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sdf: state A): delayed_refs_rsv: size 523239424 reserved 16384 BTRFS: error (device sdf: state A) in __btrfs_free_extent:3077: errno=-28 No space left BTRFS info (device sdf: state EA): forced readonly BTRFS error (device sdf: state EA): failed to run delayed ref for logical 201287318437888 num_bytes 16384 type 176 action 2 ref_mod 1: -28 BTRFS: error (device sdf: state EA) in btrfs_run_delayed_refs:2151: errno=-28 No space left BTRFS warning (device sdf: state EA): Skipping commit of aborted transaction. BTRFS: error (device sdf: state EA) in cleanup_transaction:1986: errno=-28 No space left ------------[ cut here ]------------ WARNING: CPU: 0 PID: 6659 at fs/btrfs/dev-replace.c:1121 btrfs_dev_replace_cancel+0x2b0/0x2e0 [btrfs] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc rpcsec_gss_krb5 nfsv4 nfs fscache netfs ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi edac_mce_amd snd_hda_codec kvm_amd snd_hda_core kvm snd_hwdep irqbypass snd_pcm rapl wmi_bmof snd_timer k10temp snd ccp soundcore joydev input_leds mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 2023-07-22T14:04:29.956673+09:30 ltsnas kernel: [ 422.690184] hid_logitech_hidpp hid_logitech_dj amdgpu hid_generic iommu_v2 drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper uas cec rc_core usbhid hid usb_storage drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_pclmul igb crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel dca sha512_ssse3 aesni_intel crypto_simd drm nvme ahci cryptd libahci qlcnic i2c_algo_bit nvme_core mpt3sas xhci_pci video raid_class scsi_transport_sas xhci_pci_renesas nvme_common i2c_piix4 wmi CPU: 0 PID: 6659 Comm: btrfs Tainted: G W O 6.2.0-23-generic #23+btrdebug2c Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P3.70 02/23/2022 RIP: 0010:btrfs_dev_replace_cancel+0x2b0/0x2e0 [btrfs] Code: 4c 89 c2 e8 52 3f 02 00 e8 9d 4a 4e c7 e9 35 ff ff ff 4c 89 e7 48 89 45 d0 e8 bc d5 3f c8 48 8b 45 d0 41 89 c5 e9 38 ff ff ff <0f> 0b e9 b9 fe ff ff 41 bd e2 ff ff ff e9 26 ff ff ff 48 c7 c2 74 RSP: 0018:ffffb05e4746fd58 EFLAGS: 00010286 RAX: 00000000ffffffe4 RBX: ffff88edda916000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffb05e4746fd88 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88edda916ab0 R13: ffff88eddb627800 R14: ffff88ede7fad000 R15: ffff88edda916ad0 FS: 00007f2b14740d40(0000) GS:ffff88f4e0a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c000253000 CR3: 00000001e7cc8000 CR4: 00000000003506f0 Call Trace: <TASK> btrfs_ioctl+0x12ed/0x14d0 [btrfs] ? __handle_mm_fault+0x661/0x720 __x64_sys_ioctl+0xa0/0xe0 do_syscall_64+0x5b/0x90 ? do_user_addr_fault+0x1e8/0x720 ? exit_to_user_mode_prepare+0x30/0xb0 ? irqentry_exit_to_user_mode+0x9/0x20 ? irqentry_exit+0x43/0x50 ? exc_page_fault+0x91/0x1b0 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f2b145119ef Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 RSP: 002b:00007ffcda96ca10 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2b145119ef RDX: 00007ffcda96ca80 RSI: 00000000ca289435 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000021001 R09: 0000000000000000 R10: fffffffffffff000 R11: 0000000000000246 R12: 00007ffcda96e7eb R13: 000056092aafbe60 R14: 000056092aab3578 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- BTRFS info (device sdf: state EA): suspended dev_replace from /dev/sdl (devid 4) to <missing disk> canceled BTRFS error (device sdf: state EA): failed to add disk /dev/loop20: -30 BTRFS error (device sdf: state EA): failed to add disk /dev/loop21: -30 BTRFS error (device sdf: state EA): failed to add disk /dev/loop22: -30 BTRFS error (device sdf: state EA): failed to add disk /dev/loop23: -30 On Mon, 26 Jun 2023 at 22:28, Stefan N <stefannnau@gmail.com> wrote: > > Hi Qu, > > Thanks for all the help, I managed to get it mounted and synced with > 5G loops (2G allocated to metadata, 3G unallocated on each). > > I'm able to read existing files, write new files, and any changes > remain after an unmount and remount. > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > dev add -K -f /dev/loop20 /dev/loop21 /dev/loop22 /dev/loop23 > /mnt/data ; sudo btrfs fi sync /mnt/data > $ sudo btrfs fi show > Label: none uuid: abc123 > Total devices 12 FS bytes used 64.52TiB > devid 1 size 10.91TiB used 10.89TiB path /dev/sdd > devid 2 size 10.91TiB used 10.89TiB path /dev/sdh > devid 3 size 10.91TiB used 10.89TiB path /dev/sdb > devid 4 size 10.91TiB used 10.89TiB path /dev/sdg > devid 5 size 10.91TiB used 10.89TiB path /dev/sdi > devid 6 size 10.91TiB used 10.89TiB path /dev/sde > devid 7 size 10.91TiB used 10.89TiB path /dev/sdf > devid 8 size 10.91TiB used 10.89TiB path /dev/sdc > devid 9 size 5.00GiB used 2.00GiB path /dev/loop20 > devid 10 size 5.00GiB used 2.00GiB path /dev/loop21 > devid 11 size 5.00GiB used 2.00GiB path /dev/loop22 > devid 12 size 5.00GiB used 2.00GiB path /dev/loop23 > $ > > I'd be keen to know what you'd suggest for next steps. I have two 18T > disks to upgrade two of the existing 12T disks, which could be a > substitute or add them over USB for a while. > > While a random sample of files seem to be perfectly intact, I'd be > keen to verify the integrity to track down any corrupted files. > > Should I perform a scrub before adding/replacing the new disks, or can > this be safely done afterwards? e.g. can I safely add 2x18tb, remove > loops, begin scrub, and then remove 2x 12tb when scrub completes? > > See kernel log below: > > kernel: [ 399.272458] BTRFS info (device sdd): using crc32c > (crc32c-intel) checksum algorithm > kernel: [ 399.272476] BTRFS info (device sdd): disk space caching is enabled > kernel: [ 404.855750] BTRFS info (device sdd): bdev /dev/sdh errs: wr > 0, rd 0, flush 0, corrupt 845, gen 0 > kernel: [ 404.855766] BTRFS info (device sdd): bdev /dev/sdb errs: wr > 41089, rd 1556, flush 0, corrupt 0, gen 0 > kernel: [ 404.855778] BTRFS info (device sdd): bdev /dev/sdi errs: wr > 3, rd 7, flush 0, corrupt 0, gen 0 > kernel: [ 404.855785] BTRFS info (device sdd): bdev /dev/sde errs: wr > 41, rd 0, flush 0, corrupt 0, gen 0 > kernel: [ 630.844173] BTRFS info (device sdd): balance: resume skipped > kernel: [ 630.844185] BTRFS info (device sdd): checking UUID tree > kernel: [ 630.871787] BTRFS info (device sdd): disk added /dev/loop20 > kernel: [ 630.881223] BTRFS info (device sdd): disk added /dev/loop21 > kernel: [ 630.888817] BTRFS info (device sdd): disk added /dev/loop22 > kernel: [ 630.896302] BTRFS info (device sdd): disk added /dev/loop23 > kernel: [ 846.849616] INFO: task btrfs-uuid:4834 blocked for more > than 120 seconds. > kernel: [ 846.849660] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 846.849693] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 846.849725] task:btrfs-uuid state:D stack:0 > pid:4834 ppid:2 flags:0x00004000 > kernel: [ 846.849735] Call Trace: > kernel: [ 846.849739] <TASK> > kernel: [ 846.849747] __schedule+0x2aa/0x610 > kernel: [ 846.849761] schedule+0x63/0x110 > kernel: [ 846.849769] wait_current_trans+0x100/0x160 [btrfs] > kernel: [ 846.849908] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 846.849920] start_transaction+0x28b/0x600 [btrfs] > kernel: [ 846.850057] btrfs_start_transaction+0x1e/0x30 [btrfs] > kernel: [ 846.850191] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > kernel: [ 846.850359] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > kernel: [ 846.850487] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > kernel: [ 846.850614] kthread+0xe9/0x110 > kernel: [ 846.850623] ? __pfx_kthread+0x10/0x10 > kernel: [ 846.850631] ret_from_fork+0x2c/0x50 > kernel: [ 846.850642] </TASK> > kernel: [ 846.850645] INFO: task btrfs:4850 blocked for more than 120 seconds. > kernel: [ 846.850676] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 846.850707] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 846.850738] task:btrfs state:D stack:0 > pid:4850 ppid:4849 flags:0x00000002 > kernel: [ 846.850746] Call Trace: > kernel: [ 846.850749] <TASK> > kernel: [ 846.850752] __schedule+0x2aa/0x610 > kernel: [ 846.850760] schedule+0x63/0x110 > kernel: [ 846.850765] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > kernel: [ 846.850899] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 846.850908] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > kernel: [ 846.851027] btrfs_ioctl+0x643/0x14d0 [btrfs] > kernel: [ 846.851186] ? putname+0x5d/0x80 > kernel: [ 846.851195] ? do_sys_openat2+0xab/0x180 > kernel: [ 846.851203] ? exit_to_user_mode_prepare+0x30/0xb0 > kernel: [ 846.851213] __x64_sys_ioctl+0xa0/0xe0 > kernel: [ 846.851221] do_syscall_64+0x5b/0x90 > kernel: [ 846.851229] ? exc_page_fault+0x91/0x1b0 > kernel: [ 846.851236] entry_SYSCALL_64_after_hwframe+0x72/0xdc > kernel: [ 846.851243] RIP: 0033:0x7fbf339119ef > kernel: [ 846.851249] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 > ORIG_RAX: 0000000000000010 > kernel: [ 846.851255] RAX: ffffffffffffffda RBX: 0000000000000003 > RCX: 00007fbf339119ef > kernel: [ 846.851259] RDX: 0000000000000000 RSI: 0000000000009408 > RDI: 0000000000000003 > kernel: [ 846.851263] RBP: 0000000000000007 R08: 0000000000000000 > R09: 0000000000000000 > kernel: [ 846.851266] R10: 0000000000000000 R11: 0000000000000246 > R12: 00007fbf339f642c > kernel: [ 846.851269] R13: 0000000000000001 R14: 0000557384b29578 > R15: 0000000000000000 > kernel: [ 846.851277] </TASK> > kernel: [ 967.681770] INFO: task btrfs-uuid:4834 blocked for more > than 241 seconds. > kernel: [ 967.681818] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 967.681852] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 967.681884] task:btrfs-uuid state:D stack:0 > pid:4834 ppid:2 flags:0x00004000 > kernel: [ 967.681895] Call Trace: > kernel: [ 967.681899] <TASK> > kernel: [ 967.681907] __schedule+0x2aa/0x610 > kernel: [ 967.681922] schedule+0x63/0x110 > kernel: [ 967.681931] wait_current_trans+0x100/0x160 [btrfs] > kernel: [ 967.682070] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 967.682082] start_transaction+0x28b/0x600 [btrfs] > kernel: [ 967.682219] btrfs_start_transaction+0x1e/0x30 [btrfs] > kernel: [ 967.682353] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > kernel: [ 967.682519] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > kernel: [ 967.682645] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > kernel: [ 967.682728] kthread+0xe9/0x110 > kernel: [ 967.682734] ? __pfx_kthread+0x10/0x10 > kernel: [ 967.682739] ret_from_fork+0x2c/0x50 > kernel: [ 967.682746] </TASK> > kernel: [ 967.682749] INFO: task btrfs:4850 blocked for more than 241 seconds. > kernel: [ 967.682771] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 967.682793] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 967.682815] task:btrfs state:D stack:0 > pid:4850 ppid:4849 flags:0x00000002 > kernel: [ 967.682820] Call Trace: > kernel: [ 967.682822] <TASK> > kernel: [ 967.682824] __schedule+0x2aa/0x610 > kernel: [ 967.682829] schedule+0x63/0x110 > kernel: [ 967.682832] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > kernel: [ 967.682918] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 967.682923] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > kernel: [ 967.682999] btrfs_ioctl+0x643/0x14d0 [btrfs] > kernel: [ 967.683085] ? putname+0x5d/0x80 > kernel: [ 967.683091] ? do_sys_openat2+0xab/0x180 > kernel: [ 967.683096] ? exit_to_user_mode_prepare+0x30/0xb0 > kernel: [ 967.683103] __x64_sys_ioctl+0xa0/0xe0 > kernel: [ 967.683107] do_syscall_64+0x5b/0x90 > kernel: [ 967.683112] ? exc_page_fault+0x91/0x1b0 > kernel: [ 967.683116] entry_SYSCALL_64_after_hwframe+0x72/0xdc > kernel: [ 967.683121] RIP: 0033:0x7fbf339119ef > kernel: [ 967.683124] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 > ORIG_RAX: 0000000000000010 > kernel: [ 967.683128] RAX: ffffffffffffffda RBX: 0000000000000003 > RCX: 00007fbf339119ef > kernel: [ 967.683130] RDX: 0000000000000000 RSI: 0000000000009408 > RDI: 0000000000000003 > kernel: [ 967.683132] RBP: 0000000000000007 R08: 0000000000000000 > R09: 0000000000000000 > kernel: [ 967.683134] R10: 0000000000000000 R11: 0000000000000246 > R12: 00007fbf339f642c > kernel: [ 967.683136] R13: 0000000000000001 R14: 0000557384b29578 > R15: 0000000000000000 > kernel: [ 967.683141] </TASK> > kernel: [ 1088.519959] INFO: task btrfs-uuid:4834 blocked for more > than 362 seconds. > kernel: [ 1088.520006] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 1088.520039] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 1088.520071] task:btrfs-uuid state:D stack:0 > pid:4834 ppid:2 flags:0x00004000 > kernel: [ 1088.520082] Call Trace: > kernel: [ 1088.520087] <TASK> > kernel: [ 1088.520094] __schedule+0x2aa/0x610 > kernel: [ 1088.520108] schedule+0x63/0x110 > kernel: [ 1088.520117] wait_current_trans+0x100/0x160 [btrfs] > kernel: [ 1088.520257] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 1088.520269] start_transaction+0x28b/0x600 [btrfs] > kernel: [ 1088.520406] btrfs_start_transaction+0x1e/0x30 [btrfs] > kernel: [ 1088.520539] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > kernel: [ 1088.520706] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > kernel: [ 1088.520834] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > kernel: [ 1088.520961] kthread+0xe9/0x110 > kernel: [ 1088.520969] ? __pfx_kthread+0x10/0x10 > kernel: [ 1088.520977] ret_from_fork+0x2c/0x50 > kernel: [ 1088.520987] </TASK> > kernel: [ 1088.520990] INFO: task btrfs:4850 blocked for more than 362 seconds. > kernel: [ 1088.521021] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 1088.521052] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 1088.521084] task:btrfs state:D stack:0 > pid:4850 ppid:4849 flags:0x00000002 > kernel: [ 1088.521092] Call Trace: > kernel: [ 1088.521095] <TASK> > kernel: [ 1088.521098] __schedule+0x2aa/0x610 > kernel: [ 1088.521106] schedule+0x63/0x110 > kernel: [ 1088.521111] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > kernel: [ 1088.521245] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 1088.521254] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > kernel: [ 1088.521372] btrfs_ioctl+0x643/0x14d0 [btrfs] > kernel: [ 1088.521530] ? putname+0x5d/0x80 > kernel: [ 1088.521539] ? do_sys_openat2+0xab/0x180 > kernel: [ 1088.521548] ? exit_to_user_mode_prepare+0x30/0xb0 > kernel: [ 1088.521559] __x64_sys_ioctl+0xa0/0xe0 > kernel: [ 1088.521567] do_syscall_64+0x5b/0x90 > kernel: [ 1088.521575] ? exc_page_fault+0x91/0x1b0 > kernel: [ 1088.521582] entry_SYSCALL_64_after_hwframe+0x72/0xdc > kernel: [ 1088.521589] RIP: 0033:0x7fbf339119ef > kernel: [ 1088.521595] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 > ORIG_RAX: 0000000000000010 > kernel: [ 1088.521602] RAX: ffffffffffffffda RBX: 0000000000000003 > RCX: 00007fbf339119ef > kernel: [ 1088.521606] RDX: 0000000000000000 RSI: 0000000000009408 > RDI: 0000000000000003 > kernel: [ 1088.521610] RBP: 0000000000000007 R08: 0000000000000000 > R09: 0000000000000000 > kernel: [ 1088.521613] R10: 0000000000000000 R11: 0000000000000246 > R12: 00007fbf339f642c > kernel: [ 1088.521616] R13: 0000000000000001 R14: 0000557384b29578 > R15: 0000000000000000 > kernel: [ 1088.521626] </TASK> > kernel: [ 1209.357423] INFO: task btrfs-uuid:4834 blocked for more > than 483 seconds. > kernel: [ 1209.357473] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 1209.357507] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 1209.357540] task:btrfs-uuid state:D stack:0 > pid:4834 ppid:2 flags:0x00004000 > kernel: [ 1209.357551] Call Trace: > kernel: [ 1209.357555] <TASK> > kernel: [ 1209.357563] __schedule+0x2aa/0x610 > kernel: [ 1209.357577] schedule+0x63/0x110 > kernel: [ 1209.357597] wait_current_trans+0x100/0x160 [btrfs] > kernel: [ 1209.357738] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 1209.357750] start_transaction+0x28b/0x600 [btrfs] > kernel: [ 1209.357887] btrfs_start_transaction+0x1e/0x30 [btrfs] > kernel: [ 1209.358021] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > kernel: [ 1209.358187] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > kernel: [ 1209.358315] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > kernel: [ 1209.358442] kthread+0xe9/0x110 > kernel: [ 1209.358451] ? __pfx_kthread+0x10/0x10 > kernel: [ 1209.358458] ret_from_fork+0x2c/0x50 > kernel: [ 1209.358468] </TASK> > kernel: [ 1330.195147] INFO: task btrfs-transacti:4088 blocked for > more than 120 seconds. > kernel: [ 1330.195192] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 1330.195221] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 1330.195250] task:btrfs-transacti state:D stack:0 > pid:4088 ppid:2 flags:0x00004000 > kernel: [ 1330.195259] Call Trace: > kernel: [ 1330.195263] <TASK> > kernel: [ 1330.195269] __schedule+0x2aa/0x610 > kernel: [ 1330.195281] schedule+0x63/0x110 > kernel: [ 1330.195288] wait_for_commit+0x14c/0x1b0 [btrfs] > kernel: [ 1330.195413] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 1330.195424] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] > kernel: [ 1330.195552] ? start_transaction+0xc8/0x600 [btrfs] > kernel: [ 1330.195676] transaction_kthread+0x14b/0x1c0 [btrfs] > kernel: [ 1330.195795] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > kernel: [ 1330.195912] kthread+0xe9/0x110 > kernel: [ 1330.195920] ? __pfx_kthread+0x10/0x10 > kernel: [ 1330.195927] ret_from_fork+0x2c/0x50 > kernel: [ 1330.195937] </TASK> > kernel: [ 1330.195939] INFO: task btrfs-uuid:4834 blocked for more > than 604 seconds. > kernel: [ 1330.195968] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 1330.195997] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 1330.196026] task:btrfs-uuid state:D stack:0 > pid:4834 ppid:2 flags:0x00004000 > kernel: [ 1330.196033] Call Trace: > kernel: [ 1330.196036] <TASK> > kernel: [ 1330.196039] __schedule+0x2aa/0x610 > kernel: [ 1330.196046] schedule+0x63/0x110 > kernel: [ 1330.196051] wait_current_trans+0x100/0x160 [btrfs] > kernel: [ 1330.196169] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 1330.196177] start_transaction+0x28b/0x600 [btrfs] > kernel: [ 1330.196298] btrfs_start_transaction+0x1e/0x30 [btrfs] > kernel: [ 1330.196416] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > kernel: [ 1330.196565] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > kernel: [ 1330.196680] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > kernel: [ 1330.196794] kthread+0xe9/0x110 > kernel: [ 1330.196800] ? __pfx_kthread+0x10/0x10 > kernel: [ 1330.196807] ret_from_fork+0x2c/0x50 > kernel: [ 1330.196814] </TASK> > kernel: [ 1451.031238] INFO: task btrfs-transacti:4088 blocked for > more than 241 seconds. > kernel: [ 1451.031286] Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > kernel: [ 1451.031319] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kernel: [ 1451.031352] task:btrfs-transacti state:D stack:0 > pid:4088 ppid:2 flags:0x00004000 > kernel: [ 1451.031362] Call Trace: > kernel: [ 1451.031366] <TASK> > kernel: [ 1451.031373] __schedule+0x2aa/0x610 > kernel: [ 1451.031388] schedule+0x63/0x110 > kernel: [ 1451.031396] wait_for_commit+0x14c/0x1b0 [btrfs] > kernel: [ 1451.031535] ? __pfx_autoremove_wake_function+0x10/0x10 > kernel: [ 1451.031548] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] > kernel: [ 1451.031684] ? start_transaction+0xc8/0x600 [btrfs] > kernel: [ 1451.031819] transaction_kthread+0x14b/0x1c0 [btrfs] > kernel: [ 1451.031951] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > kernel: [ 1451.032082] kthread+0xe9/0x110 > kernel: [ 1451.032091] ? __pfx_kthread+0x10/0x10 > kernel: [ 1451.032098] ret_from_fork+0x2c/0x50 > kernel: [ 1451.032108] </TASK> > > On Mon, 26 Jun 2023 at 19:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > > > > > On 2023/6/24 23:29, Stefan N wrote: > > > Whoops, I had left --dry-run on the first debug patch you commited, so > > > that didn't run correctly. > > > > > > I've included the output from both patches, as they result in different output. > > > > > > Rerunning the older patch first, with loop devices (I tried both > > > 4x100mb and 4x1gb) I get the following: > > > > > [...] > > > *** The below is using the newer patch as follows: > > > $ diff fs/btrfs/ ../linux-6.2.0-dist/fs/btrfs/ > > > diff fs/btrfs/ioctl.c ../linux-6.2.0-dist/fs/btrfs/ioctl.c > > > 2656,2658d2655 > > > < else > > > < btrfs_err(fs_info, "failed to add disk %s: %d", > > > < vol_args->name, ret); > > > diff fs/btrfs/transaction.c ../linux-6.2.0-dist/fs/btrfs/transaction.c > > > 1029d1028 > > > < /* > > > 1031d1029 > > > < */ > > > diff fs/btrfs/volumes.c ../linux-6.2.0-dist/fs/btrfs/volumes.c > > > 2677c2677 > > > < trans = btrfs_join_transaction(root); > > > --- > > >> trans = btrfs_start_transaction(root, 0); > > > 2680d2679 > > > < btrfs_err(fs_info, "failed to start trans: %d", ret); > > > 2769d2767 > > > < btrfs_err(fs_info, "failed to add dev item: %d", ret); > > > 2787,2789c2785 > > > < ret = btrfs_end_transaction(trans); > > > < if (ret < 0) > > > < btrfs_err(fs_info, "failed to end trans: %d", ret); > > > --- > > >> ret = btrfs_commit_transaction(trans); > > > $ > > > > > > $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > > dev add -K -f /dev/loop12 /dev/loop13 /dev/loop14 /dev/loop15 > > > /mnt/data ; sudo btrfs fi sync /mnt/data > > > ERROR: Could not sync filesystem: No space left on device > > > > Is it the same even with 4x1GiB loopback devices? > > > > > $ > > > > > > kernel: [ 1811.846087] BTRFS info (device sdc): using crc32c > > > (crc32c-intel) checksum algorithm > > > kernel: [ 1811.846107] BTRFS info (device sdc): disk space caching is enabled > > > kernel: [ 1817.852850] BTRFS info (device sdc): bdev /dev/sde errs: wr > > > 0, rd 0, flush 0, corrupt 845, gen 0 > > > kernel: [ 1817.852866] BTRFS info (device sdc): bdev /dev/sda errs: wr > > > 41089, rd 1556, flush 0, corrupt 0, gen 0 > > > kernel: [ 1817.852877] BTRFS info (device sdc): bdev /dev/sdh errs: wr > > > 3, rd 7, flush 0, corrupt 0, gen 0 > > > kernel: [ 1817.852884] BTRFS info (device sdc): bdev /dev/sdd errs: wr > > > 41, rd 0, flush 0, corrupt 0, gen 0 > > > kernel: [ 2037.562050] BTRFS info (device sdc): balance: resume skipped > > > kernel: [ 2037.562064] BTRFS info (device sdc): checking UUID tree > > > kernel: [ 2037.581550] BTRFS info (device sdc): disk added /dev/loop12 > > > kernel: [ 2037.591163] BTRFS info (device sdc): disk added /dev/loop13 > > > kernel: [ 2037.599477] BTRFS info (device sdc): disk added /dev/loop14 > > > kernel: [ 2037.607064] BTRFS info (device sdc): disk added /dev/loop15 > > > kernel: [ 2176.124630] INFO: task btrfs:7783 blocked for more than 120 seconds. > > > kernel: [ 2176.124678] Tainted: G W O > > > 6.2.0-23-generic #23+btrdebug2c > > > kernel: [ 2176.124710] "echo 0 > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > > kernel: [ 2176.124742] task:btrfs state:D stack:0 > > > pid:7783 ppid:7782 flags:0x00004002 > > > kernel: [ 2176.124753] Call Trace: > > > kernel: [ 2176.124758] <TASK> > > > kernel: [ 2176.124765] __schedule+0x2aa/0x610 > > > kernel: [ 2176.124780] schedule+0x63/0x110 > > > kernel: [ 2176.124788] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > > > > This means we're doing the real work, but it seems to take too long. > > > > In fact this is already looking promising as we have when through the > > whole device add part. > > > > Just need to let the final commit to finish. > > > > > kernel: [ 2176.124929] ? __pfx_autoremove_wake_function+0x10/0x10 > > > kernel: [ 2176.124941] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > > > kernel: [ 2176.125060] btrfs_ioctl+0x643/0x14d0 [btrfs] > > > kernel: [ 2176.125225] __x64_sys_ioctl+0xa0/0xe0 > > > kernel: [ 2176.125235] do_syscall_64+0x5b/0x90 > > > kernel: [ 2176.125242] ? do_sys_openat2+0xab/0x180 > > > kernel: [ 2176.125251] ? exit_to_user_mode_prepare+0x30/0xb0 > > > kernel: [ 2176.125260] ? syscall_exit_to_user_mode+0x29/0x50 > > > kernel: [ 2176.125268] ? do_syscall_64+0x67/0x90 > > > kernel: [ 2176.125275] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > > kernel: [ 2176.125282] RIP: 0033:0x7f2e8eb119ef > > > kernel: [ 2176.125288] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > > > ORIG_RAX: 0000000000000010 > > > kernel: [ 2176.125295] RAX: ffffffffffffffda RBX: 0000000000000003 > > > RCX: 00007f2e8eb119ef > > > kernel: [ 2176.125300] RDX: 0000000000000000 RSI: 0000000000009408 > > > RDI: 0000000000000003 > > > kernel: [ 2176.125303] RBP: 0000000000000007 R08: 0000000000000000 > > > R09: 0000000000000000 > > > kernel: [ 2176.125306] R10: 0000000000000000 R11: 0000000000000246 > > > R12: 00007f2e8ebf642c > > > kernel: [ 2176.125310] R13: 0000000000000001 R14: 000055cdb7940578 > > > R15: 0000000000000000 > > > kernel: [ 2176.125318] </TASK> > > > kernel: [ 2296.956781] INFO: task btrfs:7783 blocked for more than 241 seconds. > > > kernel: [ 2296.956824] Tainted: G W O > > > 6.2.0-23-generic #23+btrdebug2c > > > kernel: [ 2296.956856] "echo 0 > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > > kernel: [ 2296.956887] task:btrfs state:D stack:0 > > > pid:7783 ppid:7782 flags:0x00004002 > > > kernel: [ 2296.956898] Call Trace: > > > kernel: [ 2296.956902] <TASK> > > > kernel: [ 2296.956908] __schedule+0x2aa/0x610 > > > kernel: [ 2296.956921] schedule+0x63/0x110 > > > kernel: [ 2296.956928] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > > > kernel: [ 2296.957069] ? __pfx_autoremove_wake_function+0x10/0x10 > > > kernel: [ 2296.957080] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > > > kernel: [ 2296.957200] btrfs_ioctl+0x643/0x14d0 [btrfs] > > > kernel: [ 2296.957366] __x64_sys_ioctl+0xa0/0xe0 > > > kernel: [ 2296.957375] do_syscall_64+0x5b/0x90 > > > kernel: [ 2296.957383] ? do_sys_openat2+0xab/0x180 > > > kernel: [ 2296.957391] ? exit_to_user_mode_prepare+0x30/0xb0 > > > kernel: [ 2296.957399] ? syscall_exit_to_user_mode+0x29/0x50 > > > kernel: [ 2296.957407] ? do_syscall_64+0x67/0x90 > > > kernel: [ 2296.957414] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > > kernel: [ 2296.957420] RIP: 0033:0x7f2e8eb119ef > > > kernel: [ 2296.957426] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > > > ORIG_RAX: 0000000000000010 > > > kernel: [ 2296.957433] RAX: ffffffffffffffda RBX: 0000000000000003 > > > RCX: 00007f2e8eb119ef > > > kernel: [ 2296.957438] RDX: 0000000000000000 RSI: 0000000000009408 > > > RDI: 0000000000000003 > > > kernel: [ 2296.957441] RBP: 0000000000000007 R08: 0000000000000000 > > > R09: 0000000000000000 > > > kernel: [ 2296.957444] R10: 0000000000000000 R11: 0000000000000246 > > > R12: 00007f2e8ebf642c > > > kernel: [ 2296.957448] R13: 0000000000000001 R14: 000055cdb7940578 > > > R15: 0000000000000000 > > > kernel: [ 2296.957468] </TASK> > > > kernel: [ 2314.043258] ------------[ cut here ]------------ > > > kernel: [ 2314.043264] BTRFS: Transaction aborted (error -28) > > > kernel: [ 2314.043334] WARNING: CPU: 2 PID: 7739 at > > > fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > > > [btrfs] > > > kernel: [ 2314.043467] Modules linked in: ipmi_devintf ipmi_msghandler > > > overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr > > > snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > > > intel_rapl_common snd_hda_codec_hdmi edac_mce_amd snd_hda_intel > > > snd_intel_dspcfg kvm_amd snd_intel_sdw_acpi snd_hda_codec kvm > > > snd_hda_core snd_hwdep snd_pcm snd_timer irqbypass rapl wmi_bmof snd > > > k10temp ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac > > > scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore auth_rpcgss > > > nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs > > > blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq > > > async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear > > > amdgpu iommu_v2 drm_buddy gpu_sched drm_ttm_helper hid_generic ttm > > > drm_display_helper cec uas rc_core usbhid hid drm_kms_helper > > > crct10dif_pclmul syscopyarea usb_storage crc32_pclmul polyval_clmulni > > > sysfillrect polyval_generic sysimgblt nvme ghash_clmulni_intel > > > sha512_ssse3 > > > kernel: [ 2314.043599] nvme_core aesni_intel crypto_simd mpt3sas drm > > > cryptd raid_class ahci i2c_piix4 scsi_transport_sas nvme_common igb > > > xhci_pci qlcnic dca xhci_pci_renesas libahci i2c_algo_bit video wmi > > > kernel: [ 2314.043631] CPU: 2 PID: 7739 Comm: btrfs-transacti Tainted: > > > G W O 6.2.0-23-generic #23+btrdebug2c > > > kernel: [ 2314.043638] Hardware name: To Be Filled By O.E.M. X570M > > > Pro4/X570M Pro4, BIOS P3.70 02/23/2022 > > > kernel: [ 2314.043641] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > > > kernel: [ 2314.043766] Code: ce 0f 0b eb b8 44 89 e6 48 c7 c7 a8 39 a0 > > > c1 e8 2c d5 1e ce 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 a8 39 a0 c1 > > > e8 16 d5 1e ce <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 > > > 90 90 90 90 > > > kernel: [ 2314.043771] RSP: 0018:ffffad0b11b7bb38 EFLAGS: 00010246 > > > kernel: [ 2314.043777] RAX: 0000000000000000 RBX: ffff9c80e40e8f08 > > > RCX: 0000000000000000 > > > kernel: [ 2314.043781] RDX: 0000000000000000 RSI: 0000000000000000 > > > RDI: 0000000000000000 > > > kernel: [ 2314.043784] RBP: ffffad0b11b7bb60 R08: 0000000000000000 > > > R09: 0000000000000000 > > > kernel: [ 2314.043787] R10: 0000000000000000 R11: 0000000000000000 > > > R12: 00000000ffffffe4 > > > kernel: [ 2314.043790] R13: 00005e4c359ba000 R14: 0000000000020000 > > > R15: ffff9c824d9a58c0 > > > kernel: [ 2314.043794] FS: 0000000000000000(0000) > > > GS:ffff9c87a0a80000(0000) knlGS:0000000000000000 > > > kernel: [ 2314.043798] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > kernel: [ 2314.043802] CR2: 00007f54adc86000 CR3: 00000001471d8000 > > > CR4: 00000000003506e0 > > > kernel: [ 2314.043806] Call Trace: > > > kernel: [ 2314.043809] <TASK> > > > kernel: [ 2314.043815] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > > > kernel: [ 2314.043943] run_delayed_data_ref+0x8b/0x180 [btrfs] > > > kernel: [ 2314.044068] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > > > kernel: [ 2314.044192] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > > > kernel: [ 2314.044316] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > > > kernel: [ 2314.044439] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > > > kernel: [ 2314.044598] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > > > kernel: [ 2314.044754] ? start_transaction+0xc8/0x600 [btrfs] > > > kernel: [ 2314.044890] transaction_kthread+0x14b/0x1c0 [btrfs] > > > kernel: [ 2314.045021] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > > > kernel: [ 2314.045151] kthread+0xe9/0x110 > > > kernel: [ 2314.045162] ? __pfx_kthread+0x10/0x10 > > > kernel: [ 2314.045170] ret_from_fork+0x2c/0x50 > > > kernel: [ 2314.045180] </TASK> > > > kernel: [ 2314.045182] ---[ end trace 0000000000000000 ]--- > > > kernel: [ 2314.045186] BTRFS info (device sdc: state A): dumping space info: > > > kernel: [ 2314.045191] BTRFS info (device sdc: state A): space_info > > > DATA has 160777674752 free, is not full > > > kernel: [ 2314.045197] BTRFS info (device sdc: state A): space_info > > > total=71201958395904, used=71013439856640, pinned=27737325568, > > > reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > > > kernel: [ 2314.045205] BTRFS info (device sdc: state A): space_info > > > METADATA has -429047808 free, is full > > > > This means we need at least 500+ MiB metadata space. > > > > Thus you may want to try 4x1GiB to see if this makes any difference. > > > > Thanks, > > Qu > > > kernel: [ 2314.045209] BTRFS info (device sdc: state A): space_info > > > total=83634421760, used=82789777408, pinned=244891648, > > > reserved=599687168, may_use=429047808, readonly=65536 zone_unusable=0 > > > kernel: [ 2314.045217] BTRFS info (device sdc: state A): space_info > > > SYSTEM has 33390592 free, is not full > > > kernel: [ 2314.045221] BTRFS info (device sdc: state A): space_info > > > total=38797312, used=5373952, pinned=16384, reserved=16384, may_use=0, > > > readonly=0 zone_unusable=0 > > > kernel: [ 2314.045227] BTRFS info (device sdc: state A): > > > global_block_rsv: size 536870912 reserved 428523520 > > > kernel: [ 2314.045231] BTRFS info (device sdc: state A): > > > trans_block_rsv: size 524288 reserved 524288 > > > kernel: [ 2314.045235] BTRFS info (device sdc: state A): > > > chunk_block_rsv: size 0 reserved 0 > > > kernel: [ 2314.045239] BTRFS info (device sdc: state A): > > > delayed_block_rsv: size 0 reserved 0 > > > kernel: [ 2314.045242] BTRFS info (device sdc: state A): > > > delayed_refs_rsv: size 249756909568 reserved 0 > > > kernel: [ 2314.045251] BTRFS: error (device sdc: state A) in > > > do_free_extent_accounting:2847: errno=-28 No space left > > > kernel: [ 2314.045265] BTRFS warning (device sdc: state A): > > > btrfs_uuid_scan_kthread failed -28 > > > kernel: [ 2314.045295] BTRFS info (device sdc: state EA): forced readonly > > > kernel: [ 2314.045300] BTRFS error (device sdc: state EA): failed to > > > run delayed ref for logical 103681409916928 num_bytes 131072 type 184 > > > action 2 ref_mod 1: -28 > > > kernel: [ 2314.045360] BTRFS: error (device sdc: state EA) in > > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > > kernel: [ 2314.049204] BTRFS: error (device sdc: state EA) in > > > btrfs_create_pending_block_groups:2487: errno=-28 No space left > > > kernel: [ 2314.049331] BTRFS: error (device sdc: state EA) in > > > btrfs_create_pending_block_groups:2499: errno=-28 No space left > > > kernel: [ 2314.053259] BTRFS: error (device sdc: state EA) in > > > do_free_extent_accounting:2847: errno=-28 No space left > > > kernel: [ 2314.053318] BTRFS error (device sdc: state EA): failed to > > > run delayed ref for logical 103681419366400 num_bytes 131072 type 184 > > > action 2 ref_mod 1: -28 > > > kernel: [ 2314.053375] BTRFS: error (device sdc: state EA) in > > > btrfs_run_delayed_refs:2151: errno=-28 No space left > > > kernel: [ 2314.053430] BTRFS warning (device sdc: state EA): Skipping > > > commit of aborted transaction. > > > kernel: [ 2314.053435] BTRFS: error (device sdc: state EA) in > > > cleanup_transaction:1986: errno=-28 No space left > > > > > > > > > > > > On Fri, 23 Jun 2023 at 19:16, Qu Wenruo <wqu@suse.com> wrote: > > >> > > >> > > >> > > >> On 2023/6/23 17:00, Stefan N wrote: > > >>> Apologies, I thought I included the log output too, though I can't see > > >>> any additional output > > >>> > > >>> From a fresh run, still using the same kernel > > >>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > >>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > > >>> fi sync /mnt/data > > >>> ERROR: error adding device '/dev/sdl': Input/output error > > >>> ERROR: error adding device '/dev/sdm': Read-only file system > > >>> ERROR: error adding device '/dev/sdn': Read-only file system > > >>> ERROR: error adding device '/dev/sdo': Read-only file system > > >>> ERROR: Could not sync filesystem: Read-only file system > > >>> $ > > >>> > > >>> Output from kern.log, syslog or dmesg -k > > >>> > > >> [...] > > >> > > >> None of the newly added debug lines triggered, so there is something > > >> else causing the problem. > > >> > > >> And furthermore the backtrace is not that helpful, it only shows it's > > >> some async metadata reclaim kthread causing the problem. > > >> > > >> Although I guess the async metadata reclaim is triggered by the > > >> btrfs_start_transaction() call when adding a device. > > >> So I updated my github branch to go btrfs_join_transaction() which would > > >> not flush any metadata, thus avoid the problem. > > >> > > >> Would you please give it a try again? > > >> > > >>> > > >>> However, now I started digging into logs to check I hadn't missed > > >>> where the errors were being logged, I've found this from roughly a > > >>> week before I started having issues, which I had not previously > > >>> noticed > > >> > > >> You don't need to bother most error messages after the fs flipped RO. > > >> As it's known to have some false alerts. > > >> > > >> Thanks, > > >> Qu > > >> > > >>> [ 1990.495861] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 107988943355904 num_bytes 245760 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 1990.518282] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 107989043494912 num_bytes 245760 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 620.104065] BTRFS error (device sdk): failed to run delayed ref for > > >>> logical 123187655077888 num_bytes 176128 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 620.126209] BTRFS error (device sdk): failed to run delayed ref for > > >>> logical 123190279929856 num_bytes 134217728 type 184 action 2 ref_mod > > >>> 1: -28 > > >>> [ 620.126241] BTRFS error (device sdk): failed to run delayed ref for > > >>> logical 123189970468864 num_bytes 134217728 type 184 action 2 ref_mod > > >>> 1: -28 > > >>> [ 620.126271] BTRFS error (device sdk): failed to run delayed ref for > > >>> logical 123190414409728 num_bytes 134217728 type 184 action 2 ref_mod > > >>> 1: -28 > > >>> [ 476.565308] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101906434228224 num_bytes 651264 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 476.565932] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101906434031616 num_bytes 180224 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 447.371754] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101946151927808 num_bytes 262144 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 447.372362] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101946083725312 num_bytes 245760 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 439.839007] BTRFS error (device sdj): failed to run delayed ref for > > >>> logical 101923102179328 num_bytes 192512 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 439.839578] BTRFS error (device sdj): failed to run delayed ref for > > >>> logical 101923401629696 num_bytes 245760 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 466.393884] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101981116137472 num_bytes 245760 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 466.394451] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101981122854912 num_bytes 1720320 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 431.541367] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101876426952704 num_bytes 126976 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 431.542010] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101876427780096 num_bytes 126976 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 597.487948] BTRFS error (device sdj): failed to run delayed ref for > > >>> logical 108127459409920 num_bytes 196608 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 597.488539] BTRFS error (device sdj): failed to run delayed ref for > > >>> logical 108124677865472 num_bytes 126976 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 534.717509] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101958618710016 num_bytes 1597440 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 534.718494] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 101958756335616 num_bytes 368640 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 508.089394] BTRFS error (device sdk): failed to run delayed ref for > > >>> logical 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 508.090007] BTRFS error (device sdk): failed to run delayed ref for > > >>> logical 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 1632.112084] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 102203759886336 num_bytes 229376 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> [ 1632.112885] BTRFS error (device sdh): failed to run delayed ref for > > >>> logical 102203764379648 num_bytes 126976 type 184 action 2 ref_mod 1: > > >>> -28 > > >>> > > >>> and today, when leaving the disks mounted read-only for a while, I > > >>> found many occurances similar to: > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201329754554368 mirror 1 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201329754554368 mirror 2 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201329754554368 mirror 3 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201329754554368 mirror 4 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201329754554368 mirror 1 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201329754554368 mirror 2 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201329754554368 mirror 3 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201350830227456 mirror 4 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201350830227456 mirror 1 wanted 2 found 0 > > >>> BTRFS error (device sdc: state EA): level verify failed on logical > > >>> 201350830227456 mirror 2 wanted 2 found 0 > > >>> > > >>> On Fri, 23 Jun 2023 at 10:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > >>>> > > >>>> > > >>>> > > >>>> On 2023/6/23 06:18, Stefan N wrote: > > >>>>> Hi Qu, > > >>>>> > > >>>>> I got one new line this time, but it doesn't seem to match your commit > > >>>>> ERROR: zoned: unable to stat /dev/loop/13 > > >>>> > > >>>> Please provide the dmesg of that attempt, as all the extra debug info is > > >>>> inside dmesg. > > >>>> > > >>>> With that info provided, we can determine what to do next. > > >>>> > > >>>> Thanks, > > >>>> Qu > > >>>> > > >>>>> > > >>>>> I tried it on the USB flash drives too and didn't get any extra line > > >>>>> > > >>>>> In context > > >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > >>>>> dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 > > >>>>> /mnt/data ; sudo btrfs fi sync /mnt/data > > >>>>> ERROR: error adding device '/dev/loop12': Input/output error > > >>>>> ERROR: zoned: unable to stat /dev/loop/13 > > >>>>> ERROR: checking status of /dev/loop/13: No such file or directory > > >>>>> ERROR: error adding device '/dev/loop14': Read-only file system > > >>>>> ERROR: error adding device '/dev/loop15': Read-only file system > > >>>>> ERROR: Could not sync filesystem: Read-only file system > > >>>>> $ > > >>>>> > > >>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > >>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > > >>>>> fi sync /mnt/data > > >>>>> ERROR: error adding device '/dev/sdl': Input/output error > > >>>>> ERROR: error adding device '/dev/sdm': Read-only file system > > >>>>> ERROR: error adding device '/dev/sdn': Read-only file system > > >>>>> ERROR: error adding device '/dev/sdo': Read-only file system > > >>>>> ERROR: Could not sync filesystem: Read-only file system > > >>>>> $ > > >>>>> > > >>>>> On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> On 2023/6/22 16:33, Stefan N wrote: > > >>>>>>> Hi Qu, > > >>>>>>> > > >>>>>>> Many thanks for the detailed instructions and your patience. I got it > > >>>>>>> working combined with > > >>>>>>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system > > >>>>>>> OS instead, tagged +btrfix > > >>>>>>> $ uname -vr > > >>>>>>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 > > >>>>>>> > > >>>>>>> However, I've not had luck with the commands suggested, and would > > >>>>>>> appreciate any further ideas. > > >>>>>>> > > >>>>>>> Outputs follow below, with /mnt/data as the btrfs mount point that > > >>>>>>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB > > >>>>>>> flash drives being added sd[l-o] > > >>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > >>>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs > > >>>>>>> fi sync /mnt/data > > >>>>>>> ERROR: error adding device '/dev/sdl': Input/output error > > >>>>>>> ERROR: error adding device '/dev/sdm': Read-only file system > > >>>>>>> ERROR: error adding device '/dev/sdn': Read-only file system > > >>>>>>> ERROR: error adding device '/dev/sdo': Read-only file system > > >>>>>>> ERROR: Could not sync filesystem: Read-only file system > > >>>>>>> $ > > >>>>>>> > > >>>>>>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so > > >>>>>>> they're super quick to zero); > > >>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs > > >>>>>>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 > > >>>>>>> /mnt/data ; sudo btrfs fi sync /mnt/data > > >>>>>>> ERROR: error adding device '/dev/loop16': Input/output error > > >>>>>> > > >>>>>> This is the interesting part, this means we're erroring out due to -EIO > > >>>>>> (not -ENOSPC) during the first device add. > > >>>>>> > > >>>>>> And by somehow, after the first device add, we already got the trans abort. > > >>>>>> > > >>>>>> Would you please try the following branch? > > >>>>>> > > >>>>>> https://github.com/adam900710/linux/tree/dev_add_no_commit > > >>>>>> > > >>>>>> It has not only the patch to skip the commit, but also extra debug > > >>>>>> output for the situation. > > >>>>>> > > >>>>>> Thanks, > > >>>>>> Qu > > >>>>>> > > >>>>>>> ERROR: error adding device '/dev/loop17': Read-only file system > > >>>>>>> ERROR: error adding device '/dev/loop18': Read-only file system > > >>>>>>> ERROR: error adding device '/dev/loop19': Read-only file system > > >>>>>>> ERROR: Could not sync filesystem: Read-only file system > > >>>>>>> $ > > >>>>>>> > > >>>>>>> I confirmed before both these kernel builds that the replaced line was > > >>>>>>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone > > >>>>>>> else following, I needed to remove the -n in the patch command > > >>>>>>> earlier) > > >>>>>>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* > > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c: > > >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- } > > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- > > >>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); > > >>>>>>> -- > > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c: > > >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- } > > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- > > >>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > > >>>>>>> -- > > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c: > > >>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- } > > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- > > >>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); > > >>>>>>> $ > > >>>>>>> > > >>>>>>> $ btrfs fi usage /mnt/data > > >>>>>>> Overall: > > >>>>>>> Device size: 87.31TiB > > >>>>>>> Device allocated: 87.31TiB > > >>>>>>> Device unallocated: 1.94GiB > > >>>>>>> Device missing: 0.00B > > >>>>>>> Device slack: 0.00B > > >>>>>>> Used: 87.08TiB > > >>>>>>> Free (estimated): 173.29GiB (min: 172.33GiB) > > >>>>>>> Free (statfs, df): 171.84GiB > > >>>>>>> Data ratio: 1.34 > > >>>>>>> Metadata ratio: 4.00 > > >>>>>>> Global reserve: 512.00MiB (used: 371.25MiB) > > >>>>>>> Multiple profiles: no > > >>>>>>> > > >>>>>>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) > > >>>>>>> /dev/sdc 10.90TiB > > >>>>>>> /dev/sdf 10.90TiB > > >>>>>>> /dev/sda 10.86TiB > > >>>>>>> /dev/sdg 10.87TiB > > >>>>>>> /dev/sdh 10.86TiB > > >>>>>>> /dev/sdd 10.87TiB > > >>>>>>> /dev/sde 10.88TiB > > >>>>>>> /dev/sdb 10.88TiB > > >>>>>>> > > >>>>>>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) > > >>>>>>> /dev/sdc 15.33GiB > > >>>>>>> /dev/sdf 18.41GiB > > >>>>>>> /dev/sda 49.63GiB > > >>>>>>> /dev/sdg 49.50GiB > > >>>>>>> /dev/sdh 51.52GiB > > >>>>>>> /dev/sdd 48.70GiB > > >>>>>>> /dev/sde 39.09GiB > > >>>>>>> /dev/sdb 39.01GiB > > >>>>>>> > > >>>>>>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) > > >>>>>>> /dev/sdc 1.00MiB > > >>>>>>> /dev/sda 37.00MiB > > >>>>>>> /dev/sdg 37.00MiB > > >>>>>>> /dev/sdh 36.00MiB > > >>>>>>> /dev/sdd 37.00MiB > > >>>>>>> > > >>>>>>> Unallocated: > > >>>>>>> /dev/sdc 1.00MiB > > >>>>>>> /dev/sdf 1.00MiB > > >>>>>>> /dev/sda 1.27GiB > > >>>>>>> /dev/sdg 1.00MiB > > >>>>>>> /dev/sdh 1.00MiB > > >>>>>>> /dev/sdd 687.00MiB > > >>>>>>> /dev/sde 1.00MiB > > >>>>>>> /dev/sdb 1.00MiB > > >>>>>>> $ > > >>>>>>> > > >>>>>>> > > >>>>>>> This first attempt generated the following syslog output: > > >>>>>>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c > > >>>>>>> (crc32c-intel) checksum algorithm > > >>>>>>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled > > >>>>>>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr > > >>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > > >>>>>>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr > > >>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > > >>>>>>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr > > >>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > > >>>>>>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr > > >>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > > >>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > > >>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > > >>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped > > >>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree > > >>>>>>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) > > >>>>>>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in > > >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > > >>>>>>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly > > >>>>>>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to > > >>>>>>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 > > >>>>>>> action 2 ref_mod 1: -28 > > >>>>>>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at > > >>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > > >>>>>>> [btrfs] > > >>>>>>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in > > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > > >>>>>>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): > > >>>>>>> btrfs_uuid_scan_kthread failed -5 > > >>>>>>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth > > >>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > > >>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > > >>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > > >>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > > >>>>>>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd > > >>>>>>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm > > >>>>>>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi > > >>>>>>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer > > >>>>>>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath > > >>>>>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd > > >>>>>>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables > > >>>>>>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov > > >>>>>>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 > > >>>>>>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage > > >>>>>>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti > > >>>>>>> Tainted: G W O 6.2.0-23-generic #23+btrfix > > >>>>>>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > > >>>>>>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > > >>>>>>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] > > >>>>>>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > > >>>>>>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > > >>>>>>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > > >>>>>>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > > >>>>>>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > > >>>>>>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] > > >>>>>>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] > > >>>>>>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] > > >>>>>>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: > > >>>>>>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info > > >>>>>>> DATA has 160777674752 free, is not full > > >>>>>>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info > > >>>>>>> total=71201958395904, used=71018191273984, pinned=22985908224, > > >>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > > >>>>>>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info > > >>>>>>> METADATA has -124944384 free, is full > > >>>>>>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info > > >>>>>>> total=83530612736, used=82791497728, pinned=242745344, > > >>>>>>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 > > >>>>>>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info > > >>>>>>> SYSTEM has 33439744 free, is not full > > >>>>>>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info > > >>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > > >>>>>>> readonly=0 zone_unusable=0 > > >>>>>>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): > > >>>>>>> global_block_rsv: size 536870912 reserved 124944384 > > >>>>>>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): > > >>>>>>> trans_block_rsv: size 0 reserved 0 > > >>>>>>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): > > >>>>>>> chunk_block_rsv: size 0 reserved 0 > > >>>>>>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): > > >>>>>>> delayed_block_rsv: size 0 reserved 0 > > >>>>>>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): > > >>>>>>> delayed_refs_rsv: size 251322957824 reserved 0 > > >>>>>>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in > > >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > > >>>>>>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to > > >>>>>>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 > > >>>>>>> action 2 ref_mod 1: -28 > > >>>>>>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in > > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > > >>>>>>> > > >>>>>>> A couple of kernel recompiles later, the second attempt on the SSD > > >>>>>>> generated similar: > > >>>>>>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c > > >>>>>>> (crc32c-intel) checksum algorithm > > >>>>>>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled > > >>>>>>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr > > >>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > > >>>>>>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr > > >>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > > >>>>>>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr > > >>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > > >>>>>>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr > > >>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > > >>>>>>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped > > >>>>>>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree > > >>>>>>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) > > >>>>>>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at > > >>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > > >>>>>>> [btrfs] > > >>>>>>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth > > >>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink > > >>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo > > >>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc > > >>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc > > >>>>>>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic > > >>>>>>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg > > >>>>>>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core > > >>>>>>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm > > >>>>>>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid > > >>>>>>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd > > >>>>>>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs > > >>>>>>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > > >>>>>>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor > > >>>>>>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid > > >>>>>>> amdgpu uas hid iommu_v2 > > >>>>>>> kernel: [ 1919.452839] Workqueue: events_unbound > > >>>>>>> btrfs_async_reclaim_metadata_space [btrfs] > > >>>>>>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > > >>>>>>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > > >>>>>>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] > > >>>>>>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > > >>>>>>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > > >>>>>>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > > >>>>>>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] > > >>>>>>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] > > >>>>>>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: > > >>>>>>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info > > >>>>>>> DATA has 160778723328 free, is not full > > >>>>>>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info > > >>>>>>> total=71201958395904, used=71017442181120, pinned=23733952512, > > >>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > > >>>>>>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info > > >>>>>>> METADATA has -147570688 free, is full > > >>>>>>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info > > >>>>>>> total=83530612736, used=82792185856, pinned=238059520, > > >>>>>>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 > > >>>>>>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info > > >>>>>>> SYSTEM has 33439744 free, is not full > > >>>>>>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info > > >>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, > > >>>>>>> readonly=0 zone_unusable=0 > > >>>>>>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): > > >>>>>>> global_block_rsv: size 536870912 reserved 147570688 > > >>>>>>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): > > >>>>>>> trans_block_rsv: size 0 reserved 0 > > >>>>>>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): > > >>>>>>> chunk_block_rsv: size 0 reserved 0 > > >>>>>>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): > > >>>>>>> delayed_block_rsv: size 0 reserved 0 > > >>>>>>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): > > >>>>>>> delayed_refs_rsv: size 254292787200 reserved 0 > > >>>>>>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in > > >>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > > >>>>>>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly > > >>>>>>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to > > >>>>>>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 > > >>>>>>> action 2 ref_mod 1: -28 > > >>>>>>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in > > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > > >>>>>>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): > > >>>>>>> btrfs_uuid_scan_kthread failed -5 > > >>>>>>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in > > >>>>>>> __btrfs_free_extent:3077: errno=-28 No space left > > >>>>>>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to > > >>>>>>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 > > >>>>>>> action 2 ref_mod 1: -28 > > >>>>>>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in > > >>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > > >>>>>>> > > >>>>>>> > > >>>>>>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On 2023/6/17 13:11, Stefan N wrote: > > >>>>>>>>> Hi Qu, > > >>>>>>>>> > > >>>>>>>>> I believe I've got this environment ready, with the 6.2.0 kernel as > > >>>>>>>>> before using the Ubuntu kernel, but can switch to vanilla if required. > > >>>>>>>>> > > >>>>>>>>> I've not done anything kernel modifications for a solid decade, so > > >>>>>>>>> would be keen for a bit of guidance. > > >>>>>>>> > > >>>>>>>> Sure no problem. > > >>>>>>>> > > >>>>>>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then > > >>>>>>>> apply the attached one-line patch by: > > >>>>>>>> > > >>>>>>>> $ tar czf linux*.tar.xz > > >>>>>>>> $ cd linux* > > >>>>>>>> $ patch -np1 -i <the patch file> > > >>>>>>>> > > >>>>>>>> Then use your running system kernel config if possible: > > >>>>>>>> > > >>>>>>>> $ cp /proc/config.gz . > > >>>>>>>> $ gunzip config.gz > > >>>>>>>> $ mv config .config > > >>>>>>>> $ make olddefconfig > > >>>>>>>> > > >>>>>>>> Then you can start your kernel compiling, and considering you're using > > >>>>>>>> your distro's default, it would include tons of drivers, thus would be > > >>>>>>>> very slow. (Replace the number to something more suitable to your > > >>>>>>>> system, using all CPU cores can be very hot) > > >>>>>>>> > > >>>>>>>> $ make -j12 > > >>>>>>>> > > >>>>>>>> Finally you need to install the modules/kernel. > > >>>>>>>> > > >>>>>>>> Unfortunately this is distro specific, but if you're using Ubuntu, it > > >>>>>>>> may be much easier: > > >>>>>>>> > > >>>>>>>> $ make bindeb-pkg > > >>>>>>>> > > >>>>>>>> Then install the generated dpkg I guess? I have never tried kernel > > >>>>>>>> building using deb/rpm, but only manual installation, which is also > > >>>>>>>> distro dependent in the initramfs generation part. > > >>>>>>>> > > >>>>>>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom > > >>>>>>>> # make modules_install > > >>>>>>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> The last step is to update your bootloader to add the new kernel, which > > >>>>>>>> is not only distro dependent but also bootloader dependent. > > >>>>>>>> > > >>>>>>>> In my case, I go with systemd-boot with manually crafted entries. > > >>>>>>>> But if you go Ubuntu I believe just installing the kernel dpkg would > > >>>>>>>> have everything handled? > > >>>>>>>> > > >>>>>>>> Finally you can try reboot into the newer kernel, and try device add > > >>>>>>>> (need to add 4 disks), then sync and see if things work as expected. > > >>>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> Qu > > >>>>>>>>> > > >>>>>>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, > > >>>>>>>>> but failing this will use 4x loop devices. > > >>>>>>>>> > > >>>>>>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > >>>>>>>>>> In your particular case, since you're running RAID1C4 you need to add 4 > > >>>>>>>>>> devices in one transaction. > > >>>>>>>>>> > > >>>>>>>>>> I can easily craft a patch to avoid commit transaction, but still you'll > > >>>>>>>>>> need to add at least 4 disks, and then sync to see if things would work. > > >>>>>>>>>> > > >>>>>>>>>> Furthermore this means you need a liveCD with full kernel compiling > > >>>>>>>>>> environment. > > >>>>>>>>>> > > >>>>>>>>>> If you want to go this path, I can send you the patch when you've > > >>>>>>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Out of space loop: skip_balance not working 2023-07-22 5:28 ` Stefan N @ 2023-07-22 10:08 ` Qu Wenruo [not found] ` <CA+W5K0oDRo2LZMiUiysYXpcpmfXTvS27hPdjm1pzq4kfq9=vdQ@mail.gmail.com> 0 siblings, 1 reply; 22+ messages in thread From: Qu Wenruo @ 2023-07-22 10:08 UTC (permalink / raw) To: Stefan N; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org On 2023/7/22 13:28, Stefan N wrote: > Hi again Qu, > > Thanks for all your help last month, I managed to get things going > again and have been slowly adding new disks, but have now ended up > with a similar but slightly more complicated problem I need some more > assistance with. > > Since last time: I used loop devices to get the fs operational again, > then deleted some files to create space, removed the loop devices, > successfully used btrfs replace to replace 3x 12tb disks with 18tbs, > and moved to space cache v2 in the hope it'd prevent future issues. > > The problem: during the 4th replace operation the metadata issue has > recurred, the first time self correcting when remounted, but this > second time has resulted in a similar paradox to last time. I've > rebooted into the patched kernel from last month, but the same > solution is now ineffective due to the system failing to detect the > replace target, despite no disks having been removed nor changing from > /dev/sda and /dev/sdl during the reboots. > > During the replace process the disks were in use, and while after > there's plenty of space for data it seems enough was written to fill > metadata again. In hindsight I should have left the 4 loop devices in > place until the replaces had completed to satisfy the RAID1C4 > requirement for the metadata, as despite deleting files data has not > been freed from the existing 12tb disks. > > The 'missing' replace target is: > Disk /dev/sda: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors The problem seems to be that, replace cancel also needs to commit transaction, which is obviously a bad situation during high metadata stress. But the root problem is still why we hit ENOSPC, AFAIK Filipe is working on this problem. For now, the problem can be more or less worked around by the same method, instead of committing transaction we just cancel the current one so that you can continue to go with the patched device add. I have updated the branch to have a new patch, please try if this allows you to mount it with "-o degraded" then try cancel and add devices. https://github.com/adam900710/linux/tree/dev_add_no_commit Thanks, Qu [...] > > > $ sudo mount -o degraded /mnt/data ; sudo btrfs replace cancel > /mnt/data ; sudo btrfs dev add -K -f /dev/loop20 /dev/loop21 > /dev/loop22 /dev/loop23 /mnt/data ; sudo btrfs fi sync /mnt/data > ERROR: error adding device '/dev/loop20': Read-only file system > ERROR: error adding device '/dev/loop21': Read-only file system > ERROR: error adding device '/dev/loop22': Read-only file system > ERROR: error adding device '/dev/loop23': Read-only file system > ERROR: Could not sync filesystem: Read-only file system > $ > > syslog: > BTRFS info (device sdf): using crc32c (crc32c-intel) checksum algorithm > BTRFS info (device sdf): allowing degraded mounts > BTRFS info (device sdf): using free space tree > BTRFS info (device sdf): bdev /dev/sdg errs: wr 0, rd 0, flush 0, > corrupt 845, gen 0 > BTRFS info (device sdf): bdev /dev/sde errs: wr 3, rd 7, flush 0, > corrupt 0, gen 0 > BTRFS info (device sdf): bdev /dev/sdc errs: wr 41, rd 0, flush 0, > corrupt 0, gen 0 > BTRFS info (device sdf): cannot continue dev_replace, tgtdev is missing > BTRFS info (device sdf): you may cancel the operation after 'mount -o degraded' > BTRFS: Transaction aborted (error -28) > WARNING: CPU: 0 PID: 6659 at fs/btrfs/extent-tree.c:3077 > __btrfs_free_extent+0xa18/0xf50 [btrfs] > Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat > xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables > nfnetlink br_netfilter bridge stp llc rpcsec_gss_krb5 nfsv4 nfs > fscache netfs ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) > binfmt_misc nls_iso8859_1 intel_rapl_msr snd_hda_codec_realtek > snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel > snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi edac_mce_amd > snd_hda_codec kvm_amd snd_hda_core kvm snd_hwdep irqbypass snd_pcm > rapl wmi_bmof snd_timer k10temp snd ccp soundcore joydev input_leds > mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls > msr nfsd efi_pstore auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs > ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 raid0 multipath linear > hid_logitech_hidpp hid_logitech_dj amdgpu hid_generic iommu_v2 > drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper uas cec > rc_core usbhid hid usb_storage drm_kms_helper syscopyarea sysfillrect > sysimgblt crct10dif_pclmul igb crc32_pclmul polyval_clmulni > polyval_generic ghash_clmulni_intel dca sha512_ssse3 aesni_intel > crypto_simd drm nvme ahci cryptd libahci qlcnic i2c_algo_bit nvme_core > mpt3sas xhci_pci video raid_class scsi_transport_sas xhci_pci_renesas > nvme_common i2c_piix4 wmi > CPU: 0 PID: 6659 Comm: btrfs Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS > P3.70 02/23/2022 > RIP: 0010:__btrfs_free_extent+0xa18/0xf50 [btrfs] > Code: 48 c7 c6 80 19 71 c1 48 8b 78 50 e8 82 57 0e 00 41 b8 01 00 00 > 00 e9 58 fe ff ff 8b 75 94 48 c7 c7 a8 19 71 c1 e8 d8 92 4d c7 <0f> 0b > e9 64 fb ff ff 8b 7d 90 e8 b9 04 ff ff 84 c0 0f 85 f1 01 00 > RSP: 0018:ffffb05e4746fa38 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: 0000b711db1d0000 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: ffffb05e4746fad8 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > R13: 0000000000000000 R14: ffff88edc031ea90 R15: ffff88edc3ba0230 > FS: 00007f2b14740d40(0000) GS:ffff88f4e0a00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000c000253000 CR3: 00000001e7cc8000 CR4: 00000000003506f0 > Call Trace: > <TASK> > run_delayed_tree_ref+0x69/0x1b0 [btrfs] > btrfs_run_delayed_refs_for_head+0x3aa/0x520 [btrfs] > ? btrfs_create_pending_block_groups+0x280/0x4d0 [btrfs] > __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > commit_cowonly_roots+0x1e7/0x240 [btrfs] > btrfs_commit_transaction+0x5d2/0xbc0 [btrfs] > ? start_transaction+0xc8/0x600 [btrfs] > btrfs_dev_replace_cancel+0x168/0x2e0 [btrfs] > btrfs_ioctl+0x12ed/0x14d0 [btrfs] > ? __handle_mm_fault+0x661/0x720 > __x64_sys_ioctl+0xa0/0xe0 > do_syscall_64+0x5b/0x90 > ? do_user_addr_fault+0x1e8/0x720 > ? exit_to_user_mode_prepare+0x30/0xb0 > ? irqentry_exit_to_user_mode+0x9/0x20 > ? irqentry_exit+0x43/0x50 > ? exc_page_fault+0x91/0x1b0 > entry_SYSCALL_64_after_hwframe+0x72/0xdc > RIP: 0033:0x7f2b145119ef > Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 > 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 > 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 > RSP: 002b:00007ffcda96ca10 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2b145119ef > RDX: 00007ffcda96ca80 RSI: 00000000ca289435 RDI: 0000000000000003 > RBP: 0000000000000003 R08: 0000000000021001 R09: 0000000000000000 > R10: fffffffffffff000 R11: 0000000000000246 R12: 00007ffcda96e7eb > R13: 000056092aafbe60 R14: 000056092aab3578 R15: 0000000000000000 > </TASK> > ---[ end trace 0000000000000000 ]--- > BTRFS info (device sdf: state A): dumping space info: > BTRFS info (device sdf: state A): space_info DATA has 219646795776 > free, is not full > BTRFS info (device sdf: state A): space_info total=71845742116864, > used=71626091782144, pinned=0, reserved=0, may_use=0, readonly=3538944 > zone_unusable=0 > BTRFS info (device sdf: state A): space_info METADATA has -536821760 > free, is full > BTRFS info (device sdf: state A): space_info total=83481329664, > used=83421233152, pinned=57606144, reserved=2490368, > may_use=536821760, readonly=0 zone_unusable=0 > BTRFS info (device sdf: state A): space_info SYSTEM has 20676608 free, > is not full > BTRFS info (device sdf: state A): space_info total=26214400, > used=5537792, pinned=0, reserved=0, may_use=0, readonly=0 > zone_unusable=0 > BTRFS info (device sdf: state A): global_block_rsv: size 536870912 > reserved 536805376 > BTRFS info (device sdf: state A): trans_block_rsv: size 0 reserved 0 > BTRFS info (device sdf: state A): chunk_block_rsv: size 0 reserved 0 > BTRFS info (device sdf: state A): delayed_block_rsv: size 0 reserved 0 > BTRFS info (device sdf: state A): delayed_refs_rsv: size 523239424 > reserved 16384 > BTRFS: error (device sdf: state A) in __btrfs_free_extent:3077: > errno=-28 No space left > BTRFS info (device sdf: state EA): forced readonly > BTRFS error (device sdf: state EA): failed to run delayed ref for > logical 201287318437888 num_bytes 16384 type 176 action 2 ref_mod 1: > -28 > BTRFS: error (device sdf: state EA) in btrfs_run_delayed_refs:2151: > errno=-28 No space left > BTRFS warning (device sdf: state EA): Skipping commit of aborted transaction. > BTRFS: error (device sdf: state EA) in cleanup_transaction:1986: > errno=-28 No space left > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 6659 at fs/btrfs/dev-replace.c:1121 > btrfs_dev_replace_cancel+0x2b0/0x2e0 [btrfs] > Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat > xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables > nfnetlink br_netfilter bridge stp llc rpcsec_gss_krb5 nfsv4 nfs > fscache netfs ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) > binfmt_misc nls_iso8859_1 intel_rapl_msr snd_hda_codec_realtek > snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel > snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi edac_mce_amd > snd_hda_codec kvm_amd snd_hda_core kvm snd_hwdep irqbypass snd_pcm > rapl wmi_bmof snd_timer k10temp snd ccp soundcore joydev input_leds > mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls > msr nfsd efi_pstore auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs > ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > raid6_pq libcrc32c raid1 raid0 multipath linear > 2023-07-22T14:04:29.956673+09:30 ltsnas kernel: [ 422.690184] > hid_logitech_hidpp hid_logitech_dj amdgpu hid_generic iommu_v2 > drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper uas cec > rc_core usbhid hid usb_storage drm_kms_helper syscopyarea sysfillrect > sysimgblt crct10dif_pclmul igb crc32_pclmul polyval_clmulni > polyval_generic ghash_clmulni_intel dca sha512_ssse3 aesni_intel > crypto_simd drm nvme ahci cryptd libahci qlcnic i2c_algo_bit nvme_core > mpt3sas xhci_pci video raid_class scsi_transport_sas xhci_pci_renesas > nvme_common i2c_piix4 wmi > CPU: 0 PID: 6659 Comm: btrfs Tainted: G W O > 6.2.0-23-generic #23+btrdebug2c > Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS > P3.70 02/23/2022 > RIP: 0010:btrfs_dev_replace_cancel+0x2b0/0x2e0 [btrfs] > Code: 4c 89 c2 e8 52 3f 02 00 e8 9d 4a 4e c7 e9 35 ff ff ff 4c 89 e7 > 48 89 45 d0 e8 bc d5 3f c8 48 8b 45 d0 41 89 c5 e9 38 ff ff ff <0f> 0b > e9 b9 fe ff ff 41 bd e2 ff ff ff e9 26 ff ff ff 48 c7 c2 74 > RSP: 0018:ffffb05e4746fd58 EFLAGS: 00010286 > RAX: 00000000ffffffe4 RBX: ffff88edda916000 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > RBP: ffffb05e4746fd88 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88edda916ab0 > R13: ffff88eddb627800 R14: ffff88ede7fad000 R15: ffff88edda916ad0 > FS: 00007f2b14740d40(0000) GS:ffff88f4e0a00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000c000253000 CR3: 00000001e7cc8000 CR4: 00000000003506f0 > Call Trace: > <TASK> > btrfs_ioctl+0x12ed/0x14d0 [btrfs] > ? __handle_mm_fault+0x661/0x720 > __x64_sys_ioctl+0xa0/0xe0 > do_syscall_64+0x5b/0x90 > ? do_user_addr_fault+0x1e8/0x720 > ? exit_to_user_mode_prepare+0x30/0xb0 > ? irqentry_exit_to_user_mode+0x9/0x20 > ? irqentry_exit+0x43/0x50 > ? exc_page_fault+0x91/0x1b0 > entry_SYSCALL_64_after_hwframe+0x72/0xdc > RIP: 0033:0x7f2b145119ef > Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 > 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 > 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 > RSP: 002b:00007ffcda96ca10 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2b145119ef > RDX: 00007ffcda96ca80 RSI: 00000000ca289435 RDI: 0000000000000003 > RBP: 0000000000000003 R08: 0000000000021001 R09: 0000000000000000 > R10: fffffffffffff000 R11: 0000000000000246 R12: 00007ffcda96e7eb > R13: 000056092aafbe60 R14: 000056092aab3578 R15: 0000000000000000 > </TASK> > ---[ end trace 0000000000000000 ]--- > BTRFS info (device sdf: state EA): suspended dev_replace from /dev/sdl > (devid 4) to <missing disk> canceled > BTRFS error (device sdf: state EA): failed to add disk /dev/loop20: -30 > BTRFS error (device sdf: state EA): failed to add disk /dev/loop21: -30 > BTRFS error (device sdf: state EA): failed to add disk /dev/loop22: -30 > BTRFS error (device sdf: state EA): failed to add disk /dev/loop23: -30 > > On Mon, 26 Jun 2023 at 22:28, Stefan N <stefannnau@gmail.com> wrote: >> >> Hi Qu, >> >> Thanks for all the help, I managed to get it mounted and synced with >> 5G loops (2G allocated to metadata, 3G unallocated on each). >> >> I'm able to read existing files, write new files, and any changes >> remain after an unmount and remount. >> >> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >> dev add -K -f /dev/loop20 /dev/loop21 /dev/loop22 /dev/loop23 >> /mnt/data ; sudo btrfs fi sync /mnt/data >> $ sudo btrfs fi show >> Label: none uuid: abc123 >> Total devices 12 FS bytes used 64.52TiB >> devid 1 size 10.91TiB used 10.89TiB path /dev/sdd >> devid 2 size 10.91TiB used 10.89TiB path /dev/sdh >> devid 3 size 10.91TiB used 10.89TiB path /dev/sdb >> devid 4 size 10.91TiB used 10.89TiB path /dev/sdg >> devid 5 size 10.91TiB used 10.89TiB path /dev/sdi >> devid 6 size 10.91TiB used 10.89TiB path /dev/sde >> devid 7 size 10.91TiB used 10.89TiB path /dev/sdf >> devid 8 size 10.91TiB used 10.89TiB path /dev/sdc >> devid 9 size 5.00GiB used 2.00GiB path /dev/loop20 >> devid 10 size 5.00GiB used 2.00GiB path /dev/loop21 >> devid 11 size 5.00GiB used 2.00GiB path /dev/loop22 >> devid 12 size 5.00GiB used 2.00GiB path /dev/loop23 >> $ >> >> I'd be keen to know what you'd suggest for next steps. I have two 18T >> disks to upgrade two of the existing 12T disks, which could be a >> substitute or add them over USB for a while. >> >> While a random sample of files seem to be perfectly intact, I'd be >> keen to verify the integrity to track down any corrupted files. >> >> Should I perform a scrub before adding/replacing the new disks, or can >> this be safely done afterwards? e.g. can I safely add 2x18tb, remove >> loops, begin scrub, and then remove 2x 12tb when scrub completes? >> >> See kernel log below: >> >> kernel: [ 399.272458] BTRFS info (device sdd): using crc32c >> (crc32c-intel) checksum algorithm >> kernel: [ 399.272476] BTRFS info (device sdd): disk space caching is enabled >> kernel: [ 404.855750] BTRFS info (device sdd): bdev /dev/sdh errs: wr >> 0, rd 0, flush 0, corrupt 845, gen 0 >> kernel: [ 404.855766] BTRFS info (device sdd): bdev /dev/sdb errs: wr >> 41089, rd 1556, flush 0, corrupt 0, gen 0 >> kernel: [ 404.855778] BTRFS info (device sdd): bdev /dev/sdi errs: wr >> 3, rd 7, flush 0, corrupt 0, gen 0 >> kernel: [ 404.855785] BTRFS info (device sdd): bdev /dev/sde errs: wr >> 41, rd 0, flush 0, corrupt 0, gen 0 >> kernel: [ 630.844173] BTRFS info (device sdd): balance: resume skipped >> kernel: [ 630.844185] BTRFS info (device sdd): checking UUID tree >> kernel: [ 630.871787] BTRFS info (device sdd): disk added /dev/loop20 >> kernel: [ 630.881223] BTRFS info (device sdd): disk added /dev/loop21 >> kernel: [ 630.888817] BTRFS info (device sdd): disk added /dev/loop22 >> kernel: [ 630.896302] BTRFS info (device sdd): disk added /dev/loop23 >> kernel: [ 846.849616] INFO: task btrfs-uuid:4834 blocked for more >> than 120 seconds. >> kernel: [ 846.849660] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 846.849693] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 846.849725] task:btrfs-uuid state:D stack:0 >> pid:4834 ppid:2 flags:0x00004000 >> kernel: [ 846.849735] Call Trace: >> kernel: [ 846.849739] <TASK> >> kernel: [ 846.849747] __schedule+0x2aa/0x610 >> kernel: [ 846.849761] schedule+0x63/0x110 >> kernel: [ 846.849769] wait_current_trans+0x100/0x160 [btrfs] >> kernel: [ 846.849908] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 846.849920] start_transaction+0x28b/0x600 [btrfs] >> kernel: [ 846.850057] btrfs_start_transaction+0x1e/0x30 [btrfs] >> kernel: [ 846.850191] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] >> kernel: [ 846.850359] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] >> kernel: [ 846.850487] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] >> kernel: [ 846.850614] kthread+0xe9/0x110 >> kernel: [ 846.850623] ? __pfx_kthread+0x10/0x10 >> kernel: [ 846.850631] ret_from_fork+0x2c/0x50 >> kernel: [ 846.850642] </TASK> >> kernel: [ 846.850645] INFO: task btrfs:4850 blocked for more than 120 seconds. >> kernel: [ 846.850676] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 846.850707] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 846.850738] task:btrfs state:D stack:0 >> pid:4850 ppid:4849 flags:0x00000002 >> kernel: [ 846.850746] Call Trace: >> kernel: [ 846.850749] <TASK> >> kernel: [ 846.850752] __schedule+0x2aa/0x610 >> kernel: [ 846.850760] schedule+0x63/0x110 >> kernel: [ 846.850765] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] >> kernel: [ 846.850899] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 846.850908] btrfs_sync_fs+0x5a/0x1b0 [btrfs] >> kernel: [ 846.851027] btrfs_ioctl+0x643/0x14d0 [btrfs] >> kernel: [ 846.851186] ? putname+0x5d/0x80 >> kernel: [ 846.851195] ? do_sys_openat2+0xab/0x180 >> kernel: [ 846.851203] ? exit_to_user_mode_prepare+0x30/0xb0 >> kernel: [ 846.851213] __x64_sys_ioctl+0xa0/0xe0 >> kernel: [ 846.851221] do_syscall_64+0x5b/0x90 >> kernel: [ 846.851229] ? exc_page_fault+0x91/0x1b0 >> kernel: [ 846.851236] entry_SYSCALL_64_after_hwframe+0x72/0xdc >> kernel: [ 846.851243] RIP: 0033:0x7fbf339119ef >> kernel: [ 846.851249] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 >> ORIG_RAX: 0000000000000010 >> kernel: [ 846.851255] RAX: ffffffffffffffda RBX: 0000000000000003 >> RCX: 00007fbf339119ef >> kernel: [ 846.851259] RDX: 0000000000000000 RSI: 0000000000009408 >> RDI: 0000000000000003 >> kernel: [ 846.851263] RBP: 0000000000000007 R08: 0000000000000000 >> R09: 0000000000000000 >> kernel: [ 846.851266] R10: 0000000000000000 R11: 0000000000000246 >> R12: 00007fbf339f642c >> kernel: [ 846.851269] R13: 0000000000000001 R14: 0000557384b29578 >> R15: 0000000000000000 >> kernel: [ 846.851277] </TASK> >> kernel: [ 967.681770] INFO: task btrfs-uuid:4834 blocked for more >> than 241 seconds. >> kernel: [ 967.681818] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 967.681852] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 967.681884] task:btrfs-uuid state:D stack:0 >> pid:4834 ppid:2 flags:0x00004000 >> kernel: [ 967.681895] Call Trace: >> kernel: [ 967.681899] <TASK> >> kernel: [ 967.681907] __schedule+0x2aa/0x610 >> kernel: [ 967.681922] schedule+0x63/0x110 >> kernel: [ 967.681931] wait_current_trans+0x100/0x160 [btrfs] >> kernel: [ 967.682070] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 967.682082] start_transaction+0x28b/0x600 [btrfs] >> kernel: [ 967.682219] btrfs_start_transaction+0x1e/0x30 [btrfs] >> kernel: [ 967.682353] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] >> kernel: [ 967.682519] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] >> kernel: [ 967.682645] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] >> kernel: [ 967.682728] kthread+0xe9/0x110 >> kernel: [ 967.682734] ? __pfx_kthread+0x10/0x10 >> kernel: [ 967.682739] ret_from_fork+0x2c/0x50 >> kernel: [ 967.682746] </TASK> >> kernel: [ 967.682749] INFO: task btrfs:4850 blocked for more than 241 seconds. >> kernel: [ 967.682771] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 967.682793] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 967.682815] task:btrfs state:D stack:0 >> pid:4850 ppid:4849 flags:0x00000002 >> kernel: [ 967.682820] Call Trace: >> kernel: [ 967.682822] <TASK> >> kernel: [ 967.682824] __schedule+0x2aa/0x610 >> kernel: [ 967.682829] schedule+0x63/0x110 >> kernel: [ 967.682832] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] >> kernel: [ 967.682918] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 967.682923] btrfs_sync_fs+0x5a/0x1b0 [btrfs] >> kernel: [ 967.682999] btrfs_ioctl+0x643/0x14d0 [btrfs] >> kernel: [ 967.683085] ? putname+0x5d/0x80 >> kernel: [ 967.683091] ? do_sys_openat2+0xab/0x180 >> kernel: [ 967.683096] ? exit_to_user_mode_prepare+0x30/0xb0 >> kernel: [ 967.683103] __x64_sys_ioctl+0xa0/0xe0 >> kernel: [ 967.683107] do_syscall_64+0x5b/0x90 >> kernel: [ 967.683112] ? exc_page_fault+0x91/0x1b0 >> kernel: [ 967.683116] entry_SYSCALL_64_after_hwframe+0x72/0xdc >> kernel: [ 967.683121] RIP: 0033:0x7fbf339119ef >> kernel: [ 967.683124] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 >> ORIG_RAX: 0000000000000010 >> kernel: [ 967.683128] RAX: ffffffffffffffda RBX: 0000000000000003 >> RCX: 00007fbf339119ef >> kernel: [ 967.683130] RDX: 0000000000000000 RSI: 0000000000009408 >> RDI: 0000000000000003 >> kernel: [ 967.683132] RBP: 0000000000000007 R08: 0000000000000000 >> R09: 0000000000000000 >> kernel: [ 967.683134] R10: 0000000000000000 R11: 0000000000000246 >> R12: 00007fbf339f642c >> kernel: [ 967.683136] R13: 0000000000000001 R14: 0000557384b29578 >> R15: 0000000000000000 >> kernel: [ 967.683141] </TASK> >> kernel: [ 1088.519959] INFO: task btrfs-uuid:4834 blocked for more >> than 362 seconds. >> kernel: [ 1088.520006] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 1088.520039] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 1088.520071] task:btrfs-uuid state:D stack:0 >> pid:4834 ppid:2 flags:0x00004000 >> kernel: [ 1088.520082] Call Trace: >> kernel: [ 1088.520087] <TASK> >> kernel: [ 1088.520094] __schedule+0x2aa/0x610 >> kernel: [ 1088.520108] schedule+0x63/0x110 >> kernel: [ 1088.520117] wait_current_trans+0x100/0x160 [btrfs] >> kernel: [ 1088.520257] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 1088.520269] start_transaction+0x28b/0x600 [btrfs] >> kernel: [ 1088.520406] btrfs_start_transaction+0x1e/0x30 [btrfs] >> kernel: [ 1088.520539] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] >> kernel: [ 1088.520706] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] >> kernel: [ 1088.520834] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] >> kernel: [ 1088.520961] kthread+0xe9/0x110 >> kernel: [ 1088.520969] ? __pfx_kthread+0x10/0x10 >> kernel: [ 1088.520977] ret_from_fork+0x2c/0x50 >> kernel: [ 1088.520987] </TASK> >> kernel: [ 1088.520990] INFO: task btrfs:4850 blocked for more than 362 seconds. >> kernel: [ 1088.521021] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 1088.521052] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 1088.521084] task:btrfs state:D stack:0 >> pid:4850 ppid:4849 flags:0x00000002 >> kernel: [ 1088.521092] Call Trace: >> kernel: [ 1088.521095] <TASK> >> kernel: [ 1088.521098] __schedule+0x2aa/0x610 >> kernel: [ 1088.521106] schedule+0x63/0x110 >> kernel: [ 1088.521111] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] >> kernel: [ 1088.521245] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 1088.521254] btrfs_sync_fs+0x5a/0x1b0 [btrfs] >> kernel: [ 1088.521372] btrfs_ioctl+0x643/0x14d0 [btrfs] >> kernel: [ 1088.521530] ? putname+0x5d/0x80 >> kernel: [ 1088.521539] ? do_sys_openat2+0xab/0x180 >> kernel: [ 1088.521548] ? exit_to_user_mode_prepare+0x30/0xb0 >> kernel: [ 1088.521559] __x64_sys_ioctl+0xa0/0xe0 >> kernel: [ 1088.521567] do_syscall_64+0x5b/0x90 >> kernel: [ 1088.521575] ? exc_page_fault+0x91/0x1b0 >> kernel: [ 1088.521582] entry_SYSCALL_64_after_hwframe+0x72/0xdc >> kernel: [ 1088.521589] RIP: 0033:0x7fbf339119ef >> kernel: [ 1088.521595] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 >> ORIG_RAX: 0000000000000010 >> kernel: [ 1088.521602] RAX: ffffffffffffffda RBX: 0000000000000003 >> RCX: 00007fbf339119ef >> kernel: [ 1088.521606] RDX: 0000000000000000 RSI: 0000000000009408 >> RDI: 0000000000000003 >> kernel: [ 1088.521610] RBP: 0000000000000007 R08: 0000000000000000 >> R09: 0000000000000000 >> kernel: [ 1088.521613] R10: 0000000000000000 R11: 0000000000000246 >> R12: 00007fbf339f642c >> kernel: [ 1088.521616] R13: 0000000000000001 R14: 0000557384b29578 >> R15: 0000000000000000 >> kernel: [ 1088.521626] </TASK> >> kernel: [ 1209.357423] INFO: task btrfs-uuid:4834 blocked for more >> than 483 seconds. >> kernel: [ 1209.357473] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 1209.357507] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 1209.357540] task:btrfs-uuid state:D stack:0 >> pid:4834 ppid:2 flags:0x00004000 >> kernel: [ 1209.357551] Call Trace: >> kernel: [ 1209.357555] <TASK> >> kernel: [ 1209.357563] __schedule+0x2aa/0x610 >> kernel: [ 1209.357577] schedule+0x63/0x110 >> kernel: [ 1209.357597] wait_current_trans+0x100/0x160 [btrfs] >> kernel: [ 1209.357738] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 1209.357750] start_transaction+0x28b/0x600 [btrfs] >> kernel: [ 1209.357887] btrfs_start_transaction+0x1e/0x30 [btrfs] >> kernel: [ 1209.358021] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] >> kernel: [ 1209.358187] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] >> kernel: [ 1209.358315] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] >> kernel: [ 1209.358442] kthread+0xe9/0x110 >> kernel: [ 1209.358451] ? __pfx_kthread+0x10/0x10 >> kernel: [ 1209.358458] ret_from_fork+0x2c/0x50 >> kernel: [ 1209.358468] </TASK> >> kernel: [ 1330.195147] INFO: task btrfs-transacti:4088 blocked for >> more than 120 seconds. >> kernel: [ 1330.195192] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 1330.195221] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 1330.195250] task:btrfs-transacti state:D stack:0 >> pid:4088 ppid:2 flags:0x00004000 >> kernel: [ 1330.195259] Call Trace: >> kernel: [ 1330.195263] <TASK> >> kernel: [ 1330.195269] __schedule+0x2aa/0x610 >> kernel: [ 1330.195281] schedule+0x63/0x110 >> kernel: [ 1330.195288] wait_for_commit+0x14c/0x1b0 [btrfs] >> kernel: [ 1330.195413] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 1330.195424] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] >> kernel: [ 1330.195552] ? start_transaction+0xc8/0x600 [btrfs] >> kernel: [ 1330.195676] transaction_kthread+0x14b/0x1c0 [btrfs] >> kernel: [ 1330.195795] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] >> kernel: [ 1330.195912] kthread+0xe9/0x110 >> kernel: [ 1330.195920] ? __pfx_kthread+0x10/0x10 >> kernel: [ 1330.195927] ret_from_fork+0x2c/0x50 >> kernel: [ 1330.195937] </TASK> >> kernel: [ 1330.195939] INFO: task btrfs-uuid:4834 blocked for more >> than 604 seconds. >> kernel: [ 1330.195968] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 1330.195997] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 1330.196026] task:btrfs-uuid state:D stack:0 >> pid:4834 ppid:2 flags:0x00004000 >> kernel: [ 1330.196033] Call Trace: >> kernel: [ 1330.196036] <TASK> >> kernel: [ 1330.196039] __schedule+0x2aa/0x610 >> kernel: [ 1330.196046] schedule+0x63/0x110 >> kernel: [ 1330.196051] wait_current_trans+0x100/0x160 [btrfs] >> kernel: [ 1330.196169] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 1330.196177] start_transaction+0x28b/0x600 [btrfs] >> kernel: [ 1330.196298] btrfs_start_transaction+0x1e/0x30 [btrfs] >> kernel: [ 1330.196416] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] >> kernel: [ 1330.196565] ? __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] >> kernel: [ 1330.196680] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] >> kernel: [ 1330.196794] kthread+0xe9/0x110 >> kernel: [ 1330.196800] ? __pfx_kthread+0x10/0x10 >> kernel: [ 1330.196807] ret_from_fork+0x2c/0x50 >> kernel: [ 1330.196814] </TASK> >> kernel: [ 1451.031238] INFO: task btrfs-transacti:4088 blocked for >> more than 241 seconds. >> kernel: [ 1451.031286] Tainted: G W O >> 6.2.0-23-generic #23+btrdebug2c >> kernel: [ 1451.031319] "echo 0 > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> kernel: [ 1451.031352] task:btrfs-transacti state:D stack:0 >> pid:4088 ppid:2 flags:0x00004000 >> kernel: [ 1451.031362] Call Trace: >> kernel: [ 1451.031366] <TASK> >> kernel: [ 1451.031373] __schedule+0x2aa/0x610 >> kernel: [ 1451.031388] schedule+0x63/0x110 >> kernel: [ 1451.031396] wait_for_commit+0x14c/0x1b0 [btrfs] >> kernel: [ 1451.031535] ? __pfx_autoremove_wake_function+0x10/0x10 >> kernel: [ 1451.031548] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] >> kernel: [ 1451.031684] ? start_transaction+0xc8/0x600 [btrfs] >> kernel: [ 1451.031819] transaction_kthread+0x14b/0x1c0 [btrfs] >> kernel: [ 1451.031951] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] >> kernel: [ 1451.032082] kthread+0xe9/0x110 >> kernel: [ 1451.032091] ? __pfx_kthread+0x10/0x10 >> kernel: [ 1451.032098] ret_from_fork+0x2c/0x50 >> kernel: [ 1451.032108] </TASK> >> >> On Mon, 26 Jun 2023 at 19:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>> >>> >>> >>> On 2023/6/24 23:29, Stefan N wrote: >>>> Whoops, I had left --dry-run on the first debug patch you commited, so >>>> that didn't run correctly. >>>> >>>> I've included the output from both patches, as they result in different output. >>>> >>>> Rerunning the older patch first, with loop devices (I tried both >>>> 4x100mb and 4x1gb) I get the following: >>>> >>> [...] >>>> *** The below is using the newer patch as follows: >>>> $ diff fs/btrfs/ ../linux-6.2.0-dist/fs/btrfs/ >>>> diff fs/btrfs/ioctl.c ../linux-6.2.0-dist/fs/btrfs/ioctl.c >>>> 2656,2658d2655 >>>> < else >>>> < btrfs_err(fs_info, "failed to add disk %s: %d", >>>> < vol_args->name, ret); >>>> diff fs/btrfs/transaction.c ../linux-6.2.0-dist/fs/btrfs/transaction.c >>>> 1029d1028 >>>> < /* >>>> 1031d1029 >>>> < */ >>>> diff fs/btrfs/volumes.c ../linux-6.2.0-dist/fs/btrfs/volumes.c >>>> 2677c2677 >>>> < trans = btrfs_join_transaction(root); >>>> --- >>>>> trans = btrfs_start_transaction(root, 0); >>>> 2680d2679 >>>> < btrfs_err(fs_info, "failed to start trans: %d", ret); >>>> 2769d2767 >>>> < btrfs_err(fs_info, "failed to add dev item: %d", ret); >>>> 2787,2789c2785 >>>> < ret = btrfs_end_transaction(trans); >>>> < if (ret < 0) >>>> < btrfs_err(fs_info, "failed to end trans: %d", ret); >>>> --- >>>>> ret = btrfs_commit_transaction(trans); >>>> $ >>>> >>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>> dev add -K -f /dev/loop12 /dev/loop13 /dev/loop14 /dev/loop15 >>>> /mnt/data ; sudo btrfs fi sync /mnt/data >>>> ERROR: Could not sync filesystem: No space left on device >>> >>> Is it the same even with 4x1GiB loopback devices? >>> >>>> $ >>>> >>>> kernel: [ 1811.846087] BTRFS info (device sdc): using crc32c >>>> (crc32c-intel) checksum algorithm >>>> kernel: [ 1811.846107] BTRFS info (device sdc): disk space caching is enabled >>>> kernel: [ 1817.852850] BTRFS info (device sdc): bdev /dev/sde errs: wr >>>> 0, rd 0, flush 0, corrupt 845, gen 0 >>>> kernel: [ 1817.852866] BTRFS info (device sdc): bdev /dev/sda errs: wr >>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>>> kernel: [ 1817.852877] BTRFS info (device sdc): bdev /dev/sdh errs: wr >>>> 3, rd 7, flush 0, corrupt 0, gen 0 >>>> kernel: [ 1817.852884] BTRFS info (device sdc): bdev /dev/sdd errs: wr >>>> 41, rd 0, flush 0, corrupt 0, gen 0 >>>> kernel: [ 2037.562050] BTRFS info (device sdc): balance: resume skipped >>>> kernel: [ 2037.562064] BTRFS info (device sdc): checking UUID tree >>>> kernel: [ 2037.581550] BTRFS info (device sdc): disk added /dev/loop12 >>>> kernel: [ 2037.591163] BTRFS info (device sdc): disk added /dev/loop13 >>>> kernel: [ 2037.599477] BTRFS info (device sdc): disk added /dev/loop14 >>>> kernel: [ 2037.607064] BTRFS info (device sdc): disk added /dev/loop15 >>>> kernel: [ 2176.124630] INFO: task btrfs:7783 blocked for more than 120 seconds. >>>> kernel: [ 2176.124678] Tainted: G W O >>>> 6.2.0-23-generic #23+btrdebug2c >>>> kernel: [ 2176.124710] "echo 0 > >>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>>> kernel: [ 2176.124742] task:btrfs state:D stack:0 >>>> pid:7783 ppid:7782 flags:0x00004002 >>>> kernel: [ 2176.124753] Call Trace: >>>> kernel: [ 2176.124758] <TASK> >>>> kernel: [ 2176.124765] __schedule+0x2aa/0x610 >>>> kernel: [ 2176.124780] schedule+0x63/0x110 >>>> kernel: [ 2176.124788] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] >>> >>> This means we're doing the real work, but it seems to take too long. >>> >>> In fact this is already looking promising as we have when through the >>> whole device add part. >>> >>> Just need to let the final commit to finish. >>> >>>> kernel: [ 2176.124929] ? __pfx_autoremove_wake_function+0x10/0x10 >>>> kernel: [ 2176.124941] btrfs_sync_fs+0x5a/0x1b0 [btrfs] >>>> kernel: [ 2176.125060] btrfs_ioctl+0x643/0x14d0 [btrfs] >>>> kernel: [ 2176.125225] __x64_sys_ioctl+0xa0/0xe0 >>>> kernel: [ 2176.125235] do_syscall_64+0x5b/0x90 >>>> kernel: [ 2176.125242] ? do_sys_openat2+0xab/0x180 >>>> kernel: [ 2176.125251] ? exit_to_user_mode_prepare+0x30/0xb0 >>>> kernel: [ 2176.125260] ? syscall_exit_to_user_mode+0x29/0x50 >>>> kernel: [ 2176.125268] ? do_syscall_64+0x67/0x90 >>>> kernel: [ 2176.125275] entry_SYSCALL_64_after_hwframe+0x72/0xdc >>>> kernel: [ 2176.125282] RIP: 0033:0x7f2e8eb119ef >>>> kernel: [ 2176.125288] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 >>>> ORIG_RAX: 0000000000000010 >>>> kernel: [ 2176.125295] RAX: ffffffffffffffda RBX: 0000000000000003 >>>> RCX: 00007f2e8eb119ef >>>> kernel: [ 2176.125300] RDX: 0000000000000000 RSI: 0000000000009408 >>>> RDI: 0000000000000003 >>>> kernel: [ 2176.125303] RBP: 0000000000000007 R08: 0000000000000000 >>>> R09: 0000000000000000 >>>> kernel: [ 2176.125306] R10: 0000000000000000 R11: 0000000000000246 >>>> R12: 00007f2e8ebf642c >>>> kernel: [ 2176.125310] R13: 0000000000000001 R14: 000055cdb7940578 >>>> R15: 0000000000000000 >>>> kernel: [ 2176.125318] </TASK> >>>> kernel: [ 2296.956781] INFO: task btrfs:7783 blocked for more than 241 seconds. >>>> kernel: [ 2296.956824] Tainted: G W O >>>> 6.2.0-23-generic #23+btrdebug2c >>>> kernel: [ 2296.956856] "echo 0 > >>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>>> kernel: [ 2296.956887] task:btrfs state:D stack:0 >>>> pid:7783 ppid:7782 flags:0x00004002 >>>> kernel: [ 2296.956898] Call Trace: >>>> kernel: [ 2296.956902] <TASK> >>>> kernel: [ 2296.956908] __schedule+0x2aa/0x610 >>>> kernel: [ 2296.956921] schedule+0x63/0x110 >>>> kernel: [ 2296.956928] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] >>>> kernel: [ 2296.957069] ? __pfx_autoremove_wake_function+0x10/0x10 >>>> kernel: [ 2296.957080] btrfs_sync_fs+0x5a/0x1b0 [btrfs] >>>> kernel: [ 2296.957200] btrfs_ioctl+0x643/0x14d0 [btrfs] >>>> kernel: [ 2296.957366] __x64_sys_ioctl+0xa0/0xe0 >>>> kernel: [ 2296.957375] do_syscall_64+0x5b/0x90 >>>> kernel: [ 2296.957383] ? do_sys_openat2+0xab/0x180 >>>> kernel: [ 2296.957391] ? exit_to_user_mode_prepare+0x30/0xb0 >>>> kernel: [ 2296.957399] ? syscall_exit_to_user_mode+0x29/0x50 >>>> kernel: [ 2296.957407] ? do_syscall_64+0x67/0x90 >>>> kernel: [ 2296.957414] entry_SYSCALL_64_after_hwframe+0x72/0xdc >>>> kernel: [ 2296.957420] RIP: 0033:0x7f2e8eb119ef >>>> kernel: [ 2296.957426] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 >>>> ORIG_RAX: 0000000000000010 >>>> kernel: [ 2296.957433] RAX: ffffffffffffffda RBX: 0000000000000003 >>>> RCX: 00007f2e8eb119ef >>>> kernel: [ 2296.957438] RDX: 0000000000000000 RSI: 0000000000009408 >>>> RDI: 0000000000000003 >>>> kernel: [ 2296.957441] RBP: 0000000000000007 R08: 0000000000000000 >>>> R09: 0000000000000000 >>>> kernel: [ 2296.957444] R10: 0000000000000000 R11: 0000000000000246 >>>> R12: 00007f2e8ebf642c >>>> kernel: [ 2296.957448] R13: 0000000000000001 R14: 000055cdb7940578 >>>> R15: 0000000000000000 >>>> kernel: [ 2296.957468] </TASK> >>>> kernel: [ 2314.043258] ------------[ cut here ]------------ >>>> kernel: [ 2314.043264] BTRFS: Transaction aborted (error -28) >>>> kernel: [ 2314.043334] WARNING: CPU: 2 PID: 7739 at >>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>>> [btrfs] >>>> kernel: [ 2314.043467] Modules linked in: ipmi_devintf ipmi_msghandler >>>> overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr >>>> snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio >>>> intel_rapl_common snd_hda_codec_hdmi edac_mce_amd snd_hda_intel >>>> snd_intel_dspcfg kvm_amd snd_intel_sdw_acpi snd_hda_codec kvm >>>> snd_hda_core snd_hwdep snd_pcm snd_timer irqbypass rapl wmi_bmof snd >>>> k10temp ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac >>>> scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore auth_rpcgss >>>> nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 btrfs >>>> blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq >>>> async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear >>>> amdgpu iommu_v2 drm_buddy gpu_sched drm_ttm_helper hid_generic ttm >>>> drm_display_helper cec uas rc_core usbhid hid drm_kms_helper >>>> crct10dif_pclmul syscopyarea usb_storage crc32_pclmul polyval_clmulni >>>> sysfillrect polyval_generic sysimgblt nvme ghash_clmulni_intel >>>> sha512_ssse3 >>>> kernel: [ 2314.043599] nvme_core aesni_intel crypto_simd mpt3sas drm >>>> cryptd raid_class ahci i2c_piix4 scsi_transport_sas nvme_common igb >>>> xhci_pci qlcnic dca xhci_pci_renesas libahci i2c_algo_bit video wmi >>>> kernel: [ 2314.043631] CPU: 2 PID: 7739 Comm: btrfs-transacti Tainted: >>>> G W O 6.2.0-23-generic #23+btrdebug2c >>>> kernel: [ 2314.043638] Hardware name: To Be Filled By O.E.M. X570M >>>> Pro4/X570M Pro4, BIOS P3.70 02/23/2022 >>>> kernel: [ 2314.043641] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>>> kernel: [ 2314.043766] Code: ce 0f 0b eb b8 44 89 e6 48 c7 c7 a8 39 a0 >>>> c1 e8 2c d5 1e ce 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 a8 39 a0 c1 >>>> e8 16 d5 1e ce <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 90 90 90 >>>> 90 90 90 90 >>>> kernel: [ 2314.043771] RSP: 0018:ffffad0b11b7bb38 EFLAGS: 00010246 >>>> kernel: [ 2314.043777] RAX: 0000000000000000 RBX: ffff9c80e40e8f08 >>>> RCX: 0000000000000000 >>>> kernel: [ 2314.043781] RDX: 0000000000000000 RSI: 0000000000000000 >>>> RDI: 0000000000000000 >>>> kernel: [ 2314.043784] RBP: ffffad0b11b7bb60 R08: 0000000000000000 >>>> R09: 0000000000000000 >>>> kernel: [ 2314.043787] R10: 0000000000000000 R11: 0000000000000000 >>>> R12: 00000000ffffffe4 >>>> kernel: [ 2314.043790] R13: 00005e4c359ba000 R14: 0000000000020000 >>>> R15: ffff9c824d9a58c0 >>>> kernel: [ 2314.043794] FS: 0000000000000000(0000) >>>> GS:ffff9c87a0a80000(0000) knlGS:0000000000000000 >>>> kernel: [ 2314.043798] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> kernel: [ 2314.043802] CR2: 00007f54adc86000 CR3: 00000001471d8000 >>>> CR4: 00000000003506e0 >>>> kernel: [ 2314.043806] Call Trace: >>>> kernel: [ 2314.043809] <TASK> >>>> kernel: [ 2314.043815] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>>> kernel: [ 2314.043943] run_delayed_data_ref+0x8b/0x180 [btrfs] >>>> kernel: [ 2314.044068] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>>> kernel: [ 2314.044192] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>>> kernel: [ 2314.044316] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>>> kernel: [ 2314.044439] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] >>>> kernel: [ 2314.044598] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] >>>> kernel: [ 2314.044754] ? start_transaction+0xc8/0x600 [btrfs] >>>> kernel: [ 2314.044890] transaction_kthread+0x14b/0x1c0 [btrfs] >>>> kernel: [ 2314.045021] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] >>>> kernel: [ 2314.045151] kthread+0xe9/0x110 >>>> kernel: [ 2314.045162] ? __pfx_kthread+0x10/0x10 >>>> kernel: [ 2314.045170] ret_from_fork+0x2c/0x50 >>>> kernel: [ 2314.045180] </TASK> >>>> kernel: [ 2314.045182] ---[ end trace 0000000000000000 ]--- >>>> kernel: [ 2314.045186] BTRFS info (device sdc: state A): dumping space info: >>>> kernel: [ 2314.045191] BTRFS info (device sdc: state A): space_info >>>> DATA has 160777674752 free, is not full >>>> kernel: [ 2314.045197] BTRFS info (device sdc: state A): space_info >>>> total=71201958395904, used=71013439856640, pinned=27737325568, >>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>>> kernel: [ 2314.045205] BTRFS info (device sdc: state A): space_info >>>> METADATA has -429047808 free, is full >>> >>> This means we need at least 500+ MiB metadata space. >>> >>> Thus you may want to try 4x1GiB to see if this makes any difference. >>> >>> Thanks, >>> Qu >>>> kernel: [ 2314.045209] BTRFS info (device sdc: state A): space_info >>>> total=83634421760, used=82789777408, pinned=244891648, >>>> reserved=599687168, may_use=429047808, readonly=65536 zone_unusable=0 >>>> kernel: [ 2314.045217] BTRFS info (device sdc: state A): space_info >>>> SYSTEM has 33390592 free, is not full >>>> kernel: [ 2314.045221] BTRFS info (device sdc: state A): space_info >>>> total=38797312, used=5373952, pinned=16384, reserved=16384, may_use=0, >>>> readonly=0 zone_unusable=0 >>>> kernel: [ 2314.045227] BTRFS info (device sdc: state A): >>>> global_block_rsv: size 536870912 reserved 428523520 >>>> kernel: [ 2314.045231] BTRFS info (device sdc: state A): >>>> trans_block_rsv: size 524288 reserved 524288 >>>> kernel: [ 2314.045235] BTRFS info (device sdc: state A): >>>> chunk_block_rsv: size 0 reserved 0 >>>> kernel: [ 2314.045239] BTRFS info (device sdc: state A): >>>> delayed_block_rsv: size 0 reserved 0 >>>> kernel: [ 2314.045242] BTRFS info (device sdc: state A): >>>> delayed_refs_rsv: size 249756909568 reserved 0 >>>> kernel: [ 2314.045251] BTRFS: error (device sdc: state A) in >>>> do_free_extent_accounting:2847: errno=-28 No space left >>>> kernel: [ 2314.045265] BTRFS warning (device sdc: state A): >>>> btrfs_uuid_scan_kthread failed -28 >>>> kernel: [ 2314.045295] BTRFS info (device sdc: state EA): forced readonly >>>> kernel: [ 2314.045300] BTRFS error (device sdc: state EA): failed to >>>> run delayed ref for logical 103681409916928 num_bytes 131072 type 184 >>>> action 2 ref_mod 1: -28 >>>> kernel: [ 2314.045360] BTRFS: error (device sdc: state EA) in >>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>> kernel: [ 2314.049204] BTRFS: error (device sdc: state EA) in >>>> btrfs_create_pending_block_groups:2487: errno=-28 No space left >>>> kernel: [ 2314.049331] BTRFS: error (device sdc: state EA) in >>>> btrfs_create_pending_block_groups:2499: errno=-28 No space left >>>> kernel: [ 2314.053259] BTRFS: error (device sdc: state EA) in >>>> do_free_extent_accounting:2847: errno=-28 No space left >>>> kernel: [ 2314.053318] BTRFS error (device sdc: state EA): failed to >>>> run delayed ref for logical 103681419366400 num_bytes 131072 type 184 >>>> action 2 ref_mod 1: -28 >>>> kernel: [ 2314.053375] BTRFS: error (device sdc: state EA) in >>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>> kernel: [ 2314.053430] BTRFS warning (device sdc: state EA): Skipping >>>> commit of aborted transaction. >>>> kernel: [ 2314.053435] BTRFS: error (device sdc: state EA) in >>>> cleanup_transaction:1986: errno=-28 No space left >>>> >>>> >>>> >>>> On Fri, 23 Jun 2023 at 19:16, Qu Wenruo <wqu@suse.com> wrote: >>>>> >>>>> >>>>> >>>>> On 2023/6/23 17:00, Stefan N wrote: >>>>>> Apologies, I thought I included the log output too, though I can't see >>>>>> any additional output >>>>>> >>>>>> From a fresh run, still using the same kernel >>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>>>>> fi sync /mnt/data >>>>>> ERROR: error adding device '/dev/sdl': Input/output error >>>>>> ERROR: error adding device '/dev/sdm': Read-only file system >>>>>> ERROR: error adding device '/dev/sdn': Read-only file system >>>>>> ERROR: error adding device '/dev/sdo': Read-only file system >>>>>> ERROR: Could not sync filesystem: Read-only file system >>>>>> $ >>>>>> >>>>>> Output from kern.log, syslog or dmesg -k >>>>>> >>>>> [...] >>>>> >>>>> None of the newly added debug lines triggered, so there is something >>>>> else causing the problem. >>>>> >>>>> And furthermore the backtrace is not that helpful, it only shows it's >>>>> some async metadata reclaim kthread causing the problem. >>>>> >>>>> Although I guess the async metadata reclaim is triggered by the >>>>> btrfs_start_transaction() call when adding a device. >>>>> So I updated my github branch to go btrfs_join_transaction() which would >>>>> not flush any metadata, thus avoid the problem. >>>>> >>>>> Would you please give it a try again? >>>>> >>>>>> >>>>>> However, now I started digging into logs to check I hadn't missed >>>>>> where the errors were being logged, I've found this from roughly a >>>>>> week before I started having issues, which I had not previously >>>>>> noticed >>>>> >>>>> You don't need to bother most error messages after the fs flipped RO. >>>>> As it's known to have some false alerts. >>>>> >>>>> Thanks, >>>>> Qu >>>>> >>>>>> [ 1990.495861] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 107988943355904 num_bytes 245760 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 1990.518282] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 107989043494912 num_bytes 245760 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 620.104065] BTRFS error (device sdk): failed to run delayed ref for >>>>>> logical 123187655077888 num_bytes 176128 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 620.126209] BTRFS error (device sdk): failed to run delayed ref for >>>>>> logical 123190279929856 num_bytes 134217728 type 184 action 2 ref_mod >>>>>> 1: -28 >>>>>> [ 620.126241] BTRFS error (device sdk): failed to run delayed ref for >>>>>> logical 123189970468864 num_bytes 134217728 type 184 action 2 ref_mod >>>>>> 1: -28 >>>>>> [ 620.126271] BTRFS error (device sdk): failed to run delayed ref for >>>>>> logical 123190414409728 num_bytes 134217728 type 184 action 2 ref_mod >>>>>> 1: -28 >>>>>> [ 476.565308] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101906434228224 num_bytes 651264 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 476.565932] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101906434031616 num_bytes 180224 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 447.371754] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101946151927808 num_bytes 262144 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 447.372362] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101946083725312 num_bytes 245760 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 439.839007] BTRFS error (device sdj): failed to run delayed ref for >>>>>> logical 101923102179328 num_bytes 192512 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 439.839578] BTRFS error (device sdj): failed to run delayed ref for >>>>>> logical 101923401629696 num_bytes 245760 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 466.393884] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101981116137472 num_bytes 245760 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 466.394451] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101981122854912 num_bytes 1720320 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 431.541367] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101876426952704 num_bytes 126976 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 431.542010] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101876427780096 num_bytes 126976 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 597.487948] BTRFS error (device sdj): failed to run delayed ref for >>>>>> logical 108127459409920 num_bytes 196608 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 597.488539] BTRFS error (device sdj): failed to run delayed ref for >>>>>> logical 108124677865472 num_bytes 126976 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 534.717509] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101958618710016 num_bytes 1597440 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 534.718494] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 101958756335616 num_bytes 368640 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 508.089394] BTRFS error (device sdk): failed to run delayed ref for >>>>>> logical 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 508.090007] BTRFS error (device sdk): failed to run delayed ref for >>>>>> logical 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 1632.112084] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 102203759886336 num_bytes 229376 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> [ 1632.112885] BTRFS error (device sdh): failed to run delayed ref for >>>>>> logical 102203764379648 num_bytes 126976 type 184 action 2 ref_mod 1: >>>>>> -28 >>>>>> >>>>>> and today, when leaving the disks mounted read-only for a while, I >>>>>> found many occurances similar to: >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201329754554368 mirror 1 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201329754554368 mirror 2 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201329754554368 mirror 3 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201329754554368 mirror 4 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201329754554368 mirror 1 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201329754554368 mirror 2 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201329754554368 mirror 3 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201350830227456 mirror 4 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201350830227456 mirror 1 wanted 2 found 0 >>>>>> BTRFS error (device sdc: state EA): level verify failed on logical >>>>>> 201350830227456 mirror 2 wanted 2 found 0 >>>>>> >>>>>> On Fri, 23 Jun 2023 at 10:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 2023/6/23 06:18, Stefan N wrote: >>>>>>>> Hi Qu, >>>>>>>> >>>>>>>> I got one new line this time, but it doesn't seem to match your commit >>>>>>>> ERROR: zoned: unable to stat /dev/loop/13 >>>>>>> >>>>>>> Please provide the dmesg of that attempt, as all the extra debug info is >>>>>>> inside dmesg. >>>>>>> >>>>>>> With that info provided, we can determine what to do next. >>>>>>> >>>>>>> Thanks, >>>>>>> Qu >>>>>>> >>>>>>>> >>>>>>>> I tried it on the USB flash drives too and didn't get any extra line >>>>>>>> >>>>>>>> In context >>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>>>>> dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 >>>>>>>> /mnt/data ; sudo btrfs fi sync /mnt/data >>>>>>>> ERROR: error adding device '/dev/loop12': Input/output error >>>>>>>> ERROR: zoned: unable to stat /dev/loop/13 >>>>>>>> ERROR: checking status of /dev/loop/13: No such file or directory >>>>>>>> ERROR: error adding device '/dev/loop14': Read-only file system >>>>>>>> ERROR: error adding device '/dev/loop15': Read-only file system >>>>>>>> ERROR: Could not sync filesystem: Read-only file system >>>>>>>> $ >>>>>>>> >>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>>>>>>> fi sync /mnt/data >>>>>>>> ERROR: error adding device '/dev/sdl': Input/output error >>>>>>>> ERROR: error adding device '/dev/sdm': Read-only file system >>>>>>>> ERROR: error adding device '/dev/sdn': Read-only file system >>>>>>>> ERROR: error adding device '/dev/sdo': Read-only file system >>>>>>>> ERROR: Could not sync filesystem: Read-only file system >>>>>>>> $ >>>>>>>> >>>>>>>> On Thu, 22 Jun 2023 at 18:48, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2023/6/22 16:33, Stefan N wrote: >>>>>>>>>> Hi Qu, >>>>>>>>>> >>>>>>>>>> Many thanks for the detailed instructions and your patience. I got it >>>>>>>>>> working combined with >>>>>>>>>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel on the main system >>>>>>>>>> OS instead, tagged +btrfix >>>>>>>>>> $ uname -vr >>>>>>>>>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 >>>>>>>>>> >>>>>>>>>> However, I've not had luck with the commands suggested, and would >>>>>>>>>> appreciate any further ideas. >>>>>>>>>> >>>>>>>>>> Outputs follow below, with /mnt/data as the btrfs mount point that >>>>>>>>>> currently contains 8x disks sd[a-j] with an additional 4x 64gb USB >>>>>>>>>> flash drives being added sd[l-o] >>>>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>>>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; sudo btrfs >>>>>>>>>> fi sync /mnt/data >>>>>>>>>> ERROR: error adding device '/dev/sdl': Input/output error >>>>>>>>>> ERROR: error adding device '/dev/sdm': Read-only file system >>>>>>>>>> ERROR: error adding device '/dev/sdn': Read-only file system >>>>>>>>>> ERROR: error adding device '/dev/sdo': Read-only file system >>>>>>>>>> ERROR: Could not sync filesystem: Read-only file system >>>>>>>>>> $ >>>>>>>>>> >>>>>>>>>> The same occurs if I try to add 4x 100mb loop devices (on a ssd so >>>>>>>>>> they're super quick to zero); >>>>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo btrfs >>>>>>>>>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 /dev/loop19 >>>>>>>>>> /mnt/data ; sudo btrfs fi sync /mnt/data >>>>>>>>>> ERROR: error adding device '/dev/loop16': Input/output error >>>>>>>>> >>>>>>>>> This is the interesting part, this means we're erroring out due to -EIO >>>>>>>>> (not -ENOSPC) during the first device add. >>>>>>>>> >>>>>>>>> And by somehow, after the first device add, we already got the trans abort. >>>>>>>>> >>>>>>>>> Would you please try the following branch? >>>>>>>>> >>>>>>>>> https://github.com/adam900710/linux/tree/dev_add_no_commit >>>>>>>>> >>>>>>>>> It has not only the patch to skip the commit, but also extra debug >>>>>>>>> output for the situation. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Qu >>>>>>>>> >>>>>>>>>> ERROR: error adding device '/dev/loop17': Read-only file system >>>>>>>>>> ERROR: error adding device '/dev/loop18': Read-only file system >>>>>>>>>> ERROR: error adding device '/dev/loop19': Read-only file system >>>>>>>>>> ERROR: Could not sync filesystem: Read-only file system >>>>>>>>>> $ >>>>>>>>>> >>>>>>>>>> I confirmed before both these kernel builds that the replaced line was >>>>>>>>>> btrfs_end_transaction rather than btrfs_commit_transaction (anyone >>>>>>>>>> else following, I needed to remove the -n in the patch command >>>>>>>>>> earlier) >>>>>>>>>> $ grep -A3 -ri btrfs_sysfs_update_sprout */fs/btrfs/volumes.c* >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c: >>>>>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- } >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = btrfs_commit_transaction(trans); >>>>>>>>>> -- >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c: >>>>>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- } >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>>>>>>>>> -- >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c: >>>>>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- } >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = btrfs_end_transaction(trans); >>>>>>>>>> $ >>>>>>>>>> >>>>>>>>>> $ btrfs fi usage /mnt/data >>>>>>>>>> Overall: >>>>>>>>>> Device size: 87.31TiB >>>>>>>>>> Device allocated: 87.31TiB >>>>>>>>>> Device unallocated: 1.94GiB >>>>>>>>>> Device missing: 0.00B >>>>>>>>>> Device slack: 0.00B >>>>>>>>>> Used: 87.08TiB >>>>>>>>>> Free (estimated): 173.29GiB (min: 172.33GiB) >>>>>>>>>> Free (statfs, df): 171.84GiB >>>>>>>>>> Data ratio: 1.34 >>>>>>>>>> Metadata ratio: 4.00 >>>>>>>>>> Global reserve: 512.00MiB (used: 371.25MiB) >>>>>>>>>> Multiple profiles: no >>>>>>>>>> >>>>>>>>>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) >>>>>>>>>> /dev/sdc 10.90TiB >>>>>>>>>> /dev/sdf 10.90TiB >>>>>>>>>> /dev/sda 10.86TiB >>>>>>>>>> /dev/sdg 10.87TiB >>>>>>>>>> /dev/sdh 10.86TiB >>>>>>>>>> /dev/sdd 10.87TiB >>>>>>>>>> /dev/sde 10.88TiB >>>>>>>>>> /dev/sdb 10.88TiB >>>>>>>>>> >>>>>>>>>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) >>>>>>>>>> /dev/sdc 15.33GiB >>>>>>>>>> /dev/sdf 18.41GiB >>>>>>>>>> /dev/sda 49.63GiB >>>>>>>>>> /dev/sdg 49.50GiB >>>>>>>>>> /dev/sdh 51.52GiB >>>>>>>>>> /dev/sdd 48.70GiB >>>>>>>>>> /dev/sde 39.09GiB >>>>>>>>>> /dev/sdb 39.01GiB >>>>>>>>>> >>>>>>>>>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) >>>>>>>>>> /dev/sdc 1.00MiB >>>>>>>>>> /dev/sda 37.00MiB >>>>>>>>>> /dev/sdg 37.00MiB >>>>>>>>>> /dev/sdh 36.00MiB >>>>>>>>>> /dev/sdd 37.00MiB >>>>>>>>>> >>>>>>>>>> Unallocated: >>>>>>>>>> /dev/sdc 1.00MiB >>>>>>>>>> /dev/sdf 1.00MiB >>>>>>>>>> /dev/sda 1.27GiB >>>>>>>>>> /dev/sdg 1.00MiB >>>>>>>>>> /dev/sdh 1.00MiB >>>>>>>>>> /dev/sdd 687.00MiB >>>>>>>>>> /dev/sde 1.00MiB >>>>>>>>>> /dev/sdb 1.00MiB >>>>>>>>>> $ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This first attempt generated the following syslog output: >>>>>>>>>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c >>>>>>>>>> (crc32c-intel) checksum algorithm >>>>>>>>>> kernel: [ 868.435407] BTRFS info (device sde): disk space caching is enabled >>>>>>>>>> kernel: [ 874.477712] BTRFS info (device sde): bdev /dev/sdg errs: wr >>>>>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 >>>>>>>>>> kernel: [ 874.477727] BTRFS info (device sde): bdev /dev/sdc errs: wr >>>>>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>>>>>>>>> kernel: [ 874.477735] BTRFS info (device sde): bdev /dev/sdj errs: wr >>>>>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 >>>>>>>>>> kernel: [ 874.477740] BTRFS info (device sde): bdev /dev/sdf errs: wr >>>>>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 >>>>>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>>>>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>>>>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: resume skipped >>>>>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking UUID tree >>>>>>>>>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error -28) >>>>>>>>>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in >>>>>>>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>>>>>>> kernel: [ 1267.280604] BTRFS info (device sde: state EA): forced readonly >>>>>>>>>> kernel: [ 1267.280610] BTRFS error (device sde: state EA): failed to >>>>>>>>>> run delayed ref for logical 102255404044288 num_bytes 294912 type 184 >>>>>>>>>> action 2 ref_mod 1: -28 >>>>>>>>>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at >>>>>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>>>>>>>>> [btrfs] >>>>>>>>>> kernel: [ 1267.280666] BTRFS: error (device sde: state EA) in >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>>>>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): >>>>>>>>>> btrfs_uuid_scan_kthread failed -5 >>>>>>>>>> kernel: [ 1267.280794] Modules linked in: xt_nat xt_tcpudp veth >>>>>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>>>>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>>>>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>>>>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>>>>>>>>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd >>>>>>>>>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm >>>>>>>>>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi >>>>>>>>>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl snd_pcm snd_timer >>>>>>>>>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid dm_multipath >>>>>>>>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls efi_pstore msr nfsd >>>>>>>>>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables >>>>>>>>>> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov >>>>>>>>>> async_memcpy async_pq async_xor async_txxor raid6_pq libcrc32c raid1 >>>>>>>>>> raid0 multipath linear hid_generic usbhid hid amdgpu uas usb_storage >>>>>>>>>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: btrfs-transacti >>>>>>>>>> Tainted: G W O 6.2.0-23-generic #23+btrfix >>>>>>>>>> kernel: [ 1267.281005] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>>>>>>>>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>>>>>>>>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 [btrfs] >>>>>>>>>> kernel: [ 1267.281444] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>>>>>>>>> kernel: [ 1267.281570] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>>>>>>>>> kernel: [ 1267.281694] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>>>>>>>>> kernel: [ 1267.281818] btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] >>>>>>>>>> kernel: [ 1267.281976] btrfs_commit_transaction+0xb3/0xbc0 [btrfs] >>>>>>>>>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 [btrfs] >>>>>>>>>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 [btrfs] >>>>>>>>>> kernel: [ 1267.282375] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] >>>>>>>>>> kernel: [ 1267.282548] BTRFS info (device sde: state EA): dumping space info: >>>>>>>>>> kernel: [ 1267.282552] BTRFS info (device sde: state EA): space_info >>>>>>>>>> DATA has 160777674752 free, is not full >>>>>>>>>> kernel: [ 1267.282558] BTRFS info (device sde: state EA): space_info >>>>>>>>>> total=71201958395904, used=71018191273984, pinned=22985908224, >>>>>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>>>>>>>>> kernel: [ 1267.282566] BTRFS info (device sde: state EA): space_info >>>>>>>>>> METADATA has -124944384 free, is full >>>>>>>>>> kernel: [ 1267.282571] BTRFS info (device sde: state EA): space_info >>>>>>>>>> total=83530612736, used=82791497728, pinned=242745344, >>>>>>>>>> reserved=496369664, may_use=124944384, readonly=0 zone_unusable=0 >>>>>>>>>> kernel: [ 1267.282577] BTRFS info (device sde: state EA): space_info >>>>>>>>>> SYSTEM has 33439744 free, is not full >>>>>>>>>> kernel: [ 1267.282582] BTRFS info (device sde: state EA): space_info >>>>>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>>>>>>>>> readonly=0 zone_unusable=0 >>>>>>>>>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): >>>>>>>>>> global_block_rsv: size 536870912 reserved 124944384 >>>>>>>>>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): >>>>>>>>>> trans_block_rsv: size 0 reserved 0 >>>>>>>>>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): >>>>>>>>>> chunk_block_rsv: size 0 reserved 0 >>>>>>>>>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): >>>>>>>>>> delayed_block_rsv: size 0 reserved 0 >>>>>>>>>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): >>>>>>>>>> delayed_refs_rsv: size 251322957824 reserved 0 >>>>>>>>>> kernel: [ 1267.282608] BTRFS: error (device sde: state EA) in >>>>>>>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>>>>>>> kernel: [ 1267.282653] BTRFS error (device sde: state EA): failed to >>>>>>>>>> run delayed ref for logical 102255401897984 num_bytes 126976 type 184 >>>>>>>>>> action 2 ref_mod 1: -28 >>>>>>>>>> kernel: [ 1267.282708] BTRFS: error (device sde: state EA) in >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>>>>> >>>>>>>>>> A couple of kernel recompiles later, the second attempt on the SSD >>>>>>>>>> generated similar: >>>>>>>>>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c >>>>>>>>>> (crc32c-intel) checksum algorithm >>>>>>>>>> kernel: [ 1472.203491] BTRFS info (device sdc): disk space caching is enabled >>>>>>>>>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev /dev/sdf errs: wr >>>>>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 >>>>>>>>>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev /dev/sda errs: wr >>>>>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 >>>>>>>>>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev /dev/sdh errs: wr >>>>>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 >>>>>>>>>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev /dev/sdd errs: wr >>>>>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 >>>>>>>>>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: resume skipped >>>>>>>>>> kernel: [ 1696.662537] BTRFS info (device sdc): checking UUID tree >>>>>>>>>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error -28) >>>>>>>>>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at >>>>>>>>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 >>>>>>>>>> [btrfs] >>>>>>>>>> kernel: [ 1919.452655] Modules linked in: xt_nat xt_tcpudp veth >>>>>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink >>>>>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo >>>>>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc >>>>>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) binfmt_misc >>>>>>>>>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic >>>>>>>>>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg >>>>>>>>>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core >>>>>>>>>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm snd_timer kvm >>>>>>>>>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp input_leds mac_hid >>>>>>>>>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls nfsd >>>>>>>>>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc dmi_sysfs >>>>>>>>>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 >>>>>>>>>> async_raid6_recov async_memcpy async_pq async_xor async_tx xor >>>>>>>>>> raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid >>>>>>>>>> amdgpu uas hid iommu_v2 >>>>>>>>>> kernel: [ 1919.452839] Workqueue: events_unbound >>>>>>>>>> btrfs_async_reclaim_metadata_space [btrfs] >>>>>>>>>> kernel: [ 1919.452985] RIP: 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] >>>>>>>>>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 [btrfs] >>>>>>>>>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 [btrfs] >>>>>>>>>> kernel: [ 1919.453368] btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] >>>>>>>>>> kernel: [ 1919.453480] __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] >>>>>>>>>> kernel: [ 1919.453592] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] >>>>>>>>>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] >>>>>>>>>> kernel: [ 1919.453845] btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] >>>>>>>>>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): dumping space info: >>>>>>>>>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): space_info >>>>>>>>>> DATA has 160778723328 free, is not full >>>>>>>>>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): space_info >>>>>>>>>> total=71201958395904, used=71017442181120, pinned=23733952512, >>>>>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 >>>>>>>>>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): space_info >>>>>>>>>> METADATA has -147570688 free, is full >>>>>>>>>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): space_info >>>>>>>>>> total=83530612736, used=82792185856, pinned=238059520, >>>>>>>>>> reserved=500367360, may_use=147570688, readonly=0 zone_unusable=0 >>>>>>>>>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): space_info >>>>>>>>>> SYSTEM has 33439744 free, is not full >>>>>>>>>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): space_info >>>>>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, >>>>>>>>>> readonly=0 zone_unusable=0 >>>>>>>>>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): >>>>>>>>>> global_block_rsv: size 536870912 reserved 147570688 >>>>>>>>>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): >>>>>>>>>> trans_block_rsv: size 0 reserved 0 >>>>>>>>>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): >>>>>>>>>> chunk_block_rsv: size 0 reserved 0 >>>>>>>>>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): >>>>>>>>>> delayed_block_rsv: size 0 reserved 0 >>>>>>>>>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): >>>>>>>>>> delayed_refs_rsv: size 254292787200 reserved 0 >>>>>>>>>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in >>>>>>>>>> do_free_extent_accounting:2847: errno=-28 No space left >>>>>>>>>> kernel: [ 1919.454123] BTRFS info (device sdc: state EA): forced readonly >>>>>>>>>> kernel: [ 1919.454127] BTRFS error (device sdc: state EA): failed to >>>>>>>>>> run delayed ref for logical 102538713931776 num_bytes 245760 type 184 >>>>>>>>>> action 2 ref_mod 1: -28 >>>>>>>>>> kernel: [ 1919.454176] BTRFS: error (device sdc: state EA) in >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>>>>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): >>>>>>>>>> btrfs_uuid_scan_kthread failed -5 >>>>>>>>>> kernel: [ 1919.472381] BTRFS: error (device sdc: state EA) in >>>>>>>>>> __btrfs_free_extent:3077: errno=-28 No space left >>>>>>>>>> kernel: [ 1919.472417] BTRFS error (device sdc: state EA): failed to >>>>>>>>>> run delayed ref for logical 102538732191744 num_bytes 245760 type 184 >>>>>>>>>> action 2 ref_mod 1: -28 >>>>>>>>>> kernel: [ 1919.472442] BTRFS: error (device sdc: state EA) in >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2023/6/17 13:11, Stefan N wrote: >>>>>>>>>>>> Hi Qu, >>>>>>>>>>>> >>>>>>>>>>>> I believe I've got this environment ready, with the 6.2.0 kernel as >>>>>>>>>>>> before using the Ubuntu kernel, but can switch to vanilla if required. >>>>>>>>>>>> >>>>>>>>>>>> I've not done anything kernel modifications for a solid decade, so >>>>>>>>>>>> would be keen for a bit of guidance. >>>>>>>>>>> >>>>>>>>>>> Sure no problem. >>>>>>>>>>> >>>>>>>>>>> Please fetch the kernel source tar ball (6.2.x) first, decompress, then >>>>>>>>>>> apply the attached one-line patch by: >>>>>>>>>>> >>>>>>>>>>> $ tar czf linux*.tar.xz >>>>>>>>>>> $ cd linux* >>>>>>>>>>> $ patch -np1 -i <the patch file> >>>>>>>>>>> >>>>>>>>>>> Then use your running system kernel config if possible: >>>>>>>>>>> >>>>>>>>>>> $ cp /proc/config.gz . >>>>>>>>>>> $ gunzip config.gz >>>>>>>>>>> $ mv config .config >>>>>>>>>>> $ make olddefconfig >>>>>>>>>>> >>>>>>>>>>> Then you can start your kernel compiling, and considering you're using >>>>>>>>>>> your distro's default, it would include tons of drivers, thus would be >>>>>>>>>>> very slow. (Replace the number to something more suitable to your >>>>>>>>>>> system, using all CPU cores can be very hot) >>>>>>>>>>> >>>>>>>>>>> $ make -j12 >>>>>>>>>>> >>>>>>>>>>> Finally you need to install the modules/kernel. >>>>>>>>>>> >>>>>>>>>>> Unfortunately this is distro specific, but if you're using Ubuntu, it >>>>>>>>>>> may be much easier: >>>>>>>>>>> >>>>>>>>>>> $ make bindeb-pkg >>>>>>>>>>> >>>>>>>>>>> Then install the generated dpkg I guess? I have never tried kernel >>>>>>>>>>> building using deb/rpm, but only manual installation, which is also >>>>>>>>>>> distro dependent in the initramfs generation part. >>>>>>>>>>> >>>>>>>>>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom >>>>>>>>>>> # make modules_install >>>>>>>>>>> # mkinitcpio -k /boot/vmlinuz-custom -g /boot/initramfs-custom.img >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> The last step is to update your bootloader to add the new kernel, which >>>>>>>>>>> is not only distro dependent but also bootloader dependent. >>>>>>>>>>> >>>>>>>>>>> In my case, I go with systemd-boot with manually crafted entries. >>>>>>>>>>> But if you go Ubuntu I believe just installing the kernel dpkg would >>>>>>>>>>> have everything handled? >>>>>>>>>>> >>>>>>>>>>> Finally you can try reboot into the newer kernel, and try device add >>>>>>>>>>> (need to add 4 disks), then sync and see if things work as expected. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Qu >>>>>>>>>>>> >>>>>>>>>>>> I will recover a 1tb SSD and partition it into 4 in a USB enclosure, >>>>>>>>>>>> but failing this will use 4x loop devices. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>>>>>>>>>>> In your particular case, since you're running RAID1C4 you need to add 4 >>>>>>>>>>>>> devices in one transaction. >>>>>>>>>>>>> >>>>>>>>>>>>> I can easily craft a patch to avoid commit transaction, but still you'll >>>>>>>>>>>>> need to add at least 4 disks, and then sync to see if things would work. >>>>>>>>>>>>> >>>>>>>>>>>>> Furthermore this means you need a liveCD with full kernel compiling >>>>>>>>>>>>> environment. >>>>>>>>>>>>> >>>>>>>>>>>>> If you want to go this path, I can send you the patch when you've >>>>>>>>>>>>> prepared the needed environment. ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <CA+W5K0oDRo2LZMiUiysYXpcpmfXTvS27hPdjm1pzq4kfq9=vdQ@mail.gmail.com>]
* Re: Out of space loop: skip_balance not working [not found] ` <CA+W5K0oDRo2LZMiUiysYXpcpmfXTvS27hPdjm1pzq4kfq9=vdQ@mail.gmail.com> @ 2023-07-23 7:23 ` Qu Wenruo 0 siblings, 0 replies; 22+ messages in thread From: Qu Wenruo @ 2023-07-23 7:23 UTC (permalink / raw) To: Stefan N; +Cc: Qu Wenruo, linux-btrfs@vger.kernel.org, Josef Bacik, Filipe Manana On 2023/7/23 14:21, Stefan N wrote: > Hi Qu, > > Many thanks for that new patch, that's done the job. > > As I've now got 3 disks with plenty of space, I've converted > metadata/system to RAID1C3 to mitigate the issue until the 4th disk has > finished replacing. > > Hopefully a fix for the underlying issue is applied by the time I start > running low on space, though looking at the usage now (below) it looks > like I might never run out again 😂 judging by this, is it possible the > issue I had only existed because I was on LTS with kernel 5,15, and 6.2 > might already have fixed the under allocation issue that caused this? Unfortunately I'm not an export on the extent allocator nor ENOSPC situations, thus I can not help much on the root cauese. Filipe and Josef may provide mode helps on this. Thanks, Qu > > Many thanks again, > > Stefan > > Data,RAID6: Size:65.41TiB, Used:65.22TiB (99.70%) > /dev/sdf 11.49TiB > /dev/sdg 10.91TiB > /dev/sdd 11.46TiB > /dev/sdj 10.88TiB > /dev/sde 10.88TiB > /dev/sdc 10.88TiB > /dev/sdh 11.47TiB > /dev/sdb 10.89TiB > > Metadata,RAID1C3: Size:133.00GiB, Used:77.74GiB (58.45%) > /dev/sdf 133.00GiB > /dev/sdd 133.00GiB > /dev/sdh 133.00GiB > > System,RAID1C3: Size:32.00MiB, Used:5.25MiB (16.41%) > /dev/sdf 32.00MiB > /dev/sdd 32.00MiB > /dev/sdh 32.00MiB > > Unallocated: > /dev/sda 10.91TiB <-- replace target (in progress) > /dev/sdf 4.75TiB > /dev/sdg 5.41GiB > /dev/sdd 4.78TiB > /dev/sdj 36.49GiB > /dev/sde 38.53GiB > /dev/sdc 36.33GiB > /dev/sdh 4.77TiB > /dev/sdb 26.01GiB > > > On Sat, 22 Jul 2023 at 19:38, Qu Wenruo <quwenruo.btrfs@gmx.com > <mailto:quwenruo.btrfs@gmx.com>> wrote: > > > > On 2023/7/22 13:28, Stefan N wrote: > > Hi again Qu, > > > > Thanks for all your help last month, I managed to get things going > > again and have been slowly adding new disks, but have now ended up > > with a similar but slightly more complicated problem I need some more > > assistance with. > > > > Since last time: I used loop devices to get the fs operational again, > > then deleted some files to create space, removed the loop devices, > > successfully used btrfs replace to replace 3x 12tb disks with 18tbs, > > and moved to space cache v2 in the hope it'd prevent future issues. > > > > The problem: during the 4th replace operation the metadata issue has > > recurred, the first time self correcting when remounted, but this > > second time has resulted in a similar paradox to last time. I've > > rebooted into the patched kernel from last month, but the same > > solution is now ineffective due to the system failing to detect the > > replace target, despite no disks having been removed nor changing > from > > /dev/sda and /dev/sdl during the reboots. > > > > During the replace process the disks were in use, and while after > > there's plenty of space for data it seems enough was written to fill > > metadata again. In hindsight I should have left the 4 loop devices in > > place until the replaces had completed to satisfy the RAID1C4 > > requirement for the metadata, as despite deleting files data has not > > been freed from the existing 12tb disks. > > > > The 'missing' replace target is: > > Disk /dev/sda: 16.37 TiB, 18000207937536 bytes, 35156656128 sectors > > The problem seems to be that, replace cancel also needs to commit > transaction, which is obviously a bad situation during high metadata > stress. > > > But the root problem is still why we hit ENOSPC, AFAIK Filipe is working > on this problem. > > > For now, the problem can be more or less worked around by the same > method, instead of committing transaction we just cancel the current one > so that you can continue to go with the patched device add. > > I have updated the branch to have a new patch, please try if this allows > you to mount it with "-o degraded" then try cancel and add devices. > > https://github.com/adam900710/linux/tree/dev_add_no_commit > <https://github.com/adam900710/linux/tree/dev_add_no_commit> > > Thanks, > Qu > > [...] > > > > > > $ sudo mount -o degraded /mnt/data ; sudo btrfs replace cancel > > /mnt/data ; sudo btrfs dev add -K -f /dev/loop20 /dev/loop21 > > /dev/loop22 /dev/loop23 /mnt/data ; sudo btrfs fi sync /mnt/data > > ERROR: error adding device '/dev/loop20': Read-only file system > > ERROR: error adding device '/dev/loop21': Read-only file system > > ERROR: error adding device '/dev/loop22': Read-only file system > > ERROR: error adding device '/dev/loop23': Read-only file system > > ERROR: Could not sync filesystem: Read-only file system > > $ > > > > syslog: > > BTRFS info (device sdf): using crc32c (crc32c-intel) checksum > algorithm > > BTRFS info (device sdf): allowing degraded mounts > > BTRFS info (device sdf): using free space tree > > BTRFS info (device sdf): bdev /dev/sdg errs: wr 0, rd 0, flush 0, > > corrupt 845, gen 0 > > BTRFS info (device sdf): bdev /dev/sde errs: wr 3, rd 7, flush 0, > > corrupt 0, gen 0 > > BTRFS info (device sdf): bdev /dev/sdc errs: wr 41, rd 0, flush 0, > > corrupt 0, gen 0 > > BTRFS info (device sdf): cannot continue dev_replace, tgtdev is > missing > > BTRFS info (device sdf): you may cancel the operation after > 'mount -o degraded' > > BTRFS: Transaction aborted (error -28) > > WARNING: CPU: 0 PID: 6659 at fs/btrfs/extent-tree.c:3077 > > __btrfs_free_extent+0xa18/0xf50 [btrfs] > > Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat > > xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 > > nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables > > nfnetlink br_netfilter bridge stp llc rpcsec_gss_krb5 nfsv4 nfs > > fscache netfs ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) > > binfmt_misc nls_iso8859_1 intel_rapl_msr snd_hda_codec_realtek > > snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel > > snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi edac_mce_amd > > snd_hda_codec kvm_amd snd_hda_core kvm snd_hwdep irqbypass snd_pcm > > rapl wmi_bmof snd_timer k10temp snd ccp soundcore joydev input_leds > > mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua > bonding tls > > msr nfsd efi_pstore auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs > > ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > > raid6_pq libcrc32c raid1 raid0 multipath linear > > hid_logitech_hidpp hid_logitech_dj amdgpu hid_generic iommu_v2 > > drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper uas cec > > rc_core usbhid hid usb_storage drm_kms_helper syscopyarea sysfillrect > > sysimgblt crct10dif_pclmul igb crc32_pclmul polyval_clmulni > > polyval_generic ghash_clmulni_intel dca sha512_ssse3 aesni_intel > > crypto_simd drm nvme ahci cryptd libahci qlcnic i2c_algo_bit > nvme_core > > mpt3sas xhci_pci video raid_class scsi_transport_sas xhci_pci_renesas > > nvme_common i2c_piix4 wmi > > CPU: 0 PID: 6659 Comm: btrfs Tainted: G W O > > 6.2.0-23-generic #23+btrdebug2c > > Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS > > P3.70 02/23/2022 > > RIP: 0010:__btrfs_free_extent+0xa18/0xf50 [btrfs] > > Code: 48 c7 c6 80 19 71 c1 48 8b 78 50 e8 82 57 0e 00 41 b8 01 00 00 > > 00 e9 58 fe ff ff 8b 75 94 48 c7 c7 a8 19 71 c1 e8 d8 92 4d c7 > <0f> 0b > > e9 64 fb ff ff 8b 7d 90 e8 b9 04 ff ff 84 c0 0f 85 f1 01 00 > > RSP: 0018:ffffb05e4746fa38 EFLAGS: 00010246 > > RAX: 0000000000000000 RBX: 0000b711db1d0000 RCX: 0000000000000000 > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > RBP: ffffb05e4746fad8 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > > R13: 0000000000000000 R14: ffff88edc031ea90 R15: ffff88edc3ba0230 > > FS: 00007f2b14740d40(0000) GS:ffff88f4e0a00000(0000) > knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000000c000253000 CR3: 00000001e7cc8000 CR4: 00000000003506f0 > > Call Trace: > > <TASK> > > run_delayed_tree_ref+0x69/0x1b0 [btrfs] > > btrfs_run_delayed_refs_for_head+0x3aa/0x520 [btrfs] > > ? btrfs_create_pending_block_groups+0x280/0x4d0 [btrfs] > > __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > > btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > > commit_cowonly_roots+0x1e7/0x240 [btrfs] > > btrfs_commit_transaction+0x5d2/0xbc0 [btrfs] > > ? start_transaction+0xc8/0x600 [btrfs] > > btrfs_dev_replace_cancel+0x168/0x2e0 [btrfs] > > btrfs_ioctl+0x12ed/0x14d0 [btrfs] > > ? __handle_mm_fault+0x661/0x720 > > __x64_sys_ioctl+0xa0/0xe0 > > do_syscall_64+0x5b/0x90 > > ? do_user_addr_fault+0x1e8/0x720 > > ? exit_to_user_mode_prepare+0x30/0xb0 > > ? irqentry_exit_to_user_mode+0x9/0x20 > > ? irqentry_exit+0x43/0x50 > > ? exc_page_fault+0x91/0x1b0 > > entry_SYSCALL_64_after_hwframe+0x72/0xdc > > RIP: 0033:0x7f2b145119ef > > Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 > > 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 > <89> c2 > > 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 > > RSP: 002b:00007ffcda96ca10 EFLAGS: 00000246 ORIG_RAX: > 0000000000000010 > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2b145119ef > > RDX: 00007ffcda96ca80 RSI: 00000000ca289435 RDI: 0000000000000003 > > RBP: 0000000000000003 R08: 0000000000021001 R09: 0000000000000000 > > R10: fffffffffffff000 R11: 0000000000000246 R12: 00007ffcda96e7eb > > R13: 000056092aafbe60 R14: 000056092aab3578 R15: 0000000000000000 > > </TASK> > > ---[ end trace 0000000000000000 ]--- > > BTRFS info (device sdf: state A): dumping space info: > > BTRFS info (device sdf: state A): space_info DATA has 219646795776 > > free, is not full > > BTRFS info (device sdf: state A): space_info total=71845742116864, > > used=71626091782144, pinned=0, reserved=0, may_use=0, > readonly=3538944 > > zone_unusable=0 > > BTRFS info (device sdf: state A): space_info METADATA has -536821760 > > free, is full > > BTRFS info (device sdf: state A): space_info total=83481329664, > > used=83421233152, pinned=57606144, reserved=2490368, > > may_use=536821760, readonly=0 zone_unusable=0 > > BTRFS info (device sdf: state A): space_info SYSTEM has 20676608 > free, > > is not full > > BTRFS info (device sdf: state A): space_info total=26214400, > > used=5537792, pinned=0, reserved=0, may_use=0, readonly=0 > > zone_unusable=0 > > BTRFS info (device sdf: state A): global_block_rsv: size 536870912 > > reserved 536805376 > > BTRFS info (device sdf: state A): trans_block_rsv: size 0 reserved 0 > > BTRFS info (device sdf: state A): chunk_block_rsv: size 0 reserved 0 > > BTRFS info (device sdf: state A): delayed_block_rsv: size 0 > reserved 0 > > BTRFS info (device sdf: state A): delayed_refs_rsv: size 523239424 > > reserved 16384 > > BTRFS: error (device sdf: state A) in __btrfs_free_extent:3077: > > errno=-28 No space left > > BTRFS info (device sdf: state EA): forced readonly > > BTRFS error (device sdf: state EA): failed to run delayed ref for > > logical 201287318437888 num_bytes 16384 type 176 action 2 ref_mod 1: > > -28 > > BTRFS: error (device sdf: state EA) in btrfs_run_delayed_refs:2151: > > errno=-28 No space left > > BTRFS warning (device sdf: state EA): Skipping commit of aborted > transaction. > > BTRFS: error (device sdf: state EA) in cleanup_transaction:1986: > > errno=-28 No space left > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 6659 at fs/btrfs/dev-replace.c:1121 > > btrfs_dev_replace_cancel+0x2b0/0x2e0 [btrfs] > > Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat > > xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 > > nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables > > nfnetlink br_netfilter bridge stp llc rpcsec_gss_krb5 nfsv4 nfs > > fscache netfs ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) > > binfmt_misc nls_iso8859_1 intel_rapl_msr snd_hda_codec_realtek > > snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel > > snd_intel_dspcfg intel_rapl_common snd_intel_sdw_acpi edac_mce_amd > > snd_hda_codec kvm_amd snd_hda_core kvm snd_hwdep irqbypass snd_pcm > > rapl wmi_bmof snd_timer k10temp snd ccp soundcore joydev input_leds > > mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua > bonding tls > > msr nfsd efi_pstore auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs > > ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 > > async_raid6_recov async_memcpy async_pq async_xor async_tx xor > > raid6_pq libcrc32c raid1 raid0 multipath linear > > 2023-07-22T14:04:29.956673+09:30 ltsnas kernel: [ 422.690184] > > hid_logitech_hidpp hid_logitech_dj amdgpu hid_generic iommu_v2 > > drm_buddy gpu_sched drm_ttm_helper ttm drm_display_helper uas cec > > rc_core usbhid hid usb_storage drm_kms_helper syscopyarea sysfillrect > > sysimgblt crct10dif_pclmul igb crc32_pclmul polyval_clmulni > > polyval_generic ghash_clmulni_intel dca sha512_ssse3 aesni_intel > > crypto_simd drm nvme ahci cryptd libahci qlcnic i2c_algo_bit > nvme_core > > mpt3sas xhci_pci video raid_class scsi_transport_sas xhci_pci_renesas > > nvme_common i2c_piix4 wmi > > CPU: 0 PID: 6659 Comm: btrfs Tainted: G W O > > 6.2.0-23-generic #23+btrdebug2c > > Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS > > P3.70 02/23/2022 > > RIP: 0010:btrfs_dev_replace_cancel+0x2b0/0x2e0 [btrfs] > > Code: 4c 89 c2 e8 52 3f 02 00 e8 9d 4a 4e c7 e9 35 ff ff ff 4c 89 e7 > > 48 89 45 d0 e8 bc d5 3f c8 48 8b 45 d0 41 89 c5 e9 38 ff ff ff > <0f> 0b > > e9 b9 fe ff ff 41 bd e2 ff ff ff e9 26 ff ff ff 48 c7 c2 74 > > RSP: 0018:ffffb05e4746fd58 EFLAGS: 00010286 > > RAX: 00000000ffffffe4 RBX: ffff88edda916000 RCX: 0000000000000000 > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > RBP: ffffb05e4746fd88 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff88edda916ab0 > > R13: ffff88eddb627800 R14: ffff88ede7fad000 R15: ffff88edda916ad0 > > FS: 00007f2b14740d40(0000) GS:ffff88f4e0a00000(0000) > knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000000c000253000 CR3: 00000001e7cc8000 CR4: 00000000003506f0 > > Call Trace: > > <TASK> > > btrfs_ioctl+0x12ed/0x14d0 [btrfs] > > ? __handle_mm_fault+0x661/0x720 > > __x64_sys_ioctl+0xa0/0xe0 > > do_syscall_64+0x5b/0x90 > > ? do_user_addr_fault+0x1e8/0x720 > > ? exit_to_user_mode_prepare+0x30/0xb0 > > ? irqentry_exit_to_user_mode+0x9/0x20 > > ? irqentry_exit+0x43/0x50 > > ? exc_page_fault+0x91/0x1b0 > > entry_SYSCALL_64_after_hwframe+0x72/0xdc > > RIP: 0033:0x7f2b145119ef > > Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 > > 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 > <89> c2 > > 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 > > RSP: 002b:00007ffcda96ca10 EFLAGS: 00000246 ORIG_RAX: > 0000000000000010 > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2b145119ef > > RDX: 00007ffcda96ca80 RSI: 00000000ca289435 RDI: 0000000000000003 > > RBP: 0000000000000003 R08: 0000000000021001 R09: 0000000000000000 > > R10: fffffffffffff000 R11: 0000000000000246 R12: 00007ffcda96e7eb > > R13: 000056092aafbe60 R14: 000056092aab3578 R15: 0000000000000000 > > </TASK> > > ---[ end trace 0000000000000000 ]--- > > BTRFS info (device sdf: state EA): suspended dev_replace from > /dev/sdl > > (devid 4) to <missing disk> canceled > > BTRFS error (device sdf: state EA): failed to add disk > /dev/loop20: -30 > > BTRFS error (device sdf: state EA): failed to add disk > /dev/loop21: -30 > > BTRFS error (device sdf: state EA): failed to add disk > /dev/loop22: -30 > > BTRFS error (device sdf: state EA): failed to add disk > /dev/loop23: -30 > > > > On Mon, 26 Jun 2023 at 22:28, Stefan N <stefannnau@gmail.com > <mailto:stefannnau@gmail.com>> wrote: > >> > >> Hi Qu, > >> > >> Thanks for all the help, I managed to get it mounted and synced with > >> 5G loops (2G allocated to metadata, 3G unallocated on each). > >> > >> I'm able to read existing files, write new files, and any changes > >> remain after an unmount and remount. > >> > >> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; sudo > btrfs > >> dev add -K -f /dev/loop20 /dev/loop21 /dev/loop22 /dev/loop23 > >> /mnt/data ; sudo btrfs fi sync /mnt/data > >> $ sudo btrfs fi show > >> Label: none uuid: abc123 > >> Total devices 12 FS bytes used 64.52TiB > >> devid 1 size 10.91TiB used 10.89TiB path /dev/sdd > >> devid 2 size 10.91TiB used 10.89TiB path /dev/sdh > >> devid 3 size 10.91TiB used 10.89TiB path /dev/sdb > >> devid 4 size 10.91TiB used 10.89TiB path /dev/sdg > >> devid 5 size 10.91TiB used 10.89TiB path /dev/sdi > >> devid 6 size 10.91TiB used 10.89TiB path /dev/sde > >> devid 7 size 10.91TiB used 10.89TiB path /dev/sdf > >> devid 8 size 10.91TiB used 10.89TiB path /dev/sdc > >> devid 9 size 5.00GiB used 2.00GiB path /dev/loop20 > >> devid 10 size 5.00GiB used 2.00GiB path /dev/loop21 > >> devid 11 size 5.00GiB used 2.00GiB path /dev/loop22 > >> devid 12 size 5.00GiB used 2.00GiB path /dev/loop23 > >> $ > >> > >> I'd be keen to know what you'd suggest for next steps. I have > two 18T > >> disks to upgrade two of the existing 12T disks, which could be a > >> substitute or add them over USB for a while. > >> > >> While a random sample of files seem to be perfectly intact, I'd be > >> keen to verify the integrity to track down any corrupted files. > >> > >> Should I perform a scrub before adding/replacing the new disks, > or can > >> this be safely done afterwards? e.g. can I safely add 2x18tb, remove > >> loops, begin scrub, and then remove 2x 12tb when scrub completes? > >> > >> See kernel log below: > >> > >> kernel: [ 399.272458] BTRFS info (device sdd): using crc32c > >> (crc32c-intel) checksum algorithm > >> kernel: [ 399.272476] BTRFS info (device sdd): disk space > caching is enabled > >> kernel: [ 404.855750] BTRFS info (device sdd): bdev /dev/sdh > errs: wr > >> 0, rd 0, flush 0, corrupt 845, gen 0 > >> kernel: [ 404.855766] BTRFS info (device sdd): bdev /dev/sdb > errs: wr > >> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >> kernel: [ 404.855778] BTRFS info (device sdd): bdev /dev/sdi > errs: wr > >> 3, rd 7, flush 0, corrupt 0, gen 0 > >> kernel: [ 404.855785] BTRFS info (device sdd): bdev /dev/sde > errs: wr > >> 41, rd 0, flush 0, corrupt 0, gen 0 > >> kernel: [ 630.844173] BTRFS info (device sdd): balance: resume > skipped > >> kernel: [ 630.844185] BTRFS info (device sdd): checking UUID tree > >> kernel: [ 630.871787] BTRFS info (device sdd): disk added > /dev/loop20 > >> kernel: [ 630.881223] BTRFS info (device sdd): disk added > /dev/loop21 > >> kernel: [ 630.888817] BTRFS info (device sdd): disk added > /dev/loop22 > >> kernel: [ 630.896302] BTRFS info (device sdd): disk added > /dev/loop23 > >> kernel: [ 846.849616] INFO: task btrfs-uuid:4834 blocked for more > >> than 120 seconds. > >> kernel: [ 846.849660] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 846.849693] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 846.849725] task:btrfs-uuid state:D stack:0 > >> pid:4834 ppid:2 flags:0x00004000 > >> kernel: [ 846.849735] Call Trace: > >> kernel: [ 846.849739] <TASK> > >> kernel: [ 846.849747] __schedule+0x2aa/0x610 > >> kernel: [ 846.849761] schedule+0x63/0x110 > >> kernel: [ 846.849769] wait_current_trans+0x100/0x160 [btrfs] > >> kernel: [ 846.849908] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 846.849920] start_transaction+0x28b/0x600 [btrfs] > >> kernel: [ 846.850057] btrfs_start_transaction+0x1e/0x30 [btrfs] > >> kernel: [ 846.850191] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > >> kernel: [ 846.850359] ? > __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > >> kernel: [ 846.850487] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > >> kernel: [ 846.850614] kthread+0xe9/0x110 > >> kernel: [ 846.850623] ? __pfx_kthread+0x10/0x10 > >> kernel: [ 846.850631] ret_from_fork+0x2c/0x50 > >> kernel: [ 846.850642] </TASK> > >> kernel: [ 846.850645] INFO: task btrfs:4850 blocked for more > than 120 seconds. > >> kernel: [ 846.850676] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 846.850707] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 846.850738] task:btrfs state:D stack:0 > >> pid:4850 ppid:4849 flags:0x00000002 > >> kernel: [ 846.850746] Call Trace: > >> kernel: [ 846.850749] <TASK> > >> kernel: [ 846.850752] __schedule+0x2aa/0x610 > >> kernel: [ 846.850760] schedule+0x63/0x110 > >> kernel: [ 846.850765] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > >> kernel: [ 846.850899] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 846.850908] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > >> kernel: [ 846.851027] btrfs_ioctl+0x643/0x14d0 [btrfs] > >> kernel: [ 846.851186] ? putname+0x5d/0x80 > >> kernel: [ 846.851195] ? do_sys_openat2+0xab/0x180 > >> kernel: [ 846.851203] ? exit_to_user_mode_prepare+0x30/0xb0 > >> kernel: [ 846.851213] __x64_sys_ioctl+0xa0/0xe0 > >> kernel: [ 846.851221] do_syscall_64+0x5b/0x90 > >> kernel: [ 846.851229] ? exc_page_fault+0x91/0x1b0 > >> kernel: [ 846.851236] entry_SYSCALL_64_after_hwframe+0x72/0xdc > >> kernel: [ 846.851243] RIP: 0033:0x7fbf339119ef > >> kernel: [ 846.851249] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 > >> ORIG_RAX: 0000000000000010 > >> kernel: [ 846.851255] RAX: ffffffffffffffda RBX: 0000000000000003 > >> RCX: 00007fbf339119ef > >> kernel: [ 846.851259] RDX: 0000000000000000 RSI: 0000000000009408 > >> RDI: 0000000000000003 > >> kernel: [ 846.851263] RBP: 0000000000000007 R08: 0000000000000000 > >> R09: 0000000000000000 > >> kernel: [ 846.851266] R10: 0000000000000000 R11: 0000000000000246 > >> R12: 00007fbf339f642c > >> kernel: [ 846.851269] R13: 0000000000000001 R14: 0000557384b29578 > >> R15: 0000000000000000 > >> kernel: [ 846.851277] </TASK> > >> kernel: [ 967.681770] INFO: task btrfs-uuid:4834 blocked for more > >> than 241 seconds. > >> kernel: [ 967.681818] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 967.681852] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 967.681884] task:btrfs-uuid state:D stack:0 > >> pid:4834 ppid:2 flags:0x00004000 > >> kernel: [ 967.681895] Call Trace: > >> kernel: [ 967.681899] <TASK> > >> kernel: [ 967.681907] __schedule+0x2aa/0x610 > >> kernel: [ 967.681922] schedule+0x63/0x110 > >> kernel: [ 967.681931] wait_current_trans+0x100/0x160 [btrfs] > >> kernel: [ 967.682070] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 967.682082] start_transaction+0x28b/0x600 [btrfs] > >> kernel: [ 967.682219] btrfs_start_transaction+0x1e/0x30 [btrfs] > >> kernel: [ 967.682353] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > >> kernel: [ 967.682519] ? > __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > >> kernel: [ 967.682645] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > >> kernel: [ 967.682728] kthread+0xe9/0x110 > >> kernel: [ 967.682734] ? __pfx_kthread+0x10/0x10 > >> kernel: [ 967.682739] ret_from_fork+0x2c/0x50 > >> kernel: [ 967.682746] </TASK> > >> kernel: [ 967.682749] INFO: task btrfs:4850 blocked for more > than 241 seconds. > >> kernel: [ 967.682771] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 967.682793] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 967.682815] task:btrfs state:D stack:0 > >> pid:4850 ppid:4849 flags:0x00000002 > >> kernel: [ 967.682820] Call Trace: > >> kernel: [ 967.682822] <TASK> > >> kernel: [ 967.682824] __schedule+0x2aa/0x610 > >> kernel: [ 967.682829] schedule+0x63/0x110 > >> kernel: [ 967.682832] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > >> kernel: [ 967.682918] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 967.682923] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > >> kernel: [ 967.682999] btrfs_ioctl+0x643/0x14d0 [btrfs] > >> kernel: [ 967.683085] ? putname+0x5d/0x80 > >> kernel: [ 967.683091] ? do_sys_openat2+0xab/0x180 > >> kernel: [ 967.683096] ? exit_to_user_mode_prepare+0x30/0xb0 > >> kernel: [ 967.683103] __x64_sys_ioctl+0xa0/0xe0 > >> kernel: [ 967.683107] do_syscall_64+0x5b/0x90 > >> kernel: [ 967.683112] ? exc_page_fault+0x91/0x1b0 > >> kernel: [ 967.683116] entry_SYSCALL_64_after_hwframe+0x72/0xdc > >> kernel: [ 967.683121] RIP: 0033:0x7fbf339119ef > >> kernel: [ 967.683124] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 > >> ORIG_RAX: 0000000000000010 > >> kernel: [ 967.683128] RAX: ffffffffffffffda RBX: 0000000000000003 > >> RCX: 00007fbf339119ef > >> kernel: [ 967.683130] RDX: 0000000000000000 RSI: 0000000000009408 > >> RDI: 0000000000000003 > >> kernel: [ 967.683132] RBP: 0000000000000007 R08: 0000000000000000 > >> R09: 0000000000000000 > >> kernel: [ 967.683134] R10: 0000000000000000 R11: 0000000000000246 > >> R12: 00007fbf339f642c > >> kernel: [ 967.683136] R13: 0000000000000001 R14: 0000557384b29578 > >> R15: 0000000000000000 > >> kernel: [ 967.683141] </TASK> > >> kernel: [ 1088.519959] INFO: task btrfs-uuid:4834 blocked for more > >> than 362 seconds. > >> kernel: [ 1088.520006] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 1088.520039] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 1088.520071] task:btrfs-uuid state:D stack:0 > >> pid:4834 ppid:2 flags:0x00004000 > >> kernel: [ 1088.520082] Call Trace: > >> kernel: [ 1088.520087] <TASK> > >> kernel: [ 1088.520094] __schedule+0x2aa/0x610 > >> kernel: [ 1088.520108] schedule+0x63/0x110 > >> kernel: [ 1088.520117] wait_current_trans+0x100/0x160 [btrfs] > >> kernel: [ 1088.520257] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 1088.520269] start_transaction+0x28b/0x600 [btrfs] > >> kernel: [ 1088.520406] btrfs_start_transaction+0x1e/0x30 [btrfs] > >> kernel: [ 1088.520539] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > >> kernel: [ 1088.520706] ? > __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > >> kernel: [ 1088.520834] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > >> kernel: [ 1088.520961] kthread+0xe9/0x110 > >> kernel: [ 1088.520969] ? __pfx_kthread+0x10/0x10 > >> kernel: [ 1088.520977] ret_from_fork+0x2c/0x50 > >> kernel: [ 1088.520987] </TASK> > >> kernel: [ 1088.520990] INFO: task btrfs:4850 blocked for more > than 362 seconds. > >> kernel: [ 1088.521021] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 1088.521052] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 1088.521084] task:btrfs state:D stack:0 > >> pid:4850 ppid:4849 flags:0x00000002 > >> kernel: [ 1088.521092] Call Trace: > >> kernel: [ 1088.521095] <TASK> > >> kernel: [ 1088.521098] __schedule+0x2aa/0x610 > >> kernel: [ 1088.521106] schedule+0x63/0x110 > >> kernel: [ 1088.521111] btrfs_commit_transaction+0x9b7/0xbc0 [btrfs] > >> kernel: [ 1088.521245] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 1088.521254] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > >> kernel: [ 1088.521372] btrfs_ioctl+0x643/0x14d0 [btrfs] > >> kernel: [ 1088.521530] ? putname+0x5d/0x80 > >> kernel: [ 1088.521539] ? do_sys_openat2+0xab/0x180 > >> kernel: [ 1088.521548] ? exit_to_user_mode_prepare+0x30/0xb0 > >> kernel: [ 1088.521559] __x64_sys_ioctl+0xa0/0xe0 > >> kernel: [ 1088.521567] do_syscall_64+0x5b/0x90 > >> kernel: [ 1088.521575] ? exc_page_fault+0x91/0x1b0 > >> kernel: [ 1088.521582] entry_SYSCALL_64_after_hwframe+0x72/0xdc > >> kernel: [ 1088.521589] RIP: 0033:0x7fbf339119ef > >> kernel: [ 1088.521595] RSP: 002b:00007ffd58427660 EFLAGS: 00000246 > >> ORIG_RAX: 0000000000000010 > >> kernel: [ 1088.521602] RAX: ffffffffffffffda RBX: 0000000000000003 > >> RCX: 00007fbf339119ef > >> kernel: [ 1088.521606] RDX: 0000000000000000 RSI: 0000000000009408 > >> RDI: 0000000000000003 > >> kernel: [ 1088.521610] RBP: 0000000000000007 R08: 0000000000000000 > >> R09: 0000000000000000 > >> kernel: [ 1088.521613] R10: 0000000000000000 R11: 0000000000000246 > >> R12: 00007fbf339f642c > >> kernel: [ 1088.521616] R13: 0000000000000001 R14: 0000557384b29578 > >> R15: 0000000000000000 > >> kernel: [ 1088.521626] </TASK> > >> kernel: [ 1209.357423] INFO: task btrfs-uuid:4834 blocked for more > >> than 483 seconds. > >> kernel: [ 1209.357473] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 1209.357507] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 1209.357540] task:btrfs-uuid state:D stack:0 > >> pid:4834 ppid:2 flags:0x00004000 > >> kernel: [ 1209.357551] Call Trace: > >> kernel: [ 1209.357555] <TASK> > >> kernel: [ 1209.357563] __schedule+0x2aa/0x610 > >> kernel: [ 1209.357577] schedule+0x63/0x110 > >> kernel: [ 1209.357597] wait_current_trans+0x100/0x160 [btrfs] > >> kernel: [ 1209.357738] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 1209.357750] start_transaction+0x28b/0x600 [btrfs] > >> kernel: [ 1209.357887] btrfs_start_transaction+0x1e/0x30 [btrfs] > >> kernel: [ 1209.358021] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > >> kernel: [ 1209.358187] ? > __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > >> kernel: [ 1209.358315] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > >> kernel: [ 1209.358442] kthread+0xe9/0x110 > >> kernel: [ 1209.358451] ? __pfx_kthread+0x10/0x10 > >> kernel: [ 1209.358458] ret_from_fork+0x2c/0x50 > >> kernel: [ 1209.358468] </TASK> > >> kernel: [ 1330.195147] INFO: task btrfs-transacti:4088 blocked for > >> more than 120 seconds. > >> kernel: [ 1330.195192] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 1330.195221] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 1330.195250] task:btrfs-transacti state:D stack:0 > >> pid:4088 ppid:2 flags:0x00004000 > >> kernel: [ 1330.195259] Call Trace: > >> kernel: [ 1330.195263] <TASK> > >> kernel: [ 1330.195269] __schedule+0x2aa/0x610 > >> kernel: [ 1330.195281] schedule+0x63/0x110 > >> kernel: [ 1330.195288] wait_for_commit+0x14c/0x1b0 [btrfs] > >> kernel: [ 1330.195413] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 1330.195424] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] > >> kernel: [ 1330.195552] ? start_transaction+0xc8/0x600 [btrfs] > >> kernel: [ 1330.195676] transaction_kthread+0x14b/0x1c0 [btrfs] > >> kernel: [ 1330.195795] ? __pfx_transaction_kthread+0x10/0x10 > [btrfs] > >> kernel: [ 1330.195912] kthread+0xe9/0x110 > >> kernel: [ 1330.195920] ? __pfx_kthread+0x10/0x10 > >> kernel: [ 1330.195927] ret_from_fork+0x2c/0x50 > >> kernel: [ 1330.195937] </TASK> > >> kernel: [ 1330.195939] INFO: task btrfs-uuid:4834 blocked for more > >> than 604 seconds. > >> kernel: [ 1330.195968] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 1330.195997] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 1330.196026] task:btrfs-uuid state:D stack:0 > >> pid:4834 ppid:2 flags:0x00004000 > >> kernel: [ 1330.196033] Call Trace: > >> kernel: [ 1330.196036] <TASK> > >> kernel: [ 1330.196039] __schedule+0x2aa/0x610 > >> kernel: [ 1330.196046] schedule+0x63/0x110 > >> kernel: [ 1330.196051] wait_current_trans+0x100/0x160 [btrfs] > >> kernel: [ 1330.196169] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 1330.196177] start_transaction+0x28b/0x600 [btrfs] > >> kernel: [ 1330.196298] btrfs_start_transaction+0x1e/0x30 [btrfs] > >> kernel: [ 1330.196416] btrfs_uuid_scan_kthread+0x314/0x420 [btrfs] > >> kernel: [ 1330.196565] ? > __pfx_btrfs_uuid_rescan_kthread+0x10/0x10 [btrfs] > >> kernel: [ 1330.196680] btrfs_uuid_rescan_kthread+0x20/0x70 [btrfs] > >> kernel: [ 1330.196794] kthread+0xe9/0x110 > >> kernel: [ 1330.196800] ? __pfx_kthread+0x10/0x10 > >> kernel: [ 1330.196807] ret_from_fork+0x2c/0x50 > >> kernel: [ 1330.196814] </TASK> > >> kernel: [ 1451.031238] INFO: task btrfs-transacti:4088 blocked for > >> more than 241 seconds. > >> kernel: [ 1451.031286] Tainted: G W O > >> 6.2.0-23-generic #23+btrdebug2c > >> kernel: [ 1451.031319] "echo 0 > > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> kernel: [ 1451.031352] task:btrfs-transacti state:D stack:0 > >> pid:4088 ppid:2 flags:0x00004000 > >> kernel: [ 1451.031362] Call Trace: > >> kernel: [ 1451.031366] <TASK> > >> kernel: [ 1451.031373] __schedule+0x2aa/0x610 > >> kernel: [ 1451.031388] schedule+0x63/0x110 > >> kernel: [ 1451.031396] wait_for_commit+0x14c/0x1b0 [btrfs] > >> kernel: [ 1451.031535] ? __pfx_autoremove_wake_function+0x10/0x10 > >> kernel: [ 1451.031548] btrfs_commit_transaction+0x16c/0xbc0 [btrfs] > >> kernel: [ 1451.031684] ? start_transaction+0xc8/0x600 [btrfs] > >> kernel: [ 1451.031819] transaction_kthread+0x14b/0x1c0 [btrfs] > >> kernel: [ 1451.031951] ? __pfx_transaction_kthread+0x10/0x10 > [btrfs] > >> kernel: [ 1451.032082] kthread+0xe9/0x110 > >> kernel: [ 1451.032091] ? __pfx_kthread+0x10/0x10 > >> kernel: [ 1451.032098] ret_from_fork+0x2c/0x50 > >> kernel: [ 1451.032108] </TASK> > >> > >> On Mon, 26 Jun 2023 at 19:48, Qu Wenruo <quwenruo.btrfs@gmx.com > <mailto:quwenruo.btrfs@gmx.com>> wrote: > >>> > >>> > >>> > >>> On 2023/6/24 23:29, Stefan N wrote: > >>>> Whoops, I had left --dry-run on the first debug patch you > commited, so > >>>> that didn't run correctly. > >>>> > >>>> I've included the output from both patches, as they result in > different output. > >>>> > >>>> Rerunning the older patch first, with loop devices (I tried both > >>>> 4x100mb and 4x1gb) I get the following: > >>>> > >>> [...] > >>>> *** The below is using the newer patch as follows: > >>>> $ diff fs/btrfs/ ../linux-6.2.0-dist/fs/btrfs/ > >>>> diff fs/btrfs/ioctl.c ../linux-6.2.0-dist/fs/btrfs/ioctl.c > >>>> 2656,2658d2655 > >>>> < else > >>>> < btrfs_err(fs_info, "failed to add disk %s: %d", > >>>> < vol_args->name, ret); > >>>> diff fs/btrfs/transaction.c > ../linux-6.2.0-dist/fs/btrfs/transaction.c > >>>> 1029d1028 > >>>> < /* > >>>> 1031d1029 > >>>> < */ > >>>> diff fs/btrfs/volumes.c ../linux-6.2.0-dist/fs/btrfs/volumes.c > >>>> 2677c2677 > >>>> < trans = btrfs_join_transaction(root); > >>>> --- > >>>>> trans = btrfs_start_transaction(root, 0); > >>>> 2680d2679 > >>>> < btrfs_err(fs_info, "failed to start trans: > %d", ret); > >>>> 2769d2767 > >>>> < btrfs_err(fs_info, "failed to add dev item: > %d", ret); > >>>> 2787,2789c2785 > >>>> < ret = btrfs_end_transaction(trans); > >>>> < if (ret < 0) > >>>> < btrfs_err(fs_info, "failed to end trans: %d", > ret); > >>>> --- > >>>>> ret = btrfs_commit_transaction(trans); > >>>> $ > >>>> > >>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; > sudo btrfs > >>>> dev add -K -f /dev/loop12 /dev/loop13 /dev/loop14 /dev/loop15 > >>>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>>> ERROR: Could not sync filesystem: No space left on device > >>> > >>> Is it the same even with 4x1GiB loopback devices? > >>> > >>>> $ > >>>> > >>>> kernel: [ 1811.846087] BTRFS info (device sdc): using crc32c > >>>> (crc32c-intel) checksum algorithm > >>>> kernel: [ 1811.846107] BTRFS info (device sdc): disk space > caching is enabled > >>>> kernel: [ 1817.852850] BTRFS info (device sdc): bdev /dev/sde > errs: wr > >>>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>>> kernel: [ 1817.852866] BTRFS info (device sdc): bdev /dev/sda > errs: wr > >>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>>> kernel: [ 1817.852877] BTRFS info (device sdc): bdev /dev/sdh > errs: wr > >>>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>>> kernel: [ 1817.852884] BTRFS info (device sdc): bdev /dev/sdd > errs: wr > >>>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>>> kernel: [ 2037.562050] BTRFS info (device sdc): balance: > resume skipped > >>>> kernel: [ 2037.562064] BTRFS info (device sdc): checking UUID tree > >>>> kernel: [ 2037.581550] BTRFS info (device sdc): disk added > /dev/loop12 > >>>> kernel: [ 2037.591163] BTRFS info (device sdc): disk added > /dev/loop13 > >>>> kernel: [ 2037.599477] BTRFS info (device sdc): disk added > /dev/loop14 > >>>> kernel: [ 2037.607064] BTRFS info (device sdc): disk added > /dev/loop15 > >>>> kernel: [ 2176.124630] INFO: task btrfs:7783 blocked for more > than 120 seconds. > >>>> kernel: [ 2176.124678] Tainted: G W O > >>>> 6.2.0-23-generic #23+btrdebug2c > >>>> kernel: [ 2176.124710] "echo 0 > > >>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >>>> kernel: [ 2176.124742] task:btrfs state:D stack:0 > >>>> pid:7783 ppid:7782 flags:0x00004002 > >>>> kernel: [ 2176.124753] Call Trace: > >>>> kernel: [ 2176.124758] <TASK> > >>>> kernel: [ 2176.124765] __schedule+0x2aa/0x610 > >>>> kernel: [ 2176.124780] schedule+0x63/0x110 > >>>> kernel: [ 2176.124788] btrfs_commit_transaction+0x9b7/0xbc0 > [btrfs] > >>> > >>> This means we're doing the real work, but it seems to take too > long. > >>> > >>> In fact this is already looking promising as we have when > through the > >>> whole device add part. > >>> > >>> Just need to let the final commit to finish. > >>> > >>>> kernel: [ 2176.124929] ? __pfx_autoremove_wake_function+0x10/0x10 > >>>> kernel: [ 2176.124941] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > >>>> kernel: [ 2176.125060] btrfs_ioctl+0x643/0x14d0 [btrfs] > >>>> kernel: [ 2176.125225] __x64_sys_ioctl+0xa0/0xe0 > >>>> kernel: [ 2176.125235] do_syscall_64+0x5b/0x90 > >>>> kernel: [ 2176.125242] ? do_sys_openat2+0xab/0x180 > >>>> kernel: [ 2176.125251] ? exit_to_user_mode_prepare+0x30/0xb0 > >>>> kernel: [ 2176.125260] ? syscall_exit_to_user_mode+0x29/0x50 > >>>> kernel: [ 2176.125268] ? do_syscall_64+0x67/0x90 > >>>> kernel: [ 2176.125275] entry_SYSCALL_64_after_hwframe+0x72/0xdc > >>>> kernel: [ 2176.125282] RIP: 0033:0x7f2e8eb119ef > >>>> kernel: [ 2176.125288] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > >>>> ORIG_RAX: 0000000000000010 > >>>> kernel: [ 2176.125295] RAX: ffffffffffffffda RBX: 0000000000000003 > >>>> RCX: 00007f2e8eb119ef > >>>> kernel: [ 2176.125300] RDX: 0000000000000000 RSI: 0000000000009408 > >>>> RDI: 0000000000000003 > >>>> kernel: [ 2176.125303] RBP: 0000000000000007 R08: 0000000000000000 > >>>> R09: 0000000000000000 > >>>> kernel: [ 2176.125306] R10: 0000000000000000 R11: 0000000000000246 > >>>> R12: 00007f2e8ebf642c > >>>> kernel: [ 2176.125310] R13: 0000000000000001 R14: 000055cdb7940578 > >>>> R15: 0000000000000000 > >>>> kernel: [ 2176.125318] </TASK> > >>>> kernel: [ 2296.956781] INFO: task btrfs:7783 blocked for more > than 241 seconds. > >>>> kernel: [ 2296.956824] Tainted: G W O > >>>> 6.2.0-23-generic #23+btrdebug2c > >>>> kernel: [ 2296.956856] "echo 0 > > >>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >>>> kernel: [ 2296.956887] task:btrfs state:D stack:0 > >>>> pid:7783 ppid:7782 flags:0x00004002 > >>>> kernel: [ 2296.956898] Call Trace: > >>>> kernel: [ 2296.956902] <TASK> > >>>> kernel: [ 2296.956908] __schedule+0x2aa/0x610 > >>>> kernel: [ 2296.956921] schedule+0x63/0x110 > >>>> kernel: [ 2296.956928] btrfs_commit_transaction+0x9b7/0xbc0 > [btrfs] > >>>> kernel: [ 2296.957069] ? __pfx_autoremove_wake_function+0x10/0x10 > >>>> kernel: [ 2296.957080] btrfs_sync_fs+0x5a/0x1b0 [btrfs] > >>>> kernel: [ 2296.957200] btrfs_ioctl+0x643/0x14d0 [btrfs] > >>>> kernel: [ 2296.957366] __x64_sys_ioctl+0xa0/0xe0 > >>>> kernel: [ 2296.957375] do_syscall_64+0x5b/0x90 > >>>> kernel: [ 2296.957383] ? do_sys_openat2+0xab/0x180 > >>>> kernel: [ 2296.957391] ? exit_to_user_mode_prepare+0x30/0xb0 > >>>> kernel: [ 2296.957399] ? syscall_exit_to_user_mode+0x29/0x50 > >>>> kernel: [ 2296.957407] ? do_syscall_64+0x67/0x90 > >>>> kernel: [ 2296.957414] entry_SYSCALL_64_after_hwframe+0x72/0xdc > >>>> kernel: [ 2296.957420] RIP: 0033:0x7f2e8eb119ef > >>>> kernel: [ 2296.957426] RSP: 002b:00007ffd632b6aa0 EFLAGS: 00000246 > >>>> ORIG_RAX: 0000000000000010 > >>>> kernel: [ 2296.957433] RAX: ffffffffffffffda RBX: 0000000000000003 > >>>> RCX: 00007f2e8eb119ef > >>>> kernel: [ 2296.957438] RDX: 0000000000000000 RSI: 0000000000009408 > >>>> RDI: 0000000000000003 > >>>> kernel: [ 2296.957441] RBP: 0000000000000007 R08: 0000000000000000 > >>>> R09: 0000000000000000 > >>>> kernel: [ 2296.957444] R10: 0000000000000000 R11: 0000000000000246 > >>>> R12: 00007f2e8ebf642c > >>>> kernel: [ 2296.957448] R13: 0000000000000001 R14: 000055cdb7940578 > >>>> R15: 0000000000000000 > >>>> kernel: [ 2296.957468] </TASK> > >>>> kernel: [ 2314.043258] ------------[ cut here ]------------ > >>>> kernel: [ 2314.043264] BTRFS: Transaction aborted (error -28) > >>>> kernel: [ 2314.043334] WARNING: CPU: 2 PID: 7739 at > >>>> fs/btrfs/extent-tree.c:2847 do_free_extent_accounting+0x21a/0x220 > >>>> [btrfs] > >>>> kernel: [ 2314.043467] Modules linked in: ipmi_devintf > ipmi_msghandler > >>>> overlay iwlwifi_compat(O) binfmt_misc nls_iso8859_1 intel_rapl_msr > >>>> snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > >>>> intel_rapl_common snd_hda_codec_hdmi edac_mce_amd snd_hda_intel > >>>> snd_intel_dspcfg kvm_amd snd_intel_sdw_acpi snd_hda_codec kvm > >>>> snd_hda_core snd_hwdep snd_pcm snd_timer irqbypass rapl > wmi_bmof snd > >>>> k10temp ccp soundcore input_leds mac_hid dm_multipath scsi_dh_rdac > >>>> scsi_dh_emc scsi_dh_alua bonding tls msr nfsd efi_pstore > auth_rpcgss > >>>> nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables > autofs4 btrfs > >>>> blake2b_generic raid10 raid456 async_raid6_recov async_memcpy > async_pq > >>>> async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 > multipath linear > >>>> amdgpu iommu_v2 drm_buddy gpu_sched drm_ttm_helper hid_generic ttm > >>>> drm_display_helper cec uas rc_core usbhid hid drm_kms_helper > >>>> crct10dif_pclmul syscopyarea usb_storage crc32_pclmul > polyval_clmulni > >>>> sysfillrect polyval_generic sysimgblt nvme ghash_clmulni_intel > >>>> sha512_ssse3 > >>>> kernel: [ 2314.043599] nvme_core aesni_intel crypto_simd > mpt3sas drm > >>>> cryptd raid_class ahci i2c_piix4 scsi_transport_sas > nvme_common igb > >>>> xhci_pci qlcnic dca xhci_pci_renesas libahci i2c_algo_bit > video wmi > >>>> kernel: [ 2314.043631] CPU: 2 PID: 7739 Comm: btrfs-transacti > Tainted: > >>>> G W O 6.2.0-23-generic #23+btrdebug2c > >>>> kernel: [ 2314.043638] Hardware name: To Be Filled By O.E.M. X570M > >>>> Pro4/X570M Pro4, BIOS P3.70 02/23/2022 > >>>> kernel: [ 2314.043641] RIP: > 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>>> kernel: [ 2314.043766] Code: ce 0f 0b eb b8 44 89 e6 48 c7 c7 > a8 39 a0 > >>>> c1 e8 2c d5 1e ce 0f 0b e9 78 ff ff ff 44 89 e6 48 c7 c7 a8 39 > a0 c1 > >>>> e8 16 d5 1e ce <0f> 0b eb b9 66 90 90 90 90 90 90 90 90 90 90 > 90 90 90 > >>>> 90 90 90 90 > >>>> kernel: [ 2314.043771] RSP: 0018:ffffad0b11b7bb38 EFLAGS: 00010246 > >>>> kernel: [ 2314.043777] RAX: 0000000000000000 RBX: ffff9c80e40e8f08 > >>>> RCX: 0000000000000000 > >>>> kernel: [ 2314.043781] RDX: 0000000000000000 RSI: 0000000000000000 > >>>> RDI: 0000000000000000 > >>>> kernel: [ 2314.043784] RBP: ffffad0b11b7bb60 R08: 0000000000000000 > >>>> R09: 0000000000000000 > >>>> kernel: [ 2314.043787] R10: 0000000000000000 R11: 0000000000000000 > >>>> R12: 00000000ffffffe4 > >>>> kernel: [ 2314.043790] R13: 00005e4c359ba000 R14: 0000000000020000 > >>>> R15: ffff9c824d9a58c0 > >>>> kernel: [ 2314.043794] FS: 0000000000000000(0000) > >>>> GS:ffff9c87a0a80000(0000) knlGS:0000000000000000 > >>>> kernel: [ 2314.043798] CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > >>>> kernel: [ 2314.043802] CR2: 00007f54adc86000 CR3: 00000001471d8000 > >>>> CR4: 00000000003506e0 > >>>> kernel: [ 2314.043806] Call Trace: > >>>> kernel: [ 2314.043809] <TASK> > >>>> kernel: [ 2314.043815] __btrfs_free_extent+0x6bc/0xf50 [btrfs] > >>>> kernel: [ 2314.043943] run_delayed_data_ref+0x8b/0x180 [btrfs] > >>>> kernel: [ 2314.044068] > btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>>> kernel: [ 2314.044192] __btrfs_run_delayed_refs+0xe6/0x1d0 > [btrfs] > >>>> kernel: [ 2314.044316] btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>>> kernel: [ 2314.044439] > btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > >>>> kernel: [ 2314.044598] btrfs_commit_transaction+0xb3/0xbc0 > [btrfs] > >>>> kernel: [ 2314.044754] ? start_transaction+0xc8/0x600 [btrfs] > >>>> kernel: [ 2314.044890] transaction_kthread+0x14b/0x1c0 [btrfs] > >>>> kernel: [ 2314.045021] ? __pfx_transaction_kthread+0x10/0x10 > [btrfs] > >>>> kernel: [ 2314.045151] kthread+0xe9/0x110 > >>>> kernel: [ 2314.045162] ? __pfx_kthread+0x10/0x10 > >>>> kernel: [ 2314.045170] ret_from_fork+0x2c/0x50 > >>>> kernel: [ 2314.045180] </TASK> > >>>> kernel: [ 2314.045182] ---[ end trace 0000000000000000 ]--- > >>>> kernel: [ 2314.045186] BTRFS info (device sdc: state A): > dumping space info: > >>>> kernel: [ 2314.045191] BTRFS info (device sdc: state A): > space_info > >>>> DATA has 160777674752 free, is not full > >>>> kernel: [ 2314.045197] BTRFS info (device sdc: state A): > space_info > >>>> total=71201958395904, used=71013439856640, pinned=27737325568, > >>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>>> kernel: [ 2314.045205] BTRFS info (device sdc: state A): > space_info > >>>> METADATA has -429047808 free, is full > >>> > >>> This means we need at least 500+ MiB metadata space. > >>> > >>> Thus you may want to try 4x1GiB to see if this makes any > difference. > >>> > >>> Thanks, > >>> Qu > >>>> kernel: [ 2314.045209] BTRFS info (device sdc: state A): > space_info > >>>> total=83634421760, used=82789777408, pinned=244891648, > >>>> reserved=599687168, may_use=429047808, readonly=65536 > zone_unusable=0 > >>>> kernel: [ 2314.045217] BTRFS info (device sdc: state A): > space_info > >>>> SYSTEM has 33390592 free, is not full > >>>> kernel: [ 2314.045221] BTRFS info (device sdc: state A): > space_info > >>>> total=38797312, used=5373952, pinned=16384, reserved=16384, > may_use=0, > >>>> readonly=0 zone_unusable=0 > >>>> kernel: [ 2314.045227] BTRFS info (device sdc: state A): > >>>> global_block_rsv: size 536870912 reserved 428523520 > >>>> kernel: [ 2314.045231] BTRFS info (device sdc: state A): > >>>> trans_block_rsv: size 524288 reserved 524288 > >>>> kernel: [ 2314.045235] BTRFS info (device sdc: state A): > >>>> chunk_block_rsv: size 0 reserved 0 > >>>> kernel: [ 2314.045239] BTRFS info (device sdc: state A): > >>>> delayed_block_rsv: size 0 reserved 0 > >>>> kernel: [ 2314.045242] BTRFS info (device sdc: state A): > >>>> delayed_refs_rsv: size 249756909568 reserved 0 > >>>> kernel: [ 2314.045251] BTRFS: error (device sdc: state A) in > >>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>> kernel: [ 2314.045265] BTRFS warning (device sdc: state A): > >>>> btrfs_uuid_scan_kthread failed -28 > >>>> kernel: [ 2314.045295] BTRFS info (device sdc: state EA): > forced readonly > >>>> kernel: [ 2314.045300] BTRFS error (device sdc: state EA): > failed to > >>>> run delayed ref for logical 103681409916928 num_bytes 131072 > type 184 > >>>> action 2 ref_mod 1: -28 > >>>> kernel: [ 2314.045360] BTRFS: error (device sdc: state EA) in > >>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>> kernel: [ 2314.049204] BTRFS: error (device sdc: state EA) in > >>>> btrfs_create_pending_block_groups:2487: errno=-28 No space left > >>>> kernel: [ 2314.049331] BTRFS: error (device sdc: state EA) in > >>>> btrfs_create_pending_block_groups:2499: errno=-28 No space left > >>>> kernel: [ 2314.053259] BTRFS: error (device sdc: state EA) in > >>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>> kernel: [ 2314.053318] BTRFS error (device sdc: state EA): > failed to > >>>> run delayed ref for logical 103681419366400 num_bytes 131072 > type 184 > >>>> action 2 ref_mod 1: -28 > >>>> kernel: [ 2314.053375] BTRFS: error (device sdc: state EA) in > >>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>> kernel: [ 2314.053430] BTRFS warning (device sdc: state EA): > Skipping > >>>> commit of aborted transaction. > >>>> kernel: [ 2314.053435] BTRFS: error (device sdc: state EA) in > >>>> cleanup_transaction:1986: errno=-28 No space left > >>>> > >>>> > >>>> > >>>> On Fri, 23 Jun 2023 at 19:16, Qu Wenruo <wqu@suse.com > <mailto:wqu@suse.com>> wrote: > >>>>> > >>>>> > >>>>> > >>>>> On 2023/6/23 17:00, Stefan N wrote: > >>>>>> Apologies, I thought I included the log output too, though I > can't see > >>>>>> any additional output > >>>>>> > >>>>>> From a fresh run, still using the same kernel > >>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; > sudo btrfs > >>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; > sudo btrfs > >>>>>> fi sync /mnt/data > >>>>>> ERROR: error adding device '/dev/sdl': Input/output error > >>>>>> ERROR: error adding device '/dev/sdm': Read-only file system > >>>>>> ERROR: error adding device '/dev/sdn': Read-only file system > >>>>>> ERROR: error adding device '/dev/sdo': Read-only file system > >>>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>>> $ > >>>>>> > >>>>>> Output from kern.log, syslog or dmesg -k > >>>>>> > >>>>> [...] > >>>>> > >>>>> None of the newly added debug lines triggered, so there is > something > >>>>> else causing the problem. > >>>>> > >>>>> And furthermore the backtrace is not that helpful, it only > shows it's > >>>>> some async metadata reclaim kthread causing the problem. > >>>>> > >>>>> Although I guess the async metadata reclaim is triggered by the > >>>>> btrfs_start_transaction() call when adding a device. > >>>>> So I updated my github branch to go btrfs_join_transaction() > which would > >>>>> not flush any metadata, thus avoid the problem. > >>>>> > >>>>> Would you please give it a try again? > >>>>> > >>>>>> > >>>>>> However, now I started digging into logs to check I hadn't > missed > >>>>>> where the errors were being logged, I've found this from > roughly a > >>>>>> week before I started having issues, which I had not previously > >>>>>> noticed > >>>>> > >>>>> You don't need to bother most error messages after the fs > flipped RO. > >>>>> As it's known to have some false alerts. > >>>>> > >>>>> Thanks, > >>>>> Qu > >>>>> > >>>>>> [ 1990.495861] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 107988943355904 num_bytes 245760 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 1990.518282] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 107989043494912 num_bytes 245760 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 620.104065] BTRFS error (device sdk): failed to run > delayed ref for > >>>>>> logical 123187655077888 num_bytes 176128 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 620.126209] BTRFS error (device sdk): failed to run > delayed ref for > >>>>>> logical 123190279929856 num_bytes 134217728 type 184 action > 2 ref_mod > >>>>>> 1: -28 > >>>>>> [ 620.126241] BTRFS error (device sdk): failed to run > delayed ref for > >>>>>> logical 123189970468864 num_bytes 134217728 type 184 action > 2 ref_mod > >>>>>> 1: -28 > >>>>>> [ 620.126271] BTRFS error (device sdk): failed to run > delayed ref for > >>>>>> logical 123190414409728 num_bytes 134217728 type 184 action > 2 ref_mod > >>>>>> 1: -28 > >>>>>> [ 476.565308] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101906434228224 num_bytes 651264 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 476.565932] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101906434031616 num_bytes 180224 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 447.371754] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101946151927808 num_bytes 262144 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 447.372362] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101946083725312 num_bytes 245760 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 439.839007] BTRFS error (device sdj): failed to run > delayed ref for > >>>>>> logical 101923102179328 num_bytes 192512 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 439.839578] BTRFS error (device sdj): failed to run > delayed ref for > >>>>>> logical 101923401629696 num_bytes 245760 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 466.393884] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101981116137472 num_bytes 245760 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 466.394451] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101981122854912 num_bytes 1720320 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 431.541367] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101876426952704 num_bytes 126976 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 431.542010] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101876427780096 num_bytes 126976 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 597.487948] BTRFS error (device sdj): failed to run > delayed ref for > >>>>>> logical 108127459409920 num_bytes 196608 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 597.488539] BTRFS error (device sdj): failed to run > delayed ref for > >>>>>> logical 108124677865472 num_bytes 126976 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 534.717509] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101958618710016 num_bytes 1597440 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 534.718494] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 101958756335616 num_bytes 368640 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 508.089394] BTRFS error (device sdk): failed to run > delayed ref for > >>>>>> logical 101911627694080 num_bytes 126976 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 508.090007] BTRFS error (device sdk): failed to run > delayed ref for > >>>>>> logical 101911627415552 num_bytes 126976 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 1632.112084] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 102203759886336 num_bytes 229376 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> [ 1632.112885] BTRFS error (device sdh): failed to run > delayed ref for > >>>>>> logical 102203764379648 num_bytes 126976 type 184 action 2 > ref_mod 1: > >>>>>> -28 > >>>>>> > >>>>>> and today, when leaving the disks mounted read-only for a > while, I > >>>>>> found many occurances similar to: > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201329754554368 mirror 1 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201329754554368 mirror 2 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201329754554368 mirror 3 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201329754554368 mirror 4 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201329754554368 mirror 1 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201329754554368 mirror 2 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201329754554368 mirror 3 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201350830227456 mirror 4 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201350830227456 mirror 1 wanted 2 found 0 > >>>>>> BTRFS error (device sdc: state EA): level verify failed on > logical > >>>>>> 201350830227456 mirror 2 wanted 2 found 0 > >>>>>> > >>>>>> On Fri, 23 Jun 2023 at 10:27, Qu Wenruo > <quwenruo.btrfs@gmx.com <mailto:quwenruo.btrfs@gmx.com>> wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 2023/6/23 06:18, Stefan N wrote: > >>>>>>>> Hi Qu, > >>>>>>>> > >>>>>>>> I got one new line this time, but it doesn't seem to match > your commit > >>>>>>>> ERROR: zoned: unable to stat /dev/loop/13 > >>>>>>> > >>>>>>> Please provide the dmesg of that attempt, as all the extra > debug info is > >>>>>>> inside dmesg. > >>>>>>> > >>>>>>> With that info provided, we can determine what to do next. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Qu > >>>>>>> > >>>>>>>> > >>>>>>>> I tried it on the USB flash drives too and didn't get any > extra line > >>>>>>>> > >>>>>>>> In context > >>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; > sudo btrfs > >>>>>>>> dev add -K -f /dev/loop12 /dev/loop/13 /dev/loop14 /dev/loop15 > >>>>>>>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>>>>>>> ERROR: error adding device '/dev/loop12': Input/output error > >>>>>>>> ERROR: zoned: unable to stat /dev/loop/13 > >>>>>>>> ERROR: checking status of /dev/loop/13: No such file or > directory > >>>>>>>> ERROR: error adding device '/dev/loop14': Read-only file > system > >>>>>>>> ERROR: error adding device '/dev/loop15': Read-only file > system > >>>>>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>>>>> $ > >>>>>>>> > >>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data ; > sudo btrfs > >>>>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data ; > sudo btrfs > >>>>>>>> fi sync /mnt/data > >>>>>>>> ERROR: error adding device '/dev/sdl': Input/output error > >>>>>>>> ERROR: error adding device '/dev/sdm': Read-only file system > >>>>>>>> ERROR: error adding device '/dev/sdn': Read-only file system > >>>>>>>> ERROR: error adding device '/dev/sdo': Read-only file system > >>>>>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>>>>> $ > >>>>>>>> > >>>>>>>> On Thu, 22 Jun 2023 at 18:48, Qu Wenruo > <quwenruo.btrfs@gmx.com <mailto:quwenruo.btrfs@gmx.com>> wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 2023/6/22 16:33, Stefan N wrote: > >>>>>>>>>> Hi Qu, > >>>>>>>>>> > >>>>>>>>>> Many thanks for the detailed instructions and your > patience. I got it > >>>>>>>>>> working combined with > >>>>>>>>>> https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel > <https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel> on the main system > >>>>>>>>>> OS instead, tagged +btrfix > >>>>>>>>>> $ uname -vr > >>>>>>>>>> 6.2.0-23-generic #23+btrfix SMP PREEMPT_DYNAMIC Thu Jun 22 > >>>>>>>>>> > >>>>>>>>>> However, I've not had luck with the commands suggested, > and would > >>>>>>>>>> appreciate any further ideas. > >>>>>>>>>> > >>>>>>>>>> Outputs follow below, with /mnt/data as the btrfs mount > point that > >>>>>>>>>> currently contains 8x disks sd[a-j] with an additional > 4x 64gb USB > >>>>>>>>>> flash drives being added sd[l-o] > >>>>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data > ; sudo btrfs > >>>>>>>>>> dev add -f /dev/sdl /dev/sdm /dev/sdn /dev/sdo /mnt/data > ; sudo btrfs > >>>>>>>>>> fi sync /mnt/data > >>>>>>>>>> ERROR: error adding device '/dev/sdl': Input/output error > >>>>>>>>>> ERROR: error adding device '/dev/sdm': Read-only file system > >>>>>>>>>> ERROR: error adding device '/dev/sdn': Read-only file system > >>>>>>>>>> ERROR: error adding device '/dev/sdo': Read-only file system > >>>>>>>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>>>>>>> $ > >>>>>>>>>> > >>>>>>>>>> The same occurs if I try to add 4x 100mb loop devices > (on a ssd so > >>>>>>>>>> they're super quick to zero); > >>>>>>>>>> $ sudo mount -o skip_balance -t btrfs /dev/sde /mnt/data > ; sudo btrfs > >>>>>>>>>> dev add -K -f /dev/loop16 /dev/loop17 /dev/loop18 > /dev/loop19 > >>>>>>>>>> /mnt/data ; sudo btrfs fi sync /mnt/data > >>>>>>>>>> ERROR: error adding device '/dev/loop16': Input/output error > >>>>>>>>> > >>>>>>>>> This is the interesting part, this means we're erroring > out due to -EIO > >>>>>>>>> (not -ENOSPC) during the first device add. > >>>>>>>>> > >>>>>>>>> And by somehow, after the first device add, we already > got the trans abort. > >>>>>>>>> > >>>>>>>>> Would you please try the following branch? > >>>>>>>>> > >>>>>>>>> > https://github.com/adam900710/linux/tree/dev_add_no_commit > <https://github.com/adam900710/linux/tree/dev_add_no_commit> > >>>>>>>>> > >>>>>>>>> It has not only the patch to skip the commit, but also > extra debug > >>>>>>>>> output for the situation. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Qu > >>>>>>>>> > >>>>>>>>>> ERROR: error adding device '/dev/loop17': Read-only file > system > >>>>>>>>>> ERROR: error adding device '/dev/loop18': Read-only file > system > >>>>>>>>>> ERROR: error adding device '/dev/loop19': Read-only file > system > >>>>>>>>>> ERROR: Could not sync filesystem: Read-only file system > >>>>>>>>>> $ > >>>>>>>>>> > >>>>>>>>>> I confirmed before both these kernel builds that the > replaced line was > >>>>>>>>>> btrfs_end_transaction rather than > btrfs_commit_transaction (anyone > >>>>>>>>>> else following, I needed to remove the -n in the patch > command > >>>>>>>>>> earlier) > >>>>>>>>>> $ grep -A3 -ri btrfs_sysfs_update_sprout > */fs/btrfs/volumes.c* > >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c: > >>>>>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- } > >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- > >>>>>>>>>> linux-6.2.0-dist/fs/btrfs/volumes.c- ret = > btrfs_commit_transaction(trans); > >>>>>>>>>> -- > >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c: > >>>>>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- } > >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- > >>>>>>>>>> linux-6.2.0-v2/fs/btrfs/volumes.c- ret = > btrfs_end_transaction(trans); > >>>>>>>>>> -- > >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c: > >>>>>>>>>> btrfs_sysfs_update_sprout_fsid(fs_devices); > >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- } > >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- > >>>>>>>>>> linux-6.2.0-v3/fs/btrfs/volumes.c- ret = > btrfs_end_transaction(trans); > >>>>>>>>>> $ > >>>>>>>>>> > >>>>>>>>>> $ btrfs fi usage /mnt/data > >>>>>>>>>> Overall: > >>>>>>>>>> Device size: 87.31TiB > >>>>>>>>>> Device allocated: 87.31TiB > >>>>>>>>>> Device unallocated: 1.94GiB > >>>>>>>>>> Device missing: 0.00B > >>>>>>>>>> Device slack: 0.00B > >>>>>>>>>> Used: 87.08TiB > >>>>>>>>>> Free (estimated): 173.29GiB > (min: 172.33GiB) > >>>>>>>>>> Free (statfs, df): 171.84GiB > >>>>>>>>>> Data ratio: 1.34 > >>>>>>>>>> Metadata ratio: 4.00 > >>>>>>>>>> Global reserve: 512.00MiB > (used: 371.25MiB) > >>>>>>>>>> Multiple profiles: no > >>>>>>>>>> > >>>>>>>>>> Data,RAID6: Size:64.76TiB, Used:64.59TiB (99.74%) > >>>>>>>>>> /dev/sdc 10.90TiB > >>>>>>>>>> /dev/sdf 10.90TiB > >>>>>>>>>> /dev/sda 10.86TiB > >>>>>>>>>> /dev/sdg 10.87TiB > >>>>>>>>>> /dev/sdh 10.86TiB > >>>>>>>>>> /dev/sdd 10.87TiB > >>>>>>>>>> /dev/sde 10.88TiB > >>>>>>>>>> /dev/sdb 10.88TiB > >>>>>>>>>> > >>>>>>>>>> Metadata,RAID1C4: Size:77.79GiB, Used:77.11GiB (99.12%) > >>>>>>>>>> /dev/sdc 15.33GiB > >>>>>>>>>> /dev/sdf 18.41GiB > >>>>>>>>>> /dev/sda 49.63GiB > >>>>>>>>>> /dev/sdg 49.50GiB > >>>>>>>>>> /dev/sdh 51.52GiB > >>>>>>>>>> /dev/sdd 48.70GiB > >>>>>>>>>> /dev/sde 39.09GiB > >>>>>>>>>> /dev/sdb 39.01GiB > >>>>>>>>>> > >>>>>>>>>> System,RAID1C4: Size:37.00MiB, Used:5.11MiB (13.81%) > >>>>>>>>>> /dev/sdc 1.00MiB > >>>>>>>>>> /dev/sda 37.00MiB > >>>>>>>>>> /dev/sdg 37.00MiB > >>>>>>>>>> /dev/sdh 36.00MiB > >>>>>>>>>> /dev/sdd 37.00MiB > >>>>>>>>>> > >>>>>>>>>> Unallocated: > >>>>>>>>>> /dev/sdc 1.00MiB > >>>>>>>>>> /dev/sdf 1.00MiB > >>>>>>>>>> /dev/sda 1.27GiB > >>>>>>>>>> /dev/sdg 1.00MiB > >>>>>>>>>> /dev/sdh 1.00MiB > >>>>>>>>>> /dev/sdd 687.00MiB > >>>>>>>>>> /dev/sde 1.00MiB > >>>>>>>>>> /dev/sdb 1.00MiB > >>>>>>>>>> $ > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> This first attempt generated the following syslog output: > >>>>>>>>>> kernel: [ 868.435387] BTRFS info (device sde): using crc32c > >>>>>>>>>> (crc32c-intel) checksum algorithm > >>>>>>>>>> kernel: [ 868.435407] BTRFS info (device sde): disk > space caching is enabled > >>>>>>>>>> kernel: [ 874.477712] BTRFS info (device sde): bdev > /dev/sdg errs: wr > >>>>>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>>>>>>>>> kernel: [ 874.477727] BTRFS info (device sde): bdev > /dev/sdc errs: wr > >>>>>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>>>>>>>>> kernel: [ 874.477735] BTRFS info (device sde): bdev > /dev/sdj errs: wr > >>>>>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>>>>>>>>> kernel: [ 874.477740] BTRFS info (device sde): bdev > /dev/sdf errs: wr > >>>>>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>>>>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: > resume skipped > >>>>>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking > UUID tree > >>>>>>>>>> kernel: [ 1082.645551] BTRFS info (device sde): balance: > resume skipped > >>>>>>>>>> kernel: [ 1082.645564] BTRFS info (device sde): checking > UUID tree > >>>>>>>>>> kernel: [ 1267.280506] BTRFS: Transaction aborted (error > -28) > >>>>>>>>>> kernel: [ 1267.280553] BTRFS: error (device sde: state A) in > >>>>>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>>>>>>> kernel: [ 1267.280604] BTRFS info (device sde: state > EA): forced readonly > >>>>>>>>>> kernel: [ 1267.280610] BTRFS error (device sde: state > EA): failed to > >>>>>>>>>> run delayed ref for logical 102255404044288 num_bytes > 294912 type 184 > >>>>>>>>>> action 2 ref_mod 1: -28 > >>>>>>>>>> kernel: [ 1267.280584] WARNING: CPU: 3 PID: 14519 at > >>>>>>>>>> fs/btrfs/extent-tree.c:2847 > do_free_extent_accounting+0x21a/0x220 > >>>>>>>>>> [btrfs] > >>>>>>>>>> kernel: [ 1267.280666] BTRFS: error (device sde: state > EA) in > >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>>>>> kernel: [ 1267.280695] BTRFS warning (device sde: state EA): > >>>>>>>>>> btrfs_uuid_scan_kthread failed -5 > >>>>>>>>>> kernel: [ 1267.280794] Modules linked in: xt_nat > xt_tcpudp veth > >>>>>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat > nf_conntrack_netlink > >>>>>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user > xfrm_algo > >>>>>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter > bridge stp llc > >>>>>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) > binfmt_misc > >>>>>>>>>> nls_iso8859_1 intel_rapl_msr intel_rapl_common edac_mce_amd > >>>>>>>>>> snd_hda_codec_realtek kvm_amd snd_hda_codec_generic > ledtrig_audio kvm > >>>>>>>>>> snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg > snd_intel_sdw_acpi > >>>>>>>>>> snd_hda_codec irqbypass snd_hda_core snd_hwdep rapl > snd_pcm snd_timer > >>>>>>>>>> wmi_bmof k10temp snd ccp soundcore input_leds mac_hid > dm_multipath > >>>>>>>>>> scsi_dh_rdac scsi_dh_emc scsi_dh_alua bonding tls > efi_pstore msr nfsd > >>>>>>>>>> auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs > ip_tables x_tables > >>>>>>>>>> autofs4 btrfs blake2b_generic raid10 raid456 > async_raid6_recov > >>>>>>>>>> async_memcpy async_pq async_xor async_txxor raid6_pq > libcrc32c raid1 > >>>>>>>>>> raid0 multipath linear hid_generic usbhid hid amdgpu uas > usb_storage > >>>>>>>>>> kernel: [ 1267.280994] CPU: 3 PID: 14519 Comm: > btrfs-transacti > >>>>>>>>>> Tainted: G W O 6.2.0-23-generic #23+btrfix > >>>>>>>>>> kernel: [ 1267.281005] RIP: > 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>>>>>>>>> kernel: [ 1267.281181] __btrfs_free_extent+0x6bc/0xf50 > [btrfs] > >>>>>>>>>> kernel: [ 1267.281310] run_delayed_data_ref+0x8b/0x180 > [btrfs] > >>>>>>>>>> kernel: [ 1267.281444] > btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>>>>>>>>> kernel: [ 1267.281570] > __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>>>>>>>>> kernel: [ 1267.281694] > btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>>>>>>>>> kernel: [ 1267.281818] > btrfs_start_dirty_block_groups+0x36b/0x530 [btrfs] > >>>>>>>>>> kernel: [ 1267.281976] > btrfs_commit_transaction+0xb3/0xbc0 [btrfs] > >>>>>>>>>> kernel: [ 1267.282110] ? start_transaction+0xc8/0x600 > [btrfs] > >>>>>>>>>> kernel: [ 1267.282244] transaction_kthread+0x14b/0x1c0 > [btrfs] > >>>>>>>>>> kernel: [ 1267.282375] ? > __pfx_transaction_kthread+0x10/0x10 [btrfs] > >>>>>>>>>> kernel: [ 1267.282548] BTRFS info (device sde: state > EA): dumping space info: > >>>>>>>>>> kernel: [ 1267.282552] BTRFS info (device sde: state > EA): space_info > >>>>>>>>>> DATA has 160777674752 free, is not full > >>>>>>>>>> kernel: [ 1267.282558] BTRFS info (device sde: state > EA): space_info > >>>>>>>>>> total=71201958395904, used=71018191273984, > pinned=22985908224, > >>>>>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>>>>>>>>> kernel: [ 1267.282566] BTRFS info (device sde: state > EA): space_info > >>>>>>>>>> METADATA has -124944384 free, is full > >>>>>>>>>> kernel: [ 1267.282571] BTRFS info (device sde: state > EA): space_info > >>>>>>>>>> total=83530612736, used=82791497728, pinned=242745344, > >>>>>>>>>> reserved=496369664, may_use=124944384, readonly=0 > zone_unusable=0 > >>>>>>>>>> kernel: [ 1267.282577] BTRFS info (device sde: state > EA): space_info > >>>>>>>>>> SYSTEM has 33439744 free, is not full > >>>>>>>>>> kernel: [ 1267.282582] BTRFS info (device sde: state > EA): space_info > >>>>>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, > may_use=0, > >>>>>>>>>> readonly=0 zone_unusable=0 > >>>>>>>>>> kernel: [ 1267.282588] BTRFS info (device sde: state EA): > >>>>>>>>>> global_block_rsv: size 536870912 reserved 124944384 > >>>>>>>>>> kernel: [ 1267.282592] BTRFS info (device sde: state EA): > >>>>>>>>>> trans_block_rsv: size 0 reserved 0 > >>>>>>>>>> kernel: [ 1267.282595] BTRFS info (device sde: state EA): > >>>>>>>>>> chunk_block_rsv: size 0 reserved 0 > >>>>>>>>>> kernel: [ 1267.282599] BTRFS info (device sde: state EA): > >>>>>>>>>> delayed_block_rsv: size 0 reserved 0 > >>>>>>>>>> kernel: [ 1267.282602] BTRFS info (device sde: state EA): > >>>>>>>>>> delayed_refs_rsv: size 251322957824 reserved 0 > >>>>>>>>>> kernel: [ 1267.282608] BTRFS: error (device sde: state > EA) in > >>>>>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>>>>>>> kernel: [ 1267.282653] BTRFS error (device sde: state > EA): failed to > >>>>>>>>>> run delayed ref for logical 102255401897984 num_bytes > 126976 type 184 > >>>>>>>>>> action 2 ref_mod 1: -28 > >>>>>>>>>> kernel: [ 1267.282708] BTRFS: error (device sde: state > EA) in > >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>>>>> > >>>>>>>>>> A couple of kernel recompiles later, the second attempt > on the SSD > >>>>>>>>>> generated similar: > >>>>>>>>>> kernel: [ 1472.203470] BTRFS info (device sdc): using crc32c > >>>>>>>>>> (crc32c-intel) checksum algorithm > >>>>>>>>>> kernel: [ 1472.203491] BTRFS info (device sdc): disk > space caching is enabled > >>>>>>>>>> kernel: [ 1478.155004] BTRFS info (device sdc): bdev > /dev/sdf errs: wr > >>>>>>>>>> 0, rd 0, flush 0, corrupt 845, gen 0 > >>>>>>>>>> kernel: [ 1478.155022] BTRFS info (device sdc): bdev > /dev/sda errs: wr > >>>>>>>>>> 41089, rd 1556, flush 0, corrupt 0, gen 0 > >>>>>>>>>> kernel: [ 1478.155034] BTRFS info (device sdc): bdev > /dev/sdh errs: wr > >>>>>>>>>> 3, rd 7, flush 0, corrupt 0, gen 0 > >>>>>>>>>> kernel: [ 1478.155041] BTRFS info (device sdc): bdev > /dev/sdd errs: wr > >>>>>>>>>> 41, rd 0, flush 0, corrupt 0, gen 0 > >>>>>>>>>> kernel: [ 1696.662526] BTRFS info (device sdc): balance: > resume skipped > >>>>>>>>>> kernel: [ 1696.662537] BTRFS info (device sdc): checking > UUID tree > >>>>>>>>>> kernel: [ 1919.452464] BTRFS: Transaction aborted (error > -28) > >>>>>>>>>> kernel: [ 1919.452534] WARNING: CPU: 1 PID: 161 at > >>>>>>>>>> fs/btrfs/extent-tree.c:2847 > do_free_extent_accounting+0x21a/0x220 > >>>>>>>>>> [btrfs] > >>>>>>>>>> kernel: [ 1919.452655] Modules linked in: xt_nat > xt_tcpudp veth > >>>>>>>>>> xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat > nf_conntrack_netlink > >>>>>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user > xfrm_algo > >>>>>>>>>> xt_addrtype nft_compat nf_tables nfnetlink br_netfilter > bridge stp llc > >>>>>>>>>> ipmi_devintf ipmi_msghandler overlay iwlwifi_compat(O) > binfmt_misc > >>>>>>>>>> nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic > >>>>>>>>>> ledtrig_audio snd_hda_codec_hdmi snd_hda_intel > snd_intel_dspcfg > >>>>>>>>>> snd_intel_sdw_acpi snd_hda_codec intel_rapl_msr snd_hda_core > >>>>>>>>>> intel_rapl_common edac_mce_amd snd_hwdep kvm_amd snd_pcm > snd_timer kvm > >>>>>>>>>> irqbypass rapl wmi_bmof snd k10temp soundcore ccp > input_leds mac_hid > >>>>>>>>>> dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua > bonding tls nfsd > >>>>>>>>>> msr auth_rpcgss efi_pstore nfs_acl lockd grace sunrpc > dmi_sysfs > >>>>>>>>>> ip_tables x_tables autofs4 btrfs blake2b_generic raid10 > raid456 > >>>>>>>>>> async_raid6_recov async_memcpy async_pq async_xor > async_tx xor > >>>>>>>>>> raid6_pq libcrc32c raid1 raid0 multipath linear > hid_generic usbhid > >>>>>>>>>> amdgpu uas hid iommu_v2 > >>>>>>>>>> kernel: [ 1919.452839] Workqueue: events_unbound > >>>>>>>>>> btrfs_async_reclaim_metadata_space [btrfs] > >>>>>>>>>> kernel: [ 1919.452985] RIP: > 0010:do_free_extent_accounting+0x21a/0x220 [btrfs] > >>>>>>>>>> kernel: [ 1919.453141] __btrfs_free_extent+0x6bc/0xf50 > [btrfs] > >>>>>>>>>> kernel: [ 1919.453256] run_delayed_data_ref+0x8b/0x180 > [btrfs] > >>>>>>>>>> kernel: [ 1919.453368] > btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs] > >>>>>>>>>> kernel: [ 1919.453480] > __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs] > >>>>>>>>>> kernel: [ 1919.453592] > btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs] > >>>>>>>>>> kernel: [ 1919.453703] flush_space+0x23c/0x2c0 [btrfs] > >>>>>>>>>> kernel: [ 1919.453845] > btrfs_async_reclaim_metadata_space+0x19b/0x2b0 [btrfs] > >>>>>>>>>> kernel: [ 1919.454034] BTRFS info (device sdc: state A): > dumping space info: > >>>>>>>>>> kernel: [ 1919.454038] BTRFS info (device sdc: state A): > space_info > >>>>>>>>>> DATA has 160778723328 free, is not full > >>>>>>>>>> kernel: [ 1919.454043] BTRFS info (device sdc: state A): > space_info > >>>>>>>>>> total=71201958395904, used=71017442181120, > pinned=23733952512, > >>>>>>>>>> reserved=0, may_use=0, readonly=3538944 zone_unusable=0 > >>>>>>>>>> kernel: [ 1919.454050] BTRFS info (device sdc: state A): > space_info > >>>>>>>>>> METADATA has -147570688 free, is full > >>>>>>>>>> kernel: [ 1919.454054] BTRFS info (device sdc: state A): > space_info > >>>>>>>>>> total=83530612736, used=82792185856, pinned=238059520, > >>>>>>>>>> reserved=500367360, may_use=147570688, readonly=0 > zone_unusable=0 > >>>>>>>>>> kernel: [ 1919.454060] BTRFS info (device sdc: state A): > space_info > >>>>>>>>>> SYSTEM has 33439744 free, is not full > >>>>>>>>>> kernel: [ 1919.454064] BTRFS info (device sdc: state A): > space_info > >>>>>>>>>> total=38797312, used=5357568, pinned=0, reserved=0, > may_use=0, > >>>>>>>>>> readonly=0 zone_unusable=0 > >>>>>>>>>> kernel: [ 1919.454070] BTRFS info (device sdc: state A): > >>>>>>>>>> global_block_rsv: size 536870912 reserved 147570688 > >>>>>>>>>> kernel: [ 1919.454074] BTRFS info (device sdc: state A): > >>>>>>>>>> trans_block_rsv: size 0 reserved 0 > >>>>>>>>>> kernel: [ 1919.454077] BTRFS info (device sdc: state A): > >>>>>>>>>> chunk_block_rsv: size 0 reserved 0 > >>>>>>>>>> kernel: [ 1919.454080] BTRFS info (device sdc: state A): > >>>>>>>>>> delayed_block_rsv: size 0 reserved 0 > >>>>>>>>>> kernel: [ 1919.454083] BTRFS info (device sdc: state A): > >>>>>>>>>> delayed_refs_rsv: size 254292787200 reserved 0 > >>>>>>>>>> kernel: [ 1919.454086] BTRFS: error (device sdc: state A) in > >>>>>>>>>> do_free_extent_accounting:2847: errno=-28 No space left > >>>>>>>>>> kernel: [ 1919.454123] BTRFS info (device sdc: state > EA): forced readonly > >>>>>>>>>> kernel: [ 1919.454127] BTRFS error (device sdc: state > EA): failed to > >>>>>>>>>> run delayed ref for logical 102538713931776 num_bytes > 245760 type 184 > >>>>>>>>>> action 2 ref_mod 1: -28 > >>>>>>>>>> kernel: [ 1919.454176] BTRFS: error (device sdc: state > EA) in > >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>>>>> kernel: [ 1919.454249] BTRFS warning (device sdc: state EA): > >>>>>>>>>> btrfs_uuid_scan_kthread failed -5 > >>>>>>>>>> kernel: [ 1919.472381] BTRFS: error (device sdc: state > EA) in > >>>>>>>>>> __btrfs_free_extent:3077: errno=-28 No space left > >>>>>>>>>> kernel: [ 1919.472417] BTRFS error (device sdc: state > EA): failed to > >>>>>>>>>> run delayed ref for logical 102538732191744 num_bytes > 245760 type 184 > >>>>>>>>>> action 2 ref_mod 1: -28 > >>>>>>>>>> kernel: [ 1919.472442] BTRFS: error (device sdc: state > EA) in > >>>>>>>>>> btrfs_run_delayed_refs:2151: errno=-28 No space left > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Sat, 17 Jun 2023 at 15:00, Qu Wenruo <wqu@suse.com > <mailto:wqu@suse.com>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 2023/6/17 13:11, Stefan N wrote: > >>>>>>>>>>>> Hi Qu, > >>>>>>>>>>>> > >>>>>>>>>>>> I believe I've got this environment ready, with the > 6.2.0 kernel as > >>>>>>>>>>>> before using the Ubuntu kernel, but can switch to > vanilla if required. > >>>>>>>>>>>> > >>>>>>>>>>>> I've not done anything kernel modifications for a > solid decade, so > >>>>>>>>>>>> would be keen for a bit of guidance. > >>>>>>>>>>> > >>>>>>>>>>> Sure no problem. > >>>>>>>>>>> > >>>>>>>>>>> Please fetch the kernel source tar ball (6.2.x) first, > decompress, then > >>>>>>>>>>> apply the attached one-line patch by: > >>>>>>>>>>> > >>>>>>>>>>> $ tar czf linux*.tar.xz > >>>>>>>>>>> $ cd linux* > >>>>>>>>>>> $ patch -np1 -i <the patch file> > >>>>>>>>>>> > >>>>>>>>>>> Then use your running system kernel config if possible: > >>>>>>>>>>> > >>>>>>>>>>> $ cp /proc/config.gz . > >>>>>>>>>>> $ gunzip config.gz > >>>>>>>>>>> $ mv config .config > >>>>>>>>>>> $ make olddefconfig > >>>>>>>>>>> > >>>>>>>>>>> Then you can start your kernel compiling, and > considering you're using > >>>>>>>>>>> your distro's default, it would include tons of > drivers, thus would be > >>>>>>>>>>> very slow. (Replace the number to something more > suitable to your > >>>>>>>>>>> system, using all CPU cores can be very hot) > >>>>>>>>>>> > >>>>>>>>>>> $ make -j12 > >>>>>>>>>>> > >>>>>>>>>>> Finally you need to install the modules/kernel. > >>>>>>>>>>> > >>>>>>>>>>> Unfortunately this is distro specific, but if you're > using Ubuntu, it > >>>>>>>>>>> may be much easier: > >>>>>>>>>>> > >>>>>>>>>>> $ make bindeb-pkg > >>>>>>>>>>> > >>>>>>>>>>> Then install the generated dpkg I guess? I have never > tried kernel > >>>>>>>>>>> building using deb/rpm, but only manual installation, > which is also > >>>>>>>>>>> distro dependent in the initramfs generation part. > >>>>>>>>>>> > >>>>>>>>>>> # cp arch/x86/boot/bzImage /boot/vmlinuz-custom > >>>>>>>>>>> # make modules_install > >>>>>>>>>>> # mkinitcpio -k /boot/vmlinuz-custom -g > /boot/initramfs-custom.img > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> The last step is to update your bootloader to add the > new kernel, which > >>>>>>>>>>> is not only distro dependent but also bootloader dependent. > >>>>>>>>>>> > >>>>>>>>>>> In my case, I go with systemd-boot with manually > crafted entries. > >>>>>>>>>>> But if you go Ubuntu I believe just installing the > kernel dpkg would > >>>>>>>>>>> have everything handled? > >>>>>>>>>>> > >>>>>>>>>>> Finally you can try reboot into the newer kernel, and > try device add > >>>>>>>>>>> (need to add 4 disks), then sync and see if things work > as expected. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Qu > >>>>>>>>>>>> > >>>>>>>>>>>> I will recover a 1tb SSD and partition it into 4 in a > USB enclosure, > >>>>>>>>>>>> but failing this will use 4x loop devices. > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, 13 Jun 2023 at 11:28, Qu Wenruo > <quwenruo.btrfs@gmx.com <mailto:quwenruo.btrfs@gmx.com>> wrote: > >>>>>>>>>>>>> In your particular case, since you're running RAID1C4 > you need to add 4 > >>>>>>>>>>>>> devices in one transaction. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I can easily craft a patch to avoid commit > transaction, but still you'll > >>>>>>>>>>>>> need to add at least 4 disks, and then sync to see if > things would work. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Furthermore this means you need a liveCD with full > kernel compiling > >>>>>>>>>>>>> environment. > >>>>>>>>>>>>> > >>>>>>>>>>>>> If you want to go this path, I can send you the patch > when you've > >>>>>>>>>>>>> prepared the needed environment. > ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2023-07-23 7:23 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-12 4:47 Out of space loop: skip_balance not working Stefan N
2023-06-12 5:20 ` Qu Wenruo
2023-06-12 10:31 ` Stefan N
2023-06-12 10:46 ` Qu Wenruo
2023-06-12 13:02 ` Stefan N
2023-06-13 1:29 ` Paul Jones
2023-06-13 1:54 ` Stefan N
2023-06-13 1:58 ` Qu Wenruo
2023-06-17 5:11 ` Stefan N
2023-06-17 5:30 ` Qu Wenruo
2023-06-22 8:33 ` Stefan N
2023-06-22 9:18 ` Qu Wenruo
2023-06-22 22:18 ` Stefan N
2023-06-23 0:57 ` Qu Wenruo
2023-06-23 9:00 ` Stefan N
2023-06-23 9:46 ` Qu Wenruo
2023-06-24 15:29 ` Stefan N
2023-06-26 10:18 ` Qu Wenruo
2023-06-26 12:58 ` Stefan N
2023-07-22 5:28 ` Stefan N
2023-07-22 10:08 ` Qu Wenruo
[not found] ` <CA+W5K0oDRo2LZMiUiysYXpcpmfXTvS27hPdjm1pzq4kfq9=vdQ@mail.gmail.com>
2023-07-23 7:23 ` Qu Wenruo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox