From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Stefan N <stefannnau@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Out of space loop: skip_balance not working
Date: Mon, 12 Jun 2023 18:46:46 +0800
Message-ID: <40ecba88-9de2-7315-4ac5-e3eb892aac39@gmx.com>
In-Reply-To: <CA+W5K0ow+95pWnzam8N6=c5Ff61ZeHyv7_yDK0LG6ujU48=yBA@mail.gmail.com>

On 2023/6/12 18:31, Stefan N wrote:
> Hi Qu,
>
> Thanks for the quick helpful response, though perhaps it may not be
> sufficient in my case.
>
> I've tried using the latest ubuntu livecd which has btrfs-progs v6.2
> on kernel 6.20.0-20
I guess you mean 6.2?
In the v6.2 kernel, Josef introduced a new flush mechanism called
FLUSH_EMERGENCY, which tries its best to squeeze out any remaining
metadata space.
If even that doesn't work, I'm running out of ideas.
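Since the version string above is ambiguous (6.20 vs 6.2), one quick way to check whether a running kernel is at least upstream 6.2 is to version-sort `uname -r` against 6.2. This is a sketch under assumptions: `kernel_at_least` is a hypothetical helper, it relies on GNU `sort -V`, and it only compares upstream version numbers, so a distro kernel with backported patches may behave differently than its number suggests.

```sh
#!/bin/sh
# kernel_at_least RELEASE MINIMUM  (hypothetical helper)
# Succeeds if RELEASE (e.g. the output of `uname -r`) sorts at or after
# MINIMUM in version order. Assumes GNU `sort -V` is available.
kernel_at_least() {
  release=$1 minimum=$2
  # Keep only the leading dotted version: "6.2.0" from "6.2.0-20-generic".
  release=${release%%-*}
  # Whichever string sorts first is the lower version.
  lowest=$(printf '%s\n%s\n' "$minimum" "$release" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}

if kernel_at_least "$(uname -r)" 6.2; then
  echo "upstream version is 6.2 or newer"
else
  echo "upstream version is older than 6.2"
fi
```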
>
> Unfortunately I haven't been able to get any further as even when
> doing a rm, truncate, btrfs fi sync or btrfs dev add immediately after
> mounting it still results in i/o error or read only. I tried removing
> a small file or two or directories with no difference.
>
[...]
> __btrfs_free_extent+0x6bc/0xf50 [btrfs]
> run_delayed_data_ref+0x8b/0x180 [btrfs]
> btrfs_run_delayed_refs_for_head+0x196/0x520 [btrfs]
> __btrfs_run_delayed_refs+0xe6/0x1d0 [btrfs]
> btrfs_run_delayed_refs+0x6d/0x1f0 [btrfs]
> flush_space+0x23c/0x2c0 [btrfs]
> btrfs_async_reclaim_metadata_space+0x1d4/0x300 [btrfs]
> process_one_work+0x225/0x430
> worker_thread+0x50/0x3e0
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xe9/0x110
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x2c/0x50
> </TASK>
> ---[ end trace 0000000000000000 ]---
> BTRFS info (device sdi: state A): dumping space info:
> BTRFS info (device sdi: state A): space_info DATA has 160778199040 free, is not full
> BTRFS info (device sdi: state A): space_info total=71201958395904, used=71018527428608, pinned=22649229312, reserved=0, may_use=0, readonly=3538944 zone_unusable=0
> BTRFS info (device sdi: state A): space_info METADATA has -130809856 free, is full
That negative number comes from the global reserve (global RSV); it's
nothing to worry about.
> BTRFS info (device sdi: state A): space_info total=83530612736, used=82789154816, pinned=245710848, reserved=495747072, may_use=130809856, readonly=0 zone_unusable=0
The bigger concern here is that pinned/reserved/may_use add up to
hundreds of MiBs, which looks too large.
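The "has N free" figure in the dump can be recomputed from the counters printed on the following line. The sketch below assumes free is simply total minus used, pinned, reserved, may_use, readonly and zone_unusable; that is a reading of the dump, not a quote of the kernel code, though it reproduces all three free values quoted above. `space_info_free` is a hypothetical helper.

```sh
#!/bin/sh
# space_info_free TOTAL USED PINNED RESERVED MAY_USE READONLY ZONE_UNUSABLE
# Hypothetical helper: recomputes the "has N free" value of a space_info
# dump, assuming free = total minus every other counter.
space_info_free() {
  echo $(( $1 - $2 - $3 - $4 - $5 - $6 - $7 ))
}

# METADATA counters copied from the dump above; prints -130809856
space_info_free 83530612736 82789154816 245710848 495747072 130809856 0 0
```

For METADATA the result is exactly -130809856, the same value as may_use and as the global_block_rsv reserved size, which fits the global RSV explanation. pinned + reserved + may_use on that line add up to 872267776 bytes, roughly 832 MiB.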
I suspect either the free space tree or the extent tree updates are
taking too much space.
Have you tried cancelling the balance and then syncing? Does that fail too?
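The cancel-then-sync sequence asked about here, following the steps in the earlier message quoted below, can be written as a dry-run script. This is a sketch, not a tested recovery procedure: `run` and `try_small_delete` are hypothetical helpers, and the device, mount point and file name are placeholders. By default it only prints the commands; set DRY_RUN=0 to execute them. On a filesystem this full, any step may still hit ENOSPC and force it read-only.

```sh
#!/bin/sh
# Hypothetical helper: print the command (default) or run it if DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# try_small_delete DEVICE MOUNTPOINT SMALL_FILE  (all placeholders)
try_small_delete() {
  dev=$1 mnt=$2 victim=$3
  # skip_balance only pauses the interrupted balance; it is still registered.
  run mount -o skip_balance -t btrfs "$dev" "$mnt" || return 1
  # Exit status 1 here just means a balance is still present, so ignore it.
  run btrfs balance status "$mnt" || true
  # Cancel the paused balance so other exclusive operations can start.
  run btrfs balance cancel "$mnt" || return 1
  # Free a little space, then force a transaction commit.
  run rm -f -- "$victim" || return 1
  run btrfs filesystem sync "$mnt" || return 1
  # Show mount options; "ro" here means the fs was forced read-only again.
  run findmnt -no OPTIONS "$mnt"
}
```

For example, `DRY_RUN=0 try_small_delete /dev/sde /mnt/point /mnt/point/some-small-file`, then check dmesg before moving on to larger files.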
Since you still have quite a lot of data space left, you may want to
migrate to space cache v1.
Unlike the v2 cache, the v1 cache only takes data space, so switching
may free up some precious metadata space.
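One possible way to do the migration is to remove the v2 free space tree offline with `btrfs check --clear-space-cache v2` and then mount with `space_cache=v1`. This is a sketch under assumptions: `run` and `switch_to_space_cache_v1` are hypothetical helpers, `/dev/sde` and `/mnt/point` stand in for the real device and mount point, and clearing the tree has to commit a transaction, so on a filesystem this full it could itself fail with ENOSPC. Read btrfs-check(8) before running anything in write mode.

```sh
#!/bin/sh
# Hypothetical helper: print the command (default) or run it if DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# switch_to_space_cache_v1 DEVICE MOUNTPOINT  (both placeholders)
switch_to_space_cache_v1() {
  dev=$1 mnt=$2
  # btrfs check only works on an unmounted filesystem.
  run umount "$mnt" || return 1
  # Remove the v2 cache (free space tree), which lives in metadata.
  run btrfs check --clear-space-cache v2 "$dev" || return 1
  # Mount with the v1 cache, whose contents are stored in data space.
  # skip_balance is kept in case an interrupted balance is still recorded.
  run mount -o skip_balance,space_cache=v1 -t btrfs "$dev" "$mnt"
}
```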
Thanks,
Qu
> BTRFS info (device sdi: state A): space_info SYSTEM has 33439744 free, is not full
> BTRFS info (device sdi: state A): space_info total=38797312, used=5357568, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
> BTRFS info (device sdi: state A): global_block_rsv: size 536870912 reserved 130809856
> BTRFS info (device sdi: state A): trans_block_rsv: size 0 reserved 0
> BTRFS info (device sdi: state A): chunk_block_rsv: size 0 reserved 0
> BTRFS info (device sdi: state A): delayed_block_rsv: size 0 reserved 0
> BTRFS info (device sdi: state A): delayed_refs_rsv: size 220645556224 reserved 0
> BTRFS: error (device sdi: state A) in do_free_extent_accounting:2847: errno=-28 No space left
> BTRFS info (device sdi: state EA): forced readonly
>
> On Mon, 12 Jun 2023 at 14:50, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/6/12 12:47, Stefan N wrote:
>>> Hi,
>>>
>>> I'm having trouble trying to break my array out of an out of space loop.
>>>
>>> On reboot I'm able to mount the filesystem and read files fine but as
>>> soon as I try to delete/write it hangs until the mount is made read
>>> only when it then fails.
>>>
>>> The following command (immediately after boot, no fstab) suggests
>>> perhaps the skip_balance is not working as expected:
>>> $ mount -o skip_balance -t btrfs /dev/sde /mnt/point && btrfs device
>>> add /dev/loop12 /mnt/point/
>>> ERROR: unable to start device add, another exclusive operation
>>> 'balance' in progress
>>
>> skip_balance only puts the balance into the paused state.
>> You still need to cancel it first.
>>
>>> and ps shows a [btrfs-balance] process.
>>
>> Furthermore, balance won't help in your case.
>>
>> Both metadata and data are almost full.
>>
>>>
>>> If I perform a rm or truncate during this window it fails to perform
>>> any action before being marked read only. The same applies if I
>>> attempt to cancel the balance.
>>>
>>> How can I get out of this cycle? I've previously run out of space and
>>> been able to recover by deleting a few files etc without needing to
>>> invoke skip_balance, but that was likely on older versions.
>>>
>>> Any help would be greatly appreciated.
>>>
>>> - Stefan
>>>
>>> $ uname -a
>>> Linux my.host 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC
>>> 2023 x86_64 x86_64 x86_64 GNU/Linux
>>> $ btrfs --version
>>> btrfs-progs v5.16.2
>>> $ btrfs fi show
>>> Label: none uuid: ---
>>> Total devices 8 FS bytes used 64.67TiB
>>> devid 1 size 10.91TiB used 10.91TiB path /dev/sdk
>>> devid 2 size 10.91TiB used 10.91TiB path /dev/sdh
>>> devid 3 size 10.91TiB used 10.91TiB path /dev/sdj
>>> devid 4 size 10.91TiB used 10.91TiB path /dev/sdi
>>> devid 5 size 10.91TiB used 10.91TiB path /dev/sdf
>>> devid 6 size 10.91TiB used 10.91TiB path /dev/sdg
>>> devid 7 size 10.91TiB used 10.91TiB path /dev/sdd
>>> devid 8 size 10.91TiB used 10.91TiB path /dev/sde
>>> $ btrfs fi df /mnt/point/
>>> Data, RAID6: total=64.76TiB, used=64.59TiB
>>> System, RAID1C4: total=37.00MiB, used=5.11MiB
>>> Metadata, RAID1C4: total=77.79GiB, used=77.10GiB
>>> GlobalReserve, single: total=512.00MiB, used=387.11MiB
>>> $
>>>
>>
>> My recommendation is to try a newer kernel (easiest with a
>> rolling-release distro's live CD).
>>
>> Still mount with skip_balance, cancel the balance, delete a small file
>> first, then sync, and check whether the fs is still fine.
>>
>> Then move on to larger and larger files/subvolumes.
>>
>> Thanks,
>> Qu
>>
>>> BTRFS: Transaction aborted (error -28)
>>> BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left
>>> BTRFS info (device sdk): forced readonly
>>> BTRFS error (device sdk): failed to run delayed ref for logical
>>> 101911627694080 num_bytes 126976 type 184 action 2 ref_mod 1: -28
>>> WARNING: CPU: 2 PID: 7851 at fs/btrfs/extent-tree.c:3180
>>> __btrfs_free_extent+0x7e4/0x950 [btrfs]
>>> BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No
>>> space left
>>> BTRFS warning (device sdk): btrfs_uuid_scan_kthread failed -28
>>> Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat
>>> xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6
>>> nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat
>>> nf_tables nfnetlink br_netfilter bridge stp llc ipmi_devintf
>>> ipmi_msghandler overlay binfmt_misc intel_rapl_msr intel_rapl_common
>>> edac_mce_amd snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio
>>> snd_hda_codec_hdmi kvm_amd nls_iso8859_1 kvm snd_hda_intel rapl
>>> snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core
>>> wmi_bmof input_leds snd_hwdep snd_pcm k10temp snd_timer snd ccp
>>> soundcore mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc
>>> scsi_dh_alua bonding tls ramoops pstore_blk msr reed_solomon
>>> pstore_zone efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc
>>> ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10
>>> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
>>> raid6_pq libcrc32c raid1 raid0 multipath linear
>>> hid_generic usbhid hid uas usb_storage amdgpu iommu_v2 gpu_sched
>>> drm_ttm_helper crct10dif_pclmul ttm drm_kms_helper syscopyarea
>>> sysfillrect sysimgblt fb_sys_fops crc32_pclmul cec ghash_clmulni_intel
>>> aesni_intel mpt3sas rc_core raid_class crypto_simd drm nvme i2c_piix4
>>> cryptd scsi_transport_sas igb dca ahci libahci xhci_pci qlcnic
>>> i2c_algo_bit nvme_core xhci_pci_renesas wmi video
>>> CPU: 2 PID: 7851 Comm: btrfs-transacti Not tainted 5.15.0-73-generic #80-Ubuntu
>>> Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS
>>> P3.70 02/23/2022
>>> RIP: 0010:__btrfs_free_extent+0x7e4/0x950 [btrfs]
>>> Code: a0 48 05 50 0a 00 00 f0 48 0f ba 28 03 72 1d 8b 45 84 83 f8 fb
>>> 74 32 83 f8 e2 74 2d 89 c6 48 c7 c7 98 f6 34 c1 e8 ed 42 a9 e6 <0f> 0b
>>> 8b 4d 84 48 8b 7d 90 ba 6c 0c 00 00 48 c7 c6 60 39 34 c1 e8
>>> RSP: 0018:ffffb63581c9fb68 EFLAGS: 00010286
>>> RAX: 0000000000000000 RBX: 00000000000000d1 RCX: 0000000000000027
>>> RDX: ffff8ceda0aa0588 RSI: 0000000000000001 RDI: ffff8ceda0aa0580
>>> RBP: ffffb63581c9fc10 R08: 0000000000000003 R09: fffffffffffe2710
>>> R10: 000000002938322d R11: 00000000322d2072 R12: 00005cb02659c000
>>> R13: 00000000000014ce R14: ffff8ce8ab3fb7e0 R15: ffff8ce8de433800
>>> FS: 0000000000000000(0000) GS:ffff8ceda0a80000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 000055f2f46bb4c8 CR3: 000000010814c000 CR4: 00000000003506e0
>>> Call Trace:
>>> <TASK>
>>> run_delayed_data_ref+0x93/0x160 [btrfs]
>>> btrfs_run_delayed_refs_for_head+0x193/0x520 [btrfs]
>>> __btrfs_run_delayed_refs+0x8c/0x1d0 [btrfs]
>>> btrfs_run_delayed_refs+0x73/0x200 [btrfs]
>>> btrfs_start_dirty_block_groups+0x296/0x4f0 [btrfs]
>>> btrfs_commit_transaction+0x716/0xaa0 [btrfs]
>>> ? start_transaction+0xd1/0x5b0 [btrfs]
>>> ? __bpf_trace_hrtimer_init+0x20/0x20
>>> transaction_kthread+0x13c/0x1b0 [btrfs]
>>> ? btrfs_cleanup_transaction.isra.0+0x3c0/0x3c0 [btrfs]
>>> kthread+0x12a/0x150
>>> ? set_kthread_struct+0x50/0x50
>>> ret_from_fork+0x22/0x30
>>> </TASK>
>>> ---[ end trace 8a20922ac453f776 ]---
>>> BTRFS: error (device sdk) in __btrfs_free_extent:3180: errno=-28 No space left
>>> BTRFS error (device sdk): failed to run delayed ref for logical
>>> 101911627415552 num_bytes 126976 type 184 action 2 ref_mod 1: -28
>>> BTRFS: error (device sdk) in btrfs_run_delayed_refs:2152: errno=-28 No
>>> space left
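To put numbers on "both metadata and data are almost full", the `btrfs fi df` output quoted above can be turned into percentages. `df_usage` is a hypothetical helper, not a btrfs-progs command; it assumes the `Type, Profile: total=N<unit>, used=N<unit>` line format shown above and only understands the KiB/MiB/GiB/TiB suffixes.

```sh
#!/bin/sh
# df_usage: hypothetical helper; reads `btrfs filesystem df` output on
# stdin and prints how full each block group type is.
df_usage() {
  awk '
    # Convert a value such as "64.76TiB" into bytes.
    function to_bytes(s,    n, unit) {
      n = s;    sub(/[A-Za-z]+$/, "", n)
      unit = s; sub(/^[0-9.]+/, "", unit)
      if (unit == "KiB") return n * 1024
      if (unit == "MiB") return n * 1024 ^ 2
      if (unit == "GiB") return n * 1024 ^ 3
      if (unit == "TiB") return n * 1024 ^ 4
      return n + 0              # plain bytes or an unknown suffix
    }
    /total=/ && /used=/ {
      name = $0; sub(/:.*/, "", name)                 # e.g. "Data, RAID6"
      t = $0; sub(/.*total=/, "", t); sub(/,.*/, "", t)
      u = $0; sub(/.*used=/, "", u);  sub(/[ ,].*/, "", u)
      if (to_bytes(t) > 0)
        printf "%s: %.2f%% used\n", name, 100 * to_bytes(u) / to_bytes(t)
    }'
}
```

Usage: `btrfs fi df /mnt/point | df_usage`. With the numbers above, Data comes out at about 99.74% used and Metadata at about 99.11%.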
Thread overview: 22+ messages
2023-06-12 4:47 Out of space loop: skip_balance not working Stefan N
2023-06-12 5:20 ` Qu Wenruo
2023-06-12 10:31 ` Stefan N
2023-06-12 10:46 ` Qu Wenruo [this message]
2023-06-12 13:02 ` Stefan N
2023-06-13 1:29 ` Paul Jones
2023-06-13 1:54 ` Stefan N
2023-06-13 1:58 ` Qu Wenruo
2023-06-17 5:11 ` Stefan N
2023-06-17 5:30 ` Qu Wenruo
2023-06-22 8:33 ` Stefan N
2023-06-22 9:18 ` Qu Wenruo
2023-06-22 22:18 ` Stefan N
2023-06-23 0:57 ` Qu Wenruo
2023-06-23 9:00 ` Stefan N
2023-06-23 9:46 ` Qu Wenruo
2023-06-24 15:29 ` Stefan N
2023-06-26 10:18 ` Qu Wenruo
2023-06-26 12:58 ` Stefan N
2023-07-22 5:28 ` Stefan N
2023-07-22 10:08 ` Qu Wenruo
[not found] ` <CA+W5K0oDRo2LZMiUiysYXpcpmfXTvS27hPdjm1pzq4kfq9=vdQ@mail.gmail.com>
2023-07-23 7:23 ` Qu Wenruo