From: Jorge Bastos <jorge.mrbastos@gmail.com>
To: Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Btrfs with zoned devices
Date: Wed, 25 Feb 2026 10:11:53 +0000
Message-ID: <CAHzMYBRCgV9A-iF0WgyaAiHYC_noTszOh7w6J4bSzB1v5dpC7w@mail.gmail.com>
In-Reply-To: <CAHzMYBTUrh7XGkj3VD5rOew+8s_9D0U8bskureT2jZAx4wLWfQ@mail.gmail.com>
One final update: I changed the commit mount option to commit=360, and
it finished the initial data load without any more crashes.
I cannot be certain this helped, but it crashed 3 times during the
first 5T loaded, and after the change it loaded the remaining 8T
without crashing, so it may have.
In any case, the data will now be mostly static, with small updates
once in a while, so I hope it will no longer crash.
Cheers,
Jorge
On Tue, Feb 24, 2026 at 9:03 AM Jorge Bastos <jorge.mrbastos@gmail.com> wrote:
>
> Roger that, sorry I can't provide the debug info. I'm afraid it
> crashed again, so the mount options didn't help, but there are some
> new errors before the crash this time. Not sure if they help; there
> are hundreds of lines like these:
>
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2483 folio=288358400 submit_bitmap=0
> start=288358400 len=524288: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2481 start=739246080 len=524288
> cur_offset=739246080 cur_alloc_size=0: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2481 folio=739246080 submit_bitmap=0
> start=739246080 len=524288: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2480 start=747634688 len=524288
> cur_offset=747634688 cur_alloc_size=0: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2480 folio=747634688 submit_bitmap=0
> start=747634688 len=524288: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2477 start=862453760 len=524288
> cur_offset=862453760 cur_alloc_size=0: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2477 folio=862453760 submit_bitmap=0
> start=862453760 len=524288: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2472 start=1995964416 len=524288
> cur_offset=1995964416 cur_alloc_size=0: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2472 folio=1995964416 submit_bitmap=0
> start=1995964416 len=524288: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2479 start=726138880 len=524288
> cur_offset=726138880 cur_alloc_size=0: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2479 folio=726138880 submit_bitmap=0
> start=726138880 len=524288: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2473 start=1681391616 len=524288
> cur_offset=1681391616 cur_alloc_size=0: -28
> Feb 24 08:32:26 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2473 folio=1681391616 submit_bitmap=0
> start=1681391616 len=524288: -28
> Feb 24 08:32:27 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2471 start=2265972736 len=524288
> cur_offset=2265972736 cur_alloc_size=0: -28
> Feb 24 08:32:27 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2471 folio=2265972736 submit_bitmap=0
> start=2265972736 len=524288: -28
> Feb 24 08:32:27 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2475 start=1724907520 len=524288
> cur_offset=1724907520 cur_alloc_size=0: -28
> Feb 24 08:32:27 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2475 folio=1724907520 submit_bitmap=0
> start=1724907520 len=524288: -28
> Feb 24 08:32:27 Tower7 kernel: BTRFS error (device sdc):
> cow_file_range failed, root=5 inode=2474 start=1701314560 len=524288
> cur_offset=1701314560 cur_alloc_size=0: -28
> Feb 24 08:32:27 Tower7 kernel: BTRFS error (device sdc): failed to run
> delalloc range, root=5 ino=2474 folio=1701314560 submit_bitmap=0
> start=1701314560 len=524288:
>
> Then it crashed:
>
> Feb 24 08:32:30 Tower7 kernel: BTRFS: error (device sdc) in
> btrfs_commit_transaction:2536: errno=-11 unknown (Error while writing
> out transaction)
> Feb 24 08:32:30 Tower7 kernel: BTRFS info (device sdc state E): forced readonly
> Feb 24 08:32:30 Tower7 kernel: BTRFS warning (device sdc state E):
> Skipping commit of aborted transaction.
> Feb 24 08:32:30 Tower7 kernel: ------------[ cut here ]------------
> Feb 24 08:32:30 Tower7 kernel: BTRFS: Transaction aborted (error -11)
> Feb 24 08:32:30 Tower7 kernel: WARNING: CPU: 9 PID: 21919 at
> fs/btrfs/transaction.c:2021 btrfs_commit_transaction+0x994/0xb20
> Feb 24 08:32:30 Tower7 kernel: Modules linked in: br_netfilter
> nft_compat nf_conntrack_netlink xt_nat af_packet iptable_raw veth
> xt_conntrack bridge stp llc xfrm_user xfrm_algo xt_set ip_set
> xt_addrtype md_mod xt_MASQUERADE xt_tcpudp xt_mark tun nf_tables
> nfnetlink ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 ipmi_devintf ip6table_filter ip6_tables iptable_filter
> ip_tables x_tables macvtap macvlan tap mlx5_core mlxfw tls igb
> intel_rapl_msr amd64_edac edac_mce_amd edac_core intel_rapl_common
> kvm_amd ast drm_shmem_helper drm_client_lib drm_kms_helper ipmi_ssif
> kvm ghash_clmulni_intel aesni_intel drm rapl acpi_cpufreq i2c_algo_bit
> backlight input_leds joydev led_class ccp i2c_piix4 i2c_smbus ses
> acpi_ipmi enclosure k10temp i2c_core ipmi_si button zfs(PO) spl(O)
> [last unloaded: mlxfw]
> Feb 24 08:32:30 Tower7 kernel: CPU: 9 UID: 0 PID: 21919 Comm:
> btrfs-transacti Tainted: P W O 6.18.9-Unraid #4
> PREEMPT(voluntary)
> Feb 24 08:32:30 Tower7 kernel: Tainted: [P]=PROPRIETARY_MODULE,
> [W]=WARN, [O]=OOT_MODULE
> Feb 24 08:32:30 Tower7 kernel: Hardware name: Supermicro Super
> Server/H11SSL-i, BIOS 2.4 12/27/2021
> Feb 24 08:32:30 Tower7 kernel: RIP: 0010:btrfs_commit_transaction+0x994/0xb20
> Feb 24 08:32:30 Tower7 kernel: Code: ba ff 49 8b 7c 24 60 89 da 48 c7
> c6 2a 81 57 82 e8 81 14 a9 ff e8 2c ef ba ff eb 10 89 de 48 c7 c7 4b
> 81 57 82 e8 6c d5 b1 ff <0f> 0b 41 b0 01 41 83 e0 01 89 d9 ba e5 07 00
> 00 4c 89 e7 48 c7 c6
> Feb 24 08:32:30 Tower7 kernel: RSP: 0018:ffffc9002dde7de0 EFLAGS: 00010282
> Feb 24 08:32:30 Tower7 kernel: RAX: 0000000000000000 RBX:
> 00000000fffffff5 RCX: 0000000000000002
> Feb 24 08:32:30 Tower7 kernel: RDX: 0000000000000027 RSI:
> ffffffff825f9e70 RDI: 00000000ffffffff
> Feb 24 08:32:30 Tower7 kernel: RBP: ffff8881d3244000 R08:
> 0000000000000000 R09: 0000000000000000
> Feb 24 08:32:30 Tower7 kernel: R10: 0000000000000019 R11:
> 00000000312d2072 R12: ffff88826fe59888
> Feb 24 08:32:30 Tower7 kernel: R13: ffff888302e73000 R14:
> ffff8881d3244000 R15: ffff88828da56300
> Feb 24 08:32:30 Tower7 kernel: FS: 0000000000000000(0000)
> GS:ffff88a0499bc000(0000) knlGS:0000000000000000
> Feb 24 08:32:30 Tower7 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Feb 24 08:32:30 Tower7 kernel: CR2: 00001455a3905d88 CR3:
> 000000026dba6000 CR4: 0000000000350ef0
> Feb 24 08:32:30 Tower7 kernel: Call Trace:
> Feb 24 08:32:30 Tower7 kernel: <TASK>
> Feb 24 08:32:30 Tower7 kernel: ? srso_return_thunk+0x5/0x5f
> Feb 24 08:32:30 Tower7 kernel: ? start_transaction+0x46e/0x5e0
> Feb 24 08:32:30 Tower7 kernel: ? hrtimer_nanosleep_restart+0x50/0x60
> Feb 24 08:32:30 Tower7 kernel: transaction_kthread+0xf0/0x170
> Feb 24 08:32:30 Tower7 kernel: ? __pfx_transaction_kthread+0x10/0x10
> Feb 24 08:32:30 Tower7 kernel: kthread+0x1ce/0x1e0
> Feb 24 08:32:30 Tower7 kernel: ? finish_task_switch.isra.0+0x13c/0x210
> Feb 24 08:32:30 Tower7 kernel: ? finish_task_switch.isra.0+0x139/0x210
> Feb 24 08:32:30 Tower7 kernel: ? __pfx_kthread+0x10/0x10
> Feb 24 08:32:30 Tower7 kernel: ? __pfx_kthread+0x10/0x10
> Feb 24 08:32:30 Tower7 kernel: ret_from_fork+0x24/0x130
> Feb 24 08:32:30 Tower7 kernel: ? __pfx_kthread+0x10/0x10
> Feb 24 08:32:30 Tower7 kernel: ret_from_fork_asm+0x1a/0x30
> Feb 24 08:32:30 Tower7 kernel: </TASK>
> Feb 24 08:32:30 Tower7 kernel: ---[ end trace 0000000000000000 ]---
> Feb 24 08:32:30 Tower7 kernel: BTRFS: error (device sdc state EA) in
> cleanup_transaction:2021: errno=-11 unknown
>
> On Tue, Feb 24, 2026 at 7:02 AM Naohiro Aota <Naohiro.Aota@wdc.com> wrote:
> >
> > On Tue Feb 24, 2026 at 1:03 AM JST, Jorge Bastos wrote:
> > > Thanks for the reply. I'm afraid I'm using a prebuilt kernel, and
> > > building my own with debug info is beyond my knowledge.
> > >
> > > Gemini believes the issue may be caused by too many open zones. I've
> > > monitored blkzone report output, and there are typically >100 zones in
> > > the 'Implicitly Open' or 'Explicitly Open' state, and I've seen it go
> > > over 120 before.
> > >
> > > I checked the queue limits for the device and
> > > /sys/block/sdb/queue/max_active_zones reports 0
> > > but
> > > /sys/block/sdb/queue/max_open_zones reports 128
> > >
> > > Could it be that btrfs is hitting the max_open_zones limit and
> > > receiving EAGAIN (-11) during btrfs_commit_transaction, possibly
> > > because it isn't self-limiting based on the max_open_zones value when
> > > max_active_zones is 0?
> >
> > Not really. Btrfs now limits by max_open_zones, as below in zoned.c:
> >
> > max_active_zones = min_not_zero(bdev_max_active_zones(bdev),
> > bdev_max_open_zones(bdev));
> >
> > And EAGAIN should basically be coming from
> > btrfs_check_meta_write_pointer() when there is a hole in a writing
> > region. Usually, at the transaction commit phase, all metadata in the
> > writing region should be allocated and ready for sequential writing. So
> > something is going wrong here: either mis-ordering or mis-skipping a
> > block.
> >
> > >
> > > Thanks,
> > > Jorge
> > >
> > > On Mon, Feb 23, 2026 at 3:15 PM Johannes Thumshirn
> > > <Johannes.Thumshirn@wdc.com> wrote:
> > >>
> > >> On 2/23/26 3:59 PM, Jorge Bastos wrote:
> > >> > Hi,
> > >> >
> > >> > I'm using a zoned device for the first time; it's a 27TB WD Ultrastar
> > >> > HC680, formatted with single data and DUP metadata.
> > >> >
> > >> > This will be used for non-critical WORM media data, but during the
> > >> > initial data load, using a single rsync thread, the filesystem crashed
> > >> > twice, 1st time after copying around 1.25T, 2nd time around 2.5T
> > >> > total.
> > >> >
> > >> > I'm now using some mount options suggested by LLMs, and it hasn't
> > >> > crashed so far, but it's not been long; currently at 3.58T used.
> > >> >
> > >> > mount -o rw,noatime,commit=60,flushoncommit,discard=async
> > >>
> > >> discard=async doesn't make a lot of sense on zoned and it will be ignored.
> > >>
> > >>
> > >> > My question is, are these mount options good for HM-SMR or do you
> > >> > recommend different ones, and could they help with the crashing?
> > >> >
> > >> >
> > >> > These were the crashes I saw, they look similar to me, and after
> > >> > unmounting and remounting, it worked again:
> > >>
> > >>
> > >> Yes, these errors are transient (luckily).
> > >>
> > >>
> > >> > Kernel 6.18.9
> > >> > btrfs-progs v6.17.1
> > >> >
> > >> > 1st one:
> > >> >
> > >> > Feb 22 21:35:56 Tower7 kernel: BTRFS: error (device sdb) in
> > >> > btrfs_commit_transaction:2536: errno=-11 unknown (Error while writing
> > >> > out transaction)
> > >> > Feb 22 21:35:56 Tower7 kernel: BTRFS info (device sdb state E): forced readonly
> > >> > Feb 22 21:35:56 Tower7 kernel: BTRFS warning (device sdb state E):
> > >> > Skipping commit of aborted transaction.
> > >> > Feb 22 21:35:56 Tower7 kernel: ------------[ cut here ]------------
> > >> > Feb 22 21:35:56 Tower7 kernel: BTRFS: Transaction aborted (error -11)
> > >> > Feb 22 21:35:56 Tower7 kernel: WARNING: CPU: 8 PID: 109946 at
> > >> > fs/btrfs/transaction.c:2021 btrfs_commit_transaction+0x994/0xb20
> > >> > Feb 22 21:35:56 Tower7 kernel: Modules linked in: md_mod br_netfilter
> > >> > nft_compat af_packet veth nf_conntrack_netlink xt_nat iptable_raw
> > >> > xt_conntrack bridge stp llc xfrm_user xfrm_algo xt_set ip_set
> > >> > xt_addrtype xt_MASQUERADE xt_tcpudp xt_mark tun nf_tables nfnetlink
> > >> > ip6table_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6
> > >> > nf_defrag_ipv4 ipmi_devintf ip6table_filter ip6_tables iptable_filter
> > >> > ip_tables x_tables macvtap macvlan tap mlx5_core mlxfw tls igb
> > >> > intel_rapl_msr amd64_edac edac_mce_amd edac_core intel_rapl_common
> > >> > kvm_amd ast kvm drm_shmem_helper drm_client_lib drm_kms_helper
> > >> > ipmi_ssif ghash_clmulni_intel aesni_intel drm rapl acpi_cpufreq
> > >> > backlight i2c_algo_bit input_leds joydev led_class ccp i2c_piix4
> > >> > i2c_smbus acpi_ipmi ses enclosure i2c_core k10temp ipmi_si button
> > >> > zfs(PO) spl(O) [last unloaded: md_mod]
> > >> > Feb 22 21:35:56 Tower7 kernel: CPU: 8 UID: 0 PID: 109946 Comm:
> > >> > btrfs-transacti Tainted: P W O 6.18.9-Unraid #4
> > >> > PREEMPT(voluntary)
> > >> > Feb 22 21:35:56 Tower7 kernel: Tainted: [P]=PROPRIETARY_MODULE,
> > >> > [W]=WARN, [O]=OOT_MODULE
> > >> > Feb 22 21:35:56 Tower7 kernel: Hardware name: Supermicro Super
> > >> > Server/H11SSL-i, BIOS 2.4 12/27/2021
> > >> > Feb 22 21:35:56 Tower7 kernel: RIP: 0010:btrfs_commit_transaction+0x994/0xb20
> > >> > Feb 22 21:35:56 Tower7 kernel: Code: ba ff 49 8b 7c 24 60 89 da 48 c7
> > >> > c6 2a 81 57 82 e8 81 14 a9 ff e8 2c ef ba ff eb 10 89 de 48 c7 c7 4b
> > >> > 81 57 82 e8 6c d5 b1 ff <0f> 0b 41 b0 01 41 83 e0 01 89 d9 ba e5 07 00
> > >> > 00 4c 89 e7 48 c7 c6
> > >> > Feb 22 21:35:56 Tower7 kernel: RSP: 0018:ffffc9003cac7de0 EFLAGS: 00010282
> > >> > Feb 22 21:35:56 Tower7 kernel: RAX: 0000000000000000 RBX:
> > >> > 00000000fffffff5 RCX: 0000000000000002
> > >> > Feb 22 21:35:56 Tower7 kernel: RDX: 0000000000000027 RSI:
> > >> > ffffffff825f9e70 RDI: 00000000ffffffff
> > >> > Feb 22 21:35:56 Tower7 kernel: RBP: ffff88826a27d000 R08:
> > >> > 0000000000000000 R09: 0000000000000000
> > >> > Feb 22 21:35:56 Tower7 kernel: R10: 0000000000000000 R11:
> > >> > 00000000312d2072 R12: ffff888290a1b7e0
> > >> > Feb 22 21:35:56 Tower7 kernel: R13: ffff888249304c00 R14:
> > >> > ffff88826a27d000 R15: ffff888100ec6300
> > >> > Feb 22 21:35:56 Tower7 kernel: FS: 0000000000000000(0000)
> > >> > GS:ffff88a04997c000(0000) knlGS:0000000000000000
> > >> > Feb 22 21:35:56 Tower7 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> > Feb 22 21:35:56 Tower7 kernel: CR2: 00007ffcad620af8 CR3:
> > >> > 00000001f5915000 CR4: 0000000000350ef0
> > >> > Feb 22 21:35:56 Tower7 kernel: Call Trace:
> > >> > Feb 22 21:35:56 Tower7 kernel: <TASK>
> > >> > Feb 22 21:35:56 Tower7 kernel: ? srso_return_thunk+0x5/0x5f
> > >> > Feb 22 21:35:56 Tower7 kernel: ? start_transaction+0x46e/0x5e0
> > >> > Feb 22 21:35:56 Tower7 kernel: ? hrtimer_nanosleep_restart+0x50/0x60
> > >> > Feb 22 21:35:56 Tower7 kernel: transaction_kthread+0xf0/0x170
> > >> > Feb 22 21:35:56 Tower7 kernel: ? __pfx_transaction_kthread+0x10/0x10
> > >> > Feb 22 21:35:56 Tower7 kernel: kthread+0x1ce/0x1e0
> > >> > Feb 22 21:35:56 Tower7 kernel: ? srso_return_thunk+0x5/0x5f
> > >> > Feb 22 21:35:56 Tower7 kernel: ? srso_return_thunk+0x5/0x5f
> > >> > Feb 22 21:35:56 Tower7 kernel: ? finish_task_switch.isra.0+0x139/0x210
> > >> > Feb 22 21:35:56 Tower7 kernel: ? __pfx_kthread+0x10/0x10
> > >> > Feb 22 21:35:56 Tower7 kernel: ? __pfx_kthread+0x10/0x10
> > >> > Feb 22 21:35:56 Tower7 kernel: ret_from_fork+0x24/0x130
> > >> > Feb 22 21:35:56 Tower7 kernel: ? __pfx_kthread+0x10/0x10
> > >> > Feb 22 21:35:56 Tower7 kernel: ret_from_fork_asm+0x1a/0x30
> > >> > Feb 22 21:35:56 Tower7 kernel: </TASK>
> > >> > Feb 22 21:35:56 Tower7 kernel: ---[ end trace 0000000000000000 ]---
> > >> > Feb 22 21:35:56 Tower7 kernel: BTRFS: error (device sdb state EA) in
> > >> > cleanup_transaction:2021: errno=-11 unknown
> > >>
> > >> The FS is trying to commit a transaction and something down the path is
> > >> returning EAGAIN. It would be interesting to know where it comes from.
> > >>
> > >> Do you have the debug info for this kernel, so we can find out where it
> > >> breaks?
> > >>