From: Marc MERLIN <marc@merlins.org>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>,
Josef Bacik <josef@toxicpanda.com>, QuWenruo <wqu@suse.com>,
Qu Wenruo <quwenruo.btrfs@gmx.com>,
Filipe Manana <fdmanana@kernel.org>,
Chris Murphy <lists@colorremedies.com>,
Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
Roman Mamedov <rm@romanrm.net>, Su Yue <Damenly_Su@gmx.com>,
Su Yue <suy.fnst@cn.fujitsu.com>
Subject: Re: Simple quota unsafe? RIP: 0010:__btrfs_free_extent.isra.0+0xc41/0x1020 [btrfs] / do_free_extent_accounting:2999: errno=-2 No such entry
Date: Mon, 13 Apr 2026 12:40:53 -0700
Message-ID: <ad1GxXgWNBfgrRtN@merlins.org>
In-Reply-To: <20260413184731.GA3448810@zen.localdomain>
On Mon, Apr 13, 2026 at 11:47:31AM -0700, Boris Burkov wrote:
> I am currently a little confused about your full story, so please help
> me make sure I understand. I would like to fix any squotas problems you
> are seeing if possible. I'm going to restate what I have understood from
> your reports to try to confirm I am following properly.
Sure thing, thanks for caring and replying.
For moremagic, the first report, I'm very close to wiping the filesystem
and starting over since I can't mount it read/write.
Ironically if I do that it would be a good time to turn on squota but
at the same time, it may not be safe in 6.12.
> I will call this report 1. Report 1 is from a rpi running 6.12 with possible out
> of tree modules and raid5.
Correct. moremagic is running Raspberry Pi debian which I understand is
running its own kernel to support the chips on the board. Sadly it means
I'm stuck at 6.12 for that one. I didn't know it would be an issue for
btrfs, but if you feel squotas are not ready/safe in 6.12, I'll disable
them (well, it looks like I will be doing that no matter what since I
can't have moremagic crash its 22TB filesystem, but still your feedback
will be valued).
> I'll call this report 2. Report 2 is from a laptop with no fancy raid
> and upstream kernel 6.17.
Correct. amd64 system and Package: linux-image-6.17.11+deb14-amd64-unsigned
from upstream debian which I assume is clean.
> Is that all accurate?
>
> Some further questions/observations:
> - I noticed that your paste from report 1 (https://pastebin.com/7HmQwy3n)
> had 16k pages and 4k block size:
> 2026-04-10T10:43:22.673638-07:00 moremagic kernel: BTRFS warning (device dm-0): read-write for sector size 4096 with page size 16384 is experimental
Yeah, I saw that too. I don't have much of a choice on arm, they have
switched to 16k.
I tried formatting my new filesystems as 16k native and then had to
revert once I realized it broke btrfs send/receive (cannot send from 4k
FS to 16k fs).
> which seems a bit risky on an old kernel. There were a lot of fixes for
> subpage block size support in recent kernels. I believe it has been
> quite stable for us on 6.16 but Qu can give the most authoritative
> answer on when that got solid.
I would love to know, yes.
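For anyone wanting to check their own logs for the same subpage setup, here is a minimal sketch (not from the original exchange) that pulls the two sizes out of the warning line quoted above and reports whether the filesystem sector size is smaller than the kernel page size. The function names are made up for illustration, and the regex only assumes the wording of that one log line.

```python
import re

# Matches the wording of the btrfs warning quoted in this thread:
# "... read-write for sector size 4096 with page size 16384 is experimental"
WARN_RE = re.compile(r"sector size (\d+) with page size (\d+)")


def parse_subpage_warning(line):
    """Return (sector_size, page_size) from a log line, or None if absent."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def is_subpage(sector_size, page_size):
    """True when the filesystem sector size is smaller than the page size."""
    return sector_size < page_size
```

Feeding it the moremagic line gives 4096 and 16384, which is the subpage case Boris is pointing at; a 4k filesystem on a 4k-page amd64 kernel would not be.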
> - Is the laptop also running subpage block size? Do you have a full
> dmesg from that system which you can share?
The laptop is as simple and basic
> - On which of these systems did you enable squotas and when?
5 systems. I enabled them 4 days ago and have already had 2 crashes.
I did also enable block-group-tree at the same time since I read
it really helps when I have 100+ snapshots on a single filesystem
(due to backup server, btrfs send receive and historical snapshots)
- rPi5 with that 6.12 kernel (moremagic, the one with the crash). This one
crashed on a 4k btrfs disk array that was built a long time ago and that I
had just converted to squota.
- 2nd Rpi5 with same kernel (no choice) with FS I just rebuilt last
night once I realized I can't use 16k pages. On top of raid5 on top of
SSDs. It's currently doing a multi-day btrfs receive to populate:
aragorn:/mnt/btrfs_pool1# df -h | grep btrfs_pool
/dev/mapper/dshelf1 30T 3.4T 26T 12% /mnt/btrfs_pool1 (raid5 8 SSDs)
/dev/mapper/dshelf2 25T 84G 25T 1% /mnt/btrfs_pool2 (raid6 10 SSDs)
This one is fresh so the squotas would be useful. I have not disabled
them yet, but probably will as soon as you confirm it's probably not
safe, especially with 16k pages in the kernel but 4k in the filesystem.
It's the only one I still have running now.
- laptop #1 where I enabled them on every filesystem (6.16.8-amd64, will
reboot to 6.19.11 as soon as I can reboot), but given that squotas are
kind of useless on existing filesystems since you can't backfill
missing quota data, I'm going to disable them now; I can't have that
laptop crash.
- laptop #2 (merlin, the one that crashed). Similar debian install and
btrfs filesystems. btrfs_pool3 gets btrfs receive backups,
juggles snapshots and btrfs balance nightly. Thankfully the
filesystem was fixable, I've brought it back online and disabled
quotas on all filesystems too.
- old file server running 6.16.8-amd64-preempt-sysrq-20241007 I built
myself from source. It also has not crashed yet, but I had just
enabled quotas on a big 22T spinning rust array that I'm finishing
a big send/receive to. Given that I really don't want to lose the
only backup I have left after the one I just lost on moremagic, and squota
isn't that useful on an existing filesystem, I just turned that one
off too. I ran it for 2 days, but there were no nightly snapshots or
btrfs balance runs happening on it.
> I don't see any evidence for that, as discussed above about the object
> type referenced in the abort log. In fact, we don't really know that the
> freeing even had to do with the subvolume being deleted as we were
> running generic delayed refs as part of a consistency enforcing
> transaction commit before digging into qgroup logic. We have not
> connected the logical block that had the issue to subvol 83288, for
> which we would probably need a tree dump.
understood.
> Unfortunately, this second bullet is nonsense, the qgroup cleanup log
> is there simply because that is the caller of btrfs_commit_transaction
> that consumed the failed delayed ref errno and also logged its own
> failure. This is apparent from the stack trace and logs. This actually
> confused and distracted me quite a bit :)
apologies for that. Normally I only use gemini on stuff I personally
know and understand, so I can easily tell if it's full of crap, but in
the case of btrfs kernel code, I'm clueless, sadly.
Despite that, hopefully the multiple oops and tracebacks give some clue.
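As a reading aid for the abort message in the subject line, here is a small sketch (not part of the original report) that splits out the function name, source line and errno, and maps the number to its symbolic name with Python's standard errno table. The regex and the function name decode_abort are assumptions made for illustration; they only fit the format of the one message quoted in this thread.

```python
import errno
import re

# Fits the abort text in the subject line, e.g.
# "do_free_extent_accounting:2999: errno=-2 No such entry"
ABORT_RE = re.compile(
    r"(?P<func>\w+):(?P<line>\d+): errno=(?P<errno>-?\d+) (?P<text>.+)"
)


def decode_abort(message):
    """Return (function, line, errno name, kernel text), or None if no match."""
    m = ABORT_RE.search(message)
    if m is None:
        return None
    code = abs(int(m.group("errno")))  # kernel errnos are reported negated
    name = errno.errorcode.get(code, "UNKNOWN")
    return m.group("func"), int(m.group("line")), name, m.group("text")
```

On the subject line of this thread it picks out do_free_extent_accounting, line 2999 and ENOENT, i.e. the lookup did not find the item it expected.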
For now, I've disabled squota everywhere but one after my 2nd laptop got
hit overnight. I'll see if I get more issues on it later which won't
prove anything, but give some correlation ("crashed with squota after 3
days, fine for many days without"). I understand the rPi is more
problematic due to the non-standard kernel I can't upgrade.
Thanks much for your time,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08