Linux Btrfs filesystem development
From: Boris Burkov <boris@bur.io>
To: Marc MERLIN <marc_btrfs@merlins.org>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Simple quota unsafe? RIP: 0010:__btrfs_free_extent.isra.0+0xc41/0x1020 [btrfs] / do_free_extent_accounting:2999: errno=-2 No such entry
Date: Tue, 21 Apr 2026 19:26:27 -0700
Message-ID: <20260422022627.GA1034721@zen.localdomain>
In-Reply-To: <aeLNxjmUGGhnW5ss@merlins.org>

On Fri, Apr 17, 2026 at 05:18:14PM -0700, Marc MERLIN wrote:
> On Fri, Apr 17, 2026 at 04:16:03PM -0700, Boris Burkov wrote:
> > Rad, so there is some more "exciting" bug with balance lurking there.
>  
> Correct, all 3 times the bug happened, it was during balance.
> And now that you mention it, all 3 filesystems it happened to had a
> bunch of data already before I enabled squota on them (because those
> FSes were several years old, and created with much older kernels).
> 
> On all 3, I ran:
> btrfstune -n -x -r $DEV; btrfstune --enable-simple-quota $DEV ; btrfstune --convert-to-block-group-tree $DEV
> and then on the mounted filesystem
> btrfs quota enable --simple .
> 
> I'm not sure how many of -n -x -r were already enabled before, but I
> know 100% that squota was not, and neither were block group trees.
> 
> My last remaining FS with squota running is the only one that had
> squotas on it from the start (enabled with the btrfs quota command
> while the FS was still empty).
> If you feel the bug might not trigger in that use case, I can try to
> re-enable balance and scrub on it, but it's another 30TB FS, so it sucks
> if I lose it. Then again, I think we now know I can recover without
> losing it, so should I go ahead and re-enable them with 6.19 (even if
> 6.19 did crash on my older FS where I enabled squota after data was
> already there?)

I believe I have reproduced the balance bug, and in my reproducer, the
fact that the subvolume predated squota was critical.

reproducer sketch:
- create subvol
- write stuff
- snapshot subvol
- enable squota (usage = 0)
- delete subvol (leave snapshot)
- run a data balance that hits an extent in the snapshot (owned by subvol)
- balance double counts extents in squotas (valid interpretation) but
  critically, creates a new tree block with the old dead owner but a
  fresh generation.
- run_delayed_refs happens now, say from a commit running (important race),
  writing that dangerous tree block to disk (in the reloc root)
- we do the pointer swapping and drop the reloc root. the nodes of the
  reloc tree are of this bogus form and cause your abort
- I believe the snapshot also has some bogus leaves and would abort
  later, too.
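
The steps above could be scripted roughly as follows. This is only a
sketch under assumptions, not the actual reproducer: the mount point,
the subvolume and file names, the write size and the `run`/`repro`
helpers are all made up for illustration. It also does not force the
race with a transaction commit, so it may well not trigger the abort.
It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical sketch of the reproducer steps; use a scratch filesystem only.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# repro MOUNTPOINT: MOUNTPOINT is an already mounted btrfs without squota.
repro() {
    mnt="$1"
    # create subvol and write stuff (names and size are arbitrary)
    run btrfs subvolume create "$mnt/subvol"
    run dd if=/dev/urandom of="$mnt/subvol/data" bs=1M count=64
    # snapshot subvol, so the extents are shared but owned by subvol
    run btrfs subvolume snapshot "$mnt/subvol" "$mnt/snap"
    # enable squota only now, so existing data is accounted as usage = 0
    run btrfs quota enable --simple "$mnt"
    # delete the original owner, leaving the snapshot
    run btrfs subvolume delete "$mnt/subvol"
    # data balance that relocates the extents still held by the snapshot
    run btrfs balance start -d "$mnt"
}
```

Calling `repro /mnt/scratch` prints the six commands; setting
DRY_RUN=0 runs them as root. To approximate the commit race, one could
run sync in a loop from another shell while the balance is in progress.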

So the thing that is definitely dangerous, as far as I can tell, is
running balance on a filesystem where squotas was enabled after any
subvolume owning a shared extent (snapshot or reflink) was deleted.

Is that story consistent with your situation? It sounds like yes, but I
think it's nice to double check :)

At any rate, I think that your fs is safe from this bug, but at this
point, it is hard to be certain of the safety of other balance vs.
squota interactions.

Now that I am at least clear on the more sophisticated bugs we have, I
think that I am ready to put in some fixes and some defense in depth. I
was feeling sheepish about just wallpapering over it without explaining
your actual bug. So I think you should wait for that, to be safe with
your big FS, even if it happens to not *need* the fixes.

Thanks again for your reports, follow-ups, etc.
Boris

> 
> > mkfs.btrfs -O squota <dev>
>  
> Aaah, I was missing that option, but even if it's set at mkfs time, do I
> still need to turn them on with "btrfs quota enable --simple mountpoint"?
> 

FYI
I believe that when btrfs mounts a filesystem with qgroup trees present
it will enable quotas, so you don't need to manually enable them if it
was created with squotas.
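
One way to check this on a scratch device, as a sketch: the device and
mount point arguments and the `run`/`check_squota` helpers are
placeholders made up for illustration. If quotas were not enabled by the
mount, `btrfs qgroup show` would be expected to fail rather than list
qgroups. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical check; mkfs destroys all data on the given device.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_squota DEVICE MOUNTPOINT
check_squota() {
    dev="$1"
    mnt="$2"
    # create the filesystem with simple quotas from the start
    run mkfs.btrfs -O squota "$dev"
    run mount "$dev" "$mnt"
    # no explicit "btrfs quota enable --simple" here: listing qgroups
    # should already work if the mount enabled quotas
    run btrfs qgroup show "$mnt"
}
```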

> If there is no good documentation on all this, it's been 12 years since
> I wrote all those missing docs/howtos on btrfs in https://marc.merlins.org/perso/btrfs/
> happy to make a new one to put a few new notes on squotas and block-group-tree
> which I was unaware of until just a week ago.
> 
> On the plus side, knowing it's squota and balance makes me feel better
> that btrfs on top of raid5 isn't as unsafe as it used to be (it's
> supposed to be safe, but I've had more than 5 unrecoverable FS crashes
> after swraid5 misbehaved when I was really hoping for just a bit of data
> loss or data corruption that scrub would find and then I could move on.
> Also, I'm pretty sure I had -m dup all these years, so it's sad it
> didn't help)
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>  
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
