public inbox for linux-btrfs@vger.kernel.org
* BTRFS discard crash: failed to run delayed ref for logical 15506102321152 num_bytes 16384 type 182 action 2 ref_mod 1: -2 6.11.2)
From: Marc MERLIN @ 2026-04-11  3:35 UTC
  To: linux-btrfs, Boris Burkov, Josef Bacik, QuWenruo, Qu Wenruo,
	Filipe Manana
  Cc: Chris Murphy, Zygo Blaxell, Roman Mamedov, Su Yue, Su Yue

[Is there a more appropriate way to report FS corruption? It looks like
emails sent only to linux-btrfs@vger.kernel.org get lost among all
the patches and are never seen.]

Howdy,

I had a btrfs filesystem on top of RAID5 with 5 spinning drives.
I enabled discard by mistake, which caused a crash when the discard thread tried
to run (those drives do not support discard).
Kernel 6.12
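
As an aside, whether a block device accepts discards at all can be checked before adding a discard mount option. The following is a minimal sketch, assuming the standard sysfs attribute queue/discard_max_bytes, where a value of 0 means the block layer will not issue discards to the device. The function name and the sysfs_root parameter are hypothetical helpers for illustration, and "dm-0" is simply the device name that appears in the kernel log of this report.

```python
from pathlib import Path


def discard_supported(dev_name: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the block layer advertises discard for dev_name.

    Reads <sysfs_root>/<dev_name>/queue/discard_max_bytes; a value of 0
    means discard requests are not passed to the device.
    """
    path = Path(sysfs_root) / dev_name / "queue" / "discard_max_bytes"
    try:
        return int(path.read_text().strip()) > 0
    except (FileNotFoundError, ValueError):
        # Unknown device or unreadable attribute: treat as unsupported.
        return False


if __name__ == "__main__":
    # "dm-0" is the device name taken from the kernel log in this report.
    print("dm-0 discard supported:", discard_supported("dm-0"))
```

The check has to be repeated for every layer of a stacked setup (here dm-crypt, bcache, md RAID5, and the disks), since each layer exposes its own queue limits.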

I worked on recovery using Gemini 3.0 Pro. Mounting read-only works fine, but I need read-write,
or I will waste days (probably weeks) recreating this entire 20TB+ backup over the internet.

I'm not qualified to say whether everything Gemini said was correct, but I think the summary is:
1) discard can apparently kill a filesystem when it sits on hard drives (it did for me)
2) -o skip_balance,usebackuproot didn't help
3) there is no way to mount once the space cache has been cleared while block-group-tree is enabled
4) there is still no way to mount read-write after removing block-group-tree

It started with:
[23345.326321] BTRFS: error (device dm-0 state A) in do_free_extent_accounting:2996: errno=-2 No such entry
[23345.336394] BTRFS error (device dm-0 state EA): failed to run delayed ref for logical 15506102321152 num_bytes 16384 type 182 action 2 ref_mod 1: -2
[23345.350299] BTRFS: error (device dm-0 state EA) in btrfs_run_delayed_refs:2215: errno=-2 No such entry
[23345.360154] BTRFS warning (device dm-0 state EA):

I ended up with:

moremagic:~# mount -t btrfs -o rw,skip_balance,space_cache=v2,clear_cache /dev/mapper/crypt_bcache0 /mnt/btrfs_bigbackup
BTRFS: device label DS6 devid 1 transid 296950 /dev/mapper/crypt_bcache0 (251:0) scanned by mount (6029)
BTRFS info (device dm-0): first mount of filesystem a97dec85-a0d5-42ab-a0ef-e9b7479fbe43
BTRFS info (device dm-0): using crc32c (crc32c-generic) checksum algorithm
BTRFS warning (device dm-0): read-write for sector size 4096 with page size 16384 is experimental
BTRFS info (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 5074, gen 0
------------[ cut here ]------------
BTRFS: Transaction aborted (error -2)
WARNING: CPU: 3 PID: 6029 at fs/btrfs/extent-tree.c:2996 __btrfs_free_extent.isra.0+0x13a0/0x14a0 [btrfs]
Modules linked in: dm_crypt dm_mod bcache raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xt_MASQUERADE ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_conntrack xt_LOG nf_log_syslog nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfcomm algif_hash algif_skcipher af_alg bnep cp210x brcmfmac_wcc binfmt_misc usbserial hci_uart brcmfmac btbcm vc4 snd_soc_hdmi_codec brcmutil bluetooth drm_display_helper cfg80211 cec drm_dma_helper rpi_hevc_dec ecdh_generic v4l2_mem2mem ecc snd_soc_core pisp_be videobuf2_dma_contig v3d videobuf2_memops videobuf2_v4l2 gpu_sched rfkill videodev drm_shmem_helper snd_compress snd_pcm_dmaengine snd_pcm videobuf2_common rp1_pio snd_timer snd drm_kms_helper mc raspberrypi_gpiomem rp1_fw sg sch_fq_codel ecryptfs fuse drm drm_panel_orientation_quirks backlight nfnetlink ip_tables x_tables raid1 aes_ce_blk aes_ce_cipher ghash_ce gf128mul libaes sha2_ce spidev sha256_arm64 sha1_ce raspberrypi_hwmon sha1_generic ahci i2c_brcmstb spi_bcm2835
 md_mod gpio_keys libahci pwm_fan rp1_adc libata rp1_mailbox nvmem_rmem uio_pdrv_genirq uio btrfs blake2b_generic xor xor_neon raid6_pq zram lz4_compress ipv6
CPU: 3 UID: 0 PID: 6029 Comm: mount Not tainted 6.12.47+rpt-rpi-2712 #1  Debian 1:6.12.47-1+rpt1
Hardware name: Raspberry Pi 5 Model B Rev 1.1 (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __btrfs_free_extent.isra.0+0x13a0/0x14a0 [btrfs]
lr : __btrfs_free_extent.isra.0+0x13a0/0x14a0 [btrfs]
sp : ffffc000868bb680
x29: ffffc000868bb720 x28: 0000000000000000 x27: 0000000000002f02
x26: 000000000000007f x25: ffff8001de833aa0 x24: 0000000000004000
x23: 0000000000000000 x22: ffff800102b64e70 x21: 0000000000004000
x20: 00000e1a4bb88000 x19: 00000000fffffffe x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 00000000000000c0 x10: 0000000000001a40 x9 : ffffd06fce4e06c0
x8 : ffff80011f56e0a0 x7 : 000000042f72a7bd x6 : 0000000000000039
x5 : 0000000000000001 x4 : 0000000000001ab0 x3 : 0000000000000804
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff80011f56c600
Call trace:
 __btrfs_free_extent.isra.0+0x13a0/0x14a0 [btrfs]
 __btrfs_run_delayed_refs+0x508/0xec0 [btrfs]
 btrfs_run_delayed_refs+0x48/0x198 [btrfs]
 btrfs_commit_transaction+0x88/0xe20 [btrfs]
 btrfs_recover_relocation+0x55c/0x5d0 [btrfs]
 btrfs_start_pre_rw_mount+0x1d4/0x470 [btrfs]
 open_ctree+0x101c/0x13b8 [btrfs]
 btrfs_get_tree+0x5b4/0x800 [btrfs]
 vfs_get_tree+0x30/0x108
 fc_mount+0x20/0x68
 btrfs_get_tree+0x238/0x800 [btrfs]
 vfs_get_tree+0x30/0x108
 vfs_cmd_create+0x58/0xf8
 __arm64_sys_fsconfig+0x444/0x5b8
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0x48/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x30/0xf8
 el0t_64_sync_handler+0x120/0x130
 el0t_64_sync+0x190/0x198
---[ end trace 0000000000000000 ]---
BTRFS: error (device dm-0 state A) in do_free_extent_accounting:2996: errno=-2 No such entry
BTRFS error (device dm-0 state EA): failed to run delayed ref for logical 15506102321152 num_bytes 16384 type 182 action 2 ref_mod 1: -2
BTRFS: error (device dm-0 state EA) in btrfs_run_delayed_refs:2215: errno=-2 No such entry
BTRFS warning (device dm-0 state EA): failed to recover relocation: -2
BTRFS error (device dm-0 state EA): commit super ret -30
BTRFS error (device dm-0 state EA): open_ctree failed: -2
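
The numeric fields in the "failed to run delayed ref" line can be decoded mechanically. Below is a small sketch that parses the message and maps the numbers to names. The type and action tables are the constants as defined in the kernel's btrfs headers as far as I know them (btrfs_tree.h and delayed-ref.h) and should be checked against the running kernel's source; the function and table names are hypothetical. For this report it yields a SHARED_BLOCK_REF being dropped, failing with ENOENT.

```python
import errno
import re

# Extent back-reference key types (subset), per the kernel's btrfs_tree.h.
REF_KEY_TYPES = {
    176: "TREE_BLOCK_REF",
    178: "EXTENT_DATA_REF",
    182: "SHARED_BLOCK_REF",
    184: "SHARED_DATA_REF",
}

# Delayed ref actions, per the kernel's fs/btrfs/delayed-ref.h.
DELAYED_REF_ACTIONS = {
    1: "ADD_DELAYED_REF",
    2: "DROP_DELAYED_REF",
    3: "ADD_DELAYED_EXTENT",
    4: "UPDATE_DELAYED_HEAD",
}

PATTERN = re.compile(
    r"failed to run delayed ref for logical (?P<logical>\d+) "
    r"num_bytes (?P<num_bytes>\d+) type (?P<type>\d+) "
    r"action (?P<action>\d+) ref_mod (?P<ref_mod>-?\d+): (?P<err>-?\d+)"
)


def decode_delayed_ref_error(line: str) -> dict:
    """Turn the numeric fields of the log line into readable names."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a delayed-ref failure line")
    f = {key: int(value) for key, value in match.groupdict().items()}
    return {
        "logical": f["logical"],
        "logical_hex": hex(f["logical"]),
        "num_bytes": f["num_bytes"],
        "type": REF_KEY_TYPES.get(f["type"], "unknown(%d)" % f["type"]),
        "action": DELAYED_REF_ACTIONS.get(f["action"], "unknown(%d)" % f["action"]),
        "ref_mod": f["ref_mod"],
        "errno": errno.errorcode.get(-f["err"], "unknown"),
    }


if __name__ == "__main__":
    line = (
        "BTRFS error (device dm-0 state EA): failed to run delayed ref for "
        "logical 15506102321152 num_bytes 16384 type 182 action 2 ref_mod 1: -2"
    )
    for key, value in decode_delayed_ref_error(line).items():
        print(key, "=", value)
```

The hexadecimal form of the logical address, 0xe1a4bb88000, equals register x20 in the dump above, and 0x4000 in x21/x24 equals num_bytes (16384). That the registers hold these variables is an inference from the matching values, not something the log states.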


Gemini said:

1. The Btrfs "Ghost" Accounting
When you added discard=async to your fstab
(or remounted with it), you told the Btrfs kernel module to start a specific
background thread.
Btrfs's Perspective: "The user told me to use async discard. I will now start a
list of every extent we delete so I can 'trim' them later in the background."
The Problem: Btrfs doesn't check if the underlying dm-crypt device actually
supports discards before it starts its own internal accounting.
The Result: Btrfs started tracking a massive list of "extents to be discarded"
in its memory and metadata.

2. The "No Such Entry" (-2) Race Condition
The crash didn't happen because a
command hit a drive; it happened because of a logic race inside the kernel's
Btrfs code:
The Balance Thread: You were running a balance. This thread moves data from "Old
Block A" to "New Block B."
The Discard Thread: Because discard=async was on, the discard thread saw "Old
Block A" get freed. It put "Old Block A" on its "to-do list."
The Metadata Conflict: The balance thread finished moving the data and
successfully deleted the reference to "Old Block A" from the extent tree.
The Crash: A few milliseconds later, the async discard thread woke up and tried
to "pin" or "process" the metadata for "Old Block A." It looked in the tree,
found nothing (because the balance already deleted it), and threw an ENOENT
(Error -2: No such entry).
Btrfs panicked: "Wait, I was told to discard this block, but it doesn't exist in
my records anymore! Something is inconsistent!" → Transaction Abort.

More details:
usebackuproot didn't work (for a read-write mount).
I was forced to run
btrfstune --convert-from-block-group-tree /dev/mapper/crypt_bcache0
because, according to Gemini:
When you ran btrfs check --clear-space-cache v2, the tool did exactly
what it was supposed to do: it deleted the Free Space Tree and removed
the FREE_SPACE_TREE flag from your superblock.
The Conflict: Your 23TB array was formatted with the modern
block-group-tree feature (which speeds up mounting).
The Kernel Rule: The Btrfs kernel code explicitly dictates: If the Block
Group Tree is enabled, the Free Space Tree MUST also be enabled.
The Crash: Because the FREE_SPACE_TREE flag is now missing, the kernel sees
an "illegal" superblock state and throws a fatal -22 error, refusing to
proceed to the mount options.
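
The rule Gemini describes can be sketched as a simple flag check. This is only an illustration of the claimed dependency, not the kernel's code: the bit positions are the compat_ro feature flags as I recall them from include/uapi/linux/btrfs.h and are an assumption to verify, the function name is hypothetical, and the real superblock validation reportedly also requires the no-holes feature, which is left out here.

```python
import errno

# compat_ro feature bits (ASSUMED values, verify against the uapi header).
COMPAT_RO_FREE_SPACE_TREE = 1 << 0
COMPAT_RO_FREE_SPACE_TREE_VALID = 1 << 1
COMPAT_RO_BLOCK_GROUP_TREE = 1 << 3


def validate_compat_ro(flags: int) -> int:
    """Return 0 if the flag combination is acceptable, else -EINVAL (-22).

    Models the rule: block-group-tree requires a valid free-space-tree.
    """
    has_bgt = bool(flags & COMPAT_RO_BLOCK_GROUP_TREE)
    has_valid_fst = bool(flags & COMPAT_RO_FREE_SPACE_TREE_VALID)
    if has_bgt and not has_valid_fst:
        return -errno.EINVAL
    return 0


if __name__ == "__main__":
    # State described above: free-space-tree flags cleared, block-group-tree still set.
    print(validate_compat_ro(COMPAT_RO_BLOCK_GROUP_TREE))
```

Under this model, clearing the free space tree while block-group-tree is still set produces the -22 seen at mount time, and converting away from block-group-tree removes the conflicting flag.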

This was vexing: hours were lost removing the block group tree,
and when it finally finished,
mount -t btrfs -o skip_balance /dev/mapper/crypt_bcache0 /mnt/btrfs_bigbackup/
did run, but crashed as above.

Now doing a repair in case it can salvage things.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


Thread overview: 43+ messages
2026-04-11  3:35 BTRFS discard crash: failed to run delayed ref for logical 15506102321152 num_bytes 16384 type 182 action 2 ref_mod 1: -2 6.11.2) Marc MERLIN
2026-04-11  4:47 ` Qu Wenruo
2026-04-11 12:04 ` Roman Mamedov
2026-04-11 16:22   ` Marc MERLIN
2026-04-12  1:57 ` Marc MERLIN
2026-04-12  1:57   ` Marc MERLIN
2026-04-12  2:28   ` Marc MERLIN
2026-04-12  2:28     ` Marc MERLIN
2026-04-12 17:38     ` Marc MERLIN
2026-04-12 17:38       ` Marc MERLIN
2026-04-12 20:21       ` Marc MERLIN
2026-04-12 20:21         ` Marc MERLIN
2026-04-13  2:14         ` Roman Mamedov
2026-04-13  2:34           ` Marc MERLIN
2026-04-13  2:34             ` Marc MERLIN
2026-04-13 17:52 ` Simple quota unsafe? RIP: 0010:__btrfs_free_extent.isra.0+0xc41/0x1020 [btrfs] / do_free_extent_accounting:2999: errno=-2 No such entry Marc MERLIN
2026-04-13 17:52   ` Marc MERLIN
2026-04-13 18:47   ` Boris Burkov
2026-04-13 19:40     ` Marc MERLIN
2026-04-13 19:40       ` Marc MERLIN
2026-04-15  5:21       ` Marc MERLIN
2026-04-15 17:05         ` Boris Burkov
2026-04-15 17:59           ` Marc MERLIN
2026-04-15 18:44             ` Boris Burkov
2026-04-15 20:22               ` Marc MERLIN
2026-04-15 22:36                 ` Boris Burkov
2026-04-15 22:55                   ` Marc MERLIN
2026-04-15 23:25                     ` Boris Burkov
2026-04-16  0:55                       ` Marc MERLIN
2026-04-16  1:22                         ` Boris Burkov
2026-04-16  0:45                     ` Boris Burkov
2026-04-16  1:08                       ` Marc MERLIN
2026-04-16  1:25                         ` Boris Burkov
2026-04-16 16:51                           ` Simple quota unsafe (FIXED: btrfstune --remove-simple-quota worked) Marc MERLIN
2026-04-16 17:21                           ` Simple quota unsafe? RIP: 0010:__btrfs_free_extent.isra.0+0xc41/0x1020 [btrfs] / do_free_extent_accounting:2999: errno=-2 No such entry Marc MERLIN
2026-04-16 21:36                             ` Boris Burkov
2026-04-16 21:47                               ` Marc MERLIN
2026-04-17 21:51                                 ` Boris Burkov
2026-04-17 22:37                                   ` Marc MERLIN
2026-04-17 23:16                                     ` Boris Burkov
2026-04-18  0:18                                       ` Marc MERLIN
2026-04-17  3:43 ` BTRFS discard crash: failed to run delayed ref for logical 15506102321152 num_bytes 16384 type 182 action 2 ref_mod 1: -2 6.11.2) David Disseldorp
2026-04-17  5:19   ` Marc MERLIN
