Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, "Emil.s" <emil@sandnabba.se>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs)
Date: Fri, 26 Jul 2024 08:49:10 +0930
Message-ID: <aeed4735-f6f2-49ef-9a02-816a3b74cbd3@gmx.com>
In-Reply-To: <20240725224757.GD17473@twin.jikos.cz>



On 2024/7/26 08:17, David Sterba wrote:
> On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote:
>> Hello!
>>
>> I got a corrupt filesystem due to backpointer mismatches:
>> ---
>> [2/7] checking extents
>> data extent[780333588480, 942080] size mismatch, extent item size
>> 925696 file item size 942080
>
> This looks like a single bit flip:
>
> >>> bin(925696)
> '0b11100010000000000000'
> >>> bin(942080)
> '0b11100110000000000000'
> >>> bin(942080 ^ 925696)
> '0b100000000000000'
>
> or an off-by-one error, as the delta is 0x4000 (16 KiB), i.e. four 4 KiB
> pages, which is one node size.
>
>> backpointer mismatch on [780333588480 925696]
>> ---
>>
>> However only two extents seem to be affected, in a subvolume only used
>> for backups.
>>
>> Since I've not been able to repair it, I thought that I could just
>> delete the subvolume and recreate it.
>> But now the btrfs_run_delayed_refs function crashes a while after
>> mounting the filesystem. (Which is quite obvious when I think about
>> it, since I guess it's trying to reclaim space, hitting the bad extent
>> in the process?)
>>
>> Anyhow, is it possible to force removal of these extents in any way?
>> My understanding is that extents are mapped to a specific subvolume as
>> well?
>>
>> Here is the full crash dump:
>> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff
>
> WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs]
>
>   858         } else if (WARN_ON(ret)) {
>   859                 btrfs_print_leaf(path->nodes[0]);
>   860                 btrfs_err(fs_info,
>   861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu",
>   862                           bytenr, num_bytes, parent, root_objectid, owner,
>   863                           offset);
>   864                 ret = -EUCLEAN;
>   865                 goto out;
>   866         }
>   867
>
> CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P           OE      6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4
> Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022
> RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs]
> RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680
> RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001
> R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000
> R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8
> FS:  0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   <TASK>
>   ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   ? __warn.cold+0x8e/0xe8
>   ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   ? report_bug+0xff/0x140
>   ? handle_bug+0x3c/0x80
>   ? exc_invalid_op+0x17/0x70
>   ? asm_exc_invalid_op+0x1a/0x20
>   ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   ? set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   ? start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>   transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>
> followed by leaf dump with items relevant to the numbers:
>
>        item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37
>                extent refs 1 gen 2245328 flags 1
>                ref#0: shared data backref parent 4455386873856 count 1
>        item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37
>                extent refs 1 gen 2245328 flags 1
>                ref#0: shared data backref parent 4455386873856 count 1
>        item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53
>                      ^^^^^^^^^^^^^^^^^^^^^^^
>
>                extent refs 1 gen 2245328 flags 1
>                ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1
>        item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53
>                extent refs 1 gen 2245328 flags 1
>                ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1
>        item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53
>                extent refs 1 gen 2245328 flags 1
>                ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1
>
> As you can see, item 119 is the problematic one and is also out of
> sequence: the adjacent items all have the key offset 942080. This
> confirms the bit flip case.
>
> As with any bit-flip-induced error, it's hard to tell how far it has
> propagated; this could be the only instance, or there could be other
> items referring to that one too.
>
> We don't have any ready-made tool for fixing that. Bit flips hit random
> data structure groups or data; each case is basically unique and would
> require analysing a tree dump and looking for clues as to how bad it is.
>
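
For reference, the two observations above (the sizes differ in exactly one
bit, and item 119 breaks an otherwise contiguous run of extents) can be
re-checked with a short Python sketch. The numbers are copied from the fsck
output and the leaf dump; the helper names differing_bits() and find_gaps()
are made up for this illustration and are not part of btrfs-progs. Note that
adjacent extents are not required to be contiguous in general, so the gap is
only supporting evidence here.

```python
# Values copied from the fsck output quoted above.
EXTENT_ITEM_SIZE = 925696   # length recorded in the extent tree (item 119)
FILE_ITEM_SIZE = 942080     # length recorded in the file extent item


def differing_bits(a, b):
    """Return positions (0 = least significant) of the bits where a and b differ."""
    diff = a ^ b
    return [pos for pos in range(diff.bit_length()) if (diff >> pos) & 1]


def find_gaps(items):
    """Given (bytenr, num_bytes) pairs sorted by bytenr, return the
    (gap_start, gap_len) holes between neighbouring extents."""
    gaps = []
    for (start, length), (next_start, _) in zip(items, items[1:]):
        end = start + length
        if end != next_start:
            gaps.append((end, next_start - end))
    return gaps


# (bytenr, num_bytes) taken from the keys of items 117-121 in the leaf dump.
leaf_items = [
    (780331704320, 942080),
    (780332646400, 942080),
    (780333588480, 925696),   # item 119, the suspicious one
    (780334530560, 942080),
    (780335472640, 942080),
]

print(differing_bits(EXTENT_ITEM_SIZE, FILE_ITEM_SIZE))  # [14]
print(find_gaps(leaf_items))  # [(780334514176, 16384)]
```

A single differing bit (bit 14, value 0x4000) and a single 16384-byte hole
right after item 119 are both consistent with the bit flip reading.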

Since we're pretty sure it's a bitflip now, would you please provide the
following info?

- History of the fs
   Since you're using the Arch kernel, and we have had all the write-time
   checkers since v5.14, we should normally detect such an out-of-sequence
   key at write time and flip the fs read-only.
   I'm wondering whether the fs was previously handled by an older kernel,
   so that the tree-checker didn't catch it early.

- The hardware spec
   The dmesg only gives the hardware as "LENOVO 30B4S01W00", which seems
   to be a workstation.
   I'm wondering whether certain CPU models are prone to unreliable
   memory.
   In my experience, the memory chip itself is rarely the cause; more
   often it is the connection (from the BGA to the DIMM slot) or the
   memory controller (nowadays on the CPU die).

Thanks,
Qu
