Linux Btrfs filesystem development
From: David Sterba <dsterba@suse.cz>
To: "Emil.s" <emil@sandnabba.se>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs)
Date: Fri, 26 Jul 2024 00:47:57 +0200
Message-ID: <20240725224757.GD17473@twin.jikos.cz>
In-Reply-To: <CAEA9r7DVO8gCRz-9vbwaNWznz9AOFxOyPLO0ukOJh-6Ef0o5Bw@mail.gmail.com>

On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote:
> Hello!
> 
> I got a corrupt filesystem due to backpointer mismatches:
> ---
> [2/7] checking extents
> data extent[780333588480, 942080] size mismatch, extent item size
> 925696 file item size 942080

This looks like a single bit flip:

>>> bin(925696)
'0b11100010000000000000'
>>> bin(942080)
'0b11100110000000000000'
>>> bin(942080 ^ 925696)
'0b100000000000000'

or an off-by-one error, as the delta is 0x4000 (16 KiB): four 4 KiB
pages, which is one node size.
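The same check can be written as a small helper instead of reading the
binary strings by eye. This is only a sketch: the function name
single_bit_diff is made up for illustration, and the two sizes are the
ones reported by btrfs check above.

```python
def single_bit_diff(a: int, b: int):
    """Return the index of the differing bit if a and b differ in
    exactly one bit, otherwise None. (Hypothetical helper name.)"""
    diff = a ^ b
    # A power of two has exactly one bit set: diff & (diff - 1) == 0.
    if diff != 0 and diff & (diff - 1) == 0:
        return diff.bit_length() - 1
    return None


extent_item_size = 925696  # size recorded in the extent item
file_item_size = 942080    # size recorded in the file extent item

bit = single_bit_diff(extent_item_size, file_item_size)
delta = file_item_size - extent_item_size

print("differing bit:", bit)
print("delta:", hex(delta))
```

A non-None result only says the two values are one bit apart; it does
not by itself tell a memory bitflip from an off-by-one-node error.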

> backpointer mismatch on [780333588480 925696]
> ---
> 
> However only two extents seem to be affected, in a subvolume only used
> for backups.
> 
> Since I've not been able to repair it, I thought that I could just
> delete the subvolume and recreate it.
> But now the btrfs_run_delayed_refs function crashes a while after
> mounting the filesystem. (Which is quite obvious when I think about
> it, since I guess it's trying to reclaim space, hitting the bad extent
> in the process?)
> 
> Anyhow, is it possible to force removal of these extents in any way?
> My understanding is that extents are mapped to a specific subvolume as
> well?
> 
> Here is the full crash dump:
> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff

WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs]

 858         } else if (WARN_ON(ret)) {
 859                 btrfs_print_leaf(path->nodes[0]);
 860                 btrfs_err(fs_info,
 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu",
 862                           bytenr, num_bytes, parent, root_objectid, owner,
 863                           offset);
 864                 ret = -EUCLEAN;
 865                 goto out;
 866         }
 867

CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P           OE      6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4
Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022
RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs]
RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680
RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001
R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000
R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8
FS:  0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 ? __warn.cold+0x8e/0xe8
 ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 ? report_bug+0xff/0x140
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 ? set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 ? start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]

followed by a leaf dump with the items relevant to the numbers:

      item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37
              extent refs 1 gen 2245328 flags 1
              ref#0: shared data backref parent 4455386873856 count 1
      item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37
              extent refs 1 gen 2245328 flags 1
              ref#0: shared data backref parent 4455386873856 count 1
      item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53
                    ^^^^^^^^^^^^^^^^^^^^^^^

              extent refs 1 gen 2245328 flags 1
              ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1
      item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53
              extent refs 1 gen 2245328 flags 1
              ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1
      item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53
              extent refs 1 gen 2245328 flags 1
              ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1

As you can see, item 119 is the problematic one and is also out of
sequence: the adjacent items have the key offset 942080. This confirms
the bitflip case.
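The "out of sequence" observation can also be checked mechanically. The
sketch below assumes that the extents in this part of the leaf were
allocated back to back, which holds for the items quoted here but is not
a general btrfs invariant. The keys are copied from the dump; find_gaps
is a hypothetical name.

```python
# (bytenr, key type, num_bytes) for items 117..121 of the leaf dump;
# key type 168 is the extent item type as printed in the dump.
keys = [
    (780331704320, 168, 942080),
    (780332646400, 168, 942080),
    (780333588480, 168, 925696),
    (780334530560, 168, 942080),
    (780335472640, 168, 942080),
]


def find_gaps(keys):
    """Return (bytenr, num_bytes, gap) for every item whose end does
    not line up with the start of the next item."""
    gaps = []
    for (start, _, length), (next_start, _, _) in zip(keys, keys[1:]):
        gap = next_start - (start + length)
        if gap != 0:
            gaps.append((start, length, gap))
    return gaps


for bytenr, num_bytes, gap in find_gaps(keys):
    print(f"extent [{bytenr} {num_bytes}] ends {gap} bytes before the next one")
```

Only the item at 780333588480 is reported, and the gap is 16384 bytes,
the same 0x4000 delta as between the two sizes in the check output.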

As with any bitflip-induced error, it's hard to tell how far it has
propagated: this could be the only instance, or there could be other
items referring to that one too.

We don't have any ready-made tool for fixing that. Bitflips hit random
data structures or data; each case is basically unique and would require
analysing the tree dump and looking for clues as to how bad it is.
