Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: "Emil.s" <emil@sandnabba.se>, Yuwei Han <hrx@bupt.moe>
Cc: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs)
Date: Mon, 5 Aug 2024 18:29:41 +0930	[thread overview]
Message-ID: <086eee00-f2f3-420a-abd4-771f6098fd2c@gmx.com> (raw)
In-Reply-To: <CAEA9r7CaJJRvDZ3iL9LuKtgi-xO+R-qOxiUg-4Ms-vzG_y+Y5g@mail.gmail.com>



On 2024/8/5 17:46, Emil.s wrote:
>> Out of curiosity, did you set up any scrub after rebuilding the FS?
>
> The new FS is built on new drives, on new hardware and new Linux
> kernel. And I'm rsyncing over all files so that everything will be
> built from scratch.
>
> However, I still got a snapshot (from 2024-03-20) on my offsite backup.
> I just scrubbed that drive, and it reports no errors. I'm also quite
> sure I scrubbed the corrupt drive quite recently without issues.
>
> Another interesting note is that I'm also unable to send any file
> system from the corrupt drive (which is probably a good thing).
> ```
> $ btrfs send /mnt/snapshots/user_data_2024-03-20 > /dev/null
> At subvol /mnt/snapshots/user_data_2024-03-20
> ERROR: send ioctl failed with -30: Read-only file system
> ```
>
> I wasn't expecting a "Read-only file system" error when sending a
> read-only snapshot. (But maybe that is expected?)
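As an aside, the -30 in that ioctl error is the kernel errno for EROFS; a minimal Python sketch (illustrative only, not part of the original exchange) confirms the mapping:

```python
import errno
import os

# btrfs send reported "ioctl failed with -30"; on Linux, errno 30 is EROFS.
code = 30
assert errno.errorcode[code] == "EROFS"
print(os.strerror(code))  # typically "Read-only file system" on Linux
```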

In that case, please provide the dmesg of that incident (if any).

It looks like something else is wrong.

Thanks,
Qu
>
>
> On Sun, 28 Jul 2024 at 18:09, Yuwei Han <hrx@bupt.moe> wrote:
>>
>>
>>
>> On 2024/7/26 18:52, Emil.s wrote:
>>>> As for any bitflip-induced errors, it's hard to tell how far they
>>>> propagated; this could be the only instance, or there could be other
>>>> items referring to that one too.
>>>
>>> Right, yeah, that sounds a bit more challenging than I initially thought.
>>> Maybe it is easier to just rebuild the array after all.
>>>
>>> And with regard to Qu's question, that is probably a good idea anyhow.
>>>
>>>> - History of the fs
>>>> - The hardware spec
>>>
>>> This has been my personal NAS / home server for quite some time.
>>> It's basically a mix of just leftover desktop hardware (without ECC memory).
>>>
>>> It was a 12-year-old Gigabyte H77-D3H motherboard, an Intel i7-2600 CPU
>>> and 4 DDR3 DIMMs, all of different types and brands.
>>> The disks are WD red series, and I see now that one of them has over
>>> 80k power on hours.
>>>
>>> I know I did a rebuild about 5 years ago so the FS was probably
>>> created using Ubuntu server 18.04 (Linux 4.15), which has been
>>> upgraded to the major LTS versions since then.
>>> I actually hit this error when I was doing the "final backup" before
>>> retiring this setup, and it seems it was about time! (Was running
>>> Ubuntu 22.04 / Linux 5.15)
>>>
>> Out of curiosity, did you set up any scrub after rebuilding the FS?
>>> The Arch setup on the Thinkstation is my workstation where I attempted
>>> the data recovery.
>>>
>>> So due to the legacy hardware and crappy setup, I don't think it's
>>> worth wasting more time here.
>>>
>>> But thanks a lot for the detailed answer, much appreciated!
>>>
>>> Best,
>>> Emil
>>>
>>> On Fri, 26 Jul 2024 at 01:19, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/7/26 08:17, David Sterba wrote:
>>>>> On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote:
>>>>>> Hello!
>>>>>>
>>>>>> I got a corrupt filesystem due to backpointer mismatches:
>>>>>> ---
>>>>>> [2/7] checking extents
>>>>>> data extent[780333588480, 942080] size mismatch, extent item size
>>>>>> 925696 file item size 942080
>>>>>
>>>>> This looks like a single bit flip:
>>>>>
>>>>>>>> bin(925696)
>>>>> '0b11100010000000000000'
>>>>>>>> bin(942080)
>>>>> '0b11100110000000000000'
>>>>>>>> bin(942080 ^ 925696)
>>>>> '0b100000000000000'
>>>>>
>>>>> or an off-by-one error, as the delta is 0x4000, 4x the page size,
>>>>> which is one node size.
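The single-bit nature of the mismatch can be checked mechanically; here is a small Python sketch of the same arithmetic, with the sizes taken from the check output above:

```python
# Sizes reported by btrfs check for the mismatching extent/file items.
extent_item_size = 925696
file_item_size = 942080

# If exactly one bit flipped, the XOR of the two values is a power of two.
delta = extent_item_size ^ file_item_size
assert delta != 0 and delta & (delta - 1) == 0

print(hex(delta))  # 0x4000 = 16 KiB, the default btrfs node size
```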
>>>>>
>>>>>> backpointer mismatch on [780333588480 925696]
>>>>>> ---
>>>>>>
>>>>>> However only two extents seem to be affected, in a subvolume only used
>>>>>> for backups.
>>>>>>
>>>>>> Since I've not been able to repair it, I thought that I could just
>>>>>> delete the subvolume and recreate it.
>>>>>> But now the btrfs_run_delayed_refs function crashes a while after
>>>>>> mounting the filesystem. (Which is quite obvious when I think about
>>>>>> it, since I guess it's trying to reclaim space, hitting the bad extent
>>>>>> in the process?)
>>>>>>
>>>>>> Anyhow, is it possible to force removal of these extents in any way?
>>>>>> My understanding is that extents are mapped to a specific subvolume as
>>>>>> well?
>>>>>>
>>>>>> Here is the full crash dump:
>>>>>> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff
>>>>>
>>>>> WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs]
>>>>>
>>>>>     858         } else if (WARN_ON(ret)) {
>>>>>     859                 btrfs_print_leaf(path->nodes[0]);
>>>>>     860                 btrfs_err(fs_info,
>>>>>     861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu",
>>>>>     862                           bytenr, num_bytes, parent, root_objectid, owner,
>>>>>     863                           offset);
>>>>>     864                 ret = -EUCLEAN;
>>>>>     865                 goto out;
>>>>>     866         }
>>>>>     867
>>>>>
>>>>> CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P           OE      6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4
>>>>> Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022
>>>>> RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs]
>>>>> RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202
>>>>> RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000
>>>>> RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680
>>>>> RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001
>>>>> R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000
>>>>> R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8
>>>>> FS:  0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000
>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Call Trace:
>>>>>     <TASK>
>>>>>     ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     ? __warn.cold+0x8e/0xe8
>>>>>     ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     ? report_bug+0xff/0x140
>>>>>     ? handle_bug+0x3c/0x80
>>>>>     ? exc_invalid_op+0x17/0x70
>>>>>     ? asm_exc_invalid_op+0x1a/0x20
>>>>>     ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     ? set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     ? start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>     transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
>>>>>
>>>>> followed by leaf dump with items relevant to the numbers:
>>>>>
>>>>>          item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37
>>>>>                  extent refs 1 gen 2245328 flags 1
>>>>>                  ref#0: shared data backref parent 4455386873856 count 1
>>>>>          item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37
>>>>>                  extent refs 1 gen 2245328 flags 1
>>>>>                  ref#0: shared data backref parent 4455386873856 count 1
>>>>>          item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53
>>>>>                        ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>
>>>>>                  extent refs 1 gen 2245328 flags 1
>>>>>                  ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1
>>>>>          item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53
>>>>>                  extent refs 1 gen 2245328 flags 1
>>>>>                  ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1
>>>>>          item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53
>>>>>                  extent refs 1 gen 2245328 flags 1
>>>>>                  ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1
>>>>>
>>>>> As you can see, item 119 is the problematic one and is also out of
>>>>> sequence; the adjacent items have the key offset 942080, which
>>>>> confirms the bitflip case.
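The out-of-sequence key can also be verified numerically; a small Python sketch over the (bytenr, type, offset) keys copied from the leaf dump above:

```python
# Keys of items 117-121 from the leaf dump: (bytenr, type, offset/length).
items = [
    (780331704320, 168, 942080),
    (780332646400, 168, 942080),
    (780333588480, 168, 925696),  # item 119, the suspect entry
    (780334530560, 168, 942080),
    (780335472640, 168, 942080),
]

# The bytenrs step by exactly 942080, i.e. the extents are contiguous,
# which implies item 119's real length should also be 942080.
for (a, _, _), (b, _, _) in zip(items, items[1:]):
    assert b - a == 942080

outliers = [length for _, _, length in items if length != 942080]
print(outliers)  # only item 119's 925696 breaks the pattern
```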
>>>>>
>>>>> As for any bitflip-induced errors, it's hard to tell how far they
>>>>> propagated; this could be the only instance, or there could be other
>>>>> items referring to that one too.
>>>>>
>>>>> We don't have any ready-made tool for fixing that; bitflips hit
>>>>> random data structures or data, so each case is basically unique and
>>>>> would require analyzing the tree dump for clues about how bad it is.
>>>>>
>>>>
>>>> Since we're pretty sure it's a bitflip now, would you please provide the
>>>> following info?
>>>>
>>>> - History of the fs
>>>>      Since you're using an Arch kernel, and since 5.14 we have all the
>>>>      write-time checkers, normally we should detect such an out-of-order
>>>>      key situation by flipping the fs RO.
>>>>      I'm wondering if the fs was handled by some older kernels, thus the
>>>>      tree-checker didn't catch it early.
>>>>
>>>> - The hardware spec
>>>>      The dmesg only contains hardware spec "LENOVO 30B4S01W00", which seems
>>>>      to be a workstation.
>>>>      I'm wondering if certain CPU models lead to possibly unreliable
>>>>      memory.
>>>>      In my experience, the memory chip itself is rarely the cause; it's
>>>>      usually either the connection (from BGA to DIMM slot) or the memory
>>>>      controller (nowadays on the CPU die).
>>>>
>>>> Thanks,
>>>> Qu
>>>


Thread overview: 9+ messages
2024-07-25 21:06 Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) Emil.s
2024-07-25 22:47 ` David Sterba
2024-07-25 23:19   ` Qu Wenruo
2024-07-26 10:52     ` Emil.s
2024-07-27  0:42       ` Qu Wenruo
2024-07-28 16:09       ` Yuwei Han
2024-08-05  8:16         ` Emil.s
2024-08-05  8:59           ` Qu Wenruo [this message]
2024-08-10 18:39             ` Emil.s
