* Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs)
@ 2024-07-25 21:06 Emil.s
  2024-07-25 22:47 ` David Sterba
  0 siblings, 1 reply; 9+ messages in thread
From: Emil.s @ 2024-07-25 21:06 UTC (permalink / raw)
To: linux-btrfs

Hello!

I got a corrupt filesystem due to backpointer mismatches:
---
[2/7] checking extents
data extent[780333588480, 942080] size mismatch, extent item size
925696 file item size 942080
backpointer mismatch on [780333588480 925696]
---

However only two extents seem to be affected, in a subvolume only used
for backups.

Since I've not been able to repair it, I thought that I could just
delete the subvolume and recreate it.
But now the btrfs_run_delayed_refs function crashes a while after
mounting the filesystem. (Which is quite obvious when I think about
it, since I guess it's trying to reclaim space, hitting the bad extent
in the process?)

Anyhow, is it possible to force removal of these extents in any way?
My understanding is that extents are mapped to a specific subvolume as
well?

Here is the full crash dump:
https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff

Best regards
Emil Sandnabba

^ permalink raw reply [flat|nested] 9+ messages in thread
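For reference, the check output quoted above is the kind produced by a read-only btrfs check, and the logical address in the report can usually be resolved back to the owning file(s) once the filesystem is mounted. A minimal sketch, assuming /dev/sdX and /mnt are placeholders and that a read-only mount still works:

```
# Read-only metadata check; it does not modify the filesystem.
btrfs check --readonly /dev/sdX

# If a read-only mount is still possible, resolve the logical address
# from the report to the file(s) referencing it.
mount -o ro /dev/sdX /mnt
btrfs inspect-internal logical-resolve 780333588480 /mnt
```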
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) 2024-07-25 21:06 Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) Emil.s @ 2024-07-25 22:47 ` David Sterba 2024-07-25 23:19 ` Qu Wenruo 0 siblings, 1 reply; 9+ messages in thread From: David Sterba @ 2024-07-25 22:47 UTC (permalink / raw) To: Emil.s; +Cc: linux-btrfs On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote: > Hello! > > I got a corrupt filesystem due to backpointer mismatches: > --- > [2/7] checking extents > data extent[780333588480, 942080] size mismatch, extent item size > 925696 file item size 942080 This looks like a single bit flip: >>> bin(925696) '0b11100010000000000000' >>> bin(942080) '0b11100110000000000000' >>> bin(942080 ^ 925696) 0b100000000000000' or an off by one error, as the delta is 0x4000, 4x page which is one node size. > backpointer mismatch on [780333588480 925696] > --- > > However only two extents seem to be affected, in a subvolume only used > for backups. > > Since I've not been able to repair it, I thought that I could just > delete the subvolume and recreate it. > But now the btrfs_run_delayed_refs function crashes a while after > mounting the filesystem. (Which is quite obvious when I think about > it, since I guess it's trying to reclaim space, hitting the bad extent > in the process?) > > Anyhow, is it possible to force removal of these extents in any way? > My understanding is that extents are mapped to a specific subvolume as > well? > > Here is the full crash dump: > https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs] 858 } else if (WARN_ON(ret)) { 859 btrfs_print_leaf(path->nodes[0]); 860 btrfs_err(fs_info, 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", 862 bytenr, num_bytes, parent, root_objectid, owner, 863 offset); 864 ret = -EUCLEAN; 865 goto out; 866 } 867 CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P OE 6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4 Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022 RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs] RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680 RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001 R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000 R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8 FS: 0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] ? __warn.cold+0x8e/0xe8 ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] ? report_bug+0xff/0x140 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] ? 
 set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 ? start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
 transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]

followed by leaf dump with items relevant to the numbers:

 item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37
   extent refs 1 gen 2245328 flags 1
   ref#0: shared data backref parent 4455386873856 count 1
 item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37
   extent refs 1 gen 2245328 flags 1
   ref#0: shared data backref parent 4455386873856 count 1
 item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53
                           ^^^^^^^^^^^^^^^^^^^^^^^
   extent refs 1 gen 2245328 flags 1
   ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1
 item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53
   extent refs 1 gen 2245328 flags 1
   ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1
 item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53
   extent refs 1 gen 2245328 flags 1
   ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1

As you can see, item 119 is the problematic one and also out of sequence; the
adjacent items have the key offset 942080, which confirms the bitflip
case.

As for any bitflip-induced errors, it's hard to tell how far it got
propagated; this could be the only instance, or there could be other
items referring to that one too.

We don't have any ready-made tool for fixing that. The bitflips hit
random data structure groups or data, each case is basically unique and
would require analysis of a tree dump to look for clues about how bad it is.

^ permalink raw reply [flat|nested] 9+ messages in thread
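The analysis above can be reproduced with btrfs-progs and plain shell arithmetic. A small sketch, assuming the array is unmounted (or mounted read-only), that /dev/sdX is a placeholder for one of its member devices, and that the grep context widths are arbitrary:

```
# The two sizes from the check report differ in exactly one bit, 0x4000:
printf '0x%x\n' $((942080 ^ 925696))   # prints 0x4000

# Dump only the extent tree and show some context around the suspect
# bytenr; this is the kind of tree dump analysis mentioned above.
btrfs inspect-internal dump-tree -t extent /dev/sdX | \
    grep -B 4 -A 8 '780333588480'
```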
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) 2024-07-25 22:47 ` David Sterba @ 2024-07-25 23:19 ` Qu Wenruo 2024-07-26 10:52 ` Emil.s 0 siblings, 1 reply; 9+ messages in thread From: Qu Wenruo @ 2024-07-25 23:19 UTC (permalink / raw) To: dsterba, Emil.s; +Cc: linux-btrfs 在 2024/7/26 08:17, David Sterba 写道: > On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote: >> Hello! >> >> I got a corrupt filesystem due to backpointer mismatches: >> --- >> [2/7] checking extents >> data extent[780333588480, 942080] size mismatch, extent item size >> 925696 file item size 942080 > > This looks like a single bit flip: > >>>> bin(925696) > '0b11100010000000000000' >>>> bin(942080) > '0b11100110000000000000' >>>> bin(942080 ^ 925696) > 0b100000000000000' > > or an off by one error, as the delta is 0x4000, 4x page which is one > node size. > >> backpointer mismatch on [780333588480 925696] >> --- >> >> However only two extents seem to be affected, in a subvolume only used >> for backups. >> >> Since I've not been able to repair it, I thought that I could just >> delete the subvolume and recreate it. >> But now the btrfs_run_delayed_refs function crashes a while after >> mounting the filesystem. (Which is quite obvious when I think about >> it, since I guess it's trying to reclaim space, hitting the bad extent >> in the process?) >> >> Anyhow, is it possible to force removal of these extents in any way? >> My understanding is that extents are mapped to a specific subvolume as >> well? >> >> Here is the full crash dump: >> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff > > WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs] > > 858 } else if (WARN_ON(ret)) { > 859 btrfs_print_leaf(path->nodes[0]); > 860 btrfs_err(fs_info, > 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", > 862 bytenr, num_bytes, parent, root_objectid, owner, > 863 offset); > 864 ret = -EUCLEAN; > 865 goto out; > 866 } > 867 > > CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P OE 6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4 > Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022 > RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs] > RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202 > RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000 > RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680 > RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001 > R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000 > R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8 > FS: 0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > ? __warn.cold+0x8e/0xe8 > ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > ? report_bug+0xff/0x140 > ? handle_bug+0x3c/0x80 > ? exc_invalid_op+0x17/0x70 > ? asm_exc_invalid_op+0x1a/0x20 > ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > ? 
set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > ? start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > followed by leaf dump with items relevant to the numbers: > > item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37 > extent refs 1 gen 2245328 flags 1 > ref#0: shared data backref parent 4455386873856 count 1 > item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37 > extent refs 1 gen 2245328 flags 1 > ref#0: shared data backref parent 4455386873856 count 1 > item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53 > ^^^^^^^^^^^^^^^^^^^^^^^ > > extent refs 1 gen 2245328 flags 1 > ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1 > item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53 > extent refs 1 gen 2245328 flags 1 > ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1 > item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53 > extent refs 1 gen 2245328 flags 1 > ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1 > > as you can see item 119 is the problematic one and also out of sequence, the > adjacent items have the key offset 942080. Which confirms the bitlip > case. > > As for any bitflip induced errors, it's hard to tell how far it got > propagated, this could be the only instance or there could be other > items referring to that one too. > > We don't have any ready made tool for fixing that, the bitlips hit > random data structure groups or data, each is basically unique and would > require analysis of tree dump and look for clues how bad it is. > Since we're pretty sure it's a bitflip now, would you please provide the following info? - History of the fs Since you're using Arch kernel, and since 5.14 we have all the write- time checkers, normally we should detect such out-of-key situation by flipping the fs RO. I'm wondering if the fs is handled by some older kernels thus tree- checker didn't catch it early. - The hardware spec The dmesg only contains hardware spec "LENOVO 30B4S01W00", which seems to be a workstation. I'm wondering if it's certain CPU models which leads to possible unreliable memories. From my experience, the memory chip itself is pretty rare to be the cause, but either the connection (from BGA to DIMM slot) or the memory controller (nowadays in the CPU die). Thanks, Qu ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) 2024-07-25 23:19 ` Qu Wenruo @ 2024-07-26 10:52 ` Emil.s 2024-07-27 0:42 ` Qu Wenruo 2024-07-28 16:09 ` Yuwei Han 0 siblings, 2 replies; 9+ messages in thread From: Emil.s @ 2024-07-26 10:52 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, linux-btrfs > As for any bitflip induced errors, it's hard to tell how far it got > propagated, this could be the only instance or there could be other > items referring to that one too. Right, yeah that sounds a bit more challenging then I initially thought. Maybe it is easier to just rebuild the array after all. And in regards to Qu's question, that is probably a good idea anyhow. > - History of the fs > - The hardware spec This has been my personal NAS / home server for quite some time. It's basically a mix of just leftover desktop hardware (without ECC memory). It was a 12 year old Gigabyte H77-D3H motherboard, an Intel i7-2600 CPU and 4 DDR3 DIMMs, all of different types and brands. The disks are WD red series, and I see now that one of them has over 80k power on hours. I know I did a rebuild about 5 years ago so the FS was probably created using Ubuntu server 18.04 (Linux 4.15), which has been upgraded to the major LTS versions since then. I actually hit this error when I was doing the "final backup" before retiring this setup, and it seems it was about time! (Was running Ubuntu 22.04 / Linux 5.15) The Arch setup on the Thinkstation is my workstation where I attempted the data recovery. So due to the legacy hardware and crappy setup I think it's worth wasting more time here. But thanks a lot for the detailed answer, much appreciated! Best, Emil On Fri, 26 Jul 2024 at 01:19, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > 在 2024/7/26 08:17, David Sterba 写道: > > On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote: > >> Hello! > >> > >> I got a corrupt filesystem due to backpointer mismatches: > >> --- > >> [2/7] checking extents > >> data extent[780333588480, 942080] size mismatch, extent item size > >> 925696 file item size 942080 > > > > This looks like a single bit flip: > > > >>>> bin(925696) > > '0b11100010000000000000' > >>>> bin(942080) > > '0b11100110000000000000' > >>>> bin(942080 ^ 925696) > > 0b100000000000000' > > > > or an off by one error, as the delta is 0x4000, 4x page which is one > > node size. > > > >> backpointer mismatch on [780333588480 925696] > >> --- > >> > >> However only two extents seem to be affected, in a subvolume only used > >> for backups. > >> > >> Since I've not been able to repair it, I thought that I could just > >> delete the subvolume and recreate it. > >> But now the btrfs_run_delayed_refs function crashes a while after > >> mounting the filesystem. (Which is quite obvious when I think about > >> it, since I guess it's trying to reclaim space, hitting the bad extent > >> in the process?) > >> > >> Anyhow, is it possible to force removal of these extents in any way? > >> My understanding is that extents are mapped to a specific subvolume as > >> well? 
> >> > >> Here is the full crash dump: > >> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff > > > > WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs] > > > > 858 } else if (WARN_ON(ret)) { > > 859 btrfs_print_leaf(path->nodes[0]); > > 860 btrfs_err(fs_info, > > 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", > > 862 bytenr, num_bytes, parent, root_objectid, owner, > > 863 offset); > > 864 ret = -EUCLEAN; > > 865 goto out; > > 866 } > > 867 > > > > CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P OE 6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4 > > Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022 > > RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs] > > RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202 > > RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000 > > RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680 > > RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001 > > R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000 > > R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8 > > FS: 0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > <TASK> > > ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > ? __warn.cold+0x8e/0xe8 > > ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > ? report_bug+0xff/0x140 > > ? handle_bug+0x3c/0x80 > > ? exc_invalid_op+0x17/0x70 > > ? asm_exc_invalid_op+0x1a/0x20 > > ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > ? set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > ? 
start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > > > > followed by leaf dump with items relevant to the numbers: > > > > item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37 > > extent refs 1 gen 2245328 flags 1 > > ref#0: shared data backref parent 4455386873856 count 1 > > item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37 > > extent refs 1 gen 2245328 flags 1 > > ref#0: shared data backref parent 4455386873856 count 1 > > item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53 > > ^^^^^^^^^^^^^^^^^^^^^^^ > > > > extent refs 1 gen 2245328 flags 1 > > ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1 > > item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53 > > extent refs 1 gen 2245328 flags 1 > > ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1 > > item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53 > > extent refs 1 gen 2245328 flags 1 > > ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1 > > > > as you can see item 119 is the problematic one and also out of sequence, the > > adjacent items have the key offset 942080. Which confirms the bitlip > > case. > > > > As for any bitflip induced errors, it's hard to tell how far it got > > propagated, this could be the only instance or there could be other > > items referring to that one too. > > > > We don't have any ready made tool for fixing that, the bitlips hit > > random data structure groups or data, each is basically unique and would > > require analysis of tree dump and look for clues how bad it is. > > > > Since we're pretty sure it's a bitflip now, would you please provide the > following info? > > - History of the fs > Since you're using Arch kernel, and since 5.14 we have all the write- > time checkers, normally we should detect such out-of-key situation by > flipping the fs RO. > I'm wondering if the fs is handled by some older kernels thus tree- > checker didn't catch it early. > > - The hardware spec > The dmesg only contains hardware spec "LENOVO 30B4S01W00", which seems > to be a workstation. > I'm wondering if it's certain CPU models which leads to possible > unreliable memories. > From my experience, the memory chip itself is pretty rare to be the > cause, but either the connection (from BGA to DIMM slot) or the memory > controller (nowadays in the CPU die). > > Thanks, > Qu ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs)
  2024-07-26 10:52 ` Emil.s
@ 2024-07-27 0:42 ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-07-27 0:42 UTC (permalink / raw)
To: Emil.s; +Cc: dsterba, linux-btrfs

On 2024/7/26 20:22, Emil.s wrote:
>> As for any bitflip induced errors, it's hard to tell how far it got
>> propagated, this could be the only instance or there could be other
>> items referring to that one too.
>
> Right, yeah that sounds a bit more challenging then I initially thought.
> Maybe it is easier to just rebuild the array after all.
>
> And in regards to Qu's question, that is probably a good idea anyhow.
>
>> - History of the fs
>> - The hardware spec
>
> This has been my personal NAS / home server for quite some time.
> It's basically a mix of just leftover desktop hardware (without ECC memory).
>
> It was a 12 year old Gigabyte H77-D3H motherboard, an Intel i7-2600 CPU
> and 4 DDR3 DIMMs, all of different types and brands.
> The disks are WD red series, and I see now that one of them has over
> 80k power on hours.

I wasn't expecting this, as the usual "memory chip seldom dies" rule of
thumb mostly applies to a much smaller time span, like around 5 years.

> I know I did a rebuild about 5 years ago so the FS was probably
> created using Ubuntu server 18.04 (Linux 4.15), which has been
> upgraded to the major LTS versions since then.

And I believe this is where the corruption happened, before any
tree-checker was even introduced. Thus we didn't catch it early enough
and wrote the corrupted data onto the disk.

> I actually hit this error when I was doing the "final backup" before
> retiring this setup, and it seems it was about time! (Was running
> Ubuntu 22.04 / Linux 5.15)

Thankfully that specific corruption is only in the extent tree, so you
should still be able to do the backup with it mounted RO, or with
"rescue=all" to be extra safe.

Thanks,
Qu

> The Arch setup on the Thinkstation is my workstation where I attempted
> the data recovery.
>
> So due to the legacy hardware and crappy setup I think it's worth
> wasting more time here.
>
> But thanks a lot for the detailed answer, much appreciated!
>
> Best,
> Emil
>
> On Fri, 26 Jul 2024 at 01:19, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>

^ permalink raw reply [flat|nested] 9+ messages in thread
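As a concrete illustration of the read-only / rescue mount suggested above, a minimal sketch; the device path, mount points and rsync flags are placeholders rather than anything taken from the thread, and the rescue options require a read-only mount:

```
# Mount read-only with all rescue options so damaged metadata is
# tolerated as far as possible while the data is copied off.
mount -o ro,rescue=all /dev/sdX /mnt/recover

# Then copy everything to the replacement filesystem, e.g. with rsync.
rsync -aHAX /mnt/recover/ /mnt/new-array/
```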
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) 2024-07-26 10:52 ` Emil.s 2024-07-27 0:42 ` Qu Wenruo @ 2024-07-28 16:09 ` Yuwei Han 2024-08-05 8:16 ` Emil.s 1 sibling, 1 reply; 9+ messages in thread From: Yuwei Han @ 2024-07-28 16:09 UTC (permalink / raw) To: Emil.s, Qu Wenruo; +Cc: dsterba, linux-btrfs 在 2024/7/26 18:52, Emil.s 写道: >> As for any bitflip induced errors, it's hard to tell how far it got >> propagated, this could be the only instance or there could be other >> items referring to that one too. > > Right, yeah that sounds a bit more challenging then I initially thought. > Maybe it is easier to just rebuild the array after all. > > And in regards to Qu's question, that is probably a good idea anyhow. > >> - History of the fs >> - The hardware spec > > This has been my personal NAS / home server for quite some time. > It's basically a mix of just leftover desktop hardware (without ECC memory). > > It was a 12 year old Gigabyte H77-D3H motherboard, an Intel i7-2600 CPU > and 4 DDR3 DIMMs, all of different types and brands. > The disks are WD red series, and I see now that one of them has over > 80k power on hours. > > I know I did a rebuild about 5 years ago so the FS was probably > created using Ubuntu server 18.04 (Linux 4.15), which has been > upgraded to the major LTS versions since then. > I actually hit this error when I was doing the "final backup" before > retiring this setup, and it seems it was about time! (Was running > Ubuntu 22.04 / Linux 5.15) > For curiosity, did you setup any scrub after rebuild FS? > The Arch setup on the Thinkstation is my workstation where I attempted > the data recovery. > > So due to the legacy hardware and crappy setup I think it's worth > wasting more time here. > > But thanks a lot for the detailed answer, much appreciated! > > Best, > Emil > > On Fri, 26 Jul 2024 at 01:19, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> >> >> >> 在 2024/7/26 08:17, David Sterba 写道: >>> On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote: >>>> Hello! >>>> >>>> I got a corrupt filesystem due to backpointer mismatches: >>>> --- >>>> [2/7] checking extents >>>> data extent[780333588480, 942080] size mismatch, extent item size >>>> 925696 file item size 942080 >>> >>> This looks like a single bit flip: >>> >>>>>> bin(925696) >>> '0b11100010000000000000' >>>>>> bin(942080) >>> '0b11100110000000000000' >>>>>> bin(942080 ^ 925696) >>> 0b100000000000000' >>> >>> or an off by one error, as the delta is 0x4000, 4x page which is one >>> node size. >>> >>>> backpointer mismatch on [780333588480 925696] >>>> --- >>>> >>>> However only two extents seem to be affected, in a subvolume only used >>>> for backups. >>>> >>>> Since I've not been able to repair it, I thought that I could just >>>> delete the subvolume and recreate it. >>>> But now the btrfs_run_delayed_refs function crashes a while after >>>> mounting the filesystem. (Which is quite obvious when I think about >>>> it, since I guess it's trying to reclaim space, hitting the bad extent >>>> in the process?) >>>> >>>> Anyhow, is it possible to force removal of these extents in any way? >>>> My understanding is that extents are mapped to a specific subvolume as >>>> well? 
>>>> >>>> Here is the full crash dump: >>>> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff >>> >>> WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs] >>> >>> 858 } else if (WARN_ON(ret)) { >>> 859 btrfs_print_leaf(path->nodes[0]); >>> 860 btrfs_err(fs_info, >>> 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", >>> 862 bytenr, num_bytes, parent, root_objectid, owner, >>> 863 offset); >>> 864 ret = -EUCLEAN; >>> 865 goto out; >>> 866 } >>> 867 >>> >>> CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P OE 6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4 >>> Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022 >>> RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs] >>> RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202 >>> RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000 >>> RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680 >>> RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001 >>> R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000 >>> R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8 >>> FS: 0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0 >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Call Trace: >>> <TASK> >>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> ? __warn.cold+0x8e/0xe8 >>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> ? report_bug+0xff/0x140 >>> ? handle_bug+0x3c/0x80 >>> ? exc_invalid_op+0x17/0x70 >>> ? asm_exc_invalid_op+0x1a/0x20 >>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> ? set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> ? 
start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>> >>> followed by leaf dump with items relevant to the numbers: >>> >>> item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37 >>> extent refs 1 gen 2245328 flags 1 >>> ref#0: shared data backref parent 4455386873856 count 1 >>> item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37 >>> extent refs 1 gen 2245328 flags 1 >>> ref#0: shared data backref parent 4455386873856 count 1 >>> item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53 >>> ^^^^^^^^^^^^^^^^^^^^^^^ >>> >>> extent refs 1 gen 2245328 flags 1 >>> ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1 >>> item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53 >>> extent refs 1 gen 2245328 flags 1 >>> ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1 >>> item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53 >>> extent refs 1 gen 2245328 flags 1 >>> ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1 >>> >>> as you can see item 119 is the problematic one and also out of sequence, the >>> adjacent items have the key offset 942080. Which confirms the bitlip >>> case. >>> >>> As for any bitflip induced errors, it's hard to tell how far it got >>> propagated, this could be the only instance or there could be other >>> items referring to that one too. >>> >>> We don't have any ready made tool for fixing that, the bitlips hit >>> random data structure groups or data, each is basically unique and would >>> require analysis of tree dump and look for clues how bad it is. >>> >> >> Since we're pretty sure it's a bitflip now, would you please provide the >> following info? >> >> - History of the fs >> Since you're using Arch kernel, and since 5.14 we have all the write- >> time checkers, normally we should detect such out-of-key situation by >> flipping the fs RO. >> I'm wondering if the fs is handled by some older kernels thus tree- >> checker didn't catch it early. >> >> - The hardware spec >> The dmesg only contains hardware spec "LENOVO 30B4S01W00", which seems >> to be a workstation. >> I'm wondering if it's certain CPU models which leads to possible >> unreliable memories. >> From my experience, the memory chip itself is pretty rare to be the >> cause, but either the connection (from BGA to DIMM slot) or the memory >> controller (nowadays in the CPU die). >> >> Thanks, >> Qu > ^ permalink raw reply [flat|nested] 9+ messages in thread
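For readers unfamiliar with it, the scrub asked about above is typically run and checked along these lines; a sketch, with the mount point as a placeholder and periodic scheduling left to a cron job or systemd timer:

```
# Run a scrub in the foreground (-B) and print per-device stats (-d).
btrfs scrub start -Bd /mnt/data

# Or start it in the background and inspect progress and error counters later.
btrfs scrub start /mnt/data
btrfs scrub status /mnt/data
```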
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) 2024-07-28 16:09 ` Yuwei Han @ 2024-08-05 8:16 ` Emil.s 2024-08-05 8:59 ` Qu Wenruo 0 siblings, 1 reply; 9+ messages in thread From: Emil.s @ 2024-08-05 8:16 UTC (permalink / raw) To: Yuwei Han; +Cc: Qu Wenruo, dsterba, linux-btrfs > For curiosity, did you setup any scrub after rebuild FS? The new FS is built on new drives, on new hardware and new Linux kernel. And I'm rsyncing over all files so that everything will be built from scratch. However, I still got a snapshot (from 2024-03-20) on my offsite backup. I just scrubbed that drive, and it reports no errors. I'm also quite sure I scrubbed the corrupt drive quite recently without issues. Another interesting note is that I'm also unable to send any file system from the corrupt drive (which is probably a good thing). ``` $ btrfs send /mnt/snapshots/user_data_2024-03-20 > /dev/null At subvol /mnt/snapshots/user_data_2024-03-20 ERROR: send ioctl failed with -30: Read-only file system ``` Wasn't expecting a "Read-only file system" error when sending a read-only snapshot? (But maybe that is expected?). On Sun, 28 Jul 2024 at 18:09, Yuwei Han <hrx@bupt.moe> wrote: > > > > 在 2024/7/26 18:52, Emil.s 写道: > >> As for any bitflip induced errors, it's hard to tell how far it got > >> propagated, this could be the only instance or there could be other > >> items referring to that one too. > > > > Right, yeah that sounds a bit more challenging then I initially thought. > > Maybe it is easier to just rebuild the array after all. > > > > And in regards to Qu's question, that is probably a good idea anyhow. > > > >> - History of the fs > >> - The hardware spec > > > > This has been my personal NAS / home server for quite some time. > > It's basically a mix of just leftover desktop hardware (without ECC memory). > > > > It was a 12 year old Gigabyte H77-D3H motherboard, an Intel i7-2600 CPU > > and 4 DDR3 DIMMs, all of different types and brands. > > The disks are WD red series, and I see now that one of them has over > > 80k power on hours. > > > > I know I did a rebuild about 5 years ago so the FS was probably > > created using Ubuntu server 18.04 (Linux 4.15), which has been > > upgraded to the major LTS versions since then. > > I actually hit this error when I was doing the "final backup" before > > retiring this setup, and it seems it was about time! (Was running > > Ubuntu 22.04 / Linux 5.15) > > > For curiosity, did you setup any scrub after rebuild FS? > > The Arch setup on the Thinkstation is my workstation where I attempted > > the data recovery. > > > > So due to the legacy hardware and crappy setup I think it's worth > > wasting more time here. > > > > But thanks a lot for the detailed answer, much appreciated! > > > > Best, > > Emil > > > > On Fri, 26 Jul 2024 at 01:19, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >> > >> > >> > >> 在 2024/7/26 08:17, David Sterba 写道: > >>> On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote: > >>>> Hello! > >>>> > >>>> I got a corrupt filesystem due to backpointer mismatches: > >>>> --- > >>>> [2/7] checking extents > >>>> data extent[780333588480, 942080] size mismatch, extent item size > >>>> 925696 file item size 942080 > >>> > >>> This looks like a single bit flip: > >>> > >>>>>> bin(925696) > >>> '0b11100010000000000000' > >>>>>> bin(942080) > >>> '0b11100110000000000000' > >>>>>> bin(942080 ^ 925696) > >>> 0b100000000000000' > >>> > >>> or an off by one error, as the delta is 0x4000, 4x page which is one > >>> node size. 
> >>> > >>>> backpointer mismatch on [780333588480 925696] > >>>> --- > >>>> > >>>> However only two extents seem to be affected, in a subvolume only used > >>>> for backups. > >>>> > >>>> Since I've not been able to repair it, I thought that I could just > >>>> delete the subvolume and recreate it. > >>>> But now the btrfs_run_delayed_refs function crashes a while after > >>>> mounting the filesystem. (Which is quite obvious when I think about > >>>> it, since I guess it's trying to reclaim space, hitting the bad extent > >>>> in the process?) > >>>> > >>>> Anyhow, is it possible to force removal of these extents in any way? > >>>> My understanding is that extents are mapped to a specific subvolume as > >>>> well? > >>>> > >>>> Here is the full crash dump: > >>>> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff > >>> > >>> WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs] > >>> > >>> 858 } else if (WARN_ON(ret)) { > >>> 859 btrfs_print_leaf(path->nodes[0]); > >>> 860 btrfs_err(fs_info, > >>> 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", > >>> 862 bytenr, num_bytes, parent, root_objectid, owner, > >>> 863 offset); > >>> 864 ret = -EUCLEAN; > >>> 865 goto out; > >>> 866 } > >>> 867 > >>> > >>> CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P OE 6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4 > >>> Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022 > >>> RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs] > >>> RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202 > >>> RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000 > >>> RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680 > >>> RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001 > >>> R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000 > >>> R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8 > >>> FS: 0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000 > >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0 > >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>> Call Trace: > >>> <TASK> > >>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> ? __warn.cold+0x8e/0xe8 > >>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> ? report_bug+0xff/0x140 > >>> ? handle_bug+0x3c/0x80 > >>> ? exc_invalid_op+0x17/0x70 > >>> ? asm_exc_invalid_op+0x1a/0x20 > >>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> ? set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> ? 
start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] > >>> > >>> followed by leaf dump with items relevant to the numbers: > >>> > >>> item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37 > >>> extent refs 1 gen 2245328 flags 1 > >>> ref#0: shared data backref parent 4455386873856 count 1 > >>> item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37 > >>> extent refs 1 gen 2245328 flags 1 > >>> ref#0: shared data backref parent 4455386873856 count 1 > >>> item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53 > >>> ^^^^^^^^^^^^^^^^^^^^^^^ > >>> > >>> extent refs 1 gen 2245328 flags 1 > >>> ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1 > >>> item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53 > >>> extent refs 1 gen 2245328 flags 1 > >>> ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1 > >>> item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53 > >>> extent refs 1 gen 2245328 flags 1 > >>> ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1 > >>> > >>> as you can see item 119 is the problematic one and also out of sequence, the > >>> adjacent items have the key offset 942080. Which confirms the bitlip > >>> case. > >>> > >>> As for any bitflip induced errors, it's hard to tell how far it got > >>> propagated, this could be the only instance or there could be other > >>> items referring to that one too. > >>> > >>> We don't have any ready made tool for fixing that, the bitlips hit > >>> random data structure groups or data, each is basically unique and would > >>> require analysis of tree dump and look for clues how bad it is. > >>> > >> > >> Since we're pretty sure it's a bitflip now, would you please provide the > >> following info? > >> > >> - History of the fs > >> Since you're using Arch kernel, and since 5.14 we have all the write- > >> time checkers, normally we should detect such out-of-key situation by > >> flipping the fs RO. > >> I'm wondering if the fs is handled by some older kernels thus tree- > >> checker didn't catch it early. > >> > >> - The hardware spec > >> The dmesg only contains hardware spec "LENOVO 30B4S01W00", which seems > >> to be a workstation. > >> I'm wondering if it's certain CPU models which leads to possible > >> unreliable memories. > >> From my experience, the memory chip itself is pretty rare to be the > >> cause, but either the connection (from BGA to DIMM slot) or the memory > >> controller (nowadays in the CPU die). > >> > >> Thanks, > >> Qu > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) 2024-08-05 8:16 ` Emil.s @ 2024-08-05 8:59 ` Qu Wenruo 2024-08-10 18:39 ` Emil.s 0 siblings, 1 reply; 9+ messages in thread From: Qu Wenruo @ 2024-08-05 8:59 UTC (permalink / raw) To: Emil.s, Yuwei Han; +Cc: dsterba, linux-btrfs 在 2024/8/5 17:46, Emil.s 写道: >> For curiosity, did you setup any scrub after rebuild FS? > > The new FS is built on new drives, on new hardware and new Linux > kernel. And I'm rsyncing over all files so that everything will be > built from scratch. > > However, I still got a snapshot (from 2024-03-20) on my offsite backup. > I just scrubbed that drive, and it reports no errors. I'm also quite > sure I scrubbed the corrupt drive quite recently without issues. > > Another interesting note is that I'm also unable to send any file > system from the corrupt drive (which is probably a good thing). > ``` > $ btrfs send /mnt/snapshots/user_data_2024-03-20 > /dev/null > At subvol /mnt/snapshots/user_data_2024-03-20 > ERROR: send ioctl failed with -30: Read-only file system > ``` > > Wasn't expecting a "Read-only file system" error when sending a > read-only snapshot? (But maybe that is expected?). In that case, please provide the dmesg of that incident. (if any) It looks like something else is wrong. Thanks, Qu > > > On Sun, 28 Jul 2024 at 18:09, Yuwei Han <hrx@bupt.moe> wrote: >> >> >> >> 在 2024/7/26 18:52, Emil.s 写道: >>>> As for any bitflip induced errors, it's hard to tell how far it got >>>> propagated, this could be the only instance or there could be other >>>> items referring to that one too. >>> >>> Right, yeah that sounds a bit more challenging then I initially thought. >>> Maybe it is easier to just rebuild the array after all. >>> >>> And in regards to Qu's question, that is probably a good idea anyhow. >>> >>>> - History of the fs >>>> - The hardware spec >>> >>> This has been my personal NAS / home server for quite some time. >>> It's basically a mix of just leftover desktop hardware (without ECC memory). >>> >>> It was a 12 year old Gigabyte H77-D3H motherboard, an Intel i7-2600 CPU >>> and 4 DDR3 DIMMs, all of different types and brands. >>> The disks are WD red series, and I see now that one of them has over >>> 80k power on hours. >>> >>> I know I did a rebuild about 5 years ago so the FS was probably >>> created using Ubuntu server 18.04 (Linux 4.15), which has been >>> upgraded to the major LTS versions since then. >>> I actually hit this error when I was doing the "final backup" before >>> retiring this setup, and it seems it was about time! (Was running >>> Ubuntu 22.04 / Linux 5.15) >>> >> For curiosity, did you setup any scrub after rebuild FS? >>> The Arch setup on the Thinkstation is my workstation where I attempted >>> the data recovery. >>> >>> So due to the legacy hardware and crappy setup I think it's worth >>> wasting more time here. >>> >>> But thanks a lot for the detailed answer, much appreciated! >>> >>> Best, >>> Emil >>> >>> On Fri, 26 Jul 2024 at 01:19, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >>>> >>>> >>>> >>>> 在 2024/7/26 08:17, David Sterba 写道: >>>>> On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote: >>>>>> Hello! 
>>>>>> >>>>>> I got a corrupt filesystem due to backpointer mismatches: >>>>>> --- >>>>>> [2/7] checking extents >>>>>> data extent[780333588480, 942080] size mismatch, extent item size >>>>>> 925696 file item size 942080 >>>>> >>>>> This looks like a single bit flip: >>>>> >>>>>>>> bin(925696) >>>>> '0b11100010000000000000' >>>>>>>> bin(942080) >>>>> '0b11100110000000000000' >>>>>>>> bin(942080 ^ 925696) >>>>> 0b100000000000000' >>>>> >>>>> or an off by one error, as the delta is 0x4000, 4x page which is one >>>>> node size. >>>>> >>>>>> backpointer mismatch on [780333588480 925696] >>>>>> --- >>>>>> >>>>>> However only two extents seem to be affected, in a subvolume only used >>>>>> for backups. >>>>>> >>>>>> Since I've not been able to repair it, I thought that I could just >>>>>> delete the subvolume and recreate it. >>>>>> But now the btrfs_run_delayed_refs function crashes a while after >>>>>> mounting the filesystem. (Which is quite obvious when I think about >>>>>> it, since I guess it's trying to reclaim space, hitting the bad extent >>>>>> in the process?) >>>>>> >>>>>> Anyhow, is it possible to force removal of these extents in any way? >>>>>> My understanding is that extents are mapped to a specific subvolume as >>>>>> well? >>>>>> >>>>>> Here is the full crash dump: >>>>>> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff >>>>> >>>>> WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs] >>>>> >>>>> 858 } else if (WARN_ON(ret)) { >>>>> 859 btrfs_print_leaf(path->nodes[0]); >>>>> 860 btrfs_err(fs_info, >>>>> 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu", >>>>> 862 bytenr, num_bytes, parent, root_objectid, owner, >>>>> 863 offset); >>>>> 864 ret = -EUCLEAN; >>>>> 865 goto out; >>>>> 866 } >>>>> 867 >>>>> >>>>> CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P OE 6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4 >>>>> Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022 >>>>> RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs] >>>>> RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202 >>>>> RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000 >>>>> RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680 >>>>> RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001 >>>>> R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000 >>>>> R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8 >>>>> FS: 0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000 >>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0 >>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Call Trace: >>>>> <TASK> >>>>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> ? __warn.cold+0x8e/0xe8 >>>>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> ? report_bug+0xff/0x140 >>>>> ? handle_bug+0x3c/0x80 >>>>> ? exc_invalid_op+0x17/0x70 >>>>> ? asm_exc_invalid_op+0x1a/0x20 >>>>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> ? 
set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> ? start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e] >>>>> >>>>> followed by leaf dump with items relevant to the numbers: >>>>> >>>>> item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37 >>>>> extent refs 1 gen 2245328 flags 1 >>>>> ref#0: shared data backref parent 4455386873856 count 1 >>>>> item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37 >>>>> extent refs 1 gen 2245328 flags 1 >>>>> ref#0: shared data backref parent 4455386873856 count 1 >>>>> item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53 >>>>> ^^^^^^^^^^^^^^^^^^^^^^^ >>>>> >>>>> extent refs 1 gen 2245328 flags 1 >>>>> ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1 >>>>> item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53 >>>>> extent refs 1 gen 2245328 flags 1 >>>>> ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1 >>>>> item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53 >>>>> extent refs 1 gen 2245328 flags 1 >>>>> ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1 >>>>> >>>>> as you can see item 119 is the problematic one and also out of sequence, the >>>>> adjacent items have the key offset 942080. Which confirms the bitlip >>>>> case. >>>>> >>>>> As for any bitflip induced errors, it's hard to tell how far it got >>>>> propagated, this could be the only instance or there could be other >>>>> items referring to that one too. >>>>> >>>>> We don't have any ready made tool for fixing that, the bitlips hit >>>>> random data structure groups or data, each is basically unique and would >>>>> require analysis of tree dump and look for clues how bad it is. >>>>> >>>> >>>> Since we're pretty sure it's a bitflip now, would you please provide the >>>> following info? >>>> >>>> - History of the fs >>>> Since you're using Arch kernel, and since 5.14 we have all the write- >>>> time checkers, normally we should detect such out-of-key situation by >>>> flipping the fs RO. >>>> I'm wondering if the fs is handled by some older kernels thus tree- >>>> checker didn't catch it early. >>>> >>>> - The hardware spec >>>> The dmesg only contains hardware spec "LENOVO 30B4S01W00", which seems >>>> to be a workstation. >>>> I'm wondering if it's certain CPU models which leads to possible >>>> unreliable memories. >>>> From my experience, the memory chip itself is pretty rare to be the >>>> cause, but either the connection (from BGA to DIMM slot) or the memory >>>> controller (nowadays in the CPU die). >>>> >>>> Thanks, >>>> Qu >>> ^ permalink raw reply [flat|nested] 9+ messages in thread
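To capture the kernel messages asked for above around a reproduction of the failing send, something like the following is usually sufficient; a sketch, reusing the snapshot path from the earlier message:

```
# Reproduce the failing send, then grab the most recent kernel log lines.
btrfs send /mnt/snapshots/user_data_2024-03-20 > /dev/null
dmesg | tail -n 50

# Alternatively, follow the kernel log live in a second terminal.
dmesg --follow
```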
* Re: Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) 2024-08-05 8:59 ` Qu Wenruo @ 2024-08-10 18:39 ` Emil.s 0 siblings, 0 replies; 9+ messages in thread From: Emil.s @ 2024-08-10 18:39 UTC (permalink / raw) To: Qu Wenruo; +Cc: Yuwei Han, dsterba, linux-btrfs Hi again, Did just spin up the old corrupt array to reproduce the error. Now I actually got one more line: "ERROR: failed to read stream from kernel: Bad file descriptor" I updated the system earlier this week, so I'm now on Linux 6.10.1 (last try was with 6.9.9). Full output: ``` $ btrfs send /mnt/snapshots/user_data_2024-03-20 > /dev/null At subvol /mnt/snapshots/user_data_2024-03-20 ERROR: send ioctl failed with -30: Read-only file system ERROR: failed to read stream from kernel: Bad file descriptor ``` But there are no additional log messages showing up in the kernel log / dmesg. Best regards Emil On Mon, 5 Aug 2024 at 10:59, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > 在 2024/8/5 17:46, Emil.s 写道: > >> For curiosity, did you setup any scrub after rebuild FS? > > > > The new FS is built on new drives, on new hardware and new Linux > > kernel. And I'm rsyncing over all files so that everything will be > > built from scratch. > > > > However, I still got a snapshot (from 2024-03-20) on my offsite backup. > > I just scrubbed that drive, and it reports no errors. I'm also quite > > sure I scrubbed the corrupt drive quite recently without issues. > > > > Another interesting note is that I'm also unable to send any file > > system from the corrupt drive (which is probably a good thing). > > ``` > > $ btrfs send /mnt/snapshots/user_data_2024-03-20 > /dev/null > > At subvol /mnt/snapshots/user_data_2024-03-20 > > ERROR: send ioctl failed with -30: Read-only file system > > ``` > > > > Wasn't expecting a "Read-only file system" error when sending a > > read-only snapshot? (But maybe that is expected?). > > In that case, please provide the dmesg of that incident. (if any) > > It looks like something else is wrong. > > Thanks, > Qu > > > > > > On Sun, 28 Jul 2024 at 18:09, Yuwei Han <hrx@bupt.moe> wrote: > >> > >> > >> > >> 在 2024/7/26 18:52, Emil.s 写道: > >>>> As for any bitflip induced errors, it's hard to tell how far it got > >>>> propagated, this could be the only instance or there could be other > >>>> items referring to that one too. > >>> > >>> Right, yeah that sounds a bit more challenging then I initially thought. > >>> Maybe it is easier to just rebuild the array after all. > >>> > >>> And in regards to Qu's question, that is probably a good idea anyhow. > >>> > >>>> - History of the fs > >>>> - The hardware spec > >>> > >>> This has been my personal NAS / home server for quite some time. > >>> It's basically a mix of just leftover desktop hardware (without ECC memory). > >>> > >>> It was a 12 year old Gigabyte H77-D3H motherboard, an Intel i7-2600 CPU > >>> and 4 DDR3 DIMMs, all of different types and brands. > >>> The disks are WD red series, and I see now that one of them has over > >>> 80k power on hours. > >>> > >>> I know I did a rebuild about 5 years ago so the FS was probably > >>> created using Ubuntu server 18.04 (Linux 4.15), which has been > >>> upgraded to the major LTS versions since then. > >>> I actually hit this error when I was doing the "final backup" before > >>> retiring this setup, and it seems it was about time! (Was running > >>> Ubuntu 22.04 / Linux 5.15) > >>> > >> For curiosity, did you setup any scrub after rebuild FS? 
> >>> The Arch setup on the Thinkstation is my workstation where I attempted > >>> the data recovery. > >>> > >>> So due to the legacy hardware and crappy setup I think it's worth > >>> wasting more time here. > >>> > >>> But thanks a lot for the detailed answer, much appreciated! > >>> > >>> Best, > >>> Emil > >>> > >>> On Fri, 26 Jul 2024 at 01:19, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > >>>> > >>>> > >>>> > >>>> 在 2024/7/26 08:17, David Sterba 写道: > >>>>> On Thu, Jul 25, 2024 at 11:06:00PM +0200, Emil.s wrote: > >>>>>> Hello! > >>>>>> > >>>>>> I got a corrupt filesystem due to backpointer mismatches: > >>>>>> --- > >>>>>> [2/7] checking extents > >>>>>> data extent[780333588480, 942080] size mismatch, extent item size > >>>>>> 925696 file item size 942080 > >>>>> > >>>>> This looks like a single bit flip: > >>>>> > >>>>>>>> bin(925696) > >>>>> '0b11100010000000000000' > >>>>>>>> bin(942080) > >>>>> '0b11100110000000000000' > >>>>>>>> bin(942080 ^ 925696) > >>>>> 0b100000000000000' > >>>>> > >>>>> or an off by one error, as the delta is 0x4000, 4x page which is one > >>>>> node size. > >>>>> > >>>>>> backpointer mismatch on [780333588480 925696] > >>>>>> --- > >>>>>> > >>>>>> However only two extents seem to be affected, in a subvolume only used > >>>>>> for backups. > >>>>>> > >>>>>> Since I've not been able to repair it, I thought that I could just > >>>>>> delete the subvolume and recreate it. > >>>>>> But now the btrfs_run_delayed_refs function crashes a while after > >>>>>> mounting the filesystem. (Which is quite obvious when I think about > >>>>>> it, since I guess it's trying to reclaim space, hitting the bad extent > >>>>>> in the process?) > >>>>>> > >>>>>> Anyhow, is it possible to force removal of these extents in any way? > >>>>>> My understanding is that extents are mapped to a specific subvolume as > >>>>>> well? 
> >>>>>>
> >>>>>> Here is the full crash dump:
> >>>>>> https://gist.github.com/sandnabba/e3ed7f57e4d32f404355fdf988fcfbff
> >>>>>
> >>>>> WARNING: CPU: 3 PID: 199588 at fs/btrfs/extent-tree.c:858 lookup_inline_extent_backref+0x5c3/0x760 [btrfs]
> >>>>>
> >>>>> 858         } else if (WARN_ON(ret)) {
> >>>>> 859                 btrfs_print_leaf(path->nodes[0]);
> >>>>> 860                 btrfs_err(fs_info,
> >>>>> 861 "extent item not found for insert, bytenr %llu num_bytes %llu parent %llu root_objectid %llu owner %llu offset %llu",
> >>>>> 862                         bytenr, num_bytes, parent, root_objectid, owner,
> >>>>> 863                         offset);
> >>>>> 864                 ret = -EUCLEAN;
> >>>>> 865                 goto out;
> >>>>> 866         }
> >>>>> 867
> >>>>>
> >>>>> CPU: 3 PID: 199588 Comm: btrfs-transacti Tainted: P OE 6.9.9-arch1-1 #1 a564e80ab10c5cd5584d6e9a0715907a10e33ca4
> >>>>> Hardware name: LENOVO 30B4S01W00/102F, BIOS S00KT73A 05/24/2022
> >>>>> RIP: 0010:lookup_inline_extent_backref+0x5c3/0x760 [btrfs]
> >>>>> RSP: 0018:ffffabb2cd4e3b00 EFLAGS: 00010202
> >>>>> RAX: 0000000000000001 RBX: ffff992307d5c1c0 RCX: 0000000000000000
> >>>>> RDX: 0000000000000001 RSI: ffff992312c0d590 RDI: ffff99222faff680
> >>>>> RBP: 0000000000000000 R08: 00000000000000bc R09: 0000000000000001
> >>>>> R10: a8000000b5a8c360 R11: 0000000000000000 R12: 000000b5af81a000
> >>>>> R13: ffffabb2cd4e3b57 R14: 00000000000e6000 R15: ffff9927ca7551f8
> >>>>> FS: 0000000000000000(0000) GS:ffff992997980000(0000) knlGS:0000000000000000
> >>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> CR2: 00000ad404625100 CR3: 000000080ea20002 CR4: 00000000003706f0
> >>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Call Trace:
> >>>>> <TASK>
> >>>>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> ? __warn.cold+0x8e/0xe8
> >>>>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> ? report_bug+0xff/0x140
> >>>>> ? handle_bug+0x3c/0x80
> >>>>> ? exc_invalid_op+0x17/0x70
> >>>>> ? asm_exc_invalid_op+0x1a/0x20
> >>>>> ? lookup_inline_extent_backref+0x5c3/0x760 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> ? set_extent_buffer_dirty+0x19/0x170 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> insert_inline_extent_backref+0x82/0x160 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> __btrfs_inc_extent_ref+0x9c/0x220 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> ? __btrfs_run_delayed_refs+0xf64/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> __btrfs_run_delayed_refs+0xaf2/0xfb0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> btrfs_run_delayed_refs+0x3b/0xd0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> btrfs_commit_transaction+0x6c/0xc80 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> ? start_transaction+0x22c/0x830 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>> transaction_kthread+0x159/0x1c0 [btrfs dcbea9ede49f9413c43a944f40925c800621e78e]
> >>>>>
> >>>>> followed by a leaf dump with the items relevant to the numbers:
> >>>>>
> >>>>> item 117 key (780331704320 168 942080) itemoff 11917 itemsize 37
> >>>>>         extent refs 1 gen 2245328 flags 1
> >>>>>         ref#0: shared data backref parent 4455386873856 count 1
> >>>>> item 118 key (780332646400 168 942080) itemoff 11880 itemsize 37
> >>>>>         extent refs 1 gen 2245328 flags 1
> >>>>>         ref#0: shared data backref parent 4455386873856 count 1
> >>>>> item 119 key (780333588480 168 925696) itemoff 11827 itemsize 53
> >>>>>              ^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>
> >>>>>         extent refs 1 gen 2245328 flags 1
> >>>>>         ref#0: extent data backref root 2404 objectid 1141024 offset 0 count 1
> >>>>> item 120 key (780334530560 168 942080) itemoff 11774 itemsize 53
> >>>>>         extent refs 1 gen 2245328 flags 1
> >>>>>         ref#0: extent data backref root 2404 objectid 1141025 offset 0 count 1
> >>>>> item 121 key (780335472640 168 942080) itemoff 11721 itemsize 53
> >>>>>         extent refs 1 gen 2245328 flags 1
> >>>>>         ref#0: extent data backref root 2404 objectid 1141026 offset 0 count 1
> >>>>>
> >>>>> As you can see, item 119 is the problematic one and also out of sequence;
> >>>>> the adjacent items have the key offset 942080, which confirms the bitflip
> >>>>> case.
> >>>>>
> >>>>> As for any bitflip induced errors, it's hard to tell how far it got
> >>>>> propagated, this could be the only instance or there could be other
> >>>>> items referring to that one too.
> >>>>>
> >>>>> We don't have any ready-made tool for fixing that; the bitflips hit
> >>>>> random data structure groups or data, each case is basically unique and
> >>>>> would require analysis of the tree dump, looking for clues about how bad
> >>>>> it is.
> >>>>>
> >>>>
> >>>> Since we're pretty sure it's a bitflip now, would you please provide the
> >>>> following info?
> >>>>
> >>>> - History of the fs
> >>>>   Since you're using the Arch kernel, and since 5.14 we have all the
> >>>>   write-time checkers, normally we should detect such an out-of-key
> >>>>   situation by flipping the fs RO.
> >>>>   I'm wondering if the fs was handled by some older kernels, thus the
> >>>>   tree-checker didn't catch it early.
> >>>>
> >>>> - The hardware spec
> >>>>   The dmesg only contains the hardware spec "LENOVO 30B4S01W00", which
> >>>>   seems to be a workstation.
> >>>>   I'm wondering if certain CPU models lead to possibly unreliable memory.
> >>>>   From my experience, the memory chip itself is rarely the cause; it's
> >>>>   usually either the connection (from BGA to DIMM slot) or the memory
> >>>>   controller (nowadays in the CPU die).
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>

^ permalink raw reply [flat|nested] 9+ messages in thread
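A quick way to double-check the bit-flip analysis quoted above is to redo the
arithmetic on the two sizes reported by "btrfs check". The snippet below is
only an illustrative sketch of that check; the values are taken from the
report in this thread, and nothing here touches the filesystem:
```
# Sizes from the btrfs check output above: extent item size vs file item size.
item_size=925696
file_size=942080

# XOR the two sizes; the result is the set of differing bits.
delta=$(( item_size ^ file_size ))
printf 'delta: %d (0x%x)\n' "$delta" "$delta"          # 16384 (0x4000)

# A power-of-two delta means exactly one bit differs (a single bit flip),
# and 0x4000 = 16 KiB matches the default btrfs node size.
printf 'single bit differs: %d\n' $(( (delta & (delta - 1)) == 0 ))
```
The same check can be repeated for the second affected extent mentioned in
the thread, if its reported sizes are at hand.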
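The thread ends without a ready-made repair, but the read-only information
requested above (kernel messages from the failing send, plus a dump of the
extent tree around the damaged bytenr) could be collected along the lines
sketched below. This is only a sketch, not a procedure from the thread:
/dev/sdX and the output file names are placeholders, and none of the commands
modify the filesystem.
```
# Assumption: the corrupt filesystem is on /dev/sdX (placeholder) and the
# read-only snapshot is mounted under /mnt/snapshots, as in the thread.

# 1) Capture kernel messages while reproducing the failing send,
#    which is what Qu asked for:
dmesg --follow > send-failure.log &
DMESG_PID=$!
btrfs send /mnt/snapshots/user_data_2024-03-20 > /dev/null
kill "$DMESG_PID"

# 2) Dump the extent tree (ideally with the fs unmounted) and pull out the
#    items around the damaged bytenr, the kind of tree-dump analysis David
#    refers to:
btrfs inspect-internal dump-tree -t extent /dev/sdX \
    | grep -B 1 -A 4 '780333588480' > bad-extent-items.txt

# 3) Re-run a read-only check to see whether anything beyond the two known
#    extents is affected:
btrfs check --readonly /dev/sdX
```
If the dump shows other items referencing the flipped key, the corruption has
propagated beyond the single extent item discussed above, which is exactly
what David notes is hard to tell without this kind of analysis.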
end of thread, other threads:[~2024-08-10 18:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-25 21:06 Force remove of broken extent/subvolume? (Crash in btrfs_run_delayed_refs) Emil.s
2024-07-25 22:47 ` David Sterba
2024-07-25 23:19   ` Qu Wenruo
2024-07-26 10:52     ` Emil.s
2024-07-27  0:42       ` Qu Wenruo
2024-07-28 16:09         ` Yuwei Han
2024-08-05  8:16           ` Emil.s
2024-08-05  8:59             ` Qu Wenruo
2024-08-10 18:39               ` Emil.s