* (no subject) @ 2024-12-05 18:03 Kent Overstreet 2024-12-05 18:04 ` arm64: stacktrace: unwind exception boundaries Kent Overstreet 0 siblings, 1 reply; 7+ messages in thread From: Kent Overstreet @ 2024-12-05 18:03 UTC (permalink / raw) To: linux-arm-kernel, Mark Rutland, linux-bcachefs linux-kernel@vger.kernel.org Cc: Bcc: Subject: Re: arm64: stacktrace: unwind exception boundaries Message-ID: <cqmiqyusbdua2x4kh6eefndnneg6ever6doztccko4taky3b6k@ezqikwh2ooe7> Reply-To: On 6.13-rc1, I'm now seeing a ton of test failures due to this warning - what gives? 00104 ========= TEST generic/017 00104 run fstests generic/017 at 2024-12-05 11:47:43 00104 spectre-v4 mitigation disabled by command-line option 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots 00106 bcachefs (vdc): initializing new filesystem 00106 bcachefs (vdc): going read-write 00106 bcachefs (vdc): marking superblocks 00106 bcachefs (vdc): initializing freespace 00106 bcachefs (vdc): done initializing freespace 00106 bcachefs (vdc): reading snapshots table 00106 bcachefs (vdc): reading snapshots done 00106 bcachefs (vdc): done starting filesystem 00200 ------------[ cut here ]------------ 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388 00200 Modules linked in: 00200 CPU: 8 UID: 0 PID: 12571 Comm: cat Not tainted 6.13.0-rc1-ktest-gdf1fce23d033 #10877 00200 Hardware name: linux,dummy-virt (DT) 00200 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) 00200 pc : arch_stack_walk+0x2c0/0x388 00200 lr : arch_stack_walk+0x1f4/0x388 00200 sp : ffffff80ca757b00 00200 x29: ffffff80ca757b00 x28: ffffff80d6783758 x27: 000000000000036f 00200 x26: ffffff80d6783780 x25: ffffff80c19de180 x24: ffffff80c273ee40 00200 x23: ffffffc080022708 x22: ffffffc080025828 x21: ffffff80ca757be8 00200 x20: ffffffc0800c98c8 x19: ffffff80d8533d10 x18: ffffffffffffffff 00200 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff 00200 x14: 0000000000000000 x13: ffffff80d6c92000 x12: 0000000000000000 00200 x11: 0000000000000000 x10: 000000000000000a x9 : ffffffc08086beec 00200 x8 : ffffff80ca757bc0 x7 : 0000000000000000 x6 : ffffff80d8533d28 00200 x5 : ffffff80ca757b60 x4 : ffffff80d8534000 x3 : ffffff80ca757b40 00200 x2 : 0000000000000018 x1 : ffffff80d8533d10 x0 : 0000000000000000 00200 Call trace: 00200 arch_stack_walk+0x2c0/0x388 (P) 00200 arch_stack_walk+0x1f4/0x388 (L) 00200 stack_trace_save_tsk+0x90/0xd8 00200 bch2_save_backtrace+0x68/0x160 00200 bch2_prt_task_backtrace+0x30/0xb8 00200 bch2_btree_transactions_read+0xe0/0x1f0 00200 full_proxy_read+0x60/0xd0 00200 vfs_read+0x80/0x2d8 00200 ksys_read+0x5c/0xf0 00200 __arm64_sys_read+0x20/0x30 00200 invoke_syscall.constprop.0+0x54/0xe8 00200 do_el0_svc+0x44/0xc8 00200 el0_svc+0x18/0x58 00200 el0t_64_sync_handler+0x104/0x130 00200 el0t_64_sync+0x154/0x158 00200 ---[ end trace 0000000000000000 ]--- 00233 hrtimer: interrupt took 49506640 ns 00493 bcachefs (vdb): shutting down 00493 bcachefs (vdb): going read-only 00493 bcachefs (vdb): finished waiting for writes to stop 00493 bcachefs (vdb): flushing journal and stopping allocators, journal seq 18 00493 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 18 00493 bcachefs (vdb): clean shutdown complete, journal seq 19 00493 bcachefs (vdb): marking filesystem clean 00493 bcachefs (vdb): shutdown complete 00493 bcachefs (vdc): shutting down 00493 bcachefs (vdc): going read-only 00493 bcachefs (vdc): finished waiting for writes to stop 00493 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41457 00493 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41459 00493 bcachefs (vdc): clean shutdown complete, journal seq 41460 00493 bcachefs (vdc): marking filesystem clean 00493 bcachefs (vdc): shutdown complete 00495 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots 00495 bcachefs (vdc): recovering from clean shutdown, journal seq 41460 00495 bcachefs (vdc): accounting_read... done 00495 bcachefs (vdc): alloc_read... done 00495 bcachefs (vdc): stripes_read... done 00495 bcachefs (vdc): snapshots_read... done 00495 bcachefs (vdc): going read-write 00495 bcachefs (vdc): journal_replay... done 00495 bcachefs (vdc): resume_logged_ops... done 00495 bcachefs (vdc): delete_dead_inodes... done 00495 bcachefs (vdc): done starting filesystem 00495 bcachefs (vdc): shutting down 00495 bcachefs (vdc): going read-only 00495 bcachefs (vdc): finished waiting for writes to stop 00495 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41460 00495 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41460 00495 bcachefs (vdc): clean shutdown complete, journal seq 41461 00495 bcachefs (vdc): marking filesystem clean 00495 bcachefs (vdc): shutdown complete 00495 _check_dmesg: something found in dmesg (see /ktest-out/xfstests/generic/017.dmesg) 00496 00496 /ktest-out/xfstests/generic/017.full: 00496 cat: /ktest-out/xfstests/generic/017.full: No such file or directory 00496 ========= FAILED generic/017 in 391s, exit status 0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64: stacktrace: unwind exception boundaries 2024-12-05 18:03 Kent Overstreet @ 2024-12-05 18:04 ` Kent Overstreet 2024-12-09 11:37 ` Mark Rutland 0 siblings, 1 reply; 7+ messages in thread From: Kent Overstreet @ 2024-12-05 18:04 UTC (permalink / raw) To: linux-arm-kernel, Mark Rutland, linux-bcachefs, linux-kernel On 6.13-rc1, I'm now seeing a ton of test failures due to this warning - what gives? 00104 ========= TEST generic/017 00104 run fstests generic/017 at 2024-12-05 11:47:43 00104 spectre-v4 mitigation disabled by command-line option 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots 00106 bcachefs (vdc): initializing new filesystem 00106 bcachefs (vdc): going read-write 00106 bcachefs (vdc): marking superblocks 00106 bcachefs (vdc): initializing freespace 00106 bcachefs (vdc): done initializing freespace 00106 bcachefs (vdc): reading snapshots table 00106 bcachefs (vdc): reading snapshots done 00106 bcachefs (vdc): done starting filesystem 00200 ------------[ cut here ]------------ 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388 00200 Modules linked in: 00200 CPU: 8 UID: 0 PID: 12571 Comm: cat Not tainted 6.13.0-rc1-ktest-gdf1fce23d033 #10877 00200 Hardware name: linux,dummy-virt (DT) 00200 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) 00200 pc : arch_stack_walk+0x2c0/0x388 00200 lr : arch_stack_walk+0x1f4/0x388 00200 sp : ffffff80ca757b00 00200 x29: ffffff80ca757b00 x28: ffffff80d6783758 x27: 000000000000036f 00200 x26: ffffff80d6783780 x25: ffffff80c19de180 x24: ffffff80c273ee40 00200 x23: ffffffc080022708 x22: ffffffc080025828 x21: ffffff80ca757be8 00200 x20: ffffffc0800c98c8 x19: ffffff80d8533d10 x18: ffffffffffffffff 00200 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff 00200 x14: 0000000000000000 x13: ffffff80d6c92000 x12: 0000000000000000 00200 x11: 0000000000000000 x10: 000000000000000a x9 : ffffffc08086beec 00200 x8 : ffffff80ca757bc0 x7 : 0000000000000000 x6 : ffffff80d8533d28 00200 x5 : ffffff80ca757b60 x4 : ffffff80d8534000 x3 : ffffff80ca757b40 00200 x2 : 0000000000000018 x1 : ffffff80d8533d10 x0 : 0000000000000000 00200 Call trace: 00200 arch_stack_walk+0x2c0/0x388 (P) 00200 arch_stack_walk+0x1f4/0x388 (L) 00200 stack_trace_save_tsk+0x90/0xd8 00200 bch2_save_backtrace+0x68/0x160 00200 bch2_prt_task_backtrace+0x30/0xb8 00200 bch2_btree_transactions_read+0xe0/0x1f0 00200 full_proxy_read+0x60/0xd0 00200 vfs_read+0x80/0x2d8 00200 ksys_read+0x5c/0xf0 00200 __arm64_sys_read+0x20/0x30 00200 invoke_syscall.constprop.0+0x54/0xe8 00200 do_el0_svc+0x44/0xc8 00200 el0_svc+0x18/0x58 00200 el0t_64_sync_handler+0x104/0x130 00200 el0t_64_sync+0x154/0x158 00200 ---[ end trace 0000000000000000 ]--- 00233 hrtimer: interrupt took 49506640 ns 00493 bcachefs (vdb): shutting down 00493 bcachefs (vdb): going read-only 00493 bcachefs (vdb): finished waiting for writes to stop 00493 bcachefs (vdb): flushing journal and stopping allocators, journal seq 18 00493 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 18 00493 bcachefs (vdb): clean shutdown complete, journal seq 19 00493 bcachefs (vdb): marking filesystem clean 00493 bcachefs (vdb): shutdown complete 00493 bcachefs (vdc): shutting down 00493 bcachefs (vdc): going read-only 00493 bcachefs (vdc): finished waiting for writes to stop 00493 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41457 00493 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41459 00493 bcachefs (vdc): clean shutdown complete, journal seq 41460 00493 bcachefs (vdc): marking filesystem clean 00493 bcachefs (vdc): shutdown complete 00495 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots 00495 bcachefs (vdc): recovering from clean shutdown, journal seq 41460 00495 bcachefs (vdc): accounting_read... done 00495 bcachefs (vdc): alloc_read... done 00495 bcachefs (vdc): stripes_read... done 00495 bcachefs (vdc): snapshots_read... done 00495 bcachefs (vdc): going read-write 00495 bcachefs (vdc): journal_replay... done 00495 bcachefs (vdc): resume_logged_ops... done 00495 bcachefs (vdc): delete_dead_inodes... done 00495 bcachefs (vdc): done starting filesystem 00495 bcachefs (vdc): shutting down 00495 bcachefs (vdc): going read-only 00495 bcachefs (vdc): finished waiting for writes to stop 00495 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41460 00495 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41460 00495 bcachefs (vdc): clean shutdown complete, journal seq 41461 00495 bcachefs (vdc): marking filesystem clean 00495 bcachefs (vdc): shutdown complete 00495 _check_dmesg: something found in dmesg (see /ktest-out/xfstests/generic/017.dmesg) 00496 00496 /ktest-out/xfstests/generic/017.full: 00496 cat: /ktest-out/xfstests/generic/017.full: No such file or directory 00496 ========= FAILED generic/017 in 391s, exit status 0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64: stacktrace: unwind exception boundaries 2024-12-05 18:04 ` arm64: stacktrace: unwind exception boundaries Kent Overstreet @ 2024-12-09 11:37 ` Mark Rutland 2024-12-10 11:27 ` Mark Rutland 0 siblings, 1 reply; 7+ messages in thread From: Mark Rutland @ 2024-12-09 11:37 UTC (permalink / raw) To: Kent Overstreet; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel Hi Kent, On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote: > On 6.13-rc1, I'm now seeing a ton of test failures due to this warning - > what gives? Sorry about this; I just sent what I *thought* was a fix for this: https://lore.kernel.org/linux-arm-kernel/20241209110351.1876804-1-mark.rutland@arm.com/ ... but re-reading the below I see you're actually hitting a different issue. > 00104 ========= TEST generic/017 > 00104 run fstests generic/017 at 2024-12-05 11:47:43 > 00104 spectre-v4 mitigation disabled by command-line option > 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots > 00106 bcachefs (vdc): initializing new filesystem > 00106 bcachefs (vdc): going read-write > 00106 bcachefs (vdc): marking superblocks > 00106 bcachefs (vdc): initializing freespace > 00106 bcachefs (vdc): done initializing freespace > 00106 bcachefs (vdc): reading snapshots table > 00106 bcachefs (vdc): reading snapshots done > 00106 bcachefs (vdc): done starting filesystem > 00200 ------------[ cut here ]------------ > 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388 Looking at v6.13-rc1, that's the warning in kunwind_next_frame_record_meta() for when the frame_record_meta::type is not a valid value, which likely implies one of the following: (a) The logic to identify a frame_record_meta is wrong. (b) The entry logic has failed to initilialize pt_regs::stackframe::meta on an entry path somehow. (c) The stack has been corrupted, and some frame record has been clobbered to look like a frame_record_meta. The unwind will terminate at this point even if the warning is supressed, so the backtrace from the warning itself should indicate where the frame record was: > 00200 Modules linked in: > 00200 CPU: 8 UID: 0 PID: 12571 Comm: cat Not tainted 6.13.0-rc1-ktest-gdf1fce23d033 #10877 > 00200 Hardware name: linux,dummy-virt (DT) > 00200 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) > 00200 pc : arch_stack_walk+0x2c0/0x388 > 00200 lr : arch_stack_walk+0x1f4/0x388 > 00200 sp : ffffff80ca757b00 > 00200 x29: ffffff80ca757b00 x28: ffffff80d6783758 x27: 000000000000036f > 00200 x26: ffffff80d6783780 x25: ffffff80c19de180 x24: ffffff80c273ee40 > 00200 x23: ffffffc080022708 x22: ffffffc080025828 x21: ffffff80ca757be8 > 00200 x20: ffffffc0800c98c8 x19: ffffff80d8533d10 x18: ffffffffffffffff > 00200 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff > 00200 x14: 0000000000000000 x13: ffffff80d6c92000 x12: 0000000000000000 > 00200 x11: 0000000000000000 x10: 000000000000000a x9 : ffffffc08086beec > 00200 x8 : ffffff80ca757bc0 x7 : 0000000000000000 x6 : ffffff80d8533d28 > 00200 x5 : ffffff80ca757b60 x4 : ffffff80d8534000 x3 : ffffff80ca757b40 > 00200 x2 : 0000000000000018 x1 : ffffff80d8533d10 x0 : 0000000000000000 > 00200 Call trace: > 00200 arch_stack_walk+0x2c0/0x388 (P) > 00200 arch_stack_walk+0x1f4/0x388 (L) > 00200 stack_trace_save_tsk+0x90/0xd8 > 00200 bch2_save_backtrace+0x68/0x160 > 00200 bch2_prt_task_backtrace+0x30/0xb8 > 00200 bch2_btree_transactions_read+0xe0/0x1f0 > 00200 full_proxy_read+0x60/0xd0 > 00200 vfs_read+0x80/0x2d8 > 00200 ksys_read+0x5c/0xf0 > 00200 __arm64_sys_read+0x20/0x30 > 00200 invoke_syscall.constprop.0+0x54/0xe8 > 00200 do_el0_svc+0x44/0xc8 > 00200 el0_svc+0x18/0x58 > 00200 el0t_64_sync_handler+0x104/0x130 > 00200 el0t_64_sync+0x154/0x158 This suggests that the bad frame_record_meta is in the entry code, in the pt_regs. There *should* be a frame_record_meta there with type FRAME_META_TYPE_FINAL, which suggests the probkem isn't case (a) above, and either the entry code is failing to initialise pt_regs::stackframe::type somehow, or that is getting corrupted later. AFAICT, the entry code correctly initializes the frame_record_meta in the kernel_entry assembly macro: /* * Create a metadata frame record. The unwinder will use this to * identify and unwind exception boundaries. */ stp xzr, xzr, [sp, #S_STACKFRAME] .if \el == 0 mov x0, #FRAME_META_TYPE_FINAL .else mov x0, #FRAME_META_TYPE_PT_REGS .endif str x0, [sp, #S_STACKFRAME_TYPE] add x29, sp, #S_STACKFRAME ... and that's shared by *all* exceptions, so if it were wrong I'd expect it to trigger an explosion on the very first backtrace taken under an exception (from userspace or kernel). Given the above, I'd suspect that maybe you're encountering stack corruption here. How are you reproducing this, and how consistently do you see the failure? Is it always triggered by the same test? The below diff will log the bad frame_record_meta::type value; if you're able to run with that it might give an indication as to what's going on. Mark. ---->8---- diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index caef85462acb6..d1d9436c49101 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -205,13 +205,15 @@ kunwind_next_frame_record_meta(struct kunwind_state *state) unsigned long fp = state->common.fp; struct frame_record_meta *meta; struct stack_info *info; + u64 type; info = unwind_find_stack(&state->common, fp, sizeof(*meta)); if (!info) return -EINVAL; meta = (struct frame_record_meta *)fp; - switch (READ_ONCE(meta->type)) { + type = READ_ONCE(meta->type); + switch (type) { case FRAME_META_TYPE_FINAL: if (meta == &task_pt_regs(tsk)->stackframe) return -ENOENT; @@ -220,7 +222,7 @@ kunwind_next_frame_record_meta(struct kunwind_state *state) case FRAME_META_TYPE_PT_REGS: return kunwind_next_regs_pc(state); default: - WARN_ON_ONCE(1); + WARN_ONCE(1, "Bad frame_record_meta::type: 0x%016llx\n", type); return -EINVAL; } } ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: arm64: stacktrace: unwind exception boundaries 2024-12-09 11:37 ` Mark Rutland @ 2024-12-10 11:27 ` Mark Rutland 2024-12-10 12:40 ` Kent Overstreet 0 siblings, 1 reply; 7+ messages in thread From: Mark Rutland @ 2024-12-10 11:27 UTC (permalink / raw) To: Kent Overstreet; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote: > Hi Kent, > > On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote: > > On 6.13-rc1, I'm now seeing a ton of test failures due to this warning - > > what gives? > > Sorry about this; I just sent what I *thought* was a fix for this: > > https://lore.kernel.org/linux-arm-kernel/20241209110351.1876804-1-mark.rutland@arm.com/ > > ... but re-reading the below I see you're actually hitting a different > issue. > > > 00104 ========= TEST generic/017 > > 00104 run fstests generic/017 at 2024-12-05 11:47:43 > > 00104 spectre-v4 mitigation disabled by command-line option > > 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots > > 00106 bcachefs (vdc): initializing new filesystem > > 00106 bcachefs (vdc): going read-write > > 00106 bcachefs (vdc): marking superblocks > > 00106 bcachefs (vdc): initializing freespace > > 00106 bcachefs (vdc): done initializing freespace > > 00106 bcachefs (vdc): reading snapshots table > > 00106 bcachefs (vdc): reading snapshots done > > 00106 bcachefs (vdc): done starting filesystem > > 00200 ------------[ cut here ]------------ > > 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388 > > Looking at v6.13-rc1, that's the warning in > kunwind_next_frame_record_meta() for when the frame_record_meta::type is > not a valid value, which likely implies one of the following: > > (a) The logic to identify a frame_record_meta is wrong. > > (b) The entry logic has failed to initilialize pt_regs::stackframe::meta > on an entry path somehow. > > (c) The stack has been corrupted, and some frame record has been > clobbered to look like a frame_record_meta. Looking some more, I see that bch2_btree_transactions_read() is trying to unwind other tasks, and I believe what's happening here is that the unwindee isn't actually blocked for the duration of the unwind, leading to the unwinder encountering junk and consequently producing the warning. As a test case, it's possible to trigger similar with a few parallel instances of: while true; do cat /proc/*/stack > /dev/null The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(), which'll get rid of the splat. It seems we've never been unlucky enough to hit a stale fgraph entry, or that would've blown up also. Regardless of the way arm64 behaves here, the unwind performed by bch2_btree_transactions_read() is going to contain garbage unless the task is pinned in a blocked state. AFAICT the way btree_trans::locking_wait::task is used is here is racy, and there's no guarantee that the unwindee is actually blocked. Mark. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64: stacktrace: unwind exception boundaries 2024-12-10 11:27 ` Mark Rutland @ 2024-12-10 12:40 ` Kent Overstreet 2024-12-10 19:17 ` Mark Rutland 0 siblings, 1 reply; 7+ messages in thread From: Kent Overstreet @ 2024-12-10 12:40 UTC (permalink / raw) To: Mark Rutland; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel On Tue, Dec 10, 2024 at 11:27:19AM +0000, Mark Rutland wrote: > On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote: > > Hi Kent, > > > > On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote: > > > On 6.13-rc1, I'm now seeing a ton of test failures due to this warning - > > > what gives? > > > > Sorry about this; I just sent what I *thought* was a fix for this: > > > > https://lore.kernel.org/linux-arm-kernel/20241209110351.1876804-1-mark.rutland@arm.com/ > > > > ... but re-reading the below I see you're actually hitting a different > > issue. > > > > > 00104 ========= TEST generic/017 > > > 00104 run fstests generic/017 at 2024-12-05 11:47:43 > > > 00104 spectre-v4 mitigation disabled by command-line option > > > 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots > > > 00106 bcachefs (vdc): initializing new filesystem > > > 00106 bcachefs (vdc): going read-write > > > 00106 bcachefs (vdc): marking superblocks > > > 00106 bcachefs (vdc): initializing freespace > > > 00106 bcachefs (vdc): done initializing freespace > > > 00106 bcachefs (vdc): reading snapshots table > > > 00106 bcachefs (vdc): reading snapshots done > > > 00106 bcachefs (vdc): done starting filesystem > > > 00200 ------------[ cut here ]------------ > > > 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388 > > > > Looking at v6.13-rc1, that's the warning in > > kunwind_next_frame_record_meta() for when the frame_record_meta::type is > > not a valid value, which likely implies one of the following: > > > > (a) The logic to identify a frame_record_meta is wrong. > > > > (b) The entry logic has failed to initilialize pt_regs::stackframe::meta > > on an entry path somehow. > > > > (c) The stack has been corrupted, and some frame record has been > > clobbered to look like a frame_record_meta. > > Looking some more, I see that bch2_btree_transactions_read() is trying > to unwind other tasks, and I believe what's happening here is that the > unwindee isn't actually blocked for the duration of the unwind, leading > to the unwinder encountering junk and consequently producing the > warning. > > As a test case, it's possible to trigger similar with a few parallel > instances of: > > while true; do cat /proc/*/stack > /dev/null > > The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(), > which'll get rid of the splat. It seems we've never been unlucky enough > to hit a stale fgraph entry, or that would've blown up also. > > Regardless of the way arm64 behaves here, the unwind performed by > bch2_btree_transactions_read() is going to contain garbage unless the > task is pinned in a blocked state. AFAICT the way > btree_trans::locking_wait::task is used is here is racy, and there's no > guarantee that the unwindee is actually blocked. Occasionally returning garbage is completely fine, as long as the interface is otherwise safe. This is debug info; it's important that it be available and we can't impose additional synchronization for it. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64: stacktrace: unwind exception boundaries 2024-12-10 12:40 ` Kent Overstreet @ 2024-12-10 19:17 ` Mark Rutland 2024-12-10 19:31 ` Kent Overstreet 0 siblings, 1 reply; 7+ messages in thread From: Mark Rutland @ 2024-12-10 19:17 UTC (permalink / raw) To: Kent Overstreet; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel On Tue, Dec 10, 2024 at 07:40:04AM -0500, Kent Overstreet wrote: > On Tue, Dec 10, 2024 at 11:27:19AM +0000, Mark Rutland wrote: > > On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote: > > > On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote: > > Looking some more, I see that bch2_btree_transactions_read() is trying > > to unwind other tasks, and I believe what's happening here is that the > > unwindee isn't actually blocked for the duration of the unwind, leading > > to the unwinder encountering junk and consequently producing the > > warning. > > > > As a test case, it's possible to trigger similar with a few parallel > > instances of: > > > > while true; do cat /proc/*/stack > /dev/null > > > > The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(), > > which'll get rid of the splat. It seems we've never been unlucky enough > > to hit a stale fgraph entry, or that would've blown up also. > > > > Regardless of the way arm64 behaves here, the unwind performed by > > bch2_btree_transactions_read() is going to contain garbage unless the > > task is pinned in a blocked state. AFAICT the way > > btree_trans::locking_wait::task is used is here is racy, and there's no > > guarantee that the unwindee is actually blocked. > > Occasionally returning garbage is completely fine, as long as the > interface is otherwise safe. This is debug info; it's important that it > be available and we can't impose additional synchronization for it. Sure thing; just note that there's no guarantee that this is only "occasionally" garbage -- this could be wrong 1% of the time or 99% of the time depending on the specific scenario, HW it's running on, etc. As long as you're happy to hold the pieces when that happens, that's fine. I've pushed out fixes to the arm64/stacktrace/fixes branch on my kernel.org git repo: https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/fixes git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/stacktrace/fixes ... and I'll get that out as a series on the list tomorrow. Mark. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: arm64: stacktrace: unwind exception boundaries 2024-12-10 19:17 ` Mark Rutland @ 2024-12-10 19:31 ` Kent Overstreet 0 siblings, 0 replies; 7+ messages in thread From: Kent Overstreet @ 2024-12-10 19:31 UTC (permalink / raw) To: Mark Rutland; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel On Tue, Dec 10, 2024 at 07:17:36PM +0000, Mark Rutland wrote: > On Tue, Dec 10, 2024 at 07:40:04AM -0500, Kent Overstreet wrote: > > On Tue, Dec 10, 2024 at 11:27:19AM +0000, Mark Rutland wrote: > > > On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote: > > > > On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote: > > > > Looking some more, I see that bch2_btree_transactions_read() is trying > > > to unwind other tasks, and I believe what's happening here is that the > > > unwindee isn't actually blocked for the duration of the unwind, leading > > > to the unwinder encountering junk and consequently producing the > > > warning. > > > > > > As a test case, it's possible to trigger similar with a few parallel > > > instances of: > > > > > > while true; do cat /proc/*/stack > /dev/null > > > > > > The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(), > > > which'll get rid of the splat. It seems we've never been unlucky enough > > > to hit a stale fgraph entry, or that would've blown up also. > > > > > > Regardless of the way arm64 behaves here, the unwind performed by > > > bch2_btree_transactions_read() is going to contain garbage unless the > > > task is pinned in a blocked state. AFAICT the way > > > btree_trans::locking_wait::task is used is here is racy, and there's no > > > guarantee that the unwindee is actually blocked. > > > > Occasionally returning garbage is completely fine, as long as the > > interface is otherwise safe. This is debug info; it's important that it > > be available and we can't impose additional synchronization for it. > > Sure thing; just note that there's no guarantee that this is only > "occasionally" garbage -- this could be wrong 1% of the time or 99% of > the time depending on the specific scenario, HW it's running on, etc. As > long as you're happy to hold the pieces when that happens, that's fine. > > I've pushed out fixes to the arm64/stacktrace/fixes branch on my > kernel.org git repo: > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/fixes > git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/stacktrace/fixes > > ... and I'll get that out as a series on the list tomorrow. Wonderful The fact that /proc/pid/stack doesn't give anything if the process is in TASK_RUNNING has been a problem in the past, yet this is something we're not consistent on in the kernel; sysrq-trigger backtraces do give backtraces for running processes (which then may require sorting through everything). So thanks for this :) ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-12-10 19:32 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-05 18:03 Kent Overstreet 2024-12-05 18:04 ` arm64: stacktrace: unwind exception boundaries Kent Overstreet 2024-12-09 11:37 ` Mark Rutland 2024-12-10 11:27 ` Mark Rutland 2024-12-10 12:40 ` Kent Overstreet 2024-12-10 19:17 ` Mark Rutland 2024-12-10 19:31 ` Kent Overstreet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox