Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* (no subject)
@ 2024-12-05 18:03 Kent Overstreet
  2024-12-05 18:04 ` arm64: stacktrace: unwind exception boundaries Kent Overstreet
  0 siblings, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2024-12-05 18:03 UTC (permalink / raw)
  To: linux-arm-kernel, Mark Rutland, linux-bcachefs

linux-kernel@vger.kernel.org
Cc:
Bcc:
Subject: Re: arm64: stacktrace: unwind exception boundaries
Message-ID: <cqmiqyusbdua2x4kh6eefndnneg6ever6doztccko4taky3b6k@ezqikwh2ooe7>
Reply-To:

On 6.13-rc1, I'm now seeing a ton of test failures due to this warning -
what gives?

00104 ========= TEST   generic/017
00104        run fstests generic/017 at 2024-12-05 11:47:43
00104 spectre-v4 mitigation disabled by command-line option
00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots
00106 bcachefs (vdc): initializing new filesystem
00106 bcachefs (vdc): going read-write
00106 bcachefs (vdc): marking superblocks
00106 bcachefs (vdc): initializing freespace
00106 bcachefs (vdc): done initializing freespace
00106 bcachefs (vdc): reading snapshots table
00106 bcachefs (vdc): reading snapshots done
00106 bcachefs (vdc): done starting filesystem
00200 ------------[ cut here ]------------
00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388
00200 Modules linked in:
00200 CPU: 8 UID: 0 PID: 12571 Comm: cat Not tainted 6.13.0-rc1-ktest-gdf1fce23d033 #10877
00200 Hardware name: linux,dummy-virt (DT)
00200 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00200 pc : arch_stack_walk+0x2c0/0x388
00200 lr : arch_stack_walk+0x1f4/0x388
00200 sp : ffffff80ca757b00
00200 x29: ffffff80ca757b00 x28: ffffff80d6783758 x27: 000000000000036f
00200 x26: ffffff80d6783780 x25: ffffff80c19de180 x24: ffffff80c273ee40
00200 x23: ffffffc080022708 x22: ffffffc080025828 x21: ffffff80ca757be8
00200 x20: ffffffc0800c98c8 x19: ffffff80d8533d10 x18: ffffffffffffffff
00200 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
00200 x14: 0000000000000000 x13: ffffff80d6c92000 x12: 0000000000000000
00200 x11: 0000000000000000 x10: 000000000000000a x9 : ffffffc08086beec
00200 x8 : ffffff80ca757bc0 x7 : 0000000000000000 x6 : ffffff80d8533d28
00200 x5 : ffffff80ca757b60 x4 : ffffff80d8534000 x3 : ffffff80ca757b40
00200 x2 : 0000000000000018 x1 : ffffff80d8533d10 x0 : 0000000000000000
00200 Call trace:
00200  arch_stack_walk+0x2c0/0x388 (P)
00200  arch_stack_walk+0x1f4/0x388 (L)
00200  stack_trace_save_tsk+0x90/0xd8
00200  bch2_save_backtrace+0x68/0x160
00200  bch2_prt_task_backtrace+0x30/0xb8
00200  bch2_btree_transactions_read+0xe0/0x1f0
00200  full_proxy_read+0x60/0xd0
00200  vfs_read+0x80/0x2d8
00200  ksys_read+0x5c/0xf0
00200  __arm64_sys_read+0x20/0x30
00200  invoke_syscall.constprop.0+0x54/0xe8
00200  do_el0_svc+0x44/0xc8
00200  el0_svc+0x18/0x58
00200  el0t_64_sync_handler+0x104/0x130
00200  el0t_64_sync+0x154/0x158
00200 ---[ end trace 0000000000000000 ]---
00233 hrtimer: interrupt took 49506640 ns
00493 bcachefs (vdb): shutting down
00493 bcachefs (vdb): going read-only
00493 bcachefs (vdb): finished waiting for writes to stop
00493 bcachefs (vdb): flushing journal and stopping allocators, journal seq 18
00493 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 18
00493 bcachefs (vdb): clean shutdown complete, journal seq 19
00493 bcachefs (vdb): marking filesystem clean
00493 bcachefs (vdb): shutdown complete
00493 bcachefs (vdc): shutting down
00493 bcachefs (vdc): going read-only
00493 bcachefs (vdc): finished waiting for writes to stop
00493 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41457
00493 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41459
00493 bcachefs (vdc): clean shutdown complete, journal seq 41460
00493 bcachefs (vdc): marking filesystem clean
00493 bcachefs (vdc): shutdown complete
00495 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots
00495 bcachefs (vdc): recovering from clean shutdown, journal seq 41460
00495 bcachefs (vdc): accounting_read... done
00495 bcachefs (vdc): alloc_read... done
00495 bcachefs (vdc): stripes_read... done
00495 bcachefs (vdc): snapshots_read... done
00495 bcachefs (vdc): going read-write
00495 bcachefs (vdc): journal_replay... done
00495 bcachefs (vdc): resume_logged_ops... done
00495 bcachefs (vdc): delete_dead_inodes... done
00495 bcachefs (vdc): done starting filesystem
00495 bcachefs (vdc): shutting down
00495 bcachefs (vdc): going read-only
00495 bcachefs (vdc): finished waiting for writes to stop
00495 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41460
00495 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41460
00495 bcachefs (vdc): clean shutdown complete, journal seq 41461
00495 bcachefs (vdc): marking filesystem clean
00495 bcachefs (vdc): shutdown complete
00495 _check_dmesg: something found in dmesg (see /ktest-out/xfstests/generic/017.dmesg)
00496 
00496 /ktest-out/xfstests/generic/017.full:
00496 cat: /ktest-out/xfstests/generic/017.full: No such file or directory
00496 ========= FAILED generic/017 in 391s, exit status 0



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: arm64: stacktrace: unwind exception boundaries
  2024-12-05 18:03 Kent Overstreet
@ 2024-12-05 18:04 ` Kent Overstreet
  2024-12-09 11:37   ` Mark Rutland
  0 siblings, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2024-12-05 18:04 UTC (permalink / raw)
  To: linux-arm-kernel, Mark Rutland, linux-bcachefs, linux-kernel

On 6.13-rc1, I'm now seeing a ton of test failures due to this warning -
what gives?

00104 ========= TEST   generic/017
00104        run fstests generic/017 at 2024-12-05 11:47:43
00104 spectre-v4 mitigation disabled by command-line option
00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots
00106 bcachefs (vdc): initializing new filesystem
00106 bcachefs (vdc): going read-write
00106 bcachefs (vdc): marking superblocks
00106 bcachefs (vdc): initializing freespace
00106 bcachefs (vdc): done initializing freespace
00106 bcachefs (vdc): reading snapshots table
00106 bcachefs (vdc): reading snapshots done
00106 bcachefs (vdc): done starting filesystem
00200 ------------[ cut here ]------------
00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388
00200 Modules linked in:
00200 CPU: 8 UID: 0 PID: 12571 Comm: cat Not tainted 6.13.0-rc1-ktest-gdf1fce23d033 #10877
00200 Hardware name: linux,dummy-virt (DT)
00200 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00200 pc : arch_stack_walk+0x2c0/0x388
00200 lr : arch_stack_walk+0x1f4/0x388
00200 sp : ffffff80ca757b00
00200 x29: ffffff80ca757b00 x28: ffffff80d6783758 x27: 000000000000036f
00200 x26: ffffff80d6783780 x25: ffffff80c19de180 x24: ffffff80c273ee40
00200 x23: ffffffc080022708 x22: ffffffc080025828 x21: ffffff80ca757be8
00200 x20: ffffffc0800c98c8 x19: ffffff80d8533d10 x18: ffffffffffffffff
00200 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
00200 x14: 0000000000000000 x13: ffffff80d6c92000 x12: 0000000000000000
00200 x11: 0000000000000000 x10: 000000000000000a x9 : ffffffc08086beec
00200 x8 : ffffff80ca757bc0 x7 : 0000000000000000 x6 : ffffff80d8533d28
00200 x5 : ffffff80ca757b60 x4 : ffffff80d8534000 x3 : ffffff80ca757b40
00200 x2 : 0000000000000018 x1 : ffffff80d8533d10 x0 : 0000000000000000
00200 Call trace:
00200  arch_stack_walk+0x2c0/0x388 (P)
00200  arch_stack_walk+0x1f4/0x388 (L)
00200  stack_trace_save_tsk+0x90/0xd8
00200  bch2_save_backtrace+0x68/0x160
00200  bch2_prt_task_backtrace+0x30/0xb8
00200  bch2_btree_transactions_read+0xe0/0x1f0
00200  full_proxy_read+0x60/0xd0
00200  vfs_read+0x80/0x2d8
00200  ksys_read+0x5c/0xf0
00200  __arm64_sys_read+0x20/0x30
00200  invoke_syscall.constprop.0+0x54/0xe8
00200  do_el0_svc+0x44/0xc8
00200  el0_svc+0x18/0x58
00200  el0t_64_sync_handler+0x104/0x130
00200  el0t_64_sync+0x154/0x158
00200 ---[ end trace 0000000000000000 ]---
00233 hrtimer: interrupt took 49506640 ns
00493 bcachefs (vdb): shutting down
00493 bcachefs (vdb): going read-only
00493 bcachefs (vdb): finished waiting for writes to stop
00493 bcachefs (vdb): flushing journal and stopping allocators, journal seq 18
00493 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 18
00493 bcachefs (vdb): clean shutdown complete, journal seq 19
00493 bcachefs (vdb): marking filesystem clean
00493 bcachefs (vdb): shutdown complete
00493 bcachefs (vdc): shutting down
00493 bcachefs (vdc): going read-only
00493 bcachefs (vdc): finished waiting for writes to stop
00493 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41457
00493 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41459
00493 bcachefs (vdc): clean shutdown complete, journal seq 41460
00493 bcachefs (vdc): marking filesystem clean
00493 bcachefs (vdc): shutdown complete
00495 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots
00495 bcachefs (vdc): recovering from clean shutdown, journal seq 41460
00495 bcachefs (vdc): accounting_read... done
00495 bcachefs (vdc): alloc_read... done
00495 bcachefs (vdc): stripes_read... done
00495 bcachefs (vdc): snapshots_read... done
00495 bcachefs (vdc): going read-write
00495 bcachefs (vdc): journal_replay... done
00495 bcachefs (vdc): resume_logged_ops... done
00495 bcachefs (vdc): delete_dead_inodes... done
00495 bcachefs (vdc): done starting filesystem
00495 bcachefs (vdc): shutting down
00495 bcachefs (vdc): going read-only
00495 bcachefs (vdc): finished waiting for writes to stop
00495 bcachefs (vdc): flushing journal and stopping allocators, journal seq 41460
00495 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 41460
00495 bcachefs (vdc): clean shutdown complete, journal seq 41461
00495 bcachefs (vdc): marking filesystem clean
00495 bcachefs (vdc): shutdown complete
00495 _check_dmesg: something found in dmesg (see /ktest-out/xfstests/generic/017.dmesg)
00496 
00496 /ktest-out/xfstests/generic/017.full:
00496 cat: /ktest-out/xfstests/generic/017.full: No such file or directory
00496 ========= FAILED generic/017 in 391s, exit status 0



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: arm64: stacktrace: unwind exception boundaries
  2024-12-05 18:04 ` arm64: stacktrace: unwind exception boundaries Kent Overstreet
@ 2024-12-09 11:37   ` Mark Rutland
  2024-12-10 11:27     ` Mark Rutland
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Rutland @ 2024-12-09 11:37 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel

Hi Kent,

On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote:
> On 6.13-rc1, I'm now seeing a ton of test failures due to this warning -
> what gives?

Sorry about this; I just sent what I *thought* was a fix for this:

  https://lore.kernel.org/linux-arm-kernel/20241209110351.1876804-1-mark.rutland@arm.com/

... but re-reading the below I see you're actually hitting a different
issue.

> 00104 ========= TEST   generic/017
> 00104        run fstests generic/017 at 2024-12-05 11:47:43
> 00104 spectre-v4 mitigation disabled by command-line option
> 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots
> 00106 bcachefs (vdc): initializing new filesystem
> 00106 bcachefs (vdc): going read-write
> 00106 bcachefs (vdc): marking superblocks
> 00106 bcachefs (vdc): initializing freespace
> 00106 bcachefs (vdc): done initializing freespace
> 00106 bcachefs (vdc): reading snapshots table
> 00106 bcachefs (vdc): reading snapshots done
> 00106 bcachefs (vdc): done starting filesystem
> 00200 ------------[ cut here ]------------
> 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388

Looking at v6.13-rc1, that's the warning in
kunwind_next_frame_record_meta() for when the frame_record_meta::type is
not a valid value, which likely implies one of the following:

(a) The logic to identify a frame_record_meta is wrong.

(b) The entry logic has failed to initilialize pt_regs::stackframe::meta
    on an entry path somehow.

(c) The stack has been corrupted, and some frame record has been
    clobbered to look like a frame_record_meta.

The unwind will terminate at this point even if the warning is
supressed, so the backtrace from the warning itself should indicate
where the frame record was:

> 00200 Modules linked in:
> 00200 CPU: 8 UID: 0 PID: 12571 Comm: cat Not tainted 6.13.0-rc1-ktest-gdf1fce23d033 #10877
> 00200 Hardware name: linux,dummy-virt (DT)
> 00200 pstate: 80001005 (Nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> 00200 pc : arch_stack_walk+0x2c0/0x388
> 00200 lr : arch_stack_walk+0x1f4/0x388
> 00200 sp : ffffff80ca757b00
> 00200 x29: ffffff80ca757b00 x28: ffffff80d6783758 x27: 000000000000036f
> 00200 x26: ffffff80d6783780 x25: ffffff80c19de180 x24: ffffff80c273ee40
> 00200 x23: ffffffc080022708 x22: ffffffc080025828 x21: ffffff80ca757be8
> 00200 x20: ffffffc0800c98c8 x19: ffffff80d8533d10 x18: ffffffffffffffff
> 00200 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff
> 00200 x14: 0000000000000000 x13: ffffff80d6c92000 x12: 0000000000000000
> 00200 x11: 0000000000000000 x10: 000000000000000a x9 : ffffffc08086beec
> 00200 x8 : ffffff80ca757bc0 x7 : 0000000000000000 x6 : ffffff80d8533d28
> 00200 x5 : ffffff80ca757b60 x4 : ffffff80d8534000 x3 : ffffff80ca757b40
> 00200 x2 : 0000000000000018 x1 : ffffff80d8533d10 x0 : 0000000000000000
> 00200 Call trace:
> 00200  arch_stack_walk+0x2c0/0x388 (P)
> 00200  arch_stack_walk+0x1f4/0x388 (L)
> 00200  stack_trace_save_tsk+0x90/0xd8
> 00200  bch2_save_backtrace+0x68/0x160
> 00200  bch2_prt_task_backtrace+0x30/0xb8
> 00200  bch2_btree_transactions_read+0xe0/0x1f0
> 00200  full_proxy_read+0x60/0xd0
> 00200  vfs_read+0x80/0x2d8
> 00200  ksys_read+0x5c/0xf0
> 00200  __arm64_sys_read+0x20/0x30
> 00200  invoke_syscall.constprop.0+0x54/0xe8
> 00200  do_el0_svc+0x44/0xc8
> 00200  el0_svc+0x18/0x58
> 00200  el0t_64_sync_handler+0x104/0x130
> 00200  el0t_64_sync+0x154/0x158

This suggests that the bad frame_record_meta is in the entry code, in
the pt_regs. There *should* be a frame_record_meta there with type
FRAME_META_TYPE_FINAL, which suggests the probkem isn't case (a) above,
and either the entry code is failing to initialise
pt_regs::stackframe::type somehow, or that is getting corrupted later.

AFAICT, the entry code correctly initializes the frame_record_meta in
the kernel_entry assembly macro:

        /*
         * Create a metadata frame record. The unwinder will use this to
         * identify and unwind exception boundaries.
         */
        stp     xzr, xzr, [sp, #S_STACKFRAME]
        .if \el == 0
        mov     x0, #FRAME_META_TYPE_FINAL
        .else
        mov     x0, #FRAME_META_TYPE_PT_REGS
        .endif
        str     x0, [sp, #S_STACKFRAME_TYPE]
        add     x29, sp, #S_STACKFRAME

... and that's shared by *all* exceptions, so if it were wrong I'd
expect it to trigger an explosion on the very first backtrace taken
under an exception (from userspace or kernel).

Given the above, I'd suspect that maybe you're encountering stack
corruption here.

How are you reproducing this, and how consistently do you see the
failure? Is it always triggered by the same test?

The below diff will log the bad frame_record_meta::type value; if you're
able to run with that it might give an indication as to what's going on.

Mark.

---->8----
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index caef85462acb6..d1d9436c49101 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -205,13 +205,15 @@ kunwind_next_frame_record_meta(struct kunwind_state *state)
        unsigned long fp = state->common.fp;
        struct frame_record_meta *meta;
        struct stack_info *info;
+       u64 type;
 
        info = unwind_find_stack(&state->common, fp, sizeof(*meta));
        if (!info)
                return -EINVAL;
 
        meta = (struct frame_record_meta *)fp;
-       switch (READ_ONCE(meta->type)) {
+       type = READ_ONCE(meta->type);
+       switch (type) {
        case FRAME_META_TYPE_FINAL:
                if (meta == &task_pt_regs(tsk)->stackframe)
                        return -ENOENT;
@@ -220,7 +222,7 @@ kunwind_next_frame_record_meta(struct kunwind_state *state)
        case FRAME_META_TYPE_PT_REGS:
                return kunwind_next_regs_pc(state);
        default:
-               WARN_ON_ONCE(1);
+               WARN_ONCE(1, "Bad frame_record_meta::type: 0x%016llx\n", type);
                return -EINVAL;
        }
 }
 


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: arm64: stacktrace: unwind exception boundaries
  2024-12-09 11:37   ` Mark Rutland
@ 2024-12-10 11:27     ` Mark Rutland
  2024-12-10 12:40       ` Kent Overstreet
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Rutland @ 2024-12-10 11:27 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel

On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote:
> Hi Kent,
> 
> On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote:
> > On 6.13-rc1, I'm now seeing a ton of test failures due to this warning -
> > what gives?
> 
> Sorry about this; I just sent what I *thought* was a fix for this:
> 
>   https://lore.kernel.org/linux-arm-kernel/20241209110351.1876804-1-mark.rutland@arm.com/
> 
> ... but re-reading the below I see you're actually hitting a different
> issue.
> 
> > 00104 ========= TEST   generic/017
> > 00104        run fstests generic/017 at 2024-12-05 11:47:43
> > 00104 spectre-v4 mitigation disabled by command-line option
> > 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots
> > 00106 bcachefs (vdc): initializing new filesystem
> > 00106 bcachefs (vdc): going read-write
> > 00106 bcachefs (vdc): marking superblocks
> > 00106 bcachefs (vdc): initializing freespace
> > 00106 bcachefs (vdc): done initializing freespace
> > 00106 bcachefs (vdc): reading snapshots table
> > 00106 bcachefs (vdc): reading snapshots done
> > 00106 bcachefs (vdc): done starting filesystem
> > 00200 ------------[ cut here ]------------
> > 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388
> 
> Looking at v6.13-rc1, that's the warning in
> kunwind_next_frame_record_meta() for when the frame_record_meta::type is
> not a valid value, which likely implies one of the following:
> 
> (a) The logic to identify a frame_record_meta is wrong.
> 
> (b) The entry logic has failed to initilialize pt_regs::stackframe::meta
>     on an entry path somehow.
> 
> (c) The stack has been corrupted, and some frame record has been
>     clobbered to look like a frame_record_meta.

Looking some more, I see that bch2_btree_transactions_read() is trying
to unwind other tasks, and I believe what's happening here is that the
unwindee isn't actually blocked for the duration of the unwind, leading
to the unwinder encountering junk and consequently producing the
warning.

As a test case, it's possible to trigger similar with a few parallel
instances of:

	while true; do cat /proc/*/stack > /dev/null

The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(),
which'll get rid of the splat. It seems we've never been unlucky enough
to hit a stale fgraph entry, or that would've blown up also.

Regardless of the way arm64 behaves here, the unwind performed by
bch2_btree_transactions_read() is going to contain garbage unless the
task is pinned in a blocked state. AFAICT the way
btree_trans::locking_wait::task is used is here is racy, and there's no
guarantee that the unwindee is actually blocked.

Mark.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: arm64: stacktrace: unwind exception boundaries
  2024-12-10 11:27     ` Mark Rutland
@ 2024-12-10 12:40       ` Kent Overstreet
  2024-12-10 19:17         ` Mark Rutland
  0 siblings, 1 reply; 7+ messages in thread
From: Kent Overstreet @ 2024-12-10 12:40 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel

On Tue, Dec 10, 2024 at 11:27:19AM +0000, Mark Rutland wrote:
> On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote:
> > Hi Kent,
> > 
> > On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote:
> > > On 6.13-rc1, I'm now seeing a ton of test failures due to this warning -
> > > what gives?
> > 
> > Sorry about this; I just sent what I *thought* was a fix for this:
> > 
> >   https://lore.kernel.org/linux-arm-kernel/20241209110351.1876804-1-mark.rutland@arm.com/
> > 
> > ... but re-reading the below I see you're actually hitting a different
> > issue.
> > 
> > > 00104 ========= TEST   generic/017
> > > 00104        run fstests generic/017 at 2024-12-05 11:47:43
> > > 00104 spectre-v4 mitigation disabled by command-line option
> > > 00106 bcachefs (vdc): starting version 1.13: inode_has_child_snapshots
> > > 00106 bcachefs (vdc): initializing new filesystem
> > > 00106 bcachefs (vdc): going read-write
> > > 00106 bcachefs (vdc): marking superblocks
> > > 00106 bcachefs (vdc): initializing freespace
> > > 00106 bcachefs (vdc): done initializing freespace
> > > 00106 bcachefs (vdc): reading snapshots table
> > > 00106 bcachefs (vdc): reading snapshots done
> > > 00106 bcachefs (vdc): done starting filesystem
> > > 00200 ------------[ cut here ]------------
> > > 00200 WARNING: CPU: 8 PID: 12571 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x2c0/0x388
> > 
> > Looking at v6.13-rc1, that's the warning in
> > kunwind_next_frame_record_meta() for when the frame_record_meta::type is
> > not a valid value, which likely implies one of the following:
> > 
> > (a) The logic to identify a frame_record_meta is wrong.
> > 
> > (b) The entry logic has failed to initilialize pt_regs::stackframe::meta
> >     on an entry path somehow.
> > 
> > (c) The stack has been corrupted, and some frame record has been
> >     clobbered to look like a frame_record_meta.
> 
> Looking some more, I see that bch2_btree_transactions_read() is trying
> to unwind other tasks, and I believe what's happening here is that the
> unwindee isn't actually blocked for the duration of the unwind, leading
> to the unwinder encountering junk and consequently producing the
> warning.
> 
> As a test case, it's possible to trigger similar with a few parallel
> instances of:
> 
> 	while true; do cat /proc/*/stack > /dev/null
> 
> The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(),
> which'll get rid of the splat. It seems we've never been unlucky enough
> to hit a stale fgraph entry, or that would've blown up also.
> 
> Regardless of the way arm64 behaves here, the unwind performed by
> bch2_btree_transactions_read() is going to contain garbage unless the
> task is pinned in a blocked state. AFAICT the way
> btree_trans::locking_wait::task is used is here is racy, and there's no
> guarantee that the unwindee is actually blocked.

Occasionally returning garbage is completely fine, as long as the
interface is otherwise safe. This is debug info; it's important that it
be available and we can't impose additional synchronization for it.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: arm64: stacktrace: unwind exception boundaries
  2024-12-10 12:40       ` Kent Overstreet
@ 2024-12-10 19:17         ` Mark Rutland
  2024-12-10 19:31           ` Kent Overstreet
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Rutland @ 2024-12-10 19:17 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel

On Tue, Dec 10, 2024 at 07:40:04AM -0500, Kent Overstreet wrote:
> On Tue, Dec 10, 2024 at 11:27:19AM +0000, Mark Rutland wrote:
> > On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote:
> > > On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote:

> > Looking some more, I see that bch2_btree_transactions_read() is trying
> > to unwind other tasks, and I believe what's happening here is that the
> > unwindee isn't actually blocked for the duration of the unwind, leading
> > to the unwinder encountering junk and consequently producing the
> > warning.
> > 
> > As a test case, it's possible to trigger similar with a few parallel
> > instances of:
> > 
> > 	while true; do cat /proc/*/stack > /dev/null
> > 
> > The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(),
> > which'll get rid of the splat. It seems we've never been unlucky enough
> > to hit a stale fgraph entry, or that would've blown up also.
> > 
> > Regardless of the way arm64 behaves here, the unwind performed by
> > bch2_btree_transactions_read() is going to contain garbage unless the
> > task is pinned in a blocked state. AFAICT the way
> > btree_trans::locking_wait::task is used is here is racy, and there's no
> > guarantee that the unwindee is actually blocked.
> 
> Occasionally returning garbage is completely fine, as long as the
> interface is otherwise safe. This is debug info; it's important that it
> be available and we can't impose additional synchronization for it.

Sure thing; just note that there's no guarantee that this is only
"occasionally" garbage -- this could be wrong 1% of the time or 99% of
the time depending on the specific scenario, HW it's running on, etc. As
long as you're happy to hold the pieces when that happens, that's fine.

I've pushed out fixes to the arm64/stacktrace/fixes branch on my
kernel.org git repo:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/fixes
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/stacktrace/fixes

... and I'll get that out as a series on the list tomorrow.

Mark.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: arm64: stacktrace: unwind exception boundaries
  2024-12-10 19:17         ` Mark Rutland
@ 2024-12-10 19:31           ` Kent Overstreet
  0 siblings, 0 replies; 7+ messages in thread
From: Kent Overstreet @ 2024-12-10 19:31 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-bcachefs, linux-kernel

On Tue, Dec 10, 2024 at 07:17:36PM +0000, Mark Rutland wrote:
> On Tue, Dec 10, 2024 at 07:40:04AM -0500, Kent Overstreet wrote:
> > On Tue, Dec 10, 2024 at 11:27:19AM +0000, Mark Rutland wrote:
> > > On Mon, Dec 09, 2024 at 11:37:12AM +0000, Mark Rutland wrote:
> > > > On Thu, Dec 05, 2024 at 01:04:59PM -0500, Kent Overstreet wrote:
> 
> > > Looking some more, I see that bch2_btree_transactions_read() is trying
> > > to unwind other tasks, and I believe what's happening here is that the
> > > unwindee isn't actually blocked for the duration of the unwind, leading
> > > to the unwinder encountering junk and consequently producing the
> > > warning.
> > > 
> > > As a test case, it's possible to trigger similar with a few parallel
> > > instances of:
> > > 
> > > 	while true; do cat /proc/*/stack > /dev/null
> > > 
> > > The only thing we can do on the arm64 side is remove the WARN_ON_ONCE(),
> > > which'll get rid of the splat. It seems we've never been unlucky enough
> > > to hit a stale fgraph entry, or that would've blown up also.
> > > 
> > > Regardless of the way arm64 behaves here, the unwind performed by
> > > bch2_btree_transactions_read() is going to contain garbage unless the
> > > task is pinned in a blocked state. AFAICT the way
> > > btree_trans::locking_wait::task is used is here is racy, and there's no
> > > guarantee that the unwindee is actually blocked.
> > 
> > Occasionally returning garbage is completely fine, as long as the
> > interface is otherwise safe. This is debug info; it's important that it
> > be available and we can't impose additional synchronization for it.
> 
> Sure thing; just note that there's no guarantee that this is only
> "occasionally" garbage -- this could be wrong 1% of the time or 99% of
> the time depending on the specific scenario, HW it's running on, etc. As
> long as you're happy to hold the pieces when that happens, that's fine.
> 
> I've pushed out fixes to the arm64/stacktrace/fixes branch on my
> kernel.org git repo:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/fixes
>   git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/stacktrace/fixes
> 
> ... and I'll get that out as a series on the list tomorrow.

Wonderful

The fact that /proc/pid/stack doesn't give anything if the process is in
TASK_RUNNING has been a problem in the past, yet this is something we're
not consistent on in the kernel; sysrq-trigger backtraces do give
backtraces for running processes (which then may require sorting through
everything).

So thanks for this :)


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-12-10 19:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-05 18:03 Kent Overstreet
2024-12-05 18:04 ` arm64: stacktrace: unwind exception boundaries Kent Overstreet
2024-12-09 11:37   ` Mark Rutland
2024-12-10 11:27     ` Mark Rutland
2024-12-10 12:40       ` Kent Overstreet
2024-12-10 19:17         ` Mark Rutland
2024-12-10 19:31           ` Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox