From: Nicolai Stange <nstange@suse.de>
To: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>,
linux-kernel@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
live-patching@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Fri, 11 Jan 2019 01:00:38 +0100 [thread overview]
Message-ID: <87bm4ocbbt.fsf@suse.de> (raw)
In-Reply-To: <7f468285-b149-37e2-e782-c9e538b997a9@redhat.com> (Joe Lawrence's message of "Thu, 10 Jan 2019 16:14:00 -0500")
Hi Joe,
Joe Lawrence <joe.lawrence@redhat.com> writes:
> tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> about?
If I'm reading the code in _switch() correctly, the first frame is
completely uninitialized except for the pointer back to the caller's
stack frame.
For completeness: _switch() saves the return address, i.e. the link
register into its parent's stack frame, as is mandated by the ABI and
consistent with your findings below: it's always the second stack frame
where the return address into __switch_to() is kept.
<snip>
>
>
> Example 1 (RHEL-7)
> ==================
>
> crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> ksp = 0xc0000000288af9c0
>
> crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c0000000288af9c0: c0000000288afb90 0000000000dd0000 ...(............
> c0000000288af9d0: c000000000002a94 c000000001c60a00 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
So that c000000000002a94 certainly wasn't stored by _switch(). I think
what might have happened is that the switching frame aliased with some
prior interrupt frame as setup by hardware_interrupt_common().
The interrupt and switching frames seem to share a common layout as far
as the lower STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) bytes are
concerned.
That address into hardware_interrupt_common() could have been written by
the do_IRQ() called from there.
> c0000000288af9e0: c000000001c60a80 0000000000000000 ................
> c0000000288af9f0: c0000000288afbc0 0000000000dd0000 ...(............
> c0000000288afa00: c0000000014322e0 c000000001c60a00 ."C.............
> c0000000288afa10: c0000002303ae380 c0000002303ae380 ..:0......:0....
> c0000000288afa20: 7265677368657265 0000000000002200 erehsger."......
>
> Uh-oh...
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
Aliasing of the switching stack frame with some prior interrupt stack
frame would explain why that STACK_FRAME_REGS_MARKER is still found on
the stack, i.e. it's a leftover.
For testing, could you try whether clearing the word at STACK_FRAME_MARKER
from _switch() helps?
I.e. something like (completely untested):
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 435927f549c4..b747d0647ec4 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -596,6 +596,10 @@ _GLOBAL(_switch)
SAVE_8GPRS(14, r1)
SAVE_10GPRS(22, r1)
std r0,_NIP(r1) /* Return to switch caller */
+
+ li r23,0
+ std r23,96(r1) /* 96 == STACK_FRAME_MARKER * sizeof(long) */
+
mfcr r23
std r23,_CCR(r1)
std r1,KSP(r3) /* Set old stack pointer */
<snap>
>
> save_stack_trace_tsk_reliable
> =============================
>
> arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> take into account the first stackframe, but only to verify that the link
> register is indeed pointing at kernel code address.
It's actually the other way around:
if (!firstframe && !__kernel_text_address(ip))
return 1;
So the address gets sanitized only if it's _not_ coming from the first
frame.
Thanks,
Nicolai
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
WARNING: multiple messages have this Message-ID (diff)
From: Nicolai Stange <nstange@suse.de>
To: Joe Lawrence <joe.lawrence@redhat.com>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
live-patching@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Jiri Kosina <jkosina@suse.cz>,
Balbir Singh <bsingharora@gmail.com>
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Fri, 11 Jan 2019 01:00:38 +0100 [thread overview]
Message-ID: <87bm4ocbbt.fsf@suse.de> (raw)
In-Reply-To: <7f468285-b149-37e2-e782-c9e538b997a9@redhat.com> (Joe Lawrence's message of "Thu, 10 Jan 2019 16:14:00 -0500")
Hi Joe,
Joe Lawrence <joe.lawrence@redhat.com> writes:
> tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> about?
If I'm reading the code in _switch() correctly, the first frame is
completely uninitialized except for the pointer back to the caller's
stack frame.
For completeness: _switch() saves the return address, i.e. the link
register into its parent's stack frame, as is mandated by the ABI and
consistent with your findings below: it's always the second stack frame
where the return address into __switch_to() is kept.
<snip>
>
>
> Example 1 (RHEL-7)
> ==================
>
> crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> ksp = 0xc0000000288af9c0
>
> crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c0000000288af9c0: c0000000288afb90 0000000000dd0000 ...(............
> c0000000288af9d0: c000000000002a94 c000000001c60a00 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
So that c000000000002a94 certainly wasn't stored by _switch(). I think
what might have happened is that the switching frame aliased with some
prior interrupt frame as setup by hardware_interrupt_common().
The interrupt and switching frames seem to share a common layout as far
as the lower STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) bytes are
concerned.
That address into hardware_interrupt_common() could have been written by
the do_IRQ() called from there.
> c0000000288af9e0: c000000001c60a80 0000000000000000 ................
> c0000000288af9f0: c0000000288afbc0 0000000000dd0000 ...(............
> c0000000288afa00: c0000000014322e0 c000000001c60a00 ."C.............
> c0000000288afa10: c0000002303ae380 c0000002303ae380 ..:0......:0....
> c0000000288afa20: 7265677368657265 0000000000002200 erehsger."......
>
> Uh-oh...
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
Aliasing of the switching stack frame with some prior interrupt stack
frame would explain why that STACK_FRAME_REGS_MARKER is still found on
the stack, i.e. it's a leftover.
For testing, could you try whether clearing the word at STACK_FRAME_MARKER
from _switch() helps?
I.e. something like (completely untested):
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 435927f549c4..b747d0647ec4 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -596,6 +596,10 @@ _GLOBAL(_switch)
SAVE_8GPRS(14, r1)
SAVE_10GPRS(22, r1)
std r0,_NIP(r1) /* Return to switch caller */
+
+ li r23,0
+ std r23,96(r1) /* 96 == STACK_FRAME_MARKER * sizeof(long) */
+
mfcr r23
std r23,_CCR(r1)
std r1,KSP(r3) /* Set old stack pointer */
<snap>
>
> save_stack_trace_tsk_reliable
> =============================
>
> arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> take into account the first stackframe, but only to verify that the link
> register is indeed pointing at kernel code address.
It's actually the other way around:
if (!firstframe && !__kernel_text_address(ip))
return 1;
So the address gets sanitized only if it's _not_ coming from the first
frame.
Thanks,
Nicolai
--
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)
next prev parent reply other threads:[~2019-01-11 0:04 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-10 21:14 ppc64le reliable stack unwinder and scheduled tasks Joe Lawrence
2019-01-10 21:14 ` Joe Lawrence
2019-01-11 0:00 ` Nicolai Stange [this message]
2019-01-11 0:00 ` Nicolai Stange
2019-01-11 1:08 ` Joe Lawrence
2019-01-11 1:08 ` Joe Lawrence
2019-01-11 7:51 ` Nicolai Stange
2019-01-11 7:51 ` Nicolai Stange
2019-01-14 4:09 ` Joe Lawrence
2019-01-14 4:09 ` Joe Lawrence
2019-01-14 7:21 ` Nicolai Stange
2019-01-14 7:21 ` Nicolai Stange
2019-01-14 16:46 ` Joe Lawrence
2019-01-14 16:46 ` Joe Lawrence
2019-01-14 17:09 ` Josh Poimboeuf
2019-01-14 17:09 ` Josh Poimboeuf
2019-01-14 17:54 ` Joe Lawrence
2019-01-14 17:54 ` Joe Lawrence
2019-01-12 1:09 ` Balbir Singh
2019-01-12 1:09 ` Balbir Singh
2019-01-12 8:45 ` Segher Boessenkool
2019-01-13 12:33 ` Balbir Singh
2019-01-13 13:05 ` Torsten Duwe
2019-01-13 13:05 ` Torsten Duwe
2019-01-17 14:52 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87bm4ocbbt.fsf@suse.de \
--to=nstange@suse.de \
--cc=duwe@lst.de \
--cc=jkosina@suse.cz \
--cc=joe.lawrence@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=live-patching@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.