All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Lawrence <joe.lawrence@redhat.com>
To: Nicolai Stange <nstange@suse.de>
Cc: Jiri Kosina <jkosina@suse.cz>,
	linux-kernel@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
	live-patching@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Thu, 10 Jan 2019 20:08:08 -0500	[thread overview]
Message-ID: <20190111010808.GA17858@redhat.com> (raw)
In-Reply-To: <87bm4ocbbt.fsf@suse.de>

On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote:
> Hi Joe,
> 
> Joe Lawrence <joe.lawrence@redhat.com> writes:
> 
> > tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> >        about?
> 
> If I'm reading the code in _switch() correctly, the first frame is
> completely uninitialized except for the pointer back to the caller's
> stack frame.
> 
> For completeness: _switch() saves the return address, i.e. the link
> register into its parent's stack frame, as is mandated by the ABI and
> consistent with your findings below: it's always the second stack frame
> where the return address into __switch_to() is kept.
>

Hi Nicolai,

Good, that makes a lot of sense.  I couldn't find any reference
explaining the contents of frame 0, only unwinding code here and there
(as in crash-utility) that stepped over it.
 
> <snip>
> 
> >
> >
> > Example 1 (RHEL-7)
> > ==================
> >
> > crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> >     ksp = 0xc0000000288af9c0
> >
> > crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
> >
> >  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> > sp[0]:
> >
> > c0000000288af9c0:  c0000000288afb90 0000000000dd0000   ...(............
> > c0000000288af9d0:  c000000000002a94 c000000001c60a00   .*..............
> >
> >         crash> sym c000000000002a94
> >         c000000000002a94 (T) hardware_interrupt_common+0x114
> 
> So that c000000000002a94 certainly wasn't stored by _switch(). I think
> what might have happened is that the switching frame aliased with some
> prior interrupt frame as setup by hardware_interrupt_common().
> 
> The interrupt and switching frames seem to share a common layout as far
> as the lower STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) bytes are
> concerned.
> 
> That address into hardware_interrupt_common() could have been written by
> the do_IRQ() called from there.
> 

That was my initial theory, but then when I saw an ordinary scheduled
task with a similarly strange frame 0, I thought that _switch() might
have been doing something clever (or not).  But according your earlier
explanation, it would line up that these values may be inherited from
do_IRQ() or the like.

> 
> > c0000000288af9e0:  c000000001c60a80 0000000000000000   ................
> > c0000000288af9f0:  c0000000288afbc0 0000000000dd0000   ...(............
> > c0000000288afa00:  c0000000014322e0 c000000001c60a00   ."C.............
> > c0000000288afa10:  c0000002303ae380 c0000002303ae380   ..:0......:0....
> > c0000000288afa20:  7265677368657265 0000000000002200   erehsger."......
> >
> >         Uh-oh...
> >
> >         /* Mark stacktraces with exception frames as unreliable. */
> >         stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
> 
> 
> Aliasing of the switching stack frame with some prior interrupt stack
> frame would explain why that STACK_FRAME_REGS_MARKER is still found on
> the stack, i.e. it's a leftover.
> 
> For testing, could you try whether clearing the word at STACK_FRAME_MARKER
> from _switch() helps?
> 
> I.e. something like (completely untested):

I'll kick off some builds tonight and try to get tests lined up
tomorrow.  Unfortunately these take a bit of time to run setup, schedule
and complete, so perhaps by next week.

> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 435927f549c4..b747d0647ec4 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -596,6 +596,10 @@ _GLOBAL(_switch)
>  	SAVE_8GPRS(14, r1)
>  	SAVE_10GPRS(22, r1)
>  	std	r0,_NIP(r1)	/* Return to switch caller */
> +
> +	li	r23,0
> +	std	r23,96(r1)	/* 96 == STACK_FRAME_MARKER * sizeof(long) */
> +
>  	mfcr	r23
>  	std	r23,_CCR(r1)
>  	std	r1,KSP(r3)	/* Set old stack pointer */
> 
> 

This may be sufficient to avoid the condition, but if the contents of
frame 0 are truely uninitialized (not to be trusted), should the
unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER),
aside from the LR and other stack size geometry sanity checks?

> <snap>
> 
> >
> > save_stack_trace_tsk_reliable
> > =============================
> >
> > arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> > take into account the first stackframe, but only to verify that the link
> > register is indeed pointing at kernel code address.
> 
> It's actually the other way around:
> 
> 	if (!firstframe && !__kernel_text_address(ip))
> 		return 1;
> 
> 
> So the address gets sanitized only if it's _not_ coming from the first
> frame.

Yup, that's right, I had it backwards.

Thanks!

-- Joe

WARNING: multiple messages have this Message-ID (diff)
From: Joe Lawrence <joe.lawrence@redhat.com>
To: Nicolai Stange <nstange@suse.de>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	live-patching@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Jiri Kosina <jkosina@suse.cz>,
	Balbir Singh <bsingharora@gmail.com>
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Thu, 10 Jan 2019 20:08:08 -0500	[thread overview]
Message-ID: <20190111010808.GA17858@redhat.com> (raw)
In-Reply-To: <87bm4ocbbt.fsf@suse.de>

On Fri, Jan 11, 2019 at 01:00:38AM +0100, Nicolai Stange wrote:
> Hi Joe,
> 
> Joe Lawrence <joe.lawrence@redhat.com> writes:
> 
> > tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> >        about?
> 
> If I'm reading the code in _switch() correctly, the first frame is
> completely uninitialized except for the pointer back to the caller's
> stack frame.
> 
> For completeness: _switch() saves the return address, i.e. the link
> register into its parent's stack frame, as is mandated by the ABI and
> consistent with your findings below: it's always the second stack frame
> where the return address into __switch_to() is kept.
>

Hi Nicolai,

Good, that makes a lot of sense.  I couldn't find any reference
explaining the contents of frame 0, only unwinding code here and there
(as in crash-utility) that stepped over it.
 
> <snip>
> 
> >
> >
> > Example 1 (RHEL-7)
> > ==================
> >
> > crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> >     ksp = 0xc0000000288af9c0
> >
> > crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
> >
> >  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >
> > sp[0]:
> >
> > c0000000288af9c0:  c0000000288afb90 0000000000dd0000   ...(............
> > c0000000288af9d0:  c000000000002a94 c000000001c60a00   .*..............
> >
> >         crash> sym c000000000002a94
> >         c000000000002a94 (T) hardware_interrupt_common+0x114
> 
> So that c000000000002a94 certainly wasn't stored by _switch(). I think
> what might have happened is that the switching frame aliased with some
> prior interrupt frame as setup by hardware_interrupt_common().
> 
> The interrupt and switching frames seem to share a common layout as far
> as the lower STACK_FRAME_OVERHEAD + sizeof(struct pt_regs) bytes are
> concerned.
> 
> That address into hardware_interrupt_common() could have been written by
> the do_IRQ() called from there.
> 

That was my initial theory, but then when I saw an ordinary scheduled
task with a similarly strange frame 0, I thought that _switch() might
have been doing something clever (or not).  But according your earlier
explanation, it would line up that these values may be inherited from
do_IRQ() or the like.

> 
> > c0000000288af9e0:  c000000001c60a80 0000000000000000   ................
> > c0000000288af9f0:  c0000000288afbc0 0000000000dd0000   ...(............
> > c0000000288afa00:  c0000000014322e0 c000000001c60a00   ."C.............
> > c0000000288afa10:  c0000002303ae380 c0000002303ae380   ..:0......:0....
> > c0000000288afa20:  7265677368657265 0000000000002200   erehsger."......
> >
> >         Uh-oh...
> >
> >         /* Mark stacktraces with exception frames as unreliable. */
> >         stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
> 
> 
> Aliasing of the switching stack frame with some prior interrupt stack
> frame would explain why that STACK_FRAME_REGS_MARKER is still found on
> the stack, i.e. it's a leftover.
> 
> For testing, could you try whether clearing the word at STACK_FRAME_MARKER
> from _switch() helps?
> 
> I.e. something like (completely untested):

I'll kick off some builds tonight and try to get tests lined up
tomorrow.  Unfortunately these take a bit of time to run setup, schedule
and complete, so perhaps by next week.

> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 435927f549c4..b747d0647ec4 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -596,6 +596,10 @@ _GLOBAL(_switch)
>  	SAVE_8GPRS(14, r1)
>  	SAVE_10GPRS(22, r1)
>  	std	r0,_NIP(r1)	/* Return to switch caller */
> +
> +	li	r23,0
> +	std	r23,96(r1)	/* 96 == STACK_FRAME_MARKER * sizeof(long) */
> +
>  	mfcr	r23
>  	std	r23,_CCR(r1)
>  	std	r1,KSP(r3)	/* Set old stack pointer */
> 
> 

This may be sufficient to avoid the condition, but if the contents of
frame 0 are truely uninitialized (not to be trusted), should the
unwinder be even looking at that frame (for STACK_FRAMES_REGS_MARKER),
aside from the LR and other stack size geometry sanity checks?

> <snap>
> 
> >
> > save_stack_trace_tsk_reliable
> > =============================
> >
> > arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> > take into account the first stackframe, but only to verify that the link
> > register is indeed pointing at kernel code address.
> 
> It's actually the other way around:
> 
> 	if (!firstframe && !__kernel_text_address(ip))
> 		return 1;
> 
> 
> So the address gets sanitized only if it's _not_ coming from the first
> frame.

Yup, that's right, I had it backwards.

Thanks!

-- Joe

  reply	other threads:[~2019-01-11  1:09 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-10 21:14 ppc64le reliable stack unwinder and scheduled tasks Joe Lawrence
2019-01-10 21:14 ` Joe Lawrence
2019-01-11  0:00 ` Nicolai Stange
2019-01-11  0:00   ` Nicolai Stange
2019-01-11  1:08   ` Joe Lawrence [this message]
2019-01-11  1:08     ` Joe Lawrence
2019-01-11  7:51     ` Nicolai Stange
2019-01-11  7:51       ` Nicolai Stange
2019-01-14  4:09       ` Joe Lawrence
2019-01-14  4:09         ` Joe Lawrence
2019-01-14  7:21         ` Nicolai Stange
2019-01-14  7:21           ` Nicolai Stange
2019-01-14 16:46           ` Joe Lawrence
2019-01-14 16:46             ` Joe Lawrence
2019-01-14 17:09             ` Josh Poimboeuf
2019-01-14 17:09               ` Josh Poimboeuf
2019-01-14 17:54               ` Joe Lawrence
2019-01-14 17:54                 ` Joe Lawrence
2019-01-12  1:09 ` Balbir Singh
2019-01-12  1:09   ` Balbir Singh
2019-01-12  8:45   ` Segher Boessenkool
2019-01-13 12:33     ` Balbir Singh
2019-01-13 13:05       ` Torsten Duwe
2019-01-13 13:05         ` Torsten Duwe
2019-01-17 14:52       ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190111010808.GA17858@redhat.com \
    --to=joe.lawrence@redhat.com \
    --cc=duwe@lst.de \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=live-patching@vger.kernel.org \
    --cc=nstange@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.