From: Balbir Singh <bsingharora@gmail.com>
To: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Jiri Kosina <jkosina@suse.cz>,
linux-kernel@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
live-patching@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Sat, 12 Jan 2019 12:09:14 +1100 [thread overview]
Message-ID: <20190112010914.GB10427@350D> (raw)
In-Reply-To: <7f468285-b149-37e2-e782-c9e538b997a9@redhat.com>
On Thu, Jan 10, 2019 at 04:14:00PM -0500, Joe Lawrence wrote:
> Hi all,
>
> tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> about?
>
> I am looking at a bug in which ~10% of livepatch tests on RHEL-7 and
> RHEL-8 distro kernels, the ppc64le reliable stack unwinder consistently
> reports an unreliable stack for a given task. This condition can
> eventually resolve itself, but sometimes this state remains wedged for
> hours and I can live-debug the system with crash-utility.
>
In summary, you think the reliable stack tracer is being conservative?
xmon (built in debugger) also allows for live-debugging and might
be more easier to get some of this done.
> I have tried reproducing with upstream 4.20 kernel, but haven't seen
> this exact condition yet. More on upstream in a bit.
>
> That said, I have a question about the ppc64 stack frame layout with
> regard to scheduled tasks. In each sticky "unreliable" stack trace
> instance that I've looked at, the task's stack has an interesting
> frame at the top:
>
Could you please define interesting frame on top a bit more? Usually
the topmost return address is in LR
>
> Example 1 (RHEL-7)
> ==================
>
> crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> ksp = 0xc0000000288af9c0
>
> crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c0000000288af9c0: c0000000288afb90 0000000000dd0000 ...(............
> c0000000288af9d0: c000000000002a94 c000000001c60a00 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
What does bt look like in this case? what does r1 point to? I would
look at xmon and see what the "t" (stack trace) looks like, I think
it's a more familiar tool.
>
> c0000000288af9e0: c000000001c60a80 0000000000000000 ................
> c0000000288af9f0: c0000000288afbc0 0000000000dd0000 ...(............
> c0000000288afa00: c0000000014322e0 c000000001c60a00 ."C.............
> c0000000288afa10: c0000002303ae380 c0000002303ae380 ..:0......:0....
> c0000000288afa20: 7265677368657265 0000000000002200 erehsger."......
>
> Uh-oh...
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
>
> c0000000288afa30: c0000000001240ac c0000000288afcb0 .@.........(....
> c0000000288afa40: c0000000013e4d00 0000000000000000 .M>.............
> c0000000288afa50: 0000000000000001 0000000000000001 ................
> c0000000288afa60: c0000000014322e0 0000000000000804 ."C.............
> c0000000288afa70: c0000000288ac080 c00000022fd015c0 ...(......./....
> c0000000288afa80: c0000000288afae0 00000000ffffffff ...(............
> c0000000288afa90: c0000000288afae0 c000000007b80900 ...(............
> c0000000288afaa0: c000000000e90a00 c000000000e90a00 ................
> c0000000288afab0: c0000000001240ac c000000000e90a00 .@..............
> c0000000288afac0: c000000000e90a00 c000000004790580 ..........y.....
> c0000000288afad0: 0000000000000001 c000000000e90a00 ................
> c0000000288afae0: 0000000000000000 0000000000000004 ................
> c0000000288afaf0: c00000022fd01ad0 c000000000e73be0 .../.....;......
> c0000000288afb00: c00000023900f450 c000000001488190 P..9......H.....
> c0000000288afb10: 0000000000ad0000 c00000023900ef40 ........@..9....
> c0000000288afb20: c0000000288ac000 c000000000e73be0 ...(.....;......
> c0000000288afb30: c00000000001b514 c00000022fd01628 ........(../....
> c0000000288afb40: c0000000288afbb0 c000000000008800 ...(............
> c0000000288afb50: c000000000162880 0000000000000000 .(..............
> c0000000288afb60: 00000000240f3022 0000000000000004 "0.$............
> c0000000288afb70: c000000000e90a00 c00000022fd01a20 ........ ../....
> c0000000288afb80: c0000000288afbf0 c0000000014322e0 ...(....."C.....
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
>
> c0000000288afb90: c0000000288afbf0 c000000001960a00 ...(............
> c0000000288afba0: c00000000001b514 0000000000000004 ................
>
> crash> sym c00000000001b514
> c00000000001b514 (T) __switch_to+0x264
>
> c0000000288afbb0: c000000000e90a00 0000000000000000 ................
> c0000000288afbc0: c0000000288ac000 c0000000014322e0 ...(....."C.....
> c0000000288afbd0: c000000000e90a00 c000000001960a00 ................
> c0000000288afbe0: c00000022fd015c0 c00000023900ef40 .../....@..9....
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
>
> c0000000288afbf0: c0000000288afcd0 c000000000003300 ...(.....3......
> c0000000288afc00: c000000000a83918 c0000000013e4d00 .9.......M>.....
>
> crash> sym c000000000a83918
> c000000000a83918 (t) __schedule+0x428
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[3]:
>
> c0000000288afcd0: c0000000288afd80 0000000000003300 ...(.....3......
> c0000000288afce0: c0000000001240ac 0000000000000000 .@..............
>
> crash> sym c0000000001240ac
> c0000000001240ac (t) rescuer_thread+0x3ac
>
> [ ... and so on as we would normally expect ... ]
>
>
> Example 2 - (RHEL-7)
> ====================
>
> crash> struct task_struct.thread c00000004a660a00 | grep ksp
> ksp = 0xc0000005e85439c0,
>
> crash> rd 0xc0000005e8543b90 -e 0xc0000005e8544000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
> c0000005e85439c0: c0000005e8543b90 0000000000000000 .;T.............
> c0000005e85439d0: c000000000002a94 0000000000000000 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
>
> c0000005e8543a20: 7265677368657265 c0000000013e4d00 erehsger.M>.....
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
> c0000005e8543b90: c0000005e8543bf0 c000000001960a00 .;T.............
> c0000005e8543ba0: c00000000001b514 0000000000000004 ................
>
> crash> sym c00000000001b514
> c00000000001b514 (T) __switch_to+0x264
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
> c0000005e8543bf0: c0000005e8543cd0 c000000000003300 .<T......3......
> c0000005e8543c00: c000000000a83918 c0000000013e4d00 .9.......M>.....
>
> crash> sym c000000000a83918
> c000000000a83918 (t) __schedule+0x428
>
> [ ... and so on as we would normally expect ... ]
>
>
> Example 3 (upstream 4.20)
> =========================
>
> This is not a repro of the sticky "unreliable" stack trace, but just the
> stack of a workqueue that fired and called msleep.
>
>
> crash> struct task_struct.thread c0000002257f0e00 | grep ksp
> ksp = 0xc00000022715f9b0,
>
> crash> rd 0xc00000022715f9b0 100
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c00000022715f9b0: c00000022715fb90 c00000023f4e8280 ...'......N?....
> c00000022715f9c0: c00000022715f9e0 0000000000000003 ...'............
>
> crash> sym c00000022715f9e0
> c00000022715f9e0 (B) _end+0x22594f9e0
That is weird, can you compare this to the output of "bt" or "t"
from xmon?
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
>
> c00000022715fb90: c00000022715fbf0 c00000000158634c ...'....LcX.....
> c00000022715fba0: c00000000001ebfc c000000001318a60 ........`.1.....
>
> crash> sym c00000000001ebfc
> c00000000001ebfc (T) __switch_to+0x2dc
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
>
> c00000022715fbf0: c00000022715fcd0 c000000000008800 ...'............
> c00000022715fc00: c000000000abcf40 c0000000013abf00 @.........:.....
>
> crash> sym c000000000abcf40
> c000000000abcf40 (t) __schedule+0x3b0
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[3]:
>
> c00000022715fcd0: c00000022715fd00 2c50fff000000000 ...'..........P,
> c00000022715fce0: c000000000abd628 c00000023bff57e8 (........W.;....
>
> crash> sym c000000000abd628
> c000000000abd628 (T) schedule+0x48
>
> [ ... etc ... ]
>
>
> save_stack_trace_tsk_reliable
> =============================
>
> arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> take into account the first stackframe, but only to verify that the link
> register is indeed pointing at kernel code address.
>
> Can someone explain what __switch_to is doing with the stack and whether
> in such circumstances, the reliable stack unwinder should be skipping
> the first frame when verifying the frame contents like STACK_FRAME_MARKER,
> etc.
>
> I may be on the wrong path in debugging this, but figuring out this
> sp[0] frame state would be helpful.
>
I would compare the output of xmon across the unreliable stack frames with
the contents of what the stack unwinder has.
I suspect the patch is stuck trying to transition to enabled state, it'll
be interesting to see if we are really stuck
Balbir Singh.
WARNING: multiple messages have this Message-ID (diff)
From: Balbir Singh <bsingharora@gmail.com>
To: Joe Lawrence <joe.lawrence@redhat.com>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
live-patching@vger.kernel.org, Torsten Duwe <duwe@lst.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Jiri Kosina <jkosina@suse.cz>
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks
Date: Sat, 12 Jan 2019 12:09:14 +1100 [thread overview]
Message-ID: <20190112010914.GB10427@350D> (raw)
In-Reply-To: <7f468285-b149-37e2-e782-c9e538b997a9@redhat.com>
On Thu, Jan 10, 2019 at 04:14:00PM -0500, Joe Lawrence wrote:
> Hi all,
>
> tl;dr: On ppc64le, what is top-most stack frame for scheduled tasks
> about?
>
> I am looking at a bug in which ~10% of livepatch tests on RHEL-7 and
> RHEL-8 distro kernels, the ppc64le reliable stack unwinder consistently
> reports an unreliable stack for a given task. This condition can
> eventually resolve itself, but sometimes this state remains wedged for
> hours and I can live-debug the system with crash-utility.
>
In summary, you think the reliable stack tracer is being conservative?
xmon (built in debugger) also allows for live-debugging and might
be more easier to get some of this done.
> I have tried reproducing with upstream 4.20 kernel, but haven't seen
> this exact condition yet. More on upstream in a bit.
>
> That said, I have a question about the ppc64 stack frame layout with
> regard to scheduled tasks. In each sticky "unreliable" stack trace
> instance that I've looked at, the task's stack has an interesting
> frame at the top:
>
Could you please define interesting frame on top a bit more? Usually
the topmost return address is in LR
>
> Example 1 (RHEL-7)
> ==================
>
> crash> struct task_struct.thread c00000022fd015c0 | grep ksp
> ksp = 0xc0000000288af9c0
>
> crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c0000000288af9c0: c0000000288afb90 0000000000dd0000 ...(............
> c0000000288af9d0: c000000000002a94 c000000001c60a00 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
What does bt look like in this case? what does r1 point to? I would
look at xmon and see what the "t" (stack trace) looks like, I think
it's a more familiar tool.
>
> c0000000288af9e0: c000000001c60a80 0000000000000000 ................
> c0000000288af9f0: c0000000288afbc0 0000000000dd0000 ...(............
> c0000000288afa00: c0000000014322e0 c000000001c60a00 ."C.............
> c0000000288afa10: c0000002303ae380 c0000002303ae380 ..:0......:0....
> c0000000288afa20: 7265677368657265 0000000000002200 erehsger."......
>
> Uh-oh...
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
>
> c0000000288afa30: c0000000001240ac c0000000288afcb0 .@.........(....
> c0000000288afa40: c0000000013e4d00 0000000000000000 .M>.............
> c0000000288afa50: 0000000000000001 0000000000000001 ................
> c0000000288afa60: c0000000014322e0 0000000000000804 ."C.............
> c0000000288afa70: c0000000288ac080 c00000022fd015c0 ...(......./....
> c0000000288afa80: c0000000288afae0 00000000ffffffff ...(............
> c0000000288afa90: c0000000288afae0 c000000007b80900 ...(............
> c0000000288afaa0: c000000000e90a00 c000000000e90a00 ................
> c0000000288afab0: c0000000001240ac c000000000e90a00 .@..............
> c0000000288afac0: c000000000e90a00 c000000004790580 ..........y.....
> c0000000288afad0: 0000000000000001 c000000000e90a00 ................
> c0000000288afae0: 0000000000000000 0000000000000004 ................
> c0000000288afaf0: c00000022fd01ad0 c000000000e73be0 .../.....;......
> c0000000288afb00: c00000023900f450 c000000001488190 P..9......H.....
> c0000000288afb10: 0000000000ad0000 c00000023900ef40 ........@..9....
> c0000000288afb20: c0000000288ac000 c000000000e73be0 ...(.....;......
> c0000000288afb30: c00000000001b514 c00000022fd01628 ........(../....
> c0000000288afb40: c0000000288afbb0 c000000000008800 ...(............
> c0000000288afb50: c000000000162880 0000000000000000 .(..............
> c0000000288afb60: 00000000240f3022 0000000000000004 "0.$............
> c0000000288afb70: c000000000e90a00 c00000022fd01a20 ........ ../....
> c0000000288afb80: c0000000288afbf0 c0000000014322e0 ...(....."C.....
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
>
> c0000000288afb90: c0000000288afbf0 c000000001960a00 ...(............
> c0000000288afba0: c00000000001b514 0000000000000004 ................
>
> crash> sym c00000000001b514
> c00000000001b514 (T) __switch_to+0x264
>
> c0000000288afbb0: c000000000e90a00 0000000000000000 ................
> c0000000288afbc0: c0000000288ac000 c0000000014322e0 ...(....."C.....
> c0000000288afbd0: c000000000e90a00 c000000001960a00 ................
> c0000000288afbe0: c00000022fd015c0 c00000023900ef40 .../....@..9....
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
>
> c0000000288afbf0: c0000000288afcd0 c000000000003300 ...(.....3......
> c0000000288afc00: c000000000a83918 c0000000013e4d00 .9.......M>.....
>
> crash> sym c000000000a83918
> c000000000a83918 (t) __schedule+0x428
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[3]:
>
> c0000000288afcd0: c0000000288afd80 0000000000003300 ...(.....3......
> c0000000288afce0: c0000000001240ac 0000000000000000 .@..............
>
> crash> sym c0000000001240ac
> c0000000001240ac (t) rescuer_thread+0x3ac
>
> [ ... and so on as we would normally expect ... ]
>
>
> Example 2 - (RHEL-7)
> ====================
>
> crash> struct task_struct.thread c00000004a660a00 | grep ksp
> ksp = 0xc0000005e85439c0,
>
> crash> rd 0xc0000005e8543b90 -e 0xc0000005e8544000
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
> c0000005e85439c0: c0000005e8543b90 0000000000000000 .;T.............
> c0000005e85439d0: c000000000002a94 0000000000000000 .*..............
>
> crash> sym c000000000002a94
> c000000000002a94 (T) hardware_interrupt_common+0x114
>
> c0000005e8543a20: 7265677368657265 c0000000013e4d00 erehsger.M>.....
>
> /* Mark stacktraces with exception frames as unreliable. */
> stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
> c0000005e8543b90: c0000005e8543bf0 c000000001960a00 .;T.............
> c0000005e8543ba0: c00000000001b514 0000000000000004 ................
>
> crash> sym c00000000001b514
> c00000000001b514 (T) __switch_to+0x264
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
> c0000005e8543bf0: c0000005e8543cd0 c000000000003300 .<T......3......
> c0000005e8543c00: c000000000a83918 c0000000013e4d00 .9.......M>.....
>
> crash> sym c000000000a83918
> c000000000a83918 (t) __schedule+0x428
>
> [ ... and so on as we would normally expect ... ]
>
>
> Example 3 (upstream 4.20)
> =========================
>
> This is not a repro of the sticky "unreliable" stack trace, but just the
> stack of a workqueue that fired and called msleep.
>
>
> crash> struct task_struct.thread c0000002257f0e00 | grep ksp
> ksp = 0xc00000022715f9b0,
>
> crash> rd 0xc00000022715f9b0 100
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[0]:
>
> c00000022715f9b0: c00000022715fb90 c00000023f4e8280 ...'......N?....
> c00000022715f9c0: c00000022715f9e0 0000000000000003 ...'............
>
> crash> sym c00000022715f9e0
> c00000022715f9e0 (B) _end+0x22594f9e0
That is weird, can you compare this to the output of "bt" or "t"
from xmon?
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[1]:
>
> c00000022715fb90: c00000022715fbf0 c00000000158634c ...'....LcX.....
> c00000022715fba0: c00000000001ebfc c000000001318a60 ........`.1.....
>
> crash> sym c00000000001ebfc
> c00000000001ebfc (T) __switch_to+0x2dc
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[2]:
>
> c00000022715fbf0: c00000022715fcd0 c000000000008800 ...'............
> c00000022715fc00: c000000000abcf40 c0000000013abf00 @.........:.....
>
> crash> sym c000000000abcf40
> c000000000abcf40 (t) __schedule+0x3b0
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> sp[3]:
>
> c00000022715fcd0: c00000022715fd00 2c50fff000000000 ...'..........P,
> c00000022715fce0: c000000000abd628 c00000023bff57e8 (........W.;....
>
> crash> sym c000000000abd628
> c000000000abd628 (T) schedule+0x48
>
> [ ... etc ... ]
>
>
> save_stack_trace_tsk_reliable
> =============================
>
> arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> take into account the first stackframe, but only to verify that the link
> register is indeed pointing at kernel code address.
>
> Can someone explain what __switch_to is doing with the stack and whether
> in such circumstances, the reliable stack unwinder should be skipping
> the first frame when verifying the frame contents like STACK_FRAME_MARKER,
> etc.
>
> I may be on the wrong path in debugging this, but figuring out this
> sp[0] frame state would be helpful.
>
I would compare the output of xmon across the unreliable stack frames with
the contents of what the stack unwinder has.
I suspect the patch is stuck trying to transition to enabled state, it'll
be interesting to see if we are really stuck
Balbir Singh.
next prev parent reply other threads:[~2019-01-12 1:10 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-10 21:14 ppc64le reliable stack unwinder and scheduled tasks Joe Lawrence
2019-01-10 21:14 ` Joe Lawrence
2019-01-11 0:00 ` Nicolai Stange
2019-01-11 0:00 ` Nicolai Stange
2019-01-11 1:08 ` Joe Lawrence
2019-01-11 1:08 ` Joe Lawrence
2019-01-11 7:51 ` Nicolai Stange
2019-01-11 7:51 ` Nicolai Stange
2019-01-14 4:09 ` Joe Lawrence
2019-01-14 4:09 ` Joe Lawrence
2019-01-14 7:21 ` Nicolai Stange
2019-01-14 7:21 ` Nicolai Stange
2019-01-14 16:46 ` Joe Lawrence
2019-01-14 16:46 ` Joe Lawrence
2019-01-14 17:09 ` Josh Poimboeuf
2019-01-14 17:09 ` Josh Poimboeuf
2019-01-14 17:54 ` Joe Lawrence
2019-01-14 17:54 ` Joe Lawrence
2019-01-12 1:09 ` Balbir Singh [this message]
2019-01-12 1:09 ` Balbir Singh
2019-01-12 8:45 ` Segher Boessenkool
2019-01-13 12:33 ` Balbir Singh
2019-01-13 13:05 ` Torsten Duwe
2019-01-13 13:05 ` Torsten Duwe
2019-01-17 14:52 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190112010914.GB10427@350D \
--to=bsingharora@gmail.com \
--cc=duwe@lst.de \
--cc=jkosina@suse.cz \
--cc=joe.lawrence@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=live-patching@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.