linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Recent Power changes and stack_trace_save_tsk_reliable?
@ 2023-08-29 15:12 Joe Lawrence
  2023-08-30  0:32 ` Michael Ellerman
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Lawrence @ 2023-08-29 15:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: live-patching, Joe Lawrence, Ryan Sullivan

Hi ppc-dev list,

We noticed that our kpatch integration tests started failing on ppc64le
when targeting the upstream v6.4 kernel, and then confirmed that the
in-tree livepatching kselftests similarly fail, too.  From the kselftest
results, it appears that livepatch transitions are no longer completing.

Looking at the commit logs for v6.4, there looks to be some churn in the
powerpc stack layout code -- I am suspicious that "reliable" stack
unwinding may be left untested/broken after those changes.  AFAICT, the
livepatching subsystem is the only user of this interface in
kernel/livepatch/transition.c :: klp_check_stack()'s call to
stack_trace_save_tsk_reliable().  As such, the livepatching kselftests
are probably the only way to test reliable unwinding.

Unfortunately, git bisect isn't cooperating (we keep falling into a long
span of non-bootable commits, despite efforts to `git bisect skip` over
them), so we don't have an offending commit or patchset to point to.

A few other details:

- Test machine is a Power 9 9009-42A (IBM Power System S924)
- Reproducable with v6.4, v6.5
- Minimal repro:
-- Build with CONFIG_TEST_LIVEPATCH=m
-- Run tools/testing/selftests/livepatch/test-livepatch.sh

If this has already been report or fixed, please send any pointers to
threads / commits.  If not, I can provide any other info to help
reproduce.

Thanks,

--
Joe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Recent Power changes and stack_trace_save_tsk_reliable?
  2023-08-29 15:12 Recent Power changes and stack_trace_save_tsk_reliable? Joe Lawrence
@ 2023-08-30  0:32 ` Michael Ellerman
  2023-08-30  6:37   ` Michael Ellerman
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2023-08-30  0:32 UTC (permalink / raw)
  To: Joe Lawrence, linuxppc-dev; +Cc: live-patching, Joe Lawrence, Ryan Sullivan

Joe Lawrence <joe.lawrence@redhat.com> writes:
> Hi ppc-dev list,
>
> We noticed that our kpatch integration tests started failing on ppc64le
> when targeting the upstream v6.4 kernel, and then confirmed that the
> in-tree livepatching kselftests similarly fail, too.  From the kselftest
> results, it appears that livepatch transitions are no longer completing.

Hi Joe,

Thanks for the report.

I thought I was running the livepatch tests, but looks like somewhere
along the line my kernel .config lost CONFIG_TEST_LIVEPATCH=m, so I have
been running the test but it just skips. :/

I can reproduce the failure, and will see if I can bisect it more
successfully.

cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Recent Power changes and stack_trace_save_tsk_reliable?
  2023-08-30  0:32 ` Michael Ellerman
@ 2023-08-30  6:37   ` Michael Ellerman
  2023-08-30 21:47     ` Joe Lawrence
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2023-08-30  6:37 UTC (permalink / raw)
  To: Joe Lawrence, linuxppc-dev
  Cc: live-patching, Joe Lawrence, Ryan Sullivan, Nicholas Piggin

Michael Ellerman <mpe@ellerman.id.au> writes:
> Joe Lawrence <joe.lawrence@redhat.com> writes:
>> Hi ppc-dev list,
>>
>> We noticed that our kpatch integration tests started failing on ppc64le
>> when targeting the upstream v6.4 kernel, and then confirmed that the
>> in-tree livepatching kselftests similarly fail, too.  From the kselftest
>> results, it appears that livepatch transitions are no longer completing.
>
> Hi Joe,
>
> Thanks for the report.
>
> I thought I was running the livepatch tests, but looks like somewhere
> along the line my kernel .config lost CONFIG_TEST_LIVEPATCH=m, so I have
> been running the test but it just skips. :/
>
> I can reproduce the failure, and will see if I can bisect it more
> successfully.

It's caused by:

  eed7c420aac7 ("powerpc: copy_thread differentiate kthreads and user mode threads")

Which is obvious in hindsight :)

The diff below fixes it for me, can you test that on your setup?

A proper fix will need to be a bit bigger because the comments in there
are all slightly wrong now since the above commit.

Possibly we can also rework that code more substantially now that
copy_thread() is more careful about setting things up, but that would be
a follow-up.

diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 5de8597eaab8..d0b3509f13ee 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -73,7 +73,7 @@ int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consum
 	bool firstframe;
 
 	stack_end = stack_page + THREAD_SIZE;
-	if (!is_idle_task(task)) {
+	if (!(task->flags & PF_KTHREAD)) {
 		/*
 		 * For user tasks, this is the SP value loaded on
 		 * kernel entry, see "PACAKSAVE(r13)" in _switch() and


cheers

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Recent Power changes and stack_trace_save_tsk_reliable?
  2023-08-30  6:37   ` Michael Ellerman
@ 2023-08-30 21:47     ` Joe Lawrence
  2023-09-20 14:16       ` Petr Mladek
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Lawrence @ 2023-08-30 21:47 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
  Cc: live-patching, Ryan Sullivan, Nicholas Piggin

On 8/30/23 02:37, Michael Ellerman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Joe Lawrence <joe.lawrence@redhat.com> writes:
>>> Hi ppc-dev list,
>>>
>>> We noticed that our kpatch integration tests started failing on ppc64le
>>> when targeting the upstream v6.4 kernel, and then confirmed that the
>>> in-tree livepatching kselftests similarly fail, too.  From the kselftest
>>> results, it appears that livepatch transitions are no longer completing.
>>
>> Hi Joe,
>>
>> Thanks for the report.
>>
>> I thought I was running the livepatch tests, but looks like somewhere
>> along the line my kernel .config lost CONFIG_TEST_LIVEPATCH=m, so I have
>> been running the test but it just skips. :/
>>

That config option is easy to drop if you use `make localmodconfig` to
try and expedite the builds :D  Been there, done that too many times.

>> I can reproduce the failure, and will see if I can bisect it more
>> successfully.
> 
> It's caused by:
> 
>   eed7c420aac7 ("powerpc: copy_thread differentiate kthreads and user mode threads")
> 
> Which is obvious in hindsight :)
> 
> The diff below fixes it for me, can you test that on your setup?
> 

Thanks for the fast triage of this one.  The proposed fix works well on
our setup.  I have yet to try the kpatch integration tests with this,
but I can verify that all of the kernel livepatching kselftests now
happily run.

--
Joe

> A proper fix will need to be a bit bigger because the comments in there
> are all slightly wrong now since the above commit.
> 
> Possibly we can also rework that code more substantially now that
> copy_thread() is more careful about setting things up, but that would be
> a follow-up.
> 
> diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> index 5de8597eaab8..d0b3509f13ee 100644
> --- a/arch/powerpc/kernel/stacktrace.c
> +++ b/arch/powerpc/kernel/stacktrace.c
> @@ -73,7 +73,7 @@ int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consum
>  	bool firstframe;
>  
>  	stack_end = stack_page + THREAD_SIZE;
> -	if (!is_idle_task(task)) {
> +	if (!(task->flags & PF_KTHREAD)) {
>  		/*
>  		 * For user tasks, this is the SP value loaded on
>  		 * kernel entry, see "PACAKSAVE(r13)" in _switch() and
> 
> 
> cheers
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Recent Power changes and stack_trace_save_tsk_reliable?
  2023-08-30 21:47     ` Joe Lawrence
@ 2023-09-20 14:16       ` Petr Mladek
  2023-09-21 12:26         ` Michael Ellerman
  0 siblings, 1 reply; 7+ messages in thread
From: Petr Mladek @ 2023-09-20 14:16 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: Ryan Sullivan, linuxppc-dev, Nicholas Piggin, live-patching

On Wed 2023-08-30 17:47:35, Joe Lawrence wrote:
> On 8/30/23 02:37, Michael Ellerman wrote:
> > Michael Ellerman <mpe@ellerman.id.au> writes:
> >> Joe Lawrence <joe.lawrence@redhat.com> writes:
> >>> Hi ppc-dev list,
> >>>
> >>> We noticed that our kpatch integration tests started failing on ppc64le
> >>> when targeting the upstream v6.4 kernel, and then confirmed that the
> >>> in-tree livepatching kselftests similarly fail, too.  From the kselftest
> >>> results, it appears that livepatch transitions are no longer completing.
> >>
> >> Hi Joe,
> >>
> >> Thanks for the report.
> >>
> >> I thought I was running the livepatch tests, but looks like somewhere
> >> along the line my kernel .config lost CONFIG_TEST_LIVEPATCH=m, so I have
> >> been running the test but it just skips. :/
> >>
> 
> That config option is easy to drop if you use `make localmodconfig` to
> try and expedite the builds :D  Been there, done that too many times.
> 
> >> I can reproduce the failure, and will see if I can bisect it more
> >> successfully.
> > 
> > It's caused by:
> > 
> >   eed7c420aac7 ("powerpc: copy_thread differentiate kthreads and user mode threads")
> > 
> > Which is obvious in hindsight :)
> > 
> > The diff below fixes it for me, can you test that on your setup?
> > 
> 
> Thanks for the fast triage of this one.  The proposed fix works well on
> our setup.  I have yet to try the kpatch integration tests with this,
> but I can verify that all of the kernel livepatching kselftests now
> happily run.

Have this been somehow handled, please? I do not see the proposed
change in linux-next as of now.

> > A proper fix will need to be a bit bigger because the comments in there
> > are all slightly wrong now since the above commit.
> > 
> > Possibly we can also rework that code more substantially now that
> > copy_thread() is more careful about setting things up, but that would be
> > a follow-up.
> > 
> > diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
> > index 5de8597eaab8..d0b3509f13ee 100644
> > --- a/arch/powerpc/kernel/stacktrace.c
> > +++ b/arch/powerpc/kernel/stacktrace.c
> > @@ -73,7 +73,7 @@ int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consum
> >  	bool firstframe;
> >  
> >  	stack_end = stack_page + THREAD_SIZE;
> > -	if (!is_idle_task(task)) {
> > +	if (!(task->flags & PF_KTHREAD)) {
> >  		/*
> >  		 * For user tasks, this is the SP value loaded on
> >  		 * kernel entry, see "PACAKSAVE(r13)" in _switch() and

If I read the change in the commit eed7c420aac7fde ("powerpc: copy_thread
differentiate kthreads and user mode threads") correctly then the
above fix is correct.

It is probably just enough to update the comment about that
STACK_FRAME_MIN_SIZE is used by all kthreads.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Recent Power changes and stack_trace_save_tsk_reliable?
  2023-09-20 14:16       ` Petr Mladek
@ 2023-09-21 12:26         ` Michael Ellerman
  2023-09-21 13:22           ` Joe Lawrence
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2023-09-21 12:26 UTC (permalink / raw)
  To: Petr Mladek, Joe Lawrence
  Cc: live-patching, linuxppc-dev, Nicholas Piggin, Ryan Sullivan

Petr Mladek <pmladek@suse.com> writes:
> On Wed 2023-08-30 17:47:35, Joe Lawrence wrote:
>> On 8/30/23 02:37, Michael Ellerman wrote:
>> > Michael Ellerman <mpe@ellerman.id.au> writes:
>> >> Joe Lawrence <joe.lawrence@redhat.com> writes:
>> >>> We noticed that our kpatch integration tests started failing on ppc64le
>> >>> when targeting the upstream v6.4 kernel, and then confirmed that the
>> >>> in-tree livepatching kselftests similarly fail, too.  From the kselftest
>> >>> results, it appears that livepatch transitions are no longer completing.
...
>> > 
>> > The diff below fixes it for me, can you test that on your setup?
>> > 
>> 
>> Thanks for the fast triage of this one.  The proposed fix works well on
>> our setup.  I have yet to try the kpatch integration tests with this,
>> but I can verify that all of the kernel livepatching kselftests now
>> happily run.
>
> Have this been somehow handled, please? I do not see the proposed
> change in linux-next as of now.

I thought I was waiting for Joe to run the kpatch integration tests, but
in hindsight maybe he was hinting that someone else should run them (ie. me) ;)

Patch incoming.

cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Recent Power changes and stack_trace_save_tsk_reliable?
  2023-09-21 12:26         ` Michael Ellerman
@ 2023-09-21 13:22           ` Joe Lawrence
  0 siblings, 0 replies; 7+ messages in thread
From: Joe Lawrence @ 2023-09-21 13:22 UTC (permalink / raw)
  To: Michael Ellerman, Petr Mladek
  Cc: live-patching, linuxppc-dev, Nicholas Piggin, Ryan Sullivan

On 9/21/23 08:26, Michael Ellerman wrote:
> Petr Mladek <pmladek@suse.com> writes:
>> On Wed 2023-08-30 17:47:35, Joe Lawrence wrote:
>>> On 8/30/23 02:37, Michael Ellerman wrote:
>>>> Michael Ellerman <mpe@ellerman.id.au> writes:
>>>>> Joe Lawrence <joe.lawrence@redhat.com> writes:
>>>>>> We noticed that our kpatch integration tests started failing on ppc64le
>>>>>> when targeting the upstream v6.4 kernel, and then confirmed that the
>>>>>> in-tree livepatching kselftests similarly fail, too.  From the kselftest
>>>>>> results, it appears that livepatch transitions are no longer completing.
> ...
>>>>
>>>> The diff below fixes it for me, can you test that on your setup?
>>>>
>>>
>>> Thanks for the fast triage of this one.  The proposed fix works well on
>>> our setup.  I have yet to try the kpatch integration tests with this,
>>> but I can verify that all of the kernel livepatching kselftests now
>>> happily run.
>>
>> Have this been somehow handled, please? I do not see the proposed
>> change in linux-next as of now.
> 
> I thought I was waiting for Joe to run the kpatch integration tests, but
> in hindsight maybe he was hinting that someone else should run them (ie. me) ;)
> 
> Patch incoming.

Ah sorry for the confusion.  kpatch integration tests - that's on me.
If kernel stack unwinding is fixed, I'm pretty confident they will
execute.  I will kick them off today, but don't let that hold up the
kernel patches.

Thanks,
-- 
Joe


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-09-21 13:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-29 15:12 Recent Power changes and stack_trace_save_tsk_reliable? Joe Lawrence
2023-08-30  0:32 ` Michael Ellerman
2023-08-30  6:37   ` Michael Ellerman
2023-08-30 21:47     ` Joe Lawrence
2023-09-20 14:16       ` Petr Mladek
2023-09-21 12:26         ` Michael Ellerman
2023-09-21 13:22           ` Joe Lawrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).