linux-s390.vger.kernel.org archive mirror
* Re: ftrace hangs waiting for rcu
       [not found]     ` <YfQCohKWJg9H+uID@FVFF77S0Q05N>
@ 2022-01-28 16:08       ` Sven Schnelle
  2022-01-28 16:11         ` Mark Rutland
  0 siblings, 1 reply; 5+ messages in thread
From: Sven Schnelle @ 2022-01-28 16:08 UTC
  To: Mark Rutland
  Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Yinan Liu,
	Ard Biesheuvel, Kees Cook, Sachin Sant, linuxppc-dev,
	Russell King, linux-arm-kernel, hca, linux-s390, Paul E. McKenney

Hi Mark,

Mark Rutland <mark.rutland@arm.com> writes:

> On arm64 I bisected this down to:
>
>   7a30871b6a27de1a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
>
> Which was going wrong because ilog2() rounds down, and so the shift was wrong
> for any nr_cpus that was not a power-of-two. Paul had already fixed that in
> rcu-next, and just sent a pull request to Linus:
>
>   https://lore.kernel.org/lkml/20220128143251.GA2398275@paulmck-ThinkPad-P17-Gen-1/
>
> With that applied, I no longer see these hangs.
>
> Does your s390 test machine have a non-power-of-two nr_cpus, and does that fix
> the issue for you?

We noticed the PR from Paul and are currently testing the fix. So far
it's looking good. The configuration where we have seen the hang is a
bit unusual:

- 16 physical CPUs on the kvm host
- 248 logical CPUs inside kvm
- debug kernel both on the host and kvm guest

So things are likely a bit slow in the kvm guest. Interestingly, the
number of CPUs is even. But maybe RCU sees an odd number of CPUs and
gets confused before all CPUs are brought up. I'll have to read the
code and test to see whether that is possible.

Thanks for investigating!
Sven

* Re: ftrace hangs waiting for rcu
  2022-01-28 16:08       ` ftrace hangs waiting for rcu Sven Schnelle
@ 2022-01-28 16:11         ` Mark Rutland
  2022-01-28 16:15           ` Paul E. McKenney
  2022-01-28 16:17           ` Sven Schnelle
  0 siblings, 2 replies; 5+ messages in thread
From: Mark Rutland @ 2022-01-28 16:11 UTC
  To: Sven Schnelle
  Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Yinan Liu,
	Ard Biesheuvel, Kees Cook, Sachin Sant, linuxppc-dev,
	Russell King, linux-arm-kernel, hca, linux-s390, Paul E. McKenney

On Fri, Jan 28, 2022 at 05:08:48PM +0100, Sven Schnelle wrote:
> Hi Mark,
> 
> Mark Rutland <mark.rutland@arm.com> writes:
> 
> > On arm64 I bisected this down to:
> >
> >   7a30871b6a27de1a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
> >
> > Which was going wrong because ilog2() rounds down, and so the shift was wrong
> > for any nr_cpus that was not a power-of-two. Paul had already fixed that in
> > rcu-next, and just sent a pull request to Linus:
> >
> >   https://lore.kernel.org/lkml/20220128143251.GA2398275@paulmck-ThinkPad-P17-Gen-1/
> >
> > With that applied, I no longer see these hangs.
> >
> > Does your s390 test machine have a non-power-of-two nr_cpus, and does that fix
> > the issue for you?
> 
> We noticed the PR from Paul and are currently testing the fix. So far
> it's looking good. The configuration where we have seen the hang is a
> bit unusual:
> 
> - 16 physical CPUs on the kvm host
> - 248 logical CPUs inside kvm

Aha! 248 is notably *NOT* a power of two, and in this case the shift would be
wrong (ilog2() would give 7, when we need a shift of 8).

So I suspect you're hitting the same issue as I was.
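
To make the arithmetic concrete, here is a minimal userspace sketch in C.
It assumes the shift is applied as `cpu >> shift` to pick a queue index
and that only a single queue (index 0) is in use. floor_log2(),
ceil_log2() and max_queue_index() are hypothetical helper names used for
illustration only, not kernel functions: floor_log2() rounds down the way
ilog2() does, and ceil_log2() rounds up.

```c
#include <assert.h>

/* Round-down log2 for n > 0, mirroring the rounding of ilog2(). */
static unsigned int floor_log2(unsigned int n)
{
	unsigned int r = 0;

	assert(n > 0);
	while (n >>= 1)
		r++;
	return r;
}

/* Round-up log2 for n > 0: the smallest s such that (1 << s) >= n. */
static unsigned int ceil_log2(unsigned int n)
{
	unsigned int s = floor_log2(n);

	return ((1u << s) < n) ? s + 1 : s;
}

/*
 * Highest queue index that any CPU in [0, nr_cpus) maps to, assuming
 * the index is computed as cpu >> shift (an assumption of this sketch).
 */
static unsigned int max_queue_index(unsigned int nr_cpus, unsigned int shift)
{
	assert(nr_cpus > 0);
	return (nr_cpus - 1) >> shift;
}
```

With nr_cpus = 248, floor_log2() gives 7, so CPUs 128..247 would map to
index 1 rather than 0. ceil_log2() gives 8 and every CPU maps to index 0.
For a power of two such as 256 the two helpers agree, which would explain
why such systems do not show the problem.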

Thanks,
Mark.

> - debug kernel both on the host and kvm guest
> 
> So things are likely a bit slow in the kvm guest. Interesting is that
> the number of CPUs is even. But maybe RCU sees an odd number of CPUs
> and gets confused before all cpus are brought up. Have to read code/test
> to see whether that could be possible.
> 
> Thanks for investigating!
> Sven

* Re: ftrace hangs waiting for rcu
  2022-01-28 16:11         ` Mark Rutland
@ 2022-01-28 16:15           ` Paul E. McKenney
  2022-01-28 17:47             ` Paul E. McKenney
  2022-01-28 16:17           ` Sven Schnelle
  1 sibling, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2022-01-28 16:15 UTC
  To: Mark Rutland
  Cc: Sven Schnelle, Steven Rostedt, LKML, Ingo Molnar, Andrew Morton,
	Yinan Liu, Ard Biesheuvel, Kees Cook, Sachin Sant, linuxppc-dev,
	Russell King, linux-arm-kernel, hca, linux-s390

On Fri, Jan 28, 2022 at 04:11:57PM +0000, Mark Rutland wrote:
> On Fri, Jan 28, 2022 at 05:08:48PM +0100, Sven Schnelle wrote:
> > Hi Mark,
> > 
> > Mark Rutland <mark.rutland@arm.com> writes:
> > 
> > > On arm64 I bisected this down to:
> > >
> > >   7a30871b6a27de1a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
> > >
> > > Which was going wrong because ilog2() rounds down, and so the shift was wrong
> > > for any nr_cpus that was not a power-of-two. Paul had already fixed that in
> > > rcu-next, and just sent a pull request to Linus:
> > >
> > >   https://lore.kernel.org/lkml/20220128143251.GA2398275@paulmck-ThinkPad-P17-Gen-1/
> > >
> > > With that applied, I no longer see these hangs.
> > >
> > > Does your s390 test machine have a non-power-of-two nr_cpus, and does that fix
> > > the issue for you?
> > 
> > We noticed the PR from Paul and are currently testing the fix. So far
> > it's looking good. The configuration where we have seen the hang is a
> > bit unusual:
> > 
> > - 16 physical CPUs on the kvm host
> > - 248 logical CPUs inside kvm
> 
> Aha! 248 is notably *NOT* a power of two, and in this case the shift would be
> wrong (ilog2() would give 7, when we need a shift of 8).
> 
> So I suspect you're hitting the same issue as I was.

And apparently no one runs -next on systems having a non-power-of-two
number of CPUs.  ;-)

							Thanx, Paul

> Thanks,
> Mark.
> 
> > - debug kernel both on the host and kvm guest
> > 
> > So things are likely a bit slow in the kvm guest. Interesting is that
> > the number of CPUs is even. But maybe RCU sees an odd number of CPUs
> > and gets confused before all cpus are brought up. Have to read code/test
> > to see whether that could be possible.
> > 
> > Thanks for investigating!
> > Sven

* Re: ftrace hangs waiting for rcu
  2022-01-28 16:11         ` Mark Rutland
  2022-01-28 16:15           ` Paul E. McKenney
@ 2022-01-28 16:17           ` Sven Schnelle
  1 sibling, 0 replies; 5+ messages in thread
From: Sven Schnelle @ 2022-01-28 16:17 UTC
  To: Mark Rutland
  Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Yinan Liu,
	Ard Biesheuvel, Kees Cook, Sachin Sant, linuxppc-dev,
	Russell King, linux-arm-kernel, hca, linux-s390, Paul E. McKenney

Hi Mark,

Mark Rutland <mark.rutland@arm.com> writes:

> On Fri, Jan 28, 2022 at 05:08:48PM +0100, Sven Schnelle wrote:
>> We noticed the PR from Paul and are currently testing the fix. So far
>> it's looking good. The configuration where we have seen the hang is a
>> bit unusual:
>> 
>> - 16 physical CPUs on the kvm host
>> - 248 logical CPUs inside kvm
>
> Aha! 248 is notably *NOT* a power of two, and in this case the shift would be
> wrong (ilog2() would give 7, when we need a shift of 8).
>
> So I suspect you're hitting the same issue as I was.

Argh, indeed! I somehow changed 'power of two' to 'odd number' in my
head. I guess it's time for the weekend. :-)

Thanks!

* Re: ftrace hangs waiting for rcu
  2022-01-28 16:15           ` Paul E. McKenney
@ 2022-01-28 17:47             ` Paul E. McKenney
  0 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2022-01-28 17:47 UTC
  To: Mark Rutland
  Cc: Sven Schnelle, Steven Rostedt, LKML, Ingo Molnar, Andrew Morton,
	Yinan Liu, Ard Biesheuvel, Kees Cook, Sachin Sant, linuxppc-dev,
	Russell King, linux-arm-kernel, hca, linux-s390, andriin

On Fri, Jan 28, 2022 at 08:15:47AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 28, 2022 at 04:11:57PM +0000, Mark Rutland wrote:
> > On Fri, Jan 28, 2022 at 05:08:48PM +0100, Sven Schnelle wrote:
> > > Hi Mark,
> > > 
> > > Mark Rutland <mark.rutland@arm.com> writes:
> > > 
> > > > On arm64 I bisected this down to:
> > > >
> > > >   7a30871b6a27de1a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
> > > >
> > > > Which was going wrong because ilog2() rounds down, and so the shift was wrong
> > > > for any nr_cpus that was not a power-of-two. Paul had already fixed that in
> > > > rcu-next, and just sent a pull request to Linus:
> > > >
> > > >   https://lore.kernel.org/lkml/20220128143251.GA2398275@paulmck-ThinkPad-P17-Gen-1/
> > > >
> > > > With that applied, I no longer see these hangs.
> > > >
> > > > Does your s390 test machine have a non-power-of-two nr_cpus, and does that fix
> > > > the issue for you?
> > > 
> > > We noticed the PR from Paul and are currently testing the fix. So far
> > > it's looking good. The configuration where we have seen the hang is a
> > > bit unusual:
> > > 
> > > - 16 physical CPUs on the kvm host
> > > - 248 logical CPUs inside kvm
> > 
> > Aha! 248 is notably *NOT* a power of two, and in this case the shift would be
> > wrong (ilog2() would give 7, when we need a shift of 8).
> > 
> > So I suspect you're hitting the same issue as I was.
> 
> And apparently no one runs -next on systems having a non-power-of-two
> number of CPUs.  ;-)

And the fix is now in mainline.

							Thanx, Paul

> > Thanks,
> > Mark.
> > 
> > > - debug kernel both on the host and kvm guest
> > > 
> > > So things are likely a bit slow in the kvm guest. Interesting is that
> > > the number of CPUs is even. But maybe RCU sees an odd number of CPUs
> > > and gets confused before all cpus are brought up. Have to read code/test
> > > to see whether that could be possible.
> > > 
> > > Thanks for investigating!
> > > Sven
