public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?
Date: Fri, 11 Mar 2022 12:32:56 +0100	[thread overview]
Message-ID: <20220311113256.GB96127@lothringen> (raw)
In-Reply-To: <20220310225219.GE4285@paulmck-ThinkPad-P17-Gen-1>

On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > Hello, Frederic,
> > > 
> > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > The stall warnings feature starvation of RCU grace-period kthread.
> > > 
> > > Any advice on debugging this?
> > 
> > Oh, I'm testing that!
> 
> Even better, thank you!  ;-)

One possibly interesting detail: the stalling CPU is stuck on
sync_rcu_do_polled_gp(), which is launched by the recent start_poll_synchronize_rcu_expedited():

[  463.518410]  <IRQ>
[  463.518691]  dump_stack_lvl+0x33/0x42
[  463.519182]  nmi_cpu_backtrace+0xc0/0xe0
[  463.519706]  ? lapic_can_unplug_cpu+0x90/0x90
[  463.522188]  nmi_trigger_cpumask_backtrace+0x82/0xc0
[  463.522863]  rcu_dump_cpu_stacks+0xc0/0xf0
[  463.523410]  rcu_sched_clock_irq+0x6e3/0xa30
[  463.523982]  ? tick_sched_handle.isra.21+0x40/0x40
[  463.524628]  update_process_times+0x87/0xb0
[  463.525183]  tick_sched_handle.isra.21+0x2b/0x40
[  463.525795]  tick_sched_timer+0x5e/0x70
[  463.526306]  __hrtimer_run_queues+0x108/0x240
[  463.526886]  hrtimer_interrupt+0xe0/0x240
[  463.527419]  __sysvec_apic_timer_interrupt+0x55/0xf0
[  463.528436]  sysvec_apic_timer_interrupt+0x43/0x80
[  463.529074]  </IRQ>
[  463.529366]  <TASK>
[  463.529656]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  463.530333] RIP: 0010:synchronize_rcu_expedited+0x0/0x3f0
[  463.531042] Code: 89 65 48 48 89 ef e8 8f 03 af 00 48 8b 85 80 00 00 00 48 85 c0 0f 84 70 ff ff ff 4c 8b 65 68 48 89 c5 eb b6 66 0f 1f 44 00 00 <41> 54 55 53 48 81 ec 90 00 00 00 44 8b 25 3a 10 7a 01 65 48 8b 04
[  463.533454] RSP: 0018:ffffb2e94045be88 EFLAGS: 00000287
[  463.534141] RAX: 000000000001300c RBX: 0000000000013010 RCX: ffff98941f2298e8
[  463.535078] RDX: 0000000000013011 RSI: 0000000000000286 RDI: ffffffff9bb3ba9c
[  463.536007] RBP: ffffffff9bb3baa8 R08: 000070675f756372 R09: 8080808080808080
[  463.536948] R10: ffffb2e940077d48 R11: fefefefefefefeff R12: ffffffff9bb3ba9c
[  463.537878] R13: 0000000000000000 R14: ffff98941f2298c0 R15: ffff98941f22e005
[  463.538810]  sync_rcu_do_polled_gp+0x39/0xc0
[  463.539382]  process_one_work+0x1ec/0x3b0
[  463.539917]  worker_thread+0x25/0x390
[  463.548472]  ? process_one_work+0x3b0/0x3b0
[  463.549193]  kthread+0xbd/0xe0
[  463.549730]  ? kthread_complete_and_exit+0x20/0x20
[  463.550548]  ret_from_fork+0x22/0x30
[  463.551167]  </TASK>


Could be this:

	while (!sync_exp_work_done(s))
		synchronize_rcu_expedited();

And if synchronize_rcu_expedited() -> rcu_blocking_is_gp() is true, then we
may run in a long loop withing the workqueue without a chance to report a QS,
expecially if we are running a non-preemptible kernel. This could be the cause
of the stalling grace period kthread.

rcu_blocking_is_gp() could be true if all other CPUs but the current one are
offline.

It's just a potential scenario, lemme check...

  reply	other threads:[~2022-03-11 11:33 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 21:56 Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n? Paul E. McKenney
2022-03-10 22:41 ` Frederic Weisbecker
2022-03-10 22:52   ` Paul E. McKenney
2022-03-11 11:32     ` Frederic Weisbecker [this message]
2022-03-11 13:07     ` Frederic Weisbecker
2022-03-11 15:21       ` Paul E. McKenney
2022-03-11 15:46         ` Frederic Weisbecker
2022-03-11 16:06           ` Paul E. McKenney
2022-03-11 16:47             ` Paul E. McKenney
2022-03-11 17:26               ` Frederic Weisbecker
2022-03-11 17:33                 ` Paul E. McKenney
2022-03-11 11:08 ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220311113256.GB96127@lothringen \
    --to=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox