public inbox for linux-kernel@vger.kernel.org
* [PATCH RESEND 0/2] tick/nohz: CPU cannot enter NOHZ idle balance state
@ 2025-08-21  4:27 Adam Li
  2025-08-21  4:27 ` [PATCH RESEND 1/2] tick/nohz: Fix wrong NOHZ idle CPU state Adam Li
  2025-08-21  4:27 ` [PATCH RESEND 2/2] tick/nohz: Trigger warning when CPU in wrong NOHZ idle state Adam Li
  0 siblings, 2 replies; 9+ messages in thread
From: Adam Li @ 2025-08-21  4:27 UTC (permalink / raw)
  To: anna-maria, frederic, tglx, mingo, peterz, juri.lelli,
	vincent.guittot, vschneid
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, cl, linux-kernel,
	patches, Adam Li

Valentin Schneider suggested resending this patch and copying the
scheduler reviewers [1].

When running llama on an arm64 server, some CPUs *stay* idle while others
are 100% busy. All CPUs are in the 'nohz_full=' CPU list, and
CONFIG_NO_HZ_FULL is set. The server has 192 CPUs and boots with the
kernel option 'nohz_full=0-191'.

The problem is caused by two issues:
1) Some idle CPUs cannot be added to 'nohz.idle_cpus_mask'. This bug
is fixed by the first patch in this series:
"tick/nohz: Fix wrong NOHZ idle CPU state".

2) Even if the idle CPUs are in 'nohz.idle_cpus_mask', no CPU can be
selected to do NOHZ idle load balancing because the conditions in
find_new_ilb() are too strict. This issue is fixed by the patch in [2].

We can see that the idle CPUs are not in nohz.idle_cpus_mask. NOHZ
idle load balancing only considers CPUs in nohz.idle_cpus_mask. The ticks
on the idle CPUs are stopped, so periodic load balancing is not
triggered either. As a result, the idle CPUs are never used and the
imbalance persists.
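
To illustrate why a CPU outside the mask is never picked, here is a minimal
user-space sketch. It is not kernel code: model_is_ilb_candidate(),
model_pick_ilb_cpu() and MODEL_NR_CPUS are hypothetical names, the mask is
reduced to a single 64-bit word, and the additional checks that
find_new_ilb() performs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* assumption: one 64-bit word as the mask */

/* A CPU is a candidate only if its bit is set in the idle mask. */
static bool model_is_ilb_candidate(uint64_t idle_cpus_mask, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (idle_cpus_mask >> cpu) & 1;
}

/* Return the first candidate CPU, or -1 if the mask is empty. */
static int model_pick_ilb_cpu(uint64_t idle_cpus_mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_is_ilb_candidate(idle_cpus_mask, cpu))
			return cpu;
	return -1;
}
```

With an empty mask the sketch returns -1 no matter how many CPUs are
actually idle, which is the situation described above.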

A CPU is added to nohz.idle_cpus_mask in:
do_idle()
   -> tick_nohz_idle_stop_tick()
      -> nohz_balance_enter_idle()

The call to nohz_balance_enter_idle() depends on the '!was_stopped'
condition. It looks like 'was_stopped' is used to avoid calling
nohz_balance_enter_idle() and setting 'ts->idle_jiffies' more than once.

When the CPU is in nohz_full mode, 'was_stopped' may always be true.
The call path might be:

tick_nohz_full_stop_tick() /* stop tick and set TS_FLAG_STOPPED */
... ...
do_idle()
    -> tick_nohz_idle_stop_tick() /* was_stopped == 1 */

The first patch "Fix wrong NOHZ idle CPU state" makes
nohz_balance_enter_idle() independent of '!was_stopped'. This is safe
because nohz_balance_enter_idle() already checks 'rq->nohz_tick_stopped'
to avoid setting nohz.idle_cpus_mask twice.
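
The interaction between the two checks can be sketched as a small
user-space model. This is a simplification under stated assumptions, not
the kernel implementation: struct model_cpu and the model_*() functions are
hypothetical stand-ins, and the 'gate_on_was_stopped' parameter exists only
to compare the behaviour before and after the first patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; field names only mirror the kernel ones. */
struct model_cpu {
	bool tick_stopped;	/* stands in for TS_FLAG_STOPPED */
	bool nohz_tick_stopped;	/* stands in for rq->nohz_tick_stopped */
	bool in_idle_mask;	/* membership in nohz.idle_cpus_mask */
	int enter_idle_calls;	/* number of real mask updates */
};

/* Models nohz_balance_enter_idle(): repeated calls are a no-op. */
static void model_balance_enter_idle(struct model_cpu *c)
{
	assert(c != NULL);
	if (c->nohz_tick_stopped)
		return;
	c->nohz_tick_stopped = true;
	c->in_idle_mask = true;
	c->enter_idle_calls++;
}

/* Models tick_nohz_full_stop_tick(): stops the tick outside the idle path. */
static void model_full_stop_tick(struct model_cpu *c)
{
	assert(c != NULL);
	c->tick_stopped = true;
}

/*
 * Models tick_nohz_idle_stop_tick().
 * gate_on_was_stopped == true:  mask update requires !was_stopped.
 * gate_on_was_stopped == false: mask update is independent of it.
 */
static void model_idle_stop_tick(struct model_cpu *c, bool gate_on_was_stopped)
{
	bool was_stopped;

	assert(c != NULL);
	was_stopped = c->tick_stopped;
	c->tick_stopped = true;
	if (!gate_on_was_stopped || !was_stopped)
		model_balance_enter_idle(c);
}
```

In this model, calling model_full_stop_tick() followed by
model_idle_stop_tick(c, true) leaves 'in_idle_mask' false, while
model_idle_stop_tick(c, false) sets it, and calling it again does not
update the mask a second time.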

The second patch "Trigger warning when CPU in wrong NOHZ idle state"
is for debugging only and is not intended to be merged. It helps
to reproduce the bug.

The warning is triggered when a CPU is in this 'wrong' state:
1) tick was already stopped before tick_nohz_idle_stop_tick()
   stops the tick
2) and CPU is not in nohz.idle_cpus_mask
3) and CPU is idle
4) and tick is stopped
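
The four conditions combine into a single predicate. The sketch below is
illustrative only: struct nohz_snapshot and in_wrong_nohz_idle_state() are
hypothetical names, and the debug patch itself may express the check
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state examined by the debug check. */
struct nohz_snapshot {
	bool was_stopped;	/* 1) tick already stopped on entry */
	bool in_idle_mask;	/* 2) CPU present in nohz.idle_cpus_mask */
	bool cpu_idle;		/* 3) CPU is idle */
	bool tick_stopped;	/* 4) tick is stopped now */
};

/* True only when all four conditions of the 'wrong' state hold. */
static bool in_wrong_nohz_idle_state(const struct nohz_snapshot *s)
{
	assert(s != NULL);
	return s->was_stopped && !s->in_idle_mask &&
	       s->cpu_idle && s->tick_stopped;
}
```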

When the kernel boots on my system, this warning appears:
[   15.536604] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:1230 tick_nohz_idle_stop_tick+0x148/0x160
[   15.550687] Modules linked in:
[   15.553731] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc1-cls-00002-g39cde4c0206e-dirty #109 VOLUNTARY
[   15.580390] pstate: 614000c9 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
<snip>
[   15.703028] Call trace:
[   15.705462]  tick_nohz_idle_stop_tick+0x148/0x160 (P)
[   15.710502]  cpuidle_idle_call+0x118/0x1d0
[   15.714588]  do_idle+0xf4/0x100
[   15.717717]  cpu_startup_entry+0x40/0x50
[   15.721627]  secondary_start_kernel+0xe4/0x128
[   15.732745]  __secondary_switched+0xc0/0xc8

After the first patch, the CPU is added to nohz.idle_cpus_mask and
NOHZ idle balancing can move tasks to it.

Adam Li (2):
  tick/nohz: Fix wrong NOHZ idle CPU state
  tick/nohz: Trigger warning when CPU in wrong NOHZ idle state

Links
[1]: https://lore.kernel.org/all/xhsmho6sagz7p.mognet@vschneid-thinkpadt14sgen2i.remote.csb/
[2]: https://lore.kernel.org/all/20250819025720.14794-1-adamli@os.amperecomputing.com/

 include/linux/sched/nohz.h | 2 ++
 kernel/sched/fair.c        | 5 +++++
 kernel/time/tick-sched.c   | 8 +++++---
 3 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.34.1



Thread overview: 9+ messages
2025-08-21  4:27 [PATCH RESEND 0/2] tick/nohz: CPU cannot enter NOHZ idle balance state Adam Li
2025-08-21  4:27 ` [PATCH RESEND 1/2] tick/nohz: Fix wrong NOHZ idle CPU state Adam Li
2025-09-04 16:05   ` Frederic Weisbecker
2025-09-04 16:10     ` Christoph Lameter (Ampere)
2025-09-05 11:47       ` Frederic Weisbecker
2025-09-08 15:25         ` Christoph Lameter (Ampere)
2026-02-11 23:19     ` Shubhang Kaushik
2025-08-21  4:27 ` [PATCH RESEND 2/2] tick/nohz: Trigger warning when CPU in wrong NOHZ idle state Adam Li
2025-09-03  8:01   ` kernel test robot
