The Linux Kernel Mailing List
* [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up()
@ 2026-06-03 17:01 Amit Matityahu
  2026-06-04  9:45 ` Frederic Weisbecker
  2026-06-04 12:38 ` [tip: timers/urgent] " tip-bot2 for Amit Matityahu
  0 siblings, 2 replies; 4+ messages in thread
From: Amit Matityahu @ 2026-06-03 17:01 UTC
  To: tglx, anna-maria, frederic
  Cc: linux-kernel, stable, dwmw, jonnyc, abaransi, alonka, ronenk,
	farbere

tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu ==
smp_processor_id(), assuming the local softirq path already handled
this CPU's timers.

This assumption breaks when jiffies advances between
run_timer_base(BASE_GLOBAL) and tmigr_handle_remote() in the same
softirq invocation - a timer expires after the wheel ran but before
the hierarchy snapshot is taken.

The stranded timer is never collected:
fetch_next_timer_interrupt_remote() keeps reporting it as expired,
and the event is re-queued with expires == now on each iteration.
The goto-again loop spins indefinitely.

Fix by calling timer_expire_remote() unconditionally.
__run_timer_base() already returns early when there is nothing to
expire, making this a no-op in the common case.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Cc: stable@vger.kernel.org
Reported-by: Alon Kariv <alonka@amazon.com>
Cc: Jonathan Chocron <jonnyc@amazon.com>
Cc: Akram Baransi <abaransi@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Amit Matityahu <amitmat@amazon.com>
---

Questions for maintainers:

1. What was the original rationale for the cpu != smp_processor_id()
   check? Neither a code comment, the commit message, nor the original
   patch's email discussion explains why timer_expire_remote() is
   skipped for the local CPU.

2. There seems to be a design tension where a CPU can have timers
   visible in the migration hierarchy while simultaneously running its
   own local softirq. Is the expectation that run_timer_base() always
   drains everything before tmigr_handle_remote() sees it, or should
   the remote path handle local-CPU timers as a fallback?
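
For reference, the race described in the changelog can be reproduced in a
small userspace model. This is only a sketch under simplifying assumptions
(one CPU, one global timer, one hierarchy event); struct model,
expire_timers(), next_expiry() and simulate_softirq() are hypothetical
names made up for illustration and are not kernel functions. The retry
loop is capped so that the spinning case terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT kernel code: one CPU with a single pending global timer. */
struct model {
	uint64_t jiffies;	/* current time */
	uint64_t timer_expires;	/* expiry of the single global timer */
	bool timer_pending;	/* true until the timer has been expired */
};

/* Hypothetical stand-in for a wheel run: expire the timer if it is due. */
static void expire_timers(struct model *m)
{
	if (m->timer_pending && m->timer_expires <= m->jiffies)
		m->timer_pending = false;
}

/* Hypothetical stand-in for the "next expiry" query on that CPU. */
static uint64_t next_expiry(const struct model *m)
{
	return m->timer_pending ? m->timer_expires : UINT64_MAX;
}

/*
 * Model one softirq invocation. Returns the number of passes through the
 * retry loop, capped at max_iters so the livelock case still returns.
 * skip_local == true models the old "cpu != smp_processor_id()" check.
 */
static unsigned int simulate_softirq(bool skip_local, unsigned int max_iters)
{
	struct model m = {
		.jiffies = 100,
		.timer_expires = 101,
		.timer_pending = true,
	};
	unsigned int iters = 0;
	uint64_t now;

	/* Local wheel runs at jiffies == 100: the timer is not due yet. */
	expire_timers(&m);

	/* jiffies advances before the hierarchy snapshot is taken. */
	m.jiffies++;
	now = m.jiffies;

	/* Hierarchy walk: the local CPU's event now looks expired. */
	while (iters < max_iters) {
		iters++;
		if (!skip_local)
			expire_timers(&m);
		if (next_expiry(&m) > now)
			break;	/* nothing expired is left, walk finishes */
		/* Event is re-queued with expires <= now: "goto again". */
	}
	return iters;
}
```

In this model, simulate_softirq(true, 1000) uses all 1000 passes without
collecting the timer, while simulate_softirq(false, 1000) finishes after
a single pass.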

 kernel/time/timer_migration.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 1d0d3a4058d5..298c34c942ae 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -978,8 +978,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	/* Drop the lock to allow the remote CPU to exit idle */
 	raw_spin_unlock_irq(&tmc->lock);
 
-	if (cpu != smp_processor_id())
-		timer_expire_remote(cpu);
+	timer_expire_remote(cpu);
 
 	/*
 	 * Lock ordering needs to be preserved - timer_base locks before tmigr

base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
-- 
2.47.3



Thread overview: 4+ messages
2026-06-03 17:01 [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up() Amit Matityahu
2026-06-04  9:45 ` Frederic Weisbecker
2026-06-04 12:23   ` Thomas Gleixner
2026-06-04 12:38 ` [tip: timers/urgent] " tip-bot2 for Amit Matityahu
