* [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up()
@ 2026-06-03 17:01 Amit Matityahu
  2026-06-04  9:45 ` Frederic Weisbecker
  2026-06-04 12:38 ` [tip: timers/urgent] " tip-bot2 for Amit Matityahu
  0 siblings, 2 replies; 4+ messages in thread
From: Amit Matityahu @ 2026-06-03 17:01 UTC
  To: tglx, anna-maria, frederic
  Cc: linux-kernel, stable, dwmw, jonnyc, abaransi, alonka, ronenk,
	farbere

tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu ==
smp_processor_id(), assuming the local softirq path already handled
this CPU's timers.

This assumption breaks when jiffies advances between
run_timer_base(BASE_GLOBAL) and tmigr_handle_remote() in the same
softirq invocation - a timer expires after the wheel ran but before
the hierarchy snapshot is taken.

The stranded timer is never collected, so
fetch_next_timer_interrupt_remote() keeps reporting it as expired
and the event is re-queued with expires == now on each iteration.
As a result, the goto-again loop spins indefinitely.

Fix by calling timer_expire_remote() unconditionally.
__run_timer_base() already returns early when there is nothing to
expire, making this a no-op in the common case.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Cc: stable@vger.kernel.org
Reported-by: Alon Kariv <alonka@amazon.com>
Cc: Jonathan Chocron <jonnyc@amazon.com>
Cc: Akram Baransi <abaransi@amazon.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Amit Matityahu <amitmat@amazon.com>
---

Questions for maintainers:

1. What was the original rationale for the cpu != smp_processor_id()
   check? Neither a code comment, the commit message, nor the original
   patch's email discussion explains why timer_expire_remote() is
   skipped for the local CPU.

2. There seems to be a design tension where a CPU can have timers
   visible in the migration hierarchy while simultaneously running its
   own local softirq. Is the expectation that run_timer_base() always
   drains everything before tmigr_handle_remote() sees it, or should
   the remote path handle local-CPU timers as a fallback?
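
For readers who want to reason about the race without the kernel
source at hand, here is a minimal userspace sketch of it. This is a toy
model, not kernel code: every toy_* name is hypothetical, a single
timer stands in for the whole wheel, and a bounded loop stands in for
the goto-again path so that the livelock shows up as "hit the pass cap"
instead of an actual hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical names): one CPU holding one global timer. */
struct toy_cpu {
	uint64_t timer_expires;	/* expiry of the single pending timer */
	bool timer_pending;	/* true until the timer has been run */
};

/* Stand-in for expiring the CPU's timers: run the timer if it is due. */
static void toy_expire(struct toy_cpu *c, uint64_t now)
{
	if (c->timer_pending && c->timer_expires <= now)
		c->timer_pending = false;
}

/*
 * Stand-in for the goto-again loop: keep handling the CPU for as long
 * as it still reports an expired timer. With skip_local set, the
 * expiry call is skipped (the pre-patch behaviour for the local CPU).
 * Returns the number of passes, capped at max_passes.
 */
static unsigned int toy_handle_remote(struct toy_cpu *c, uint64_t now,
				      bool skip_local,
				      unsigned int max_passes)
{
	unsigned int passes = 0;

	assert(c);
	while (c->timer_pending && c->timer_expires <= now &&
	       passes < max_passes) {
		if (!skip_local)
			toy_expire(c, now);
		passes++;
	}
	return passes;
}

/*
 * One softirq invocation: the wheel runs at wheel_now, then the
 * hierarchy is handled at tmigr_now (>= wheel_now if jiffies moved).
 */
static unsigned int toy_softirq(uint64_t timer_expires, uint64_t wheel_now,
				uint64_t tmigr_now, bool skip_local,
				unsigned int max_passes)
{
	struct toy_cpu c = {
		.timer_expires = timer_expires,
		.timer_pending = true,
	};

	toy_expire(&c, wheel_now);	/* local wheel pass */
	return toy_handle_remote(&c, tmigr_now, skip_local, max_passes);
}
```

In this model, a timer that expires at 101 with the wheel run at 100
and the hierarchy handled at 101 exhausts the pass cap when skip_local
is true, and finishes in a single pass when it is false. When jiffies
does not advance between the two steps, the loop body never runs in
either mode.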

 kernel/time/timer_migration.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 1d0d3a4058d5..298c34c942ae 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -978,8 +978,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	/* Drop the lock to allow the remote CPU to exit idle */
 	raw_spin_unlock_irq(&tmc->lock);
 
-	if (cpu != smp_processor_id())
-		timer_expire_remote(cpu);
+	timer_expire_remote(cpu);
 
 	/*
 	 * Lock ordering needs to be preserved - timer_base locks before tmigr

base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
-- 
2.47.3



Thread overview: 4+ messages
2026-06-03 17:01 [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up() Amit Matityahu
2026-06-04  9:45 ` Frederic Weisbecker
2026-06-04 12:23   ` Thomas Gleixner
2026-06-04 12:38 ` [tip: timers/urgent] " tip-bot2 for Amit Matityahu
