From: Frederic Weisbecker <frederic@kernel.org>
To: Amit Matityahu <amitmat@amazon.com>
Cc: tglx@kernel.org, anna-maria@linutronix.de,
linux-kernel@vger.kernel.org, stable@vger.kernel.org,
dwmw@amazon.co.uk, jonnyc@amazon.com, abaransi@amazon.com,
alonka@amazon.com, ronenk@amazon.com, farbere@amazon.com
Subject: Re: [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up()
Date: Thu, 4 Jun 2026 11:45:18 +0200
Message-ID: <aiFJLiWDIVaMQOoV@localhost.localdomain>
In-Reply-To: <20260603170139.33628-1-amitmat@amazon.com>
On Wed, Jun 03, 2026 at 05:01:39PM +0000, Amit Matityahu wrote:
> tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu ==
> smp_processor_id(), assuming the local softirq path already handled
> this CPU's timers.
>
> This assumption breaks when jiffies advances between
> run_timer_base(BASE_GLOBAL) and tmigr_handle_remote() in the same
> softirq invocation - a timer expires after the wheel ran but before
> the hierarchy snapshot is taken.
>
> The stranded timer is never collected,
> fetch_next_timer_interrupt_remote() keeps reporting it as expired,
> and the event is re-queued with expires == now on each iteration.
> The goto-again loop spins indefinitely.
>
> Fix by calling timer_expire_remote() unconditionally.
> __run_timer_base() already returns early when there is nothing to
> expire, making this a no-op in the common case.
>
> Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
> Cc: stable@vger.kernel.org
> Reported-by: Alon Kariv <alonka@amazon.com>
> Cc: Jonathan Chocron <jonnyc@amazon.com>
> Cc: Akram Baransi <abaransi@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Amit Matityahu <amitmat@amazon.com>
That's quite serious indeed!
> ---
>
> Questions for maintainers:
>
> 1. What was the original rationale for the cpu != smp_processor_id()
> check? There is no code comment, commit message explanation or anything
> in the original patch's email discussion as to why
> timer_expire_remote() is skipped for the local CPU.
The rationale was the assumption that an expired timerqueue event for the
local CPU reflected a timer that had already been handled locally, so it
could be safely discarded. That way we could spare some locking.
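
To make the failing case concrete, here is a toy user-space model of the
race from the changelog. It is only a sketch, not kernel code: every name
in it (toy_cpu, toy_expire, toy_handle_remote, toy_simulate) is
hypothetical, a single timer stands in for the wheel, and the pass cap
stands in for "spins forever".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_cpu {
	uint64_t timer_expires;	/* expiry of the single pending global timer */
	bool timer_pending;
};

/* Stands in for a wheel run: expire the timer if it is due at 'now'. */
static void toy_expire(struct toy_cpu *c, uint64_t now)
{
	if (c->timer_pending && c->timer_expires <= now)
		c->timer_pending = false;
}

/*
 * Stands in for the remote handling loop. Returns how many passes were
 * needed until no expired event was left, capped at max_passes.
 */
static unsigned int toy_handle_remote(struct toy_cpu *c, uint64_t now,
				      bool skip_local, unsigned int max_passes)
{
	unsigned int passes = 0;

	while (passes < max_passes) {
		/* Does the hierarchy still report an expired event? */
		if (!(c->timer_pending && c->timer_expires <= now))
			break;
		passes++;
		/* Old behaviour: assume the local path already handled it. */
		if (!skip_local)
			toy_expire(c, now);
		/* If skipped, the event comes back with expires <= now. */
	}
	return passes;
}

/* One tick lands between the local wheel run and the remote handling. */
static unsigned int toy_simulate(bool skip_local, unsigned int max_passes)
{
	struct toy_cpu c = { .timer_expires = 100, .timer_pending = true };
	uint64_t jiffies = 99;

	assert(max_passes > 0);
	toy_expire(&c, jiffies);	/* local run: timer not due yet */
	jiffies++;			/* jiffies advances before the snapshot */
	return toy_handle_remote(&c, jiffies, skip_local, max_passes);
}
```

In this model, toy_simulate(true, N) uses up all N passes because nothing
ever collects the timer, while toy_simulate(false, N) finishes after a
single pass.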
>
> 2. There seems to be a design tension where a CPU can have timers
> visible in the migration hierarchy while simultaneously running its
> own local softirq. Is the expectation that run_timer_base() always
> drains everything before tmigr_handle_remote() sees it, or should
> the remote path handle local-CPU timers as a fallback?
It's not easy to defer all global timer handling to remote expiration,
because the current CPU may or may not be the migrator.
>
> kernel/time/timer_migration.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> index 1d0d3a4058d5..298c34c942ae 100644
> --- a/kernel/time/timer_migration.c
> +++ b/kernel/time/timer_migration.c
> @@ -978,8 +978,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
>  	/* Drop the lock to allow the remote CPU to exit idle */
>  	raw_spin_unlock_irq(&tmc->lock);
>
> -	if (cpu != smp_processor_id())
> -		timer_expire_remote(cpu);
> +	timer_expire_remote(cpu);
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Thanks!
>
>  	/*
>  	 * Lock ordering needs to be preserved - timer_base locks before tmigr
>
> base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
> --
> 2.47.3
>
--
Frederic Weisbecker
SUSE Labs