The Linux Kernel Mailing List
From: Frederic Weisbecker <frederic@kernel.org>
To: Amit Matityahu <amitmat@amazon.com>
Cc: tglx@kernel.org, anna-maria@linutronix.de,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	dwmw@amazon.co.uk, jonnyc@amazon.com, abaransi@amazon.com,
	alonka@amazon.com, ronenk@amazon.com, farbere@amazon.com
Subject: Re: [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up()
Date: Thu, 4 Jun 2026 11:45:18 +0200
Message-ID: <aiFJLiWDIVaMQOoV@localhost.localdomain>
In-Reply-To: <20260603170139.33628-1-amitmat@amazon.com>

On Wed, Jun 03, 2026 at 05:01:39PM +0000, Amit Matityahu wrote:
> tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu ==
> smp_processor_id(), assuming the local softirq path already handled
> this CPU's timers.
> 
> This assumption breaks when jiffies advances between
> run_timer_base(BASE_GLOBAL) and tmigr_handle_remote() in the same
> softirq invocation - a timer expires after the wheel ran but before
> the hierarchy snapshot is taken.
> 
> The stranded timer is never collected,
> fetch_next_timer_interrupt_remote() keeps reporting it as expired,
> and the event is re-queued with expires == now on each iteration.
> The goto-again loop spins indefinitely.
> 
> Fix by calling timer_expire_remote() unconditionally.
> __run_timer_base() already returns early when there is nothing to
> expire, making this a no-op in the common case.
> 
> Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
> Cc: stable@vger.kernel.org
> Reported-by: Alon Kariv <alonka@amazon.com>
> Cc: Jonathan Chocron <jonnyc@amazon.com>
> Cc: Akram Baransi <abaransi@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Amit Matityahu <amitmat@amazon.com>

That's quite serious indeed!
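
For readers following along, the race described above can be sketched as a
small userspace model. This is only an illustration under stated
assumptions, not kernel code: struct toy_cpu, toy_expire(),
toy_handle_remote() and toy_softirq() are hypothetical names, a single CPU
holds a single global timer, and a pass cap stands in for "spins
indefinitely".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not kernel code): one CPU, one global timer. */
struct toy_cpu {
	uint64_t timer_expires;	/* expiry of the single pending timer */
	bool timer_pending;
};

/* Stand-in for timer_expire_remote(): expire the timer if it is due. */
static void toy_expire(struct toy_cpu *c, uint64_t now)
{
	if (c->timer_pending && c->timer_expires <= now)
		c->timer_pending = false;
}

/*
 * Stand-in for the goto-again loop. skip_local models the old
 * "cpu != smp_processor_id()" check for the local CPU.
 * Returns the number of passes taken, capped at max_passes.
 */
static unsigned int toy_handle_remote(struct toy_cpu *c, uint64_t now,
				      bool skip_local, unsigned int max_passes)
{
	unsigned int passes = 0;

	assert(max_passes > 0);
	while (passes < max_passes) {
		passes++;
		if (!skip_local)
			toy_expire(c, now);
		/* Re-check the next event; stop once nothing expired is left. */
		if (!(c->timer_pending && c->timer_expires <= now))
			break;
		/* Otherwise the event is re-queued with expires == now. */
	}
	return passes;
}

/* One softirq invocation: local wheel first, remote handling one tick later. */
static unsigned int toy_softirq(bool skip_local, unsigned int max_passes)
{
	struct toy_cpu c = { .timer_expires = 101, .timer_pending = true };
	uint64_t jiffies = 100;

	toy_expire(&c, jiffies);	/* timer at 101 is not due yet */
	jiffies++;			/* jiffies advances in between */
	return toy_handle_remote(&c, jiffies, skip_local, max_passes);
}
```

In this model, toy_softirq(true, 1000) exhausts the pass cap because the
stranded timer is never collected, while toy_softirq(false, 1000) finishes
in a single pass.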

> ---
> 
> Questions for maintainers:
> 
> 1. What was the original rationale for the cpu != smp_processor_id()
>    check? There is no code comment, commit message explanation or anything
>    in the original patch's email discussion as to why
>    timer_expire_remote() is skipped for the local CPU.

The rationale was the assumption that such an expired timerqueue event
reflected a timer that had already been handled locally, so it could be
safely discarded. That let us spare some locking.

> 
> 2. There seems to be a design tension where a CPU can have timers
>    visible in the migration hierarchy while simultaneously running its
>    own local softirq. Is the expectation that run_timer_base() always
>    drains everything before tmigr_handle_remote() sees it, or should
>    the remote path handle local-CPU timers as a fallback?

It's not easy to defer all global timer handling to remote expiration,
because the current CPU may or may not be the migrator.

> 
>  kernel/time/timer_migration.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> index 1d0d3a4058d5..298c34c942ae 100644
> --- a/kernel/time/timer_migration.c
> +++ b/kernel/time/timer_migration.c
> @@ -978,8 +978,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
>  	/* Drop the lock to allow the remote CPU to exit idle */
>  	raw_spin_unlock_irq(&tmc->lock);
>  
> -	if (cpu != smp_processor_id())
> -		timer_expire_remote(cpu);
> +	timer_expire_remote(cpu);

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!

>  
>  	/*
>  	 * Lock ordering needs to be preserved - timer_base locks before tmigr
> 
> base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
> -- 
> 2.47.3
> 

-- 
Frederic Weisbecker
SUSE Labs

Thread overview: 4+ messages
2026-06-03 17:01 [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up() Amit Matityahu
2026-06-04  9:45 ` Frederic Weisbecker [this message]
2026-06-04 12:23   ` Thomas Gleixner
2026-06-04 12:38 ` [tip: timers/urgent] " tip-bot2 for Amit Matityahu
