From: Frederic Weisbecker <frederic@kernel.org>
To: Amit Matityahu <amitmat@amazon.com>
Cc: tglx@kernel.org, anna-maria@linutronix.de,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	dwmw@amazon.co.uk, jonnyc@amazon.com, abaransi@amazon.com,
	alonka@amazon.com, ronenk@amazon.com, farbere@amazon.com
Subject: Re: [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up()
Date: Thu, 4 Jun 2026 11:45:18 +0200
Message-ID: <aiFJLiWDIVaMQOoV@localhost.localdomain>
In-Reply-To: <20260603170139.33628-1-amitmat@amazon.com>

Le Wed, Jun 03, 2026 at 05:01:39PM +0000, Amit Matityahu a écrit :
> tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu ==
> smp_processor_id(), assuming the local softirq path already handled
> this CPU's timers.
> 
> This assumption breaks when jiffies advances between
> run_timer_base(BASE_GLOBAL) and tmigr_handle_remote() in the same
> softirq invocation - a timer expires after the wheel ran but before
> the hierarchy snapshot is taken.
> 
> The stranded timer is never collected,
> fetch_next_timer_interrupt_remote() keeps reporting it as expired,
> and the event is re-queued with expires == now on each iteration.
> The goto-again loop spins indefinitely.
> 
> Fix by calling timer_expire_remote() unconditionally.
> __run_timer_base() already returns early when there is nothing to
> expire, making this a no-op in the common case.
> 
> Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
> Cc: stable@vger.kernel.org
> Reported-by: Alon Kariv <alonka@amazon.com>
> Cc: Jonathan Chocron <jonnyc@amazon.com>
> Cc: Akram Baransi <abaransi@amazon.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Amit Matityahu <amitmat@amazon.com>

That's quite serious indeed!

> ---
> 
> Questions for maintainers:
> 
> 1. What was the original rationale for the cpu != smp_processor_id()
>    check? There is no code comment, commit message explanation or anything
>    in the original patch's email discussion as to why
>    timer_expire_remote() is skipped for the local CPU.

The rationale was to assume that such an expired timerqueue entry reflected a
timer that had already been handled locally, and so could be safely discarded.
That spared some locking.

> 
> 2. There seems to be a design tension where a CPU can have timers
>    visible in the migration hierarchy while simultaneously running its
>    own local softirq. Is the expectation that run_timer_base() always
>    drains everything before tmigr_handle_remote() sees it, or should
>    the remote path handle local-CPU timers as a fallback?

It's not easy to defer all global timer handling to remote expiration,
because the current CPU may or may not be the migrator.

> 
>  kernel/time/timer_migration.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> index 1d0d3a4058d5..298c34c942ae 100644
> --- a/kernel/time/timer_migration.c
> +++ b/kernel/time/timer_migration.c
> @@ -978,8 +978,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
>  	/* Drop the lock to allow the remote CPU to exit idle */
>  	raw_spin_unlock_irq(&tmc->lock);
>  
> -	if (cpu != smp_processor_id())
> -		timer_expire_remote(cpu);
> +	timer_expire_remote(cpu);

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!

>  
>  	/*
>  	 * Lock ordering needs to be preserved - timer_base locks before tmigr
> 
> base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
> -- 
> 2.47.3
> 

-- 
Frederic Weisbecker
SUSE Labs

Thread overview: 4+ messages
2026-06-03 17:01 [PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up() Amit Matityahu
2026-06-04  9:45 ` Frederic Weisbecker [this message]
2026-06-04 12:23   ` Thomas Gleixner
2026-06-04 12:38 ` [tip: timers/urgent] " tip-bot2 for Amit Matityahu
