[PATCH 0/2] timers/workqueue: Add support for active CPU

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] timers/workqueue: Add support for active CPU
@ 2026-04-23  9:19 Partha Satapathy
  2026-04-23  9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Partha Satapathy @ 2026-04-23  9:19 UTC (permalink / raw)
  To: partha.satapathy, anna-maria, frederic, tglx, linux-kernel, tj,
	jiangshanlai
  Cc: notify

From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>

Hi,

Timers queued with add_timer_on() and delayed work queued with
queue_delayed_work_on() currently rely on the caller to ensure that the
target CPU remains online until the enqueue operation completes. In
practice, CPU hotplug can still race with that sequence and leave the
timer queued on an offline CPU, where it will not run until that CPU
comes back online.

For delayed work, this has a direct knock-on effect: if the backing
timer is stranded on an offline CPU, the work item is never queued for
execution until that CPU returns.

In many cases, the target CPU is chosen for locality and cache affinity
rather than as a strict execution requirement. Falling back to an active
CPU is preferable to leaving the timer or delayed work blocked on a dead
CPU. While callers can try to track CPU hotplug state themselves, that
does not close the race, and taking the hotplug lock around enqueue
operations is too expensive for this class of use.

This series adds opt-in helpers for that fallback behavior without
changing the semantics of the existing interfaces:

  - add_timer_active_cpu() queues a timer on the requested CPU only if
    the target CPU's timer base is active; otherwise it falls back to
    the current CPU.

  - queue_delayed_work_active_cpu() uses the new timer helper for the
    delayed timer path and updates dwork->cpu to reflect the CPU
    actually selected for the timer, so the work item is queued on the
    same active CPU.

The existing add_timer_on() and queue_delayed_work_on() behavior is left
unchanged for callers that require strict CPU placement.

Partha Sarathi Satapathy (2):
  timer: add add_timer_active_cpu()
  workqueue: add queue_delayed_work_active_cpu()

 include/linux/timer.h     |  1 +
 include/linux/workqueue.h |  3 ++
 kernel/time/timer.c       | 45 +++++++++++++++++-
 kernel/workqueue.c        | 96 +++++++++++++++++++++++++++++++++------
 4 files changed, 131 insertions(+), 14 deletions(-)

-- 
2.43.7

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] timer: add add_timer_active_cpu()
  2026-04-23  9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy
@ 2026-04-23  9:19 ` Partha Satapathy
  2026-05-05 21:33   ` Thomas Gleixner
  2026-04-23  9:19 ` [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu() Partha Satapathy
  2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker
  2 siblings, 1 reply; 6+ messages in thread
From: Partha Satapathy @ 2026-04-23  9:19 UTC (permalink / raw)
  To: partha.satapathy, anna-maria, frederic, tglx, linux-kernel, tj,
	jiangshanlai
  Cc: notify

From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>

add_timer_on() can queue a timer on a CPU that goes offline before the
timer is enqueued. When that happens, the timer remains unserviced until
the CPU comes back online.

Callers can try to avoid that by checking CPU state themselves, but that
does not close the race with CPU hotplug. Taking the hotplug lock around
every enqueue is also too expensive for users that only want CPU-local
placement as a performance hint.

Add add_timer_active_cpu() for callers that want to queue a timer on a
specific CPU when that CPU is active, but otherwise fall back to an
active CPU. Implement this by teaching the enqueue path to verify that
the target timer base is active and, if not, requeue the timer on the
current CPU.

Leave add_timer_on() semantics unchanged for callers that require strict
CPU placement.

Signed-off-by: Partha Sarathi Satapathy <partha.satapathy@oracle.com>
---
 include/linux/timer.h |  1 +
 kernel/time/timer.c   | 45 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 62e1cea71125..8c771b367662 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -148,6 +148,7 @@ static inline int timer_pending(const struct timer_list * timer)
 }
 
 extern void add_timer_on(struct timer_list *timer, int cpu);
+extern void add_timer_active_cpu(struct timer_list *timer, int cpu);
 extern int mod_timer(struct timer_list *timer, unsigned long expires);
 extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
 extern int timer_reduce(struct timer_list *timer, unsigned long expires);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 1f2364126894..c73a28701a31 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -260,6 +260,7 @@ struct timer_base {
 	bool			next_expiry_recalc;
 	bool			is_idle;
 	bool			timers_pending;
+	bool			is_active;
 	DECLARE_BITMAP(pending_map, WHEEL_SIZE);
 	struct hlist_head	vectors[WHEEL_SIZE];
 } ____cacheline_aligned;
@@ -1296,7 +1297,7 @@ EXPORT_SYMBOL(add_timer_global);
  *
  * See add_timer() for further details.
  */
-void add_timer_on(struct timer_list *timer, int cpu)
+static void __add_timer_on(struct timer_list *timer, int cpu, bool need_ol_cpu)
 {
 	struct timer_base *new_base, *base;
 	unsigned long flags;
@@ -1333,6 +1334,18 @@ void add_timer_on(struct timer_list *timer, int cpu)
 		WRITE_ONCE(timer->flags,
 			   (timer->flags & ~TIMER_BASEMASK) | cpu);
 	}
+#ifdef CONFIG_HOTPLUG_CPU
+	if (need_ol_cpu) {
+		if (!base->is_active) {
+			raw_spin_unlock(&base->lock);
+			base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
+			raw_spin_lock(&base->lock);
+			cpu = smp_processor_id();
+			WRITE_ONCE(timer->flags,
+				   (timer->flags & ~TIMER_BASEMASK) | cpu);
+		}
+	}
+#endif /* CONFIG_HOTPLUG_CPU */
 	forward_timer_base(base);
 
 	debug_timer_activate(timer);
@@ -1340,8 +1353,31 @@ void add_timer_on(struct timer_list *timer, int cpu)
 out_unlock:
 	raw_spin_unlock_irqrestore(&base->lock, flags);
 }
+
+void add_timer_on(struct timer_list *timer, int cpu)
+{
+	bool need_ol_cpu = false;
+
+	__add_timer_on(timer, cpu, need_ol_cpu);
+}
 EXPORT_SYMBOL_GPL(add_timer_on);
 
+/**
+ * add_timer_active_cpu - Start a timer on a particular CPU if online or current
+ * @timer:      The timer to be started
+ * @cpu:        The CPU to start it on
+ *
+ * This is like add_timer_on(), except that it queues the timer on the
+ * given CPU only when that CPU is online or on the current CPU.
+ */
+void add_timer_active_cpu(struct timer_list *timer, int cpu)
+{
+	bool need_ol_cpu = true;
+
+	__add_timer_on(timer, cpu, need_ol_cpu);
+}
+EXPORT_SYMBOL_GPL(add_timer_active_cpu);
+
 /**
  * __timer_delete - Internal function: Deactivate a timer
  * @timer:	The timer to be deactivated
@@ -2507,6 +2543,7 @@ int timers_prepare_cpu(unsigned int cpu)
 		base->next_expiry_recalc = false;
 		base->timers_pending = false;
 		base->is_idle = false;
+		base->is_active = true;
 	}
 	return 0;
 }
@@ -2535,6 +2572,11 @@ int timers_dead_cpu(unsigned int cpu)
 
 		WARN_ON_ONCE(old_base->running_timer);
 		old_base->running_timer = NULL;
+		/*
+		 * Make the dead CPU base unavailable to add_timer_on()
+		 * when the caller wants an active target CPU.
+		 */
+		old_base->is_active = false;
 
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
@@ -2559,6 +2601,7 @@ static void __init init_timer_cpu(int cpu)
 		raw_spin_lock_init(&base->lock);
 		base->clk = jiffies;
 		base->next_expiry = base->clk + TIMER_NEXT_MAX_DELTA;
+		base->is_active = true;
 		timer_base_init_expiry_lock(base);
 	}
 }
-- 
2.43.7


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] timer: add add_timer_active_cpu()
  2026-04-23  9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy
@ 2026-05-05 21:33   ` Thomas Gleixner
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2026-05-05 21:33 UTC (permalink / raw)
  To: Partha Satapathy, partha.satapathy, anna-maria, frederic,
	linux-kernel, tj, jiangshanlai
  Cc: notify

On Thu, Apr 23 2026 at 09:19, Partha Satapathy wrote:
> From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>
>
> add_timer_on() can queue a timer on a CPU that goes offline before the
> timer is enqueued. When that happens, the timer remains unserviced until
> the CPU comes back online.
>
> Callers can try to avoid that by checking CPU state themselves, but that
> does not close the race with CPU hotplug. Taking the hotplug lock around
> every enqueue is also too expensive for users that only want CPU-local
> placement as a performance hint.
>
> Add add_timer_active_cpu() for callers that want to queue a timer on a

Aside of Frederic's comments this is a patently bad function name as it
suggests that the timer should be placed on any active CPU, which makes
it more confusing as the function has a @cpu argument.

> @@ -1296,7 +1297,7 @@ EXPORT_SYMBOL(add_timer_global);
>   *
>   * See add_timer() for further details.
>   */
> -void add_timer_on(struct timer_list *timer, int cpu)
> +static void __add_timer_on(struct timer_list *timer, int cpu, bool need_ol_cpu)

How is the kernel-doc comment still valid for that function?

Also 'need_ol_cpu' is the most ridiculous argument name I've seen in a
long time. My first read was: "Need [good] ol' CPU"

>  {
>  	struct timer_base *new_base, *base;
>  	unsigned long flags;
> @@ -1333,6 +1334,18 @@ void add_timer_on(struct timer_list *timer, int cpu)
>  		WRITE_ONCE(timer->flags,
>  			   (timer->flags & ~TIMER_BASEMASK) | cpu);
>  	}
> +#ifdef CONFIG_HOTPLUG_CPU
> +	if (need_ol_cpu) {

This #ifdeffery is pointless and can be replaced with

     if (IS_ENABLED(CONFIG...) && ...)

But that's just a cosmetic detail.

> +		if (!base->is_active) {
> +			raw_spin_unlock(&base->lock);
> +			base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);
> +			raw_spin_lock(&base->lock);
> +			cpu = smp_processor_id();
> +			WRITE_ONCE(timer->flags,
> +				   (timer->flags & ~TIMER_BASEMASK) | cpu);

This is broken vs. the TIMER_MIGRATION semantics.

> +void add_timer_on(struct timer_list *timer, int cpu)
> +{
> +	bool need_ol_cpu = false;
> +
> +	__add_timer_on(timer, cpu, need_ol_cpu);
> +}
>  EXPORT_SYMBOL_GPL(add_timer_on);
>  
> +/**
> + * add_timer_active_cpu - Start a timer on a particular CPU if online or current
> + * @timer:      The timer to be started
> + * @cpu:        The CPU to start it on
> + *
> + * This is like add_timer_on(), except that it queues the timer on the
> + * given CPU only when that CPU is online or on the current CPU.
> + */
> +void add_timer_active_cpu(struct timer_list *timer, int cpu)
> +{
> +	bool need_ol_cpu = true;
> +
> +	__add_timer_on(timer, cpu, need_ol_cpu);
> +}
> +EXPORT_SYMBOL_GPL(add_timer_active_cpu);

We surely need more exports just for adding a misnamed argument.

> +
>  /**
>   * __timer_delete - Internal function: Deactivate a timer
>   * @timer:	The timer to be deactivated
> @@ -2507,6 +2543,7 @@ int timers_prepare_cpu(unsigned int cpu)
>  		base->next_expiry_recalc = false;
>  		base->timers_pending = false;
>  		base->is_idle = false;
> +		base->is_active = true;

That's just wrong. The base is active only when the CPU reaches online state.

> @@ -2535,6 +2572,11 @@ int timers_dead_cpu(unsigned int cpu)
>  
>  		WARN_ON_ONCE(old_base->running_timer);
>  		old_base->running_timer = NULL;
> +		/*
> +		 * Make the dead CPU base unavailable to add_timer_on()
> +		 * when the caller wants an active target CPU.
> +		 */
> +		old_base->is_active = false;

After a CPU went offline no timer should be queued on it anymore and
add_timer_on() clearly lacks a

      if (WARN_ON(cpu_offline(cpu)))
         return;

If at all what you really want is an interface, which

  1) allows to give a hint about the CPU to place the timer on

  2) checks whether the hinted CPU is online (or even active) and places
     if on it if so

  3) if #2 fails picks a CPU on the same node when available (online or
     active)

  4) Picks any online CPU (maybe prefer the calling CPU)

But let me look at your argument you provided to Frederic:

>  Commit 4c00f3b0dea023a9851908a501735c899b0772f9
>  ("net/rds: Add preferred_cpu option to rds_rdma.ko")
>  added `preferred_cpu` so follow-up work can be placed for locality and
>  performance, not because correctness requires that exact CPU.

If RDS fails to update the preferred CPU logic once that CPU goes
offline, then that's a clear bug in that code and should not be worked
around elsewhere.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu()
  2026-04-23  9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy
  2026-04-23  9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy
@ 2026-04-23  9:19 ` Partha Satapathy
  2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker
  2 siblings, 0 replies; 6+ messages in thread
From: Partha Satapathy @ 2026-04-23  9:19 UTC (permalink / raw)
  To: partha.satapathy, anna-maria, frederic, tglx, linux-kernel, tj,
	jiangshanlai
  Cc: notify

From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>

Delayed work queued with queue_delayed_work_on() inherits the same CPU
hotplug race as add_timer_on(): the backing timer can be enqueued on a
CPU that goes offline before the queueing completes, leaving the timer
unserviced until that CPU comes back online. As a result, the delayed
work item is never queued for execution.

Add queue_delayed_work_active_cpu() for callers that want delayed work
to target a specific CPU when that CPU is active, but fall back to an
active CPU otherwise.

For the delayed timer path, use add_timer_active_cpu() instead of
add_timer_on(). After enqueueing the timer, update dwork->cpu to match
the CPU recorded in the timer so that the work item is queued on the CPU
actually selected for the timer.

Leave queue_delayed_work_on() unchanged for callers that require the
existing strict behavior.

Signed-off-by: Partha Sarathi Satapathy <partha.satapathy@oracle.com>
---
 include/linux/workqueue.h |  3 ++
 kernel/workqueue.c        | 96 +++++++++++++++++++++++++++++++++------
 2 files changed, 86 insertions(+), 13 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index dabc351cc127..9799e44bca2b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -596,6 +596,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq,
 			    struct work_struct *work);
 extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *work, unsigned long delay);
+extern bool queue_delayed_work_active_cpu(int cpu, struct workqueue_struct *wq,
+					  struct delayed_work *work,
+					  unsigned long delay);
 extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			struct delayed_work *dwork, unsigned long delay);
 extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 253311af47c6..80f5d162624d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2483,7 +2483,8 @@ void delayed_work_timer_fn(struct timer_list *t)
 EXPORT_SYMBOL(delayed_work_timer_fn);
 
 static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
-				struct delayed_work *dwork, unsigned long delay)
+				struct delayed_work *dwork, unsigned long delay,
+				bool dwork_active_cpu)
 {
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
@@ -2504,7 +2505,8 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 		return;
 	}
 
-	WARN_ON_ONCE(cpu != WORK_CPU_UNBOUND && !cpu_online(cpu));
+	WARN_ON_ONCE(!dwork_active_cpu && cpu != WORK_CPU_UNBOUND &&
+		     !cpu_online(cpu));
 	dwork->wq = wq;
 	dwork->cpu = cpu;
 	timer->expires = jiffies + delay;
@@ -2516,10 +2518,17 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 			cpu = housekeeping_any_cpu(HK_TYPE_TIMER);
 		add_timer_on(timer, cpu);
 	} else {
-		if (likely(cpu == WORK_CPU_UNBOUND))
+		if (likely(cpu == WORK_CPU_UNBOUND)) {
 			add_timer_global(timer);
-		else
+		} else if (dwork_active_cpu) {
+			add_timer_active_cpu(timer, cpu);
+			/* The target CPU may change if it is offline. */
+			/* Avoid selecting the same offline CPU. */
+			dwork->cpu = READ_ONCE(timer->flags) &
+				TIMER_CPUMASK;
+		} else {
 			add_timer_on(timer, cpu);
+		}
 	}
 }
 
@@ -2530,18 +2539,23 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
  * @dwork: work to queue
  * @delay: number of jiffies to wait before queueing
  *
- * We queue the delayed_work to a specific CPU, for non-zero delays the
- * caller must ensure it is online and can't go away. Callers that fail
- * to ensure this, may get @dwork->timer queued to an offlined CPU and
- * this will prevent queueing of @dwork->work unless the offlined CPU
- * becomes online again.
+ * We queue the delayed_work to a specific CPU, for non-zero delays and
+ * dwork_active_cpu is not set,  caller must ensure it is online and can't
+ * go away. Callers that fail to ensure this, may get @dwork->timer
+ * queued to an offlined CPU and this will prevent queueing of
+ * @dwork->work unless the offlined CPU becomes online again.
+ *
+ * If dwork_active_cpu is set, timer queued to an offlined CPU will be
+ * queued to the current cpu.
  *
  * Return: %false if @work was already on a queue, %true otherwise.  If
  * @delay is zero and @dwork is idle, it will be scheduled for immediate
  * execution.
  */
-bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
-			   struct delayed_work *dwork, unsigned long delay)
+static inline bool
+__queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			struct delayed_work *dwork, unsigned long delay,
+			bool dwork_active_cpu)
 {
 	struct work_struct *work = &dwork->work;
 	bool ret = false;
@@ -2552,15 +2566,70 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)) &&
 	    !clear_pending_if_disabled(work)) {
-		__queue_delayed_work(cpu, wq, dwork, delay);
+		__queue_delayed_work(cpu, wq, dwork, delay, dwork_active_cpu);
 		ret = true;
 	}
 
 	local_irq_restore(irq_flags);
 	return ret;
 }
+
+/**
+ * queue_delayed_work_on - queue work on specific CPU after delay
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * We queue the delayed_work to a specific CPU, for non-zero delays the
+ * caller must ensure it is online and can't go away. Callers that fail
+ * to ensure this, may get @dwork->timer queued to an offlined CPU and
+ * this will prevent queueing of @dwork->work unless the offlined CPU
+ * becomes online again.
+ *
+ * Return: %false if @work was already on a queue, %true otherwise.  If
+ * @delay is zero and @dwork is idle, it will be scheduled for immediate
+ * execution.
+ */
+bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+			   struct delayed_work *dwork, unsigned long delay)
+{
+	bool ret = false;
+	bool dwork_active_cpu = false;
+
+	ret = __queue_delayed_work_on(cpu, wq, dwork, delay, dwork_active_cpu);
+	return ret;
+}
 EXPORT_SYMBOL(queue_delayed_work_on);
 
+/**
+ * queue_delayed_work_active_cpu - queue delayed work for an active CPU
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * This is like queue_delayed_work_on(), except that for delayed work
+ * the timer is queued on @cpu only if that CPU is online or is the
+ * current CPU.
+ *
+ * Return: %false if @work was already on a queue, %true otherwise.  If
+ * @delay is zero and @dwork is idle, it will be scheduled for immediate
+ * execution.
+ *
+ */
+bool queue_delayed_work_active_cpu(int cpu, struct workqueue_struct *wq,
+				   struct delayed_work *dwork,
+				   unsigned long delay)
+{
+	bool ret = false;
+	bool dwork_active_cpu = true;
+
+	ret = __queue_delayed_work_on(cpu, wq, dwork, delay, dwork_active_cpu);
+	return ret;
+}
+EXPORT_SYMBOL(queue_delayed_work_active_cpu);
+
 /**
  * mod_delayed_work_on - modify delay of or queue a delayed work on specific CPU
  * @cpu: CPU number to execute work on
@@ -2584,11 +2653,12 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 {
 	unsigned long irq_flags;
 	bool ret;
+	bool dwork_active_cpu = false;
 
 	ret = work_grab_pending(&dwork->work, WORK_CANCEL_DELAYED, &irq_flags);
 
 	if (!clear_pending_if_disabled(&dwork->work))
-		__queue_delayed_work(cpu, wq, dwork, delay);
+		__queue_delayed_work(cpu, wq, dwork, delay, dwork_active_cpu);
 
 	local_irq_restore(irq_flags);
 	return ret;
-- 
2.43.7


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] timers/workqueue: Add support for active CPU
  2026-04-23  9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy
  2026-04-23  9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy
  2026-04-23  9:19 ` [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu() Partha Satapathy
@ 2026-04-23 12:05 ` Frederic Weisbecker
  2026-04-27 16:52   ` [External] : " Partha Satapathy
  2 siblings, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2026-04-23 12:05 UTC (permalink / raw)
  To: Partha Satapathy; +Cc: anna-maria, tglx, linux-kernel, tj, jiangshanlai, notify

Hi,

Le Thu, Apr 23, 2026 at 09:19:05AM +0000, Partha Satapathy a écrit :
> From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>
> 
> Hi,
> 
> Timers queued with add_timer_on() and delayed work queued with
> queue_delayed_work_on() currently rely on the caller to ensure that the
> target CPU remains online until the enqueue operation completes. In
> practice, CPU hotplug can still race with that sequence and leave the
> timer queued on an offline CPU, where it will not run until that CPU
> comes back online.
> 
> For delayed work, this has a direct knock-on effect: if the backing
> timer is stranded on an offline CPU, the work item is never queued for
> execution until that CPU returns.
> 
> In many cases, the target CPU is chosen for locality and cache affinity
> rather than as a strict execution requirement. Falling back to an active
> CPU is preferable to leaving the timer or delayed work blocked on a dead
> CPU. While callers can try to track CPU hotplug state themselves, that
> does not close the race, and taking the hotplug lock around enqueue
> operations is too expensive for this class of use.
> 
> This series adds opt-in helpers for that fallback behavior without
> changing the semantics of the existing interfaces:
> 
>   - add_timer_active_cpu() queues a timer on the requested CPU only if
>     the target CPU's timer base is active; otherwise it falls back to
>     the current CPU.
> 
>   - queue_delayed_work_active_cpu() uses the new timer helper for the
>     delayed timer path and updates dwork->cpu to reflect the CPU
>     actually selected for the timer, so the work item is queued on the
>     same active CPU.
> 
> The existing add_timer_on() and queue_delayed_work_on() behavior is left
> unchanged for callers that require strict CPU placement.

Timers are migrated when CPUs go offline. So the problem is queueing
a timer to an offline CPU. It should be the responsibility of a subsystem
to synchronize with CPU hotplug in order to avoid that.

As for timers that are queued locally not for correctness but for performance
reasons, do we know such example?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] : Re: [PATCH 0/2] timers/workqueue: Add support for active CPU
  2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker
@ 2026-04-27 16:52   ` Partha Satapathy
  0 siblings, 0 replies; 6+ messages in thread
From: Partha Satapathy @ 2026-04-27 16:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: anna-maria, tglx, linux-kernel, tj, jiangshanlai, notify

Thanks for the feedback,
One concrete example is our downstream RDS/RDMA code.

Commit 4c00f3b0dea023a9851908a501735c899b0772f9
("net/rds: Add preferred_cpu option to rds_rdma.ko")
added `preferred_cpu` so follow-up work can be placed for locality and
performance, not because correctness requires that exact CPU.

The motivation was to keep processing close to where the completion was
observed. The aim is to keep the workers near the CPU where the HCA's
queues are mapped, so completion handling and follow-up work remain
local. With `preferred_cpu=cq`, work is steered toward the CPU
matching the CQ interrupt affinity, which helps latency and cache
locality. With `preferred_cpu=numa`, work is kept on CPUs local to the
HCA's NUMA node, which helps scale-out while preserving locality to the
device and memory.

  So here the chosen CPU is only a preference. If that CPU goes offline
in the window between selection and queueing, the delayed timer can end
up queued on a CPU that is no longer active, and execution is then
postponed until that CPU comes back.

  For this class of user, fallback to an active CPU is preferable,
because the placement is an optimization, not a correctness requirement.

  I also do not think this is specific to RDS. In general, network and
RDMA drivers often choose CPUs based on IRQ affinity, queue affinity, or
NUMA locality to improve latency and throughput, while still being able
to run on any online CPU. For that kind of workload, "active if
possible, fallback otherwise" looks like a useful generic behavior.

  That said, I agree that a downstream-only example by itself may not be
enough to justify a mainline core API change. I mentioned RDS mainly to
answer your question about whether there are real users that choose CPUs
for performance/locality rather than strict correctness.

On 23-04-2026 17:35, Frederic Weisbecker wrote:
> Hi,
> 
> Le Thu, Apr 23, 2026 at 09:19:05AM +0000, Partha Satapathy a écrit :
>> From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>
>>
>> Hi,
>>
>> Timers queued with add_timer_on() and delayed work queued with
>> queue_delayed_work_on() currently rely on the caller to ensure that the
>> target CPU remains online until the enqueue operation completes. In
>> practice, CPU hotplug can still race with that sequence and leave the
>> timer queued on an offline CPU, where it will not run until that CPU
>> comes back online.
>>
>> For delayed work, this has a direct knock-on effect: if the backing
>> timer is stranded on an offline CPU, the work item is never queued for
>> execution until that CPU returns.
>>
>> In many cases, the target CPU is chosen for locality and cache affinity
>> rather than as a strict execution requirement. Falling back to an active
>> CPU is preferable to leaving the timer or delayed work blocked on a dead
>> CPU. While callers can try to track CPU hotplug state themselves, that
>> does not close the race, and taking the hotplug lock around enqueue
>> operations is too expensive for this class of use.
>>
>> This series adds opt-in helpers for that fallback behavior without
>> changing the semantics of the existing interfaces:
>>
>>   - add_timer_active_cpu() queues a timer on the requested CPU only if
>>     the target CPU's timer base is active; otherwise it falls back to
>>     the current CPU.
>>
>>   - queue_delayed_work_active_cpu() uses the new timer helper for the
>>     delayed timer path and updates dwork->cpu to reflect the CPU
>>     actually selected for the timer, so the work item is queued on the
>>     same active CPU.
>>
>> The existing add_timer_on() and queue_delayed_work_on() behavior is left
>> unchanged for callers that require strict CPU placement.
> 
> Timers are migrated when CPUs go offline. So the problem is queueing
> a timer to an offline CPU. It should be the responsibility of a subsystem
> to synchronize with CPU hotplug in order to avoid that.
> 
> As for timers that are queued locally not for correctness but for performance
> reasons, do we know such example?
> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-05 21:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-23  9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy
2026-04-23  9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy
2026-05-05 21:33   ` Thomas Gleixner
2026-04-23  9:19 ` [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu() Partha Satapathy
2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker
2026-04-27 16:52   ` [External] : " Partha Satapathy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.