* [PATCH 0/2] timers/workqueue: Add support for active CPU
@ 2026-04-23 9:19 Partha Satapathy
2026-04-23 9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Partha Satapathy @ 2026-04-23 9:19 UTC (permalink / raw)
To: partha.satapathy, anna-maria, frederic, tglx, linux-kernel, tj,
jiangshanlai
Cc: notify
From: Partha Sarathi Satapathy <partha.satapathy@oracle.com>
Hi,
Timers queued with add_timer_on() and delayed work queued with
queue_delayed_work_on() currently rely on the caller to ensure that the
target CPU remains online until the enqueue operation completes. In
practice, CPU hotplug can still race with that sequence and leave the
timer queued on an offline CPU, where it will not run until that CPU
comes back online.
For delayed work, this has a direct knock-on effect: if the backing
timer is stranded on an offline CPU, the work item is never queued for
execution until that CPU returns.
In many cases, the target CPU is chosen for locality and cache affinity
rather than as a strict execution requirement. Falling back to an active
CPU is preferable to leaving the timer or delayed work blocked on a dead
CPU. While callers can try to track CPU hotplug state themselves, that
does not close the race, and taking the hotplug lock around enqueue
operations is too expensive for this class of use.
This series adds opt-in helpers for that fallback behavior without
changing the semantics of the existing interfaces:
- add_timer_active_cpu() queues a timer on the requested CPU only if
the target CPU's timer base is active; otherwise it falls back to
the current CPU.
- queue_delayed_work_active_cpu() uses the new timer helper for the
delayed timer path and updates dwork->cpu to reflect the CPU
actually selected for the timer, so the work item is queued on the
same active CPU.
The existing add_timer_on() and queue_delayed_work_on() behavior is left
unchanged for callers that require strict CPU placement.
Partha Sarathi Satapathy (2):
timer: add add_timer_active_cpu()
workqueue: add queue_delayed_work_active_cpu()
include/linux/timer.h | 1 +
include/linux/workqueue.h | 3 ++
kernel/time/timer.c | 45 +++++++++++++++++-
kernel/workqueue.c | 96 +++++++++++++++++++++++++++++++++------
4 files changed, 131 insertions(+), 14 deletions(-)
--
2.43.7
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH 1/2] timer: add add_timer_active_cpu() 2026-04-23 9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy @ 2026-04-23 9:19 ` Partha Satapathy 2026-05-05 21:33 ` Thomas Gleixner 2026-04-23 9:19 ` [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu() Partha Satapathy 2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker 2 siblings, 1 reply; 6+ messages in thread From: Partha Satapathy @ 2026-04-23 9:19 UTC (permalink / raw) To: partha.satapathy, anna-maria, frederic, tglx, linux-kernel, tj, jiangshanlai Cc: notify From: Partha Sarathi Satapathy <partha.satapathy@oracle.com> add_timer_on() can queue a timer on a CPU that goes offline before the timer is enqueued. When that happens, the timer remains unserviced until the CPU comes back online. Callers can try to avoid that by checking CPU state themselves, but that does not close the race with CPU hotplug. Taking the hotplug lock around every enqueue is also too expensive for users that only want CPU-local placement as a performance hint. Add add_timer_active_cpu() for callers that want to queue a timer on a specific CPU when that CPU is active, but otherwise fall back to an active CPU. Implement this by teaching the enqueue path to verify that the target timer base is active and, if not, requeue the timer on the current CPU. Leave add_timer_on() semantics unchanged for callers that require strict CPU placement. Signed-off-by: Partha Sarathi Satapathy <partha.satapathy@oracle.com> --- include/linux/timer.h | 1 + kernel/time/timer.c | 45 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/include/linux/timer.h b/include/linux/timer.h index 62e1cea71125..8c771b367662 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -148,6 +148,7 @@ static inline int timer_pending(const struct timer_list * timer) } extern void add_timer_on(struct timer_list *timer, int cpu); +extern void add_timer_active_cpu(struct timer_list *timer, int cpu); extern int mod_timer(struct timer_list *timer, unsigned long expires); extern int mod_timer_pending(struct timer_list *timer, unsigned long expires); extern int timer_reduce(struct timer_list *timer, unsigned long expires); diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 1f2364126894..c73a28701a31 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -260,6 +260,7 @@ struct timer_base { bool next_expiry_recalc; bool is_idle; bool timers_pending; + bool is_active; DECLARE_BITMAP(pending_map, WHEEL_SIZE); struct hlist_head vectors[WHEEL_SIZE]; } ____cacheline_aligned; @@ -1296,7 +1297,7 @@ EXPORT_SYMBOL(add_timer_global); * * See add_timer() for further details. */ -void add_timer_on(struct timer_list *timer, int cpu) +static void __add_timer_on(struct timer_list *timer, int cpu, bool need_ol_cpu) { struct timer_base *new_base, *base; unsigned long flags; @@ -1333,6 +1334,18 @@ void add_timer_on(struct timer_list *timer, int cpu) WRITE_ONCE(timer->flags, (timer->flags & ~TIMER_BASEMASK) | cpu); } +#ifdef CONFIG_HOTPLUG_CPU + if (need_ol_cpu) { + if (!base->is_active) { + raw_spin_unlock(&base->lock); + base = this_cpu_ptr(&timer_bases[BASE_LOCAL]); + raw_spin_lock(&base->lock); + cpu = smp_processor_id(); + WRITE_ONCE(timer->flags, + (timer->flags & ~TIMER_BASEMASK) | cpu); + } + } +#endif /* CONFIG_HOTPLUG_CPU */ forward_timer_base(base); debug_timer_activate(timer); @@ -1340,8 +1353,31 @@ void add_timer_on(struct timer_list *timer, int cpu) out_unlock: raw_spin_unlock_irqrestore(&base->lock, flags); } + +void add_timer_on(struct timer_list *timer, int cpu) +{ + bool need_ol_cpu = false; + + __add_timer_on(timer, cpu, need_ol_cpu); +} EXPORT_SYMBOL_GPL(add_timer_on); +/** + * add_timer_active_cpu - Start a timer on a particular CPU if online or current + * @timer: The timer to be started + * @cpu: The CPU to start it on + * + * This is like add_timer_on(), except that it queues the timer on the + * given CPU only when that CPU is online or on the current CPU. + */ +void add_timer_active_cpu(struct timer_list *timer, int cpu) +{ + bool need_ol_cpu = true; + + __add_timer_on(timer, cpu, need_ol_cpu); +} +EXPORT_SYMBOL_GPL(add_timer_active_cpu); + /** * __timer_delete - Internal function: Deactivate a timer * @timer: The timer to be deactivated @@ -2507,6 +2543,7 @@ int timers_prepare_cpu(unsigned int cpu) base->next_expiry_recalc = false; base->timers_pending = false; base->is_idle = false; + base->is_active = true; } return 0; } @@ -2535,6 +2572,11 @@ int timers_dead_cpu(unsigned int cpu) WARN_ON_ONCE(old_base->running_timer); old_base->running_timer = NULL; + /* + * Make the dead CPU base unavailable to add_timer_on() + * when the caller wants an active target CPU. + */ + old_base->is_active = false; for (i = 0; i < WHEEL_SIZE; i++) migrate_timer_list(new_base, old_base->vectors + i); @@ -2559,6 +2601,7 @@ static void __init init_timer_cpu(int cpu) raw_spin_lock_init(&base->lock); base->clk = jiffies; base->next_expiry = base->clk + TIMER_NEXT_MAX_DELTA; + base->is_active = true; timer_base_init_expiry_lock(base); } } -- 2.43.7 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] timer: add add_timer_active_cpu() 2026-04-23 9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy @ 2026-05-05 21:33 ` Thomas Gleixner 0 siblings, 0 replies; 6+ messages in thread From: Thomas Gleixner @ 2026-05-05 21:33 UTC (permalink / raw) To: Partha Satapathy, partha.satapathy, anna-maria, frederic, linux-kernel, tj, jiangshanlai Cc: notify On Thu, Apr 23 2026 at 09:19, Partha Satapathy wrote: > From: Partha Sarathi Satapathy <partha.satapathy@oracle.com> > > add_timer_on() can queue a timer on a CPU that goes offline before the > timer is enqueued. When that happens, the timer remains unserviced until > the CPU comes back online. > > Callers can try to avoid that by checking CPU state themselves, but that > does not close the race with CPU hotplug. Taking the hotplug lock around > every enqueue is also too expensive for users that only want CPU-local > placement as a performance hint. > > Add add_timer_active_cpu() for callers that want to queue a timer on a Aside of Frederic's comments this is a patently bad function name as it suggests that the timer should be placed on any active CPU, which makes it more confusing as the function has a @cpu argument. > @@ -1296,7 +1297,7 @@ EXPORT_SYMBOL(add_timer_global); > * > * See add_timer() for further details. > */ > -void add_timer_on(struct timer_list *timer, int cpu) > +static void __add_timer_on(struct timer_list *timer, int cpu, bool need_ol_cpu) How is the kernel-doc comment still valid for that function? Also 'need_ol_cpu' is the most ridiculous argument name I've seen in a long time. My first read was: "Need [good] ol' CPU" > { > struct timer_base *new_base, *base; > unsigned long flags; > @@ -1333,6 +1334,18 @@ void add_timer_on(struct timer_list *timer, int cpu) > WRITE_ONCE(timer->flags, > (timer->flags & ~TIMER_BASEMASK) | cpu); > } > +#ifdef CONFIG_HOTPLUG_CPU > + if (need_ol_cpu) { This #ifdeffery is pointless and can be replaced with if (IS_ENABLED(CONFIG...) && ...) But that's just a cosmetic detail. > + if (!base->is_active) { > + raw_spin_unlock(&base->lock); > + base = this_cpu_ptr(&timer_bases[BASE_LOCAL]); > + raw_spin_lock(&base->lock); > + cpu = smp_processor_id(); > + WRITE_ONCE(timer->flags, > + (timer->flags & ~TIMER_BASEMASK) | cpu); This is broken vs. the TIMER_MIGRATION semantics. > +void add_timer_on(struct timer_list *timer, int cpu) > +{ > + bool need_ol_cpu = false; > + > + __add_timer_on(timer, cpu, need_ol_cpu); > +} > EXPORT_SYMBOL_GPL(add_timer_on); > > +/** > + * add_timer_active_cpu - Start a timer on a particular CPU if online or current > + * @timer: The timer to be started > + * @cpu: The CPU to start it on > + * > + * This is like add_timer_on(), except that it queues the timer on the > + * given CPU only when that CPU is online or on the current CPU. > + */ > +void add_timer_active_cpu(struct timer_list *timer, int cpu) > +{ > + bool need_ol_cpu = true; > + > + __add_timer_on(timer, cpu, need_ol_cpu); > +} > +EXPORT_SYMBOL_GPL(add_timer_active_cpu); We surely need more exports just for adding a misnamed argument. > + > /** > * __timer_delete - Internal function: Deactivate a timer > * @timer: The timer to be deactivated > @@ -2507,6 +2543,7 @@ int timers_prepare_cpu(unsigned int cpu) > base->next_expiry_recalc = false; > base->timers_pending = false; > base->is_idle = false; > + base->is_active = true; That's just wrong. The base is active only when the CPU reaches online state. > @@ -2535,6 +2572,11 @@ int timers_dead_cpu(unsigned int cpu) > > WARN_ON_ONCE(old_base->running_timer); > old_base->running_timer = NULL; > + /* > + * Make the dead CPU base unavailable to add_timer_on() > + * when the caller wants an active target CPU. > + */ > + old_base->is_active = false; After a CPU went offline no timer should be queued on it anymore and add_timer_on() clearly lacks a if (WARN_ON(cpu_offline(cpu))) return; If at all what you really want is an interface, which 1) allows to give a hint about the CPU to place the timer on 2) checks whether the hinted CPU is online (or even active) and places if on it if so 3) if #2 fails picks a CPU on the same node when available (online or active) 4) Picks any online CPU (maybe prefer the calling CPU) But let me look at your argument you provided to Frederic: > Commit 4c00f3b0dea023a9851908a501735c899b0772f9 > ("net/rds: Add preferred_cpu option to rds_rdma.ko") > added `preferred_cpu` so follow-up work can be placed for locality and > performance, not because correctness requires that exact CPU. If RDS fails to update the preferred CPU logic once that CPU goes offline, then that's a clear bug in that code and should not be worked around elsewhere. Thanks, tglx ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu() 2026-04-23 9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy 2026-04-23 9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy @ 2026-04-23 9:19 ` Partha Satapathy 2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker 2 siblings, 0 replies; 6+ messages in thread From: Partha Satapathy @ 2026-04-23 9:19 UTC (permalink / raw) To: partha.satapathy, anna-maria, frederic, tglx, linux-kernel, tj, jiangshanlai Cc: notify From: Partha Sarathi Satapathy <partha.satapathy@oracle.com> Delayed work queued with queue_delayed_work_on() inherits the same CPU hotplug race as add_timer_on(): the backing timer can be enqueued on a CPU that goes offline before the queueing completes, leaving the timer unserviced until that CPU comes back online. As a result, the delayed work item is never queued for execution. Add queue_delayed_work_active_cpu() for callers that want delayed work to target a specific CPU when that CPU is active, but fall back to an active CPU otherwise. For the delayed timer path, use add_timer_active_cpu() instead of add_timer_on(). After enqueueing the timer, update dwork->cpu to match the CPU recorded in the timer so that the work item is queued on the CPU actually selected for the timer. Leave queue_delayed_work_on() unchanged for callers that require the existing strict behavior. Signed-off-by: Partha Sarathi Satapathy <partha.satapathy@oracle.com> --- include/linux/workqueue.h | 3 ++ kernel/workqueue.c | 96 +++++++++++++++++++++++++++++++++------ 2 files changed, 86 insertions(+), 13 deletions(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index dabc351cc127..9799e44bca2b 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -596,6 +596,9 @@ extern bool queue_work_node(int node, struct workqueue_struct *wq, struct work_struct *work); extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *work, unsigned long delay); +extern bool queue_delayed_work_active_cpu(int cpu, struct workqueue_struct *wq, + struct delayed_work *work, + unsigned long delay); extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay); extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork); diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 253311af47c6..80f5d162624d 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2483,7 +2483,8 @@ void delayed_work_timer_fn(struct timer_list *t) EXPORT_SYMBOL(delayed_work_timer_fn); static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, - struct delayed_work *dwork, unsigned long delay) + struct delayed_work *dwork, unsigned long delay, + bool dwork_active_cpu) { struct timer_list *timer = &dwork->timer; struct work_struct *work = &dwork->work; @@ -2504,7 +2505,8 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, return; } - WARN_ON_ONCE(cpu != WORK_CPU_UNBOUND && !cpu_online(cpu)); + WARN_ON_ONCE(!dwork_active_cpu && cpu != WORK_CPU_UNBOUND && + !cpu_online(cpu)); dwork->wq = wq; dwork->cpu = cpu; timer->expires = jiffies + delay; @@ -2516,10 +2518,17 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, cpu = housekeeping_any_cpu(HK_TYPE_TIMER); add_timer_on(timer, cpu); } else { - if (likely(cpu == WORK_CPU_UNBOUND)) + if (likely(cpu == WORK_CPU_UNBOUND)) { add_timer_global(timer); - else + } else if (dwork_active_cpu) { + add_timer_active_cpu(timer, cpu); + /* The target CPU may change if it is offline. */ + /* Avoid selecting the same offline CPU. */ + dwork->cpu = READ_ONCE(timer->flags) & + TIMER_CPUMASK; + } else { add_timer_on(timer, cpu); + } } } @@ -2530,18 +2539,23 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, * @dwork: work to queue * @delay: number of jiffies to wait before queueing * - * We queue the delayed_work to a specific CPU, for non-zero delays the - * caller must ensure it is online and can't go away. Callers that fail - * to ensure this, may get @dwork->timer queued to an offlined CPU and - * this will prevent queueing of @dwork->work unless the offlined CPU - * becomes online again. + * We queue the delayed_work to a specific CPU, for non-zero delays and + * dwork_active_cpu is not set, caller must ensure it is online and can't + * go away. Callers that fail to ensure this, may get @dwork->timer + * queued to an offlined CPU and this will prevent queueing of + * @dwork->work unless the offlined CPU becomes online again. + * + * If dwork_active_cpu is set, timer queued to an offlined CPU will be + * queued to the current cpu. * * Return: %false if @work was already on a queue, %true otherwise. If * @delay is zero and @dwork is idle, it will be scheduled for immediate * execution. */ -bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, - struct delayed_work *dwork, unsigned long delay) +static inline bool +__queue_delayed_work_on(int cpu, struct workqueue_struct *wq, + struct delayed_work *dwork, unsigned long delay, + bool dwork_active_cpu) { struct work_struct *work = &dwork->work; bool ret = false; @@ -2552,15 +2566,70 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)) && !clear_pending_if_disabled(work)) { - __queue_delayed_work(cpu, wq, dwork, delay); + __queue_delayed_work(cpu, wq, dwork, delay, dwork_active_cpu); ret = true; } local_irq_restore(irq_flags); return ret; } + +/** + * queue_delayed_work_on - queue work on specific CPU after delay + * @cpu: CPU number to execute work on + * @wq: workqueue to use + * @dwork: work to queue + * @delay: number of jiffies to wait before queueing + * + * We queue the delayed_work to a specific CPU, for non-zero delays the + * caller must ensure it is online and can't go away. Callers that fail + * to ensure this, may get @dwork->timer queued to an offlined CPU and + * this will prevent queueing of @dwork->work unless the offlined CPU + * becomes online again. + * + * Return: %false if @work was already on a queue, %true otherwise. If + * @delay is zero and @dwork is idle, it will be scheduled for immediate + * execution. + */ +bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, + struct delayed_work *dwork, unsigned long delay) +{ + bool ret = false; + bool dwork_active_cpu = false; + + ret = __queue_delayed_work_on(cpu, wq, dwork, delay, dwork_active_cpu); + return ret; +} EXPORT_SYMBOL(queue_delayed_work_on); +/** + * queue_delayed_work_active_cpu - queue delayed work for an active CPU + * @cpu: CPU number to execute work on + * @wq: workqueue to use + * @dwork: work to queue + * @delay: number of jiffies to wait before queueing + * + * This is like queue_delayed_work_on(), except that for delayed work + * the timer is queued on @cpu only if that CPU is online or is the + * current CPU. + * + * Return: %false if @work was already on a queue, %true otherwise. If + * @delay is zero and @dwork is idle, it will be scheduled for immediate + * execution. + * + */ +bool queue_delayed_work_active_cpu(int cpu, struct workqueue_struct *wq, + struct delayed_work *dwork, + unsigned long delay) +{ + bool ret = false; + bool dwork_active_cpu = true; + + ret = __queue_delayed_work_on(cpu, wq, dwork, delay, dwork_active_cpu); + return ret; +} +EXPORT_SYMBOL(queue_delayed_work_active_cpu); + /** * mod_delayed_work_on - modify delay of or queue a delayed work on specific CPU * @cpu: CPU number to execute work on @@ -2584,11 +2653,12 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, { unsigned long irq_flags; bool ret; + bool dwork_active_cpu = false; ret = work_grab_pending(&dwork->work, WORK_CANCEL_DELAYED, &irq_flags); if (!clear_pending_if_disabled(&dwork->work)) - __queue_delayed_work(cpu, wq, dwork, delay); + __queue_delayed_work(cpu, wq, dwork, delay, dwork_active_cpu); local_irq_restore(irq_flags); return ret; -- 2.43.7 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/2] timers/workqueue: Add support for active CPU 2026-04-23 9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy 2026-04-23 9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy 2026-04-23 9:19 ` [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu() Partha Satapathy @ 2026-04-23 12:05 ` Frederic Weisbecker 2026-04-27 16:52 ` [External] : " Partha Satapathy 2 siblings, 1 reply; 6+ messages in thread From: Frederic Weisbecker @ 2026-04-23 12:05 UTC (permalink / raw) To: Partha Satapathy; +Cc: anna-maria, tglx, linux-kernel, tj, jiangshanlai, notify Hi, Le Thu, Apr 23, 2026 at 09:19:05AM +0000, Partha Satapathy a écrit : > From: Partha Sarathi Satapathy <partha.satapathy@oracle.com> > > Hi, > > Timers queued with add_timer_on() and delayed work queued with > queue_delayed_work_on() currently rely on the caller to ensure that the > target CPU remains online until the enqueue operation completes. In > practice, CPU hotplug can still race with that sequence and leave the > timer queued on an offline CPU, where it will not run until that CPU > comes back online. > > For delayed work, this has a direct knock-on effect: if the backing > timer is stranded on an offline CPU, the work item is never queued for > execution until that CPU returns. > > In many cases, the target CPU is chosen for locality and cache affinity > rather than as a strict execution requirement. Falling back to an active > CPU is preferable to leaving the timer or delayed work blocked on a dead > CPU. While callers can try to track CPU hotplug state themselves, that > does not close the race, and taking the hotplug lock around enqueue > operations is too expensive for this class of use. > > This series adds opt-in helpers for that fallback behavior without > changing the semantics of the existing interfaces: > > - add_timer_active_cpu() queues a timer on the requested CPU only if > the target CPU's timer base is active; otherwise it falls back to > the current CPU. > > - queue_delayed_work_active_cpu() uses the new timer helper for the > delayed timer path and updates dwork->cpu to reflect the CPU > actually selected for the timer, so the work item is queued on the > same active CPU. > > The existing add_timer_on() and queue_delayed_work_on() behavior is left > unchanged for callers that require strict CPU placement. Timers are migrated when CPUs go offline. So the problem is queueing a timer to an offline CPU. It should be the responsibility of a subsystem to synchronize with CPU hotplug in order to avoid that. As for timers that are queued locally not for correctness but for performance reasons, do we know such example? Thanks. -- Frederic Weisbecker SUSE Labs ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [External] : Re: [PATCH 0/2] timers/workqueue: Add support for active CPU 2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker @ 2026-04-27 16:52 ` Partha Satapathy 0 siblings, 0 replies; 6+ messages in thread From: Partha Satapathy @ 2026-04-27 16:52 UTC (permalink / raw) To: Frederic Weisbecker Cc: anna-maria, tglx, linux-kernel, tj, jiangshanlai, notify Thanks for the feedback, One concrete example is our downstream RDS/RDMA code. Commit 4c00f3b0dea023a9851908a501735c899b0772f9 ("net/rds: Add preferred_cpu option to rds_rdma.ko") added `preferred_cpu` so follow-up work can be placed for locality and performance, not because correctness requires that exact CPU. The motivation was to keep processing close to where the completion was observed. The aim is to keep the workers near the CPU where the HCA's queues are mapped, so completion handling and follow-up work remain local. With `preferred_cpu=cq`, work is steered toward the CPU matching the CQ interrupt affinity, which helps latency and cache locality. With `preferred_cpu=numa`, work is kept on CPUs local to the HCA's NUMA node, which helps scale-out while preserving locality to the device and memory. So here the chosen CPU is only a preference. If that CPU goes offline in the window between selection and queueing, the delayed timer can end up queued on a CPU that is no longer active, and execution is then postponed until that CPU comes back. For this class of user, fallback to an active CPU is preferable, because the placement is an optimization, not a correctness requirement. I also do not think this is specific to RDS. In general, network and RDMA drivers often choose CPUs based on IRQ affinity, queue affinity, or NUMA locality to improve latency and throughput, while still being able to run on any online CPU. For that kind of workload, "active if possible, fallback otherwise" looks like a useful generic behavior. That said, I agree that a downstream-only example by itself may not be enough to justify a mainline core API change. I mentioned RDS mainly to answer your question about whether there are real users that choose CPUs for performance/locality rather than strict correctness. On 23-04-2026 17:35, Frederic Weisbecker wrote: > Hi, > > Le Thu, Apr 23, 2026 at 09:19:05AM +0000, Partha Satapathy a écrit : >> From: Partha Sarathi Satapathy <partha.satapathy@oracle.com> >> >> Hi, >> >> Timers queued with add_timer_on() and delayed work queued with >> queue_delayed_work_on() currently rely on the caller to ensure that the >> target CPU remains online until the enqueue operation completes. In >> practice, CPU hotplug can still race with that sequence and leave the >> timer queued on an offline CPU, where it will not run until that CPU >> comes back online. >> >> For delayed work, this has a direct knock-on effect: if the backing >> timer is stranded on an offline CPU, the work item is never queued for >> execution until that CPU returns. >> >> In many cases, the target CPU is chosen for locality and cache affinity >> rather than as a strict execution requirement. Falling back to an active >> CPU is preferable to leaving the timer or delayed work blocked on a dead >> CPU. While callers can try to track CPU hotplug state themselves, that >> does not close the race, and taking the hotplug lock around enqueue >> operations is too expensive for this class of use. >> >> This series adds opt-in helpers for that fallback behavior without >> changing the semantics of the existing interfaces: >> >> - add_timer_active_cpu() queues a timer on the requested CPU only if >> the target CPU's timer base is active; otherwise it falls back to >> the current CPU. >> >> - queue_delayed_work_active_cpu() uses the new timer helper for the >> delayed timer path and updates dwork->cpu to reflect the CPU >> actually selected for the timer, so the work item is queued on the >> same active CPU. >> >> The existing add_timer_on() and queue_delayed_work_on() behavior is left >> unchanged for callers that require strict CPU placement. > > Timers are migrated when CPUs go offline. So the problem is queueing > a timer to an offline CPU. It should be the responsibility of a subsystem > to synchronize with CPU hotplug in order to avoid that. > > As for timers that are queued locally not for correctness but for performance > reasons, do we know such example? > > Thanks. > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-05-05 21:33 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-23 9:19 [PATCH 0/2] timers/workqueue: Add support for active CPU Partha Satapathy 2026-04-23 9:19 ` [PATCH 1/2] timer: add add_timer_active_cpu() Partha Satapathy 2026-05-05 21:33 ` Thomas Gleixner 2026-04-23 9:19 ` [PATCH 2/2] workqueue: add queue_delayed_work_active_cpu() Partha Satapathy 2026-04-23 12:05 ` [PATCH 0/2] timers/workqueue: Add support for active CPU Frederic Weisbecker 2026-04-27 16:52 ` [External] : " Partha Satapathy
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.