* [PATCH v3 0/3] Allow to drop the softirq-BKL lock on PREEMPT_RT
@ 2025-09-04 14:25 Sebastian Andrzej Siewior
2025-09-04 14:25 ` [PATCH v3 1/3] workqueue: Provide a handshake for canceling BH workers Sebastian Andrzej Siewior
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-04 14:25 UTC (permalink / raw)
To: linux-rt-devel, linux-kernel
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
Steven Rostedt, Tejun Heo, Thomas Gleixner,
Sebastian Andrzej Siewior
Users which rely on synchronisation within a BH-disabled section as in
- access to per-CPU data
- per-CPU timers
- synchronisation against another user within a BH-disabled section
rely on the local_lock_t lock in local_bh_disable() on PREEMPT_RT.
Almost all users have dropped their dependency. The fix for the
remaining (identified) user in networking (pipapo) is in net-next.
What is left is the infrastructure itself, i.e. tasklets and the
workqueue (for BH workers).
Both are part of this series. The last patch in the series adds an
option to drop the lock.
v1…v2: https://lore.kernel.org/all/20250901163811.963326-1-bigeasy@linutronix.de/
- In workqueue, __flush_work(), replace wq_pool_mutex with RCU locking.
v1 tasklet https://lore.kernel.org/all/20250812143930.22RBn5BW@linutronix.de
v1 workqueue https://lore.kernel.org/all/20250820103657.vDuDuLx6@linutronix.de
v1 lock-drop https://lore.kernel.org/all/20250613105653.1860729-2-bigeasy@linutronix.de
Sebastian Andrzej Siewior (3):
workqueue: Provide a handshake for canceling BH workers
softirq: Provide a handshake for canceling tasklets via polling
softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT
kernel/Kconfig.preempt | 13 ++++
kernel/softirq.c | 145 ++++++++++++++++++++++++++++++++++-------
kernel/workqueue.c | 50 +++++++++++---
3 files changed, 174 insertions(+), 34 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v3 1/3] workqueue: Provide a handshake for canceling BH workers
2025-09-04 14:25 [PATCH v3 0/3] Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
@ 2025-09-04 14:25 ` Sebastian Andrzej Siewior
2025-09-04 17:28 ` Tejun Heo
2025-09-04 14:25 ` [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling Sebastian Andrzej Siewior
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-04 14:25 UTC (permalink / raw)
To: linux-rt-devel, linux-kernel
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
Steven Rostedt, Tejun Heo, Thomas Gleixner,
Sebastian Andrzej Siewior
While a BH work item is canceled, the core code spins until it
determines that the item completed. On PREEMPT_RT the spinning relies on
a lock in local_bh_disable() to avoid a live lock if the canceling
thread has higher priority than the BH-worker and preempts it. This lock
ensures that the BH-worker makes progress by PI-boosting it.
This lock in local_bh_disable() is a central per-CPU BKL and is about
to be removed.
To provide the required synchronisation, add a per-pool lock. The lock
is acquired by the bh_worker at the beginning and held while the
individual callbacks are invoked. To enforce progress in case of
interruption, __flush_work() needs to acquire the lock.
This will flush all BH work items assigned to that pool.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/workqueue.c | 50 +++++++++++++++++++++++++++++++++++++---------
1 file changed, 41 insertions(+), 9 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c6b79b3675c31..d6c94ee8edfc5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -222,7 +222,9 @@ struct worker_pool {
struct workqueue_attrs *attrs; /* I: worker attributes */
struct hlist_node hash_node; /* PL: unbound_pool_hash node */
int refcnt; /* PL: refcnt for unbound pools */
-
+#ifdef CONFIG_PREEMPT_RT
+ spinlock_t cb_lock; /* BH worker cancel lock */
+#endif
/*
* Destruction of pool is RCU protected to allow dereferences
* from get_work_pool().
@@ -3078,6 +3080,31 @@ __acquires(&pool->lock)
goto restart;
}
+#ifdef CONFIG_PREEMPT_RT
+static void worker_lock_callback(struct worker_pool *pool)
+{
+ spin_lock(&pool->cb_lock);
+}
+
+static void worker_unlock_callback(struct worker_pool *pool)
+{
+ spin_unlock(&pool->cb_lock);
+}
+
+static void workqueue_callback_cancel_wait_running(struct worker_pool *pool)
+{
+ spin_lock(&pool->cb_lock);
+ spin_unlock(&pool->cb_lock);
+}
+
+#else
+
+static void worker_lock_callback(struct worker_pool *pool) { }
+static void worker_unlock_callback(struct worker_pool *pool) { }
+static void workqueue_callback_cancel_wait_running(struct worker_pool *pool) { }
+
+#endif
+
/**
* manage_workers - manage worker pool
* @worker: self
@@ -3557,6 +3584,7 @@ static void bh_worker(struct worker *worker)
int nr_restarts = BH_WORKER_RESTARTS;
unsigned long end = jiffies + BH_WORKER_JIFFIES;
+ worker_lock_callback(pool);
raw_spin_lock_irq(&pool->lock);
worker_leave_idle(worker);
@@ -3585,6 +3613,7 @@ static void bh_worker(struct worker *worker)
worker_enter_idle(worker);
kick_pool(pool);
raw_spin_unlock_irq(&pool->lock);
+ worker_unlock_callback(pool);
}
/*
@@ -4222,17 +4251,17 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
(data & WORK_OFFQ_BH)) {
/*
* On RT, prevent a live lock when %current preempted
- * soft interrupt processing or prevents ksoftirqd from
- * running by keeping flipping BH. If the BH work item
- * runs on a different CPU then this has no effect other
- * than doing the BH disable/enable dance for nothing.
- * This is copied from
- * kernel/softirq.c::tasklet_unlock_spin_wait().
+ * soft interrupt processing by blocking on lock which
+ * is owned by the thread invoking the callback.
*/
while (!try_wait_for_completion(&barr.done)) {
if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
- local_bh_disable();
- local_bh_enable();
+ struct worker_pool *pool;
+
+ guard(rcu)();
+ pool = get_work_pool(work);
+ if (pool)
+ workqueue_callback_cancel_wait_running(pool);
} else {
cpu_relax();
}
@@ -4782,6 +4811,9 @@ static int init_worker_pool(struct worker_pool *pool)
ida_init(&pool->worker_ida);
INIT_HLIST_NODE(&pool->hash_node);
pool->refcnt = 1;
+#ifdef CONFIG_PREEMPT_RT
+ spin_lock_init(&pool->cb_lock);
+#endif
/* shouldn't fail above this point */
pool->attrs = alloc_workqueue_attrs();
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling
2025-09-04 14:25 [PATCH v3 0/3] Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
2025-09-04 14:25 ` [PATCH v3 1/3] workqueue: Provide a handshake for canceling BH workers Sebastian Andrzej Siewior
@ 2025-09-04 14:25 ` Sebastian Andrzej Siewior
2025-09-05 10:15 ` Hillf Danton
2025-09-17 14:44 ` [tip: irq/core] " tip-bot2 for Sebastian Andrzej Siewior
2025-09-04 14:25 ` [PATCH v3 3/3] softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
2025-09-15 7:42 ` [PATCH v3 0/3] " Sebastian Andrzej Siewior
3 siblings, 2 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-04 14:25 UTC (permalink / raw)
To: linux-rt-devel, linux-kernel
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
Steven Rostedt, Tejun Heo, Thomas Gleixner,
Sebastian Andrzej Siewior
The tasklet_unlock_spin_wait() via tasklet_disable_in_atomic() is
provided for a few legacy tasklet users. The interface is used from
atomic context (which is either softirq or disabled preemption) on
non-PREEMPT_RT and relies on spinning until the tasklet callback
completes.
On PREEMPT_RT the context is never atomic but the busy polling logic
remains. It is possible that the thread invoking
tasklet_unlock_spin_wait() has higher priority than the tasklet. If both
run on the same CPU, the tasklet makes no progress and the thread trying
to cancel the tasklet will live-lock the system.
To avoid the lockup, tasklet_unlock_spin_wait() uses local_bh_disable()/
enable(), which utilizes the local_lock_t for synchronisation. This lock
is a central per-CPU BKL and is about to be removed.
Acquire a lock in tasklet_action_common() which is held while the
tasklet's callback is invoked. This lock will be acquired from
tasklet_unlock_spin_wait() via tasklet_callback_cancel_wait_running().
After the tasklet completed, tasklet_callback_sync_wait_running() drops
the lock and acquires it again. To avoid dropping the lock when there is
no cancel request, a cb_waiters counter is incremented for the duration
of a cancel request.
Blocking on the lock will PI-boost the tasklet if needed, ensuring
progress is made.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/softirq.c | 62 ++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 57 insertions(+), 5 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 513b1945987cc..4e2c980e7712e 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -805,6 +805,58 @@ static bool tasklet_clear_sched(struct tasklet_struct *t)
return false;
}
+#ifdef CONFIG_PREEMPT_RT
+struct tasklet_sync_callback {
+ spinlock_t cb_lock;
+ atomic_t cb_waiters;
+};
+
+static DEFINE_PER_CPU(struct tasklet_sync_callback, tasklet_sync_callback) = {
+ .cb_lock = __SPIN_LOCK_UNLOCKED(tasklet_sync_callback.cb_lock),
+ .cb_waiters = ATOMIC_INIT(0),
+};
+
+static void tasklet_lock_callback(void)
+{
+ spin_lock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_unlock_callback(void)
+{
+ spin_unlock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_callback_cancel_wait_running(void)
+{
+ struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+ atomic_inc(&sync_cb->cb_waiters);
+ spin_lock(&sync_cb->cb_lock);
+ atomic_dec(&sync_cb->cb_waiters);
+ spin_unlock(&sync_cb->cb_lock);
+}
+
+static void tasklet_callback_sync_wait_running(void)
+{
+ struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+ if (atomic_read(&sync_cb->cb_waiters)) {
+ spin_unlock(&sync_cb->cb_lock);
+ spin_lock(&sync_cb->cb_lock);
+ }
+}
+
+#else /* !CONFIG_PREEMPT_RT: */
+
+static void tasklet_lock_callback(void) { }
+static void tasklet_unlock_callback(void) { }
+static void tasklet_callback_sync_wait_running(void) { }
+
+#ifdef CONFIG_SMP
+static void tasklet_callback_cancel_wait_running(void) { }
+#endif
+#endif /* !CONFIG_PREEMPT_RT */
+
static void tasklet_action_common(struct tasklet_head *tl_head,
unsigned int softirq_nr)
{
@@ -816,6 +868,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
tl_head->tail = &tl_head->head;
local_irq_enable();
+ tasklet_lock_callback();
while (list) {
struct tasklet_struct *t = list;
@@ -835,6 +888,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
}
}
tasklet_unlock(t);
+ tasklet_callback_sync_wait_running();
continue;
}
tasklet_unlock(t);
@@ -847,6 +901,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
__raise_softirq_irqoff(softirq_nr);
local_irq_enable();
}
+ tasklet_unlock_callback();
}
static __latent_entropy void tasklet_action(void)
@@ -897,12 +952,9 @@ void tasklet_unlock_spin_wait(struct tasklet_struct *t)
/*
* Prevent a live lock when current preempted soft
* interrupt processing or prevents ksoftirqd from
- * running. If the tasklet runs on a different CPU
- * then this has no effect other than doing the BH
- * disable/enable dance for nothing.
+ * running.
*/
- local_bh_disable();
- local_bh_enable();
+ tasklet_callback_cancel_wait_running();
} else {
cpu_relax();
}
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v3 3/3] softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT
2025-09-04 14:25 [PATCH v3 0/3] Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
2025-09-04 14:25 ` [PATCH v3 1/3] workqueue: Provide a handshake for canceling BH workers Sebastian Andrzej Siewior
2025-09-04 14:25 ` [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling Sebastian Andrzej Siewior
@ 2025-09-04 14:25 ` Sebastian Andrzej Siewior
2025-09-17 14:44 ` [tip: irq/core] " tip-bot2 for Sebastian Andrzej Siewior
2025-09-15 7:42 ` [PATCH v3 0/3] " Sebastian Andrzej Siewior
3 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-04 14:25 UTC (permalink / raw)
To: linux-rt-devel, linux-kernel
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
Steven Rostedt, Tejun Heo, Thomas Gleixner,
Sebastian Andrzej Siewior
softirqs are preemptible on PREEMPT_RT. There is synchronisation between
individual sections which disable bottom halves. This in turn means that
a forced threaded interrupt cannot preempt another forced threaded
interrupt. Instead it will PI-boost the other handler and wait for its
completion.
This is required because code within a softirq section is assumed to be
non-preemptible and may expect exclusive access to per-CPU resources
such as variables or pinned timers.
Code with such expectation has been identified and updated to use
local_lock_nested_bh() for locking of the per-CPU resource. This means
the softirq lock can be removed.
Add CONFIG_PREEMPT_RT_NEEDS_BH_LOCK, which keeps the old behaviour if
selected. Otherwise the softirq synchronisation is lifted. The
softirq_ctrl.cnt accounting remains to let the NOHZ code know if
softirqs are currently handled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/Kconfig.preempt | 13 +++++++
kernel/softirq.c | 83 ++++++++++++++++++++++++++++++++----------
2 files changed, 76 insertions(+), 20 deletions(-)
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 54ea59ff8fbeb..da326800c1c9b 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -103,6 +103,19 @@ config PREEMPT_RT
Select this if you are building a kernel for systems which
require real-time guarantees.
+config PREEMPT_RT_NEEDS_BH_LOCK
+ bool "Enforce softirq synchronisation on PREEMPT_RT"
+ depends on PREEMPT_RT
+ help
+ Enforce synchronisation across the softirqs context. On PREEMPT_RT
+ the softirq is preemptible. This enforces the same per-CPU BKL
+ semantic non-PREEMPT_RT builds have. This should not be needed
+ because per-CPU locks were added to avoid the per-CPU BKL.
+
+ This switch provides the old behaviour for testing reasons. Select
+ this if you suspect an error with preemptible softirq and want to test
+ the old synchronized behaviour.
+
config PREEMPT_COUNT
bool
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4e2c980e7712e..77198911b8dd4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -165,7 +165,11 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
/* First entry of a task into a BH disabled section? */
if (!current->softirq_disable_cnt) {
if (preemptible()) {
- local_lock(&softirq_ctrl.lock);
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK))
+ local_lock(&softirq_ctrl.lock);
+ else
+ migrate_disable();
+
/* Required to meet the RCU bottomhalf requirements. */
rcu_read_lock();
} else {
@@ -177,17 +181,34 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
* Track the per CPU softirq disabled state. On RT this is per CPU
* state to allow preemption of bottom half disabled sections.
*/
- newcnt = __this_cpu_add_return(softirq_ctrl.cnt, cnt);
- /*
- * Reflect the result in the task state to prevent recursion on the
- * local lock and to make softirq_count() & al work.
- */
- current->softirq_disable_cnt = newcnt;
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+ newcnt = this_cpu_add_return(softirq_ctrl.cnt, cnt);
+ /*
+ * Reflect the result in the task state to prevent recursion on the
+ * local lock and to make softirq_count() & al work.
+ */
+ current->softirq_disable_cnt = newcnt;
- if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && newcnt == cnt) {
- raw_local_irq_save(flags);
- lockdep_softirqs_off(ip);
- raw_local_irq_restore(flags);
+ if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && newcnt == cnt) {
+ raw_local_irq_save(flags);
+ lockdep_softirqs_off(ip);
+ raw_local_irq_restore(flags);
+ }
+ } else {
+ bool sirq_dis = false;
+
+ if (!current->softirq_disable_cnt)
+ sirq_dis = true;
+
+ this_cpu_add(softirq_ctrl.cnt, cnt);
+ current->softirq_disable_cnt += cnt;
+ WARN_ON_ONCE(current->softirq_disable_cnt < 0);
+
+ if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && sirq_dis) {
+ raw_local_irq_save(flags);
+ lockdep_softirqs_off(ip);
+ raw_local_irq_restore(flags);
+ }
}
}
EXPORT_SYMBOL(__local_bh_disable_ip);
@@ -195,23 +216,42 @@ EXPORT_SYMBOL(__local_bh_disable_ip);
static void __local_bh_enable(unsigned int cnt, bool unlock)
{
unsigned long flags;
+ bool sirq_en = false;
int newcnt;
- DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt !=
- this_cpu_read(softirq_ctrl.cnt));
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+ DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt !=
+ this_cpu_read(softirq_ctrl.cnt));
+ if (softirq_count() == cnt)
+ sirq_en = true;
+ } else {
+ if (current->softirq_disable_cnt == cnt)
+ sirq_en = true;
+ }
- if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && softirq_count() == cnt) {
+ if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && sirq_en) {
raw_local_irq_save(flags);
lockdep_softirqs_on(_RET_IP_);
raw_local_irq_restore(flags);
}
- newcnt = __this_cpu_sub_return(softirq_ctrl.cnt, cnt);
- current->softirq_disable_cnt = newcnt;
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+ newcnt = this_cpu_sub_return(softirq_ctrl.cnt, cnt);
+ current->softirq_disable_cnt = newcnt;
- if (!newcnt && unlock) {
- rcu_read_unlock();
- local_unlock(&softirq_ctrl.lock);
+ if (!newcnt && unlock) {
+ rcu_read_unlock();
+ local_unlock(&softirq_ctrl.lock);
+ }
+ } else {
+ current->softirq_disable_cnt -= cnt;
+ this_cpu_sub(softirq_ctrl.cnt, cnt);
+ if (unlock && !current->softirq_disable_cnt) {
+ migrate_enable();
+ rcu_read_unlock();
+ } else {
+ WARN_ON_ONCE(current->softirq_disable_cnt < 0);
+ }
}
}
@@ -228,7 +268,10 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
lock_map_release(&bh_lock_map);
local_irq_save(flags);
- curcnt = __this_cpu_read(softirq_ctrl.cnt);
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK))
+ curcnt = this_cpu_read(softirq_ctrl.cnt);
+ else
+ curcnt = current->softirq_disable_cnt;
/*
* If this is not reenabling soft interrupts, no point in trying to
--
2.51.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 1/3] workqueue: Provide a handshake for canceling BH workers
2025-09-04 14:25 ` [PATCH v3 1/3] workqueue: Provide a handshake for canceling BH workers Sebastian Andrzej Siewior
@ 2025-09-04 17:28 ` Tejun Heo
0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2025-09-04 17:28 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-devel, linux-kernel, Clark Williams, Ingo Molnar,
Lai Jiangshan, Peter Zijlstra, Steven Rostedt, Thomas Gleixner
On Thu, Sep 04, 2025 at 04:25:23PM +0200, Sebastian Andrzej Siewior wrote:
> While a BH work item is canceled, the core code spins until it
> determines that the item completed. On PREEMPT_RT the spinning relies on
> a lock in local_bh_disable() to avoid a live lock if the canceling
> thread has higher priority than the BH-worker and preempts it. This lock
> ensures that the BH-worker makes progress by PI-boosting it.
>
> This lock in local_bh_disable() is a central per-CPU BKL and about to be
> removed.
>
> To provide the required synchronisation add a per pool lock. The lock is
> acquired by the bh_worker at the begin while the individual callbacks
> are invoked. To enforce progress in case of interruption, __flush_work()
> needs to acquire the lock.
> This will flush all BH-work items assigned to that pool.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Applied to wq/for-6.18.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling
2025-09-04 14:25 ` [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling Sebastian Andrzej Siewior
@ 2025-09-05 10:15 ` Hillf Danton
2025-09-15 7:39 ` Sebastian Andrzej Siewior
2025-09-17 14:44 ` [tip: irq/core] " tip-bot2 for Sebastian Andrzej Siewior
1 sibling, 1 reply; 13+ messages in thread
From: Hillf Danton @ 2025-09-05 10:15 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-devel, linux-kernel, Takashi Iwai, Peter Zijlstra
On Thu, 4 Sep 2025 16:25:24 +0200 Sebastian Andrzej Siewior wrote:
> The tasklet_unlock_spin_wait() via tasklet_disable_in_atomic() is
> provided for a few legacy tasklet users. The interface is used from
> atomic context (which is either softirq or disabled preemption) on
> non-PREEMPT_RT an relies on spinning until the tasklet callback
> completes.
> On PREEMPT_RT the context is never atomic but the busy polling logic
> remains. It possible that the thread invoking tasklet_unlock_spin_wait()
> has higher priority than the tasklet. If both run on the same CPU the
> the tasklet makes no progress and the thread trying to cancel the
> tasklet will live-lock the system.
> To avoid the lockup tasklet_unlock_spin_wait() uses local_bh_disable()/
> enable() which utilizes the local_lock_t for synchronisation. This lock
> is a central per-CPU BKL and about to be removed.
>
> Acquire a lock in tasklet_action_common() which is held while the
> tasklet's callback is invoked. This lock will be acquired from
> tasklet_unlock_spin_wait() via tasklet_callback_cancel_wait_running().
> After the tasklet completed tasklet_callback_sync_wait_running() drops
> the lock and acquires it again. In order to avoid unlocking the lock
> even if there is no cancel request, there is a cb_waiters counter which
> is incremented during a cancel request.
> Blocking on the lock will PI-boost the tasklet if needed, ensuring
> progress is made.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> kernel/softirq.c | 62 ++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 513b1945987cc..4e2c980e7712e 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -805,6 +805,58 @@ static bool tasklet_clear_sched(struct tasklet_struct *t)
> return false;
> }
>
> +#ifdef CONFIG_PREEMPT_RT
> +struct tasklet_sync_callback {
> + spinlock_t cb_lock;
> + atomic_t cb_waiters;
> +};
> +
> +static DEFINE_PER_CPU(struct tasklet_sync_callback, tasklet_sync_callback) = {
> + .cb_lock = __SPIN_LOCK_UNLOCKED(tasklet_sync_callback.cb_lock),
> + .cb_waiters = ATOMIC_INIT(0),
> +};
> +
> +static void tasklet_lock_callback(void)
> +{
> + spin_lock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
> +}
> +
> +static void tasklet_unlock_callback(void)
> +{
> + spin_unlock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
> +}
> +
> +static void tasklet_callback_cancel_wait_running(void)
> +{
> + struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
> +
> + atomic_inc(&sync_cb->cb_waiters);
> + spin_lock(&sync_cb->cb_lock);
> + atomic_dec(&sync_cb->cb_waiters);
> + spin_unlock(&sync_cb->cb_lock);
> +}
> +
CPU0 CPU1
---- ----
lock A
tasklet C callback
lock A
cancel tasklet B
DEADLOCK-01
After this work, could DEADLOCK-01 be triggered, given there is no chance for DEADLOCK-02?
CPU2 CPU3
---- ----
lock A
timer C callback
lock A
timer_delete_sync(timer B)
DEADLOCK-02
> +static void tasklet_callback_sync_wait_running(void)
> +{
> + struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
> +
> + if (atomic_read(&sync_cb->cb_waiters)) {
> + spin_unlock(&sync_cb->cb_lock);
> + spin_lock(&sync_cb->cb_lock);
> + }
> +}
> +
> +#else /* !CONFIG_PREEMPT_RT: */
> +
> +static void tasklet_lock_callback(void) { }
> +static void tasklet_unlock_callback(void) { }
> +static void tasklet_callback_sync_wait_running(void) { }
> +
> +#ifdef CONFIG_SMP
> +static void tasklet_callback_cancel_wait_running(void) { }
> +#endif
> +#endif /* !CONFIG_PREEMPT_RT */
> +
> static void tasklet_action_common(struct tasklet_head *tl_head,
> unsigned int softirq_nr)
> {
> @@ -816,6 +868,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
> tl_head->tail = &tl_head->head;
> local_irq_enable();
>
> + tasklet_lock_callback();
> while (list) {
> struct tasklet_struct *t = list;
>
> @@ -835,6 +888,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
> }
> }
> tasklet_unlock(t);
> + tasklet_callback_sync_wait_running();
> continue;
> }
> tasklet_unlock(t);
> @@ -847,6 +901,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
> __raise_softirq_irqoff(softirq_nr);
> local_irq_enable();
> }
> + tasklet_unlock_callback();
> }
>
> static __latent_entropy void tasklet_action(void)
> @@ -897,12 +952,9 @@ void tasklet_unlock_spin_wait(struct tasklet_struct *t)
> /*
> * Prevent a live lock when current preempted soft
> * interrupt processing or prevents ksoftirqd from
> - * running. If the tasklet runs on a different CPU
> - * then this has no effect other than doing the BH
> - * disable/enable dance for nothing.
> + * running.
> */
> - local_bh_disable();
> - local_bh_enable();
> + tasklet_callback_cancel_wait_running();
> } else {
> cpu_relax();
> }
> --
> 2.51.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling
2025-09-05 10:15 ` Hillf Danton
@ 2025-09-15 7:39 ` Sebastian Andrzej Siewior
2025-09-18 13:47 ` Hillf Danton
0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-15 7:39 UTC (permalink / raw)
To: Hillf Danton; +Cc: linux-rt-devel, linux-kernel, Takashi Iwai, Peter Zijlstra
On 2025-09-05 18:15:01 [+0800], Hillf Danton wrote:
> CPU0 CPU1
> ---- ----
> lock A
> tasklet C callback
> lock A
> cancel tasklet B
> DEADLOCK-01
>
> After this work could DEADLOCK-01 be triggered, given no chance for DEADLOCK-02 ?
>
> CPU2 CPU3
> ---- ----
> lock A
> timer C callback
> lock A
> timer_delete_sync(timer B)
> DEADLOCK-02
You are not supposed to acquire a lock that is also acquired in the
callback while canceling the timer/ tasklet.
Tell me please, how is this relevant?
If lock A is acquired on CPU0/ 2 then tasklet/ timer on CPU1/ 3 can't
make progress. Now CPU0/ 2 waits for the callback to complete. This
deadlocks as of today regardless of PREEMPT_RT and this change.
The difference is that !RT requires two CPUs for this to happen while RT
is efficient and can trigger this with just one CPU.
Sebastian
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 0/3] Allow to drop the softirq-BKL lock on PREEMPT_RT
2025-09-04 14:25 [PATCH v3 0/3] Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
` (2 preceding siblings ...)
2025-09-04 14:25 ` [PATCH v3 3/3] softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
@ 2025-09-15 7:42 ` Sebastian Andrzej Siewior
3 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-15 7:42 UTC (permalink / raw)
To: linux-rt-devel, linux-kernel, Thomas Gleixner
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
Steven Rostedt, Tejun Heo
On 2025-09-04 16:25:22 [+0200], To linux-rt-devel@lists.linux.dev wrote:
> Users which rely on synchronisation within a BH-disabled section as in
> - access to per-CPU data
> - per-CPU timers
> - synchronisation against another user within a BH-disabled section
>
> rely on the local_lock_t lock in local_bh_disable() on PREEMPT_RT.
> Almost all users dropped their dependency. The remaining (identified)
> user in networking (pipapo) is in net-next.
> What is left is the infrastructure as in tasklet and workqueue (for
> bh-worker).
> Both are part of this series. The last patch in the series adds an
> option to drop the lock.
Could we please route #2 and #3 via -tip?
Sebastian
^ permalink raw reply [flat|nested] 13+ messages in thread
* [tip: irq/core] softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT
2025-09-04 14:25 ` [PATCH v3 3/3] softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
@ 2025-09-17 14:44 ` tip-bot2 for Sebastian Andrzej Siewior
0 siblings, 0 replies; 13+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2025-09-17 14:44 UTC (permalink / raw)
To: linux-tip-commits
Cc: Sebastian Andrzej Siewior, Thomas Gleixner, x86, linux-kernel,
maz
The following commit has been merged into the irq/core branch of tip:
Commit-ID: 3253cb49cbad4772389d6ef55be75db1f97da910
Gitweb: https://git.kernel.org/tip/3253cb49cbad4772389d6ef55be75db1f97da910
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate: Thu, 04 Sep 2025 16:25:25 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 17 Sep 2025 16:25:41 +02:00
softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT
softirqs are preemptible on PREEMPT_RT. There is synchronisation between
individual sections which disable bottom halves. This in turn means that
a forced threaded interrupt cannot preempt another forced threaded
interrupt. Instead it will PI-boost the other handler and wait for its
completion.
This is required because code within a softirq section is assumed to be
non-preemptible and may expect exclusive access to per-CPU resources
such as variables or pinned timers.
Code with such expectation has been identified and updated to use
local_lock_nested_bh() for locking of the per-CPU resource. This means the
softirq lock can be removed.
Disable the softirq synchronization, but add a new config switch
CONFIG_PREEMPT_RT_NEEDS_BH_LOCK which allows re-enabling the synchronized
behavior in case there are issues which haven't been detected yet.
The softirq_ctrl.cnt accounting remains to let the NOHZ code know if
softirqs are currently handled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/Kconfig.preempt | 13 ++++++-
kernel/softirq.c | 83 +++++++++++++++++++++++++++++++----------
2 files changed, 76 insertions(+), 20 deletions(-)
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 54ea59f..da32680 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -103,6 +103,19 @@ config PREEMPT_RT
Select this if you are building a kernel for systems which
require real-time guarantees.
+config PREEMPT_RT_NEEDS_BH_LOCK
+ bool "Enforce softirq synchronisation on PREEMPT_RT"
+ depends on PREEMPT_RT
+ help
+ Enforce synchronisation across the softirqs context. On PREEMPT_RT
+ the softirq is preemptible. This enforces the same per-CPU BKL
+ semantic non-PREEMPT_RT builds have. This should not be needed
+ because per-CPU locks were added to avoid the per-CPU BKL.
+
+ This switch provides the old behaviour for testing reasons. Select
+ this if you suspect an error with preemptible softirq and want to test
+ the old synchronized behaviour.
+
config PREEMPT_COUNT
bool
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4e2c980..7719891 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -165,7 +165,11 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
/* First entry of a task into a BH disabled section? */
if (!current->softirq_disable_cnt) {
if (preemptible()) {
- local_lock(&softirq_ctrl.lock);
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK))
+ local_lock(&softirq_ctrl.lock);
+ else
+ migrate_disable();
+
/* Required to meet the RCU bottomhalf requirements. */
rcu_read_lock();
} else {
@@ -177,17 +181,34 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
* Track the per CPU softirq disabled state. On RT this is per CPU
* state to allow preemption of bottom half disabled sections.
*/
- newcnt = __this_cpu_add_return(softirq_ctrl.cnt, cnt);
- /*
- * Reflect the result in the task state to prevent recursion on the
- * local lock and to make softirq_count() & al work.
- */
- current->softirq_disable_cnt = newcnt;
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+ newcnt = this_cpu_add_return(softirq_ctrl.cnt, cnt);
+ /*
+ * Reflect the result in the task state to prevent recursion on the
+ * local lock and to make softirq_count() & al work.
+ */
+ current->softirq_disable_cnt = newcnt;
- if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && newcnt == cnt) {
- raw_local_irq_save(flags);
- lockdep_softirqs_off(ip);
- raw_local_irq_restore(flags);
+ if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && newcnt == cnt) {
+ raw_local_irq_save(flags);
+ lockdep_softirqs_off(ip);
+ raw_local_irq_restore(flags);
+ }
+ } else {
+ bool sirq_dis = false;
+
+ if (!current->softirq_disable_cnt)
+ sirq_dis = true;
+
+ this_cpu_add(softirq_ctrl.cnt, cnt);
+ current->softirq_disable_cnt += cnt;
+ WARN_ON_ONCE(current->softirq_disable_cnt < 0);
+
+ if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && sirq_dis) {
+ raw_local_irq_save(flags);
+ lockdep_softirqs_off(ip);
+ raw_local_irq_restore(flags);
+ }
}
}
EXPORT_SYMBOL(__local_bh_disable_ip);
@@ -195,23 +216,42 @@ EXPORT_SYMBOL(__local_bh_disable_ip);
static void __local_bh_enable(unsigned int cnt, bool unlock)
{
unsigned long flags;
+ bool sirq_en = false;
int newcnt;
- DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt !=
- this_cpu_read(softirq_ctrl.cnt));
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+ DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt !=
+ this_cpu_read(softirq_ctrl.cnt));
+ if (softirq_count() == cnt)
+ sirq_en = true;
+ } else {
+ if (current->softirq_disable_cnt == cnt)
+ sirq_en = true;
+ }
- if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && softirq_count() == cnt) {
+ if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && sirq_en) {
raw_local_irq_save(flags);
lockdep_softirqs_on(_RET_IP_);
raw_local_irq_restore(flags);
}
- newcnt = __this_cpu_sub_return(softirq_ctrl.cnt, cnt);
- current->softirq_disable_cnt = newcnt;
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+ newcnt = this_cpu_sub_return(softirq_ctrl.cnt, cnt);
+ current->softirq_disable_cnt = newcnt;
- if (!newcnt && unlock) {
- rcu_read_unlock();
- local_unlock(&softirq_ctrl.lock);
+ if (!newcnt && unlock) {
+ rcu_read_unlock();
+ local_unlock(&softirq_ctrl.lock);
+ }
+ } else {
+ current->softirq_disable_cnt -= cnt;
+ this_cpu_sub(softirq_ctrl.cnt, cnt);
+ if (unlock && !current->softirq_disable_cnt) {
+ migrate_enable();
+ rcu_read_unlock();
+ } else {
+ WARN_ON_ONCE(current->softirq_disable_cnt < 0);
+ }
}
}
@@ -228,7 +268,10 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
lock_map_release(&bh_lock_map);
local_irq_save(flags);
- curcnt = __this_cpu_read(softirq_ctrl.cnt);
+ if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK))
+ curcnt = this_cpu_read(softirq_ctrl.cnt);
+ else
+ curcnt = current->softirq_disable_cnt;
/*
* If this is not reenabling soft interrupts, no point in trying to
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [tip: irq/core] softirq: Provide a handshake for canceling tasklets via polling
2025-09-04 14:25 ` [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling Sebastian Andrzej Siewior
2025-09-05 10:15 ` Hillf Danton
@ 2025-09-17 14:44 ` tip-bot2 for Sebastian Andrzej Siewior
1 sibling, 0 replies; 13+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2025-09-17 14:44 UTC (permalink / raw)
To: linux-tip-commits
Cc: Sebastian Andrzej Siewior, Thomas Gleixner, x86, linux-kernel,
maz
The following commit has been merged into the irq/core branch of tip:
Commit-ID: fd4e876f59b7e70283b4025c717cad8948397be1
Gitweb: https://git.kernel.org/tip/fd4e876f59b7e70283b4025c717cad8948397be1
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate: Thu, 04 Sep 2025 16:25:24 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 17 Sep 2025 16:25:41 +02:00
softirq: Provide a handshake for canceling tasklets via polling
tasklet_unlock_spin_wait(), via tasklet_disable_in_atomic(), is provided for
a few legacy tasklet users. The interface is used from atomic context (which
is either softirq or disabled preemption) on non-PREEMPT_RT and relies on
spinning until the tasklet callback completes.
On PREEMPT_RT the context is never atomic, but the busy-polling logic
remains. It is possible that the thread invoking tasklet_unlock_spin_wait()
has a higher priority than the tasklet. If both run on the same CPU, the
tasklet makes no progress and the thread trying to cancel the tasklet will
live-lock the system.
To avoid the lockup, tasklet_unlock_spin_wait() uses local_bh_disable()/
enable(), which utilizes the local_lock_t for synchronisation. This lock is
a central per-CPU BKL and about to be removed.
Solve this by acquiring a lock in tasklet_action_common() which is held while
the tasklet's callback is invoked. This lock will be acquired from
tasklet_unlock_spin_wait() via tasklet_callback_cancel_wait_running().
After the tasklet has completed, tasklet_callback_sync_wait_running() drops
the lock and acquires it again. To avoid unlocking the lock when there is no
cancel request, a cb_waiters counter is incremented during a cancel request.
Blocking on the lock will PI-boost the tasklet if needed, ensuring progress
is made.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/softirq.c | 62 +++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 57 insertions(+), 5 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 513b194..4e2c980 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -805,6 +805,58 @@ static bool tasklet_clear_sched(struct tasklet_struct *t)
return false;
}
+#ifdef CONFIG_PREEMPT_RT
+struct tasklet_sync_callback {
+ spinlock_t cb_lock;
+ atomic_t cb_waiters;
+};
+
+static DEFINE_PER_CPU(struct tasklet_sync_callback, tasklet_sync_callback) = {
+ .cb_lock = __SPIN_LOCK_UNLOCKED(tasklet_sync_callback.cb_lock),
+ .cb_waiters = ATOMIC_INIT(0),
+};
+
+static void tasklet_lock_callback(void)
+{
+ spin_lock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_unlock_callback(void)
+{
+ spin_unlock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_callback_cancel_wait_running(void)
+{
+ struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+ atomic_inc(&sync_cb->cb_waiters);
+ spin_lock(&sync_cb->cb_lock);
+ atomic_dec(&sync_cb->cb_waiters);
+ spin_unlock(&sync_cb->cb_lock);
+}
+
+static void tasklet_callback_sync_wait_running(void)
+{
+ struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+ if (atomic_read(&sync_cb->cb_waiters)) {
+ spin_unlock(&sync_cb->cb_lock);
+ spin_lock(&sync_cb->cb_lock);
+ }
+}
+
+#else /* !CONFIG_PREEMPT_RT: */
+
+static void tasklet_lock_callback(void) { }
+static void tasklet_unlock_callback(void) { }
+static void tasklet_callback_sync_wait_running(void) { }
+
+#ifdef CONFIG_SMP
+static void tasklet_callback_cancel_wait_running(void) { }
+#endif
+#endif /* !CONFIG_PREEMPT_RT */
+
static void tasklet_action_common(struct tasklet_head *tl_head,
unsigned int softirq_nr)
{
@@ -816,6 +868,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
tl_head->tail = &tl_head->head;
local_irq_enable();
+ tasklet_lock_callback();
while (list) {
struct tasklet_struct *t = list;
@@ -835,6 +888,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
}
}
tasklet_unlock(t);
+ tasklet_callback_sync_wait_running();
continue;
}
tasklet_unlock(t);
@@ -847,6 +901,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
__raise_softirq_irqoff(softirq_nr);
local_irq_enable();
}
+ tasklet_unlock_callback();
}
static __latent_entropy void tasklet_action(void)
@@ -897,12 +952,9 @@ void tasklet_unlock_spin_wait(struct tasklet_struct *t)
/*
* Prevent a live lock when current preempted soft
* interrupt processing or prevents ksoftirqd from
- * running. If the tasklet runs on a different CPU
- * then this has no effect other than doing the BH
- * disable/enable dance for nothing.
+ * running.
*/
- local_bh_disable();
- local_bh_enable();
+ tasklet_callback_cancel_wait_running();
} else {
cpu_relax();
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling
2025-09-15 7:39 ` Sebastian Andrzej Siewior
@ 2025-09-18 13:47 ` Hillf Danton
2025-09-18 15:49 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 13+ messages in thread
From: Hillf Danton @ 2025-09-18 13:47 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-devel, linux-kernel, Takashi Iwai, Peter Zijlstra
On Mon, 15 Sep 2025 09:39:33 +0200 Sebastian Andrzej Siewior wrote:
>On 2025-09-05 18:15:01 [+0800], Hillf Danton wrote:
>> CPU0 CPU1
>> ---- ----
>> lock A
>> tasklet C callback
>> lock A
>> cancel tasklet B
>> DEADLOCK-01
>>
>> After this work could DEADLOCK-01 be triggered, given no chance for DEADLOCK-02 ?
>>
>> CPU2 CPU3
>> ---- ----
>> lock A
>> timer C callback
>> lock A
>> timer_delete_sync(timer B)
>> DEADLOCK-02
>
> You are not supposed to acquire the lock, that is also acquired in the
> callback, while canceling the timer/ tasklet.
> Tell me please, how is this relevant?
>
> If lock A is acquired on CPU0/ 2 then tasklet/ timer on CPU1/ 3 can't
> make progress. Now CPU0/ 2 waits for the callback to complete. This
> deadlocks as of today regardless of PREEMPT_RT and this change.
>
In case of !RT, the chance for DEADLOCK-02 is zero because deadlock is
detected based on per-timer instead of per-cpu.
> The difference is that !RT requires two CPU for this to happen while RT
> is efficient and can trigger this with just one CPU.
In case of RT OTOH, false positive deadlock could be triggered because
canceling taskletB has nothing to do with the callback of taskletC.
In short I am highlighting the gap between per-timer/tasklet and per-cpu.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling
2025-09-18 13:47 ` Hillf Danton
@ 2025-09-18 15:49 ` Sebastian Andrzej Siewior
2025-09-20 1:29 ` Hillf Danton
0 siblings, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-18 15:49 UTC (permalink / raw)
To: Hillf Danton; +Cc: linux-rt-devel, linux-kernel, Takashi Iwai, Peter Zijlstra
On 2025-09-18 21:47:52 [+0800], Hillf Danton wrote:
> On Mon, 15 Sep 2025 09:39:33 +0200 Sebastian Andrzej Siewior wrote:
> >On 2025-09-05 18:15:01 [+0800], Hillf Danton wrote:
> >> CPU0 CPU1
> >> ---- ----
> >> lock A
> >> tasklet C callback
> >> lock A
> >> cancel tasklet B
> >> DEADLOCK-01
> >>
> >> After this work could DEADLOCK-01 be triggered, given no chance for DEADLOCK-02 ?
> >>
> >> CPU2 CPU3
> >> ---- ----
> >> lock A
> >> timer C callback
> >> lock A
> >> timer_delete_sync(timer B)
> >> DEADLOCK-02
> >
> > You are not supposed to acquire the lock, that is also acquired in the
> > callback, while canceling the timer/ tasklet.
> > Tell me please, how is this relevant?
> >
> > If lock A is acquired on CPU0/ 2 then tasklet/ timer on CPU1/ 3 can't
> > make progress. Now CPU0/ 2 waits for the callback to complete. This
> > deadlocks as of today regardless of PREEMPT_RT and this change.
> >
> In case of !RT, the chance for DEADLOCK-02 is zero because deadlock is
> detected based on per-timer instead of per-cpu.
But your "lock A" is global, isn't it?
> > The difference is that !RT requires two CPU for this to happen while RT
> > is efficient and can trigger this with just one CPU.
>
> In case of RT OTOH, false positive deadlock could be triggered because
> canceling taskletB has nothing to do with the callback of taskletC.
>
> In short I am highlighting the gap between per-timer/tasklet and per-cpu.
I don't see a problem here.
Sebastian
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling
2025-09-18 15:49 ` Sebastian Andrzej Siewior
@ 2025-09-20 1:29 ` Hillf Danton
0 siblings, 0 replies; 13+ messages in thread
From: Hillf Danton @ 2025-09-20 1:29 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-devel, linux-kernel, Takashi Iwai, Peter Zijlstra
On Thu, 18 Sep 2025 17:49:37 +0200 Sebastian Andrzej Siewior wrote:
>On 2025-09-18 21:47:52 [+0800], Hillf Danton wrote:
>> On Mon, 15 Sep 2025 09:39:33 +0200 Sebastian Andrzej Siewior wrote:
>> >On 2025-09-05 18:15:01 [+0800], Hillf Danton wrote:
>> >> CPU0 CPU1
>> >> ---- ----
>> >> lock A
>> >> tasklet C callback
>> >> lock A
>> >> cancel tasklet B
>> >> DEADLOCK-01
>> >>
>> >> After this work could DEADLOCK-01 be triggered, given no chance for DEADLOCK-02 ?
>> >>
>> >> CPU2 CPU3
>> >> ---- ----
>> >> lock A
>> >> timer C callback
>> >> lock A
>> >> timer_delete_sync(timer B)
>> >> DEADLOCK-02
>> >
>> > You are not supposed to acquire the lock, that is also acquired in the
>> > callback, while canceling the timer/ tasklet.
>> > Tell me please, how is this relevant?
>> >
>> > If lock A is acquired on CPU0/ 2 then tasklet/ timer on CPU1/ 3 can't
>> > make progress. Now CPU0/ 2 waits for the callback to complete. This
>> > deadlocks as of today regardless of PREEMPT_RT and this change.
>> >
>> In case of !RT, the chance for DEADLOCK-02 is zero because deadlock is
>> detected based on per-timer instead of per-cpu.
>
> But your "lock A" is global, isn't it?
>
IIUC whether lockA is global can be safely ignored here. DEADLOCK-02 cannot
be detected with !RT because, by definition, the callback of timerB has
nothing to do with timerC under the current per-timer detecting mechanism.
This work however adds a per-CPU detecting mechanism that fails to tell the
difference between timerB and timerC, and thus produces a false positive.
For example, the callback of timerB does not acquire lockA.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-09-20 1:29 UTC | newest]
Thread overview: 13+ messages
2025-09-04 14:25 [PATCH v3 0/3] Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
2025-09-04 14:25 ` [PATCH v3 1/3] workqueue: Provide a handshake for canceling BH workers Sebastian Andrzej Siewior
2025-09-04 17:28 ` Tejun Heo
2025-09-04 14:25 ` [PATCH v3 2/3] softirq: Provide a handshake for canceling tasklets via polling Sebastian Andrzej Siewior
2025-09-05 10:15 ` Hillf Danton
2025-09-15 7:39 ` Sebastian Andrzej Siewior
2025-09-18 13:47 ` Hillf Danton
2025-09-18 15:49 ` Sebastian Andrzej Siewior
2025-09-20 1:29 ` Hillf Danton
2025-09-17 14:44 ` [tip: irq/core] " tip-bot2 for Sebastian Andrzej Siewior
2025-09-04 14:25 ` [PATCH v3 3/3] softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT Sebastian Andrzej Siewior
2025-09-17 14:44 ` [tip: irq/core] " tip-bot2 for Sebastian Andrzej Siewior
2025-09-15 7:42 ` [PATCH v3 0/3] " Sebastian Andrzej Siewior