* [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
@ 2025-07-07 14:03 Luis Claudio R. Goncalves
2025-07-10 16:19 ` Valentin Schneider
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-07-07 14:03 UTC (permalink / raw)
To: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa, lclaudio00
With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
from rt_mutex_adjust_prio_chain() could happen in preemptible context and
with a mutex enqueued. That could lead to this sequence:
rt_mutex_adjust_prio_chain()
put_task_struct()
__put_task_struct()
sched_ext_free()
spin_lock_irqsave()
rtlock_lock() ---> TRIGGERS
lockdep_assert(!current->pi_blocked_on);
This is not a SCHED_EXT bug. The first cleanup function called by
__put_task_struct() is sched_ext_free() and it happens to take a
(RT) spin_lock, which in the scenario described above, would trigger
the lockdep assertion of "!current->pi_blocked_on".
Crystal Wood was able to identify the problem as __put_task_struct()
being called during rt_mutex_adjust_prio_chain(), in the context of
a process with a mutex enqueued.
Instead of adding more complex conditions to decide when to directly
call __put_task_struct() and when to defer the call, unconditionally
resort to the deferred call on PREEMPT_RT to simplify the code.
Suggested-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
v6: (Sebastian) rework patch description with the note from Crystal Wood.
v5: Add the "Fixes:" tag.
v4: Fix the implementation of what was requested on v3.
v3: (Sebastian, PeterZ) always call the deferred __put_task_struct() on RT.
v2: (Rostedt) remove the #ifdef from put_task_struct() and create
tsk_is_pi_blocked_on() in sched.h to make the change cleaner.
include/linux/sched/task.h | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0f2aeb37bbb0..5873de8804d4 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -135,24 +135,17 @@ static inline void put_task_struct(struct task_struct *t)
return;
/*
- * In !RT, it is always safe to call __put_task_struct().
- * Under RT, we can only call it in preemptible context.
- */
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
- static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
-
- lock_map_acquire_try(&put_task_map);
- __put_task_struct(t);
- lock_map_release(&put_task_map);
- return;
- }
-
- /*
- * under PREEMPT_RT, we can't call put_task_struct
+ * Under PREEMPT_RT, we can't call __put_task_struct
* in atomic context because it will indirectly
- * acquire sleeping locks.
+ * acquire sleeping locks. The same is true if the
+ * current process has a mutex enqueued (blocked on
+ * a PI chain).
+ *
+ * In !RT, it is always safe to call __put_task_struct().
+ * Though, in order to simplify the code, resort to the
+ * deferred call too.
*
- * call_rcu() will schedule delayed_put_task_struct_rcu()
+ * call_rcu() will schedule __put_task_struct_rcu_cb()
* to be called in process context.
*
* __put_task_struct() is called when
@@ -165,7 +158,7 @@ static inline void put_task_struct(struct task_struct *t)
*
* delayed_free_task() also uses ->rcu, but it is only called
* when it fails to fork a process. Therefore, there is no
- * way it can conflict with put_task_struct().
+ * way it can conflict with __put_task_struct().
*/
call_rcu(&t->rcu, __put_task_struct_rcu_cb);
}
--
2.50.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-07 14:03 [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set Luis Claudio R. Goncalves
@ 2025-07-10 16:19 ` Valentin Schneider
2025-07-14 14:15 ` Sebastian Andrzej Siewior
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Valentin Schneider @ 2025-07-10 16:19 UTC (permalink / raw)
To: Luis Claudio R. Goncalves, Peter Zijlstra,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Tejun Heo, David Vernet, Barret Rhoden, Josh Don, Crystal Wood,
linux-kernel, linux-rt-devel, Juri Lelli, Ben Segall,
Dietmar Eggemann, Ingo Molnar, Mel Gorman, Vincent Guittot,
Thomas Gleixner, Wander Lairson Costa, lclaudio00
On 07/07/25 11:03, Luis Claudio R. Goncalves wrote:
> With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
> from rt_mutex_adjust_prio_chain() could happen in preemptible context and
> with a mutex enqueued. That could lead to this sequence:
>
> rt_mutex_adjust_prio_chain()
> put_task_struct()
> __put_task_struct()
> sched_ext_free()
> spin_lock_irqsave()
> rtlock_lock() ---> TRIGGERS
> lockdep_assert(!current->pi_blocked_on);
>
> This is not a SCHED_EXT bug. The first cleanup function called by
> __put_task_struct() is sched_ext_free() and it happens to take a
> (RT) spin_lock, which in the scenario described above, would trigger
> the lockdep assertion of "!current->pi_blocked_on".
>
> Crystal Wood was able to identify the problem as __put_task_struct()
> being called during rt_mutex_adjust_prio_chain(), in the context of
> a process with a mutex enqueued.
>
> Instead of adding more complex conditions to decide when to directly
> call __put_task_struct() and when to defer the call, unconditionally
> resort to the deferred call on PREEMPT_RT to simplify the code.
>
> Suggested-by: Crystal Wood <crwood@redhat.com>
> Reviewed-by: Wander Lairson Costa <wander@redhat.com>
> Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
FWIW:
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-07 14:03 [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set Luis Claudio R. Goncalves
2025-07-10 16:19 ` Valentin Schneider
@ 2025-07-14 14:15 ` Sebastian Andrzej Siewior
2025-07-16 10:19 ` [tip: sched/core] sched: Do " tip-bot2 for Luis Claudio R. Goncalves
2025-07-28 20:14 ` [PATCH v6] sched: do " Oleg Nesterov
3 siblings, 0 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-07-14 14:15 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: Peter Zijlstra, Clark Williams, Steven Rostedt, Tejun Heo,
David Vernet, Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
Thomas Gleixner, Wander Lairson Costa, lclaudio00
On 2025-07-07 11:03:59 [-0300], Luis Claudio R. Goncalves wrote:
> With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
> from rt_mutex_adjust_prio_chain() could happen in preemptible context and
> with a mutex enqueued. That could lead to this sequence:
>
> rt_mutex_adjust_prio_chain()
> put_task_struct()
> __put_task_struct()
> sched_ext_free()
> spin_lock_irqsave()
> rtlock_lock() ---> TRIGGERS
> lockdep_assert(!current->pi_blocked_on);
>
> This is not a SCHED_EXT bug. The first cleanup function called by
> __put_task_struct() is sched_ext_free() and it happens to take a
> (RT) spin_lock, which in the scenario described above, would trigger
> the lockdep assertion of "!current->pi_blocked_on".
>
> Crystal Wood was able to identify the problem as __put_task_struct()
> being called during rt_mutex_adjust_prio_chain(), in the context of
> a process with a mutex enqueued.
>
> Instead of adding more complex conditions to decide when to directly
> call __put_task_struct() and when to defer the call, unconditionally
> resort to the deferred call on PREEMPT_RT to simplify the code.
>
> Suggested-by: Crystal Wood <crwood@redhat.com>
> Reviewed-by: Wander Lairson Costa <wander@redhat.com>
> Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Side note: This simplifies the call chain to always free the task struct
via RCU. This also means that the stack is not immediately available for
recycle (CONFIG_VMAP_STACK) but after the grace period. Based on my
testing the new task was also deployed on a remote CPU therefore I
wouldn't worry too much here. Just wanted to point the only difference I
could come up with.
Sebastian
^ permalink raw reply [flat|nested] 16+ messages in thread
* [tip: sched/core] sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-07 14:03 [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set Luis Claudio R. Goncalves
2025-07-10 16:19 ` Valentin Schneider
2025-07-14 14:15 ` Sebastian Andrzej Siewior
@ 2025-07-16 10:19 ` tip-bot2 for Luis Claudio R. Goncalves
2025-07-28 20:14 ` [PATCH v6] sched: do " Oleg Nesterov
3 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Luis Claudio R. Goncalves @ 2025-07-16 10:19 UTC (permalink / raw)
To: linux-tip-commits
Cc: Crystal Wood, Luis Claudio R. Goncalves, Peter Zijlstra (Intel),
Wander Lairson Costa, Valentin Schneider,
Sebastian Andrzej Siewior, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 8671bad873ebeb082afcf7b4501395c374da6023
Gitweb: https://git.kernel.org/tip/8671bad873ebeb082afcf7b4501395c374da6023
Author: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
AuthorDate: Mon, 07 Jul 2025 11:03:59 -03:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 14 Jul 2025 17:16:33 +02:00
sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
from rt_mutex_adjust_prio_chain() could happen in preemptible context and
with a mutex enqueued. That could lead to this sequence:
rt_mutex_adjust_prio_chain()
put_task_struct()
__put_task_struct()
sched_ext_free()
spin_lock_irqsave()
rtlock_lock() ---> TRIGGERS
lockdep_assert(!current->pi_blocked_on);
This is not a SCHED_EXT bug. The first cleanup function called by
__put_task_struct() is sched_ext_free() and it happens to take a
(RT) spin_lock, which in the scenario described above, would trigger
the lockdep assertion of "!current->pi_blocked_on".
Crystal Wood was able to identify the problem as __put_task_struct()
being called during rt_mutex_adjust_prio_chain(), in the context of
a process with a mutex enqueued.
Instead of adding more complex conditions to decide when to directly
call __put_task_struct() and when to defer the call, unconditionally
resort to the deferred call on PREEMPT_RT to simplify the code.
Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
Suggested-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/aGvTz5VaPFyj0pBV@uudg.org
---
include/linux/sched/task.h | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c517dbc..ea41795 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -131,24 +131,17 @@ static inline void put_task_struct(struct task_struct *t)
return;
/*
- * In !RT, it is always safe to call __put_task_struct().
- * Under RT, we can only call it in preemptible context.
- */
- if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
- static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
-
- lock_map_acquire_try(&put_task_map);
- __put_task_struct(t);
- lock_map_release(&put_task_map);
- return;
- }
-
- /*
- * under PREEMPT_RT, we can't call put_task_struct
+ * Under PREEMPT_RT, we can't call __put_task_struct
* in atomic context because it will indirectly
- * acquire sleeping locks.
+ * acquire sleeping locks. The same is true if the
+ * current process has a mutex enqueued (blocked on
+ * a PI chain).
+ *
+ * In !RT, it is always safe to call __put_task_struct().
+ * Though, in order to simplify the code, resort to the
+ * deferred call too.
*
- * call_rcu() will schedule delayed_put_task_struct_rcu()
+ * call_rcu() will schedule __put_task_struct_rcu_cb()
* to be called in process context.
*
* __put_task_struct() is called when
@@ -161,7 +154,7 @@ static inline void put_task_struct(struct task_struct *t)
*
* delayed_free_task() also uses ->rcu, but it is only called
* when it fails to fork a process. Therefore, there is no
- * way it can conflict with put_task_struct().
+ * way it can conflict with __put_task_struct().
*/
call_rcu(&t->rcu, __put_task_struct_rcu_cb);
}
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-07 14:03 [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set Luis Claudio R. Goncalves
` (2 preceding siblings ...)
2025-07-16 10:19 ` [tip: sched/core] sched: Do " tip-bot2 for Luis Claudio R. Goncalves
@ 2025-07-28 20:14 ` Oleg Nesterov
2025-07-29 7:33 ` Luis Claudio R. Goncalves
3 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2025-07-28 20:14 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa, lclaudio00
On 07/07, Luis Claudio R. Goncalves wrote:
>
> Instead of adding more complex conditions to decide when to directly
> call __put_task_struct() and when to defer the call, unconditionally
> resort to the deferred call on PREEMPT_RT to simplify the code.
^^^^^^^^^^^^^
Confused... with this patch put_task_struct() always uses the deferred
call, regardless of PREEMPT_RT?
Oleg.
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -135,24 +135,17 @@ static inline void put_task_struct(struct task_struct *t)
> return;
>
> /*
> - * In !RT, it is always safe to call __put_task_struct().
> - * Under RT, we can only call it in preemptible context.
> - */
> - if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
> - static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> -
> - lock_map_acquire_try(&put_task_map);
> - __put_task_struct(t);
> - lock_map_release(&put_task_map);
> - return;
> - }
> -
> - /*
> - * under PREEMPT_RT, we can't call put_task_struct
> + * Under PREEMPT_RT, we can't call __put_task_struct
> * in atomic context because it will indirectly
> - * acquire sleeping locks.
> + * acquire sleeping locks. The same is true if the
> + * current process has a mutex enqueued (blocked on
> + * a PI chain).
> + *
> + * In !RT, it is always safe to call __put_task_struct().
> + * Though, in order to simplify the code, resort to the
> + * deferred call too.
> *
> - * call_rcu() will schedule delayed_put_task_struct_rcu()
> + * call_rcu() will schedule __put_task_struct_rcu_cb()
> * to be called in process context.
> *
> * __put_task_struct() is called when
> @@ -165,7 +158,7 @@ static inline void put_task_struct(struct task_struct *t)
> *
> * delayed_free_task() also uses ->rcu, but it is only called
> * when it fails to fork a process. Therefore, there is no
> - * way it can conflict with put_task_struct().
> + * way it can conflict with __put_task_struct().
> */
> call_rcu(&t->rcu, __put_task_struct_rcu_cb);
> }
> --
> 2.50.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-28 20:14 ` [PATCH v6] sched: do " Oleg Nesterov
@ 2025-07-29 7:33 ` Luis Claudio R. Goncalves
2025-07-29 11:47 ` Oleg Nesterov
0 siblings, 1 reply; 16+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-07-29 7:33 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On Mon, Jul 28, 2025 at 10:14:41PM +0200, Oleg Nesterov wrote:
> On 07/07, Luis Claudio R. Goncalves wrote:
> >
> > Instead of adding more complex conditions to decide when to directly
> > call __put_task_struct() and when to defer the call, unconditionally
> > resort to the deferred call on PREEMPT_RT to simplify the code.
> ^^^^^^^^^^^^^
> Confused... with this patch put_task_struct() always uses the deferred
> call, regardless of PREEMPT_RT?
>
> Oleg.
You are correct.
I mistakenly sent the patch from v3, with the updated description. I had
been working on that patch in parallel for a future RFC and mixed them up
when updating the description for v6. The changes from v4 onward were
updates to the patch description only.
The fact that I had discussed this patch (the one submitted) with some
people and was asking for tests to assess robustness and dependability
may have further composed the confusion.
I posted v6 (with the wrong patch) a couple of hours before leaving for
a 2-week vacation. This is also why I didn't notice the wrong submission
before. That was an unfortunate mistake on my part, no bad intent.
Not sure how to proceed here, if I should resend the correct patch or a
follow-up fix like this:
======
From: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Subject: sched: restore the behavior of put_task_struct() for non-rt
Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
if pi_blocked_on is set") changed the behavior of put_task_struct()
unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
with the commit description.
Restore the previous behavior of put_task_struct() for the PREEMPT_RT
disabled case.
Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..51678a541477 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
if (!refcount_dec_and_test(&t->usage))
return;
+ /* In !RT, it is always safe to call __put_task_struct(). */
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+ lock_map_acquire_try(&put_task_map);
+ __put_task_struct(t);
+ lock_map_release(&put_task_map);
+ return;
+ }
+
/*
* Under PREEMPT_RT, we can't call __put_task_struct
* in atomic context because it will indirectly
@@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
* current process has a mutex enqueued (blocked on
* a PI chain).
*
- * In !RT, it is always safe to call __put_task_struct().
- * Though, in order to simplify the code, resort to the
- * deferred call too.
- *
* call_rcu() will schedule __put_task_struct_rcu_cb()
* to be called in process context.
*
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-29 7:33 ` Luis Claudio R. Goncalves
@ 2025-07-29 11:47 ` Oleg Nesterov
2025-07-29 12:45 ` Luis Claudio R. Goncalves
0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2025-07-29 11:47 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 07/29, Luis Claudio R. Goncalves wrote:
>
> From: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> Subject: sched: restore the behavior of put_task_struct() for non-rt
>
> Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> if pi_blocked_on is set") changed the behavior of put_task_struct()
> unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> with the commit description.
>
> Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> disabled case.
>
> Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> ---
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index ea41795a352b..51678a541477 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
> if (!refcount_dec_and_test(&t->usage))
> return;
>
> + /* In !RT, it is always safe to call __put_task_struct(). */
> + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> +
> + lock_map_acquire_try(&put_task_map);
> + __put_task_struct(t);
> + lock_map_release(&put_task_map);
> + return;
> + }
FWIW:
Acked-by: Oleg Nesterov <oleg@redhat.com>
At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP().
IIUC, we need to shut up lockdep when put_task_struct() is called under
raw_spinlock_t and __put_task_struct() paths take spinlock_t, right?
Perhaps this deserves a comment...
But if I am right, why LD_WAIT_SLEEP? LD_WAIT_CONFIG should equally work, no?
LD_WAIT_SLEEP can fool lockdep more than we need, suppose that __put_task_struct()
does mutex_lock(). Not really a problem, might_sleep/etc will complain in this
case, but still.
Or I am totally confused?
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-29 11:47 ` Oleg Nesterov
@ 2025-07-29 12:45 ` Luis Claudio R. Goncalves
2025-07-29 13:09 ` Oleg Nesterov
0 siblings, 1 reply; 16+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-07-29 12:45 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote:
> On 07/29, Luis Claudio R. Goncalves wrote:
> >
> > From: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > Subject: sched: restore the behavior of put_task_struct() for non-rt
> >
> > Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> > if pi_blocked_on is set") changed the behavior of put_task_struct()
> > unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> > with the commit description.
> >
> > Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> > disabled case.
> >
> > Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > ---
> > diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> > index ea41795a352b..51678a541477 100644
> > --- a/include/linux/sched/task.h
> > +++ b/include/linux/sched/task.h
> > @@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
> > if (!refcount_dec_and_test(&t->usage))
> > return;
> >
> > + /* In !RT, it is always safe to call __put_task_struct(). */
> > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > +
> > + lock_map_acquire_try(&put_task_map);
> > + __put_task_struct(t);
> > + lock_map_release(&put_task_map);
> > + return;
> > + }
>
> FWIW:
>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
>
>
> At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP().
> IIUC, we need to shut up lockdep when put_task_struct() is called under
> raw_spinlock_t and __put_task_struct() paths take spinlock_t, right?
> Perhaps this deserves a comment...
I reverted that code to the previous state, commit 893cdaaa3977 ("sched:
avoid false lockdep splat in put_task_struct()") and simplified the "if"
statement. In the original code, PREEMPT_RT could call __put_task_struct()
if the context was preemptible. But in the proposed code __put_task_struct()
is only called if PREEMPT_RT is disabled. In this case I believe we could
simply do:
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+ __put_task_struct(t);
+ return;
+ }
Does that make sense?
Luis
> But if I am right, why LD_WAIT_SLEEP? LD_WAIT_CONFIG should equally work, no?
>
> LD_WAIT_SLEEP can fool lockdep more than we need, suppose that __put_task_struct()
> does mutex_lock(). Not really a problem, might_sleep/etc will complain in this
> case, but still.
>
> Or I am totally confused?
>
> Oleg.
>
---end quoted text---
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-29 12:45 ` Luis Claudio R. Goncalves
@ 2025-07-29 13:09 ` Oleg Nesterov
2025-08-01 10:24 ` Oleg Nesterov
0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2025-07-29 13:09 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 07/29, Luis Claudio R. Goncalves wrote:
>
> On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote:
> > On 07/29, Luis Claudio R. Goncalves wrote:
> > >
> > > + /* In !RT, it is always safe to call __put_task_struct(). */
> > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > > +
> > > + lock_map_acquire_try(&put_task_map);
> > > + __put_task_struct(t);
> > > + lock_map_release(&put_task_map);
> > > + return;
> > > + }
> >
> > FWIW:
> >
> > Acked-by: Oleg Nesterov <oleg@redhat.com>
> >
> >
> > At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP().
> > IIUC, we need to shut up lockdep when put_task_struct() is called under
> > raw_spinlock_t and __put_task_struct() paths take spinlock_t, right?
> > Perhaps this deserves a comment...
>
> I reverted that code to the previous state, commit 893cdaaa3977 ("sched:
> avoid false lockdep splat in put_task_struct()") and simplified the "if"
> statement.
Yes, yes, I see and I have already acked your patch.
> In the original code, PREEMPT_RT could call __put_task_struct()
> if the context was preemptible. But in the proposed code __put_task_struct()
> is only called if PREEMPT_RT is disabled. In this case I believe we could
> simply do:
>
> + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> + __put_task_struct(t);
> + return;
> + }
>
> Does that make sense?
Hmm... But, again unless I am totally confused, we do need the
DEFINE_WAIT_OVERRIDE_MAP() trick even if !PREEMPT_RT ?
Looking at lockdep_wait_type, I think that with
CONFIG_PROVE_RAW_LOCK_NESTING=y lockdep enforces the PREEMPT_RT
locking rules even if PREEMPT_RT is not set?
But:
> > But if I am right, why LD_WAIT_SLEEP? LD_WAIT_CONFIG should equally work, no?
> >
> > LD_WAIT_SLEEP can fool lockdep more than we need, suppose that __put_task_struct()
> > does mutex_lock(). Not really a problem, might_sleep/etc will complain in this
> > case, but still.
I still think LD_WAIT_CONFIG makes more sense.
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-07-29 13:09 ` Oleg Nesterov
@ 2025-08-01 10:24 ` Oleg Nesterov
2025-08-01 10:51 ` Luis Claudio R. Goncalves
2025-08-11 10:59 ` Sebastian Andrzej Siewior
0 siblings, 2 replies; 16+ messages in thread
From: Oleg Nesterov @ 2025-08-01 10:24 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 07/29, Oleg Nesterov wrote:
>
> On 07/29, Luis Claudio R. Goncalves wrote:
> >
> > On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote:
> > > On 07/29, Luis Claudio R. Goncalves wrote:
> > > >
> > > > + /* In !RT, it is always safe to call __put_task_struct(). */
> > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > > > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > > > +
> > > > + lock_map_acquire_try(&put_task_map);
> > > > + __put_task_struct(t);
> > > > + lock_map_release(&put_task_map);
> > > > + return;
> > > > + }
> > >
> > > FWIW:
> > >
> > > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > >
> > >
> > > At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP().
> > > IIUC, we need to shut up lockdep when put_task_struct() is called under
> > > raw_spinlock_t and __put_task_struct() paths take spinlock_t, right?
> > > Perhaps this deserves a comment...
> >
> > I reverted that code to the previous state, commit 893cdaaa3977 ("sched:
> > avoid false lockdep splat in put_task_struct()") and simplified the "if"
> > statement.
>
> Yes, yes, I see and I have already acked your patch.
So I think you should just resend it.
s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right,
sorry for the confusion.
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-08-01 10:24 ` Oleg Nesterov
@ 2025-08-01 10:51 ` Luis Claudio R. Goncalves
2025-08-11 10:59 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 16+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-08-01 10:51 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On Fri, Aug 01, 2025 at 12:24:29PM +0200, Oleg Nesterov wrote:
> On 07/29, Oleg Nesterov wrote:
> >
> > On 07/29, Luis Claudio R. Goncalves wrote:
> > >
> > > On Tue, Jul 29, 2025 at 01:47:03PM +0200, Oleg Nesterov wrote:
> > > > On 07/29, Luis Claudio R. Goncalves wrote:
> > > > >
> > > > > + /* In !RT, it is always safe to call __put_task_struct(). */
> > > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > > > > + static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > > > > +
> > > > > + lock_map_acquire_try(&put_task_map);
> > > > > + __put_task_struct(t);
> > > > > + lock_map_release(&put_task_map);
> > > > > + return;
> > > > > + }
> > > >
> > > > FWIW:
> > > >
> > > > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > > >
> > > >
> > > > At the same time... I don't understand this DEFINE_WAIT_OVERRIDE_MAP().
> > > > IIUC, we need to shut up lockdep when put_task_struct() is called under
> > > > raw_spinlock_t and __put_task_struct() paths take spinlock_t, right?
> > > > Perhaps this deserves a comment...
> > >
> > > I reverted that code to the previous state, commit 893cdaaa3977 ("sched:
> > > avoid false lockdep splat in put_task_struct()") and simplified the "if"
> > > statement.
> >
> > Yes, yes, I see and I have already acked your patch.
>
> So I think you should just resend it.
Thank you! I was a bit unsure on how to proceed :)
> s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right,
> sorry for the confusion.
For what is worth, I tested the change with LD_WAIT_CONFIG and it worked as
expected.
Luis
> Oleg.
>
---end quoted text---
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-08-01 10:24 ` Oleg Nesterov
2025-08-01 10:51 ` Luis Claudio R. Goncalves
@ 2025-08-11 10:59 ` Sebastian Andrzej Siewior
2025-08-11 11:06 ` Oleg Nesterov
1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-11 10:59 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote:
> s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right,
> sorry for the confusion.
You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP
suppresses also mutex while the intention is to only suppress
spinlock_t.
We have four users in tree, based on quick check all four should use
CONFIG, three of them do use SLEEP.
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-08-11 10:59 ` Sebastian Andrzej Siewior
@ 2025-08-11 11:06 ` Oleg Nesterov
2025-08-11 12:16 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2025-08-11 11:06 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 08/11, Sebastian Andrzej Siewior wrote:
>
> On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote:
> > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right,
> > sorry for the confusion.
>
> You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP
> suppresses also mutex while the intention is to only suppress
> spinlock_t.
Good, thanks.
> We have four users in tree, based on quick check all four should use
> CONFIG, three of them do use SLEEP.
Yes. I'll send the simple patch when this patch from Luis is merged.
Oleg
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-08-11 11:06 ` Oleg Nesterov
@ 2025-08-11 12:16 ` Sebastian Andrzej Siewior
2025-08-11 12:19 ` Oleg Nesterov
0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-11 12:16 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 2025-08-11 13:06:18 [+0200], Oleg Nesterov wrote:
> On 08/11, Sebastian Andrzej Siewior wrote:
> >
> > On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote:
> > > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right,
> > > sorry for the confusion.
> >
> > You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP
> > suppresses also mutex while the intention is to only suppress
> > spinlock_t.
>
> Good, thanks.
>
> > We have four users in tree, based on quick check all four should use
> > CONFIG, three of them do use SLEEP.
>
> Yes. I'll send the simple patch when this patch from Luis is merged.
Okay. Let me cover then the other instances.
> Oleg
Sebastian
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-08-11 12:16 ` Sebastian Andrzej Siewior
@ 2025-08-11 12:19 ` Oleg Nesterov
2025-08-11 12:27 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 16+ messages in thread
From: Oleg Nesterov @ 2025-08-11 12:19 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 08/11, Sebastian Andrzej Siewior wrote:
>
> On 2025-08-11 13:06:18 [+0200], Oleg Nesterov wrote:
> > On 08/11, Sebastian Andrzej Siewior wrote:
> > >
> > > On 2025-08-01 12:24:29 [+0200], Oleg Nesterov wrote:
> > > > s/LD_WAIT_SLEEP/LD_WAIT_CONFIG/ needs another discussion even if I am right,
> > > > sorry for the confusion.
> > >
> > > You are correct Oleg. I've been just verifying it and yes: LD_WAIT_SLEEP
> > > suppresses also mutex while the intention is to only suppress
> > > spinlock_t.
> >
> > Good, thanks.
> >
> > > We have four users in tree, based on quick check all four should use
> > > CONFIG, three of them do use SLEEP.
> >
> > Yes. I'll send the simple patch when this patch from Luis is merged.
>
> Okay. Let me cover then the other instances.
I was going to update them all, but feel free to do.
Oleg.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
2025-08-11 12:19 ` Oleg Nesterov
@ 2025-08-11 12:27 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-11 12:27 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
Valentin Schneider, Vincent Guittot, Thomas Gleixner,
Wander Lairson Costa
On 2025-08-11 14:19:51 [+0200], Oleg Nesterov wrote:
> > > Yes. I'll send the simple patch when this patch from Luis is merged.
> >
> > Okay. Let me cover then the other instances.
>
> I was going to update them all, but feel free to do.
Oh no. Please do cover them all and I review then. I understood that you
make a single patch for a single instance.
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-08-11 12:27 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-07 14:03 [PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set Luis Claudio R. Goncalves
2025-07-10 16:19 ` Valentin Schneider
2025-07-14 14:15 ` Sebastian Andrzej Siewior
2025-07-16 10:19 ` [tip: sched/core] sched: Do " tip-bot2 for Luis Claudio R. Goncalves
2025-07-28 20:14 ` [PATCH v6] sched: do " Oleg Nesterov
2025-07-29 7:33 ` Luis Claudio R. Goncalves
2025-07-29 11:47 ` Oleg Nesterov
2025-07-29 12:45 ` Luis Claudio R. Goncalves
2025-07-29 13:09 ` Oleg Nesterov
2025-08-01 10:24 ` Oleg Nesterov
2025-08-01 10:51 ` Luis Claudio R. Goncalves
2025-08-11 10:59 ` Sebastian Andrzej Siewior
2025-08-11 11:06 ` Oleg Nesterov
2025-08-11 12:16 ` Sebastian Andrzej Siewior
2025-08-11 12:19 ` Oleg Nesterov
2025-08-11 12:27 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).