linux-rt-devel.lists.linux.dev archive mirror
* [PATCH v3] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
@ 2025-04-10 12:10 Luis Claudio R. Goncalves
  2025-04-10 12:20 ` Sebastian Andrzej Siewior
  2025-04-10 12:40 ` Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-04-10 12:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Tejun Heo, David Vernet, Barret Rhoden, Josh Don, Crystal Wood,
	linux-kernel, linux-rt-devel, Juri Lelli, Ben Segall,
	Dietmar Eggemann, Ingo Molnar, Mel Gorman, Peter Zijlstra,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner
  Cc: lclaudio00

With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
from rt_mutex_adjust_prio_chain() could happen in preemptible context and
with a mutex enqueued. That could lead to this sequence:

	rt_mutex_adjust_prio_chain()
	  put_task_struct()
	    __put_task_struct()
	      sched_ext_free()
	        spin_lock_irqsave()
	          rtlock_lock() --->  TRIGGERS
	                              lockdep_assert(!current->pi_blocked_on);

Fix that by unconditionally resorting to the deferred call to
__put_task_struct().

v2: (Rostedt) remove the #ifdef from put_task_struct() and create
    tsk_is_pi_blocked_on() in sched.h to make the change cleaner.

v3: (Sebastian and PeterZ) always call the RCU deferred __put_task_struct().

Suggested-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
 include/linux/sched/task.h |   20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0f2aeb37bbb04..49847efe5559e 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -134,22 +134,12 @@ static inline void put_task_struct(struct task_struct *t)
 		return;
 
 	/*
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Under RT, we can only call it in preemptible context.
-	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
-		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
-
-		lock_map_acquire_try(&put_task_map);
-		__put_task_struct(t);
-		lock_map_release(&put_task_map);
-		return;
-	}
-
-	/*
-	 * under PREEMPT_RT, we can't call put_task_struct
+	 * In !RT, it is always safe to call __put_task_struct(),
+	 * but under PREEMPT_RT, we can't call put_task_struct
 	 * in atomic context because it will indirectly
-	 * acquire sleeping locks.
+	 * acquire sleeping locks. The same is true if the
+	 * current process has a mutex enqueued (blocked on
+	 * a PI chain).
 	 *
 	 * call_rcu() will schedule delayed_put_task_struct_rcu()
 	 * to be called in process context.
-- 
2.49.0



* Re: [PATCH v3] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
  2025-04-10 12:10 [PATCH v3] sched: do not call __put_task_struct() on rt if pi_blocked_on is set Luis Claudio R. Goncalves
@ 2025-04-10 12:20 ` Sebastian Andrzej Siewior
  2025-04-10 12:39   ` Luis Claudio R. Goncalves
  2025-04-10 12:40 ` Peter Zijlstra
  1 sibling, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-10 12:20 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Peter Zijlstra, Valentin Schneider,
	Vincent Guittot, Thomas Gleixner, lclaudio00

On 2025-04-10 09:10:12 [-0300], Luis Claudio R. Goncalves wrote:
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -134,22 +134,12 @@ static inline void put_task_struct(struct task_struct *t)
>  		return;
>  
>  	/*
> -	 * In !RT, it is always safe to call __put_task_struct().
> -	 * Under RT, we can only call it in preemptible context.
> -	 */
> -	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
> -		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> -
> -		lock_map_acquire_try(&put_task_map);
> -		__put_task_struct(t);
> -		lock_map_release(&put_task_map);
> -		return;
> -	}
> -
> -	/*
> -	 * under PREEMPT_RT, we can't call put_task_struct
> +	 * In !RT, it is always safe to call __put_task_struct(),
> +	 * but under PREEMPT_RT, we can't call put_task_struct
>  	 * in atomic context because it will indirectly
> -	 * acquire sleeping locks.
> +	 * acquire sleeping locks. The same is true if the
> +	 * current process has a mutex enqueued (blocked on
> +	 * a PI chain).
>  	 *
>  	 * call_rcu() will schedule delayed_put_task_struct_rcu()
>  	 * to be called in process context.

Did you test it with lockdep with and without PREEMPT_RT? It would be
nice to throw some testing on it.
The comment here, "call_rcu() will schedule bla in process context", is
wrong. It will schedule the callback in softirq context, unless RCU is
configured to run the callbacks in the rcuc/ threads, which is the default
for PREEMPT_RT. Also, delayed_put_task_struct_rcu() does not exist and,
imho, never did.

Sebastian


* Re: [PATCH v3] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
  2025-04-10 12:20 ` Sebastian Andrzej Siewior
@ 2025-04-10 12:39   ` Luis Claudio R. Goncalves
  0 siblings, 0 replies; 5+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-04-10 12:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Peter Zijlstra, Valentin Schneider,
	Vincent Guittot, Thomas Gleixner

On Thu, Apr 10, 2025 at 02:20:02PM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-04-10 09:10:12 [-0300], Luis Claudio R. Goncalves wrote:
> > diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> > --- a/include/linux/sched/task.h
> > +++ b/include/linux/sched/task.h
> > @@ -134,22 +134,12 @@ static inline void put_task_struct(struct task_struct *t)
> >  		return;
> >  
> >  	/*
> > -	 * In !RT, it is always safe to call __put_task_struct().
> > -	 * Under RT, we can only call it in preemptible context.
> > -	 */
> > -	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
> > -		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > -
> > -		lock_map_acquire_try(&put_task_map);
> > -		__put_task_struct(t);
> > -		lock_map_release(&put_task_map);
> > -		return;
> > -	}
> > -
> > -	/*
> > -	 * under PREEMPT_RT, we can't call put_task_struct
> > +	 * In !RT, it is always safe to call __put_task_struct(),
> > +	 * but under PREEMPT_RT, we can't call put_task_struct
> >  	 * in atomic context because it will indirectly
> > -	 * acquire sleeping locks.
> > +	 * acquire sleeping locks. The same is true if the
> > +	 * current process has a mutex enqueued (blocked on
> > +	 * a PI chain).
> >  	 *
> >  	 * call_rcu() will schedule delayed_put_task_struct_rcu()
> >  	 * to be called in process context.
> 
> Did you test it with lockdep with and without PREEMPT_RT? It would be
> nice to throw some testing on it.

I will re-run the full set of tests on both kernels.

> The comment here, "call_rcu() will schedule bla in process context", is
> wrong. It will schedule the callback in softirq context, unless RCU is
> configured to run the callbacks in the rcuc/ threads, which is the default
> for PREEMPT_RT. Also, delayed_put_task_struct_rcu() does not exist and,
> imho, never did.

I kept the original comment about the call_rcu in process context, but
didn't realize that wouldn't hold true for !RT. Would you prefer I adjust
the comments (for RT vs non-RT and other possibilities) or remove them
entirely?

And I completely missed delayed_put_task_struct_rcu() vs
__put_task_struct_rcu_cb() in the original comment.

Thank you again for the review!
Luis
> 
> Sebastian
> 
---end quoted text---



* Re: [PATCH v3] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
  2025-04-10 12:10 [PATCH v3] sched: do not call __put_task_struct() on rt if pi_blocked_on is set Luis Claudio R. Goncalves
  2025-04-10 12:20 ` Sebastian Andrzej Siewior
@ 2025-04-10 12:40 ` Peter Zijlstra
  2025-04-10 14:32   ` Luis Claudio R. Goncalves
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2025-04-10 12:40 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Tejun Heo, David Vernet, Barret Rhoden, Josh Don, Crystal Wood,
	linux-kernel, linux-rt-devel, Juri Lelli, Ben Segall,
	Dietmar Eggemann, Ingo Molnar, Mel Gorman, Valentin Schneider,
	Vincent Guittot, Thomas Gleixner, lclaudio00

On Thu, Apr 10, 2025 at 09:10:12AM -0300, Luis Claudio R. Goncalves wrote:
> With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
> from rt_mutex_adjust_prio_chain() could happen in preemptible context and
> with a mutex enqueued. That could lead to this sequence:
> 
> 	rt_mutex_adjust_prio_chain()
> 	  put_task_struct()
> 	    __put_task_struct()
> 	      sched_ext_free()
> 	        spin_lock_irqsave()
> 	          rtlock_lock() --->  TRIGGERS
> 	                              lockdep_assert(!current->pi_blocked_on);
> 
> Fix that by unconditionally resorting to the deferred call to
> __put_task_struct().
> 
> v2: (Rostedt) remove the #ifdef from put_task_struct() and create
>     tsk_is_pi_blocked_on() in sched.h to make the change cleaner.
> 
> v3: (Sebastian and PeterZ) always call the RCU deferred __put_task_struct().

Changelog goes below the --- line.

> Suggested-by: Crystal Wood <crwood@redhat.com>
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> ---
>  include/linux/sched/task.h |   20 +++++---------------
>  1 file changed, 5 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index 0f2aeb37bbb04..49847efe5559e 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -134,22 +134,12 @@ static inline void put_task_struct(struct task_struct *t)
>  		return;
>  
>  	/*
> -	 * In !RT, it is always safe to call __put_task_struct().
> -	 * Under RT, we can only call it in preemptible context.
> -	 */
> -	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
> -		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> -
> -		lock_map_acquire_try(&put_task_map);
> -		__put_task_struct(t);
> -		lock_map_release(&put_task_map);
> -		return;
> -	}

I don't think you've substantiated why the !PREEMPT_RT case needs to go.

> -
> -	/*
> -	 * under PREEMPT_RT, we can't call put_task_struct
> +	 * In !RT, it is always safe to call __put_task_struct(),
> +	 * but under PREEMPT_RT, we can't call put_task_struct
>  	 * in atomic context because it will indirectly
> -	 * acquire sleeping locks.
> +	 * acquire sleeping locks. The same is true if the
> +	 * current process has a mutex enqueued (blocked on
> +	 * a PI chain).
>  	 *
>  	 * call_rcu() will schedule delayed_put_task_struct_rcu()
>  	 * to be called in process context.
> -- 
> 2.49.0
> 


* Re: [PATCH v3] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
  2025-04-10 12:40 ` Peter Zijlstra
@ 2025-04-10 14:32   ` Luis Claudio R. Goncalves
  0 siblings, 0 replies; 5+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-04-10 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Tejun Heo, David Vernet, Barret Rhoden, Josh Don, Crystal Wood,
	linux-kernel, linux-rt-devel, Juri Lelli, Ben Segall,
	Dietmar Eggemann, Ingo Molnar, Mel Gorman, Valentin Schneider,
	Vincent Guittot, Thomas Gleixner

On Thu, Apr 10, 2025 at 02:40:59PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 10, 2025 at 09:10:12AM -0300, Luis Claudio R. Goncalves wrote:
> > With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
> > from rt_mutex_adjust_prio_chain() could happen in preemptible context and
> > with a mutex enqueued. That could lead to this sequence:
> > 
> > 	rt_mutex_adjust_prio_chain()
> > 	  put_task_struct()
> > 	    __put_task_struct()
> > 	      sched_ext_free()
> > 	        spin_lock_irqsave()
> > 	          rtlock_lock() --->  TRIGGERS
> > 	                              lockdep_assert(!current->pi_blocked_on);
> > 
> > Fix that by unconditionally resorting to the deferred call to
> > __put_task_struct().
> > 
> > v2: (Rostedt) remove the #ifdef from put_task_struct() and create
> >     tsk_is_pi_blocked_on() in sched.h to make the change cleaner.
> > 
> > v3: (Sebastian and PeterZ) always call the RCU deferred __put_task_struct().
> 
> Changelog goes below the --- line.
> 
> > Suggested-by: Crystal Wood <crwood@redhat.com>
> > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > ---
> >  include/linux/sched/task.h |   20 +++++---------------
> >  1 file changed, 5 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> > index 0f2aeb37bbb04..49847efe5559e 100644
> > --- a/include/linux/sched/task.h
> > +++ b/include/linux/sched/task.h
> > @@ -134,22 +134,12 @@ static inline void put_task_struct(struct task_struct *t)
> >  		return;
> >  
> >  	/*
> > -	 * In !RT, it is always safe to call __put_task_struct().
> > -	 * Under RT, we can only call it in preemptible context.
> > -	 */
> > -	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
> > -		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > -
> > -		lock_map_acquire_try(&put_task_map);
> > -		__put_task_struct(t);
> > -		lock_map_release(&put_task_map);
> > -		return;
> > -	}
> 
> I don't think you've substantiated why the !PREEMPT_RT case needs to go.

That was my misunderstanding of "unconditionally call the deferred
function". I see I took it too far and made the patch wrong.

I am testing v4 (closer to the original code with fixed comments) that is
basically:

	if !RT
		__put_task_struct (original code)
	else
		call_rcu(__put_task_struct_rcu_cb)

With the corrected comments Sebastian pointed out.
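
For illustration only, the outline above can be modelled in plain userspace
C. This is a rough sketch and not the v4 patch: every model_* name is
hypothetical, the preempt_rt flag stands in for
IS_ENABLED(CONFIG_PREEMPT_RT), and a simple pending list stands in for
call_rcu(). The lockdep wait-override annotation from the original code is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct task_struct. */
struct task_model {
	int usage;                        /* reference count */
	bool freed;                       /* set once the final free has run */
	bool free_deferred;               /* set when the free was queued */
	struct task_model *next_deferred; /* link in the pending list */
};

/* Pending deferred frees; stands in for the RCU callback queue. */
static struct task_model *deferred_head;

/* Stand-in for the work done by __put_task_struct(). */
static void model_free_task(struct task_model *t)
{
	t->freed = true;
}

/* Stand-in for call_rcu(): queue the free instead of running it now. */
static void model_defer_free(struct task_model *t)
{
	t->free_deferred = true;
	t->next_deferred = deferred_head;
	deferred_head = t;
}

/* Run all queued frees, as a later callback pass would. Returns the count. */
static int model_run_deferred(void)
{
	int n = 0;

	while (deferred_head) {
		struct task_model *t = deferred_head;

		deferred_head = t->next_deferred;
		t->next_deferred = NULL;
		model_free_task(t);
		n++;
	}
	return n;
}

/* Drop one reference; on the last one, free directly or defer. */
static void model_put_task(struct task_model *t, bool preempt_rt)
{
	assert(t->usage > 0);
	if (--t->usage > 0)
		return;

	if (!preempt_rt) {
		/* !RT: freeing in the caller's context is fine. */
		model_free_task(t);
		return;
	}

	/* RT: never free in the caller's context, always defer. */
	model_defer_free(t);
}
```

In this model the RT branch never runs the free in the caller's context,
which is the property that matters for the rt_mutex_adjust_prio_chain()
path where current->pi_blocked_on may be set.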

As soon as the tests complete I will post v4.

Thanks,
Luis
 
> > -
> > -	/*
> > -	 * under PREEMPT_RT, we can't call put_task_struct
> > +	 * In !RT, it is always safe to call __put_task_struct(),
> > +	 * but under PREEMPT_RT, we can't call put_task_struct
> >  	 * in atomic context because it will indirectly
> > -	 * acquire sleeping locks.
> > +	 * acquire sleeping locks. The same is true if the
> > +	 * current process has a mutex enqueued (blocked on
> > +	 * a PI chain).
> >  	 *
> >  	 * call_rcu() will schedule delayed_put_task_struct_rcu()
> >  	 * to be called in process context.
> > -- 
> > 2.49.0
> > 
> 
---end quoted text---

