linux-rt-devel.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
@ 2025-08-06 19:43 Luis Claudio R. Goncalves
  2025-08-11 10:06 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-08-06 19:43 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
	Thomas Gleixner, Wander Lairson Costa

Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
if pi_blocked_on is set") changed the behavior of put_task_struct()
unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
with the commit description.

Restore the previous behavior of put_task_struct() for the PREEMPT_RT
disabled case.

Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..51678a541477 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
 	if (!refcount_dec_and_test(&t->usage))
 		return;
 
+	/* In !RT, it is always safe to call __put_task_struct(). */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+		lock_map_acquire_try(&put_task_map);
+		__put_task_struct(t);
+		lock_map_release(&put_task_map);
+		return;
+	}
+
 	/*
 	 * Under PREEMPT_RT, we can't call __put_task_struct
 	 * in atomic context because it will indirectly
@@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
 	 * current process has a mutex enqueued (blocked on
 	 * a PI chain).
 	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
-	 *
 	 * call_rcu() will schedule __put_task_struct_rcu_cb()
 	 * to be called in process context.
 	 *


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-08-06 19:43 Luis Claudio R. Goncalves
@ 2025-08-11 10:06 ` Sebastian Andrzej Siewior
  2025-08-11 10:40   ` Oleg Nesterov
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-11 10:06 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: Oleg Nesterov, Peter Zijlstra, Clark Williams, Steven Rostedt,
	Tejun Heo, David Vernet, Barret Rhoden, Josh Don, Crystal Wood,
	linux-kernel, linux-rt-devel, Juri Lelli, Ben Segall,
	Dietmar Eggemann, Ingo Molnar, Mel Gorman, Valentin Schneider,
	Vincent Guittot, Thomas Gleixner, Wander Lairson Costa

On 2025-08-06 16:43:55 [-0300], Luis Claudio R. Goncalves wrote:
> Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> if pi_blocked_on is set") changed the behavior of put_task_struct()
> unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> with the commit description.
> 
> Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> disabled case.
> 
> Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> ---
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index ea41795a352b..51678a541477 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
>  	if (!refcount_dec_and_test(&t->usage))
>  		return;
>  
> +	/* In !RT, it is always safe to call __put_task_struct(). */

I don't want to drag this but this comment is obvious for anyone who is
fluent in C. It is just a statement with no explanation.
An important note would be that the atomic context restriction only
apply to PREEMPT_RT and therefore we have this context override for
lockdep below. The other question would be why don't we do this
unconditionally regardless of PREEMPT_RT. The only reason I could find
is that releasing the task here from the "exit path" makes the vmap
stack "earlier" available for reuse.

> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> +
> +		lock_map_acquire_try(&put_task_map);
> +		__put_task_struct(t);
> +		lock_map_release(&put_task_map);
> +		return;
> +	}
> +
>  	/*
>  	 * Under PREEMPT_RT, we can't call __put_task_struct
>  	 * in atomic context because it will indirectly
> @@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
>  	 * current process has a mutex enqueued (blocked on
>  	 * a PI chain).
>  	 *
> -	 * In !RT, it is always safe to call __put_task_struct().
> -	 * Though, in order to simplify the code, resort to the
> -	 * deferred call too.
> -	 *
>  	 * call_rcu() will schedule __put_task_struct_rcu_cb()
>  	 * to be called in process context.
>  	 *
> 

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-08-11 10:06 ` Sebastian Andrzej Siewior
@ 2025-08-11 10:40   ` Oleg Nesterov
  2025-08-11 11:05     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-11 10:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
	Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
	Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
	Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner,
	Wander Lairson Costa

On 08/11, Sebastian Andrzej Siewior wrote:
>
> I don't want to drag this but this comment is obvious for anyone who is
> fluent in C. It is just a statement with no explanation.
> An important note would be that the atomic context restriction only
> apply to PREEMPT_RT and therefore we have this context override for
> lockdep below. The other question would be why don't we do this
> unconditionally regardless of PREEMPT_RT. The only reason I could find
> is that releasing the task here from the "exit path" makes the vmap
> stack "earlier" available for reuse.

Sorry, could you clarify your "other" question?

What exactly do you think we could do regardless of PREEMPT_RT?

Oleg.

> 
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
> > +		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> > +
> > +		lock_map_acquire_try(&put_task_map);
> > +		__put_task_struct(t);
> > +		lock_map_release(&put_task_map);
> > +		return;
> > +	}
> > +
> >  	/*
> >  	 * Under PREEMPT_RT, we can't call __put_task_struct
> >  	 * in atomic context because it will indirectly
> > @@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
> >  	 * current process has a mutex enqueued (blocked on
> >  	 * a PI chain).
> >  	 *
> > -	 * In !RT, it is always safe to call __put_task_struct().
> > -	 * Though, in order to simplify the code, resort to the
> > -	 * deferred call too.
> > -	 *
> >  	 * call_rcu() will schedule __put_task_struct_rcu_cb()
> >  	 * to be called in process context.
> >  	 *
> > 
> 
> Sebastian
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-08-11 10:40   ` Oleg Nesterov
@ 2025-08-11 11:05     ` Sebastian Andrzej Siewior
  2025-08-11 11:21       ` Oleg Nesterov
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-11 11:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
	Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
	Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
	Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner,
	Wander Lairson Costa

On 2025-08-11 12:40:34 [+0200], Oleg Nesterov wrote:
> On 08/11, Sebastian Andrzej Siewior wrote:
> >
> > I don't want to drag this but this comment is obvious for anyone who is
> > fluent in C. It is just a statement with no explanation.
> > An important note would be that the atomic context restriction only
> > apply to PREEMPT_RT and therefore we have this context override for
> > lockdep below. The other question would be why don't we do this
> > unconditionally regardless of PREEMPT_RT. The only reason I could find
> > is that releasing the task here from the "exit path" makes the vmap
> > stack "earlier" available for reuse.
> 
> Sorry, could you clarify your "other" question?
> 
> What exactly do you think we could do regardless of PREEMPT_RT?

Do what we do now and have one free path for task_struct regardless if
PREEMPT_RT is enabled or not. The one via RCU which delays the freeing
until after the grace period.

> Oleg.

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-08-11 11:05     ` Sebastian Andrzej Siewior
@ 2025-08-11 11:21       ` Oleg Nesterov
  2025-08-11 12:15         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-08-11 11:21 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
	Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
	Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
	Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner,
	Wander Lairson Costa

On 08/11, Sebastian Andrzej Siewior wrote:
>
> On 2025-08-11 12:40:34 [+0200], Oleg Nesterov wrote:
> >
> > What exactly do you think we could do regardless of PREEMPT_RT?
>
> Do what we do now and have one free path for task_struct regardless if
> PREEMPT_RT is enabled or not. The one via RCU which delays the freeing
> until after the grace period.

Ah, got it. I won't really argue, but...

call_rcu() is not free, it obviously delays free_task/etc. To me this
!PREEMPT_RT optimization makes sense.

But lets forget it for the moment. This patch

	https://lore.kernel.org/all/aGvTz5VaPFyj0pBV@uudg.org
	[PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set

removed that optimization by mistake, this doesn't even match the changelog.
I think it should be restored, and this is what the new patch from Luis does.

Then we can discuss this all again and possibly remove it, but this
should be explicitly documented in the changelog.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-08-11 11:21       ` Oleg Nesterov
@ 2025-08-11 12:15         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-08-11 12:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Luis Claudio R. Goncalves, Peter Zijlstra, Clark Williams,
	Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
	Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
	Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner,
	Wander Lairson Costa

On 2025-08-11 13:21:20 [+0200], Oleg Nesterov wrote:
> On 08/11, Sebastian Andrzej Siewior wrote:
> >
> > On 2025-08-11 12:40:34 [+0200], Oleg Nesterov wrote:
> > >
> > > What exactly do you think we could do regardless of PREEMPT_RT?
> >
> > Do what we do now and have one free path for task_struct regardless if
> > PREEMPT_RT is enabled or not. The one via RCU which delays the freeing
> > until after the grace period.
> 
> Ah, got it. I won't really argue, but...
> 
> call_rcu() is not free, it obviously delays free_task/etc. To me this
> !PREEMPT_RT optimization makes sense.
> 
> But lets forget it for the moment. This patch
> 
> 	https://lore.kernel.org/all/aGvTz5VaPFyj0pBV@uudg.org
> 	[PATCH v6] sched: do not call __put_task_struct() on rt if pi_blocked_on is set
> 
> removed that optimization by mistake, this doesn't even match the changelog.
> I think it should be restored, and this is what the new patch from Luis does.
> 
> Then we can discuss this all again and possibly remove it, but this
> should be explicitly documented in the changelog.

Certainly. I am not saying we should keep it as is. The added comment
appears wrong but I am all for getting this merged and then sorting out
later.
Thank you for spotting this ;)

> Oleg.

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
@ 2025-08-25 13:50 Luis Claudio R. Goncalves
  0 siblings, 0 replies; 17+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-08-25 13:50 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
	Thomas Gleixner, Wander Lairson Costa

Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
if pi_blocked_on is set") changed the behavior of put_task_struct()
unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
with the commit description.

Restore the previous behavior of put_task_struct() for the PREEMPT_RT
disabled case.

Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---

  Note: This patch is a fix motivated by Oleg Nesterov's question at
  https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..51678a541477 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
 	if (!refcount_dec_and_test(&t->usage))
 		return;
 
+	/* In !RT, it is always safe to call __put_task_struct(). */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+		lock_map_acquire_try(&put_task_map);
+		__put_task_struct(t);
+		lock_map_release(&put_task_map);
+		return;
+	}
+
 	/*
 	 * Under PREEMPT_RT, we can't call __put_task_struct
 	 * in atomic context because it will indirectly
@@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
 	 * current process has a mutex enqueued (blocked on
 	 * a PI chain).
 	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
-	 *
 	 * call_rcu() will schedule __put_task_struct_rcu_cb()
 	 * to be called in process context.
 	 *


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
@ 2025-09-15 11:15 Luis Claudio R. Goncalves
  2025-09-15 11:38 ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-09-15 11:15 UTC (permalink / raw)
  To: Peter Zijlstra, Oleg Nesterov, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
	Thomas Gleixner, Wander Lairson Costa

Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
if pi_blocked_on is set") changed the behavior of put_task_struct()
unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
with the commit description.

Restore the previous behavior of put_task_struct() for the PREEMPT_RT
disabled case.

Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---

  Note: This patch is a fix motivated by Oleg Nesterov's question at
  https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/

  Also, the kernel test robot reported a problem found in a x86 (32 bit) VM
  test that was bisected to the original fix being amended here:

    https://lkml.org/lkml/2025/9/5/147

  Though I was not able the reproduce the reported problem, this patch here
  would minimize the problem by isolating the behavior to PREEMPT_RT-enabled
  kernels.

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..51678a541477 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -130,6 +133,16 @@ static inline void put_task_struct(struct task_struct *t)
 	if (!refcount_dec_and_test(&t->usage))
 		return;
 
+	/* In !RT, it is always safe to call __put_task_struct(). */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+		lock_map_acquire_try(&put_task_map);
+		__put_task_struct(t);
+		lock_map_release(&put_task_map);
+		return;
+	}
+
 	/*
 	 * Under PREEMPT_RT, we can't call __put_task_struct
 	 * in atomic context because it will indirectly
@@ -137,10 +150,6 @@ static inline void put_task_struct(struct task_struct *t)
 	 * current process has a mutex enqueued (blocked on
 	 * a PI chain).
 	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
-	 *
 	 * call_rcu() will schedule __put_task_struct_rcu_cb()
 	 * to be called in process context.
 	 *




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-15 11:15 [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt Luis Claudio R. Goncalves
@ 2025-09-15 11:38 ` Peter Zijlstra
  2025-09-15 12:24   ` Sebastian Andrzej Siewior
  2025-09-15 12:35   ` Oleg Nesterov
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2025-09-15 11:38 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: Oleg Nesterov, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
	Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
	Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner,
	Wander Lairson Costa

On Mon, Sep 15, 2025 at 08:15:19AM -0300, Luis Claudio R. Goncalves wrote:
> Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> if pi_blocked_on is set") changed the behavior of put_task_struct()
> unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> with the commit description.
> 
> Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> disabled case.
> 
> Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> ---
> 
>   Note: This patch is a fix motivated by Oleg Nesterov's question at
>   https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/

Right, but I thought we did want to make this behaviour consistent.

Why have !RT behave differently? That is, why isn't this simply a
'buggy' comment/changelog issue?

>   Also, the kernel test robot reported a problem found in a x86 (32 bit) VM
>   test that was bisected to the original fix being amended here:
> 
>     https://lkml.org/lkml/2025/9/5/147
> 
>   Though I was not able the reproduce the reported problem, this patch here
>   would minimize the problem by isolating the behavior to PREEMPT_RT-enabled
>   kernels.

Yeah, robot also had trouble reproducing that if I read that
%reproduction thing right.

And IIRC RT has been running with this for ages, and never seen a
problem.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-15 11:38 ` Peter Zijlstra
@ 2025-09-15 12:24   ` Sebastian Andrzej Siewior
  2025-09-15 14:49     ` Luis Claudio R. Goncalves
  2025-09-15 12:35   ` Oleg Nesterov
  1 sibling, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-15 12:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luis Claudio R. Goncalves, Oleg Nesterov, Clark Williams,
	Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
	Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
	Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner,
	Wander Lairson Costa

On 2025-09-15 13:38:12 [+0200], Peter Zijlstra wrote:
> Right, but I thought we did want to make this behaviour consistent.

That is correct, that is what I asked for (either consistent or a
compelling reason why not).
Oleg pointed out that the patch description does not match the change.
That is the only complaint.

> And IIRC RT has been running with this for ages, and never seen a
> problem.

The syz-bot picked up RT recently. Other than that, yes, RT had it for
ages.

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-15 11:38 ` Peter Zijlstra
  2025-09-15 12:24   ` Sebastian Andrzej Siewior
@ 2025-09-15 12:35   ` Oleg Nesterov
  2025-09-16 10:09     ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-09-15 12:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luis Claudio R. Goncalves, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
	Thomas Gleixner, Wander Lairson Costa

On 09/15, Peter Zijlstra wrote:
>
> On Mon, Sep 15, 2025 at 08:15:19AM -0300, Luis Claudio R. Goncalves wrote:
> > Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> > if pi_blocked_on is set") changed the behavior of put_task_struct()
> > unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> > with the commit description.
> >
> > Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> > disabled case.
> >
> > Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > ---
> >
> >   Note: This patch is a fix motivated by Oleg Nesterov's question at
> >   https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/
>
> Right, but I thought we did want to make this behaviour consistent.
>
> Why have !RT behave differently? That is, why isn't this simply a
> 'buggy' comment/changelog issue?

Well, this was discussed several times, in particular see
https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/

And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...

But I won't really argue if you decide to remove this !RT optimization, although
I think it would be better to do this in a separate patch.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-15 12:24   ` Sebastian Andrzej Siewior
@ 2025-09-15 14:49     ` Luis Claudio R. Goncalves
  2025-09-15 15:31       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-09-15 14:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Peter Zijlstra, Oleg Nesterov, Clark Williams, Steven Rostedt,
	Tejun Heo, David Vernet, Barret Rhoden, Josh Don, Crystal Wood,
	linux-kernel, linux-rt-devel, Juri Lelli, Ben Segall,
	Dietmar Eggemann, Ingo Molnar, Mel Gorman, Valentin Schneider,
	Vincent Guittot, Thomas Gleixner, Wander Lairson Costa

On Mon, Sep 15, 2025 at 02:24:44PM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-09-15 13:38:12 [+0200], Peter Zijlstra wrote:
> > Right, but I thought we did want to make this behaviour consistent.
> 
> That is correct, that is what I asked for (either consistent or a
> compelling reason why not).

I received a ping from colleagues investigating a problem possible caused
by excessive pressure on RCU and this change could make the problem worse.
But last I heard from them, sounds like the problem they are investigating
lies elsewhere.

That said, I don't have problems either way, limiting the change to RT or
keeping it general. I will follow whatever guidance I get from you all.

Luis

> Oleg pointed out that the patch description does not match the change.
> That is the only complaint.
> 
> > And IIRC RT has been running with this for ages, and never seen a
> > problem.
> 
> The syz-bot picked up RT recently. Other than that, yes, RT had it for
> ages.
> 
> Sebastian
> 
---end quoted text---


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-15 14:49     ` Luis Claudio R. Goncalves
@ 2025-09-15 15:31       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-15 15:31 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: Peter Zijlstra, Oleg Nesterov, Clark Williams, Steven Rostedt,
	Tejun Heo, David Vernet, Barret Rhoden, Josh Don, Crystal Wood,
	linux-kernel, linux-rt-devel, Juri Lelli, Ben Segall,
	Dietmar Eggemann, Ingo Molnar, Mel Gorman, Valentin Schneider,
	Vincent Guittot, Thomas Gleixner, Wander Lairson Costa

On 2025-09-15 11:49:37 [-0300], Luis Claudio R. Goncalves wrote:
> On Mon, Sep 15, 2025 at 02:24:44PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-09-15 13:38:12 [+0200], Peter Zijlstra wrote:
> > > Right, but I thought we did want to make this behaviour consistent.
> > 
> > That is correct, that is what I asked for (either consistent or a
> > compelling reason why not).
> 
> I received a ping from colleagues investigating a problem possible caused
> by excessive pressure on RCU and this change could make the problem worse.
> But last I heard from them, sounds like the problem they are investigating
> lies elsewhere.

Either way, if this change introduces a problem then it affects RT as
today. So we should deal with this. If it is not a problem and it just
put pressure on an existing problem then it shouldn't be an issue.

> Luis

Sebastian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-15 12:35   ` Oleg Nesterov
@ 2025-09-16 10:09     ` Peter Zijlstra
  2025-09-16 11:30       ` Oleg Nesterov
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2025-09-16 10:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Luis Claudio R. Goncalves, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
	Thomas Gleixner, Wander Lairson Costa

On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> On 09/15, Peter Zijlstra wrote:
> >
> > On Mon, Sep 15, 2025 at 08:15:19AM -0300, Luis Claudio R. Goncalves wrote:
> > > Commit 8671bad873eb ("sched: Do not call __put_task_struct() on rt
> > > if pi_blocked_on is set") changed the behavior of put_task_struct()
> > > unconditionally, even when PREEMPT_RT was not enabled, in clear mismatch
> > > with the commit description.
> > >
> > > Restore the previous behavior of put_task_struct() for the PREEMPT_RT
> > > disabled case.
> > >
> > > Fixes: 8671bad873eb ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> > > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > > Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > > ---
> > >
> > >   Note: This patch is a fix motivated by Oleg Nesterov's question at
> > >   https://lore.kernel.org/linux-rt-devel/20250728201441.GA4690@redhat.com/
> >
> > Right, but I thought we did want to make this behaviour consistent.
> >
> > Why have !RT behave differently? That is, why isn't this simply a
> > 'buggy' comment/changelog issue?
> 
> Well, this was discussed several times, in particular see
> https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> 
> And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...

Ah, I forgot about that thing.. Although I had vague memories of that
argument on rcu_assign_pointer() vs RCU_INIT_POINTER().

> But I won't really argue if you decide to remove this !RT optimization, although
> I think it would be better to do this in a separate patch.

Right. So when they wanted to remove that preemptible() clause, I was
like why again do we have this !RT exception at all, and can't we get
rid of that.

If git isn't confusing me again, this got merged in this cycle. But so
far no benchmark came and told us this was a bad idea.

So what do we do now... do we restore the !RT exception (so far there
aren't any numbers to suggest this mattered) or do we let it be for a
bit and then go and clean things up?

---
 include/linux/sched.h      |  1 -
 include/linux/sched/task.h | 37 +++----------------------------------
 kernel/bpf/helpers.c       |  6 +++---
 kernel/exit.c              | 21 ++-------------------
 kernel/fork.c              | 14 +++++++++-----
 kernel/sched/core.c        |  3 +--
 6 files changed, 18 insertions(+), 64 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 644a01bdae70..fd6586c04ed3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1576,7 +1576,6 @@ struct task_struct {
 # endif
 #endif
 	struct rcu_head			rcu;
-	refcount_t			rcu_users;
 	int				pagefault_disabled;
 #ifdef CONFIG_MMU
 	struct task_struct		*oom_reaper_list;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ea41795a352b..1125c44b205a 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -122,41 +122,12 @@ static inline struct task_struct *tryget_task_struct(struct task_struct *t)
 	return refcount_inc_not_zero(&t->usage) ? t : NULL;
 }
 
-extern void __put_task_struct(struct task_struct *t);
 extern void __put_task_struct_rcu_cb(struct rcu_head *rhp);
 
 static inline void put_task_struct(struct task_struct *t)
 {
-	if (!refcount_dec_and_test(&t->usage))
-		return;
-
-	/*
-	 * Under PREEMPT_RT, we can't call __put_task_struct
-	 * in atomic context because it will indirectly
-	 * acquire sleeping locks. The same is true if the
-	 * current process has a mutex enqueued (blocked on
-	 * a PI chain).
-	 *
-	 * In !RT, it is always safe to call __put_task_struct().
-	 * Though, in order to simplify the code, resort to the
-	 * deferred call too.
-	 *
-	 * call_rcu() will schedule __put_task_struct_rcu_cb()
-	 * to be called in process context.
-	 *
-	 * __put_task_struct() is called when
-	 * refcount_dec_and_test(&t->usage) succeeds.
-	 *
-	 * This means that it can't "conflict" with
-	 * put_task_struct_rcu_user() which abuses ->rcu the same
-	 * way; rcu_users has a reference so task->usage can't be
-	 * zero after rcu_users 1 -> 0 transition.
-	 *
-	 * delayed_free_task() also uses ->rcu, but it is only called
-	 * when it fails to fork a process. Therefore, there is no
-	 * way it can conflict with __put_task_struct().
-	 */
-	call_rcu(&t->rcu, __put_task_struct_rcu_cb);
+	if (refcount_dec_and_test(&t->usage))
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
 DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))
@@ -164,11 +135,9 @@ DEFINE_FREE(put_task, struct task_struct *, if (_T) put_task_struct(_T))
 static inline void put_task_struct_many(struct task_struct *t, int nr)
 {
 	if (refcount_sub_and_test(nr, &t->usage))
-		__put_task_struct(t);
+		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
-void put_task_struct_rcu_user(struct task_struct *task);
-
 /* Free all architecture-specific resources held by a thread. */
 void release_thread(struct task_struct *dead_task);
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8af62cb243d9..bdc4e65bca5c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2459,7 +2459,7 @@ __bpf_kfunc struct bpf_rb_node *bpf_rbtree_right(struct bpf_rb_root *root, struc
  */
 __bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p)
 {
-	if (refcount_inc_not_zero(&p->rcu_users))
+	if (refcount_inc_not_zero(&p->usage))
 		return p;
 	return NULL;
 }
@@ -2470,12 +2470,12 @@ __bpf_kfunc struct task_struct *bpf_task_acquire(struct task_struct *p)
  */
 __bpf_kfunc void bpf_task_release(struct task_struct *p)
 {
-	put_task_struct_rcu_user(p);
+	put_task_struct(p);
 }
 
 __bpf_kfunc void bpf_task_release_dtor(void *p)
 {
-	put_task_struct_rcu_user(p);
+	put_task_struct(p);
 }
 CFI_NOSEAL(bpf_task_release_dtor);
 
diff --git a/kernel/exit.c b/kernel/exit.c
index 343eb97543d5..240a6faa0e26 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -220,23 +220,6 @@ static void __exit_signal(struct release_task_post *post, struct task_struct *ts
 		tty_kref_put(tty);
 }
 
-static void delayed_put_task_struct(struct rcu_head *rhp)
-{
-	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
-
-	kprobe_flush_task(tsk);
-	rethook_flush_task(tsk);
-	perf_event_delayed_put(tsk);
-	trace_sched_process_free(tsk);
-	put_task_struct(tsk);
-}
-
-void put_task_struct_rcu_user(struct task_struct *task)
-{
-	if (refcount_dec_and_test(&task->rcu_users))
-		call_rcu(&task->rcu, delayed_put_task_struct);
-}
-
 void __weak release_thread(struct task_struct *dead_task)
 {
 }
@@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
 	if (thread_group_leader(p))
 		flush_sigqueue(&p->signal->shared_pending);
 
-	put_task_struct_rcu_user(p);
+	put_task_struct(p);
 
 	p = leader;
 	if (unlikely(zap_leader))
@@ -1057,7 +1040,7 @@ void __noreturn make_task_dead(int signr)
 		pr_alert("Fixing recursive fault but reboot is needed!\n");
 		futex_exit_recursive(tsk);
 		tsk->exit_state = EXIT_DEAD;
-		refcount_inc(&tsk->rcu_users);
+		get_task_struct(tsk);
 		do_task_dead();
 	}
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 33dfb82af25b..383c811626a3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -106,6 +106,7 @@
 #include <linux/pidfs.h>
 #include <linux/tick.h>
 #include <linux/unwind_deferred.h>
+#include <linux/kprobes.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -729,12 +730,17 @@ static inline void put_signal_struct(struct signal_struct *sig)
 		free_signal_struct(sig);
 }
 
-void __put_task_struct(struct task_struct *tsk)
+static void __put_task_struct(struct task_struct *tsk)
 {
 	WARN_ON(!tsk->exit_state);
 	WARN_ON(refcount_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	kprobe_flush_task(tsk);
+	rethook_flush_task(tsk);
+	perf_event_delayed_put(tsk);
+	trace_sched_process_free(tsk);
+
 	unwind_task_free(tsk);
 	sched_ext_free(tsk);
 	io_uring_free(tsk);
@@ -915,12 +921,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	dup_user_cpus_ptr(tsk, orig, node);
 
 	/*
-	 * One for the user space visible state that goes away when reaped.
+	 * One for user space visible state that goes away when reaped.
 	 * One for the scheduler.
 	 */
-	refcount_set(&tsk->rcu_users, 2);
-	/* One for the rcu users */
-	refcount_set(&tsk->usage, 1);
+	refcount_set(&tsk->usage, 2);
 #ifdef CONFIG_BLK_DEV_IO_TRACE
 	tsk->btrace_seq = 0;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da2062de97a2..ad3c0e9f3eb4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5259,8 +5259,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 
 		/* Task is done with its stack. */
 		put_task_stack(prev);
-
-		put_task_struct_rcu_user(prev);
+		put_task_struct(prev);
 	}
 
 	return rq;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-16 10:09     ` Peter Zijlstra
@ 2025-09-16 11:30       ` Oleg Nesterov
  2025-10-17 14:39         ` Oleg Nesterov
  0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-09-16 11:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luis Claudio R. Goncalves, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
	Thomas Gleixner, Wander Lairson Costa

On 09/16, Peter Zijlstra wrote:
>
> On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> > On 09/15, Peter Zijlstra wrote:
> > >
> > > Why have !RT behave differently? That is, why isn't this simply a
> > > 'buggy' comment/changelog issue?
> >
> > Well, this was discussed several times, in particular see
> > https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> >
> > And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...
>
> Ah, I forgot about that thing.. Although I had vague memories of that
> argument on rcu_assign_pointer() vs RCU_INIT_POINTER().
>
> > But I won't really argue if you decide to remove this !RT optimization, although
> > I think it would be better to do this in a separate patch.
>
> Right. So when they wanted to remove that preemptible() clause, I was
> like why again do we have this !RT exception at all, and can't we get
> rid of that.
>
> If git isn't confusing me again, this got merged in this cycle. But so
> far no benchmark came and told us this was a bad idea.

I still think it would be safer to merge this patch from Luis before
v6.17, then possibly remove this special case in a separate patch...

> So what do we do now... do we restore the !RT exception (so far there
> aren't any numbers to suggest this mattered) or do we let it be for a
> bit and then go and clean things up?

It is not that simple. Please note that put_task_struct_rcu_user()
delays put(tsk->usage), not free(tsk).

So for example with this change

> @@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
>  	if (thread_group_leader(p))
>  		flush_sigqueue(&p->signal->shared_pending);
>
> -	put_task_struct_rcu_user(p);
> +	put_task_struct(p);
>
>  	p = leader;
>  	if (unlikely(zap_leader))

This code

	rcu_read_lock();
	tsk = find_task_by_vpid(...);
	if (tsk)
		get_task_struct(tsk);
	rcu_read_unlock();

becomes wrong, get_task_struct(tsk) can increment tsk->usage == 0.

Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-09-16 11:30       ` Oleg Nesterov
@ 2025-10-17 14:39         ` Oleg Nesterov
  2025-10-18 13:11           ` Luis Claudio R. Goncalves
  0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2025-10-17 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luis Claudio R. Goncalves, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt, Tejun Heo, David Vernet,
	Barret Rhoden, Josh Don, Crystal Wood, linux-kernel,
	linux-rt-devel, Juri Lelli, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Mel Gorman, Valentin Schneider, Vincent Guittot,
	Thomas Gleixner, Wander Lairson Costa

Peter,

So I think that delayed_put_task_struct() has to stay and this means
that we have 2 RCU grace periods in the "typical" case when the exiting
task without extra references is reaped (or autoreaped if a subthread).

As I said I won't really argue, but what is your final thought on this?
Should we forget this patch from Luis? I'd prefer to restore the !RT
exception...

And another question. Sasha and Greg are going to add 8671bad873eb
("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
to -stable. Do you think it is fine to backport this commit as is?

Oleg.

On 09/16, Oleg Nesterov wrote:
>
> On 09/16, Peter Zijlstra wrote:
> >
> > On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> > > On 09/15, Peter Zijlstra wrote:
> > > >
> > > > Why have !RT behave differently? That is, why isn't this simply a
> > > > 'buggy' comment/changelog issue?
> > >
> > > Well, this was discussed several times, in particular see
> > > https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> > >
> > > And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...
> >
> > Ah, I forgot about that thing.. Although I had vague memories of that
> > argument on rcu_assign_pointer() vs RCU_INIT_POINTER().
> >
> > > But I won't really argue if you decide to remove this !RT optimization, although
> > > I think it would be better to do this in a separate patch.
> >
> > Right. So when they wanted to remove that preemptible() clause, I was
> > like why again do we have this !RT exception at all, and can't we get
> > rid of that.
> >
> > If git isn't confusing me again, this got merged in this cycle. But so
> > far no benchmark came and told us this was a bad idea.
>
> I still think it would be safer to merge this patch from Luis before
> v6.17, then possibly remove this special case in a separate patch...
>
> > So what do we do now... do we restore the !RT exception (so far there
> > aren't any numbers to suggest this mattered) or do we let it be for a
> > bit and then go and clean things up?
>
> It is not that simple. Please note that put_task_struct_rcu_user()
> delays put(tsk->usage), not free(tsk).
>
> So for example with this change
>
> > @@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
> >  	if (thread_group_leader(p))
> >  		flush_sigqueue(&p->signal->shared_pending);
> >
> > -	put_task_struct_rcu_user(p);
> > +	put_task_struct(p);
> >
> >  	p = leader;
> >  	if (unlikely(zap_leader))
>
> This code
>
> 	rcu_read_lock();
> 	tsk = find_task_by_vpid(...);
> 	if (tsk)
> 		get_task_struct(tsk);
> 	rcu_read_unlock();
>
> becomes wrong, get_task_struct(tsk) can increment tsk->usage == 0.
>
> Oleg.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
  2025-10-17 14:39         ` Oleg Nesterov
@ 2025-10-18 13:11           ` Luis Claudio R. Goncalves
  0 siblings, 0 replies; 17+ messages in thread
From: Luis Claudio R. Goncalves @ 2025-10-18 13:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt, Tejun Heo, David Vernet, Barret Rhoden, Josh Don,
	Crystal Wood, linux-kernel, linux-rt-devel, Juri Lelli,
	Ben Segall, Dietmar Eggemann, Ingo Molnar, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Thomas Gleixner,
	Wander Lairson Costa

On Fri, Oct 17, 2025 at 04:39:45PM +0200, Oleg Nesterov wrote:
> Peter,
> 
> So I think that delayed_put_task_struct() has to stay and this means
> that we have 2 RCU grace periods in the "typical" case when the exiting
> task without extra references is reaped (or autoreaped if a subthread).
> 
> As I said I won't really argue, but what is your final thought on this?
> Should we forget this patch from Luis? I'd prefer to restore the !RT
> exception...
> 
> And another question. Sasha and Greg are going to add 8671bad873eb
> ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
> to -stable. Do you think it is fine to backport this commit as is?

I am not aware of any specific RCU fixes that need to be present in order
for commit 8671bad873eb to work properly on older kernels. That was my initial
fear when Sasha and Greg mentioned the backport to Stable in August. The
reference I have is that the deferred __put_task_struct() calls have been
working for RT on older kernels for quite a while.

Anyway, I asked them to hold because this discussion here was still happening.

If that helps to get the ball moving and sounds reasonable to you all, I could
propose a STABLE-only patch that isolates the change to RT. Does that make
sense?

Luis

> Oleg.
> 
> On 09/16, Oleg Nesterov wrote:
> >
> > On 09/16, Peter Zijlstra wrote:
> > >
> > > On Mon, Sep 15, 2025 at 02:35:40PM +0200, Oleg Nesterov wrote:
> > > > On 09/15, Peter Zijlstra wrote:
> > > > >
> > > > > Why have !RT behave differently? That is, why isn't this simply a
> > > > > 'buggy' comment/changelog issue?
> > > >
> > > > Well, this was discussed several times, in particular see
> > > > https://lore.kernel.org/all/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
> > > >
> > > > And task_struct->rcu_users was introduced to avoid RCU call in put_task_struct() ...
> > >
> > > Ah, I forgot about that thing.. Although I had vague memories of that
> > > argument on rcu_assign_pointer() vs RCU_INIT_POINTER().
> > >
> > > > But I won't really argue if you decide to remove this !RT optimization, although
> > > > I think it would be better to do this in a separate patch.
> > >
> > > Right. So when they wanted to remove that preemptible() clause, I was
> > > like why again do we have this !RT exception at all, and can't we get
> > > rid of that.
> > >
> > > If git isn't confusing me again, this got merged in this cycle. But so
> > > far no benchmark came and told us this was a bad idea.
> >
> > I still think it would be safer to merge this patch from Luis before
> > v6.17, then possibly remove this special case in a separate patch...
> >
> > > So what do we do now... do we restore the !RT exception (so far there
> > > aren't any numbers to suggest this mattered) or do we let it be for a
> > > bit and then go and clean things up?
> >
> > It is not that simple. Please note that put_task_struct_rcu_user()
> > delays put(tsk->usage), not free(tsk).
> >
> > So for example with this change
> >
> > > @@ -305,7 +288,7 @@ void release_task(struct task_struct *p)
> > >  	if (thread_group_leader(p))
> > >  		flush_sigqueue(&p->signal->shared_pending);
> > >
> > > -	put_task_struct_rcu_user(p);
> > > +	put_task_struct(p);
> > >
> > >  	p = leader;
> > >  	if (unlikely(zap_leader))
> >
> > This code
> >
> > 	rcu_read_lock();
> > 	tsk = find_task_by_vpid(...);
> > 	if (tsk)
> > 		get_task_struct(tsk);
> > 	rcu_read_unlock();
> >
> > becomes wrong, get_task_struct(tsk) can increment tsk->usage == 0.
> >
> > Oleg.
> 
---end quoted text---


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-10-18 13:11 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 11:15 [RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt Luis Claudio R. Goncalves
2025-09-15 11:38 ` Peter Zijlstra
2025-09-15 12:24   ` Sebastian Andrzej Siewior
2025-09-15 14:49     ` Luis Claudio R. Goncalves
2025-09-15 15:31       ` Sebastian Andrzej Siewior
2025-09-15 12:35   ` Oleg Nesterov
2025-09-16 10:09     ` Peter Zijlstra
2025-09-16 11:30       ` Oleg Nesterov
2025-10-17 14:39         ` Oleg Nesterov
2025-10-18 13:11           ` Luis Claudio R. Goncalves
  -- strict thread matches above, loose matches on Subject: below --
2025-08-25 13:50 Luis Claudio R. Goncalves
2025-08-06 19:43 Luis Claudio R. Goncalves
2025-08-11 10:06 ` Sebastian Andrzej Siewior
2025-08-11 10:40   ` Oleg Nesterov
2025-08-11 11:05     ` Sebastian Andrzej Siewior
2025-08-11 11:21       ` Oleg Nesterov
2025-08-11 12:15         ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).