All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] exit: change the release_task() paths to call flush_sigqueue() lockless
@ 2025-02-05 17:51 Oleg Nesterov
  2025-02-05 17:51 ` [PATCH 1/1] " Oleg Nesterov
  2025-02-05 17:53 ` [PATCH 0/1] " Mateusz Guzik
  0 siblings, 2 replies; 8+ messages in thread
From: Oleg Nesterov @ 2025-02-05 17:51 UTC (permalink / raw)
  To: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner
  Cc: Mateusz Guzik, linux-kernel

Thomas, Frederic, could you review?

Untested, and I'll recheck this patch tomorrow, perhaps I missed something...

Mateusz, this patch has minor conflicts with your

	[PATCH v3 0/6] reduce tasklist_lock hold time on exit and do some

WIP changes. I can wait for v4 from you and rebase on top, but I'd like
to show this patch right now to simplify the review.

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-05 17:51 [PATCH 0/1] exit: change the release_task() paths to call flush_sigqueue() lockless Oleg Nesterov
@ 2025-02-05 17:51 ` Oleg Nesterov
  2025-02-05 22:18   ` Frederic Weisbecker
  2025-02-06 15:57   ` Thomas Gleixner
  2025-02-05 17:53 ` [PATCH 0/1] " Mateusz Guzik
  1 sibling, 2 replies; 8+ messages in thread
From: Oleg Nesterov @ 2025-02-05 17:51 UTC (permalink / raw)
  To: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner
  Cc: Mateusz Guzik, linux-kernel

A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
disabled can triger a hard lockup, see
https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/

Fortunately, after the recent posixtimer changes sys_timer_delete() paths
no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
the exiting task passes __exit_signal() lock_task_sighand() can't succeed
and pid_task(tmr->it_pid) will return NULL.

This means that after __exit_signal(tsk) nobody can play with tsk->pending
or (if group_dead) with tsk->signal->shared_pending, so release_task() can
safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).

Also, kill clear_tsk_thread_flag(TIF_SIGPENDING), it was never needed.

TODO:
	- we can probably shift posix_cpu_timers_exit() as well
	- do_sigaction() can hit the similar problem

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 3485e5fc499e..bc2c24ea4181 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -200,20 +200,12 @@ static void __exit_signal(struct task_struct *tsk)
 	__unhash_process(tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
-	/*
-	 * Do this under ->siglock, we can race with another thread
-	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
-	 */
-	flush_sigqueue(&tsk->pending);
 	tsk->sighand = NULL;
 	spin_unlock(&sighand->siglock);
 
 	__cleanup_sighand(sighand);
-	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
-	if (group_dead) {
-		flush_sigqueue(&sig->shared_pending);
+	if (group_dead)
 		tty_kref_put(tty);
-	}
 }
 
 static void delayed_put_task_struct(struct rcu_head *rhp)
@@ -279,6 +271,11 @@ void release_task(struct task_struct *p)
 	proc_flush_pid(thread_pid);
 	put_pid(thread_pid);
 	release_thread(p);
+
+	flush_sigqueue(&p->pending);
+	if (thread_group_leader(p))
+		flush_sigqueue(&p->signal->shared_pending);
+
 	put_task_struct_rcu_user(p);
 
 	p = leader;
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-05 17:51 [PATCH 0/1] exit: change the release_task() paths to call flush_sigqueue() lockless Oleg Nesterov
  2025-02-05 17:51 ` [PATCH 1/1] " Oleg Nesterov
@ 2025-02-05 17:53 ` Mateusz Guzik
  1 sibling, 0 replies; 8+ messages in thread
From: Mateusz Guzik @ 2025-02-05 17:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner, linux-kernel

On Wed, Feb 5, 2025 at 6:52 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Thomas, Frederic, could you review?
>
> Untested, and I'll recheck this patch tomorrow, perhaps I missed something...
>
> Mateusz, this patch has minor conflicts with your
>
>         [PATCH v3 0/6] reduce tasklist_lock hold time on exit and do some
>
> WIP changes. I can wait for v4 from you and rebase on top, but I'd like
> to show this patch right now to simplify the review.
>

I'm still waiting for feedback from Eric.

I'll rebase on whatever patch which lands.
-- 
Mateusz Guzik <mjguzik gmail.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-05 17:51 ` [PATCH 1/1] " Oleg Nesterov
@ 2025-02-05 22:18   ` Frederic Weisbecker
  2025-02-06 12:14     ` Oleg Nesterov
  2025-02-06 15:57   ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2025-02-05 22:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Peter Zijlstra, Thomas Gleixner,
	Mateusz Guzik, linux-kernel

Le Wed, Feb 05, 2025 at 06:51:59PM +0100, Oleg Nesterov a écrit :
> A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
> and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
> disabled can triger a hard lockup, see
> https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/
> 
> Fortunately, after the recent posixtimer changes sys_timer_delete() paths
> no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
> the exiting task passes __exit_signal() lock_task_sighand() can't succeed
> and pid_task(tmr->it_pid) will return NULL.
> 
> This means that after __exit_signal(tsk) nobody can play with tsk->pending
> or (if group_dead) with tsk->signal->shared_pending, so release_task() can
> safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).
> 
> Also, kill clear_tsk_thread_flag(TIF_SIGPENDING), it was never needed.
> 
> TODO:
> 	- we can probably shift posix_cpu_timers_exit() as well
> 	- do_sigaction() can hit the similar problem
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/exit.c | 15 ++++++---------
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 3485e5fc499e..bc2c24ea4181 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -200,20 +200,12 @@ static void __exit_signal(struct task_struct *tsk)
>  	__unhash_process(tsk, group_dead);
>  	write_sequnlock(&sig->stats_lock);
>  
> -	/*
> -	 * Do this under ->siglock, we can race with another thread
> -	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
> -	 */
> -	flush_sigqueue(&tsk->pending);
>  	tsk->sighand = NULL;
>  	spin_unlock(&sighand->siglock);
>  
>  	__cleanup_sighand(sighand);
> -	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);

Looks good to me, except for this TIF_SIGPENDING removal which I'm less
sure about, I see a lot of places where it is added/removed. Well it's
probably only checked locally on entry code. Would it make sense to move
this chunk to a separate preceding patch? Or keep it here but at least
explain on the changelog why it is safe to remove it?

Thanks!

> -	if (group_dead) {
> -		flush_sigqueue(&sig->shared_pending);
> +	if (group_dead)
>  		tty_kref_put(tty);
> -	}
>  }
>  
>  static void delayed_put_task_struct(struct rcu_head *rhp)
> @@ -279,6 +271,11 @@ void release_task(struct task_struct *p)
>  	proc_flush_pid(thread_pid);
>  	put_pid(thread_pid);
>  	release_thread(p);
> +
> +	flush_sigqueue(&p->pending);
> +	if (thread_group_leader(p))
> +		flush_sigqueue(&p->signal->shared_pending);
> +
>  	put_task_struct_rcu_user(p);
>  
>  	p = leader;
> -- 
> 2.25.1.362.g51ebf55
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-05 22:18   ` Frederic Weisbecker
@ 2025-02-06 12:14     ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2025-02-06 12:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Andrew Morton, Eric W. Biederman, Peter Zijlstra, Thomas Gleixner,
	Mateusz Guzik, linux-kernel

On 02/05, Frederic Weisbecker wrote:
>
> Looks good to me, except for this TIF_SIGPENDING removal which I'm less
> sure about, I see a lot of places where it is added/removed. Well it's
> probably only checked locally on entry code. Would it make sense to move
> this chunk to a separate preceding patch?

OK, will do.

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-05 17:51 ` [PATCH 1/1] " Oleg Nesterov
  2025-02-05 22:18   ` Frederic Weisbecker
@ 2025-02-06 15:57   ` Thomas Gleixner
  2025-02-06 16:23     ` Oleg Nesterov
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2025-02-06 15:57 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton, Eric W. Biederman,
	Frederic Weisbecker, Peter Zijlstra
  Cc: Mateusz Guzik, linux-kernel

On Wed, Feb 05 2025 at 18:51, Oleg Nesterov wrote:
> A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
> and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
> disabled can triger a hard lockup, see
> https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/
>
> Fortunately, after the recent posixtimer changes sys_timer_delete() paths
> no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
> the exiting task passes __exit_signal() lock_task_sighand() can't succeed
> and pid_task(tmr->it_pid) will return NULL.
>
> This means that after __exit_signal(tsk) nobody can play with tsk->pending
> or (if group_dead) with tsk->signal->shared_pending, so release_task() can
> safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).

I can't find a problem with that.

> Also, kill clear_tsk_thread_flag(TIF_SIGPENDING), it was never needed.

I'm not entirely sure about that, but it does not hurt to clear it,
right?

> TODO:
> 	- we can probably shift posix_cpu_timers_exit() as well

I think so.

> 	- do_sigaction() can hit the similar problem

Indeed, but that's a tough on to solve.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-06 15:57   ` Thomas Gleixner
@ 2025-02-06 16:23     ` Oleg Nesterov
  2025-02-06 21:04       ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2025-02-06 16:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Mateusz Guzik, linux-kernel

On 02/06, Thomas Gleixner wrote:
>
> On Wed, Feb 05 2025 at 18:51, Oleg Nesterov wrote:
> > A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
> > and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
> > disabled can triger a hard lockup, see
> > https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/
> >
> > Fortunately, after the recent posixtimer changes sys_timer_delete() paths
> > no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
> > the exiting task passes __exit_signal() lock_task_sighand() can't succeed
> > and pid_task(tmr->it_pid) will return NULL.
> >
> > This means that after __exit_signal(tsk) nobody can play with tsk->pending
> > or (if group_dead) with tsk->signal->shared_pending, so release_task() can
> > safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).
>
> I can't find a problem with that.

Ah, good.

> > Also, kill clear_tsk_thread_flag(TIF_SIGPENDING), it was never needed.
>
> I'm not entirely sure about that, but it does not hurt to clear it,
> right?

Please see v2 which documents this change in a separate patch.
Again, it is not that it is really bad, just it looks very confusing
to me and I think it can confuse other readers of this code.

> > 	- do_sigaction() can hit the similar problem
>
> Indeed, but that's a tough on to solve.

Yeah... Although I have to admit that yesterday I had a very simple
(and wrong) solution in mind ;)

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-06 16:23     ` Oleg Nesterov
@ 2025-02-06 21:04       ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2025-02-06 21:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Mateusz Guzik, linux-kernel

Oleg!

On Thu, Feb 06 2025 at 17:23, Oleg Nesterov wrote:
> On 02/06, Thomas Gleixner wrote:
>> > 	- do_sigaction() can hit the similar problem
>>
>> Indeed, but that's a tough on to solve.
>
> Yeah... Although I have to admit that yesterday I had a very simple
> (and wrong) solution in mind ;)

I can relate. My initial "brilliant" idea to solve that turned out to be
broken after less than 10 seconds... :)

That needs some more thought to the whole signal handling business and
unfortunately the recent posix-timer changes do not help much in that
context. Let me think about it some more.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-02-06 21:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-05 17:51 [PATCH 0/1] exit: change the release_task() paths to call flush_sigqueue() lockless Oleg Nesterov
2025-02-05 17:51 ` [PATCH 1/1] " Oleg Nesterov
2025-02-05 22:18   ` Frederic Weisbecker
2025-02-06 12:14     ` Oleg Nesterov
2025-02-06 15:57   ` Thomas Gleixner
2025-02-06 16:23     ` Oleg Nesterov
2025-02-06 21:04       ` Thomas Gleixner
2025-02-05 17:53 ` [PATCH 0/1] " Mateusz Guzik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.