public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/2] exit: change the release_task() paths to call flush_sigqueue() lockless
@ 2025-02-06 15:22 Oleg Nesterov
  2025-02-06 15:23 ` [PATCH v2 1/2] " Oleg Nesterov
  2025-02-06 15:23 ` [PATCH v2 2/2] exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING) Oleg Nesterov
  0 siblings, 2 replies; 7+ messages in thread
From: Oleg Nesterov @ 2025-02-06 15:22 UTC
  To: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner
  Cc: Mateusz Guzik, linux-kernel

Changes:

	- add a comment to explain why the lockless flush_sigqueue() is safe

	- make a separate 2/2 oneliner for TIF_SIGPENDING removal

Link to v1: https://lore.kernel.org/all/20250205175136.GA8702@redhat.com/

Oleg.



* [PATCH v2 1/2] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-06 15:22 [PATCH v2 0/2] exit: change the release_task() paths to call flush_sigqueue() lockless Oleg Nesterov
@ 2025-02-06 15:23 ` Oleg Nesterov
  2025-02-06 16:27   ` Frederic Weisbecker
  2025-02-06 15:23 ` [PATCH v2 2/2] exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING) Oleg Nesterov
  1 sibling, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2025-02-06 15:23 UTC
  To: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner
  Cc: Mateusz Guzik, linux-kernel

A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
disabled can trigger a hard lockup, see
https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/

Fortunately, after the recent posixtimer changes sys_timer_delete() paths
no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
the exiting task passes __exit_signal() lock_task_sighand() can't succeed
and pid_task(tmr->it_pid) will return NULL.

This means that after __exit_signal(tsk) nobody can play with tsk->pending
or (if group_dead) with tsk->signal->shared_pending, so release_task() can
safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).

TODO:
	- we can probably shift posix_cpu_timers_exit() as well
	- do_sigaction() can hit a similar problem

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 3485e5fc499e..2d7444da743d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -200,20 +200,13 @@ static void __exit_signal(struct task_struct *tsk)
 	__unhash_process(tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
-	/*
-	 * Do this under ->siglock, we can race with another thread
-	 * doing sigqueue_free() if we have SIGQUEUE_PREALLOC signals.
-	 */
-	flush_sigqueue(&tsk->pending);
 	tsk->sighand = NULL;
 	spin_unlock(&sighand->siglock);
 
 	__cleanup_sighand(sighand);
 	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
-	if (group_dead) {
-		flush_sigqueue(&sig->shared_pending);
+	if (group_dead)
 		tty_kref_put(tty);
-	}
 }
 
 static void delayed_put_task_struct(struct rcu_head *rhp)
@@ -279,6 +272,16 @@ void release_task(struct task_struct *p)
 	proc_flush_pid(thread_pid);
 	put_pid(thread_pid);
 	release_thread(p);
+	/*
+	 * This task was already removed from the process/thread/pid lists
+	 * and lock_task_sighand(p) can't succeed. Nobody else can touch
+	 * ->pending or, if group dead, signal->shared_pending. We can call
+	 * flush_sigqueue() lockless.
+	 */
+	flush_sigqueue(&p->pending);
+	if (thread_group_leader(p))
+		flush_sigqueue(&p->signal->shared_pending);
+
 	put_task_struct_rcu_user(p);
 
 	p = leader;
-- 
2.25.1.362.g51ebf55
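
A minimal user-space sketch of the scenario described in the 1/2 changelog
above: block a signal, queue it to yourself until the kernel refuses, then
exit. This is an illustration only, not the reproducer from the linked
report. fill_sigpending() and its parameters are hypothetical names, and the
soft RLIMIT_SIGPENDING is lowered first so that the loop stays short.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Block SIGRTMIN and queue it to the calling process until sigqueue()
 * fails (typically with EAGAIN once RLIMIT_SIGPENDING is reached) or
 * until max_tries attempts have succeeded.
 *
 * Returns the number of queued signals, or -1 if the setup failed.
 * "limit" only ever lowers the soft limit, it never raises it.
 */
static long fill_sigpending(rlim_t limit, long max_tries)
{
	union sigval val = { .sival_int = 0 };
	struct rlimit rl;
	sigset_t set;
	long queued = 0;

	if (getrlimit(RLIMIT_SIGPENDING, &rl) != 0)
		return -1;
	if (limit < rl.rlim_cur) {
		rl.rlim_cur = limit;
		if (setrlimit(RLIMIT_SIGPENDING, &rl) != 0)
			return -1;
	}

	/* A blocked realtime signal stays queued instead of being delivered. */
	sigemptyset(&set);
	sigaddset(&set, SIGRTMIN);
	if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
		return -1;

	while (queued < max_tries && sigqueue(getpid(), SIGRTMIN, val) == 0)
		queued++;

	return queued;
}
```

A caller would run something like fill_sigpending(64, 1024) and then simply
exit with the signal still blocked; the queued entries are what
flush_sigqueue() has to free when the task is released.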

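The safety argument in the 1/2 changelog (once the task is unreachable,
nobody else can touch its pending list, so the flush no longer needs the
lock) can be sketched in user space as follows. Everything here is a
hypothetical stand-in: struct toy_task, the "published" flag (playing the
role of tsk->sighand != NULL) and the pthread mutex (playing the role of
->siglock) are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct pending_item {
	struct pending_item *next;
};

struct toy_task {
	pthread_mutex_t lock;		/* stand-in for ->siglock */
	bool published;			/* stand-in for tsk->sighand != NULL */
	struct pending_item *pending;	/* stand-in for tsk->pending */
};

static void toy_task_init(struct toy_task *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->published = true;
	t->pending = NULL;
}

/* Producers may only queue while the task is still published. */
static bool toy_queue(struct toy_task *t)
{
	struct pending_item *item = malloc(sizeof(*item));
	bool queued = false;

	if (!item)
		return false;

	pthread_mutex_lock(&t->lock);
	if (t->published) {
		item->next = t->pending;
		t->pending = item;
		queued = true;
	}
	pthread_mutex_unlock(&t->lock);

	if (!queued)
		free(item);
	return queued;
}

/* Short critical section: only make the task unreachable. */
static void toy_unpublish(struct toy_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->published = false;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Potentially long loop, run with no lock held. This is only safe
 * because toy_queue() refuses to touch ->pending after toy_unpublish().
 * Returns the number of freed items.
 */
static size_t toy_flush_lockless(struct toy_task *t)
{
	size_t freed = 0;

	assert(!t->published);
	while (t->pending) {
		struct pending_item *item = t->pending;

		t->pending = item->next;
		free(item);
		freed++;
	}
	return freed;
}
```

The design point is the same as in the patch: the locked section shrinks to
a constant-time "unpublish" step, and the loop whose length is controlled by
the user runs afterwards with no lock held.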



* [PATCH v2 2/2] exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING)
  2025-02-06 15:22 [PATCH v2 0/2] exit: change the release_task() paths to call flush_sigqueue() lockless Oleg Nesterov
  2025-02-06 15:23 ` [PATCH v2 1/2] " Oleg Nesterov
@ 2025-02-06 15:23 ` Oleg Nesterov
  2025-02-06 16:30   ` Frederic Weisbecker
  1 sibling, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2025-02-06 15:23 UTC
  To: Andrew Morton, Eric W. Biederman, Frederic Weisbecker,
	Peter Zijlstra, Thomas Gleixner
  Cc: Mateusz Guzik, linux-kernel

It predates the git history and most probably it was never needed. It
doesn't really hurt, but it looks confusing because its purpose is not
clear at all.

release_task(p) is called when this task has already passed exit_notify()
so signal_pending(p) == T shouldn't make any difference.

And even _if_ there were a subtle reason to clear TIF_SIGPENDING after
exit_notify(), this clear_tsk_thread_flag() can't help anyway.  If the
exiting task is a group leader or if it is ptraced, release_task() will
likely be called when this task has already done its last schedule() from
do_task_dead().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 2d7444da743d..0acb94b17caa 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -204,7 +204,6 @@ static void __exit_signal(struct task_struct *tsk)
 	spin_unlock(&sighand->siglock);
 
 	__cleanup_sighand(sighand);
-	clear_tsk_thread_flag(tsk, TIF_SIGPENDING);
 	if (group_dead)
 		tty_kref_put(tty);
 }
-- 
2.25.1.362.g51ebf55




* Re: [PATCH v2 1/2] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-06 15:23 ` [PATCH v2 1/2] " Oleg Nesterov
@ 2025-02-06 16:27   ` Frederic Weisbecker
  2025-02-06 16:55     ` Oleg Nesterov
  0 siblings, 1 reply; 7+ messages in thread
From: Frederic Weisbecker @ 2025-02-06 16:27 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Peter Zijlstra, Thomas Gleixner,
	Mateusz Guzik, linux-kernel

On Thu, Feb 06, 2025 at 04:23:14PM +0100, Oleg Nesterov wrote:
> A task can block a signal, accumulate up to RLIMIT_SIGPENDING sigqueues,
> and exit. In this case __exit_signal()->flush_sigqueue() called with irqs
> disabled can trigger a hard lockup, see
> https://lore.kernel.org/all/20190322114917.GC28876@redhat.com/
> 
> Fortunately, after the recent posixtimer changes sys_timer_delete() paths
> no longer try to clear SIGQUEUE_PREALLOC and/or free tmr->sigq, and after
> the exiting task passes __exit_signal() lock_task_sighand() can't succeed
> and pid_task(tmr->it_pid) will return NULL.
> 
> This means that after __exit_signal(tsk) nobody can play with tsk->pending
> or (if group_dead) with tsk->signal->shared_pending, so release_task() can
> safely call flush_sigqueue() after write_unlock_irq(&tasklist_lock).
> 
> TODO:
> 	- we can probably shift posix_cpu_timers_exit() as well

Hmm, can't a timer be concurrently deleted between the point where
__exit_signal() sets tsk->sighand = NULL and releases the sighand lock, and
the actual call to posix_cpu_timers_exit()? Then posix_cpu_timers_exit() would
call timerqueue_del() on a node that doesn't exist anymore.

That would even trigger the warning in posix_cpu_timer_del().

> 	- do_sigaction() can hit a similar problem
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>


* Re: [PATCH v2 2/2] exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING)
  2025-02-06 15:23 ` [PATCH v2 2/2] exit: kill the pointless __exit_signal()->clear_tsk_thread_flag(TIF_SIGPENDING) Oleg Nesterov
@ 2025-02-06 16:30   ` Frederic Weisbecker
  0 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2025-02-06 16:30 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Peter Zijlstra, Thomas Gleixner,
	Mateusz Guzik, linux-kernel

On Thu, Feb 06, 2025 at 04:23:34PM +0100, Oleg Nesterov wrote:
> It predates the git history and most probably it was never needed. It
> doesn't really hurt, but it looks confusing because its purpose is not
> clear at all.
> 
> release_task(p) is called when this task has already passed exit_notify()
> so signal_pending(p) == T shouldn't make any difference.
> 
> And even _if_ there were a subtle reason to clear TIF_SIGPENDING after
> exit_notify(), this clear_tsk_thread_flag() can't help anyway.  If the
> exiting task is a group leader or if it is ptraced, release_task() will
> likely be called when this task has already done its last schedule() from
> do_task_dead().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Frederic Weisbecker <frederic@kernel.org>


* Re: [PATCH v2 1/2] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-06 16:27   ` Frederic Weisbecker
@ 2025-02-06 16:55     ` Oleg Nesterov
  2025-02-06 17:03       ` Frederic Weisbecker
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2025-02-06 16:55 UTC
  To: Frederic Weisbecker
  Cc: Andrew Morton, Eric W. Biederman, Peter Zijlstra, Thomas Gleixner,
	Mateusz Guzik, linux-kernel

On 02/06, Frederic Weisbecker wrote:
>
> > TODO:
> > 	- we can probably shift posix_cpu_timers_exit() as well
>
> Hmm, can't a timer be concurrently deleted between the point where
> __exit_signal() sets tsk->sighand = NULL and releases the sighand lock, and
> the actual call to posix_cpu_timers_exit()? Then posix_cpu_timers_exit() would
> call timerqueue_del() on a node that doesn't exist anymore.

Can't answer right now, I will think about it when/if I actually try to
make this change ;) This "TODO" note just tries to explain what else we could
try to do, and "probably" means that I am not sure yet. I can remove this spam
from the changelog, but I'd prefer to keep it as a reminder, at least for myself.

> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks Frederic!

Oleg.



* Re: [PATCH v2 1/2] exit: change the release_task() paths to call flush_sigqueue() lockless
  2025-02-06 16:55     ` Oleg Nesterov
@ 2025-02-06 17:03       ` Frederic Weisbecker
  0 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2025-02-06 17:03 UTC
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Peter Zijlstra, Thomas Gleixner,
	Mateusz Guzik, linux-kernel

On Thu, Feb 06, 2025 at 05:55:28PM +0100, Oleg Nesterov wrote:
> On 02/06, Frederic Weisbecker wrote:
> >
> > > TODO:
> > > 	- we can probably shift posix_cpu_timers_exit() as well
> >
> > Hmm, can't a timer be concurrently deleted between the point where
> > __exit_signal() sets tsk->sighand = NULL and releases the sighand lock, and
> > the actual call to posix_cpu_timers_exit()? Then posix_cpu_timers_exit() would
> > call timerqueue_del() on a node that doesn't exist anymore.
> 
> Can't answer right now, I will think about it when/if I actually try to
> make this change ;) This "TODO" note just tries to explain what else we could
> try to do, and "probably" means that I am not sure yet. I can remove this spam
> from the changelog, but I'd prefer to keep it as a reminder, at least for
> myself.

Sure!

And thanks again for the patch!

> 
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Thanks Frederic!
> 
> Oleg.
> 

