* [PATCH RFC exit] Sleep at TASK_IDLE when waiting for application core dump
@ 2024-07-25 0:04 Paul E. McKenney
2024-07-25 13:29 ` Oleg Nesterov
0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2024-07-25 0:04 UTC (permalink / raw)
To: linux-kernel; +Cc: axboe, brauner, oleg, akpm, willy, clm, riel, ffledgling
Currently, the coredump_task_exit() function sets the task state to
TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well. But a
combination of large memory and slow (and/or highly contended) mass
storage can cause application core dumps to take more than two minutes,
which can triggers "task blocked" splats. There does not seem to be
any reasonable benefit to getting these splats.
Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
Reported-by: Anhad Jai Singh <ffledgling@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Rik van Riel <riel@surriel.com>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..b0d18f7b6d15 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -429,7 +429,7 @@ static void coredump_task_exit(struct task_struct *tsk)
complete(&core_state->startup);
for (;;) {
- set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
+ set_current_state(TASK_IDLE|TASK_FREEZABLE);
if (!self.task) /* see coredump_finish() */
break;
schedule();
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RFC exit] Sleep at TASK_IDLE when waiting for application core dump
2024-07-25 0:04 [PATCH RFC exit] Sleep at TASK_IDLE when waiting for application core dump Paul E. McKenney
@ 2024-07-25 13:29 ` Oleg Nesterov
2024-07-25 16:24 ` Paul E. McKenney
0 siblings, 1 reply; 4+ messages in thread
From: Oleg Nesterov @ 2024-07-25 13:29 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-kernel, axboe, brauner, akpm, willy, clm, riel, ffledgling
On 07/24, Paul E. McKenney wrote:
>
> Currently, the coredump_task_exit() function sets the task state to
> TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well. But a
> combination of large memory and slow (and/or highly contended) mass
> storage can cause application core dumps to take more than two minutes,
> which can triggers "task blocked" splats.
Do you mean check_hung_uninterruptible_tasks() ?
In any case,
> Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
...
> @@ -429,7 +429,7 @@ static void coredump_task_exit(struct task_struct *tsk)
> complete(&core_state->startup);
>
> for (;;) {
> - set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
> + set_current_state(TASK_IDLE|TASK_FREEZABLE);
To me this change makes sense regardless...
To some degree TASK_UNINTERRUPTIBLE is misleading, in that the task which
sleeps in coredump_task_exit() is _KILLABLE, although not "directly".
A SIGKILL sent to the coredumping process will interrupt the coredumping thread
(see the signal->core_state check in prepare_signal() and fatal_signal_pending()
check in dump_interrupted()) and then this thread will wakeup the threads sleeping
in coredump_task_exit(), see coredump_finish().
Acked-by: Oleg Nesterov <oleg@redhat.com>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC exit] Sleep at TASK_IDLE when waiting for application core dump
2024-07-25 13:29 ` Oleg Nesterov
@ 2024-07-25 16:24 ` Paul E. McKenney
2024-07-25 17:09 ` Oleg Nesterov
0 siblings, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2024-07-25 16:24 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, axboe, brauner, akpm, willy, clm, riel, ffledgling
On Thu, Jul 25, 2024 at 03:29:35PM +0200, Oleg Nesterov wrote:
> On 07/24, Paul E. McKenney wrote:
> >
> > Currently, the coredump_task_exit() function sets the task state to
> > TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well. But a
> > combination of large memory and slow (and/or highly contended) mass
> > storage can cause application core dumps to take more than two minutes,
> > which can triggers "task blocked" splats.
>
> Do you mean check_hung_uninterruptible_tasks() ?
Yes, from its call to check_hung_task(). Good point, I will add that
to the commit log.
> In any case,
>
> > Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
> ...
> > @@ -429,7 +429,7 @@ static void coredump_task_exit(struct task_struct *tsk)
> > complete(&core_state->startup);
> >
> > for (;;) {
> > - set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
> > + set_current_state(TASK_IDLE|TASK_FREEZABLE);
>
> To me this change makes sense regardless...
>
> To some degree TASK_UNINTERRUPTIBLE is misleading, in that the task which
> sleeps in coredump_task_exit() is _KILLABLE, although not "directly".
>
> A SIGKILL sent to the coredumping process will interrupt the coredumping thread
> (see the signal->core_state check in prepare_signal() and fatal_signal_pending()
> check in dump_interrupted()) and then this thread will wakeup the threads sleeping
> in coredump_task_exit(), see coredump_finish().
Very good, I will add this to the commit log with attribution.
> Acked-by: Oleg Nesterov <oleg@redhat.com>
Thank you for the thorough review! How does the updated patch shown
below look to you?
Thanx, Paul
------------------------------------------------------------------------
commit a6c7779283d67a409b81616a5b485ac21637d7e7
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Wed Jul 24 16:51:52 2024 -0700
exit: Sleep at TASK_IDLE when waiting for application core dump
Currently, the coredump_task_exit() function sets the task state
to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
But a combination of large memory and slow (and/or highly contended)
mass storage can cause application core dumps to take more than
two minutes, which can cause check_hung_task(), which is invoked by
check_hung_uninterruptible_tasks(), to produce task-blocked splats.
There does not seem to be any reasonable benefit to getting these splats.
Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
be misleading because the task sleeping in coredump_task_exit() really
is killable, albeit indirectly. See the check of signal->core_state
in prepare_signal() and the check of fatal_signal_pending()
in dump_interrupted(), which bypass the normal unkillability of
TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
wake_up_process() on any threads sleeping in coredump_task_exit().
Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
Reported-by: Anhad Jai Singh <ffledgling@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: Rik van Riel <riel@surriel.com>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..b0d18f7b6d15 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -429,7 +429,7 @@ static void coredump_task_exit(struct task_struct *tsk)
complete(&core_state->startup);
for (;;) {
- set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
+ set_current_state(TASK_IDLE|TASK_FREEZABLE);
if (!self.task) /* see coredump_finish() */
break;
schedule();
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RFC exit] Sleep at TASK_IDLE when waiting for application core dump
2024-07-25 16:24 ` Paul E. McKenney
@ 2024-07-25 17:09 ` Oleg Nesterov
0 siblings, 0 replies; 4+ messages in thread
From: Oleg Nesterov @ 2024-07-25 17:09 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-kernel, axboe, brauner, akpm, willy, clm, riel, ffledgling
On 07/25, Paul E. McKenney wrote:
>
> How does the updated patch shown below look to you?
Thanks, looks good to me ;)
Oleg.
> ------------------------------------------------------------------------
>
> commit a6c7779283d67a409b81616a5b485ac21637d7e7
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Wed Jul 24 16:51:52 2024 -0700
>
> exit: Sleep at TASK_IDLE when waiting for application core dump
>
> Currently, the coredump_task_exit() function sets the task state
> to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
> But a combination of large memory and slow (and/or highly contended)
> mass storage can cause application core dumps to take more than
> two minutes, which can cause check_hung_task(), which is invoked by
> check_hung_uninterruptible_tasks(), to produce task-blocked splats.
> There does not seem to be any reasonable benefit to getting these splats.
>
> Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
> be misleading because the task sleeping in coredump_task_exit() really
> is killable, albeit indirectly. See the check of signal->core_state
> in prepare_signal() and the check of fatal_signal_pending()
> in dump_interrupted(), which bypass the normal unkillability of
> TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
> wake_up_process() on any threads sleeping in coredump_task_exit().
>
> Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
>
> Reported-by: Anhad Jai Singh <ffledgling@meta.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Chris Mason <clm@fb.com>
> Cc: Rik van Riel <riel@surriel.com>
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f95a2c1338a8..b0d18f7b6d15 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -429,7 +429,7 @@ static void coredump_task_exit(struct task_struct *tsk)
> complete(&core_state->startup);
>
> for (;;) {
> - set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
> + set_current_state(TASK_IDLE|TASK_FREEZABLE);
> if (!self.task) /* see coredump_finish() */
> break;
> schedule();
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-07-25 17:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-25 0:04 [PATCH RFC exit] Sleep at TASK_IDLE when waiting for application core dump Paul E. McKenney
2024-07-25 13:29 ` Oleg Nesterov
2024-07-25 16:24 ` Paul E. McKenney
2024-07-25 17:09 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox