From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Petr Mladek <pmladek@suse.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Liu Chuansheng <chuansheng.liu@intel.com>,
Valdis Kletnieks <valdis.kletnieks@vt.edu>,
linux-kernel@vger.kernel.org, Dmitry Vyukov <dvyukov@google.com>
Subject: Re: [PATCH] kernel/hung_task.c: Monitor killed tasks.
Date: Tue, 14 May 2019 15:28:52 -0700 [thread overview]
Message-ID: <20190514222852.GE4184@linux.ibm.com> (raw)
In-Reply-To: <1557745331-10367-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
On Mon, May 13, 2019 at 08:02:11PM +0900, Tetsuo Handa wrote:
> syzbot's second top report is "no output from test machine" where the
> userspace process failed to spawn a new test process for 300 seconds
> for some reason. One of reasons which can result in this report is that
> an already spawned test process was unable to terminate (e.g. trapped at
> an unkillable retry loop due to some bug) after SIGKILL was sent to that
> process. Therefore, reporting when a thread is failing to terminate
> despite a fatal signal is pending would give us more useful information.
>
> This version shares existing sysctl settings (e.g. check interval,
> timeout, whether to panic) used for detecting TASK_UNINTERRUPTIBLE
> threads, for I don't know whether people want to use a new kernel
> config option and different sysctl settings for monitoring killed
> threads.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Dmitry Vyukov <dvyukov@google.com>
Looks good to me.
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
A few inconsequential comments below.
> ---
> include/linux/sched.h | 1 +
> kernel/hung_task.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a2cd1585..d42bdd7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -850,6 +850,7 @@ struct task_struct {
> #ifdef CONFIG_DETECT_HUNG_TASK
> unsigned long last_switch_count;
> unsigned long last_switch_time;
> + unsigned long killed_time;
> #endif
> /* Filesystem information: */
> struct fs_struct *fs;
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index f108a95..34e7b84 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -141,6 +141,47 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> touch_nmi_watchdog();
> }
>
> +static void check_killed_task(struct task_struct *t, unsigned long timeout)
> +{
> + unsigned long stamp = t->killed_time;
> +
> + /*
> + * Ensure the task is not frozen.
> + * Also, skip vfork and any other user process that freezer should skip.
> + */
> + if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
> + return;
> + /*
> + * Skip threads which are already inside do_exit(), for exit_mm() etc.
> + * might take many seconds.
> + */
> + if (t->flags & PF_EXITING)
> + return;
> + if (!stamp) {
> + stamp = jiffies;
> + if (!stamp)
> + stamp++;
Cute trick to avoid issues with jiffy overflow on 32-bit systems. ;-)
> + t->killed_time = stamp;
> + return;
> + }
> + if (time_is_after_jiffies(stamp + timeout * HZ))
And if I understand correctly, timeout of zero disables everything, so
we don't get the backwards false-positive comparison above.
> + return;
> + trace_sched_process_hang(t);
> + if (sysctl_hung_task_panic) {
> + console_verbose();
> + hung_task_call_panic = true;
> + }
> + /*
> + * This thread failed to terminate for more than
> + * sysctl_hung_task_timeout_secs seconds, complain:
> + */
> + pr_err("INFO: task %s:%d can't die for more than %ld seconds.\n",
> + t->comm, t->pid, (jiffies - stamp) / HZ);
> + sched_show_task(t);
> + hung_task_show_lock = true;
> + touch_nmi_watchdog();
> +}
> +
> /*
> * To avoid extending the RCU grace period for an unbounded amount of time,
> * periodically exit the critical section and enter a new one.
> @@ -192,6 +233,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> goto unlock;
> last_break = jiffies;
> }
> + /* Check threads which are about to terminate. */
> + if (unlikely(fatal_signal_pending(t)))
> + check_killed_task(t, timeout);
> /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
> if (t->state == TASK_UNINTERRUPTIBLE)
> check_hung_task(t, timeout);
> --
> 1.8.3.1
>
next prev parent reply other threads:[~2019-05-14 22:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-13 11:02 [PATCH] kernel/hung_task.c: Monitor killed tasks Tetsuo Handa
2019-05-13 11:11 ` Dmitry Vyukov
2019-05-14 22:28 ` Paul E. McKenney [this message]
2019-05-15 10:55 ` Petr Mladek
2019-05-16 8:19 ` Tetsuo Handa
2019-05-16 11:57 ` Petr Mladek
2019-05-16 12:38 ` Tetsuo Handa
2019-05-22 12:38 ` Tetsuo Handa
2019-05-22 13:41 ` Stephen Rothwell
2019-05-22 14:58 ` Tetsuo Handa
2019-05-22 21:09 ` Tetsuo Handa
2019-05-22 21:39 ` Stephen Rothwell
2019-05-22 21:43 ` Andrew Morton
2019-05-22 23:46 ` Tetsuo Handa
2019-05-27 14:12 ` Tetsuo Handa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190514222852.GE4184@linux.ibm.com \
--to=paulmck@linux.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=chuansheng.liu@intel.com \
--cc=dvyukov@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=valdis.kletnieks@vt.edu \
--cc=vkuznets@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox