* [PATCH 0/1] hung_task debugging: Add tracepoint to report the hang
@ 2013-10-19 16:18 Oleg Nesterov
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Nesterov @ 2013-10-19 16:18 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Steven Rostedt; +Cc: Dave Sullivan, linux-kernel
Hi,
We have a feature request, the customer needs something more hookable
than just printk's from check_hung_task() to implement the user-space
watchdog which can potentially resolve the problems which caused the
hang.
The patch simply adds a tracepoint into check_hung_task(), do you
think we can (ab)use tp this way?
And I am just curious, perhaps another patch (below) makes sense too?
Oleg.
------------------------------------------------------------------------------
[PATCH] hung_task debugging: Do not report the same task twice
If a task hangs, check_hung_task() will blame it again and again
until it wakes up or sysctl_hung_task_warnings becomes zero. IMO,
this just adds the unnecessary noise and if another task hangs
after sysctl_hung_task_timeout_secs * sysctl_hung_task_warnings
it won't be reported.
With this patch check_hung_task() simply sets the most significant
bit in ->last_switch_count to mark this task as "already reported".
This bit is cleared if we notice the change in nvcsw/nivcsw, so we
do not skip this task if it hangs again later.
Note that we also ignore the MSB in switch_count, we need this to
avoid the false-positive "already reported". This means that
->last_switch_count is not necessarily equal to ->nvcsw + ->nivcsw
but we do not care, we have enough bits to notice the change.
And this allows to remove the special switch_count == 0 case, just
we need to initialize ->last_switch_count = LONG_MIN.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/fork.c | 2 +-
kernel/hung_task.c | 19 +++++++++----------
2 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 8531609..aea397b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -866,7 +866,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
tsk->min_flt = tsk->maj_flt = 0;
tsk->nvcsw = tsk->nivcsw = 0;
#ifdef CONFIG_DETECT_HUNG_TASK
- tsk->last_switch_count = tsk->nvcsw + tsk->nivcsw;
+ tsk->last_switch_count = LONG_MIN; /* see check_hung_task() */
#endif
tsk->mm = NULL;
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 3952ab1..0f6233c 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -72,7 +72,7 @@ static struct notifier_block panic_block = {
static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
- unsigned long switch_count = t->nvcsw + t->nivcsw;
+ unsigned long switch_count;
/*
* Ensure the task is not frozen.
@@ -81,19 +81,18 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
return;
- /*
- * When a freshly created task is scheduled once, changes its state to
- * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
- * musn't be checked.
- */
- if (unlikely(!switch_count))
- return;
-
- if (switch_count != t->last_switch_count) {
+ /* Ignore MSB, see below. LONG_MAX = ~LONG_MIN. */
+ switch_count = (t->nvcsw + t->nivcsw) & LONG_MAX;
+ if (switch_count != (LONG_MAX & t->last_switch_count)) {
t->last_switch_count = switch_count;
return;
}
+ /* We use MSB to mark this task as already reported. */
+ if (t->last_switch_count & LONG_MIN)
+ return;
+ t->last_switch_count |= LONG_MIN;
+
trace_sched_process_hang(t);
if (!sysctl_hung_task_warnings)
--
1.5.5.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 1/1] hung_task debugging: Add tracepoint to report the hang
2013-10-19 16:18 [PATCH 0/1] hung_task debugging: Add tracepoint to report the hang Oleg Nesterov
@ 2013-10-19 16:18 ` Oleg Nesterov
2013-10-20 8:48 ` Ingo Molnar
2013-10-31 10:53 ` [tip:core/locking] " tip-bot for Oleg Nesterov
0 siblings, 2 replies; 5+ messages in thread
From: Oleg Nesterov @ 2013-10-19 16:18 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Steven Rostedt; +Cc: Dave Sullivan, linux-kernel
Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.
Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
include/trace/events/sched.h | 19 +++++++++++++++++++
kernel/hung_task.c | 4 ++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 2e7d994..2a652d1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -424,6 +424,25 @@ TRACE_EVENT(sched_pi_setprio,
__entry->oldprio, __entry->newprio)
);
+#ifdef CONFIG_DETECT_HUNG_TASK
+TRACE_EVENT(sched_process_hang,
+ TP_PROTO(struct task_struct *tsk),
+ TP_ARGS(tsk),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ ),
+
+ TP_printk("comm=%s pid=%d", __entry->comm, __entry->pid)
+);
+#endif /* CONFIG_DETECT_HUNG_TASK */
+
#endif /* _TRACE_SCHED_H */
/* This part must be outside protection */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 3e97fb1..3952ab1 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -16,6 +16,7 @@
#include <linux/export.h>
#include <linux/sysctl.h>
#include <linux/utsname.h>
+#include <trace/events/sched.h>
/*
* The number of tasks checked:
@@ -92,6 +93,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
t->last_switch_count = switch_count;
return;
}
+
+ trace_sched_process_hang(t);
+
if (!sysctl_hung_task_warnings)
return;
sysctl_hung_task_warnings--;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] hung_task debugging: Add tracepoint to report the hang
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
@ 2013-10-20 8:48 ` Ingo Molnar
2013-10-29 19:10 ` Oleg Nesterov
2013-10-31 10:53 ` [tip:core/locking] " tip-bot for Oleg Nesterov
1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2013-10-20 8:48 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Peter Zijlstra, Steven Rostedt, Dave Sullivan, linux-kernel
* Oleg Nesterov <oleg@redhat.com> wrote:
> Currently check_hung_task() prints a warning if it detects the problem,
> but it is not convenient to watch the system logs if user-space wants to
> be notified about the hang.
>
> Add the new trace_sched_process_hang() into check_hung_task(), this way
> a user-space monitor can easily wait for the hang and potentially
> resolve a problem.
I'm wondering, is the data of trace_console() in kernel/printk/printk.c
not sufficient?
If it's not enough then it might be better to add a higher level printk
tracepoint instead - that can catch hung_task messages and (much) more.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/1] hung_task debugging: Add tracepoint to report the hang
2013-10-20 8:48 ` Ingo Molnar
@ 2013-10-29 19:10 ` Oleg Nesterov
0 siblings, 0 replies; 5+ messages in thread
From: Oleg Nesterov @ 2013-10-29 19:10 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Peter Zijlstra, Steven Rostedt, Dave Sullivan, linux-kernel
On 10/20, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Currently check_hung_task() prints a warning if it detects the problem,
> > but it is not convenient to watch the system logs if user-space wants to
> > be notified about the hang.
> >
> > Add the new trace_sched_process_hang() into check_hung_task(), this way
> > a user-space monitor can easily wait for the hang and potentially
> > resolve a problem.
>
> I'm wondering, is the data of trace_console() in kernel/printk/printk.c
> not sufficient?
Probably yes... I do not think they disable CONFIG_PRINTK.
But this is obviously much less convenient, they will need to parse the
text. And the user-space watchdog will be woken up much more often than
necessary. And they could probably simply read /var/log or interact with
syslogd somehow, but they specially asked for something better and more
robust.
But of course, I understand that every tracepoint should be justified.
So if you do not like this change I try to convince them to use
trace_console().
> If it's not enough then it might be better to add a higher level printk
> tracepoint instead - that can catch hung_task messages and (much) more.
Not sure I understand... I mean I do not understand why this is really
better for them, except this will simplify the parsing a bit. Anyway
I'd prefer to not send another doubtful patch ;)
Thanks.
Oleg.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [tip:core/locking] hung_task debugging: Add tracepoint to report the hang
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
2013-10-20 8:48 ` Ingo Molnar
@ 2013-10-31 10:53 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Oleg Nesterov @ 2013-10-31 10:53 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, dsulliva, peterz, rostedt, tglx, oleg
Commit-ID: 6a716c90a51338009c3bc1f460829afaed8f922d
Gitweb: http://git.kernel.org/tip/6a716c90a51338009c3bc1f460829afaed8f922d
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Sat, 19 Oct 2013 18:18:28 +0200
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Oct 2013 11:16:18 +0100
hung_task debugging: Add tracepoint to report the hang
Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.
Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Sullivan <dsulliva@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/trace/events/sched.h | 19 +++++++++++++++++++
kernel/hung_task.c | 4 ++++
2 files changed, 23 insertions(+)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 2e7d994..2a652d1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -424,6 +424,25 @@ TRACE_EVENT(sched_pi_setprio,
__entry->oldprio, __entry->newprio)
);
+#ifdef CONFIG_DETECT_HUNG_TASK
+TRACE_EVENT(sched_process_hang,
+ TP_PROTO(struct task_struct *tsk),
+ TP_ARGS(tsk),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ ),
+
+ TP_printk("comm=%s pid=%d", __entry->comm, __entry->pid)
+);
+#endif /* CONFIG_DETECT_HUNG_TASK */
+
#endif /* _TRACE_SCHED_H */
/* This part must be outside protection */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 0422523..8807061 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -16,6 +16,7 @@
#include <linux/export.h>
#include <linux/sysctl.h>
#include <linux/utsname.h>
+#include <trace/events/sched.h>
/*
* The number of tasks checked:
@@ -92,6 +93,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
t->last_switch_count = switch_count;
return;
}
+
+ trace_sched_process_hang(t);
+
if (!sysctl_hung_task_warnings)
return;
sysctl_hung_task_warnings--;
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-10-31 10:54 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-19 16:18 [PATCH 0/1] hung_task debugging: Add tracepoint to report the hang Oleg Nesterov
2013-10-19 16:18 ` [PATCH 1/1] " Oleg Nesterov
2013-10-20 8:48 ` Ingo Molnar
2013-10-29 19:10 ` Oleg Nesterov
2013-10-31 10:53 ` [tip:core/locking] " tip-bot for Oleg Nesterov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.