* task blocked for more than 120 seconds
@ 2010-11-04 15:58 Sergey Senozhatsky
2010-11-04 16:19 ` Markus Trippelsdorf
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Senozhatsky @ 2010-11-04 15:58 UTC (permalink / raw)
To: Oleg Nesterov
Cc: KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4855 bytes --]
Hello,
Got the following traces:
[42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds.
[42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42001.449311] openbox D 0000000000000003 0 17761 3723 0x00000000
[42001.449321] ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000
[42001.449333] ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8
[42001.449343] 0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040
[42001.449354] Call Trace:
[42001.449369] [<ffffffff814212ff>] schedule_timeout+0x38/0x220
[42001.449381] [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72
[42001.449389] [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
[42001.449398] [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
[42001.449406] [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[42001.449413] [<ffffffff8142057e>] wait_for_common+0xca/0x144
[42001.449421] [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
[42001.449429] [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74
[42001.449436] [<ffffffff81420692>] wait_for_completion+0x18/0x1a
[42001.449445] [<ffffffff81084808>] stop_one_cpu+0x8c/0xba
[42001.449453] [<ffffffff810355cb>] ? migration_cpu_stop+0x0/0x3d
[42001.449461] [<ffffffff81035aef>] sched_exec+0xc3/0xdc
[42001.449470] [<ffffffff810e7dad>] do_execve+0xaa/0x267
[42001.449479] [<ffffffff8100a58b>] sys_execve+0x3e/0x55
[42001.449488] [<ffffffff8100265c>] stub_execve+0x6c/0xc0
[42001.449494] 1 lock held by openbox/17761:
[42001.449498] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff810e7b1d>] prepare_bprm_creds+0x28/0x64
[47761.448699] INFO: task make:29123 blocked for more than 120 seconds.
[47761.448704] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47761.448710] make D 0000000000000001 0 29123 24208 0x00000004
[47761.448719] ffff880115dadcf0 0000000000000046 ffff880100000000 ffff880115dac000
[47761.448731] ffff880115dadfd8 0000000000012040 ffff880133290000 ffff880115dadfd8
[47761.448741] 0000000000012040 0000000000012040 ffff880115dadfd8 0000000000012040
[47761.448752] Call Trace:
[47761.448766] [<ffffffff814212ff>] schedule_timeout+0x38/0x220
[47761.448774] [<ffffffff814204da>] ? wait_for_common+0x26/0x144
[47761.448784] [<ffffffff81207b53>] ? do_raw_spin_lock+0x6b/0x122
[47761.448792] [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
[47761.448801] [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
[47761.448808] [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[47761.448816] [<ffffffff8142057e>] wait_for_common+0xca/0x144
[47761.448823] [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
[47761.448831] [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[47761.448838] [<ffffffff81420692>] wait_for_completion+0x18/0x1a
[47761.448845] [<ffffffff8103cec0>] do_fork+0x2b0/0x32e
[47761.448856] [<ffffffff810021ba>] ? sysret_check+0x2e/0x69
[47761.448864] [<ffffffff8100a526>] sys_vfork+0x20/0x22
[47761.448871] [<ffffffff81002563>] stub_vfork+0x13/0x20
[47761.448878] [<ffffffff81002182>] ? system_call_fastpath+0x16/0x1b
[47761.448884] no locks held by make/29123.
[47761.448891] INFO: task make:31380 blocked for more than 120 seconds.
[47761.448895] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47761.448899] make D 0000000000000000 0 31380 29123 0x00000000
[47761.448908] ffff880157f01c38 0000000000000046 0000000000000000 ffff880157f00000
[47761.448918] ffff880157f01fd8 0000000000012040 ffff8800773abea0 ffff880157f01fd8
[47761.448929] 0000000000012040 0000000000012040 ffff880157f01fd8 0000000000012040
[47761.448939] Call Trace:
[47761.448947] [<ffffffff814212ff>] schedule_timeout+0x38/0x220
[47761.448957] [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72
[47761.448965] [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59
[47761.448972] [<ffffffff81036070>] ? get_parent_ip+0x11/0x41
[47761.448979] [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5
[47761.448986] [<ffffffff8142057e>] wait_for_common+0xca/0x144
[47761.448994] [<ffffffff81038b13>] ? default_wake_function+0x0/0xf
[47761.449001] [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74
[47761.449008] [<ffffffff81420692>] wait_for_completion+0x18/0x1a
[47761.449017] [<ffffffff81084808>] stop_one_cpu+0x8c/0xba
[47761.449025] [<ffffffff810355cb>] ? migration_cpu_stop+0x0/0x3d
[47761.449033] [<ffffffff81035aef>] sched_exec+0xc3/0xdc
[47761.449041] [<ffffffff810e7dad>] do_execve+0xaa/0x267
[47761.449049] [<ffffffff8100a58b>] sys_execve+0x3e/0x55
[47761.449056] [<ffffffff8100265c>] stub_execve+0x6c/0xc0
[47761.449062] 1 lock held by make/31380:
[47761.449065] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff810e7b1d>] prepare_bprm_creds+0x28/0x64
Sergey
[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: task blocked for more than 120 seconds 2010-11-04 15:58 task blocked for more than 120 seconds Sergey Senozhatsky @ 2010-11-04 16:19 ` Markus Trippelsdorf 2010-11-04 17:32 ` Peter Zijlstra 0 siblings, 1 reply; 6+ messages in thread From: Markus Trippelsdorf @ 2010-11-04 16:19 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Oleg Nesterov, KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel, Marcelo Tosatti, Peter Zijlstra, Ingo Molnar, Thomas Gleixner On Thu, Nov 04, 2010 at 05:58:26PM +0200, Sergey Senozhatsky wrote: > Hello, > Got the following traces: > > [42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds. > [42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [42001.449311] openbox D 0000000000000003 0 17761 3723 0x00000000 > [42001.449321] ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000 > [42001.449333] ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8 > [42001.449343] 0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040 > [42001.449354] Call Trace: > [42001.449369] [<ffffffff814212ff>] schedule_timeout+0x38/0x220 > [42001.449381] [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72 > [42001.449389] [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59 > [42001.449398] [<ffffffff81036070>] ? get_parent_ip+0x11/0x41 > [42001.449406] [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5 > [42001.449413] [<ffffffff8142057e>] wait_for_common+0xca/0x144 > [42001.449421] [<ffffffff81038b13>] ? default_wake_function+0x0/0xf > [42001.449429] [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74 > [42001.449436] [<ffffffff81420692>] wait_for_completion+0x18/0x1a > [42001.449445] [<ffffffff81084808>] stop_one_cpu+0x8c/0xba These traces look related to: http://article.gmane.org/gmane.linux.kernel/1055100 Does reverting 34f971f6f7988be4d014eec3e3526bee6d007ffa help in your case? -- Markus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: task blocked for more than 120 seconds 2010-11-04 16:19 ` Markus Trippelsdorf @ 2010-11-04 17:32 ` Peter Zijlstra 2010-11-04 18:16 ` Sergey Senozhatsky 2010-11-05 11:14 ` Sergey Senozhatsky 0 siblings, 2 replies; 6+ messages in thread From: Peter Zijlstra @ 2010-11-04 17:32 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Sergey Senozhatsky, Oleg Nesterov, KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel, Marcelo Tosatti, Ingo Molnar, Thomas Gleixner On Thu, 2010-11-04 at 17:19 +0100, Markus Trippelsdorf wrote: > On Thu, Nov 04, 2010 at 05:58:26PM +0200, Sergey Senozhatsky wrote: > > Hello, > > Got the following traces: > > > > [42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds. > > [42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [42001.449311] openbox D 0000000000000003 0 17761 3723 0x00000000 > > [42001.449321] ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000 > > [42001.449333] ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8 > > [42001.449343] 0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040 > > [42001.449354] Call Trace: > > [42001.449369] [<ffffffff814212ff>] schedule_timeout+0x38/0x220 > > [42001.449381] [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72 > > [42001.449389] [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59 > > [42001.449398] [<ffffffff81036070>] ? get_parent_ip+0x11/0x41 > > [42001.449406] [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5 > > [42001.449413] [<ffffffff8142057e>] wait_for_common+0xca/0x144 > > [42001.449421] [<ffffffff81038b13>] ? default_wake_function+0x0/0xf > > [42001.449429] [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74 > > [42001.449436] [<ffffffff81420692>] wait_for_completion+0x18/0x1a > > [42001.449445] [<ffffffff81084808>] stop_one_cpu+0x8c/0xba > > These traces look related to: > http://article.gmane.org/gmane.linux.kernel/1055100 > > Does reverting 34f971f6f7988be4d014eec3e3526bee6d007ffa help in your > case? --- Subject: sched: Fixup cross sched_class wakeup preemption From: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Sun Oct 31 12:37:04 CET 2010 Instead of dealing with sched classes inside each check_preempt_curr() implementation, pull out this logic into the generic wakeup preemption path. This fixes a hang in KVM (and others) where we are waiting for the stop machine thread to run.. Tested-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> --- kernel/sched.c | 39 ++++++++++++++++++++++++++++----------- kernel/sched_fair.c | 6 ------ kernel/sched_stoptask.c | 2 +- 3 files changed, 29 insertions(+), 18 deletions(-) Index: linux-2.6/kernel/sched.c =================================================================== --- linux-2.6.orig/kernel/sched.c +++ linux-2.6/kernel/sched.c @@ -560,18 +560,9 @@ struct rq { static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); -static inline -void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) -{ - rq->curr->sched_class->check_preempt_curr(rq, p, flags); - /* - * A queue event has occurred, and we're going to schedule. In - * this case, we can save a useless back to back clock update. - */ - if (test_tsk_need_resched(p)) - rq->skip_clock_update = 1; -} +static inline +void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags); static inline int cpu_of(struct rq *rq) { @@ -9400,4 +9391,30 @@ void synchronize_sched_expedited(void) } EXPORT_SYMBOL_GPL(synchronize_sched_expedited); +static inline +void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) +{ + const struct sched_class *class; + + if (p->sched_class == rq->curr->sched_class) + rq->curr->sched_class->check_preempt_curr(rq, p, flags); + + for_each_class(class) { + if (class == rq->curr->sched_class) + break; + if (class == p->sched_class) { + resched_task(rq->curr); + break; + } + } + + /* + * A queue event has occurred, and we're going to schedule. In + * this case, we can save a useless back to back clock update. + */ + if (test_tsk_need_resched(rq->curr)) + rq->skip_clock_update = 1; +} + + #endif /* #else #ifndef CONFIG_SMP */ Index: linux-2.6/kernel/sched_fair.c =================================================================== --- linux-2.6.orig/kernel/sched_fair.c +++ linux-2.6/kernel/sched_fair.c @@ -1654,12 +1654,6 @@ static void check_preempt_wakeup(struct struct cfs_rq *cfs_rq = task_cfs_rq(curr); int scale = cfs_rq->nr_running >= sched_nr_latency; - if (unlikely(rt_prio(p->prio))) - goto preempt; - - if (unlikely(p->sched_class != &fair_sched_class)) - return; - if (unlikely(se == pse)) return; Index: linux-2.6/kernel/sched_stoptask.c =================================================================== --- linux-2.6.orig/kernel/sched_stoptask.c +++ linux-2.6/kernel/sched_stoptask.c @@ -19,7 +19,7 @@ select_task_rq_stop(struct rq *rq, struc static void check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags) { - resched_task(rq->curr); /* we preempt everything */ + /* we're never preempted */ } static struct task_struct *pick_next_task_stop(struct rq *rq) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: task blocked for more than 120 seconds 2010-11-04 17:32 ` Peter Zijlstra @ 2010-11-04 18:16 ` Sergey Senozhatsky 2010-11-05 11:14 ` Sergey Senozhatsky 1 sibling, 0 replies; 6+ messages in thread From: Sergey Senozhatsky @ 2010-11-04 18:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Markus Trippelsdorf, Sergey Senozhatsky, Oleg Nesterov, KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel, Marcelo Tosatti, Ingo Molnar, Thomas Gleixner [-- Attachment #1: Type: text/plain, Size: 5494 bytes --] On (11/04/10 18:32), Peter Zijlstra wrote: > On Thu, 2010-11-04 at 17:19 +0100, Markus Trippelsdorf wrote: > > On Thu, Nov 04, 2010 at 05:58:26PM +0200, Sergey Senozhatsky wrote: > > > Hello, > > > Got the following traces: > > > > > > [42001.449295] INFO: task openbox:17761 blocked for more than 120 seconds. > > > [42001.449303] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > > [42001.449311] openbox D 0000000000000003 0 17761 3723 0x00000000 > > > [42001.449321] ffff88004ce8dc38 0000000000000046 0000000000000000 ffff88004ce8c000 > > > [42001.449333] ffff88004ce8dfd8 0000000000012040 ffff880133010000 ffff88004ce8dfd8 > > > [42001.449343] 0000000000012040 0000000000012040 ffff88004ce8dfd8 0000000000012040 > > > [42001.449354] Call Trace: > > > [42001.449369] [<ffffffff814212ff>] schedule_timeout+0x38/0x220 > > > [42001.449381] [<ffffffff81066f21>] ? mark_held_locks+0x50/0x72 > > > [42001.449389] [<ffffffff81423dae>] ? _raw_spin_unlock_irq+0x2b/0x59 > > > [42001.449398] [<ffffffff81036070>] ? get_parent_ip+0x11/0x41 > > > [42001.449406] [<ffffffff81036132>] ? sub_preempt_count+0x92/0xa5 > > > [42001.449413] [<ffffffff8142057e>] wait_for_common+0xca/0x144 > > > [42001.449421] [<ffffffff81038b13>] ? default_wake_function+0x0/0xf > > > [42001.449429] [<ffffffff81423e4b>] ? _raw_spin_unlock_irqrestore+0x6f/0x74 > > > [42001.449436] [<ffffffff81420692>] wait_for_completion+0x18/0x1a > > > [42001.449445] [<ffffffff81084808>] stop_one_cpu+0x8c/0xba > > > > These traces look related to: > > http://article.gmane.org/gmane.linux.kernel/1055100 > > > > Does reverting 34f971f6f7988be4d014eec3e3526bee6d007ffa help in your > > case? > > Hello, I'll test "sched: Fixup cross sched_class wakeup preemption" first and then (if it fails) I'll test 34f971f6f7988be4d014eec3e3526bee6d007ffa revert. Sergey > --- > Subject: sched: Fixup cross sched_class wakeup preemption > From: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Sun Oct 31 12:37:04 CET 2010 > > Instead of dealing with sched classes inside each check_preempt_curr() > implementation, pull out this logic into the generic wakeup preemption > path. > > This fixes a hang in KVM (and others) where we are waiting for the > stop machine thread to run.. > > Tested-by: Marcelo Tosatti <mtosatti@redhat.com> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > LKML-Reference: <new-submission> > --- > kernel/sched.c | 39 ++++++++++++++++++++++++++++----------- > kernel/sched_fair.c | 6 ------ > kernel/sched_stoptask.c | 2 +- > 3 files changed, 29 insertions(+), 18 deletions(-) > > Index: linux-2.6/kernel/sched.c > =================================================================== > --- linux-2.6.orig/kernel/sched.c > +++ linux-2.6/kernel/sched.c > @@ -560,18 +560,9 @@ struct rq { > > static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); > > -static inline > -void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) > -{ > - rq->curr->sched_class->check_preempt_curr(rq, p, flags); > > - /* > - * A queue event has occurred, and we're going to schedule. In > - * this case, we can save a useless back to back clock update. > - */ > - if (test_tsk_need_resched(p)) > - rq->skip_clock_update = 1; > -} > +static inline > +void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags); > > static inline int cpu_of(struct rq *rq) > { > @@ -9400,4 +9391,30 @@ void synchronize_sched_expedited(void) > } > EXPORT_SYMBOL_GPL(synchronize_sched_expedited); > > +static inline > +void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) > +{ > + const struct sched_class *class; > + > + if (p->sched_class == rq->curr->sched_class) > + rq->curr->sched_class->check_preempt_curr(rq, p, flags); > + > + for_each_class(class) { > + if (class == rq->curr->sched_class) > + break; > + if (class == p->sched_class) { > + resched_task(rq->curr); > + break; > + } > + } > + > + /* > + * A queue event has occurred, and we're going to schedule. In > + * this case, we can save a useless back to back clock update. > + */ > + if (test_tsk_need_resched(rq->curr)) > + rq->skip_clock_update = 1; > +} > + > + > #endif /* #else #ifndef CONFIG_SMP */ > Index: linux-2.6/kernel/sched_fair.c > =================================================================== > --- linux-2.6.orig/kernel/sched_fair.c > +++ linux-2.6/kernel/sched_fair.c > @@ -1654,12 +1654,6 @@ static void check_preempt_wakeup(struct > struct cfs_rq *cfs_rq = task_cfs_rq(curr); > int scale = cfs_rq->nr_running >= sched_nr_latency; > > - if (unlikely(rt_prio(p->prio))) > - goto preempt; > - > - if (unlikely(p->sched_class != &fair_sched_class)) > - return; > - > if (unlikely(se == pse)) > return; > > Index: linux-2.6/kernel/sched_stoptask.c > =================================================================== > --- linux-2.6.orig/kernel/sched_stoptask.c > +++ linux-2.6/kernel/sched_stoptask.c > @@ -19,7 +19,7 @@ select_task_rq_stop(struct rq *rq, struc > static void > check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags) > { > - resched_task(rq->curr); /* we preempt everything */ > + /* we're never preempted */ > } > > static struct task_struct *pick_next_task_stop(struct rq *rq) > [-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: task blocked for more than 120 seconds 2010-11-04 17:32 ` Peter Zijlstra 2010-11-04 18:16 ` Sergey Senozhatsky @ 2010-11-05 11:14 ` Sergey Senozhatsky 1 sibling, 0 replies; 6+ messages in thread From: Sergey Senozhatsky @ 2010-11-05 11:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Markus Trippelsdorf, Sergey Senozhatsky, Oleg Nesterov, KOSAKI Motohiro, Roland McGrath, David Howells, linux-kernel, Marcelo Tosatti, Ingo Molnar, Thomas Gleixner [-- Attachment #1: Type: text/plain, Size: 583 bytes --] On (11/04/10 18:32), Peter Zijlstra wrote: > Subject: sched: Fixup cross sched_class wakeup preemption > From: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Sun Oct 31 12:37:04 CET 2010 > > Instead of dealing with sched classes inside each check_preempt_curr() > implementation, pull out this logic into the generic wakeup preemption > path. > > This fixes a hang in KVM (and others) where we are waiting for the > stop machine thread to run.. > Hello, Peter, patch seems to work. Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Sergey [-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* task blocked for more than 120 seconds. @ 2010-10-07 23:18 Shawn Bohrer 0 siblings, 0 replies; 6+ messages in thread From: Shawn Bohrer @ 2010-10-07 23:18 UTC (permalink / raw) To: linux-nfs; +Cc: linux-kernel Hello, I've got some machines with nfs mounted home directories and recently had the machines lock up with the following output below from the kernel logs. This machine was running 2.6.32.21 at the time, and was locked up for at least 20 minutes before we rebooted. We've had this happen to us twice now, so while I haven't tried I believe we can reproduce it in about a day. Does anyone have any insight on what may be happening here or any suggestions for debugging? Thanks, Shawn INFO: task java:24970 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. java D ffff88189c50e198 0 24970 24239 0x00000000 ffff88205b7339c8 0000000000000086 0000000000000000 000000005ae2d09c 0000000000000286 0000000000000030 ffff88205b733988 000000010174a797 ffff88205acd4a80 ffff88205b733fd8 000000000000e198 ffff88205acd4a80 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff8100c9ce>] ? common_interrupt+0xe/0x13 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff81047774>] ? __enqueue_entity+0x84/0x90 [<ffffffff8104f7d5>] ? enqueue_task_fair+0x45/0x90 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task java:24982 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. java D ffff8800283ee198 0 24982 24239 0x00000000 ffff88104f6339c8 0000000000000086 0000000000000000 00000000ffffff10 0000000000000286 0000000000000030 0000000000000282 000000010174a681 ffff88105a592580 ffff88104f633fd8 000000000000e198 ffff88105a592580 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff810ffef9>] ? do_wp_page+0x109/0x7c0 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task hbitimestamp:23184 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. hbitimestamp D ffff88109c6ae198 0 23184 23175 0x00000000 ffff880855d179c8 0000000000000086 0000000000000000 000000005bb2d600 0000000000000282 0000000000000030 0000000000000282 00000001017445ab ffff8808577bc640 ffff880855d17fd8 000000000000e198 ffff8808577bc640 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8100cb4e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8141c74e>] ? do_device_not_available+0xe/0x10 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task java:24970 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. java D ffff88189c50e198 0 24970 24239 0x00000000 ffff88205b7339c8 0000000000000086 0000000000000000 000000005ae2d09c 0000000000000286 0000000000000030 ffff88205b733988 000000010174a797 ffff88205acd4a80 ffff88205b733fd8 000000000000e198 ffff88205acd4a80 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff8100c9ce>] ? common_interrupt+0xe/0x13 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff81047774>] ? __enqueue_entity+0x84/0x90 [<ffffffff8104f7d5>] ? enqueue_task_fair+0x45/0x90 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task java:24982 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. java D ffff8800283ee198 0 24982 24239 0x00000000 ffff88104f6339c8 0000000000000086 0000000000000000 00000000ffffff10 0000000000000286 0000000000000030 0000000000000282 000000010174a681 ffff88105a592580 ffff88104f633fd8 000000000000e198 ffff88105a592580 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff810ffef9>] ? do_wp_page+0x109/0x7c0 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task tail:22115 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. tail D ffff88189c4ee198 0 22115 14494 0x00000000 ffff88085b95bbd8 0000000000000086 0000000000000000 ffffffffa035d896 ffff88205be57cc8 ffff88205be57cc8 ffff88205be57cc8 000000010174d70e ffff8808564fe400 ffff88085b95bfd8 000000000000e198 ffff8808564fe400 Call Trace: [<ffffffffa035d896>] ? __rpc_execute+0xd6/0x2a0 [sunrpc] [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa04037d2>] nfs_sync_mapping_wait+0x122/0x260 [nfs] [<ffffffffa0403aa9>] nfs_write_mapping+0x79/0xb0 [nfs] [<ffffffffa0403afa>] nfs_wb_nocommit+0x1a/0x20 [nfs] [<ffffffffa03f60f8>] nfs_getattr+0x128/0x140 [nfs] [<ffffffff811351d1>] vfs_getattr+0x51/0x80 [<ffffffff8113548f>] vfs_fstat+0x3f/0x60 [<ffffffff811354d4>] sys_newfstat+0x24/0x40 [<ffffffff810841a4>] ? sys_nanosleep+0x74/0x80 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task hbitimestamp:23184 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. hbitimestamp D ffff88109c6ae198 0 23184 23175 0x00000000 ffff880855d179c8 0000000000000086 0000000000000000 000000005bb2d600 0000000000000282 0000000000000030 0000000000000282 00000001017445ab ffff8808577bc640 ffff880855d17fd8 000000000000e198 ffff8808577bc640 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8100cb4e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8141c74e>] ? do_device_not_available+0xe/0x10 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task java:24970 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. java D ffff88189c50e198 0 24970 24239 0x00000000 ffff88205b7339c8 0000000000000086 0000000000000000 000000005ae2d09c 0000000000000286 0000000000000030 ffff88205b733988 000000010174a797 ffff88205acd4a80 ffff88205b733fd8 000000000000e198 ffff88205acd4a80 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff8100c9ce>] ? common_interrupt+0xe/0x13 [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff81047774>] ? __enqueue_entity+0x84/0x90 [<ffffffff8104f7d5>] ? enqueue_task_fair+0x45/0x90 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task java:24982 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. java D ffff8800283ee198 0 24982 24239 0x00000000 ffff88104f6339c8 0000000000000086 0000000000000000 00000000ffffff10 0000000000000286 0000000000000030 0000000000000282 000000010174a681 ffff88105a592580 ffff88104f633fd8 000000000000e198 ffff88105a592580 Call Trace: [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff81419df3>] io_schedule+0x73/0xc0 [<ffffffffa03fe02e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] [<ffffffff8141a61f>] __wait_on_bit+0x5f/0x90 [<ffffffffa03fe020>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] [<ffffffff8141a6c8>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff8107fe50>] ? wake_bit_function+0x0/0x40 [<ffffffff810e11ce>] ? find_get_page+0x1e/0xa0 [<ffffffffa03fe00f>] nfs_wait_on_request+0x2f/0x40 [nfs] [<ffffffffa0403d77>] nfs_updatepage+0x107/0x4f0 [nfs] [<ffffffffa03f3a6a>] nfs_write_end+0x5a/0x2c0 [nfs] [<ffffffff810e0bb4>] generic_file_buffered_write+0x174/0x2a0 [<ffffffff810e25f0>] __generic_file_aio_write+0x240/0x470 [<ffffffff810e288f>] generic_file_aio_write+0x6f/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff810ffef9>] ? do_wp_page+0x109/0x7c0 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b INFO: task hbitimestamp:24427 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. hbitimestamp D ffff88089c40e198 0 24427 24317 0x00000000 ffff88205a0b3c88 0000000000000082 0000000000000000 0000000000000000 ffff88205a0b3ca8 ffffffff814196a8 0000000000000000 00000001017777de ffff88205a6ec8c0 ffff88205a0b3fd8 000000000000e198 ffff88205a6ec8c0 Call Trace: [<ffffffff814196a8>] ? thread_return+0x4e/0x726 [<ffffffff8141aac1>] __mutex_lock_slowpath+0xf1/0x170 [<ffffffff8141a9ab>] mutex_lock+0x2b/0x50 [<ffffffff810e2879>] generic_file_aio_write+0x59/0xe0 [<ffffffffa03f371a>] nfs_file_write+0xda/0x1e0 [nfs] [<ffffffff8112fb7a>] do_sync_write+0xfa/0x140 [<ffffffff810fcb91>] ? __do_fault+0x3e1/0x4c0 [<ffffffff8107fe10>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8116a68f>] ? inotify_inode_queue_event+0x2f/0x120 [<ffffffff811bd246>] ? security_file_permission+0x16/0x20 [<ffffffff8112fe78>] vfs_write+0xb8/0x1a0 [<ffffffff81130711>] sys_write+0x51/0x90 [<ffffffff8141c74e>] ? do_device_not_available+0xe/0x10 [<ffffffff8100c11b>] system_call_fastpath+0x16/0x1b ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-11-05 11:14 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-04 15:58 task blocked for more than 120 seconds Sergey Senozhatsky 2010-11-04 16:19 ` Markus Trippelsdorf 2010-11-04 17:32 ` Peter Zijlstra 2010-11-04 18:16 ` Sergey Senozhatsky 2010-11-05 11:14 ` Sergey Senozhatsky -- strict thread matches above, loose matches on Subject: below -- 2010-10-07 23:18 Shawn Bohrer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox